Gray's Laws Overview:
1. Scientific computing is becoming increasingly data intensive.
Data analysis is severely limited by relatively low I/O performance (the I/O bottleneck). The problem can be mitigated by partitioning the data into multiple parts that are processed in parallel, so that no single I/O path becomes the bottleneck.
2. The solution is in a “scale-out” architecture.
Network and interconnect speeds are not growing fast enough to keep pace with the yearly doubling of required storage. The "scale-out" model addresses this by partitioning data across many self-contained nodes, each with its own CPU, storage, and networking; Gray called these "CyberBricks". They perform better, are architecturally simpler, and are more cost-effective.
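The scale-out idea above can be sketched in a few lines of Python. This is a toy model, not Gray's implementation: the `CyberBrick` class and the hash-partitioning scheme are illustrative assumptions, standing in for real nodes with their own disks and CPUs.

```python
# Minimal sketch of scale-out: data is partitioned across simulated
# "CyberBrick" nodes, and each node scans only its own local slice,
# so no single node's I/O path carries the whole dataset.

from dataclasses import dataclass, field

@dataclass
class CyberBrick:
    """A self-contained node: its own local storage (here, just a list)."""
    node_id: int
    local_data: list = field(default_factory=list)

    def local_scan(self, predicate):
        # Each brick scans only the records it stores locally.
        return [x for x in self.local_data if predicate(x)]

def partition(records, bricks):
    # Hash-partition records so each brick holds roughly 1/N of the data.
    for r in records:
        bricks[hash(r) % len(bricks)].local_data.append(r)

bricks = [CyberBrick(i) for i in range(4)]
partition(range(100), bricks)

# Each node answers the query against its own slice; results are merged.
evens = sorted(x for b in bricks for x in b.local_scan(lambda x: x % 2 == 0))
print(len(evens))  # 50
```

Adding capacity in this model means adding more bricks and repartitioning, rather than buying a bigger central machine.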
3. Bring computations to the data, rather than data to the computations.
Transferring data to the computation is the traditional approach to data analysis, but it is ineffective at large scale because the data is simply too big to move cheaply. It is therefore preferable to perform the computation near where the data is stored: bring the computations to the data.
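The contrast between the two approaches can be made concrete with a sketch. The functions and the node counts below are illustrative assumptions; the point is only that shipping the computation moves a handful of small partial results instead of every record.

```python
# Sketch contrasting "ship the data" with "ship the computation".
# Each dict entry simulates one node's locally stored records.

def ship_data(nodes, compute):
    # Traditional approach: move every record to a central site, then compute.
    transferred = sum(len(data) for data in nodes.values())
    result = compute([x for data in nodes.values() for x in data])
    return result, transferred

def ship_computation(nodes, compute, merge):
    # Gray's approach: run the computation next to each node's local data
    # and move only the small per-node results across the network.
    partials = [compute(data) for data in nodes.values()]
    transferred = len(partials)  # one partial result per node
    return merge(partials), transferred

nodes = {f"node{i}": list(range(i * 1000, (i + 1) * 1000)) for i in range(8)}

total1, moved1 = ship_data(nodes, sum)
total2, moved2 = ship_computation(nodes, sum, sum)
assert total1 == total2     # same answer either way
print(moved1, moved2)       # 8000 records moved vs. 8 partial results
```

The answer is identical in both cases; only the network traffic differs, and the gap widens with the size of the data.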
4. Start the design with the “20 queries.”
Focusing on the 20 most important questions scientists want to ask of the data keeps the design effort concentrated where it matters. Working through these "20 queries" with the analysts lets database builders create a system those analysts can actually use; understanding and answering them makes the resulting system more accessible and user friendly.
5. Go from “working to working.”
Since data-driven systems are always changing, it is important to allow for incremental expansion and upgrades. This is achieved with modular designs, in which individual parts and units can be replaced or removed while the rest keeps working. Modularity prevents having to rebuild a data system from the ground up as it ages and its components become obsolete.
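The modular idea can be sketched as a system that depends only on an interface, so one module can be upgraded without rebuilding the rest. The class and method names here are hypothetical, chosen purely for illustration.

```python
# Minimal sketch of "working to working": the system stays running while
# one module is swapped for an upgraded one behind a stable interface.

from typing import Protocol

class Storage(Protocol):
    def get(self, key: str) -> str: ...

class OldStore:
    def get(self, key: str) -> str:
        return f"old:{key}"

class NewStore:
    def get(self, key: str) -> str:
        return f"new:{key}"

class DataSystem:
    # The system depends only on the Storage interface, so a storage
    # module can be replaced without touching anything around it.
    def __init__(self, storage: Storage):
        self.storage = storage

    def answer(self, key: str) -> str:
        return self.storage.get(key)

system = DataSystem(OldStore())
print(system.answer("q1"))   # old:q1
system.storage = NewStore()  # upgrade one module in place
print(system.answer("q1"))   # new:q1
```

Because `DataSystem` never names a concrete store, each replacement is a small step from one working system to another rather than a ground-up rewrite.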