Big Data describes a new era in the digital age in which the volume, velocity and variety of data created across a wide range of fields – from Internet search and social media to finance and healthcare to defense and the sciences – are increasing at a rate well beyond our ability to analyze the data. Tools such as spreadsheets, databases, matrices and graphs have been developed to address these challenges. The common theme among these tools is the need to store and operate on data as whole sets instead of as individual data elements.
Fortunately, these diverse data sets also share common mathematical foundations that apply across a wide range of applications and technologies, according to ISTC Researcher and mathematician Jeremy Kepner of MIT Lincoln Laboratory and MIT CSAIL, creator of D4M.
Common mathematics unify and simplify data, leading to rapid solutions to volume, velocity and variety problems, says Dr. Kepner. By understanding the common mathematical foundations of data, one can see past the surface differences among these tools and exploit the mathematical similarities of data sets to solve the hardest data challenges, he notes.
Specifically, understanding the mathematics…
- reduces the effort required to pass data between steps in a data processing system
- allows steps to be interchanged with full confidence that the results will be unchanged, and
- makes it possible to recognize when steps can be simplified or eliminated.
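As a rough illustration of the points above, the following Python sketch models a data set as a sparse map from (row, column) keys to numbers. The helper names `add` and `select` and the sample data are hypothetical, chosen for this example only; they are not D4M's API. Under those assumptions, the sketch checks that merge steps can be reordered, and that filtering can be moved ahead of merging, without changing the result.

```python
def add(a, b):
    """Element-wise sum of two sparse key -> value maps (hypothetical helper)."""
    out = dict(a)
    for key, value in b.items():
        out[key] = out.get(key, 0) + value
    return out


def select(a, keys):
    """Keep only the entries whose key is in `keys` (hypothetical helper)."""
    return {k: v for k, v in a.items() if k in keys}


# Made-up event counts keyed by (user, action), one map per day.
day1 = {("alice", "login"): 2, ("bob", "search"): 1}
day2 = {("alice", "login"): 1, ("carol", "search"): 4}
day3 = {("bob", "search"): 3}

# Interchanging steps: addition is commutative and associative,
# so the order in which the daily maps are merged does not matter.
left = add(add(day1, day2), day3)
right = add(day1, add(day3, day2))
assert left == right

# Simplifying steps: selection distributes over addition, so filtering
# each input first gives the same answer as filtering the merged result.
wanted = {("alice", "login"), ("bob", "search")}
merged_then_filtered = select(add(day1, day2), wanted)
filtered_then_merged = add(select(day1, wanted), select(day2, wanted))
assert merged_then_filtered == filtered_then_merged
```

Because the second identity holds, a pipeline could discard unwanted entries before merging and do less work, with the guarantee that the output is unchanged.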
For example, recognizing mathematical similarities makes it possible to provide common interfaces to data that are independent of how the data are stored. The same functions can then be used to manipulate data whether the data are stored in files or in a variety of different databases.
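A minimal sketch of such a storage-independent interface is shown here in Python. It assumes that every source can present its data as (row, column, value) triples; the class names `CsvTriples` and `SqliteTriples`, the function `column_totals`, the `events` table and the sample data are all hypothetical and do not come from D4M or the paper.

```python
import csv
import io
import sqlite3
from collections import defaultdict


class CsvTriples:
    """Hypothetical source backed by CSV text with row,col,value lines."""

    def __init__(self, text):
        self._text = text

    def triples(self):
        for row, col, val in csv.reader(io.StringIO(self._text)):
            yield row, col, float(val)


class SqliteTriples:
    """Hypothetical source backed by an SQLite table named `events`."""

    def __init__(self, conn):
        self._conn = conn

    def triples(self):
        query = "SELECT row_key, col_key, val FROM events"
        for row, col, val in self._conn.execute(query):
            yield row, col, float(val)


def column_totals(source):
    """Sum values per column key; depends only on the `triples()` method."""
    totals = defaultdict(float)
    for _row, col, val in source.triples():
        totals[col] += val
    return dict(totals)


# The same made-up data stored two different ways.
records = [("alice", "login", 2), ("bob", "search", 1), ("alice", "search", 3)]

csv_source = CsvTriples("".join(f"{r},{c},{v}\n" for r, c, v in records))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (row_key TEXT, col_key TEXT, val REAL)")
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", records)
sql_source = SqliteTriples(conn)

# One function, two storage backends, identical results.
csv_totals = column_totals(csv_source)
sql_totals = column_totals(sql_source)
assert csv_totals == sql_totals
```

The analysis function never inspects how the data are stored, so adding another backend in this sketch would only require another class that yields the same triples.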
More information on this work can be found in this paper, presented earlier this year at New England Database Day. Dr. Kepner will offer a tutorial on this topic at the IEEE-HPEC 2015 Conference in September that provides a deep-dive introduction to these concepts, which will also be developed in his forthcoming book on the topic.