The ISTC for Big Data is focused on five major research themes:
Big Data Databases and Analytics
We are developing new software platforms for storing and processing massive amounts of data and for applying analytics beyond what conventional relational systems can do. We see a “sea change” happening as analysis moves from simple SQL aggregation to much more complex routines that perform data clustering, predictive modeling, and complex statistics. Relational systems are not good at these linear algebra operations, because such operations are specified over arrays, not tables. Therefore, we’re focused on building array-oriented DBMSes. In addition, we are investigating graph-based DBMSes for social-network-style analysis.
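To illustrate the array-versus-table mismatch, here is a minimal sketch (all names are illustrative, not from any particular DBMS): a matrix-vector product, the core primitive of predictive models such as linear regression, is one line over arrays, while a table-based system must express the same computation as a join plus grouped aggregation over (row, col, value) triples.

```python
def matvec(matrix, vector):
    """Dense matrix-vector product over nested lists (arrays)."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]

def matvec_relational(triples, vector):
    """Same computation over a relational (row, col, value) table:
    effectively JOIN matrix WITH vector ON col, then SUM ... GROUP BY row."""
    sums = {}
    for r, c, v in triples:
        sums[r] = sums.get(r, 0.0) + v * vector[c]
    return [sums[r] for r in sorted(sums)]

M = [[1.0, 2.0], [3.0, 4.0]]
x = [10.0, 1.0]
triples = [(r, c, M[r][c]) for r in range(2) for c in range(2)]
assert matvec(M, x) == matvec_relational(triples, x)  # → [12.0, 34.0]
```

Both produce the same answer, but the array form states the mathematics directly, which is the motivation for array-oriented storage and execution.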
Big Data Math and Algorithms
We’re designing and implementing algorithms for linear algebra, signal processing, search, and machine learning that scale to tens or hundreds of machines and petabytes of data. To date, most algorithm work has focused on complexity issues, assuming that the data is main-memory resident and that the algorithm runs on a single computing thread. Therefore, our focus here is on algorithm development for parallel execution and for data that does not necessarily fit in main memory.
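As a minimal sketch of the out-of-core style this implies (the chunked generator below stands in for reads from disk; all names are illustrative): Welford's single-pass mean/variance algorithm touches each value exactly once and keeps constant-size state, so the data never needs to be main-memory resident.

```python
def chunks(values, size):
    """Yield fixed-size chunks, simulating reads from external storage."""
    for i in range(0, len(values), size):
        yield values[i:i + size]

def streaming_mean_var(chunk_iter):
    """Welford's single-pass algorithm over a stream of chunks:
    O(1) memory regardless of total data size."""
    n, mean, m2 = 0, 0.0, 0.0
    for chunk in chunk_iter:
        for x in chunk:
            n += 1
            delta = x - mean
            mean += delta / n
            m2 += delta * (x - mean)
    return mean, (m2 / n if n else 0.0)

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean, var = streaming_mean_var(chunks(data, 3))
assert abs(mean - 5.0) < 1e-9 and abs(var - 4.0) < 1e-9
```

The same per-chunk state can be merged across machines, which is what makes such formulations attractive for parallel execution as well.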
Big Data Visualization
We’re designing visualizations and interfaces that allow users to interact with massive data sets, on displays ranging from phones to video walls. We assume that there is a DBMS behind such a visualization program. Moreover, when the visualization system runs a query, it may get back a fire hose of data that it was not expecting. Hence, visualizations have to be made scalable to large amounts of data. In addition, we have to find ways to speed up visualization systems through prefetching and caching.
Big Data Architecture
We’re trying to understand how next-generation hardware innovations – such as many-core chips, non-volatile random-access memories, and reconfigurable hardware – affect the design of data processing systems. A significant fraction of computing cycles go to supporting Big Data. Hence, it is important to optimize computer architectures for this task. This extends to memory systems as well as specialized chips, such as collections of GPUs.
Streaming Big Data
We’re building data processing systems that facilitate rapid processing and ingest of data streams. Behind every Big Data problem is a “Big Velocity” problem that requires data ingest and data conditioning at high rates, including the ability to aggregate data at high speeds and load it into database management systems.
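A minimal sketch of the data-conditioning step, under the assumption that raw events are pre-aggregated before loading (all names are illustrative): incoming readings are rolled up into fixed one-second windows, so the DBMS ingests one summary row per window instead of every raw event.

```python
from collections import defaultdict

def window_aggregate(events, window=1.0):
    """Roll (timestamp, value) events up into fixed-width time windows.
    Returns {window_start: [count, sum]} summaries ready to load."""
    summaries = defaultdict(lambda: [0, 0.0])
    for ts, value in events:
        start = int(ts // window) * window   # align to window boundary
        summaries[start][0] += 1
        summaries[start][1] += value
    return dict(summaries)

events = [(0.1, 10.0), (0.7, 20.0), (1.2, 5.0), (1.9, 15.0)]
agg = window_aggregate(events)
assert agg[0.0] == [2, 30.0] and agg[1.0] == [2, 20.0]
```

Reducing four raw events to two summary rows is the essence of high-rate conditioning: the load path into the DBMS sees aggregated data rather than the full fire hose.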