ISTC for Big Data: 2015 and Beyond


By Sam Madden, ISTC for Big Data Co-Director

Happy New Year, ISTC for Big Data blog readers!

It’s been a busy year here at the ISTC for Big Data.  In this post, I call out a few of our highlights for 2014 and what you can expect to see from us in the future. (See our list of research papers here for a more complete picture.)

1.  In August, we had our second annual meeting, which was a great way to bring together our team and reflect on what’s been working and what hasn’t.  It was particularly fun to see all the collaborations that have launched: We’ve got about 10 teams inside of Intel using our technology in various ways, including groups working on graphs, array databases, new programming languages like Julia, and more.

We also proposed a vision, called BigDawg, for a unified data processing system that combines streaming, analytics and in-memory transactions. (BigDawg will be a topic of future blog posts, but it’s related to the language effort described here: Towards a Common Programming Model for Big Data.)  Although we’ve tweaked the focus and make-up of the center a bit, everyone left the meeting feeling pleased with how much we’ve accomplished in the last two years and optimistic about the three years still to come in our five-year program.

2.  As a part of our effort leading up to the annual meeting, we started thinking seriously about the long-term impact of the ISTC.  We all want to work on applications that matter to the world and that are a good fit for the technology we’re developing.

To that end, we decided to devote some of our resources to investigating the challenges and opportunities in using medical data.  Big data has the potential to transform medicine by allowing us to develop better predictive models for both acute and chronic disease, discover unknown drug interactions, use sensors to improve outpatient care, and support many other applications. We are particularly interested in new medical data applications that involve a large variety of data types, including array-oriented signal data, tables of lab reports, and textual data such as doctors’ and nurses’ notes (see Medical Data and the Learning Healthcare System).  Building systems that can handle all of these types of data at scale requires combining many different technologies under development by various groups on the ISTC team. To further explore the challenges in this area, we are organizing a free, one-day workshop in Cambridge, Mass., on January 13.


3. As a result of our focus on combined programming models and a unified application area, we’ve started several very tight collaborations among different research groups.  For example, Jack Dongarra’s group at the University of Tennessee and our teams at MIT and Intel are now working closely to add support for sparse linear algebra to our array database offerings. Teams from MIT, Brown and the University of Washington are collaborating to combine our work on data visualization: automatically synthesizing visualizations, and coupling the underlying data processing systems with visualization engines to improve the interactivity and scalability of in-browser visualization.
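To give a flavor of what “sparse linear algebra support” means here, the workhorse kernel is sparse matrix-vector multiply over a compressed sparse row (CSR) layout. The sketch below is purely illustrative — it shows the kernel in plain Python, and is not the interface of our array database or of Jack Dongarra’s libraries:

```python
# Sparse matrix-vector multiply over a CSR (compressed sparse row) layout.
# Example 3x3 matrix with four non-zeros:
# [[10,  0,  0],
#  [ 0, 20, 30],
#  [ 0,  0, 40]]
values  = [10, 20, 30, 40]  # non-zero entries, stored row by row
col_idx = [0, 1, 2, 2]      # column index of each non-zero
row_ptr = [0, 1, 3, 4]      # row i's entries are values[row_ptr[i]:row_ptr[i+1]]

def spmv(values, col_idx, row_ptr, x):
    """Multiply the CSR matrix by a dense vector x, touching only non-zeros."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y

print(spmv(values, col_idx, row_ptr, [1, 1, 1]))  # [10.0, 50.0, 40.0]
```

The point of the CSR layout is that storage and work scale with the number of non-zeros rather than with the full matrix size, which is what makes it a good fit for the large, mostly-empty arrays that show up in scientific and graph workloads.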

4. We launched a number of other cool new research projects, including DataHub (Beyond Data Lakes: The DataHub), Tupleware (Tupleware: An Inside Look), and Vertexica (Graph Analytics: The New Use Case for Relational Databases). What all these projects have in common is that they are working toward our vision of building novel software systems designed to process a variety of data, and then figuring out how to build optimized implementations of them that take advantage of Intel hardware.
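The idea behind graph analytics on a relational database — the approach Vertexica explores — is that a graph algorithm step can be written as joins and aggregations over an edge table. The following is an illustrative rendering of one PageRank iteration in that relational style, written in plain Python; the function and table names are mine, not Vertexica’s actual SQL interface:

```python
from collections import defaultdict

# An edge table of (src, dst) rows: the relational representation of a graph.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a")]
nodes = {"a", "b", "c"}

def pagerank_step(rank, edges, damping=0.85):
    """One PageRank iteration in relational terms: join each edge with its
    source's rank, divide by the source's out-degree, then group by
    destination and sum the contributions."""
    out_deg = defaultdict(int)
    for src, _ in edges:
        out_deg[src] += 1
    contrib = defaultdict(float)
    for src, dst in edges:                 # edges JOIN rank ON src
        contrib[dst] += rank[src] / out_deg[src]
    base = (1 - damping) / len(rank)       # teleport term
    return {n: base + damping * contrib[n] for n in rank}

rank = {n: 1 / len(nodes) for n in nodes}
rank = pagerank_step(rank, edges)
```

Because each iteration is just a join plus a group-by, a relational engine can apply its usual machinery — query optimization, parallelism, columnar storage — to graph workloads, which is the bet these projects are making.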

In 2015, you can expect to see these trends continue.  I’m optimistic that our efforts to build a unified platform and query language for multiple types of data will significantly reduce the complexity that developers face today when choosing a data storage platform. Our focus as a group on medical data processing will result in tighter, more cohesive collaborations on a problem of broad societal importance. And of course, we’ll continue building the biggest, baddest, fastest Big Data systems on the planet, while working to inform Intel about what their new hardware should do and working with them to optimize our software systems for their platforms.

