At the upcoming 2015 Very Large Data Bases (VLDB) Conference in Hawaii, the Intel Science and Technology Center (ISTC) for Big Data will unveil its first major “capstone” project: a federated reference architecture that supports query processing across multiple databases, where each underlying storage engine may use a distinct data model.
The project, the first from the ISTC’s Big Data Analytics Working Group (BigDAWG), was created to tackle “the challenges associated with building federated databases over multiple data models, specialized storage engines, and visualizations for Big Data.” The BigDAWG team will demonstrate its reference architecture on a use case based on MIMIC II (short for “Multiparameter Intelligent Monitoring in Intensive Care II”), a publicly accessible intensive care unit (ICU) data set covering about 26,000 ICU admissions at Boston’s Beth Israel Deaconess Medical Center.
The group calls its reference architecture a “polystore” to distinguish it from earlier federation efforts that supported only the relational model. In a recent post on the ACM SIGMOD blog, ISTC co-founder Michael Stonebraker explained why previous attempts at federation (middleware that runs on top of one or more local DBMSs and presents a seamless interface to disparate systems whose schemas may have been constructed independently) failed, and why a related construct, the polystore, looks poised to have its “day in the sun.”
The reference architecture will be released under an MIT open source license so others worldwide can experiment with it.
Although the ISTC for Big Data was launched in May 2012, the BigDAWG project has its roots in years of previous independent and collaborative research.
In 2005, database researchers Michael Stonebraker of MIT CSAIL and Uğur Çetintemel of Brown University published their seminal paper “ ‘One Size Fits All’: An Idea Whose Time Has Come and Gone.” Stonebraker and Çetintemel (now an ISTC Principal Investigator) argued that the relational database, originally designed for business data processing, was reaching its limits as the go-to engine for an increasingly diverse range of data applications and analytics. The authors made a convincing case for the emergence of new, purpose-built database engines for stream processing, data warehousing, scientific data processing, and other applications.
Fast-forward to 2015. The paper has won the 2015 ICDE 10-Year Most Influential Paper Award, and Professor Stonebraker has received the 2014 ACM Turing Award for his pioneering contributions to database research and for helping bring those advances into practice, including founding nine startup companies to commercialize new database technologies. Purpose-built database engines are now a fact of life for many organizations, which are seeking ways to better capitalize on Big Data Variety with analytics and applications that span heterogeneous data.
The BigDAWG polystore reference architecture draws on contributions from the following researchers, representing more than a half-dozen institutions:
Aaron Elmore, University of Chicago
Jennie Duggan, Northwestern University
Michael Stonebraker, MIT CSAIL
Magdalena Balazinska, University of Washington
Uğur Çetintemel, Brown University
Vijay Gadepally, MIT Lincoln Laboratory
Jeffrey Heer, University of Washington (Interactive Data Lab)
Bill Howe, University of Washington (UW eScience Institute)
Jeremy Kepner, MIT Lincoln Laboratory
Tim Kraska, Brown University
Sam Madden, MIT CSAIL
David Maier, Portland State University
Timothy Mattson, Intel
Stavros Papadopoulos, Intel/MIT
Jeff Parkhurst, Intel
Nesime Tatbul, Intel/MIT
Manasi Vartak, MIT
Stan Zdonik, Brown University
Their paper, “A Demonstration of the BigDAWG Polystore System,” will be presented at VLDB 2015 in the Demo 3 track (Systems, User Interfaces + Visualization) in Kona 4, first on Wednesday, September 2, at 10:30 AM and again on Thursday, September 3, at 3:30 PM.
ACM SIGMOD blog post, July 13, 2015: “The Case for Polystores”
ISTC for Big Data blog post, October 24, 2014: “Building a New Application-to-Hardware Management Stack for Big Data”