ISTC for Big Data principal investigators, researchers, and their students will present a number of papers at the 2017 International Conference on Very Large Data Bases (VLDB 2017) in Munich, Germany, August 28-September 1, 2017. (To read any of the papers, click on the links provided below.)
Some highlights:
- Professor Andy Pavlo and his students at Carnegie Mellon are working to make sure database management systems (DBMSs) can take advantage of new and evolving hardware. PhD candidate Joy Arulraj will present the paper “Write-Behind Logging,” which describes how to change a DBMS’s logging and recovery algorithms to use emerging non-volatile memory (NVM). Their new protocol enables the DBMS to recover nearly instantaneously from system failures and reduces wear on NVM devices. This work is in collaboration with Intel Labs and is part of CMU’s extensive work on non-volatile memory databases. Read their blog post about Write-Behind Logging here.
- Intel’s Stavros Papadopoulos and colleagues will present “The TileDB Array Data Storage Manager,” a novel storage system for data that are naturally modeled as multi-dimensional arrays, as is typical in scientific applications. TileDB has been used in genomics as the storage layer of GenomicsDB, which is maintained by the Intel Health and Life Sciences group and is used in the Broad Institute’s GATK 4.0 software. Both TileDB and GenomicsDB are available as open source. Read more about TileDB in this blog post.
- Analytics requires visual data exploration tools that can quickly gather and display insights from datasets at “human speed.” Techniques such as Approximate Query Processing (AQP) are increasingly important in meeting the interactivity guarantees promised by these tools, but even these techniques are straining as datasets get ever larger and more complex. Researchers in Brown University’s Data Management Research Group will present “Revisiting Reuse for Approximate Query Processing” by Alex Galakatos, Andrew Crotty, Emanuel Zgraggen, Carsten Binnig and ISTC PI Tim Kraska. They have developed a formulation for AQP that maximizes result reuse in order to improve interactivity for visual data exploration tasks. Their formulation can provide low-error approximate results at interactive speeds, even for queries over rare subpopulations.
Other work being presented from ISTC researchers and participating universities includes:
- “Comparative Evaluation of Big-Data Systems on Scientific Image Analytics Workloads.” Parmita Mehta, Sven Dorkenwald, Dongfang Zhao, Tomer Kaftan, Alvin Cheung, Magdalena Balazinska, Ariel Rokem, Andrew Connolly, Jacob Vanderplas, Yusra AlSayyad
- “The End of a Myth: Distributed Transactions Can Scale.” Erfan Zamanian, Carsten Binnig, Tim Kraska, Tim Harris. Read the blog post.
- “Probabilistic Database Summarization for Interactive Data Exploration.” Laurel Orr, Dan Suciu, Magdalena Balazinska
- “Exploring Big Volume Sensor Data with Vroom.” Oscar Moll, Samuel Madden, Michael Stonebraker, Vijay Gadepally, Aaron Zalewski (demonstration)
- “Clay: Fine-Grained Adaptive Partitioning for General Database Schemas.” Marco Serafini, Rebecca Taft, Aaron J. Elmore, Andrew Pavlo, Ashraf Aboulnaga, Michael Stonebraker
- “An Evaluation of Distributed Concurrency Control.” Rachael Harding, Dana Van Aken, Andrew Pavlo, Michael Stonebraker
- “SilkMoth: An Efficient Method for Finding Related Sets with Maximum Matching Constraints.” Dong Deng, Albert Kim, Samuel Madden, Michael Stonebraker
- “Price-Optimal Querying with Data APIs.” Prasang Upadhyaya, Magdalena Balazinska, Dan Suciu
- “Fast and Adaptive Indexing of Multi-Dimensional Observational Data.” Sheng Wang, David Maier, Beng Chin Ooi
- “BlueCache: A Scalable Distributed Flash-based Key-value Store.” Shuotao Xu, Sungjin Lee, Sang-Woo Jun, Ming Liu, Jamey Hicks, Arvind
- “SMCQL: Secure Query Processing for Private Data Networks.” Johes Bater, Greg Elliott, Craig Eggen, Satyender Goel, Abel Kho, Jennie Rogers