Research Updates from the ISTC for Big Data

In August, researchers from Intel and participating institutions gathered at the Intel Science and Technology Center for Big Data’s annual Research Retreat at Intel’s Jones Farm campus in Hillsboro, Oregon, to present their latest work and describe their progress. Here are two summaries.*

Intel/Parallel Computing Lab Big Data Project

Tim Mattson of Intel presented research on Big Data conducted at the Intel Parallel Computing Lab (PCL). This research has three axes: (1) workloads and benchmarks, (2) algorithms, and (3) programming systems and run-times. Specifically, PCL performs data-driven research, focusing on particular use cases and consulting domain experts. Tim emphasized the importance of various graph benchmarks PCL uses, such as GraphLab and Galois.

He described extensions of the DBSCAN clustering algorithm, then focused on representing graphs as matrices and using linear algebra libraries to solve large graph problems. In closing, he stressed the need to bring the big data and high-performance computing communities together, which should lead to better software and hardware products for Intel.
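To make the graphs-as-matrices idea concrete, here is a minimal sketch of breadth-first search written as repeated matrix-vector products over a Boolean semiring. It is an illustration only, not PCL's code: the function name `bfs_levels` and the dense list-of-lists adjacency matrix are assumptions chosen for readability, and a real system would likely use a sparse linear algebra library instead.

```python
def bfs_levels(adj, source):
    """Return the BFS level of every vertex, or -1 if unreachable.

    adj is a dense adjacency matrix: adj[i][j] is 1 when there is an
    edge from vertex i to vertex j. (Hypothetical helper for illustration.)
    """
    n = len(adj)
    levels = [-1] * n
    levels[source] = 0

    # The frontier is a 0/1 vector marking the vertices found in the last step.
    frontier = [0] * n
    frontier[source] = 1

    depth = 0
    while any(frontier):
        depth += 1
        # One step of BFS is the product (A transposed) * frontier,
        # with OR in place of addition and AND in place of multiplication.
        reached = [
            int(any(adj[i][j] and frontier[i] for i in range(n)))
            for j in range(n)
        ]
        # Mask out vertices that already have a level assigned.
        frontier = [r if levels[j] == -1 else 0 for j, r in enumerate(reached)]
        for j in range(n):
            if frontier[j]:
                levels[j] = depth
    return levels


# Small directed graph: 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3
adj = [
    [0, 1, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
print(bfs_levels(adj, 0))  # [0, 1, 1, 2]
```

The appeal of this formulation is that the inner loop becomes a standard matrix-vector kernel, so the same optimized linear algebra routines used in high-performance computing can be applied to graph traversal.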

Nonvolatile Memory (NVM)

Donning his elegantly crafted Jim Gray Doctoral Dissertation Award Championship Belt, Andy Pavlo of Carnegie Mellon University described his current research, which focuses on the evaluation of storage and recovery mechanisms for in-memory DBMSs in the context of nonvolatile memory (NVM). In order to facilitate this investigation, his team began with a bare-bones, modular system architecture designed to allow plug-and-play interchangeability of different storage and recovery models. Andy outlined three alternatives: (1) in-place updates, (2) copy-on-write updates, and (3) log-based updates. They benchmarked a standard implementation of each approach, as well as a pointer-oriented version specifically geared toward the characteristics of NVM.

Benchmarking demonstrated that the pointer-oriented versions achieved up to a 4x improvement in throughput compared to the standard implementations. Surprisingly, they discovered that the in-place update approach works best, offering the greatest speedups and near-instant recovery.
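For readers unfamiliar with the three update strategies named above, the toy sketch below contrasts them on an in-memory key-value table. It is a simplified illustration under stated assumptions, not the team's implementation: the class names are hypothetical, and nothing here models NVM, durability, or recovery.

```python
class InPlaceStore:
    """In-place updates: overwrite the single live copy of each record."""

    def __init__(self):
        self.table = {}

    def put(self, key, value):
        self.table[key] = value

    def get(self, key):
        return self.table.get(key)


class CopyOnWriteStore:
    """Copy-on-write updates: build a new version, then switch the root to it."""

    def __init__(self):
        self.current = {}

    def put(self, key, value):
        new_version = dict(self.current)  # copy; the old version stays untouched
        new_version[key] = value
        self.current = new_version        # single reference swap publishes it

    def get(self, key):
        return self.current.get(key)


class LogStore:
    """Log-based updates: append every change; the newest entry wins on read."""

    def __init__(self):
        self.log = []

    def put(self, key, value):
        self.log.append((key, value))

    def get(self, key):
        for logged_key, logged_value in reversed(self.log):
            if logged_key == key:
                return logged_value
        return None


# All three expose the same interface and return the same answers.
stores = [InPlaceStore(), CopyOnWriteStore(), LogStore()]
for store in stores:
    store.put("a", 1)
    store.put("a", 2)
    store.put("b", 3)
results = [(s.get("a"), s.get("b"), s.get("missing")) for s in stores]
```

The three classes differ only in where the work happens: in-place pays nothing extra on write, copy-on-write pays for the copy, and the log pays on read.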

Andy plans to apply the lessons learned in this study to build N-Store, a hybrid OLTP/OLAP system designed from scratch specifically to leverage NVM. One quotation from Andy truly epitomizes the significance of his ongoing work: “NVM is good.”

*Contributors: Stavros Papadopoulos and Andrew Crotty
