Making Big Data Management Easier

Today at the O’Reilly Strata Conference in Santa Clara, Calif., ISTC researcher Magdalena Balazinska of the University of Washington is presenting a talk entitled “Can We Make Big Data Management Easier?” This post summarizes her talk. It highlights her team’s ongoing work to simplify Big Data management by using cloud infrastructure and delivering Big Data management as a service.

Today’s Big Data management systems and services are increasingly fast, but they are not always easy to use. Fortunately, research into simplifying the management and processing of Big Data has produced several tools that make these systems easier to use. For example:

With Personalized Service Level Agreements, a new type of SLA for cloud services, users no longer need to think about the number of instances or bytes processed. Instead, they simply upload their data and are shown a personalized menu of fixed hourly prices, each associated with a different level of query performance and set of capabilities. Read more here:

“A Vision for Personalized Service Level Agreements in the Cloud.” Jennifer Ortiz, Victor Teixeira de Almeida, and Magdalena Balazinska, Department of Computer Science and Engineering, University of Washington, Seattle, WA, USA. ACM SIGMOD/PODS 2013, June 2013

Users writing SQL queries to explore data can use SnipSuggest, an autocompletion tool for SQL that provides context-aware assistance.

“SnipSuggest: Context-Aware Autocompletion for SQL.” Nodira Khoussainova, YongChul Kwon, Magdalena Balazinska, and Dan Suciu, Department of Computer Science and Engineering, University of Washington, Seattle, WA, USA. VLDB 2011, August 2011

And when performance differs from what users expected, they can use PerfXplain, a tool that explains the performance of MapReduce jobs running on a shared-nothing cluster.

“PerfXplain: Debugging MapReduce Job Performance.” Nodira Khoussainova, Magdalena Balazinska, and Dan Suciu, Department of Computer Science and Engineering, University of Washington, Seattle, WA, USA. VLDB 2012, August 2012

These tools are part of a larger initiative at the University of Washington, where the Database Group has developed Myria, a new engine for managing Big Data and offering that management as a service. Myria has been tested on 100-node Amazon EC2 deployments using data from domain sciences such as astronomy, as well as data from standard benchmarks and social media sources.

According to Professor Balazinska, she and her team built Myria for two main reasons: their dissatisfaction with Hadoop’s performance and their desire to build their own platform for Big Data management research. She said: “Our concrete goal is to build a ‘Big Data Management Service’ that meets the needs of today’s users, especially in domain sciences, and to understand what it takes to operate such a service in the Cloud in a manner that is cost-effective and intuitive for the provider and the users. We plan to deploy a permanent, public Myria service that will enable scientists to analyze and query big data in the browser without installing any software.”

More information about the Myria engine is available on the Myria project website. The Myria project is partially supported by the National Science Foundation through NSF grant IIS-1247469, by gifts from EMC, and by the Intel Science and Technology Center for Big Data.


