Unleashing Big Data for Scientists

At the recent Intel Science and Technology Center (ISTC) for Big Data annual Research Retreat in Hillsboro, Oregon, researchers demonstrated two projects that aim to reduce complexity and improve productivity for scientists working with Big Data.

Data Management as a Service with Myria

Magda Balazinska of the University of Washington presented her team's research on Myria, a system designed to improve users' productivity. Current big data systems require users to run complex set-up procedures and then act as their own database administrators on top of their work as users. Myria, by contrast, is a database management system structured as a layered cloud service. Its MyriaQ layer accepts queries, compiles and optimizes them, and submits them to an execution engine. Users reach Myria through a web interface, and developers can build additional web services on top of Myria's REST interface.
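To make the layering concrete, here is a toy, in-memory sketch in Python. It is not Myria code: the query format (a dict with `from`, `where`, and `select` keys), the class and function names, and the single optimization rule are all hypothetical. It only mirrors the division of labor described above, with a front end that accepts a query, a compile-and-optimize layer in the role of MyriaQ, and an execution engine underneath.

```python
import json
from typing import Any

Row = dict[str, Any]
Plan = list[tuple[str, Any]]


class ExecutionEngine:
    """Bottom layer: runs a plan of operators over in-memory tables."""

    def __init__(self, tables: dict[str, list[Row]]):
        self.tables = tables

    def run(self, plan: Plan) -> list[Row]:
        rows: list[Row] = []
        for op, arg in plan:
            if op == "scan":
                rows = list(self.tables[arg])
            elif op == "filter":
                col, value = arg
                rows = [r for r in rows if r[col] == value]
            elif op == "project":
                rows = [{c: r[c] for c in arg} for r in rows]
            else:
                raise ValueError(f"unknown operator: {op}")
        return rows


def compile_query(query: dict) -> Plan:
    """Middle layer (the role the post gives MyriaQ): query -> plan."""
    plan: Plan = [("scan", query["from"])]
    for predicate in query.get("where", []):
        plan.append(("filter", tuple(predicate)))
    if "select" in query:
        plan.append(("project", list(query["select"])))
    return plan


def optimize(plan: Plan) -> Plan:
    """Toy optimization (hypothetical): drop repeated filter predicates."""
    seen = set()
    optimized: Plan = []
    for step in plan:
        if step[0] == "filter":
            if step in seen:
                continue
            seen.add(step)
        optimized.append(step)
    return optimized


class QueryService:
    """Top layer: stands in for the web/REST front end by returning JSON."""

    def __init__(self, engine: ExecutionEngine):
        self.engine = engine

    def submit(self, query: dict) -> str:
        plan = optimize(compile_query(query))
        return json.dumps(self.engine.run(plan))


tables = {"halos": [{"id": 1, "step": 10, "mass": 5.0},
                    {"id": 2, "step": 11, "mass": 7.5}]}
service = QueryService(ExecutionEngine(tables))
print(service.submit({"from": "halos",
                      "where": [("step", 11), ("step", 11)],
                      "select": ["id", "mass"]}))
# prints [{"id": 2, "mass": 7.5}]
```

Keeping the layers separate is what lets one engine sit behind a web interface, a REST API, or a specialized service. In this sketch, `QueryService.submit` returns a JSON string to stand in for a REST response body.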

For example, the team built MyMergerTree, a service through which astronomers can interactively query and visualize the output of N-body simulations using Myria.


Figure: MyMergerTree, a service that lets astronomers interactively query and visualize their data through the Myria system. (Source: University of Washington.)

To address the difficulty of configuring a cloud service and predicting what it will cost, the team developed a system that generates Personalized Service Level Agreements (PSLAs) for its users. Given a user's dataset, Myria generates a small, simplified menu of PSLAs from which the user can choose.
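The post does not say how Myria constructs its PSLAs. As a rough illustration of the idea only, the Python sketch below assumes a hypothetical cost model: each tier is a cluster with a given number of workers, price grows linearly with the number of workers, and a full scan of the dataset parallelizes perfectly apart from a fixed overhead. All names and constants (`PSLAOption`, `generate_pslas`, `choose`, the prices and throughputs) are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PSLAOption:
    workers: int              # hypothetical: cluster size for this tier
    price_per_hour: float     # hypothetical: what the user pays per hour
    est_query_seconds: float  # estimated time for a full scan of the data


def generate_pslas(dataset_gb: float,
                   worker_counts=(4, 8, 16),
                   price_per_worker_hour=0.50,
                   gb_per_worker_second=0.1,
                   overhead_seconds=2.0) -> list[PSLAOption]:
    """Build a small menu of tiers from a toy cost model (all constants assumed)."""
    menu = []
    for w in worker_counts:
        # Assume scans parallelize perfectly across workers, plus a fixed overhead.
        seconds = overhead_seconds + dataset_gb / (gb_per_worker_second * w)
        menu.append(PSLAOption(w, round(w * price_per_worker_hour, 2), round(seconds, 1)))
    return menu


def choose(menu: list[PSLAOption], max_query_seconds: float) -> Optional[PSLAOption]:
    """Pick the cheapest tier whose estimate meets the user's time limit."""
    feasible = [o for o in menu if o.est_query_seconds <= max_query_seconds]
    return min(feasible, key=lambda o: o.price_per_hour) if feasible else None


menu = generate_pslas(dataset_gb=100)
for option in menu:
    print(option)
print(choose(menu, max_query_seconds=130))
```

The point of such a menu is that the user compares a few concrete price and performance pairs for their own data instead of sizing a cluster by hand.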

Making MODIS Data More Malleable

James Frew of the University of California, Santa Barbara, presented and demonstrated his team’s use of SciDB to avoid a common problem in map projections.

The analysis of geographic surveys, such as NASA's Moderate Resolution Imaging Spectroradiometer (MODIS) survey, requires mapping sensor data onto a geographic coordinate system. Because the sensor scans across the curved surface of the Earth, the raw data follows the scan geometry rather than a regular map grid. As the scan angle increases, pixels stretch along the scan direction and overlap with neighboring scans in the across-scan direction (the so-called bow-tie effect). Maps projected naively, without accounting for this geometry, display image artifacts.
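The growth of the pixel footprint with scan angle can be estimated with a little spherical geometry. The Python sketch below is a simplified model, not part of the SciDB work: it assumes a spherical Earth of radius 6,371 km, a satellite altitude of 705 km (the nominal orbit of the Terra and Aqua satellites that carry MODIS), and a maximum scan angle of about 55°. The function name `pixel_growth` is hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: spherical Earth, mean radius
ALTITUDE_KM = 705.0       # assumption: nominal Terra/Aqua orbit altitude


def pixel_growth(scan_deg: float,
                 r: float = EARTH_RADIUS_KM,
                 h: float = ALTITUDE_KM) -> tuple[float, float]:
    """Approximate footprint size relative to nadir: (along-scan, along-track)."""
    theta = math.radians(scan_deg)
    k = (r + h) / r
    # Earth-central angle between nadir and the viewed point (law of sines).
    phi = math.asin(k * math.sin(theta)) - theta
    # Along scan: ground distance is s = r * phi, so footprint ~ ds/dtheta.
    ds_dtheta = r * (k * math.cos(theta) / math.sqrt(1 - (k * math.sin(theta)) ** 2) - 1)
    # Along track: footprint ~ slant range from the satellite to the point.
    slant = h if theta == 0 else r * math.sin(phi) / math.sin(theta)
    # At nadir both ds/dtheta and the slant range equal h, so divide by h.
    return ds_dtheta / h, slant / h


for angle in (0, 20, 40, 55):
    along_scan, along_track = pixel_growth(angle)
    print(f"{angle:>2} deg: {along_scan:.2f}x along scan, {along_track:.2f}x along track")
```

Under these assumptions a 1 km nadir pixel grows to roughly 4.8 km along scan and 2 km along track at 55°, in line with commonly cited MODIS figures. The along-track growth is why neighboring scans overlap near the edge of the swath.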

To avoid these complications, many science teams use images that NASA has preprocessed with a projection already applied. But these canned projections may be highly skewed over a team's region of interest, and producing an evenly spaced coordinate transformation over that region is itself an obstacle to data analysis.

By using SciDB to store the coordinates of each observation alongside the observed data, science teams can apply resampling functions over the survey that better suit a particular use case.
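The idea can be sketched in plain Python; SciDB itself is not used here, and the function `resample_mean` is hypothetical. Because each value carries its own latitude and longitude, a team can choose any regular grid over its region of interest and fill it directly from the swath data. This sketch uses simple bin averaging, which is only one possible resampling function; nearest-neighbor or distance-weighted interpolation would be drop-in alternatives.

```python
from typing import Iterable, Optional


def resample_mean(observations: Iterable[tuple[float, float, float]],
                  lat_range: tuple[float, float],
                  lon_range: tuple[float, float],
                  shape: tuple[int, int]) -> list[list[Optional[float]]]:
    """Average (lat, lon, value) observations into a regular lat/lon grid.

    Each observation carries its own coordinates, so the target grid can be
    chosen freely for the region of interest. Empty cells are None.
    """
    (lat0, lat1), (lon0, lon1) = lat_range, lon_range
    n_lat, n_lon = shape
    sums = [[0.0] * n_lon for _ in range(n_lat)]
    counts = [[0] * n_lon for _ in range(n_lat)]
    for lat, lon, value in observations:
        if not (lat0 <= lat < lat1 and lon0 <= lon < lon1):
            continue  # outside the region of interest
        # min() guards against floating-point rounding at the upper edge.
        i = min(int((lat - lat0) / (lat1 - lat0) * n_lat), n_lat - 1)
        j = min(int((lon - lon0) / (lon1 - lon0) * n_lon), n_lon - 1)
        sums[i][j] += value
        counts[i][j] += 1
    return [[sums[i][j] / counts[i][j] if counts[i][j] else None
             for j in range(n_lon)]
            for i in range(n_lat)]


obs = [(10.1, 20.1, 1.0), (10.2, 20.3, 3.0), (10.7, 20.8, 5.0), (30.0, 40.0, 9.9)]
print(resample_mean(obs, lat_range=(10.0, 11.0), lon_range=(20.0, 21.0), shape=(2, 2)))
# prints [[2.0, None], [None, 5.0]]
```

The last observation falls outside the chosen region and is ignored; changing `lat_range`, `lon_range`, or `shape` re-grids the same stored data without going back to a canned projection.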

Professor Frew noted that future work will involve building coordinate transformations directly into SciDB so that projected arrays can be populated and updated automatically.

Ryan W. Maas, a graduate student in computer science and engineering at the University of Washington, contributed to this post. 

Learn more:

Myria Project Site

“Myria: Making Big Strides in Big Data as a Service,” ISTC for Big Data Blog, July 31, 2014

“Unleashing NASA Modis Data for Earth and Ocean Scientists,” ISTC for Big Data Blog, September 4, 2013

“Improving Query Speeds on Vital Industry Data Sets,” ISTC for Big Data Blog, November 21, 2013
