Making Big Data Visualization More Accessible

A Multi-Institution Initiative

The Intel Science & Technology Center for Big Data is devoting substantial time and people to data visualization, with a particular focus on making the visualization of big data more interactive.

The process of exploring large datasets is inherently interactive: the user reacts to what the data shows, and those reactions in turn shape subsequent queries. However, just because a Ph.D. in a computer science lab can do this kind of visualization doesn't mean that other scientists or professionals can do it easily. Earth scientists, genomicists, financial analysts, and accountants, for example, could all benefit from interactive visualization of big data sources.

We’re designing visualizations and interfaces that allow users to interact with massive datasets on displays ranging from phones to video walls. We assume that a DBMS sits behind such a visualization program. Moreover, when the visualization system runs a query, it may get back a fire hose of data it was not expecting, so visualizations have to scale to large amounts of data. We also have to find ways to speed up visualization systems through prefetching and caching.
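To make that last point concrete, here is a minimal sketch of an LRU tile cache with speculative prefetching of neighboring tiles. It is an illustration, not the architecture of any particular system below; the `run_query` callable and the tile-keyed access pattern are assumptions of the sketch.

```python
from collections import OrderedDict

class TileCache:
    """LRU cache of query results keyed by tile coordinates,
    with speculative prefetch of adjacent tiles."""

    def __init__(self, run_query, capacity=256):
        self.run_query = run_query   # assumed: (x, y, zoom) -> result rows
        self.capacity = capacity
        self.cache = OrderedDict()

    def get(self, x, y, zoom):
        key = (x, y, zoom)
        if key in self.cache:
            self.cache.move_to_end(key)   # mark as recently used
            return self.cache[key]
        result = self.run_query(x, y, zoom)
        self._put(key, result)
        return result

    def prefetch_neighbors(self, x, y, zoom):
        # Speculatively fetch the four tiles the user is most
        # likely to pan to next.
        for nx, ny in ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)):
            if (nx, ny, zoom) not in self.cache:
                self._put((nx, ny, zoom), self.run_query(nx, ny, zoom))

    def _put(self, key, value):
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)   # evict least recently used
```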

Our visualization effort is a multi-institution initiative. It involves professors and students at Brown University, MIT CSAIL, Stanford University (the Stanford Visualization Group), the University of California, Santa Barbara (the Bren School of Environmental Science & Management) and the University of Washington.

Below are brief descriptions of some current projects:

imMens:  Scalable Visual Summaries

As datasets expand in size, they challenge traditional methods of interactive visual analysis, forcing data analysts and enthusiasts to spend more time on “data munging” and less time on analysis, or to abandon certain analyses altogether. Researchers at Stanford are developing imMens, a system that enables real-time interaction with billion+ element databases by using scalable visual summaries. Read more
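The core idea behind such summaries is binned aggregation: raw records are reduced to a fixed-size grid of counts whose size is independent of the number of records. Here is a minimal NumPy sketch of the idea; the bin count and value ranges are arbitrary choices for illustration, not imMens parameters.

```python
import numpy as np

def binned_counts(x, y, x_range, y_range, bins=256):
    """Reduce N raw points to a fixed bins-by-bins grid of counts.
    The summary's size is independent of N, so rendering cost stays
    constant as the dataset grows."""
    counts, _, _ = np.histogram2d(x, y, bins=bins,
                                  range=[x_range, y_range])
    return counts  # feed to a heatmap renderer

# A billion points would reduce to the same 256x256 summary:
x, y = np.random.randn(1_000_000), np.random.randn(1_000_000)
summary = binned_counts(x, y, (-4, 4), (-4, 4))
```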

ScalaR:  A Web-Based, Map-Style Interface

Many visualization interfaces assume that much of the data resides in memory, but with larger datasets that assumption often doesn’t hold, while interactivity still requires low-latency access. Moreover, some of these datasets can’t be meaningfully summarized (i.e., an approach like imMens wouldn’t apply to them). To meet these challenges, we have implemented ScalaR, which provides a web-based, map-style interface. Read more
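One way such a system can bound latency is to rewrite queries whose estimated results are too large for the client to render. The sketch below is an illustration of that idea, not ScalaR’s implementation; in particular, the TABLESAMPLE syntax varies by DBMS.

```python
def reduce_resolution(sql, estimated_rows, max_rows=100_000):
    """If the DBMS estimates that a query's result exceeds what the
    client can render, wrap the query so only a uniform sample of
    roughly max_rows rows comes back."""
    if estimated_rows <= max_rows:
        return sql
    percent = 100.0 * max_rows / estimated_rows
    return (f"SELECT * FROM ({sql}) AS q "
            f"TABLESAMPLE BERNOULLI ({percent:.4f})")

# e.g., reduce_resolution("SELECT lat, lon FROM trips", 50_000_000)
```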

Query Steering:  Improved Query Performance

Query Steering is a set of techniques for building and leveraging user profiles (i.e., models of user interests, goals, and database interaction patterns) to improve query performance and offer customized data navigation and visualization support to users. A long-term goal of this project is to build a “data navigation system” that would guide non-expert users. Read more
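One simple way to build such a profile is to model navigation as transitions between discrete data regions and prefetch the most likely next one. Below is a minimal sketch under that assumption; the notion of a “region” (e.g., a tile id or table partition) is an illustrative choice here, not the project’s actual model.

```python
from collections import defaultdict, Counter

class UserProfile:
    """First-order Markov model of a user's navigation between data
    regions, usable to prefetch the most likely next query target."""

    def __init__(self):
        self.transitions = defaultdict(Counter)
        self.last_region = None

    def observe(self, region):
        """Record the transition from the previous query's region."""
        if self.last_region is not None:
            self.transitions[self.last_region][region] += 1
        self.last_region = region

    def predict_next(self):
        """Return the region the user is most likely to visit next,
        or None if there is no history for the current region."""
        history = self.transitions.get(self.last_region)
        if not history:
            return None
        region, _ = history.most_common(1)[0]
        return region
```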

TupleWare:  Both Expressiveness and Data Scalability

Today users are forced to make a choice between expressiveness (e.g., R, MATLAB, Python) and data scalability (e.g., Hadoop). With TupleWare, a new data processing system for complex interactive analytics and visualization, we aim to eliminate this artificially enforced choice and make it easy for users to incorporate big data processing primitives within their preferred computing environment. Read more
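To make the trade-off concrete: the idea is to keep UDFs as ordinary host-language functions and fuse them into a single pass over the data, rather than materializing an intermediate collection per operator. The sketch below illustrates that fusion idea in plain Python; it is not TupleWare’s API.

```python
def fused_pipeline(*stages):
    """Compose per-record UDFs into one function so the data is
    traversed once, with no per-operator intermediate results --
    the kind of optimization a compiled analytics system applies
    automatically."""
    def run(record):
        for stage in stages:
            record = stage(record)
            if record is None:        # None signals a filtered-out record
                return None
        return record
    return run

# UDFs stay ordinary host-language functions:
parse = lambda line: tuple(map(float, line.split(",")))
keep_positive = lambda rec: rec if rec[0] > 0 else None
scale = lambda rec: (rec[0] * 100.0, rec[1])

pipeline = fused_pipeline(parse, keep_positive, scale)
results = [r for r in map(pipeline, ["1.5,2.0", "-3.0,4.0"])
           if r is not None]          # [(150.0, 2.0)]
```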

MapD:  Large-Scale Parallelism on Inexpensive Hardware

Massively Parallel Database (MapD), a new approach to querying and visualizing big data, meets the need for real-time query, visualization and analysis of massive data sets by achieving large-scale parallelism on inexpensive commodity hardware. We’re developing the solution to run on Intel Many Core processors (such as Xeon Phi) and commodity Graphics Processing Units (GPUs) instead of traditional CPUs. Read more
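The execution model behind this kind of system is columnar, data-parallel evaluation: predicates and aggregates are applied to whole arrays at once, a pattern that a GPU or many-core chip spreads across thousands of threads. A NumPy sketch of the pattern, for illustration only (MapD’s actual kernels run on GPUs and Many Core processors, not NumPy):

```python
import numpy as np

def filtered_sum(values, timestamps, t_start, t_end):
    """Columnar scan: the predicate and the aggregate are whole-array
    operations, the same data-parallel shape a GPU database spreads
    across thousands of threads."""
    mask = (timestamps >= t_start) & (timestamps <= t_end)
    return values[mask].sum()

# Columns, not rows: each field is one contiguous array.
values = np.random.rand(10_000_000)
timestamps = np.random.randint(0, 1_000, size=10_000_000)
total = filtered_sum(values, timestamps, 100, 200)
```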

Unleashing NASA MODIS Data for Earth and Ocean Scientists

Terabytes of data streamed from the National Aeronautics and Space Administration (NASA) Moderate Resolution Imaging Spectroradiometer (MODIS) provide a crucial resource to earth scientists, such as those monitoring mountain snow cover to manage water supplies. These scientists can determine snow cover by comparing the color of each satellite-image pixel to thousands of landscape reference colors, a task that is simple to describe but computationally intensive to execute.
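In essence, this is a nearest-neighbor match of every pixel against a library of reference colors. A brute-force NumPy sketch follows; the three reference classes are made up for illustration.

```python
import numpy as np

def classify_pixels(pixels, reference_colors, reference_labels):
    """Label each pixel with the class of its nearest reference color.
    pixels: (N, B) array of per-pixel band values
    reference_colors: (K, B) array of reference spectra
    reference_labels: length-K array of class names
    Brute force: distance from every pixel to every reference color,
    which is the computationally intensive step described above."""
    d2 = ((pixels[:, None, :] - reference_colors[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)          # index of closest reference color
    return reference_labels[nearest]

labels = np.array(["snow", "rock", "forest"])
refs = np.array([[0.9, 0.9, 0.9], [0.4, 0.35, 0.3], [0.1, 0.3, 0.1]])
pixels = np.array([[0.85, 0.9, 0.95], [0.12, 0.28, 0.09]])
print(classify_pixels(pixels, refs, labels))   # ['snow' 'forest']
```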

Today, however, these scientists have access only to samples of MODIS data; for better precision, they need direct access to the massive raw data. A first step in this direction was the EarthDB system, which can load low-level MODIS land data directly into the array database SciDB. The image below was generated from EarthDB query results, and far better images are possible.

Visualization of the Normalized Difference Vegetation Index (NDVI) over southern California and Mexico, with NDVI values calculated using EarthDB (from the EarthDB paper). Source: http://dl.acm.org/citation.cfm?id=2447483
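For context, the NDVI shown in the figure is a standard per-pixel quantity derived from two MODIS bands: NDVI = (NIR - Red) / (NIR + Red), which ranges from -1 to 1. A minimal NumPy version, assuming the band arrays have already been extracted:

```python
import numpy as np

def ndvi(nir, red):
    """Normalized Difference Vegetation Index per pixel:
    NDVI = (NIR - Red) / (NIR + Red), in [-1, 1].
    nir, red: arrays of near-infrared and red reflectance."""
    nir = np.asarray(nir, dtype=np.float64)
    red = np.asarray(red, dtype=np.float64)
    denom = nir + red
    safe = np.where(denom == 0, 1.0, denom)   # avoid divide-by-zero
    return np.where(denom == 0, 0.0, (nir - red) / safe)
```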

The biggest challenge in extending the EarthDB concept, and our current focus, is re-implementing domain-specific variables necessary for research analysis. Read more
