Scaling Interactive Visualization Techniques to High-Volume Data

By Ted Benson, MIT CSAIL

Jeff Heer of the University of Washington and his group, the UW Interactive Data Lab, are continually enhancing “people’s ability to understand and communicate data through the design of new interactive systems for data visualization and analysis.”

At the recent Intel Science and Technology Center for Big Data Research Retreat, Jeff discussed the scaling of interactive visualization techniques to high-volume data, a topic his group is currently studying.

An example of the group’s recent work is a system called imMens, which provides binning, aggregation, and fast interactive brushing and linking over spatial data. But imMens relies on significant pre-computation before a new interactive user interface can be deployed.
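To make the trade-off concrete, here is a minimal sketch of binned aggregation in Python. It is not imMens's actual implementation; the function names, bin edges, and flight records are all hypothetical. The idea it illustrates is that records are counted into a grid of bins once, ahead of time, so that a later brush over one dimension only sums a few pre-computed cells instead of rescanning the raw data.

```python
from bisect import bisect_right

def bin_index(value, edges):
    """Index of the half-open bin [edges[i], edges[i+1]) containing value.

    Assumes edges[0] <= value < edges[-1].
    """
    return bisect_right(edges, value) - 1

def precompute_bins(xs, ys, x_edges, y_edges):
    """Count records in each (x bin, y bin) cell once, ahead of time."""
    counts = [[0] * (len(y_edges) - 1) for _ in range(len(x_edges) - 1)]
    for x, y in zip(xs, ys):
        counts[bin_index(x, x_edges)][bin_index(y, y_edges)] += 1
    return counts

def brush_x(counts, lo_bin, hi_bin):
    """Histogram of y restricted to x bins lo_bin..hi_bin (inclusive)."""
    selected = counts[lo_bin:hi_bin + 1]
    return [sum(column) for column in zip(*selected)]

# Hypothetical data: departure hour and delay in minutes for six flights.
hours = [6, 7, 7, 18, 19, 19]
delays = [5, 10, 50, 20, 70, 80]
hour_edges = [0, 12, 24]        # two bins: morning, evening
delay_edges = [0, 30, 60, 90]   # three delay bins

counts = precompute_bins(hours, delays, hour_edges, delay_edges)
evening_delays = brush_x(counts, 1, 1)  # brush the evening bin only
```

The cost of `precompute_bins` grows with the number of records, while `brush_x` depends only on the number of bins, which is why the up-front work can pay off during interaction.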

One important question to ask: “Is this pre-computation worth what it costs? For example, what if latency isn’t so bad?” Literature on the question provides conflicting evidence: in some cases, high latency encourages outcome-damaging shortcuts, but in other cases it causes users to plan more, thereby improving outcomes. A user study of imMens suggests that for interactive data visualizations, low latency is an attribute worth working toward.

People who work with data visualization were asked to use imMens to understand datasets about mobile check-ins and FAA flight delays. When latency was artificially added to imMens, users covered less of each dataset, performed fewer brush-and-link operations (a key method for understanding multivariate data), and generated fewer hypotheses and generalizations about the data.

The study also revealed ways scarce resources might be allocated for low-latency UIs. Use of operations like map panning, which lend themselves to incremental, asynchronous result refinement, was not very sensitive to latency, while use of all-or-nothing operations like brushing dropped off sharply as latency increased.
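As a rough illustration of what incremental result refinement can look like, the Python sketch below processes data in chunks and yields a running estimate after each one, so an interface could redraw with a partial answer rather than wait for the full scan. The function name, chunk size, and sample values are hypothetical and are not taken from the study or from imMens.

```python
def progressive_mean(values, chunk_size):
    """Yield (rows_seen, running_mean) after each chunk is processed."""
    total = 0.0
    seen = 0
    for start in range(0, len(values), chunk_size):
        chunk = values[start:start + chunk_size]
        total += sum(chunk)
        seen += len(chunk)
        # A UI could refresh its display here with the partial result.
        yield seen, total / seen

# Hypothetical delay values, refined two rows at a time.
sample_delays = [10, 20, 30, 40, 50, 60]
estimates = list(progressive_mean(sample_delays, chunk_size=2))
```

An all-or-nothing operation has no useful intermediate state of this kind, which is one way to read the difference in latency sensitivity described above.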

Another important question to ask is: “How can we provide interactive visualizations without any setup cost?” To answer this question, the group has been building tools that attach to relational databases and provide interfaces to support high-level data understanding and bug-fixing. This work incorporates automatically generated visualizations to support several ways people try to understand their data.

Some diagnostic activities involve examining a database field-by-field, such as looking for missing ranges or odd spikes in a distribution. Others involve hunting for sets of tuples that are at odds with their peers in some way. And still others involve comparing multiple fields to each other, such as searching for products that appear (in the database) to have shipped before they were ordered.
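Two of these checks can be sketched as plain SQL queries. The Python example below uses the standard-library `sqlite3` module with a hypothetical `orders` table; the schema and rows are invented for illustration and do not describe the group's tool.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, ordered TEXT, shipped TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "2013-03-01", "2013-03-04"),
        (2, "2013-03-05", "2013-03-02"),  # shipped before it was ordered
        (3, "2013-03-07", None),          # missing ship date
    ],
)

# Field-by-field check: how many rows lack a value in one column?
missing_shipped = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE shipped IS NULL"
).fetchone()[0]

# Cross-field check: rows whose ship date precedes their order date.
# ISO 8601 date strings compare correctly as text; NULLs never match.
shipped_early = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM orders WHERE shipped < ordered ORDER BY id"
    )
]
```

A diagnostic tool of the kind described would presumably generate such queries, and visualizations of their results, automatically rather than asking the user to write them by hand.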

The end goal is to develop a drop-in tool that helps users understand and diagnose relational data in the same way that existing drop-in tools help them browse and search it.

Jeff’s group intends to continue the thread of moving from batch pre-processing to interactive database support. They are also exploring automation topics, such as making visualization design decisions automatically based on what is known about human perception.

 

Posted in ISTC for Big Data Blog, Tools for Big Data, Visualizing Big Data.
