Enabling Expert Data Visualization Interfaces by Non-Visualization Professionals

By Ted Benson, MIT CSAIL

An important problem in the era of data-driven science and business is that skilled professionals like geneticists or economists cannot create and publish high-quality interactive data visualizations without technical assistance. Even simple tasks, such as publishing a dataset for interactive browsing, often require knowledge of computer programming.

Making it easy to create and publish high-quality data-exploration interfaces is particularly important for big data research. As data sets grow too large to inspect manually, such interfaces become necessary for extracting meaning from the data. When these interfaces are easy to construct, big data workers can extract and publish insights themselves rather than hiring a programmer. This also shortens the feedback loop, enabling faster iterations of data analysis.

However, while programmers and visualization professionals create the kinds of data-exploration artifacts we have grown to expect, end-users, even scientific ones, are largely relegated to rich text and static graphics. A recipe blogger, for example, is unlikely to have a blog with the interactive sophistication of Epicurious.com, a professional recipe site with faceted browsing, categorization, and search. Likewise, a cell biologist is unlikely to be able to easily produce a hand-made interactive data visualization to explore and publish her data. For that, she needs to hire or collaborate with a visualization professional.

It is tempting to assume that end-users simply lack the sophistication to produce professional-quality visualizations. Specialization, after all, is the reason such high-quality visualizations exist in the first place. But our experiences working with end-user visualization on the web suggest a different cause: that end-users are quite sophisticated in their intent, but they are held back by the high learning cost of current tools.

If you believe this claim, then figuring out how to improve our tools is a clear first step to enabling scientists and creative professionals to build high-quality data browsing interfaces themselves without having to hire help. And that’s where our recently published study comes in.

Each year, the human-computer interaction (HCI) community gathers at the CHI conference to discuss how the design of human-computer systems affects the way we use them. I just finished presenting ISTC-sponsored work at this year’s gathering in Toronto, Canada, about how people are authoring and publishing data visualizations online, and how we can translate this knowledge into better authoring tools.

We performed a mixed-method ethnographic study of the community of users who build websites with the Exhibit framework. Exhibit is a toolkit published at the World Wide Web conference seven years ago, and since that time it has been used to create thousands of interactive visualizations published across the web.

Exhibit’s original paper was based on the observation that end users would be greatly helped if HTML had built-in tags for common visualization elements, like <map>, <timeline>, and <scatterplot>, as well as interactivity components like <searchbox> and <facet>. Studying Exhibit seven years later lets us observe the full pipeline of data visualization and revisit those claims with the benefit of hindsight.
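To make the idea concrete, the envisioned markup might look something like the sketch below. These tags and attributes are hypothetical — HTML has no such elements, and Exhibit itself approximates this declarative style with attributes on ordinary elements rather than new tags — but they convey the spirit of the proposal: visualization and interactivity expressed as declarative markup rather than imperative code.

```html
<!-- Hypothetical markup in the spirit of Exhibit's proposal.
     The tag names, fields, and data file are invented for illustration. -->
<link rel="data" href="specimens.json" />

<searchbox></searchbox>
<facet field="species"></facet>

<map lat=".latitude" lng=".longitude"></map>
<timeline start=".date-collected"></timeline>
```

Under this model, a cell biologist could publish a faceted, searchable map of her specimens by writing a few lines of markup against her dataset, with no JavaScript at all.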

We scraped 1,900 Exhibit-based visualizations from the web, hand-coded the 100 most popular (based on server logs), and then worked further with the authors of 56 Exhibits to collect 266k distinct data points from web users interacting with these published visualizations online. Finally, we conducted phone interviews with 12 authors about their experience. Altogether, this dataset represents the full lifecycle of visualization creation: idea generation, data authoring, visualization creation, web publishing, and finally usage by others.

Our paper extracts lessons from this multi-method dataset for each of these stages, focusing on those that generalize to all visualization frameworks. Among the higher-level results: there is a strong need for non-tabular data editing in spreadsheet software; content management systems lack a few key features that visualization authors need; and aspects of Exhibit’s programming model could be adopted by other frameworks to facilitate easy reuse and adoption by non-professionals.

For more on Exhibit, read the original paper, our analysis of the framework as presented at CHI, or the slides from that presentation. Or try the tool for yourself at http://www.simile-widgets.org.

Ted Benson is a Ph.D. student at MIT CSAIL advised by David Karger. His research is driven by a desire to see better use of structured data on the web.
