Wanted: Big Data Interfaces for End Users

As Big Data proliferates, our community unfortunately is not taking full advantage of the web to provide better information-management tools to help end users. We need to follow the philosophy behind the DataHub project:

“The value of data is directly proportional to the degree to which it is accessible. As this data grows in size, in type, in dimensionality and in complexity, accessibility becomes of paramount importance.

“Accessibility means a very low barrier of entry: allowing product designers, innovators and non-technical users to explore and navigate the store without requiring them to be familiar with Big Data tools or in-depth understanding of data schemas and information models.” (Boldface and underscoring in original.)

In other words: We have to design with the end user in mind.

In 2013, David R. Karger, Professor of Computer Science at MIT CSAIL, delivered a highly relevant three-part keynote address to the European Semantic Web Conference on the topic of “A Semantic Web for End Users.” In that address, he complained that:

  • The current state of tools for end users to capture, communicate, and manage their information is terrible, and
  • The Semantic Web presents a key part of the answer to building better tools, but
  • Not enough work is being directed toward this problem by the community.

Don’t be distracted by the date. Professor Karger (an ISTC Principal Investigator) says that he still stands by everything he said in that keynote address. In other words, virtually no progress has been made toward solving the problem.

So, it would be time well spent to watch that keynote address (available as streaming video), or to read Professor Karger’s adaptation of the address in three posts on the Haystack Blog. Below are summaries of the three posts.

The State of End User Information Management

In this post, Professor Karger declares that “the situation is dire” for end users. “Schema diversity” is a big problem: Traditional applications are designed with hard-coded schemas and interfaces. This design forces users who want to use their own schemas, or to connect information from different schemas, to settle for generic tools and to spread their data across multiple tools. In short, the current state of tools for end users to capture, communicate, and manage their information is terrible.

Just how terrible is documented in Voida et al.’s Homebrew Databases paper, which describes typical office work in volunteer-driven nonprofit organizations: office workers are forced into a baroque assemblage of Excel spreadsheets, Outlook lists, paper, index cards, and binders. They have terrible versioning problems, waste inordinate amounts of time on data entry and transfer, and struggle to organize, query, and visualize their information. Professor Karger says it is “a major embarrassment for all of us in databases (and the Semantic Web) that this is the current state of the art…. we’ve got our heads in the clouds while people are stuck in the dirt.”

How the Semantic Web Can Help End Users

In this post, Professor Karger discusses the promise of the Semantic Web. It can be a key part of building better tools and overcoming “schema diversity.” The Semantic Web had an early progenitor, Haystack, a tool that allowed the designer to create something that looked like a traditional application, over any schema. It was, in effect, a Semantic Desktop. Applications like Haystack can effectively present and manipulate information in any schema that their user encounters or creates, even if there is a different schema on each web site. The database community has not yet tackled this problem; it has only tackled the lesser problem of combining a few large, known, corporate databases, such as when two companies merge.
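
To make the idea of presenting information "in any schema" more concrete, here is a minimal Python sketch. It is not Haystack's implementation or API; the triple layout, the sample data, and the function names `group_by_subject` and `render` are all assumptions made for illustration. The point is only that a viewer written against generic (subject, property, value) statements needs no hard-coded schema.

```python
from collections import defaultdict

# Hypothetical sample data: (subject, property, value) statements.
# The "person" and "donation" items use different properties, and
# nothing below knows those property names in advance.
TRIPLES = [
    ("alice", "name", "Alice Smith"),
    ("alice", "email", "alice@example.org"),
    ("donation-17", "donor", "alice"),
    ("donation-17", "amount_usd", 50),
    ("donation-17", "received_on", "2013-05-28"),
]


def group_by_subject(triples):
    """Collect each subject's properties without assuming a fixed schema."""
    items = defaultdict(dict)
    for subject, prop, value in triples:
        # A property may carry several values, so store a list per property.
        items[subject].setdefault(prop, []).append(value)
    return dict(items)


def render(items):
    """Render every item generically: one indented line per property found."""
    lines = []
    for subject in sorted(items):
        lines.append(subject)
        for prop in sorted(items[subject]):
            values = ", ".join(str(v) for v in items[subject][prop])
            lines.append(f"  {prop}: {values}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render(group_by_subject(TRIPLES)))
```

If a user adds a statement with a brand-new property, such as `("alice", "phone", "555-0100")`, the same two functions display it without any code change, which is the kind of flexibility a hard-coded schema rules out.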

Professor Karger presents three flexible-schema Semantic Web applications: Related Worksheets, Exhibit, and Datapress; plus Atomate, which allows end users to author automation rules to reduce their effort handling incoming social media and other information streams.

What’s Wrong with Semantic Web Research, and Some Ideas to Fix It

In this post, Professor Karger complains that research on end user applications is almost completely absent from Semantic Web conferences. Back in the early days, we convinced ourselves that a web of structured data would be useful. Now, we’re devoting all our energy to a hypothetical infrastructure for that web. But we have to do a much better job of demonstrating, to ourselves and to others, the more immediate benefits of the Semantic Web.

And we can only do this by showing how the Semantic Web can solve problems that end users have right now. If we fail to do that, someone else will solve those problems without using Semantic Web tools, and the Semantic Web will be left behind. In short, more of our research should start with the identification of a current, specific end-user problem.

The Take-Away

The Semantic Web reflects some of the key insights on schema variability that are critical to improving people’s ability to manage information. But we aren’t doing the work. We’re devoting far too much energy to studies of knowledge representation, reasoning, and information extraction that have traditionally appeared in artificial intelligence conferences, and perhaps should continue to do so. We build applications, but we call them demos and don’t evaluate them. Many of them aren’t really Semantic Web applications; they’re just traditional applications that happen to be storing their data in RDF. We’re letting a great opportunity pass us by; we must think about ways to seize it.

(Professor Karger and his students are working directly on projects, including DataHub, aimed at helping end users manage their data more easily and effectively.)

This entry was posted in ISTC for Big Data Blog, Tools for Big Data, Visualizing Big Data.
