MapD: A Way to Map Big Data Faster

How a Fortuitous “Hack” Led to a Database Breakthrough

“Necessity is the mother of invention” may be an old saying, but it still holds true. Creative ideas for new technologies are coming not just out of computer science labs, but also from determined and ingenious non-computer scientists who have hacked together technologies so they could get their work done.

Todd Mostak of MIT CSAIL

This was the case with the Massively Parallel Database (MapD), a new approach to querying and visualizing big data. MapD was created by Todd Mostak, then a Harvard graduate student in Middle Eastern Studies, back in 2012. Mostak is now a researcher at MIT CSAIL, and MapD is being further developed by Mostak and his colleagues at the Intel Science and Technology Center for Big Data at MIT CSAIL.

MapD is generating a lot of interest within the mapping community for the way it meets the need for real-time query, visualization and analysis of massive data sets while achieving large-scale parallelism on inexpensive commodity hardware. Here’s the ingenuity: Mostak and his team are developing their solution to run on Intel Many Core processors (such as Xeon Phi) and commodity Graphics Processing Units (GPUs) instead of traditional CPUs.

MapD processes spatial and geographic information system (GIS) data as well as relational data in milliseconds, and performs up to 70 times faster than CPU-based solutions.

The Catalyst: Mapping Egyptian Politics

Back in 2012, Mostak was working on his master’s thesis at Harvard’s Center for Middle Eastern Studies. As described in an article in DataInformed, he was “mapping tweets… on Egyptian politics during the Arab Spring uprising.”

He was frustrated because he needed to geolocate tens of millions of tweets in near-real-time, but the available solutions, including the popular MapReduce, could not give him the speed he needed.

Mostak was experiencing what a great many scientists have experienced: the need for a new computer architecture that can handle big data in a way that meets their particular research needs. And, like many of these scientists, he had very little academic background in computer science.

So, taking a hacker-like approach, he created a new, offbeat and highly effective solution.

MapD Under the Hood

In his technical overview on MapD, Mostak describes MapD as “a vertically-integrated end-to-end solution for data querying, visualization and analysis. It uses the immense computational power of next generation parallel hardware like Intel Many Core processors and commodity graphics processing units (GPUs) as the backbone of a data processing and visualization engine that marries the data processing and querying features of a traditional RDBMS with advanced analytic and visualization features.”

MapD uses an SQL column store database, which relies on the massive parallel processing abilities of this new hardware for acceleration and scales to any number of machines. It generates maps in real time, rendering point and heat maps of query results in milliseconds. It includes a WMS web server that can serve out of the box as the backend for a web mapping client, allowing for querying and visualization of billions of features. It’s fast and cost-effective: for example, four commodity GPUs provide more than 12 Teraflops of compute power and nearly 1 TB/sec of memory bandwidth.
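To make the column-store and heat-map ideas above a little more concrete, here is a minimal sketch in plain Python. It is not MapD's code: the table contents, the `heat_map` function and the one-degree grid size are all hypothetical, chosen only to illustrate the general technique of scanning a few columns and binning the matching points into grid cells.

```python
from collections import Counter

# Hypothetical column-oriented table: one list per column,
# with the values for a given row sharing the same index.
tweets = {
    "lon":  [-90.1, -90.3, -84.4, -84.5, -97.7, -90.2],
    "lat":  [29.9, 30.1, 33.7, 33.8, 30.3, 29.8],
    "text": ["flu season", "got the flu", "sunny day",
             "flu shot", "traffic", "flu again"],
}

def heat_map(table, keyword, cell_deg=1.0):
    """Count rows whose text contains `keyword`, binned into grid cells.

    Each cell is identified by (floor(lon / cell_deg), floor(lat / cell_deg)).
    Only the three columns the query needs are read.
    """
    counts = Counter()
    for lon, lat, text in zip(table["lon"], table["lat"], table["text"]):
        if keyword in text:
            cell = (int(lon // cell_deg), int(lat // cell_deg))
            counts[cell] += 1
    return counts

grid = heat_map(tweets, "flu")
for cell, n in sorted(grid.items()):
    print(cell, n)
```

Each row is tested and binned independently of every other row, which is the property that lets this kind of scan be spread across many cores. This sketch runs the loop serially on a CPU; how MapD actually divides the work across GPU threads is not described here.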

MapD is also modular, so it can work on hardware ranging from sub-US$1,000 commodity laptops and desktops all the way to High Performance Computing (HPC) clusters with hundreds or thousands of nodes.

MapD-generated point map, heat map, and time graph showing flu outbreak in December 2012 in the American South. (Courtesy of Todd Mostak, MIT CSAIL)

And…a New Career

Mostak’s path from an undergraduate studying anthropology and economics to MIT CSAIL researcher was unconventional. But as reported in the DataInformed article, MIT CSAIL Professor and ISTC co-director Sam Madden said that when a talented person with an unusual background presents himself, it’s key to recognize what that person can accomplish, not what track he or she took to get there. “When you find somebody like that you’ve got to nurture them and give them what they need to be successful,” Madden said of Mostak. “He’s going to do good things for the world.”

You can follow Todd Mostak on Twitter, Google+ and LinkedIn, and follow the progress of MapD on its Google Group.

MapD-generated heat map showing the percentage of Tweets containing the word “tornado” from April 12, 2013 to May 7, 2013. (Courtesy of Todd Mostak, MIT CSAIL)
