Computer scientists and clinical researchers gathered at MIT CSAIL on January 13 for a half-day workshop to brainstorm new data sets and software for improving data-driven diagnosis and treatment.
The idea for the workshop grew out of a decision made last year by the ISTC for Big Data. As ISTC Co-director Sam Madden explained, “We all want to work on applications that matter to the world and that are a good fit for the technology we’re developing. To that end, we decided to devote some of our resources to investigating the challenges and opportunities in using medical data.”
The goals of the workshop were to exchange ideas, come up with potential capstone/graduate projects, and learn more about the state of the art. The workshop was a great success: about 40 participants, a good mix of medical researchers, clinicians, and computer scientists, shared their ideas and views. Some of the medical participants volunteered to collaborate with computer scientists to help guide and focus future research.
What We Learned
There were five major takeaways from the workshop:
Collaboration with Clinicians. Whatever we develop, we have to have clinicians in the loop the entire time. We were pleased that some clinicians have already volunteered to help us with that.
Speaking One Language. Data scientists on the one hand, and clinical researchers and practicing clinicians on the other, need to interact with each other more. Clinicians face many problems that data science can solve, and data scientists often know little about what goes on in the clinic. If those two groups can speak the same language, it will be a great step forward.
MIMIC: Popular But Not Alone. There is significant interest in the MIMIC II medical data set. It has been very popular, especially because it is so easy to access. However, there are many other data sets. As good as MIMIC is, it is not the be-all and end-all of medical data.
Make the API Universal. As we in the ISTC for Big Data plan and develop our API, our key goal is to provide an easy way for researchers to simultaneously access heterogeneous database technologies. Therefore, we need to stay in touch with these researchers and test the API against completely different data sets. As ISTC Co-director Mike Stonebraker is fond of reminding us, “One size does not fit all.”
Save the Users! Because there are so many databases and technologies, we want to spare scientists, researchers, clinical faculty, and clinical scientists from dealing with the muck of data science. They shouldn’t have to understand how different systems operate. We want to have data scientists who can speak medical, and medical practitioners who can speak data science.
Project Genres Identified
As we had hoped, we were able to come up with a few genres for new research projects.
- Automated preprocessing of information, including outlier detection and substitution. This idea resonated with quite a few people at the workshop. Outliers are a common problem, and left unhandled they sometimes cause large errors.
- “Market Basket Analysis,” or being able to find cohorts of patients, such as people who took a certain cluster of medications.
- Visualization. For example, we need to build a MIMIC explorer that lets someone visualize the whole data set at one time.
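To make the first two genres a little more concrete, here is a minimal Python sketch. It is only an illustration, not something presented at the workshop. The function names, the sample readings, and the patient IDs are all hypothetical. The outlier rule used here (a modified z-score based on the median absolute deviation, with the commonly used 3.5 cutoff and median substitution) is one simple choice among many. The cohort lookup is a plain set-containment filter, which is much simpler than full market basket analysis (frequent itemset mining).

```python
from statistics import median


def replace_outliers(values, threshold=3.5):
    """Flag outliers with a modified z-score and substitute the median.

    Returns (cleaned_values, flagged_indices). The 3.5 threshold and the
    0.6745 scaling constant are conventional defaults, assumed here.
    """
    values = list(values)
    med = median(values)
    # Median absolute deviation: a spread estimate that outliers barely move.
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # No measurable spread, so nothing can be scored as an outlier.
        return values, []
    flagged = [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]
    flagged_set = set(flagged)
    cleaned = [med if i in flagged_set else v for i, v in enumerate(values)]
    return cleaned, flagged


def find_cohort(prescriptions, required_meds):
    """Return sorted patient IDs whose medications include every required one."""
    required = set(required_meds)
    return sorted(
        patient for patient, meds in prescriptions.items()
        if required <= set(meds)
    )


# Hypothetical heart-rate readings with one implausible value.
heart_rates = [72, 75, 71, 74, 300, 73, 70]
cleaned, flagged = replace_outliers(heart_rates)
print("flagged indices:", flagged)
print("cleaned readings:", cleaned)

# Hypothetical prescription records keyed by made-up patient IDs.
prescriptions = {
    "p1": {"aspirin", "metoprolol", "heparin"},
    "p2": {"aspirin"},
    "p3": {"heparin", "aspirin", "insulin"},
}
print("cohort:", find_cohort(prescriptions, {"aspirin", "heparin"}))
```

In a real clinical setting, the substitution rule would need a clinician's input, since an extreme reading can be a genuine event rather than a sensor or entry error.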
We were hoping to see the group start some cooperation and networking, and it already has. For example, a couple of participants have already stepped forward and volunteered to help start and run a seminar series, and a doctor at the Veterans Administration wants to have a student crunch his data. That’s a great outcome: it helps medical researchers and clinicians recognize that computer and data scientists are interested in their work.
Author Bio: Vijay Gadepally is currently a researcher at MIT’s Lincoln Laboratory and Computer Science and Artificial Intelligence Laboratory (CSAIL). He holds an M.Sc. and a PhD in Electrical and Computer Engineering from Ohio State University and a B.Tech degree in Electrical Engineering from the Indian Institute of Technology, Kanpur. He is interested in the technical and social aspects of big data, data exploration, cyber security, and autonomous (self-driving) vehicles, and in the role of technology in society.