New York Times: Because of the ever-increasing amounts of data generated by the Web, smartphones, and other technologies, data scientists must wrangle that vast output, paring it down and organizing it into a usable format. “You spend a lot of your time being a data janitor, before you can get to the cool, sexy things that got you into the field in the first place,” said Matt Mohebbi, a data scientist and cofounder of Iodine, a new health startup. Several companies are writing software to automate the data-wrangling process. Among other challenges, the programs must be able to merge data arriving in many different formats. In much the same way that spreadsheets revolutionized data analysis in business and finance, machine-learning technology could free data scientists from the more mundane sorting tasks so they can concentrate on the bigger picture.
© 2014 American Institute of Physics

Data science tackles massive digital output
18 August 2014
DOI: https://doi.org/10.1063/PT.5.028180
EISSN: 1945-0699