Data Janitors & Data Carpentry.

Data Janitors and Data Carpentry: value in the nitty gritty?

"Report on the investigation of engineer and janitor service, Board of education, city of Chicago" (1913)

The fantastic machine learning-oriented podcast Talking Machines had an interview with computer scientist David Mimno (also: his course syllabus on text mining for historians is awesome). They spent some time discussing a recent essay by Mimno that riffed on the New York Times "data janitorial" piece, arguing that data wrangling, data munging, or data janitorial work is not trivial grunt labor but rather integral to the craft of research--especially in fields utilizing machine learning. In particular, the intensive process of creating usable data sets is much less janitorial work and more akin to carpentry (a term already rolling through the data science lexicon):

From "Data Carpentry,"
Every data set has its idiosyncrasies. You can streamline the process, but you can’t avoid it. To draw out the analogy a bit more: sure, there’s Ikea, but the best furniture is still made by Amish carpenters.

More broadly, on Talking Machines Mimno argues that knowing the intimate minutiae of data--in the same manner that humanists know an obscure corpus of work or the finalities of administrative Dutch--has broader benefits to scholars. This intimacy can inform the questions we ask.

I couldn't help thinking this view has application in economics, especially since I have been reading work from the great Zvi Griliches, who was a champion of the insights to be gleaned from the process of data collection. His emphasis on the value of data in economics is reflected in his 1994 presidential address to the American Economic Association:

"We ourselves do not put enough emphasis on the value of data and data collection in our training of graduate students and in the reward structure of our profession." --Zvi Griliches

With that in mind, it's hard to imagine many of Griliches's insights, not the least of which is his work on productivity and technological adoption, without a granular appreciation gained from wading through the muck of data. You certainly get a sense of this from his interview with Alan Krueger.