“Big data” and “data science” may be some of the bigger buzzwords of this decade, but they aren’t necessarily new concepts. The idea of data science spans many different fields, and has been slowly making its way into the mainstream for over fifty years. In fact, many considered last year the fiftieth anniversary of its official introduction. While many proponents have taken up the cause since, making new assertions and posing new challenges, there are a few names and dates you need to know.
1962. John Tukey writes “The Future of Data Analysis.” Published in The Annals of Mathematical Statistics, a major venue for statistical research, the paper called the relationship between statistics and analysis into question. One famous quote has since struck a chord with modern data lovers:
“For a long time I have thought I was a statistician, interested in inferences from the particular to the general. But as I have watched mathematical statistics evolve, I have had cause to wonder and to doubt…I have come to feel that my central interest is in data analysis, which I take to include, among other things: procedures for analyzing data, techniques for interpreting the results of such procedures, ways of planning the gathering of data to make its analysis easier, more precise or more accurate, and all the machinery and results of (mathematical) statistics which apply to analyzing data.”
1974. After Tukey, there is another important name that any data enthusiast should know: Peter Naur. He published the Concise Survey of Computer Methods, which surveyed data processing methods across a wide variety of applications. More importantly, the book uses the very term “data science” repeatedly. Naur offers his own definition of the term: “The science of dealing with data, once they have been established, while the relation of the data to what they represent is delegated to other fields and sciences.” It would take some time for the ideas to really catch on, but the general push toward data science started to appear more and more often after his book.
1977. The International Association for Statistical Computing (IASC) was founded. Its mission was to “link traditional statistical methodology, modern computer technology, and the knowledge of domain experts in order to convert data into information and knowledge.” In the same year, Tukey also published a second major work: “Exploratory Data Analysis.” In it, he argued that emphasis should be placed on using data to suggest hypotheses for testing, and that exploratory data analysis should work side-by-side with confirmatory data analysis.
In 1989, the first Knowledge Discovery in Databases (KDD) workshop was organized. It would later become the annual ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD).
In 1994, the early forms of modern marketing began to appear. One example comes from the Business Week cover story “Database Marketing.” In it, readers learned that companies were gathering all kinds of data in order to start new marketing campaigns. While companies had yet to figure out what to do with all of that data, the ominous line that “still, many companies believe they have no choice but to brave the database-marketing frontier” marked the beginning of an era.
In 1996, the term “data science” appeared for the first time in the title of a conference, when the International Federation of Classification Societies met in Japan. The topic? “Data science, classification, and related methods.” The next year, in 1997, C.F.