Evolution might seem an unusual word to describe the advancement of the Data Scientist. After all, evolution is defined as ‘the way in which living things change and develop over millions of years’. I’m certainly not claiming that Homo erectus could code. What we can clearly see, however, is an evolution in the methods, processes and technology used by Data Scientists. Many would contest the true beginnings of statistical modelling, but fewer would argue about what that evolutionary lifecycle looks like. From the early days of scratching numbers onto papyrus to the modern-day punching of numbers into a keyboard, Data Science has come a long way. The technology has changed, and so have the methods, but one thing has not changed, whether we look back past the industrial revolutions of the 19th and 20th centuries, past the Renaissance, or as far back as the dawn of humankind: we have always sought to use mathematics and statistics to improve the world around us.
Data Science in the form we know it today has only been around since the new millennium, when statisticians who felt that they had a unique set of skills chose to separate themselves from traditional mathematicians and computer scientists. In its purest form, however, Data Science started out as statistics in the 9th century, when the mathematician Al-Kindi, working in what is now Iraq, used his own method of statistical analysis for cryptanalysis, also known as code breaking. His work is credited as the first recognized example of frequency analysis, and it paved the way for other thinkers. During the 1300s, the Florentine banker Giovanni Villani used his extensive records and knowledge of Florence, including its population, geography, trade and education, to build a comprehensive guide to the city, which has since been described as the first use of statistics for philanthropic ends. In the 17th century, John Graunt and William Petty created the first life table after studying the population of London. Using only London’s mortality rates as a marker, Graunt and Petty calculated that the population of London was somewhere around 384,000 people, and that the average family size in London at the time was 8. These were remarkable estimates for their day: there was no national census to draw on, and groups moved in and out of the major cities almost every day, with many residents having no fixed abode.
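To give a feel for the frequency analysis credited to Al-Kindi above, here is a minimal Python sketch. It is a modern illustration rather than a reconstruction of his method: the function names are hypothetical, it handles only the 26 unaccented Latin letters, and it assumes a simple shift cipher applied to English text in which ‘e’ is the most common letter.

```python
from collections import Counter


def letter_frequencies(text):
    """Return each letter's share of the text, most common first."""
    letters = [ch for ch in text.lower() if "a" <= ch <= "z"]
    if not letters:
        raise ValueError("text contains no letters to analyse")
    counts = Counter(letters)
    total = len(letters)
    return {letter: count / total for letter, count in counts.most_common()}


def caesar_shift(text, shift):
    """Shift every letter a-z forward by `shift` places (illustrative cipher)."""
    shifted = []
    for ch in text.lower():
        if "a" <= ch <= "z":
            shifted.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        else:
            shifted.append(ch)
    return "".join(shifted)


def guess_shift(ciphertext, most_common_plain="e"):
    """Guess the shift by assuming the top cipher letter stands for 'e'."""
    top_letter = next(iter(letter_frequencies(ciphertext)))
    return (ord(top_letter) - ord(most_common_plain)) % 26


secret = caesar_shift("meet me here before the evening ends", 3)
print(guess_shift(secret))
```

The guess only works when the message is long enough for its letter counts to resemble those of the language as a whole, which is why short messages are much harder to break this way.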
In the 20th century, statistics became a recognized and prominent field, used to help quantify the increasingly diverse societies of the 1900s. Some of this work was led by Karl Pearson and Francis Galton, two revered mathematicians who studied societal diversity in terms of height, weight, race, hair colour and more. Galton contributed his knowledge of deviation, correlation and regression analysis, while Pearson pioneered the ‘Pearson product-moment correlation coefficient’, which became key to measuring the degree of linear dependence between two variables, and the ‘Pearson distribution’. This research was continued by Ronald Fisher, who is credited with writing the textbooks that defined the academic discipline of statistics. His most famous work, the 1918 paper ‘The Correlation between Relatives on the Supposition of Mendelian Inheritance’, became one of the cornerstones of statistical academic research at universities all over the world.
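For readers who want to see what Pearson’s correlation coefficient actually computes, here is a minimal Python sketch using the standard textbook formula: the sum of the products of the two variables’ deviations from their means, divided by the square root of the product of their summed squared deviations. The function name and the sample figures are illustrative assumptions, not data from Pearson’s or Galton’s studies.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How the two variables move together around their means.
    co_deviation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each variable varies on its own.
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return co_deviation / math.sqrt(spread_x * spread_y)


# Made-up heights (cm) and weights (kg), purely for illustration.
heights = [150, 160, 170, 180, 190]
weights = [52, 58, 69, 77, 88]
print(round(pearson_r(heights, weights), 3))
```

A result close to 1 means the two measurements rise together along a nearly straight line, a result close to -1 means one falls as the other rises, and a result near 0 means there is little linear relationship between them.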