From the moment Hans Rosling entertained us with his energetic TED Talk of 2006, breathlessly commentating on the elegant motion of a screen full of bubbles, interest in and awareness of visualisation began to reach a mainstream audience.
‘The best stats you’ve ever seen’ is the tagline associated with this famous talk, one that has now been viewed over 8 million times.
Aside from Rosling’s entertaining oratory, the success of this data presentation comes from the power of seeing the graphical portrayal of global health and population data, observing the patterns and stories that unfold in front of us. The key word here is ‘seeing’.
If statistics can be said to describe and quantify the characteristics of data, visualisation is what enables us to actually see the data. In harmony, they give us the most thorough understanding of data.
“Visualization may not be as precise as statistics, but it provides a unique view onto data that can make it much easier to discover interesting structures than numerical methods.”
The person widely regarded as the father of visual methods is John W Tukey, the prominent statistician who pioneered Exploratory Data Analysis. He championed techniques for visually exploring data to unearth discoveries that are otherwise indiscernible in the original data form or potentially masked by the aggregating nature of some statistical treatments.
One of Tukey’s most enduring visual devices is the ‘Box Plot’ (or ‘Box and Whiskers Plot’) used to graphically depict the classic five-number summary of minimum value, lower quartile, median value, upper quartile and maximum value. The Box Plot packs a lot of statistical information into a single graphic device and helps us to see the range of values as well as get a sense of the distribution (the degree of dispersion, clustering and skew) of these values.
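As a minimal sketch of the five numbers a box plot draws, the Python snippet below computes them with the standard library. The sample values are made up for illustration, and the "inclusive" quartile method is just one of several conventions, so other tools may report slightly different quartiles.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    data = sorted(values)
    # Assumption: the "inclusive" method is used here; other quartile
    # conventions exist and can give slightly different cut points.
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, q2, q3, data[-1]

# Made-up sample values, purely for illustration.
sample = [2, 4, 4, 5, 7, 9, 10, 12, 15]
summary = five_number_summary(sample)
print(summary)
```

The box spans the lower to the upper quartile, the line inside it marks the median, and in the classic form described above the whiskers reach out to the minimum and maximum.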
Of course, depending on the subject matter and the analysis being undertaken, we may wish to explore the statistical attributes of our data in different ways, to try to see it from new perspectives. Thankfully, there is a broad repertoire of graphical approaches that can help us to familiarise ourselves with our data and discover new insights from it.
Here are some of the most relevant and useful ways to help see your statistics:
The histogram shows the distribution of data, presenting frequency counts across a range of values grouped into intervals (bins). In the example below we see an analysis of the total appearances made by footballers during a given season. The height of each bar indicates the number of players whose appearance totals fall within that interval. Whilst in this data set the average value (25.9) and the median value (26) are very close, the shape of the histogram helps to reveal the potential degree of skewness in your data.
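To make the binning idea concrete, here is a rough sketch that groups values into equal-width intervals and prints a text histogram. The appearance totals are hypothetical (they are not the footballer data from the example), and the bin width of 10 is an arbitrary choice.

```python
from collections import Counter
import statistics

def histogram_counts(values, bin_width):
    """Count how many values fall in each interval [start, start + bin_width)."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

# Hypothetical appearance totals for 12 players (not the article's data).
appearances = [1, 3, 8, 12, 17, 21, 25, 28, 30, 33, 36, 38]
bins = histogram_counts(appearances, bin_width=10)

# One row per interval; the bar length is the frequency count.
for start, count in bins.items():
    print(f"{start:>2}-{start + 9:<2} {'#' * count}")

# Comparing the average with the median gives a first hint of skew.
mean = statistics.mean(appearances)
median = statistics.median(appearances)
print(mean, median)
```

When the average sits noticeably below or above the median, the longer tail of the histogram usually lies on that side.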
Placing two histograms back to back facilitates the comparison of two distributions. In the example below from the Office for National Statistics we see the shape of the population for England and Wales as at 2011, with the length of the bars indicating the population size by age.
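One way to picture this kind of comparison is to draw two sets of bars growing outwards from a shared central axis. The sketch below does this in plain text. The group counts and age bands are invented for illustration and are not ONS figures, and `back_to_back` is a hypothetical helper name.

```python
def back_to_back(left, right, labels):
    """Return text rows: left bars grow leftwards, right bars grow rightwards."""
    width = max(left)
    rows = []
    for label, l, r in zip(labels, left, right):
        # Right-align the left bar so both bars start at the centre labels.
        rows.append(f"{'#' * l:>{width}} | {label:<5} | {'#' * r}")
    return rows

# Invented counts for two groups across four age bands.
labels = ["0-19", "20-39", "40-59", "60+"]
group_a = [5, 7, 6, 3]
group_b = [4, 7, 6, 5]

rows = back_to_back(group_a, group_b, labels)
print("\n".join(rows))
```

Because both sides share the same scale and the same bands, any asymmetry between the two distributions shows up as a lopsided outline.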
The simplest view of data range is to show the minimum and maximum readings for different categorical variables. When you want to see the spread or tightness of a set of values, and don’t require all the dimensions of a box plot, the floating bar chart can be a useful approach. In the example below we see a breakdown of the range of high and low temperatures by month, in this case for Rome. This particular example focuses on the dimensional changes in wood and includes data on high and low levels of humidity and moisture content.
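The values behind a floating bar are just the low and high reading for each category. A minimal sketch, using hypothetical monthly temperatures in °C rather than the Rome data, might look like this:

```python
def ranges_by_category(readings):
    """Map each category to its (low, high, spread)."""
    result = {}
    for category, values in readings.items():
        low, high = min(values), max(values)
        result[category] = (low, high, high - low)
    return result

# Hypothetical monthly temperature readings in °C (not the Rome data).
readings = {
    "Jan": [3, 7, 12, 9],
    "Apr": [8, 13, 19, 15],
    "Jul": [18, 24, 31, 27],
}
ranges = ranges_by_category(readings)

# Offset each bar by its low value so that it "floats" above the baseline.
lowest = min(low for low, _, _ in ranges.values())
for month, (low, high, spread) in ranges.items():
    print(f"{month} {' ' * (low - lowest)}{'=' * spread} {low} to {high}")
```

Each bar starts at the low reading rather than at zero, so both its position and its length carry information.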
The Dot Plot displays multiple data points along an axis, each with a mark such as a filled, semi-transparent or fully transparent circle, to demonstrate the range and spread of values across a set. In the example below we see a range of dots plotted against different athletic events. Each dot is a gold medal won at the Olympics.
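As a rough text-only illustration of the idea, the sketch below places one mark per value along a shared axis for each category. The event names and values are placeholders rather than real Olympic results, and `dot_strip` is a hypothetical helper. Note that identical values collapse into a single mark here, which is exactly the problem that transparency addresses in a graphical dot plot.

```python
def dot_strip(values, lo, hi):
    """Place one 'o' per value on a character axis running from lo to hi."""
    cells = [" "] * (hi - lo + 1)
    for v in values:
        cells[v - lo] = "o"  # repeated values overwrite the same cell
    return "".join(cells)

# Placeholder data: integer results for two unnamed events.
events = {
    "Event A": [2, 3, 3, 7],
    "Event B": [1, 5, 9],
}

strips = {name: dot_strip(values, 0, 10) for name, values in events.items()}
for name, strip in strips.items():
    print(f"{name} |{strip}|")
```

Reading across a row shows the range and clustering within one category, and reading down the rows compares the categories against the same axis.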