Sourcing a data-driven story is a complicated process. The sheer volume of information now available in even medium-sized datasets makes it difficult to know which vein of information holds the most impressive or significant story, no matter how good a person is at pattern recognition or spotting trends. Even if you believe you have found something insightful and original, there may be an even more interesting take on the same data that only becomes visible when the data is viewed at its most granular level.
Databases and data visualisation tools have become invaluable for finding the narratives in big data, but their technical constraints mean that the larger a dataset is, the further it must be shrunk to stay manageable and to keep processing times down. This sacrifices crucial data granularity, sometimes to the point where interesting stories that rely on fine detail are lost.
So, what can be done? Let’s look at a real-life example of how all of the data can come into play when you have the right tools available, and why the most detailed data can be the most valuable.
In November 2016, InterWorks’ Tableau Zen Master Robert Rouse took part in the ‘Iron Viz’ competition with several other experts at Tableau’s worldwide customer and partner conference. The challenge was for each contestant to demonstrate the best use of Tableau when creating a data-driven story, with every contestant drawing on the same 14 gigabytes (161 million rows) of New York taxi journey data. Robert analysed how snowfall and public holidays affected trip counts and overall taxi fares, with an emphasis on visualising the effect that snow days had on New York taxi fares. The contest provided a clear demonstration of Tableau’s usefulness for telling data-driven stories through visualisations.
For the competition, Robert worked from a reduced dataset created by taking a daily aggregate of trips taken and fares earned over 2014. For the map visualisation, the dataset was shrunk further to a three-day period, chosen to best highlight the changes in trip frequency that a single snow day caused across New York. The data had to be reduced this far because the complete dataset simply could not be handled within the visualisation tool, and the reduced dataset lost much of its granularity and detail.
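The kind of reduction described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Robert's actual workflow: the trip records, their dates and fares, and the `daily_aggregate` and `window` helpers are all hypothetical. It shows how collapsing individual trips into one row per day, then filtering to a three-day window, throws away per-trip detail such as the pickup time.

```python
from collections import defaultdict
from datetime import date, datetime

# Hypothetical trip records: (pickup timestamp, fare in USD).
# These values are illustrative only, not taken from the real taxi dataset.
trips = [
    (datetime(2014, 2, 12, 8, 15), 12.50),
    (datetime(2014, 2, 13, 9, 40), 23.00),
    (datetime(2014, 2, 13, 18, 5), 8.75),
    (datetime(2014, 2, 14, 7, 30), 15.25),
    (datetime(2014, 3, 1, 22, 10), 30.00),
]


def daily_aggregate(records):
    """Collapse individual trips into one (trip_count, total_fare) row per day.

    The pickup time of day is discarded here and cannot be recovered later.
    """
    totals = defaultdict(lambda: [0, 0.0])
    for pickup, fare in records:
        day = pickup.date()
        totals[day][0] += 1
        totals[day][1] += fare
    return {day: (count, round(fare, 2)) for day, (count, fare) in sorted(totals.items())}


def window(daily, start, end):
    """Keep only the daily rows between start and end, inclusive."""
    return {day: row for day, row in daily.items() if start <= day <= end}


daily = daily_aggregate(trips)
three_days = window(daily, date(2014, 2, 12), date(2014, 2, 14))
print(three_days)
```

After aggregation, the two hypothetical 13 February trips survive only as a count of 2 and a combined fare of 31.75; the fact that one was a morning trip and the other an evening trip is no longer visible.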
Robert later re-ran the analysis from the competition for a webinar, again to demonstrate the effectiveness of visualising data, but this time the dataset was handled by EXASOL’s in-memory analytic database instead of Tableau data extracts. In this re-run, Robert showed how much work is involved in identifying and extracting a story from such a large dataset, and demonstrated how, by using a powerful analytic database, he could draw on the complete dataset.