Can Big Data algorithms tell better stories than humans?


http://www.thoughtsoncloud.com/2015/07/can-big-data-algorithms-tell-better-stories-than-humans/

What if computer algorithms could tell more compelling stories than journalists, writers or business analysts? Well, this is increasingly becoming a reality. A new generation of Big Data tools is being put to work automating storytelling.

The ideas behind this application of analytics were first put to use generating automated news reports, covering sports and financial stories. Take the recent Wimbledon tennis championships as an example. The Slamtracker system developed by IBM monitors each game using sensors and cameras, generating millions of real-time data points covering speed of serve, forced and unforced errors, and even the social media sentiment surrounding each game. This data can then be turned into automated stories or Twitter messages to ensure Wimbledon are the first to break news stories about the results.

Journalists have already expressed worries that technology like this could put them out of a job. But the truth is, if the process of structuring data into a narrative can be taught to a human, it can be taught to a computer too.

Kris Hammond, co-founder and chief scientist at Narrative Science, which has created the Quill natural language generation platform, realized early on that technology could be used to turn information into easy-to-understand narratives. In fact, Quill is a regular contributor to Forbes – just like me. You can see its latest contributions here.

Quill and competing apps such as Automated Insights are used by other media outlets. But because little is known about how trustworthy readers would consider reports created by algorithms, many news publishers may be reluctant to admit whether their stories, or parts of them, are generated by computers.

The implications of this technology go further than putting journalists out of work, however. In fact, Hammond concedes Quill isn’t yet great at finding news stories – its strengths lie in putting stories together from specific data sources. Narrative Science is currently running one application which reads the stock market and attempts to spot when unusual highs, lows or volume spikes could have important implications, but Hammond calls this a “very controlled” instance of Quill digging up its own stories. He stands by his claim, made in 2012, that a computer would be able to write Pulitzer Prize-quality journalism within five years – although he admits the clock is ticking!
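To make the idea of "digging up" a story from market data a little more concrete, here is a minimal sketch in Python. It is not how Quill works (those details are not public here); it simply assumes that a volume spike can be flagged with a z-score against recent history and then described with a sentence template. The function name `describe_volume_spike`, the `threshold` parameter and the ticker "ACME" are all made up for illustration.

```python
from statistics import mean, stdev


def describe_volume_spike(ticker, volumes, threshold=2.0):
    """Return a one-sentence narrative if the latest volume is unusually high.

    `volumes` is a list of daily share volumes, oldest first; the last entry
    is treated as "today". Returns None when nothing unusual is found.
    Assumption: "unusual" means a z-score of at least `threshold`.
    """
    history, latest = volumes[:-1], volumes[-1]
    if len(history) < 2:
        return None  # not enough history to estimate a spread
    average = mean(history)
    spread = stdev(history)
    if spread == 0:
        return None  # flat history, a z-score is undefined
    z_score = (latest - average) / spread
    if z_score < threshold:
        return None  # within the normal range, so no story
    ratio = latest / average
    return (
        f"{ticker} traded {latest:,} shares today, about {ratio:.1f} times "
        f"its recent average of {average:,.0f}."
    )


if __name__ == "__main__":
    # Hypothetical figures, not real market data.
    print(describe_volume_spike("ACME", [100, 110, 90, 105, 95, 400]))
```

A real system would need far more care about what counts as "important", which is presumably why Hammond describes this kind of story-finding as very controlled.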

No, the real value, Hammond says, is not in the scattershot approach of news publishing, where one article is created for a vast audience in the hope that some will find it interesting or useful. Natural language generation and automated narrative creation mean that one dataset can be interpreted in multiple ways, giving each targeted audience segment precisely what they need to know, without any confusing background noise.

This makes it ideal for corporate communications, where, for example, a company’s financial, customer and operations data can be interpreted and the insights reported directly to whichever people in the organization are in the best position to make a change.

So, for example, if an algorithm running at a manufacturing company were to pick up on the fact that a bottleneck in production of one component was leading to an overall loss in revenue, it could create tailored reports for every department involved in the process, explaining the situation and the best course of action to correct it. Doing this manually would be a very time-consuming undertaking.
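The manufacturing example can be sketched in a few lines of Python. This is only an illustration of the simplest possible approach, one sentence template per audience filled from the same facts; commercial natural language generation platforms are far more sophisticated. The `Bottleneck` class, the department names, the wording of the templates and the numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Bottleneck:
    """One finding from the data (all fields are illustrative)."""
    component: str
    target_per_day: int
    actual_per_day: int
    revenue_loss_per_day: float


# One template per audience: same facts, different emphasis.
TEMPLATES = {
    "production": (
        "Output of {component} is {shortfall} units/day below the target "
        "of {target}. Review capacity on that line first."
    ),
    "finance": (
        "A shortfall in {component} is costing an estimated "
        "${loss:,.0f} per day in lost revenue."
    ),
    "sales": (
        "Expect delays on orders that depend on {component}; "
        "daily output is at {pct:.0f}% of target."
    ),
}


def tailored_reports(finding):
    """Render the same finding once per department."""
    facts = {
        "component": finding.component,
        "target": finding.target_per_day,
        "shortfall": finding.target_per_day - finding.actual_per_day,
        "loss": finding.revenue_loss_per_day,
        "pct": 100 * finding.actual_per_day / finding.target_per_day,
    }
    # str.format ignores keyword arguments a template does not use.
    return {dept: text.format(**facts) for dept, text in TEMPLATES.items()}


if __name__ == "__main__":
    finding = Bottleneck("valve housing", 500, 350, 12000.0)
    for dept, report in tailored_reports(finding).items():
        print(f"[{dept}] {report}")
```

The point of the design is that the analysis happens once, while the wording is chosen per reader, so nobody has to wade through figures meant for another department.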

Just as with other high-tech developments of today – driverless cars spring to mind – earning the trust of humans is essential. The algorithms must allow for full sourcing and accountability. This is why, although Natural Language Generation is the foundation of this sort of technology, the data and analytics which underpin it are just as important.

At the moment, automated narratives generally work well with structured data – information such as numbers and measurements which fit nicely into a spreadsheet and can be compared quantitatively. In the future, I would expect to see more of the messy, unstructured data which we are increasingly generating and collecting included in these processes. For example, video data could be analyzed and interpreted to add color and insight to reports. Going back to news reporting, CCTV footage could tell us if streets were empty or crowded with people at the time of an armed robbery. At the same time, social media analysis could bolster reports with an ad-hoc assessment of public sentiment towards any relevant issue.

Narratives are one of the most important tools we have. Humans have always told stories – fictional, real or somewhere in between – as a way of passing on information and influencing events. Giving that power to computers may, to some, seem a step too far. But don’t we already often distrust the concept of “narrative”? The word is commonly used interchangeably with “spin” to suggest that someone is tailoring their depiction of events to suit their own needs. Computers can’t “spin” (unless they are programmed to, of course) so for news reporting, or conveying hard facts about a business, couldn’t they be seen as more trustworthy than humans?