Big data is getting bigger all the time, but that’s only half the story. The ever-growing amount of information streaming in from sensors, point-of-sale systems, social media and clickstreams means that enterprises must now, more than ever, be able to react quickly. Data, after all, has a shelf life. It’s all very well if your analytics framework can tell you how you should have kept your customers satisfied yesterday, but you’re likely to lose out to a competitor who has worked out how to keep them satisfied today and tomorrow.
This is the concept behind “fast data.” Of course, “velocity” has always been one of the Vs of big data – along with volume, variety and veracity. But the explosion in real-time, in-memory and edge analytics means that more and more effort is going into tackling data as soon as it emerges from the firehose, when the insights that can be gleaned are at their most valuable.
For many of the most cutting-edge applications – demand forecasting, fraud detection and compliance reporting, for example – data quickly loses its value if it can’t be analyzed and acted on immediately. When data scientists at Walmart were putting together the latest iteration of the supermarket giant’s data framework, they decided that only the previous few weeks’ worth of transactional data would be streamed through their pipelines – anything older was regarded as too stale to have any real value in demand forecasting.
Likewise, in banking and insurance, enterprises are finding that immediate access to the most relevant data is vastly more valuable than petabytes of historical data that has sat in warehouses for years, gathering virtual dust (and incurring storage and compliance expense) because someone thought that it might one day be useful.
The open source community has embraced the concept of “fast data” wholeheartedly, with platforms such as Spark, Kafka and Storm becoming popular in recent years due to their ability to process streams of data with lightning speed. To achieve this, data is often processed in-memory – avoiding the time needed to spin up physical hard disks and seek the information stored on them. An important differentiator is that “fast” big data is generally processed as a stream, while “slow” big data is processed in batches.
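To make the stream-versus-batch distinction concrete, here is a minimal Python sketch. It is an illustration only and does not use Spark, Kafka or Storm; the function names and sample readings are hypothetical. The batch function waits for the whole data set before producing a single answer, while the streaming function keeps a small running state in memory and produces an updated answer as each event arrives.

```python
from typing import Iterable, Iterator


def batch_average(readings: list[float]) -> float:
    """Batch style: wait for the complete data set, then compute once."""
    return sum(readings) / len(readings)


def streaming_average(readings: Iterable[float]) -> Iterator[float]:
    """Stream style: keep a running total in memory and yield an
    updated average every time a new reading arrives."""
    count = 0
    total = 0.0
    for value in readings:
        count += 1
        total += value
        yield total / count


# Hypothetical sample data, e.g. four sensor readings in arrival order.
readings = [10.0, 20.0, 30.0, 40.0]

print(batch_average(readings))            # 25.0
print(list(streaming_average(readings)))  # [10.0, 15.0, 20.0, 25.0]
```

Both approaches end up at the same final figure, but the streaming version has a usable answer after the very first event, which is the property that matters when data loses value quickly.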
One company providing “fast data” solutions is Nastel. One of its customers, a Fortune 500 bank, processes over $1 trillion in funds per day. Several times each day the bank is required to reconcile its vast accounting records with the Federal Reserve. Today, the bank is able to analyze these transactions in-memory and ensure that they are processed in priority order, since some can be delayed while others must be processed immediately.
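The idea of handling transactions in priority order can be sketched with an in-memory priority queue. This is not Nastel’s or the bank’s implementation, which the article does not describe; the priority levels, function names and transaction IDs below are assumptions made purely for illustration, using Python’s standard heapq module.

```python
import heapq
from itertools import count

# Hypothetical priority levels: a lower number is processed sooner.
URGENT, NORMAL, DEFERRABLE = 0, 1, 2

# Tie-breaker so transactions with equal priority keep arrival order.
_sequence = count()


def enqueue(queue: list, priority: int, transaction_id: str) -> None:
    """Add a transaction to the in-memory priority queue."""
    heapq.heappush(queue, (priority, next(_sequence), transaction_id))


def drain(queue: list) -> list[str]:
    """Pop every transaction, most urgent first, and return the order."""
    order = []
    while queue:
        _, _, transaction_id = heapq.heappop(queue)
        order.append(transaction_id)
    return order


queue: list = []
enqueue(queue, DEFERRABLE, "tx-deferrable-1")
enqueue(queue, URGENT, "tx-urgent-1")
enqueue(queue, NORMAL, "tx-normal-1")
enqueue(queue, URGENT, "tx-urgent-2")

print(drain(queue))
# ['tx-urgent-1', 'tx-urgent-2', 'tx-normal-1', 'tx-deferrable-1']
```

Urgent transactions jump ahead of ones that arrived earlier but can wait, while transactions of equal priority are still handled in the order they came in.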