Big data is now a familiar term in most of the business world, and companies large and small are scrambling to take advantage of it. Data exhaust, on the other hand, is less widely known, and in some ways it's an evil twin brother. Here are five things you should understand about data exhaust's pros and cons.
1. It's essentially all the big data that isn't core to your business.
The term "data exhaust" has been around for more than a decade, and it arose with the new streams of data coming from smartphones, said Tye Rattenbury, director of data science and solutions engineering at Trifacta, which makes software for data preparation. Today, more accessible data tools are bringing exhaust to the fore.
If big data is "primary" data that relates to the core function of your business, data exhaust is secondary data, or everything else that's created along the way, Rattenbury explained.
For instance, a bank would consider all the data about debits and credits to its customers' accounts to be primary. Secondary data might include information like what percentage of customers' transactions are done at an ATM instead of a physical branch.
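To make the bank example concrete, here is a minimal Python sketch of how such a secondary metric could be derived from transaction records. The record layout, the field names (`account`, `amount`, `channel`) and the `channel_share` helper are hypothetical, invented for illustration rather than taken from any real banking system.

```python
from collections import Counter

# Hypothetical transaction records. The amount is "primary" data;
# the channel each transaction came through is the exhaust.
transactions = [
    {"account": "A-100", "amount": -40.00, "channel": "atm"},
    {"account": "A-100", "amount": 1200.00, "channel": "branch"},
    {"account": "B-200", "amount": -60.00, "channel": "atm"},
    {"account": "B-200", "amount": -20.00, "channel": "atm"},
]


def channel_share(records):
    """Return the percentage of transactions made through each channel."""
    counts = Counter(r["channel"] for r in records)
    total = sum(counts.values())
    if total == 0:
        return {}  # no transactions, so no shares to report
    return {channel: 100.0 * n / total for channel, n in counts.items()}


shares = channel_share(transactions)
print(shares)
```

With the four sample records above, three of which went through an ATM, the ATM share comes out at 75 percent and the branch share at 25 percent.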
There are no standard definitions or schemas for data exhaust, which tends to be raw and unstructured, but in many ways, it's equivalent to the byproducts associated with a company's machines and core online activities. It can include streams coming in from Web browsers, plug-ins, log files, Internet of Things (IoT) devices, and more.
"Big data" is itself a relative term, boiling down essentially to "anything that's so large that you couldn't manually inspect or work with it record by record," Rattenbury said. In general, data exhaust tends to be even bigger, primarily because there are few limits on what a company can collect.
"Google is the leader here," he said. "They literally collect everything, even before they know what they will do with it."
That brings up another interesting feature of data exhaust: It can become primary data once a use for it is found.
Data exhaust can be enormously useful. In that bank example, for instance, knowing where consumers conduct most of their transactions can help the bank serve them better.
"It's not core to the transaction, but it can still be hugely relevant to servicing customers at a better level," Rattenbury said. "It provides a level of understanding and contextualization to that primary transaction or service that's increasingly desired by customers."
Data exhaust can contain important elements of information that you may not be looking for today but that could prove useful in the future, noted Mary Shacklett, president of research firm Transworld Data.
"A lot of exhaust data isn’t immediately valuable," agreed Nik Rouda, senior analyst with Enterprise Strategy Group.