Hyperbole is the norm when talking about how businesses can transform themselves using the Internet of Things and big data. What you hear less often is just how difficult it can be to get these projects right.
After more than 10 years of working on big data and Internet of Things (IoT) programs, analytics firm Teradata has seen how much effort is required to mine useful insights from webs of interconnected sensors.
Sprinkling IoT sensors throughout your firm won't necessarily give you a snapshot of what you need to know, at least not without a lot of work, said Martin Wilcox, who leads Teradata's centre of excellence.
"It's about the sensor data, stupid," he told the Strata + Hadoop World conference in London, going on to warn:
"The data those sensors produce is an unreliable, unwilling, and in some cases downright deceitful, witness to the events we care about."
Here are the five hard truths about IoT that Wilcox said businesses need to take on board.
The first is that machine-generated data can't be taken at face value. "This often comes as a big surprise to business and IoT people, who tend to assume that because smart devices never come to work hungover or distracted after a row with a partner, that everything they record can be assumed to be complete, consistent and accurate," he said.
"But if you talk to the hardware engineers that maintain sensor networks, you'll discover that nothing could be further from the truth."
In any sufficiently large sensor deployment, the accuracy of the readings will drift over time as the hardware degrades, he said.
In harsh conditions, such as oil field sensors measuring temperature in a hot desert, this degradation can happen quite rapidly.
These compromised sensors can't easily be replaced "because while the sensors themselves are so cheap they're almost free, the cost of the lost production incurred in replacing them most definitely is not".
One way to counter the increasing unreliability of sensor data over time is to corroborate each sensor's data with that of its neighbours, said Wilcox, who suggested creating a "virtual sensor from a neural network of adjacent sensor readings".
"The important thing to understand is that this sensor data needs to be managed. We can't assume that machine-generated data is complete, consistent and accurate, just because it was generated by a machine."
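What "managing" sensor data looks like in practice will vary, but one basic step is to check each stream for the completeness, consistency and accuracy that people tend to assume. The sketch below is a hypothetical audit, not a Teradata tool: it assumes readings arrive as `(timestamp_seconds, value)` pairs at a known nominal interval, and it flags implausible values, gaps that suggest missing samples, and duplicate or out-of-order timestamps. The valid range and the gap tolerance are assumptions that would depend on the sensor.

```python
def audit_readings(readings, expected_interval, valid_range, gap_tolerance=1.5):
    """Return (index, issue) pairs for a list of (timestamp_seconds, value) readings.

    expected_interval: nominal seconds between samples (assumed known per sensor).
    valid_range: (low, high) physically plausible values (assumed).
    gap_tolerance: a gap longer than expected_interval * gap_tolerance counts as missing data.
    """
    low, high = valid_range
    issues = []
    for i, (ts, value) in enumerate(readings):
        # Accuracy: values outside the physically plausible range.
        if not low <= value <= high:
            issues.append((i, "out_of_range"))
        if i > 0:
            gap = ts - readings[i - 1][0]
            # Consistency: timestamps should strictly increase.
            if gap <= 0:
                issues.append((i, "out_of_order_or_duplicate"))
            # Completeness: an unusually long gap implies dropped samples.
            elif gap > expected_interval * gap_tolerance:
                issues.append((i, "missing_samples"))
    return issues


readings = [(0, 20.0), (60, 21.0), (120, 500.0), (300, 22.0), (300, 22.5)]
for index, issue in audit_readings(readings, expected_interval=60, valid_range=(-40.0, 125.0)):
    print(index, issue)
```

Checks like these don't fix bad data on their own; the point is to make the problems visible so they can be handled explicitly rather than silently flowing into analytics.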
A second hard truth is that sensors often sit behind machines that filter and aggregate the data they collect.
There are good reasons to ditch irrelevant data, but data dismissed as chaff sometimes turns out to be valuable later, particularly when it is combined with other data.
"It's precisely because what is noise for one application may be vitally important signals for another that at the very minimum we need to understand where and how sensor data has been summarised and filtered," he said.
"In very many cases, our ambition should be to try and capture this raw sensor data and avoid this kind of premature summarisation."
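A small example shows why "noise for one application" can be signal for another. Below, a hypothetical vibration stream contains one brief spike. A per-window mean, which might be fine for a dashboard, flattens it; a per-window maximum, which a maintenance application might need, preserves it. The `Summary` record is an illustrative assumption of how to keep track of where and how data was summarised (method, window and sample count), so downstream users know what a number actually represents. Because the raw list is kept, either summary can be recomputed later.

```python
from dataclasses import dataclass
from statistics import fmean


@dataclass
class Summary:
    """A summarised value plus the provenance needed to interpret it (hypothetical schema)."""
    value: float
    method: str               # how the raw samples were combined, e.g. "mean" or "max"
    window: tuple[int, int]   # [start, end) indices of the raw samples covered
    sample_count: int


def summarise(raw, window_size, method):
    """Summarise raw samples in fixed-size windows, recording how each value was produced."""
    functions = {"mean": fmean, "max": max}
    combine = functions[method]
    summaries = []
    for start in range(0, len(raw), window_size):
        chunk = raw[start:start + window_size]
        summaries.append(Summary(combine(chunk), method, (start, start + len(chunk)), len(chunk)))
    return summaries


# Raw readings with one short spike in the first window.
raw = [1.0] * 9 + [10.0] + [1.0] * 10

for summary in summarise(raw, 10, "mean"):
    print(summary)  # the spike is diluted into a small bump in the first window's mean
for summary in summarise(raw, 10, "max"):
    print(summary)  # the spike survives in the first window's maximum
```

Neither summary is "right"; each discards something a different application might need, which is why retaining the raw stream, or at least recording how it was reduced, matters.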