This two-part blog is based on my forthcoming book, Data Science for Internet of Things.
It is also the basis for the Data Science for Internet of Things course that I teach. I will be syndicating sections of the book on the Data Science Central blog. I welcome your comments. Please email me at ajit.jaokar at futuretext.com. Email me also for a PDF version if you are interested in joining the course.
Here, we start off with the question: At which points could you apply analytics to the IoT ecosystem, and what are the implications? We then extend this to a broader question: Could we formulate a methodology to solve Data Science for IoT problems? I have illustrated my thinking through a number of companies and examples. I personally work with an open source strategy (based on R, Spark and Python), but the methodology applies to any implementation. We are currently working with a range of implementations including AWS, Azure, GE Predix and Nvidia. Thus, the discussion is vendor-agnostic.
I also mention some trends I am following, such as Apache NiFi.
As we move towards a world of 50 billion connected devices, Data Science for IoT (IoT analytics) helps to create new services and business models. IoT analytics is the application of data science models to IoT datasets. The flow of data starts with the deployment of sensors. Sensors detect events or changes in quantities and provide a corresponding output in the form of a signal. Historically, sensors have been used in domains such as manufacturing. Now their deployment is becoming pervasive through ordinary objects like wearables. Sensors are also being deployed through new devices like robots and self-driving cars. This widespread deployment of sensors has led to the Internet of Things.
Features of a typical wireless sensor node are described in this paper (wireless embedded sensor architecture). Typically, data arising from sensors is in time series format and is often geotagged. This means there are two forms of analytics for IoT: time series analytics and spatial analytics. Time series analytics typically leads to insights such as anomaly detection, so classifiers that flag anomalies are commonly used in IoT analytics. But by looking at historical trends, streaming data, and combining data from multiple events (sensor fusion), we can get new insights. And more use cases for IoT keep emerging, such as augmented reality (think of Pokemon Go + IoT).
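To make the idea of anomaly detection on a sensor time series concrete, here is a minimal sketch in Python. It is not a trained classifier. It is a simple rolling z-score rule, which I am using here only as an illustration. The function name, the window size, the threshold and the sample temperature readings are all assumptions of mine.

```python
from collections import deque
from statistics import mean, stdev


def rolling_zscore_anomalies(readings, window=5, threshold=3.0):
    """Return the indices of readings that deviate strongly from the
    readings in the preceding window (illustrative rule, not a classifier)."""
    history = deque(maxlen=window)  # keeps only the most recent `window` values
    anomalies = []
    for i, value in enumerate(readings):
        if len(history) == window:
            mu = mean(history)
            sigma = stdev(history)
            # Skip flat windows to avoid dividing by zero.
            if sigma > 0 and abs(value - mu) / sigma > threshold:
                anomalies.append(i)
        history.append(value)
    return anomalies


# Hypothetical temperature readings with one obvious spike at index 6.
temperatures = [20.1, 20.3, 19.9, 20.0, 20.2, 20.1, 35.0, 20.2, 20.0, 20.1]
print(rolling_zscore_anomalies(temperatures))  # [6]
```

Note that the spike itself enters the window after it is flagged, which inflates the spread for the next few readings. A production system would probably handle that differently, for example by excluding flagged values from the history.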
Meanwhile, sensors themselves continue to evolve. Sensors have shrunk due to technologies like MEMS. Their communications protocols have also improved through new technologies like LoRa. These protocols lead to new forms of communication for IoT such as device to device, device to server, or server to server. Thus, whichever way we look at it, IoT devices create a large amount of data. Typically, the goal of IoT analytics is to analyse the data as close to the event as possible. We see this requirement in many 'Smart city' type applications such as transportation, energy grids, utilities like water, street lighting and parking.
Once data is captured through the sensor, there are a few analytics techniques that can be applied to it. Some of these are unique to IoT. For instance, not all data may be sent to the Cloud/Lake. Considering the volume of data, some may be discarded at source or summarized at the Edge. Data could also be aggregated, with analytics applied to those aggregates at the Edge. We could also perform temporal or spatial analysis. For example, if you want to detect failure of a component, you could look for spikes in values for that component over a recent span (thereby potentially predicting failure). You could also correlate data across multiple IoT streams. Typically, in stream processing, we are trying to find out what is happening now (as opposed to what happened in the past). Hence, the response should be near real-time.
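The two ideas above, summarizing data at the Edge and correlating multiple streams, can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function names, the 60-second tumbling window, the choice of Pearson correlation and the sample readings are all hypothetical, and a real deployment would use a stream processing engine rather than in-memory lists.

```python
from math import sqrt
from statistics import mean


def summarize_windows(samples, window_seconds=60):
    """Group (timestamp, value) samples into fixed, non-overlapping windows
    and keep only a small summary per window instead of every raw reading."""
    buckets = {}
    for timestamp, value in samples:
        window_start = int(timestamp // window_seconds) * window_seconds
        buckets.setdefault(window_start, []).append(value)
    return [
        {
            "window_start": start,
            "count": len(values),
            "min": min(values),
            "max": max(values),
            "mean": mean(values),
        }
        for start, values in sorted(buckets.items())
    ]


def pearson(xs, ys):
    """Pearson correlation between two equally long, time-aligned streams."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("streams must have the same length (at least 2)")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant stream")
    return cov / (sx * sy)


# Hypothetical raw readings: (seconds since start, sensor value).
samples = [(0, 1.0), (30, 3.0), (60, 10.0), (90, 20.0), (125, 5.0)]
summaries = summarize_windows(samples)
for summary in summaries:
    print(summary)  # only these summaries would be sent on to the Cloud/Lake

# Hypothetical vibration and temperature streams from the same component.
vibration = [0.1, 0.2, 0.4, 0.8]
temperature = [40.0, 42.0, 46.0, 54.0]
print(pearson(vibration, temperature))
```

Here five raw readings are reduced to three window summaries, and the per-window maximum is one simple place to look for the spikes mentioned above.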