A methodology for solving problems with DataScience for Internet of Things

A methodology for solving problems with DataScience for Internet of Things

A methodology for solving problems with DataScience for Internet of Things
This two part blog is based on my forthcoming book:  Data Science for Internet of Things.

It is also the basis for the course I teach  Data Science for Internet of Things Course.  I will be syndicating sections of the book on the Data Science Central blog.  Welcome your comments.  Please email me at ajit.jaokar at futuretext.com  – Email me also for a pdf version if you are interested in joining the course

Here, we start off with the question:  At which points could you apply analytics to the IoT ecosystem and what are the implications?  We then extend this to a broader question:  Could we formulate a methodology to solve Data Science for IoT problems?  I have illustrated my thinking through a number of companies/examples.  I personally work with an Open Source strategy (based on R, Spark and Python) but  the methodology applies to any implementation. We are currently working with a range of implementations including AWS, Azure, GE Predix, Nvidia etc.  Thus, the discussion is vendor agnostic.

Read Also:
How Artificial Intelligence Could Help Transform The Oil Industry

I also mention some trends I am following such as Apache NiFi etc

As we move towards a world of 50 billion connected devices,  Data Science for IoT (IoT  analytics) helps to create new services and business models.  IoT analytics is the application of data science models  to IoT datasets.  The flow of data starts with the deployment of sensors.  Sensors detect events or changes in quantities. They provide a corresponding output in the form of a signal. Historically, sensors have been used in domains such as manufacturing. Now their deployment is becoming pervasive through ordinary objects like wearables. Sensors are also being deployed through new devices like Robots and Self driving cars. This widespread deployment of sensors has led to the Internet of Things.

Features of a typical wireless sensor node are described in this paper (wireless embedded sensor  architecture). Typically, data arising from sensors is in time series format and is often geotagged. This means, there are two forms of analytics for IoT: Time series and Spatial analytics. Time series analytics typically lead to insights like Anomaly detection. Thus, classifiers (used to detect anomalies) are commonly used for IoT analytics to detect anomalies.  But by looking at historical trends, streaming, combining data from multiple events(sensor fusion), we can get new insights. And more use cases for IoT keep emerging such as Augmented reality (think – Pokemon Go + IoT)

Read Also:
Data science without statistics is possible, even desirable

Meanwhile,  sensors themselves continue to evolve. Sensors have shrunk due to technologies like MEMS. Also, their communications protocols have improved through new technologies like LoRA. These protocols lead to new forms of communication for IoT such as Device to Device; Device to Server; or Server to Server. Thus, whichever way we look at it, IoT devices create a large amount of Data. Typically, the goal of IoT analytics is to analyse the data as close to the event as possible. We see this requirement in many ‘Smart city’ type applications such as Transportation, Energy grids, Utilities like Water, Street lighting, Parking etc

Once data is captured through the sensor, there are a few analytics techniques that can be applied to the Data. Some of these are unique to IoT. For instance, not all data may be sent to the Cloud/Lake.  We could perform temporal or spatial analysis. Considering the volume of Data, some may be discarded at source or summarized at the Edge. Data could also be aggregated and aggregate analytics could be applied to the IoT data aggregates at the ‘Edge’. For example,  If you want to detect failure of a component, you could find spikes in values for that component over a recent span (thereby potentially predicting failure). Also, you could correlate data in multiple IoT streams. Typically, in stream processing, we are trying to find out what happened now (as opposed to what happened in the past).  Hence, response should be near real-time.

Read Also:
6 Reasons Why Big Data Career Is A Smart Choice

Read Full Story…


Leave a Reply

Your email address will not be published. Required fields are marked *