A methodology for solving problems with DataScience for Internet of Things


This two-part blog is based on my forthcoming book, Data Science for Internet of Things.

It is also the basis for the course I teach, the Data Science for Internet of Things Course. I will be syndicating sections of the book on the Data Science Central blog. I welcome your comments. Please email me at ajit.jaokar at futuretext.com; email me also for a PDF version if you are interested in joining the course.

Here, we start with the question: at which points could you apply analytics to the IoT ecosystem, and what are the implications? We then extend this to a broader question: could we formulate a methodology for solving Data Science for IoT problems? I have illustrated my thinking through a number of companies and examples. I personally work with an open source strategy (based on R, Spark and Python), but the methodology applies to any implementation. We are currently working with a range of implementations, including AWS, Azure, GE Predix and Nvidia, so the discussion is vendor agnostic.

I also mention some trends I am following, such as Apache NiFi.

As we move towards a world of 50 billion connected devices, Data Science for IoT (IoT analytics) helps to create new services and business models. IoT analytics is the application of data science models to IoT datasets. The flow of data starts with the deployment of sensors. Sensors detect events or changes in quantities and provide a corresponding output in the form of a signal. Historically, sensors have been used in domains such as manufacturing; now their deployment is becoming pervasive through ordinary objects like wearables, and through new devices like robots and self-driving cars. This widespread deployment of sensors has led to the Internet of Things.


Features of a typical wireless sensor node are described in this paper (wireless embedded sensor architecture). Typically, data arising from sensors is in time series format and is often geotagged. This means there are two forms of analytics for IoT: time series analytics and spatial analytics. Time series analytics typically lead to insights like anomaly detection, so classifiers are commonly used in IoT analytics to flag anomalies. By looking at historical trends, streaming data, and data combined from multiple events (sensor fusion), we can gain new insights. More use cases for IoT keep emerging, such as augmented reality (think Pokemon Go + IoT).
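To make the time series case concrete, here is a minimal sketch of anomaly detection on a simulated sensor stream, using a rolling z-score in Python with pandas. The readings, window size and threshold are hypothetical placeholders; in practice you would tune these against your own IoT data, or replace the rule with a trained classifier.

import numpy as np
import pandas as pd

# Simulated temperature readings, one per minute (hypothetical data)
rng = pd.date_range("2017-05-01", periods=1440, freq="min")
values = np.random.normal(loc=22.0, scale=0.5, size=len(rng))
values[700] = 30.0  # inject an artificial anomaly
readings = pd.DataFrame({"temperature": values}, index=rng)

# Rolling mean and standard deviation over a 60-minute window;
# flag readings more than 3 standard deviations from the local mean
window = 60
rolling_mean = readings["temperature"].rolling(window).mean()
rolling_std = readings["temperature"].rolling(window).std()
readings["anomaly"] = (readings["temperature"] - rolling_mean).abs() > 3 * rolling_std

print(readings[readings["anomaly"]])

The same rolling-window idea carries over to real deployments; only the data source and the thresholds change.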

Meanwhile, sensors themselves continue to evolve. They have shrunk due to technologies like MEMS, and their communication protocols have improved through new technologies like LoRa. These protocols enable new forms of communication for IoT, such as device to device, device to server, and server to server. Whichever way we look at it, IoT devices create a large amount of data. Typically, the goal of IoT analytics is to analyse the data as close to the event as possible. We see this requirement in many 'smart city' applications such as transportation, energy grids, utilities like water, street lighting and parking.


Once data is captured through the sensor, there are several analytics techniques that can be applied to it, some of which are unique to IoT. For instance, not all data may be sent to the cloud or data lake. We could perform temporal or spatial analysis. Given the volume of data, some may be discarded at source or summarized at the edge. Data could also be aggregated, with aggregate analytics applied at the edge. For example, if you want to detect failure of a component, you could look for spikes in its values over a recent span (thereby potentially predicting failure). You could also correlate data across multiple IoT streams. Typically, in stream processing, we are trying to find out what happened now (as opposed to what happened in the past), so the response should be near real-time.
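As an illustration of this edge-side pattern, the sketch below keeps only a recent window of readings on the device, forwards aggregates upstream, and raises an alert when the latest value spikes relative to the recent average. The window size, the threshold and the send_to_cloud() stub are hypothetical; a real deployment would publish to a broker or gateway rather than print.

from collections import deque
from statistics import mean

WINDOW_SIZE = 100        # recent span of readings kept on the device
SPIKE_THRESHOLD = 1.5    # ratio of the latest reading to the recent average

recent = deque(maxlen=WINDOW_SIZE)

def send_to_cloud(payload):
    # Placeholder: a real device would publish to a broker or gateway (e.g. MQTT)
    print("upstream:", payload)

def on_reading(value):
    recent.append(value)
    if len(recent) < WINDOW_SIZE:
        return  # not enough history yet
    avg = mean(recent)
    # Forward only the aggregate summary, not every raw reading
    send_to_cloud({"avg": avg, "max": max(recent), "min": min(recent)})
    # Flag a spike relative to the recent average (possible component failure)
    if value > SPIKE_THRESHOLD * avg:
        send_to_cloud({"alert": "spike", "value": value, "avg": avg})

The design choice here is that raw data stays at the edge and only summaries and alerts travel upstream, which keeps the response near real-time while reducing the volume sent to the cloud or data lake.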

 


