Can I predict the future?
Predictive analytics is an umbrella term used to describe the process of applying various computational techniques with the objective of making some predictions about the future based on past data. This encompasses a variety of techniques including data mining, modelling, pattern recognition, and even graph analytics.
Does this mean we can predict future lottery numbers based on past lottery numbers? Sadly no, but, if anyone wants to prove us wrong, we will require at least 3 successful live demonstrations before we are convinced.
We're not going to get into too many details in this article, as the field is quite large and we are far from experts. We are just going to touch on the general process used when trying to make predictions using historical data. Then we are going to poke our heads into some cool tech within this field.
Step 1: Get the Data
The first step in the process is usually all about data mining and filtering. Data sources are often quite large and unstructured, so this step is all about extracting structured data from them. On the topic of sources, be sure to select relevant and trusted ones. If we were trying to predict election results we would probably avoid using The Onion, although given political outcomes this year we may be wrong.
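To make "extracting structured data" a little more concrete, here is a minimal sketch that pulls records out of free text with a regular expression. The sample text, the pattern, and the field names are all made up for illustration; real sources are messier and usually need a proper parser or scraper.

```python
import re

# Hypothetical raw text standing in for an unstructured source.
RAW_TEXT = """
2016-03-01: Poll shows candidate A at 45% and candidate B at 41%.
Some unrelated commentary with no numbers we care about.
2016-03-08: Poll shows candidate A at 43% and candidate B at 44%.
"""

# Assumed line shape: a date, then one percentage per candidate.
POLL_PATTERN = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2}): .*?A at (?P<a>\d+)% .*?B at (?P<b>\d+)%"
)

def extract_polls(text):
    """Return one dict per line that matches the assumed poll format."""
    records = []
    for line in text.splitlines():
        match = POLL_PATTERN.search(line)
        if match is None:
            continue  # skip lines that carry no structured information
        records.append({
            "date": match.group("date"),
            "candidate_a": int(match.group("a")),
            "candidate_b": int(match.group("b")),
        })
    return records

polls = extract_polls(RAW_TEXT)
```

Everything that does not match the pattern is silently dropped here, which is fine for a toy but worth logging in practice so you know how much of the source you are ignoring.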
Step 2: Analyse the Data
Here we need to start focusing on the contents of the data. This alone can prove to be quite a challenge. For example, if you are trying to make predictions about your own health, what information should you take into account? Do you smoke? What is your favourite colour? Where do you work? Often determining what is relevant and what is not is its own challenge. Proper pre-processing and filtering techniques are a must when cleaning up your data.
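One rough way to get a first feel for relevance is to check how strongly each candidate feature moves with the thing you want to predict. The sketch below ranks features by the absolute Pearson correlation with a target. The feature names and numbers are invented toy data, and correlation only captures linear relationships (and says nothing about causation), so treat this as a starting point rather than a verdict.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)

def rank_features(features, target):
    """Sort feature names by absolute correlation with the target, strongest first."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Made-up toy data: one row per person.
features = {
    "cigarettes_per_day": [0, 5, 10, 20, 0, 15],
    "favourite_colour_id": [1, 3, 2, 1, 3, 2],
}
health_score = [90, 75, 60, 40, 88, 50]

ranking = rank_features(features, health_score)
```

On this toy data the smoking column ranks well above the favourite colour column, which matches intuition. Note that the colour ids are arbitrary labels, so a correlation on them is not really meaningful; categorical features need different handling in real work.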
You should also ensure your data is of good quality. A reliable source alone does not ensure quality. What if you scraped your data from Wikipedia on the day someone thought it would be fun to vandalise the articles you were mining? Running your data through existing analysis pipelines can be quite informative and is a simple way of spotting questionable data. More formally, you can use confirmatory factor analysis to check that your extracted data will at least fit your model. It is also recommended that you apply other statistical techniques to ensure your analysis can account for variance, false positives, and other issues which often crop up in real world data.
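As a small illustration of spotting questionable data, here is a sketch that flags outliers using the median absolute deviation (MAD). This is not confirmatory factor analysis, just a simple sanity check we are swapping in for the example. The threshold of 3.5 and the 0.6745 scaling constant are the values commonly used with this "modified z-score", and the sample numbers are invented.

```python
from statistics import median

def flag_outliers(values, threshold=3.5):
    """Return (index, value) pairs whose modified z-score exceeds the threshold."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread to measure against; nothing can be flagged this way
    flagged = []
    for index, value in enumerate(values):
        score = 0.6745 * abs(value - med) / mad
        if score > threshold:
            flagged.append((index, value))
    return flagged

# Made-up readings with one obviously vandalised entry.
readings = [102, 98, 101, 99, 100, 9999, 97]
suspicious = flag_outliers(readings)
```

Using the median rather than the mean matters here: a single wild value like 9999 would drag the mean and standard deviation so far that the outlier could hide itself.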
Step 3: Model the Data
This step is fundamental as it allows you to structure your data in such a way that you can start recognising patterns that potentially allow you to extract future trends. Models also allow you to formally describe your data. This is helpful in understanding the results you get from your data analysis but is also a good starting point when it comes time to visualise your results.
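To give a flavour of what a model can look like, here is about the simplest one there is: a straight line fitted to past values by ordinary least squares, then extended one step into the future. The month and sales figures are invented and deliberately perfectly linear; real data rarely behaves this nicely, and a line is often the wrong shape entirely.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # assumes the x values are not all identical
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(model, x):
    """Evaluate the fitted line at x."""
    slope, intercept = model
    return slope * x + intercept

# Made-up toy history: sales grow by exactly 2 units per month.
months = [1, 2, 3, 4, 5]
sales = [10.0, 12.0, 14.0, 16.0, 18.0]

model = fit_line(months, sales)
forecast = predict(model, 6)  # extrapolate to the next month
```

The fitted slope and intercept are also the "formal description" of the data: two numbers you can read, plot, and argue about.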
Your models should undergo the same scrutiny as your extracted data. You should ensure that your models are valid representations of the issue you are trying to predict. Consulting with domain experts is often a good idea.
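One common way to scrutinise a model, alongside asking the experts, is to hold back the most recent part of your data, fit on the rest, and see how far off the predictions are. Below is a minimal sketch of that idea using a straight-line model and mean absolute error. The data points and the 6/2 split are arbitrary choices for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mean_absolute_error(model, xs, ys):
    """Average absolute gap between the line's predictions and the real values."""
    slope, intercept = model
    errors = [abs((slope * x + intercept) - y) for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)

# Made-up noisy, roughly linear history.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

# Fit on the earlier points, evaluate on the later ones the model never saw.
train_x, test_x = xs[:6], xs[6:]
train_y, test_y = ys[:6], ys[6:]

model = fit_line(train_x, train_y)
holdout_error = mean_absolute_error(model, test_x, test_y)
```

Splitting by time rather than at random mirrors how the model will actually be used: it only ever sees the past and is asked about the future.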