Three Things About Data Science You Won’t Find In the Books
- by 7wData
So here are my three main insights that you won't easily find in books.
The main goal in data analysis, machine learning, and data science is to build a system that will perform well on future data. The distinction between supervised and unsupervised learning makes it hard to talk about what this means in general, but in any case you will usually have collected some data set on which you build and design your method. Eventually, though, you want to apply the method to future data, and you want to be sure that it works well and produces the same kind of results you have seen on your original data set.
A mistake beginners often make is to look only at the performance on the available data and then assume that the method will work just as well on future data. Unfortunately, that is seldom the case. Let's just talk about supervised learning for now, where the task is to predict some outputs based on your inputs, for example, classifying emails into spam and non-spam.
If you only consider the training data, then it’s very easy for a machine to return perfect predictions just by memorizing everything. Actually, this isn’t that uncommon even for humans. Remember when you were memorizing words in a foreign language and you had to make sure that you were testing the words out of order?
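The memorization effect can be made concrete with a small sketch. Everything here is an assumption made for illustration: the `Memorizer` class is a hypothetical stand-in for an over-fitted model, and the data is synthetic, with labels drawn as coin flips so that there is nothing real to learn.

```python
import random


class Memorizer:
    """Hypothetical 'model' that stores every training example verbatim."""

    def fit(self, inputs, labels):
        # Remember the exact label seen for each training input.
        self.table = dict(zip(inputs, labels))
        return self

    def predict(self, x):
        # Inputs never seen in training get a fixed default guess of 0.
        return self.table.get(x, 0)


def accuracy(model, inputs, labels):
    """Fraction of examples for which the prediction matches the label."""
    correct = sum(model.predict(x) == y for x, y in zip(inputs, labels))
    return correct / len(inputs)


rng = random.Random(0)

# Synthetic data (an assumption): unique ids as inputs, coin flips as labels.
inputs = list(range(2000))
labels = [rng.randint(0, 1) for _ in inputs]

# Hold out the second half; the model never sees it during fitting.
train_x, test_x = inputs[:1000], inputs[1000:]
train_y, test_y = labels[:1000], labels[1000:]

model = Memorizer().fit(train_x, train_y)
train_acc = accuracy(model, train_x, train_y)
test_acc = accuracy(model, test_x, test_y)

print(f"accuracy on training data: {train_acc:.2f}")
print(f"accuracy on held-out data: {test_acc:.2f}")
```

The training score is perfect by construction, while the held-out score should hover around chance level, which is why performance on the available data alone says little about future data.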
Still, a lot can go wrong, especially when the data is non-stationary, that is, when the underlying distribution of the data changes over time. This often happens when you are looking at data measured in the real world. Sales figures will look quite different in January than in June.
Another problem arises when there is a lot of correlation between the data points, meaning that if you know one data point you already know a lot about another data point. For example, stock prices usually don't jump around a great deal from one day to the next, so doing the training/test split randomly by day leads to training and test data sets that are highly correlated.
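A rough sketch of why the random split by day is misleading, under stated assumptions: the "prices" below are a synthetic random walk rather than real stock data, and `nearest_day_error` is a hypothetical helper that predicts each test day with the price of the closest training day. It compares a random split with a chronological one.

```python
import random

rng = random.Random(42)

# Synthetic random-walk "prices" (an assumption, not real market data).
prices = [100.0]
for _ in range(999):
    prices.append(prices[-1] + rng.gauss(0, 1))
days = list(range(len(prices)))


def nearest_day_error(train_days, test_days):
    """Mean absolute error when each test day is predicted with the
    price of the closest training day."""
    total = 0.0
    for d in test_days:
        nearest = min(train_days, key=lambda t: abs(t - d))
        total += abs(prices[d] - prices[nearest])
    return total / len(test_days)


# Random split by day: test days are surrounded by training days.
shuffled = days[:]
rng.shuffle(shuffled)
random_train, random_test = shuffled[:800], shuffled[800:]

# Chronological split: every test day lies after all training days.
chrono_train, chrono_test = days[:800], days[800:]

random_err = nearest_day_error(random_train, random_test)
chrono_err = nearest_day_error(chrono_train, chrono_test)

print(f"error with random split:        {random_err:.2f}")
print(f"error with chronological split: {chrono_err:.2f}")
```

With the random split, almost every test day has a training day right next to it, so even this trivial predictor looks accurate. The chronological split is closer to the situation you face on future data, and the error there is typically much larger.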