Know when your big data is telling big lies

Data scientists use statistical analysis tools to find non-obvious patterns in deep data. But they know the universe is full of spurious correlations. Big data simply intensifies the problem.

That is because, as the range of sources and the diversity of predictors continue to grow, the number of relationships that can potentially be modeled grows without bound. As David G. Young pointed out, “predictive variables sometimes aren’t .... We’ve all seen variable interactions that change the significance, curvature, and even the sign of an important predictor.”

Thus, if you’re looking for a particular correlation in your data, you can probably find it if you’re clever enough to combine only the right data, specify only the right variables, and analyze it using only the right algorithm. Once you’ve hit on the right combination of modeling decisions, the patterns you seek may pop out like a genie from Aladdin’s lamp.
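To make that concrete, here is a minimal sketch in Python of how a "discovery" can emerge from pure noise. Everything in it is an illustrative assumption rather than anything from a real dataset: the function names `pearson` and `best_spurious_correlation`, the row and predictor counts, and the random seed are all hypothetical. The target and every predictor are independent random numbers, so any correlation the search reports is spurious by construction.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def best_spurious_correlation(n_rows=30, n_predictors=1000, seed=0):
    """Search random predictors for the one most correlated with a random target.

    Hypothetical setup: all values are independent Gaussian noise, so there is
    no real relationship to find.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_rows)]
    best_r = 0.0
    for _ in range(n_predictors):
        predictor = [rng.gauss(0, 1) for _ in range(n_rows)]
        r = pearson(predictor, target)
        if abs(r) > abs(best_r):
            best_r = r
    return best_r


# Same seed, so the first five predictors are identical in both runs;
# the only difference is how many candidates we allow ourselves to try.
few = best_spurious_correlation(n_predictors=5)
many = best_spurious_correlation(n_predictors=1000)
print(f"best |r| among 5 noise predictors:    {abs(few):.2f}")
print(f"best |r| among 1000 noise predictors: {abs(many):.2f}")
```

The more candidate predictors the search is allowed to try, the stronger the best apparent correlation becomes, even though none of the variables has anything to do with the target.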

Yet the fact that you’ve supposedly discovered this correlation doesn’t mean it actually exists in the underlying real-world domain you’re investigating. It may simply be a figment of your specific approach to modeling the data you have at hand. You may have no fraudulent intent, and you may otherwise adhere to standard data-scientific methodologies, but you may choose to go no further once it appears you’ve struck the pay-dirt insight you were seeking.
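One way to "go further" is to check whether a pattern found in one part of the data survives in a part that played no role in finding it. The sketch below is an assumption-laden illustration, not a prescribed methodology: the function `select_then_validate`, the even split into halves, and all sizes and seeds are hypothetical choices. It picks the noise predictor that best matches a noise target on the first half of the rows, then measures the same predictor on the second half.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def select_then_validate(n_rows=60, n_predictors=500, seed=1):
    """Select the best predictor on one half of the rows, then test it on the other.

    Hypothetical setup: target and predictors are independent Gaussian noise.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_rows)]
    predictors = [
        [rng.gauss(0, 1) for _ in range(n_rows)] for _ in range(n_predictors)
    ]
    half = n_rows // 2

    # "Discovery" step: keep whichever predictor looks best on the first half.
    best = max(predictors, key=lambda p: abs(pearson(p[:half], target[:half])))

    train_r = pearson(best[:half], target[:half])
    # Validation step: the same predictor on rows it was not selected on.
    holdout_r = pearson(best[half:], target[half:])
    return train_r, holdout_r


train_r, holdout_r = select_then_validate()
print(f"correlation on the rows used for selection: {train_r:+.2f}")
print(f"correlation on the held-out rows:           {holdout_r:+.2f}")
```

Because the selected predictor is noise, its correlation on the held-out rows has no reason to resemble the value that made it look like pay dirt. A pattern that reflects the real-world domain should, by contrast, tend to persist on data that was not used to find it.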

If you’re a data scientist, your failure to realize that you’re looking at non-existent statistical patterns may simply stem from being human.

 


