Machine Learning’s Poor Fit for Real Data

With all the wonderful things happening in artificial intelligence, machine learning, and data science, there’s a growing sentiment that these technologies are ready to solve all the things (including how to kill all humans). The reality is that a bunch of significant hurdles still stand between us and the AI dystopia/utopia. A big one, and the main impetus behind my research, is the disconnect between the statistical foundations of machine learning and how real data works.

Machine learning technology is built on a foundation of formal theory. Statistical ideas, computer science algorithms, and information-theoretic concepts integrate to yield practical methods that analyze large, noisy data sets to train actionable and predictive models. The power of these methods has caused many to realize the value of data.

Yet as data collection accelerates, the weaknesses of existing machine learning methods reveal themselves. The nature of larger-scale data collection violates key assumptions of the foundation that made machine learning so effective. Most notably, statistical independence is no longer achievable with large-scale data, because that data is collected from highly interacting, entangled, complex systems. Human data describes people interacting in a single global social network; ecological data represents measurements of organisms inhabiting complex, shared ecosystems; and medical data measures the interconnected biological systems that govern health.
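To make the cost of lost independence concrete, here is a minimal simulation sketch. Everything in it is an assumption for illustration: the function name `sample_mean_spread`, the Gaussian noise, and the single shared factor standing in for "everything is connected." It compares the textbook standard error of a mean, which assumes independent observations, with what actually happens when every observation carries a small shared influence.

```python
import random
import statistics


def sample_mean_spread(n, shared_weight, trials=2000, seed=0):
    """Empirical standard deviation of the sample mean across simulated data sets.

    Hypothetical model: each observation mixes its own noise with one noise
    term shared by the whole data set. Every observation has variance 1, and
    any two observations in the same data set have correlation shared_weight.
    """
    rng = random.Random(seed)
    own_scale = (1.0 - shared_weight) ** 0.5
    shared_scale = shared_weight ** 0.5
    means = []
    for _ in range(trials):
        shared = rng.gauss(0.0, 1.0)  # one draw that touches every observation
        total = 0.0
        for _ in range(n):
            total += own_scale * rng.gauss(0.0, 1.0) + shared_scale * shared
        means.append(total / n)
    return statistics.pstdev(means)


if __name__ == "__main__":
    n = 100
    classical = 1.0 / n ** 0.5  # textbook standard error, valid under independence
    print("classical formula:", round(classical, 3))
    print("independent data: ", round(sample_mean_spread(n, 0.0), 3))
    print("correlated data:  ", round(sample_mean_spread(n, 0.1), 3))
```

Under this toy model the true standard deviation of the mean is sqrt((1 - w)/n + w), where w is the pairwise correlation. With w = 0 that reduces to the classical 1/sqrt(n). With w = 0.1 it can never fall below sqrt(0.1), about 0.32, no matter how much data is collected, so a method that trusts the classical formula would be badly overconfident.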

The concept of statistical independence is a natural fit for laboratory experimentation. In laboratory experiments, scientists test hypotheses by running repeated experiments in closed environments. By design, the measurements taken during each experiment are independent. Because one experiment can’t affect another’s result, classical statistics can confidently quantify the effects of factors in the experiment, even in the presence of randomness.

For example, a typical pre-clinical laboratory drug trial would use a population of animal subjects, administering a drug to part of the population and giving no treatment to a separate control subpopulation. The two subpopulations would be managed to ensure that confounding factors, such as genetics, are equally distributed.
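The analysis of a trial like this can be sketched in a few lines. The sketch below is an illustration under stated assumptions, not a description of any real study: random assignment is used as one common way to spread confounding factors evenly, the outcome numbers are simulated, and the helper names (`randomize`, `permutation_p_value`) are hypothetical. The permutation test it uses is only valid because subjects are independent and therefore exchangeable between groups, which is exactly the property the laboratory setting provides.

```python
import random


def mean(values):
    return sum(values) / len(values)


def randomize(subjects, rng):
    """Randomly split subjects into (treated, control) halves."""
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def permutation_p_value(treated, control, rounds=5000, seed=0):
    """Two-sided permutation test on the difference in group means.

    Relabels the pooled outcomes at random and counts how often the relabeled
    difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(treated) - mean(control))
    pooled = list(treated) + list(control)
    k = len(treated)
    hits = 0
    for _ in range(rounds):
        rng.shuffle(pooled)
        if abs(mean(pooled[:k]) - mean(pooled[k:])) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (rounds + 1)


if __name__ == "__main__":
    rng = random.Random(42)
    subjects = list(range(40))  # hypothetical animal IDs
    treated_ids, control_ids = randomize(subjects, rng)
    # Simulated outcomes: the assumed drug effect shifts the mean by 1.0.
    treated = [rng.gauss(11.0, 1.0) for _ in treated_ids]
    control = [rng.gauss(10.0, 1.0) for _ in control_ids]
    print("difference in means:", round(mean(treated) - mean(control), 2))
    print("permutation p-value:", permutation_p_value(treated, control))
```

If the subjects influenced one another, for instance by sharing a cage or an infection, shuffling labels would no longer mimic the "no effect" world, and the p-value would lose its meaning.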
