Which Spark machine learning API should you use?

Which Spark machine learning API should you use?

You’re not a data scientist. Supposedly according to the tech and business press, machine learning will stop global warming, except that’s apparently fake news created by China. Maybe machine learning can find fake news (a classification problem)? In fact, maybe it can.

But what can machine learning do for you? And how will you find out? There’s a good place to start close to home, if you’re already using Apache Spark for batch and stream processing. Along with Spark SQL and Spark Streaming, which you’re probably already using, Spark provides MLLib, which is, among other things, a library of machine learning and statistical algorithms in API form.

Here is a brief guide to four of the most essential MLlib APIs, what they do, and how you might use them.  

Mainly you’ll use these APIs for A-B testing or A-B-C testing. Frequently in business we assume that if two averages are the same then the two things are roughly equivalent. That isn’t necessarily true. Consider if a car manufacturer replaces the seat in a car and surveys customers on how comfortable it is. At one end the shorter customers may say the seat is much more comfortable. At the other end, taller customers will say it is really uncomfortable to the point that they wouldn’t buy the car and the people in the middle balance out the difference. On average the new seat might be slightly more comfortable but if no one over 6 feet tall buys the car anymore, we’ve failed somehow. Spark’s hypothesis testing allows you to do a Pearson chi-squared or a Kolmogorov–Smirnov test to see how well something “fits” or whether the distribution of values is “normal.” This can be used most anywhere we have two series of data. That “fit” might be “did you like it” or did the new algorithm provide “better” results than the old one. You’re just in time to enroll in a Basic Statistics Course on Coursera.

What are you? If you take a set of attributes you can get the computer to sort “things” into their right category. The trick here is coming up with the attribute that matches the “class,” and there is no right answer there. There are a lot of wrong answers. If you think of someone looking through a set of forms and sorting them into categories, this is classification. You’ve run into this with spam filters, which use a list of words spam usually has. You may also be able to diagnose patients or determine which customers are likely to cancel their broadcast cable subscription (people who don’t watch live sports). Essentially classification “learns” to label things based on labels applied to past data and can apply those labels in the future.

Share it:
Share it:

[Social9_Share class=”s9-widget-wrapper”]

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

You Might Be Interested In

How big data is busting traffic jams

17 Jul, 2018

This article was written by Jim McClelland. Jim is a sustainable futurist, speaker, writer and social-media commentator. His specialisms include built …

Read more

Artificial empathy: the upgrade AI needs to speak to consumers

15 Apr, 2022

In a proliferated, multi-channel world, every brand needs to win the heart and mind of the consumer to acquire and …

Read more

Boost Your Digital Marketing with Business Intelligence

16 Jun, 2017

With the emergence of social media, search engine optimization and social media have become the foundation of marketing strategies for …

Read more

Do You Want to Share Your Story?

Bring your insights on Data, Visualization, Innovation or Business Agility to our community. Let them learn from your experience.

Get the 3 STEPS

To Drive Analytics Adoption
And manage change

3-steps-to-drive-analytics-adoption

Get Access to Event Discounts

Switch your 7wData account from Subscriber to Event Discount Member by clicking the button below and get access to event discounts. Learn & Grow together with us in a more profitable way!

Get Access to Event Discounts

Create a 7wData account and get access to event discounts. Learn & Grow together with us in a more profitable way!

Don't miss Out!

Stay in touch and receive in depth articles, guides, news & commentary of all things data.