Association Rule Mining – Not Your Typical Data Science Algorithm

by 7wData
November 24, 2016

Many machine learning algorithms used for data mining and data science work with numeric data, and many tend to be highly mathematical (such as Support Vector Machines, which we previously discussed). Association rule mining, however, is well suited to categorical (non-numeric) data, and it involves little more than simple counting! That's the kind of algorithm that MapReduce is really good at, and it can also lead to some really interesting discoveries.
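Because the method comes down to counting, it fits the map/reduce pattern naturally. The following is a minimal single-process Python sketch, not code for Hadoop or any other MapReduce framework. The hypothetical `map_basket` function emits a count of 1 for each distinct item and each unordered item pair in a basket, and `reduce_counts` sums those counts per key. The baskets are made-up toy data.

```python
from collections import Counter
from itertools import combinations

def map_basket(basket):
    """Map step: emit (key, 1) for every distinct item and every unordered item pair in one basket."""
    items = sorted(set(basket))  # dedupe and sort so each pair always gets the same canonical key
    for item in items:
        yield (item,), 1
    for pair in combinations(items, 2):
        yield pair, 1

def reduce_counts(mapped):
    """Reduce step: sum the emitted counts for each key."""
    totals = Counter()
    for key, count in mapped:
        totals[key] += count
    return totals

def count_itemsets(baskets):
    """Run the map step over every basket and feed everything it emits to the reduce step."""
    return reduce_counts(kv for basket in baskets for kv in map_basket(basket))

# Hypothetical toy transactions, for illustration only.
baskets = [
    ["diapers", "beer", "chips"],
    ["diapers", "beer"],
    ["milk", "bread"],
    ["diapers", "wipes"],
    ["beer", "chips"],
]
counts = count_itemsets(baskets)
print(counts[("diapers",)])         # 3
print(counts[("beer", "diapers")])  # 2
```

In a real MapReduce job the framework would shuffle the emitted pairs so that all counts for one key reach the same reducer; here a single `Counter` plays that role. Sorting each basket first means a pair is always stored under one key, such as `("beer", "diapers")`, regardless of the order in which the items were scanned.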

Association rule mining focuses on finding items that frequently occur together within a collection of transactions. It is sometimes referred to as "Market Basket Analysis", since that was the original application area of association mining. The goal is to find combinations of items that occur together more often than you would expect by chance. The classic example is the famous Beer and Diapers association often cited in data mining books. The story goes like this: men who go to the store to buy diapers will also tend to buy beer at the same time. Let us illustrate this with a simple example. Suppose that a store's retail transactions database shows that 10% of all customers buy beer, 7,500 customers buy diapers, and 6,000 of those diaper purchasers also buy beer.

If there were no association between beer and diapers (i.e., they were statistically independent), then we would expect only 10% of diaper purchasers to also buy beer (since 10% of all customers buy beer). However, we discover that 80% (= 6000/7500) of diaper purchasers also buy beer. That is eight times what we expected. This ratio of the observed frequency of co-occurrence to the expected frequency is called Lift, and we determined it simply by counting the transactions in the database. So, in this case, the association rule would state that diaper purchasers will also buy beer with a Lift factor of 8. Formally, Lift is the joint probability of two items x and y divided by the product of their individual probabilities: Lift = P(x,y)/[P(x)P(y)], where each probability is estimated as a fraction of all transactions. If the two items are statistically independent, then P(x,y) = P(x)P(y), so Lift = 1. Negatively associated items yield Lift values below 1, which is also an interesting discovery: such items co-occur less often than chance would predict, and in the extreme case are mutually exclusive.
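The Lift calculation can be written down directly. The sketch below is a minimal Python illustration with hypothetical function names. `lift_from_counts` implements Lift = P(x,y)/[P(x)P(y)] from raw transaction counts, and `lift_from_confidence` uses the equivalent form P(y|x)/P(y) (divide numerator and denominator by P(x)), which needs only the figures quoted in the example: 6,000 of 7,500 diaper purchasers buy beer, and 10% of all customers buy beer.

```python
from fractions import Fraction

def lift_from_counts(n_xy, n_x, n_y, n_total):
    """Lift = P(x,y) / (P(x) * P(y)), estimating each probability as a transaction count / n_total."""
    p_xy = Fraction(n_xy, n_total)
    p_x = Fraction(n_x, n_total)
    p_y = Fraction(n_y, n_total)
    return p_xy / (p_x * p_y)

def lift_from_confidence(n_xy, n_x, p_y):
    """Equivalent form Lift = P(y|x) / P(y), where P(y|x) = n_xy / n_x is the rule's confidence."""
    confidence = Fraction(n_xy, n_x)
    return confidence / p_y

# Beer-and-diapers figures from the text: 6,000 of 7,500 diaper buyers also buy beer,
# and 10% of all customers buy beer.
print(lift_from_confidence(n_xy=6000, n_x=7500, p_y=Fraction(1, 10)))  # 8

# Same rule from raw counts, using a HYPOTHETICAL total of 100,000 transactions
# (the text gives no total; 10,000 beer transactions keeps P(beer) at 10%).
print(lift_from_counts(n_xy=6000, n_x=7500, n_y=10000, n_total=100000))  # 8

# Made-up counts for two independent items: P(x,y) = P(x)P(y), so Lift = 1.
print(lift_from_counts(n_xy=10, n_x=50, n_y=20, n_total=100))  # 1
```

Exact `Fraction` arithmetic avoids floating-point rounding when comparing a Lift value against 1. Since the rule's confidence (6,000/7,500 = 0.8) divided by P(beer) = 0.1 already gives 8, the total number of transactions drops out; the 100,000 total above is only an assumed value that keeps 10% of transactions containing beer.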

The simple example above was made up, and in real-world cases it is very rare to see Lift factors as high as 8. But there was one case where it did happen. Walmart discovered it in 2004, when a series of hurricanes crossed the state of Florida. After the first hurricane, several more hurricanes were seen in the Atlantic Ocean heading toward Florida, so Walmart mined its massive retail transaction database to see what customers really wanted to buy before a hurricane arrived. They found one particular item whose sales increased by a factor of 7 over normal shopping days. That was a huge Lift factor for a real-world case. The item was not bottled water, or batteries, or beer, or flashlights, or generators, or any of the usual things that we might imagine. It was strawberry pop tarts! One could imagine lots of reasons why this was the most desired product before a hurricane: pop tarts do not require refrigeration, they do not need to be cooked, they come in individually wrapped portions, they have a long shelf life, they are a snack food, they are a breakfast food, kids love them, and we love them. Despite these "obvious" reasons, it was still a huge surprise! And so Walmart stocked its stores with tons of strawberry pop tarts ahead of the next hurricanes, and they sold out.
