The Benefits of Building Predictive Analytics on Unified Customer Data
- by 7wData
Predictive customer lifetime value (CLV) is a key element in modern marketing analytics, allowing marketers to prioritize the customers with the highest predicted business value. The most popular data science approach to predicting CLV is the extended Pareto/NBD (EP/NBD) generative model, which leverages a few summary statistics about customer transactions: the frequency of repeat purchases, the total customer age, the time of the most recent purchase, and the historical average order value. Despite using only a few signals and being over fifteen years old, the EP/NBD model has maintained strong relative performance according to a recent comparison of several CLV prediction approaches.
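To make those four summary statistics concrete, here is a minimal Python sketch that derives them from a raw transaction log. The function name `summarize_transactions`, the tuple layout, and the choice to measure recency as the gap between a customer's first and last purchase are assumptions made for illustration (one common convention for Pareto/NBD-style inputs), not details taken from the EP/NBD model itself.

```python
from collections import defaultdict
from datetime import date


def summarize_transactions(transactions, as_of):
    """Summarize per-customer purchase history.

    transactions: iterable of (customer_id, purchase_date, order_value)
    as_of: the date on which the summary is computed
    Hypothetical helper: field names and conventions are illustrative.
    """
    by_customer = defaultdict(list)
    for customer_id, purchase_date, order_value in transactions:
        by_customer[customer_id].append((purchase_date, order_value))

    summary = {}
    for customer_id, orders in by_customer.items():
        orders.sort(key=lambda order: order[0])  # oldest purchase first
        first_purchase = orders[0][0]
        last_purchase = orders[-1][0]
        summary[customer_id] = {
            # Repeat purchases: every order after the first one.
            "frequency": len(orders) - 1,
            # Customer age: days since the first purchase.
            "age_days": (as_of - first_purchase).days,
            # Recency (assumed convention): days from first to last purchase.
            "recency_days": (last_purchase - first_purchase).days,
            # Historical average order value across all orders.
            "avg_order_value": sum(v for _, v in orders) / len(orders),
        }
    return summary


# Usage with made-up transactions
example = [
    ("a", date(2024, 1, 1), 10.0),
    ("a", date(2024, 1, 11), 30.0),
    ("b", date(2024, 2, 1), 5.0),
]
stats = summarize_transactions(example, as_of=date(2024, 3, 1))
```

Note that this sketch counts every order separately; a production pipeline might instead collapse multiple orders on the same day into one purchase event.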
There have been many attempts to substantially improve CLV prediction via more sophisticated modeling techniques (SVMs, boosted decision trees, and neural networks), but these models still rely on the time series of past customer transactions as the primary data signal. Further improvements to CLV prediction, and to predictive analytics generally, are more likely to come from exploiting new sources of customer data than from new modeling techniques or feature engineering. To quote Rule #41 from Google’s Rules of Machine Learning: “When performance plateaus, look for qualitatively new sources of information to add rather than refining existing signals.”
Luckily, following Rule #41 isn’t too hard, since modern businesses collect more types of data than ever about their customers and their business interactions. For instance, retail businesses typically collect location information, itemized product purchases, and marketing campaign responses across an increasing number of channels. The diversity of data sources has grown so significantly that retailers and other direct-to-consumer businesses use dedicated customer data platforms (CDPs) to unify this data. Intuitively, these data sources can benefit prediction quality in many ways. Customers who purchase a specific product may be more likely to churn and not return. Customers living close to a high-performing store might have a higher lifetime spend. A customer who clicks on a marketing email and spends time browsing a catalogue may be more likely to make an in-store purchase.
While the significance of any of these kinds of data may vary from business to business, in total this explosion of customer data will have a substantial impact on how we approach predictive analytics. For instance, in experiments across multiple retailers and experimental settings, we found more than a 15% average improvement in CLV prediction (measured by root mean squared error) from using a diverse set of data signals over a model using only historical transactions.
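For readers who want to reproduce this kind of comparison on their own data, the sketch below shows how a relative improvement in root mean squared error can be computed. The function names and the example numbers are hypothetical and chosen only to illustrate the arithmetic; they are not the experimental results described above.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def relative_improvement(baseline_rmse, candidate_rmse):
    """Fraction by which the candidate model reduces the baseline's RMSE."""
    return (baseline_rmse - candidate_rmse) / baseline_rmse


# Made-up numbers: actual spend vs. two models' predictions.
actual_spend = [100.0, 250.0, 40.0, 0.0]
transactions_only = [80.0, 200.0, 70.0, 30.0]   # hypothetical baseline
unified_signals = [90.0, 230.0, 55.0, 20.0]     # hypothetical candidate

baseline = rmse(actual_spend, transactions_only)
candidate = rmse(actual_spend, unified_signals)
improvement = relative_improvement(baseline, candidate)
```

Because lower RMSE is better, a positive `improvement` means the model with additional data signals produced smaller errors than the transactions-only baseline.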
Here are some of the key benefits to building on unified customer data.
One of the consequences of collecting customer data across multiple channels is that events associated with a given customer may be split across different records. For example, an in-store purchase and an online purchase from the same customer may not be resolved to a single unified customer profile if there is no shared customer primary key (e.g., an in-store purchase may not be associated with an email address). This identity resolution failure has many important consequences, and one of them is that it compromises the quality of predictive analytics. If you are training a model to predict a customer's CLV but that customer's historical information is fragmented or inaccurate, the quality of the predictive model will suffer as well.
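As a rough illustration of what resolving split records can look like, the sketch below groups records that share any identifier, using a union-find structure. It is a deliberately simple, deterministic matching scheme written for this example: the field names (`email`, `phone`, `loyalty_id`) and the function `resolve_identities` are assumptions, and real customer data platforms typically use more sophisticated matching than exact equality.

```python
def resolve_identities(records, keys=("email", "phone", "loyalty_id")):
    """Group record indices that share at least one identifier value.

    records: list of dicts with optional identifier fields (hypothetical schema)
    Returns a list of sets, each set holding the indices of one unified profile.
    """
    parent = list(range(len(records)))

    def find(i):
        # Follow parent links to the group's root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_j] = root_i

    first_seen = {}  # (field, normalized value) -> index of first record with it
    for idx, record in enumerate(records):
        for key in keys:
            value = record.get(key)
            if not value:
                continue
            token = (key, value.strip().lower())
            if token in first_seen:
                union(first_seen[token], idx)
            else:
                first_seen[token] = idx

    groups = {}
    for idx in range(len(records)):
        groups.setdefault(find(idx), set()).add(idx)
    return list(groups.values())


# Usage: an in-store purchase (no email) is linked to an online profile
# through a shared loyalty ID.
records = [
    {"email": "Ana@example.com"},                        # online order
    {"phone": "555-0100", "loyalty_id": "L1"},           # in-store purchase
    {"email": "ana@example.com", "loyalty_id": "L1"},    # loyalty sign-up
    {"phone": "555-0199"},                               # unrelated customer
]
profiles = resolve_identities(records)
```

In this example the first three records collapse into one profile, so a CLV model trained on the unified data would see that customer's full purchase history rather than three partial ones.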