How to automatically segment customers using purchase data and a few lines of Python

阿新 • Published: 2018-12-29

A small educational project for learning “Customer Segmentation” with a simple data analysis technique

Automatic Customer Segmentation using Recency/Monetary Matrix

Why should you care about customer segmentation? Segmentation is key to delivering personalized experiences to customers. It can provide insights into your customers' behavior, habits, and preferences, allowing you to run tailored marketing campaigns that increase your odds of success and improve your customers' experience with tailored content.

What are we going to build? Using transactional purchasing data, we will create a 2 x 2 value matrix that splits customers into 4 groups. Each group differs from the others along 2 dimensions: (1) current customer value, and (2) potential customer value.

What technique are we going to use?

We are going to use the RFM model to create the required features from transactional purchasing data. The RFM model stands for:

Recency: When was the last time they purchased?
Frequency: How often and for how long have they purchased?
Monetary Value/Sales: How much have they purchased?

It is usually used to identify the highest-value customers at the intersection of these 3 questions. To build the 2 x 2 matrix, we will only use the R and the M from RFM.

What data are we using? We are going to use the sample purchase data set provided by Tableau, also known as “Global Superstore”. It is often used for forecasting and time series analysis. It contains more than 1500 different customers and 4 years of purchase data. Since we are doing a behavioral segmentation and not a demographic segmentation, we will remove some potential demographic bias by keeping only the B2C segment (Consumer) and customers in the United States.
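To make the three features concrete, here is a minimal sketch on a made-up transaction table. The customer names, dates, amounts, and column names are invented for illustration. Recency is measured in days relative to the latest order in the data, which is also how the code later in this post does it.

```python
import pandas as pd

# Hypothetical toy transactions; all values are invented for illustration.
orders = pd.DataFrame({
    "customer": ["ann", "ann", "bob", "bob"],
    "order_id": ["A1", "A2", "B1", "B1"],  # order B1 has two lines
    "order_date": pd.to_datetime(
        ["2018-01-01", "2018-03-01", "2018-02-01", "2018-02-01"]),
    "sales": [100.0, 50.0, 20.0, 30.0],
})

# Reference date: the latest order in the data set
snapshot = orders["order_date"].max()

rfm = orders.groupby("customer").agg(
    recency=("order_date", lambda d: (snapshot - d.max()).days),  # days since last order
    frequency=("order_id", "nunique"),                            # distinct orders
    monetary=("sales", "sum"),                                    # total spend
)
print(rfm)
```

A low recency value means the customer bought recently, so for Recency "smaller is better", while for Frequency and Monetary "larger is better".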

What approach are we taking?

Step 0: Load, filter, clean, and aggregate the data at the customer level,
Step 1: Create RFM features for each customer,
Step 2: To automate the segmentation, we will use the 80% quantile for Recency and Monetary (we could also have used k-means clustering or leveraged business knowledge to create buckets; for example, Global Superstore business users consider an active customer to be someone whose last order is less than 100 days old),
Step 3: Calculate RM score and sort customers,
Step 4: Visualize the Value Matrix and explore some key numbers.

The Python way:

Step 0: Load, filter, clean, and aggregate the data at the customer level

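import matplotlib.pyplot as plt
import numpy as np
# Jupyter magic: render plots inline (notebook only)
%matplotlib inline
import warnings
warnings.filterwarnings('ignore')
import pandas as pd

# Load the Global Superstore workbook
url = 'https://github.com/tristanga/Data-Analysis/raw/master/Global%20Superstore.xls'
df = pd.read_excel(url)
# Keep only B2C (Consumer) orders in the United States
df = df[(df.Segment == 'Consumer') & (df.Country == 'United States')]
df.head()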

Step 1: Create RFM features for each customer

# Recency: days between the latest order in the data set and the customer's last order
# Frequency: number of distinct orders; Monetary: total sales
df_RFM = df.groupby('Customer ID').agg({
    'Order Date': lambda y: (df['Order Date'].max().date() - y.max().date()).days,
    'Order ID': lambda y: len(y.unique()),
    'Sales': lambda y: round(y.sum(), 2)})
df_RFM.columns = ['Recency', 'Frequency', 'Monetary']
df_RFM = df_RFM.sort_values('Monetary', ascending=False)
df_RFM.head()

Step 2: To automate the segmentation we will use the 80% quantile for Recency and Monetary

# We will use the 80% quantile for each feature
quantiles = df_RFM.quantile(q=[0.8])
print(quantiles)
# Score 2 = recent / frequent / high spend, 1 = otherwise
df_RFM['R'] = np.where(df_RFM['Recency'] <= int(quantiles['Recency'].iloc[0]), 2, 1)
df_RFM['F'] = np.where(df_RFM['Frequency'] >= int(quantiles['Frequency'].iloc[0]), 2, 1)
df_RFM['M'] = np.where(df_RFM['Monetary'] >= int(quantiles['Monetary'].iloc[0]), 2, 1)
df_RFM.head()
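The approach section mentions that business knowledge can replace the quantile cut-off. As a minimal sketch of that alternative, the snippet below scores recency with the 100-day rule quoted earlier and compares it with the 80% quantile. The recency values and the names toy, ACTIVE_DAYS, R_rule, and R_quantile are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical recency values in days; invented for illustration.
toy = pd.DataFrame(
    {"Recency": [5, 40, 99, 100, 400]},
    index=["c1", "c2", "c3", "c4", "c5"],
)

# Business rule from the text: active = last order less than 100 days old
ACTIVE_DAYS = 100
toy["R_rule"] = np.where(toy["Recency"] < ACTIVE_DAYS, 2, 1)

# Data-driven alternative: the 80% quantile of recency, as used in this post
cutoff = toy["Recency"].quantile(0.8)
toy["R_quantile"] = np.where(toy["Recency"] <= cutoff, 2, 1)

print(toy)
```

On this toy data the two rules disagree on customer c4 (100 days): the fixed rule marks them inactive, while the quantile rule still counts them as recent. A fixed rule keeps its meaning as the data changes; a quantile always flags the same share of customers.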

Step 3: Calculate RM score and sort customers

# To do the 2 x 2 matrix we will only use Recency & Monetary
df_RFM['RMScore'] = df_RFM.M.map(str) + df_RFM.R.map(str)
df_RFM = df_RFM.reset_index()
# Summarize each RMScore group: customer count and average of each feature
df_RFM_SUM = df_RFM.groupby('RMScore').agg({
    'Customer ID': lambda y: len(y.unique()),
    'Frequency': lambda y: round(y.mean(), 0),
    'Recency': lambda y: round(y.mean(), 0),
    'R': lambda y: round(y.mean(), 0),
    'M': lambda y: round(y.mean(), 0),
    'Monetary': lambda y: round(y.mean(), 0)})
df_RFM_SUM = df_RFM_SUM.sort_values('RMScore', ascending=False)
df_RFM_SUM.head()

Step 4: Visualize the Value Matrix and explore some key numbers

# 1) Average Monetary Matrix
df_RFM_M = df_RFM_SUM.pivot(index='M', columns='R', values='Monetary')
df_RFM_M = df_RFM_M.reset_index().sort_values(['M'], ascending=False).set_index(['M'])
df_RFM_M

# 2) Number of Customer Matrix
df_RFM_C = df_RFM_SUM.pivot(index='M', columns='R', values='Customer ID')
df_RFM_C = df_RFM_C.reset_index().sort_values(['M'], ascending=False).set_index(['M'])
df_RFM_C

# 3) Recency Matrix
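The code for this third matrix is not included in this copy of the post. By analogy with the two matrices above, it would presumably pivot the 'Recency' column in the same way. The sketch below shows that on a small stand-in for df_RFM_SUM with invented numbers. The name df_RFM_R is an assumption, not taken from the notebook.

```python
import pandas as pd

# Stand-in for df_RFM_SUM with invented numbers (one row per RMScore)
df_RFM_SUM = pd.DataFrame(
    {
        "R": [2.0, 1.0, 2.0, 1.0],
        "M": [2.0, 2.0, 1.0, 1.0],
        "Recency": [60.0, 500.0, 70.0, 450.0],
    },
    index=pd.Index(["22", "21", "12", "11"], name="RMScore"),
)

# Average recency (days since last order) per cell, with the high-M row first
df_RFM_R = df_RFM_SUM.pivot(index="M", columns="R", values="Recency")
df_RFM_R = df_RFM_R.sort_index(ascending=False)
print(df_RFM_R)
```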

Some takeaways and quick wins, with very simple sales and marketing tactics as examples:

There are few customers in the “Disengaged” bucket, and their average revenue is higher than that of the “Star” bucket. Since there are very few of them, it should be easy to partner with the business to understand what happened at the customer level. Based on the analysis, there might be a simple quick win: reactivate a few of them with a phone call or meeting to hopefully move them back to the “Star” bucket (i.e. engaged customers).
The average last order from the “Light” bucket is very old (more than 1 year vs. 60-70 days for “engaged” customers). Launching a simple reactivation campaign with a coupon might lead to some new orders and help some of these customers move to the “New” bucket (i.e. engaged customers).
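The bucket names used in these takeaways are not assigned anywhere in the code above. One way to attach them is a simple lookup on RMScore, assuming the reading the takeaways imply: high Monetary and recent is “Star”, high Monetary but not recent is “Disengaged”, low Monetary and recent is “New”, and low Monetary and not recent is “Light”. The mapping and the toy rows below are assumptions for illustration, not part of the original notebook.

```python
import pandas as pd

# RMScore is the M score followed by the R score (2 = high/recent, 1 = low/old).
# This name mapping is inferred from the takeaways, not taken from the notebook.
SEGMENT_NAMES = {
    "22": "Star",        # high value, recent
    "21": "Disengaged",  # high value, not recent
    "12": "New",         # low value, recent
    "11": "Light",       # low value, not recent
}

# Toy scored customers; invented for illustration.
scored = pd.DataFrame({
    "Customer ID": ["c1", "c2", "c3", "c4", "c5"],
    "RMScore": ["22", "21", "12", "11", "12"],
})

scored["Segment"] = scored["RMScore"].map(SEGMENT_NAMES)
print(scored["Segment"].value_counts())
```

With a Segment column in place, each tactic above becomes a simple filter, for example selecting all “Disengaged” customers for a call list.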

The notebook is available on GitHub. Thank you for reading; if you enjoyed this post, please clap. Feel free to contact me if you want to build simple or more complex RFM segmentations within your organization.

Another interesting read to learn more about RFM with k-means in Python:
