RFM analysis scores customers on three dimensions: how recently they purchased, how often they purchase, and how much they spend. The technique dates back to the 1990s direct mail era and remains the standard approach for behavioral customer segmentation in ecommerce.

This playbook builds an RFM pipeline that processes order data from BigQuery or CSV, calculates quintile-based scores, and maps customers to 11 named segments. The output pipes to Claude Code for AI-generated action plans per segment.

0:00
/0:47

Why Build Your Own

Dedicated segmentation tools charge anywhere from $200-5,000+/month. They work, but your data leaves your warehouse, you pay per seat, and you can't pipe the output directly into other tools.

This playbook gives you the same output. The code runs locally or on your warehouse. You own the logic. And because the output is plain text, you can pipe it directly to Claude Code for instant action recommendations. No API integrations, no webhook configurations. Just hatch run pipeline | claude -p "what should I do" and you get a prioritized action plan in seconds.

🚀
The pipeline scores every customer from 1–5 on recency, frequency, and monetary value, maps them to named segments like Champions, At Risk, and Can't Lose, then runs health checks that flag when high-value customers are slipping away.

How It Works

The pipeline follows a standard ETL pattern. First, it extracts orders from either a local CSV file or BigQuery. Then it transforms the data by aggregating orders to the customer level, calculating recency (days since last purchase), frequency (number of orders), and monetary value (total spend). Each dimension gets scored from 1-5 using quintiles, meaning customers are ranked relative to each other rather than against fixed thresholds. Finally, the pipeline loads the results to either a CSV file or BigQuery table.

This modular design means you can swap the extract stage to read from Shopify API, Snowflake, or any other source without touching the transform or load logic.

The quintile approach matters because fixed thresholds break across different business models. A furniture retailer might see customers every 3 years while a coffee subscription renews weekly. "Purchased within 30 days" means something completely different for each. Quintiles normalize by ranking. The top 20% of your customers by recency get a 5 regardless of whether that means 7 days or 70 days.

The scoring works like this: a customer with scores of 5-5-5 (high recency, high frequency, high monetary) is a Champion. A customer with 1-5-5 (low recency but high frequency and monetary) is At Risk. They were valuable but stopped buying. The pipeline maps these score combinations to 11 named segments: Champions, Loyal, Potential Loyal, New Customers, Promising, Need Attention, About to Sleep, At Risk, Can't Lose, Hibernating, and Lost.

What It Produces

The pipeline outputs a segment summary table showing customer count, total revenue, revenue percentage, and average LTV for each segment. It also runs health checks, flagging when high-value segments like "Can't Lose" contain customers or when "At Risk" represents a dangerous percentage of revenue.

Segment Summary
╭─────────────────┬───────┬────────────┬───────┬─────────╮
│ Segment         │ Count │    Revenue │ % Rev │ Avg LTV │
├─────────────────┼───────┼────────────┼───────┼─────────┤
│ Champions       │   767 │ $6,313,950 │ 54.4% │  $8,232 │
│ Loyal           │   671 │ $1,946,425 │ 16.8% │  $2,901 │
│ Potential Loyal │ 1,077 │ $1,762,530 │ 15.2% │  $1,637 │
│ At Risk         │    55 │   $121,680 │  1.0% │  $2,212 │
╰─────────────────┴───────┴────────────┴───────┴─────────╯

╭─────────────────── Health Check ───────────────────╮
│ ✓ Champions represent 54.4% of revenue            │
│ ⚠️ CRITICAL: 1 Can't Lose customer ($5,475)        │
╰───────────────────────────────────────────────────╯

The results are saved to output/segments.csv with one row per customer:

customer_id  email               recency  frequency  monetary  segment
C001         alice@example.com   12       8          1250.00   Champions
C002         bob@example.com     145      2          89.00     Hibernating

This CSV is ready to import into Klaviyo. Upload it, map the email column, then create segments filtering by the segment field. Set up flows for each: Champions get loyalty rewards and early access, At Risk customers get winback campaigns with discount offers, Can't Lose customers get personal outreach.

From Analysis to Action with Claude Code

The segment table tells you who your customers are. Claude Code tells you what to do about it. Pipe the pipeline output directly to Claude and get a prioritized action plan based on your actual segment data.

Claude reads the segment summary, identifies the highest-impact opportunities, and generates specific recommendations for each segment. A single Can't Lose customer worth $5,475 gets flagged for personal outreach. The 55 At Risk customers with $2,212 average LTV get a structured winback campaign. Champions get loyalty program recommendations. The output is a ready-to-execute priority matrix with timelines and tactics.

Running the Pipeline

Clone the playbook and follow the README to set up the package structure:

The playbook includes test data with 5,000 customers and 41,000 orders so you can see results immediately. To use your own data, either replace data/orders.csv with your order export (keeping the same column format), or configure BigQuery in .env and run hatch run ingest to load your data first.

This post is for paying subscribers only

Sign up now and upgrade your account to read the post and get access to the full library of posts for paying subscribers only.

Sign up now Already have an account? Sign in