Most marketing teams store Meta Ads, GA4 events, and orders in separate tables. Without identity stitching, these remain reporting tables - useful for dashboards but not activation.

This playbook fixes that.

You'll build a working Meta + GA4 attribution pipeline that ties ad clicks to customer behavior and verified orders, giving SQL agents cross-journey insights that span ad attribution through lifetime value optimization.

🧩
Foundation: Identity stitching creates the connected data foundation that enables Customer 360 profiles, SQL-defined audiences, and automated attribution agents. Without it, those capabilities are impossible.

Here is what we will do in this playbook:

  • Load Meta Ads, GA4 events, and orders from sample data using Polars
  • Parse FBCLIDs and UTM parameters from GA4 landing page URLs
  • Connect identifiers across sources with proper type handling
  • Load to BigQuery with nested schemas preserving campaign hierarchies
  • Run attribution analysis comparing platform-reported vs verified transactions

After running the pipeline, you'll see how much Meta's conversion data over-reports compared to verified transactions: real attribution that reveals which conversions actually happened versus which the platform modeled.

Identity stitching transforms these tables into an activation foundation. Once Facebook click identifiers (FBCLIDs) map to customers and orders, BigQuery becomes more than a warehouse. SQL-defined audiences sync to ad platforms through server-side Google Tag Manager (GTM). Customer profiles serve from AlloyDB with sub-50ms response times.

SQL agents can query these connected tables to automatically surface insights - which campaigns underperform with specific segments, when attribution windows affect ROAS, where creative fatigue impacts conversion.

Running the Complete Pipeline

Let's dive straight in and see the pipeline work end-to-end.

Clone the repository and run everything:

git clone https://github.com/rabbitengine/playbook-meta-ads-ga4
cd playbook-meta-ads-ga4

hatch run local

The pipeline runs through all stages automatically:

🚀 Complete Multi-Source ETL Pipeline

📥 Extracting from multiple sources...
  → Meta Ads: 60 rows
  → GA4 Events: 209 rows
  → Orders: 42 rows
✅ Extracted 3 data sources

🔄 Transforming with identity stitching...
  → Meta Ads: 60 rows with nested STRUCTs
  → GA4 Events: 209 rows with 40 FBCLIDs extracted
  → Orders: 42 rows (flat table)
✅ Transformed 3 tables

📤 Loading to BigQuery...
  → meta_ads_bronze: 60 rows
  → ga4_events_bronze: 209 rows
  → orders_bronze: 42 rows
✅ Loaded 3 tables

🎉 Pipeline completed!

The pipeline processed 30 days of data across three sources. 40 FBCLIDs were extracted from 209 GA4 events - these are the connections between ad clicks and customer orders.

Now run the attribution analysis to see the real value:

hatch run analyze
📈 Attribution Comparison (7-day window):
  Total Meta Reported: 48 conversions, $3,120 revenue
  Total Verified: 30 conversions, $2,045 revenue
  Discrepancy: 37.5% over-reporting
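The discrepancy percentage above is computed on conversion counts, not revenue:

```python
# Over-reporting = (platform-reported - verified) / platform-reported.
meta_reported, verified = 48, 30
over_reporting = (meta_reported - verified) / meta_reported
print(f"{over_reporting:.1%}")  # → 37.5%
```

Run the same formula on revenue ($3,120 vs $2,045) and you get a slightly different gap, which is why the report states which basis it uses.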

Setup instructions (Python, hatch, GCP authentication) are in the repository README.

Understanding the Extraction

The pipeline extracted three different data formats:
