Most data teams accumulate a graveyard of one-off scripts. Every pipeline ends up different, and coding agents like Claude Code will happily reproduce whatever messy patterns they've seen before.

Claude Code skills change that. A skill teaches Claude how to do something, not just code to copy. Load this skill and Claude learns a production pipeline pattern: proper project structure, single-file config, environment-driven dev/prod switching, and built-in quality gates.

Modern pipelines follow a simple pattern: extract, transform, load. One config file controls everything and one command runs it. Dependencies install automatically. Type checking and linting catch bugs before production. This skill teaches Claude that pattern.

The skill in this playbook gives Claude the structure it needs to produce production-ready pipelines.

Running Claude with the Skill

Download the skill file and add it to your project:

your-project/
├── .claude/
│   └── skills/
│       └── pipelines/
│           └── SKILL.md    ← drop it here
└── ...

Claude Code automatically discovers skills in this folder. Open your project in Claude Code, describe the pipeline you need, and Claude reads the skill, applies the pattern, and generates the full project:

# Data pipeline
Build a pipeline that extracts orders from Shopify and loads them to BigQuery

# ML pipeline
Build a pipeline that trains a customer churn prediction model

# AI pipeline
Build a pipeline that enriches product descriptions using Claude

Modern Python Pipelines

Every pipeline follows the same pattern: extract, transform, load. The difference is what happens in between. Data pipelines move records from A to B. ML pipelines add a training or prediction stage. AI pipelines add a generation stage where an LLM enriches the data.
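
A minimal, runnable sketch of that shared shape, with placeholder stages (the function names and dummy records are illustrative, not taken from the skill):

def extract() -> list[dict]:
    # placeholder source: a real pipeline reads files or calls an API here
    return [{"sku": "A1", "price": "19.99"}, {"sku": "B2", "price": "5.00"}]

def transform(records: list[dict]) -> list[dict]:
    # clean and validate; here we just coerce prices to floats
    return [{**r, "price": float(r["price"])} for r in records]

def load(records: list[dict]) -> None:
    # placeholder destination: print instead of writing to a warehouse
    print(f"Loaded {len(records)} records")

if __name__ == "__main__":
    load(transform(extract()))
    # an ML pipeline would insert a train()/predict() stage after transform();
    # an AI pipeline would insert an LLM enrichment stage there instead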

The tooling has changed dramatically in the past few years. Hatch replaces the old virtualenv and pip dance with a single command that handles everything. You run hatch run pipeline and it installs dependencies, creates the environment, and executes your code. No activation scripts, no requirements.txt management, no dependency conflicts.
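
For comparison, here is the manual dance Hatch replaces, assuming a project with a pyproject.toml like the one shown later:

# before: create the venv, activate it, install, repeat on every machine
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python src/main.py

# with Hatch: one command creates the environment, installs the project, and runs the CLI
hatch run pipeline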

Polars has emerged as the modern alternative to pandas. It processes millions of rows without slowing down, uses lazy evaluation to only compute what you need, and has a cleaner expression API that surfaces many errors when you build the query instead of midway through a run. For most data work, it's simply faster and more predictable.
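
A hedged example of what lazy evaluation looks like in practice (the file and column names are hypothetical):

import polars as pl

# lazy scan: nothing is read until .collect(), and only the needed columns are touched
lifetime_value = (
    pl.scan_csv("data/orders.csv")
    .filter(pl.col("status") == "paid")
    .group_by("customer_id")
    .agg(pl.col("total").sum().alias("lifetime_value"))
    .collect()
)
print(lifetime_value)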

Pydantic handles configuration in a way that eliminates an entire class of bugs. You define your settings as a typed class, and it automatically loads from environment variables and validates everything. Swap from dev to prod by changing a single .env file. No more scattered config files or magic strings buried in code.
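
A minimal sketch of that config pattern using pydantic-settings (the field names are illustrative, not prescribed by the skill):

from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    # values load from environment variables or a local .env file and are validated on startup
    model_config = SettingsConfigDict(env_file=".env")

    environment: str = "dev"                    # set ENVIRONMENT=prod in .env to switch targets
    source_path: str = "data/"
    output_path: str = "output/results.parquet"

settings = Settings()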

The quality gates matter too. Ruff handles linting and formatting in milliseconds instead of seconds. Mypy catches type errors before they hit production. Together they run on every commit, so problems surface immediately instead of in production at 2am.
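
Both tools are configured in the same pyproject.toml; a minimal sketch of the relevant sections (the settings shown are examples, not the skill's exact defaults):

[tool.ruff]
line-length = 100

[tool.ruff.lint]
select = ["E", "F", "I"]   # errors, pyflakes, import sorting

[tool.mypy]
strict = true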

The pattern stays the same whether you're processing CSVs, training models, or calling APIs. Only the stages change.

Running the Pipeline

The skill teaches Claude to generate this structure:

project/
├── src/pipeline/
│   ├── cli.py         # Entry point
│   ├── config.py      # All settings in one file
│   ├── extract.py     # Read from source
│   ├── transform.py   # Process data
│   └── load.py        # Write to destination
├── data/              # Test data
├── tests/
└── pyproject.toml

The magic is in pyproject.toml. This isn't just a folder of scripts. It's a proper installable package:

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[project]
name = "pipeline"
version = "0.1.0"
dependencies = ["click", "polars", "pydantic-settings"]

[project.scripts]
pipeline = "pipeline.cli:main"

[tool.hatch.build.targets.wheel]
packages = ["src/pipeline"]

[project.scripts] creates a real command. No python src/main.py or PYTHONPATH hacks. Just hatch run pipeline.

Run it with one command:

hatch run pipeline

# Output:
# ✓ Extracted 1,234 records from data/
# ✓ Transformed (cleaned, validated)
# ✓ Loaded to output/results.parquet
# Done in 2.3s

Hatch installs all dependencies automatically on first run. No manual setup needed.

What Claude Learns

The skill teaches Claude more than folder structure. It covers:

  • Config pattern: One Pydantic settings file, environment-driven dev/prod switching
  • The /data pattern: Start with files, swap to API later without changing transform logic
  • Quality gates: Ruff for linting, mypy for type checking, both on every commit
  • Testing structure: Unit, integration, acceptance, and evals folders with clear purposes
  • CLI pattern: Click for parsing, Pydantic for validation, subcommands for different operations (see the sketch below)
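
A hedged sketch of that CLI pattern, tying the pieces together (the module and function names mirror the structure above but are illustrative):

import click

from pipeline.config import Settings        # the Pydantic settings class
from pipeline.extract import extract
from pipeline.transform import transform
from pipeline.load import load

@click.command()
@click.option("--source", default=None, help="Override the configured source path")
def main(source: str | None) -> None:
    """Run the pipeline end to end: extract, transform, load."""
    settings = Settings()                    # env-driven config, validated up front
    if source:
        settings.source_path = source
    load(transform(extract(settings)), settings)

# larger pipelines can swap @click.command() for a @click.group() with
# run/backfill/validate subcommands behind the same entry point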

Each pipeline Claude builds follows these patterns. No more guessing which structure to use or reinventing config management. The skill provides the guardrails; Claude fills in the business logic.
