Drift detection for Claude Code, packaged as two installable Agent Skills. Reads the JSONL session logs Claude Code already writes to ~/.claude/projects/, detects whether the model has been drifting on your own work, and produces a shareable forensic report.
No network, no account, no telemetry, no background daemon. Runs on the data already on your disk.
Status: 0.x / pre-alpha — output format and metric set may change.
What you get
| Skill | Invocation | Output |
| --- | --- | --- |
| cc-canary | /cc-canary [window] | forensic markdown writeup (./cc-canary-<date>.md), paste-ready for GitHub issues or gists |
| cc-canary-html | /cc-canary-html [window] | same report as a dark-theme HTML dashboard (./cc-canary-<date>.html), auto-opens in your browser |

Window defaults to 60d. Accepts 7d / 14d / 30d / 60d / 90d / 180d.
Each report includes:
Verdict: HOLDING / SUSPECTED REGRESSION / CONFIRMED REGRESSION / INCONCLUSIVE
Headline metrics table: pre vs post, with healthy/transition/concerning band verdicts
Weekly trend bars: cost (USD, verified against ccusage to the cent), read:edit ratio, reasoning loops, tokens/turn
Cross-version comparison: same user, different model versions, controlling for task mix
Auto-detected inflection date: composite health-score break
Findings with model-side / user-side / ambiguous classification
Appendices: hour-of-day thinking depth, word-frequency shift, three-period thinking-visibility transition, per-turn behavior rates, and more
Install
npx skills add delta-hq/cc-canary
Install just one:
npx skills add delta-hq/cc-canary --skill cc-canary
npx skills add delta-hq/cc-canary --skill cc-canary-html
Then from any Claude Code session:
/cc-canary 60d
/cc-canary-html 30d
Requirements: python3 ≥ 3.8 on your PATH. macOS / Linux / WSL for the cc-canary-html auto-open step (it falls back to printing the path if open / xdg-open / start fails).
How it works
1. Scan. A bundled Python script (stdlib only: no pip, no Node) walks ~/.claude/projects/**/*.jsonl, filters by window, and excludes subagent sessions by default.
2. Dedupe. Assistant messages are deduped on (message.id, requestId), the same scheme ccusage uses, because Claude Code writes the same message into multiple JSONLs when sessions are resumed or branched.
3. Aggregate. Per-session metrics: tool-mix, read:edit ratio, reasoning-loop phrases, self-admitted errors, premature stops, interrupts, token usage, cost (current Claude 4.x rates), hour-of-day thinking depth.
4. Detect inflection. A composite health score is computed per day; the break is the argmax of |before − after| over candidate dates, subject to a 0.75σ floor. Falls back to a median-timestamp split if no candidate clears the floor.
5. Pre-render the report. The script writes a markdown / HTML skeleton with every table and bar chart filled in. Only ~20 short narrative slots (marked <!-- C: ... -->) are left for Claude to fill: verdict line, summary, per-finding reasoning, root-cause, appendix paragraphs.
6. Fill & save. Claude reads the skeleton, writes the narrative, saves the final file.
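The dedupe step can be sketched in a few lines of stdlib Python. This is an illustrative sketch, not the bundled script: the function name `dedupe_assistant_messages` is hypothetical, and the record layout (a top-level `type` and `requestId`, with the ID nested under `message.id`) is an assumption based on the key named above.

```python
import json
from typing import Iterable, Iterator


def dedupe_assistant_messages(lines: Iterable[str]) -> Iterator[dict]:
    """Yield each assistant record once, keyed on (message.id, requestId)."""
    seen = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or partially written lines
        # Assumption: assistant records carry type == "assistant".
        if not isinstance(record, dict) or record.get("type") != "assistant":
            continue
        message = record.get("message")
        if not isinstance(message, dict):
            continue
        key = (message.get("id"), record.get("requestId"))
        if None in key:
            # Assumption: a record missing either ID cannot be matched, so keep it.
            yield record
            continue
        if key in seen:
            continue  # same message written again by a resumed or branched session
        seen.add(key)
        yield record
```

Feeding the lines of every scanned file through one call keeps the `seen` set shared across files, which is what removes copies that appear in more than one JSONL.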
Total runtime: ~2.5s for the script + 10–20s for Claude to fill narrative.
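A minimal sketch of the inflection search described in the steps above. Several details are assumptions, since the text does not specify them: "before" and "after" are taken as segment means, σ is the population standard deviation of the whole series, and `min_side` (a hypothetical parameter) keeps a few days on each side of a candidate split. Returning `None` stands in for "no break clears", at which point the caller would use the median-timestamp split.

```python
from statistics import mean, pstdev
from typing import Optional, Sequence


def find_inflection(
    scores: Sequence[float],
    min_side: int = 3,
    floor_sigmas: float = 0.75,
) -> Optional[int]:
    """Return the index of the first 'after' day, or None if no break clears the floor."""
    if len(scores) < 2 * min_side:
        return None
    sigma = pstdev(scores)
    if sigma == 0:
        return None  # a flat series has no break
    best_index, best_gap = None, 0.0
    # Try every split that leaves at least min_side days on each side.
    for k in range(min_side, len(scores) - min_side + 1):
        gap = abs(mean(scores[:k]) - mean(scores[k:]))
        if gap > best_gap:
            best_index, best_gap = k, gap
    if best_gap < floor_sigmas * sigma:
        return None  # caller falls back to the median-timestamp split
    return best_index
```

For a series that sits at 1.0 for five days and then drops to 0.0 for five days, the largest gap is at index 5, and it clears the floor comfortably.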
What each skill tracks
Metrics in the headline table (with published healthy/transition/concerning bands where applicable):
Read:Edit ratio: file reads per edit. Proxy for how thoroughly the model investigates before mutating.
Write share of mutations: Write / (Edit + Write). High share = model rewriting files instead of surgical edits.
Reasoning loops / 1K tool calls: phrases like "let me try again", "oh wait", "actually,".
Frustration rate: rate of frustration words in your prompts.
Thinking redaction rate: fraction of thinking blocks that are redacted vs visible.
Mean thinking length: reasoning-depth proxy (via cryptographic signature length, r=0.97 with content length when visible).
API turns per user turn: how many API calls the model makes per user message.
Tokens per user turn: total token volume (input + output + cache) per user message.
Plus appendices with additional signals: premature stopping, self-admitted errors, shortcut vocabulary, user interrupts, hour-of-day thinking depth, per-word frequency shift, three-period thinking-visibility transition, per-turn behavior rates.
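Three of the headline metrics reduce to simple ratios, sketched here from the definitions above. The function names are hypothetical, the phrase list holds only the three examples quoted in the text (the real list is presumably longer), and using Edit calls alone as the read:edit denominator is an assumption.

```python
from collections import Counter
from typing import Iterable, Optional

# Only the example phrases quoted above; an assumption, not the full list.
LOOP_PHRASES = ("let me try again", "oh wait", "actually,")


def safe_ratio(numerator: float, denominator: float) -> Optional[float]:
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def headline_metrics(tool_names: Iterable[str], assistant_texts: Iterable[str]) -> dict:
    """Compute read:edit ratio, write share, and reasoning loops per 1K tool calls."""
    tools = Counter(tool_names)
    total_calls = sum(tools.values())
    loops = sum(
        text.lower().count(phrase)
        for text in assistant_texts
        for phrase in LOOP_PHRASES
    )
    return {
        "read_edit_ratio": safe_ratio(tools["Read"], tools["Edit"]),
        "write_share": safe_ratio(tools["Write"], tools["Edit"] + tools["Write"]),
        "loops_per_1k_calls": safe_ratio(1000 * loops, total_calls),
    }
```

Returning `None` rather than zero for an empty denominator keeps a session with no edits from looking like a session with a read:edit ratio of 0.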
Skill filters & flags
The script accepts flags you can pass via Bash(python3 scripts/compute_stats.py …) for custom runs:
| Flag | Default | Purpose |
| --- | --- | --- |
| --window {Nd} | 60d | Window size (7d / 14d / 30d / 60d / 90d / 180d) |
| --include-agents | off | Include subagent sessions (excluded by default because they have no natural user prompts) |
| --min-user-words N | 10 | Drop sessions with fewer user-prompt words than this (filters trivial sessions) |
| --render-md PATH | (none) | Write the markdown skeleton to PATH |
| --render-html PATH | (none) | Write the HTML dashboard to PATH |
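For readers who want to see how the flags fit together, the table can be mirrored as an argparse definition. This is a hypothetical reconstruction from the table alone, not the contents of scripts/compute_stats.py; `build_parser` and `window_days` are illustrative names.

```python
import argparse

ALLOWED_WINDOWS = ("7d", "14d", "30d", "60d", "90d", "180d")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose flags and defaults follow the table above."""
    parser = argparse.ArgumentParser(prog="compute_stats.py")
    parser.add_argument("--window", choices=ALLOWED_WINDOWS, default="60d",
                        help="window size")
    parser.add_argument("--include-agents", action="store_true",
                        help="include subagent sessions (excluded by default)")
    parser.add_argument("--min-user-words", type=int, default=10, metavar="N",
                        help="drop sessions with fewer user-prompt words than N")
    parser.add_argument("--render-md", metavar="PATH",
                        help="write the markdown skeleton to PATH")
    parser.add_argument("--render-html", metavar="PATH",
                        help="write the HTML dashboard to PATH")
    return parser


def window_days(window: str) -> int:
    """Convert a window string such as '30d' to a day count."""
    return int(window.rstrip("d"))
```

Under this sketch, passing `["--window", "30d", "--render-md", "out.md"]` to `build_parser().parse_args` leaves subagents excluded and the minimum word count at 10.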
Privacy
Fully local. Zero network calls.
The script reads ~/.claude/projects/**/*.jsonl only. Nothing else.
Narrative prose is written by Claude during the skill invocation (inside your Claude Code session); it has access only to the on-disk files you explicitly point it at.
User-prompt content is truncated to ≤180 chars before being included in the skeleton, and /Users/… paths, emails, and hex-like tokens are redacted.
Output files (./cc-canary-<date>.{md,html}) live in the directory where you invoked the skill. Nothing is uploaded anywhere.
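The prompt sanitization described above might look roughly like the following. The regular expressions, the placeholder strings, and the 16-character threshold for "hex-like" are all assumptions made for illustration; the bundled script may define them differently. This sketch redacts before truncating so that a secret cut in half at the 180-character boundary cannot slip past a pattern.

```python
import re

MAX_CHARS = 180  # limit stated above

# Illustrative patterns; the real script's rules are not documented here.
PATH_RE = re.compile(r"/Users/\S+")
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
HEX_RE = re.compile(r"\b[0-9a-fA-F]{16,}\b")  # assumption: 16+ hex chars counts as a token


def sanitize_prompt(text: str) -> str:
    """Redact paths, emails, and hex-like tokens, then truncate to MAX_CHARS."""
    text = PATH_RE.sub("<path>", text)
    text = EMAIL_RE.sub("<email>", text)
    text = HEX_RE.sub("<token>", text)
    return text[:MAX_CHARS]
```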
Contributing
Issues, metric suggestions, and PRs welcome: github.com/delta-hq/cc-canary/issues. Output format and metric set may change during 0.x.
About the name
Canaries were used in coal mines to detect early signs of danger. cc-canary does the same for drift in your Claude Code sessions.
License
MIT