Autonomous research loop for Claude Code: iteratively improves Claude config and project code via tmux + ralph-loop + git worktrees. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
autoresearch for Claude Code — Design Document
Date: 2026-04-04
Status: Approved
Overview
An autonomous research loop that runs inside Claude Code via tmux. It iteratively improves Claude Code configuration (skills, prompts, CLAUDE.md files) and project code by running experiments in isolated git worktrees, evaluating them against benchmarks, and keeping only improvements.
Inspired by karpathy/autoresearch.
Components & Files
~/.claude/
├── autoresearch.yaml          # Main config (projects, thresholds)
├── autoresearch/
│   ├── log.jsonl              # Experiment log (append-only)
│   └── experiments/           # Per-experiment metadata (JSON)
└── skills/
    └── autoresearch/
        ├── skill.md           # /autoresearch skill (start/stop/status)
        └── loop.md            # Internal loop skill (ralph-loop payload)

~/work/<project>/.worktrees/
└── exp-NNNN-<description>/    # Isolated experiment worktree
Claude Code Hook in settings.json:
SessionStart → checks token thresholds → starts tmux if both conditions are met.
Configuration (~/.claude/autoresearch.yaml)
projects:
  - path: ~/.claude
    benchmarks: []
    time_limit_minutes: 5
  - path: ~/work/kundendoku
    benchmarks: ["go test ./...", "go build ./..."]
    time_limit_minutes: 10
  - path: ~/work/tueit-x
    benchmarks: ["go test ./..."]
    time_limit_minutes: 10

token_threshold:
  context_remaining_pct: 60   # start when >60% context remaining
  api_budget_usd: 5.00        # start when >$5 API budget remaining
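As an illustration of how this config could be consumed, the sketch below validates an already-parsed mapping and converts it into typed objects. The class names and `config_from_dict` are hypothetical and not part of the design. Parsing the YAML itself is assumed to happen elsewhere, for example with PyYAML's `yaml.safe_load`.

```python
import os
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Project:
    path: Path
    benchmarks: list[str] = field(default_factory=list)
    time_limit_minutes: int = 5


@dataclass
class TokenThreshold:
    context_remaining_pct: float
    api_budget_usd: float


@dataclass
class Config:
    projects: list[Project]
    token_threshold: TokenThreshold


def config_from_dict(raw: dict) -> Config:
    """Validate a parsed autoresearch.yaml mapping (hypothetical helper)."""
    projects = []
    for entry in raw["projects"]:
        limit = int(entry["time_limit_minutes"])
        if limit <= 0:
            raise ValueError(f"time_limit_minutes must be positive: {entry['path']}")
        projects.append(
            Project(
                # Expand "~" so later git and tmux calls get an absolute path.
                path=Path(os.path.expanduser(entry["path"])),
                benchmarks=list(entry.get("benchmarks", [])),
                time_limit_minutes=limit,
            )
        )
    threshold = raw["token_threshold"]
    pct = float(threshold["context_remaining_pct"])
    if not 0 <= pct <= 100:
        raise ValueError("context_remaining_pct must be between 0 and 100")
    return Config(projects, TokenThreshold(pct, float(threshold["api_budget_usd"])))
```

Failing early on a non-positive time limit or an out-of-range percentage keeps a typo in the config from surfacing only in the middle of an experiment.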
Experiment Loop (per iteration)
Start
│
├─ Token check (context% + API budget)
│   └─ below threshold → sleep, next round
│
├─ Stuck check (same files in last 3 iterations?)
│   └─ yes → pause + tmux notification to user
│
├─ Create worktree: exp-NNNN-<auto-description>
│
├─ Claude analyzes + modifies (time_limit_minutes)
│
├─ Run benchmarks (timeout = time_limit_minutes)
│
├─ Compare metrics: before vs. after
│   ├─ better → merge into main, delete worktree
│   └─ worse/neutral → delete worktree
│
└─ Append log entry → log.jsonl
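The worktree handling in the loop above could be sketched as follows. This is a minimal illustration under several assumptions that the design does not state: the metric is a single number where higher is better, the experiment's changes are committed on a branch named after the worktree, the project's main checkout has `main` checked out, and the experiment branch is deleted along with the worktree. The function names are hypothetical. The functions only build the git command lines and do not run them.

```python
import re


def worktree_name(number: int, description: str) -> str:
    """Build an exp-NNNN-<description> name with a filesystem-safe slug."""
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    return f"exp-{number:04d}-{slug}"


def is_improvement(before: float, after: float) -> bool:
    """Only a strictly better score counts; neutral results are discarded."""
    return after > before


def create_commands(project: str, name: str) -> list[list[str]]:
    """Command that creates the isolated worktree on a new branch."""
    worktree = f"{project}/.worktrees/{name}"
    return [["git", "-C", project, "worktree", "add", "-b", name, worktree]]


def finish_commands(project: str, name: str, improved: bool) -> list[list[str]]:
    """Commands that merge (if improved) and then clean up the experiment."""
    worktree = f"{project}/.worktrees/{name}"
    commands = []
    if improved:
        # Assumes the branch currently checked out in `project` is main.
        commands.append(["git", "-C", project, "merge", "--no-ff", name])
    commands.append(["git", "-C", project, "worktree", "remove", "--force", worktree])
    commands.append(["git", "-C", project, "branch", "-D", name])
    return commands
```

Returning the commands instead of executing them makes the merge-or-discard decision easy to check in isolation. A caller could pass each list to `subprocess.run(cmd, check=True)`.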
tmux Layout
Session: autoresearch
│
├── Window 0: [overview]
│   ├── Pane left (70%): tail -f ~/.claude/autoresearch/log.jsonl (formatted)
│   └── Pane right (30%): metrics summary (experiments today, success rate, budget used)
│
├── Window 1: [claude-config]
│   └── Full: claude (ralph-loop), optimizes ~/.claude/
│
├── Window 2: [kundendoku]
│   └── Full: claude (ralph-loop), optimizes ~/work/kundendoku/
│
└── Window N: [<project>]
    └── Full: claude (ralph-loop)
The user attaches with tmux attach -t autoresearch and switches windows with Ctrl+b <number>.
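One way to build this layout is to generate the tmux invocations from the project list. The sketch below is an assumption-laden illustration, not the design's implementation: `tmux_layout_commands` is a hypothetical name, `-l 30%` assumes tmux 3.1 or later, the log formatting and the metrics command are left out (the right pane just opens a shell), and each project window starts plain `claude` because the ralph-loop invocation is not specified here.

```python
import os


def tmux_layout_commands(
    session: str, projects: dict[str, str], log_path: str
) -> list[list[str]]:
    """Return tmux commands for the overview window plus one window per project."""
    commands = [
        # Window "overview": left pane follows the experiment log.
        ["tmux", "new-session", "-d", "-s", session, "-n", "overview",
         f"tail -f {log_path}"],
        # Right pane takes 30% of the width; the metrics command is not defined yet.
        ["tmux", "split-window", "-h", "-l", "30%", "-t", f"{session}:overview"],
    ]
    for window_name, path in projects.items():
        commands.append([
            "tmux", "new-window", "-t", f"{session}:", "-n", window_name,
            "-c", os.path.expanduser(path), "claude",
        ])
    return commands
```

Targeting the overview window by name instead of by index avoids depending on the user's `base-index` setting.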
Token Awareness
Context token check:
A hook at session start estimates the remaining context from the current session length. A new session has its full context available.
API budget check:
The check calls the Anthropic Usage API (/v1/usage) with the configured API key. The response reports consumed credits, and the remaining budget is the configured limit minus that amount.
Automatic start:
When both conditions are met, the hook runs tmux new-session -d -s autoresearch and builds the windows. If an autoresearch session is already running, no second one is started.
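The start decision can be sketched as a small gate. The function names are hypothetical, and the two input values (remaining context percentage, remaining budget in USD) are assumed to come from the checks described above. The existing-session guard relies on `tmux has-session`, which exits with status 0 when the session exists.

```python
import subprocess


def should_start(
    context_remaining_pct: float,
    budget_remaining_usd: float,
    min_context_pct: float = 60.0,
    min_budget_usd: float = 5.0,
) -> bool:
    """Both thresholds must be strictly exceeded, matching the config comments."""
    return (
        context_remaining_pct > min_context_pct
        and budget_remaining_usd > min_budget_usd
    )


def session_exists(name: str = "autoresearch") -> bool:
    """True if tmux reports a session with this name; False if tmux is missing."""
    try:
        result = subprocess.run(
            ["tmux", "has-session", "-t", name], capture_output=True
        )
    except FileNotFoundError:
        return False
    return result.returncode == 0


def maybe_start(context_remaining_pct: float, budget_remaining_usd: float) -> str:
    """Start the detached session only when the thresholds pass and none is running."""
    if not should_start(context_remaining_pct, budget_remaining_usd):
        return "below-threshold"
    if session_exists():
        return "already-running"
    subprocess.run(["tmux", "new-session", "-d", "-s", "autoresearch"], check=True)
    return "started"
```

Checking the thresholds before touching tmux keeps the hook cheap in the common case where nothing should start.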
Stuck Detection
If the same set of files appears in the last 3 consecutive experiment logs, the loop pauses and sends a visible tmux message asking the user how to proceed:
- Increase time limit
- Narrow scope (focus on a subdirectory)
- Skip this project for now
- Continue anyway
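The stuck check can be sketched as a comparison of file sets across the most recent log entries. This assumes each log.jsonl entry carries a `files` list and that the entries passed in already belong to a single project. Both the field name and the helper names are hypothetical, since the log schema is not defined in this document.

```python
import json


def read_log(path: str) -> list[dict]:
    """Read an append-only JSONL log, skipping blank lines."""
    with open(path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


def is_stuck(entries: list[dict], window: int = 3) -> bool:
    """True if the last `window` entries all touched the same non-empty file set."""
    recent = entries[-window:]
    if len(recent) < window:
        return False  # not enough history to judge
    file_sets = [frozenset(entry.get("files", [])) for entry in recent]
    first = file_sets[0]
    return bool(first) and all(files == first for files in file_sets)
```

Comparing sets instead of lists makes the check independent of the order in which files were recorded.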
Rollout Strategy
- Start with ~/.claude optimization only (small scope, fast iterations)
- Verify stuck detection and token awareness work correctly
- Add first project (e.g. kundendoku)
- Expand to remaining projects
Non-Goals
- No distributed/multi-machine support
- No web UI (tmux is the interface)
- No automatic push to remote (merges stay local until manually reviewed)