
thomas.kopp 3dd72faed3 Add autoresearch design document

Autonomous research loop for Claude Code: iteratively improves
Claude config and project code via tmux + ralph-loop + git worktrees.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-04-04 12:06:16 +02:00

4.1 KiB


autoresearch for Claude Code — Design Document

Date: 2026-04-04
Status: Approved

Overview

An autonomous research loop that runs inside Claude Code via tmux. It iteratively improves Claude Code configuration (skills, prompts, CLAUDE.md files) and project code by running experiments in isolated git worktrees, evaluating them against benchmarks, and keeping only improvements.

Inspired by karpathy/autoresearch.

Components & Files

~/.claude/
├── autoresearch.yaml           # Main config (projects, thresholds)
├── autoresearch/
│   ├── log.jsonl               # Experiment log (append-only)
│   └── experiments/            # Per-experiment metadata (JSON)
└── skills/
    └── autoresearch/
        ├── skill.md            # /autoresearch skill (start/stop/status)
        └── loop.md             # Internal loop skill (ralph-loop payload)

~/work/<project>/.worktrees/
└── exp-NNNN-<description>/     # Isolated experiment worktree
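The worktree naming scheme can be sketched in Python. This is a minimal sketch, assuming that NNNN is a zero-padded counter continued from the highest existing experiment directory and that each worktree is created on a new branch with the same name. The helper names and the 40-character slug limit are hypothetical.

```python
import re
from pathlib import Path

# Matches directory names such as "exp-0007-tighten-skill-prompt".
EXP_DIR = re.compile(r"^exp-(\d{4})-[a-z0-9-]+$")


def slugify(description: str, max_len: int = 40) -> str:
    """Lower-case the description and collapse runs of non-alphanumerics into one dash."""
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    return slug[:max_len].rstrip("-") or "experiment"


def next_worktree_name(existing: list[str], description: str) -> str:
    """Return exp-NNNN-<description> with NNNN one above the highest existing number."""
    numbers = [int(m.group(1)) for name in existing if (m := EXP_DIR.match(name))]
    return f"exp-{max(numbers, default=0) + 1:04d}-{slugify(description)}"


def worktree_add_command(project: Path, name: str) -> list[str]:
    """Build the git command that creates <project>/.worktrees/<name> on a new branch <name>."""
    path = project / ".worktrees" / name
    return ["git", "-C", str(project), "worktree", "add", "-b", name, str(path)]
```

For example, with `exp-0001-foo` and `exp-0003-bar` already present, the description "Tighten skill prompt!" yields `exp-0004-tighten-skill-prompt`. Deleting a worktree afterwards maps onto `git worktree remove`.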

Claude Code hook in settings.json:
SessionStart → checks the token thresholds → starts tmux if the conditions are met.

Configuration (`~/.claude/autoresearch.yaml`)

projects:
  - path: ~/.claude
    benchmarks: []
    time_limit_minutes: 5
  - path: ~/work/kundendoku
    benchmarks: ["go test ./...", "go build ./..."]
    time_limit_minutes: 10
  - path: ~/work/tueit-x
    benchmarks: ["go test ./..."]
    time_limit_minutes: 10

token_threshold:
  context_remaining_pct: 60   # start when >60% context remaining
  api_budget_usd: 5.00        # start when >$5 API budget remaining
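A sketch of reading this file into typed objects, in Python. It assumes the YAML has already been loaded into plain dicts and lists (for example with PyYAML's `yaml.safe_load`), which keeps the sketch free of third-party imports. The class names and validation rules are hypothetical.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Project:
    path: Path
    benchmarks: list[str]      # shell commands; an empty list means no benchmark
    time_limit_minutes: int


@dataclass
class TokenThreshold:
    context_remaining_pct: float  # start only when more than this % of context is left
    api_budget_usd: float         # start only when more than this many USD remain


@dataclass
class Config:
    projects: list[Project]
    token_threshold: TokenThreshold


def parse_config(raw: dict) -> Config:
    """Convert the mapping loaded from autoresearch.yaml into typed, validated objects."""
    projects = [
        Project(
            path=Path(p["path"]).expanduser(),  # expands a leading "~" to the home directory
            benchmarks=[str(cmd) for cmd in p.get("benchmarks", [])],
            time_limit_minutes=int(p["time_limit_minutes"]),
        )
        for p in raw["projects"]
    ]
    for project in projects:
        if project.time_limit_minutes <= 0:
            raise ValueError(f"{project.path}: time_limit_minutes must be positive")
    tt = raw["token_threshold"]
    threshold = TokenThreshold(float(tt["context_remaining_pct"]), float(tt["api_budget_usd"]))
    if not 0 <= threshold.context_remaining_pct <= 100:
        raise ValueError("context_remaining_pct must be between 0 and 100")
    return Config(projects, threshold)
```

With PyYAML installed, usage would be `cfg = parse_config(yaml.safe_load(path.read_text()))`.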

Experiment Loop (per iteration)

Start
 │
 ├─ Token check (context% + API budget)
 │   └─ below threshold → sleep, next round
 │
 ├─ Stuck check (same files in last 3 iterations?)
 │   └─ yes → pause + tmux notification to user
 │
 ├─ Create worktree: exp-NNNN-<auto-description>
 │
 ├─ Claude analyzes + modifies (time_limit_minutes)
 │
 ├─ Run benchmarks (timeout = time_limit_minutes)
 │
 ├─ Compare metrics: before vs. after
 │   ├─ better → merge into main, delete worktree
 │   └─ worse/neutral → delete worktree
 │
 └─ Append log entry → log.jsonl
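The benchmark and comparison steps of the loop can be sketched in Python. The document does not define the metric, so this sketch assumes a hypothetical one (the number of benchmark commands that exit with status 0). It also treats `time_limit_minutes` as one deadline shared by all of a project's benchmarks. Only a strict improvement leads to a merge, which matches the worse/neutral branch above.

```python
import subprocess
import time
from dataclasses import dataclass


@dataclass
class BenchmarkResult:
    command: str
    passed: bool
    timed_out: bool


def run_benchmarks(commands: list[str], cwd: str, time_limit_minutes: float) -> list[BenchmarkResult]:
    """Run each benchmark command in the worktree under one shared deadline."""
    deadline = time.monotonic() + time_limit_minutes * 60
    results: list[BenchmarkResult] = []
    for command in commands:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            # Time budget used up: count the remaining commands as timed-out failures.
            results.append(BenchmarkResult(command, passed=False, timed_out=True))
            continue
        try:
            # Output is discarded; only the exit status feeds the metric.
            proc = subprocess.run(command, shell=True, cwd=cwd, timeout=remaining,
                                  stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
            results.append(BenchmarkResult(command, passed=proc.returncode == 0, timed_out=False))
        except subprocess.TimeoutExpired:
            results.append(BenchmarkResult(command, passed=False, timed_out=True))
    return results


def score(results: list[BenchmarkResult]) -> int:
    """Hypothetical metric: how many benchmark commands exited with status 0."""
    return sum(r.passed for r in results)


def decide(before: list[BenchmarkResult], after: list[BenchmarkResult]) -> str:
    """Merge only on a strict improvement; neutral or worse results are discarded."""
    return "merge" if score(after) > score(before) else "discard"
```

With an empty benchmark list, as configured for ~/.claude, this stand-in score is always 0. That project would therefore need a different evaluation before anything could be merged.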

tmux Layout

Session: autoresearch
│
├── Window 0: [overview]
│   ├── Pane left (70%):   tail -f ~/.claude/autoresearch/log.jsonl (formatted)
│   └── Pane right (30%):  metrics summary (experiments today, success rate, budget used)
│
├── Window 1: [claude-config]
│   └── Full: claude (ralph-loop) — optimizes ~/.claude/
│
├── Window 2: [kundendoku]
│   └── Full: claude (ralph-loop) — optimizes ~/work/kundendoku/
│
└── Window N: [<project>]
    └── Full: claude (ralph-loop)

The user attaches with tmux attach -t autoresearch and switches between windows with Ctrl+b <number>.
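A minimal Python sketch of building this layout, assuming tmux 3.1 or later, where `split-window -l` accepts a percentage. The function name and the mapping from window names to project paths are hypothetical. The commands that start the log view, the metrics summary, and `claude` with ralph-loop inside each pane are left out because the design does not specify them. The commands are returned as argument lists rather than executed, so they can be inspected first.

```python
def tmux_layout_commands(projects: dict[str, str], session: str = "autoresearch") -> list[list[str]]:
    """Build tmux argument lists for the overview window plus one window per project."""
    cmds = [
        # Detached session whose first window is named "overview".
        ["tmux", "new-session", "-d", "-s", session, "-n", "overview"],
        # Side-by-side split: the new right-hand pane takes 30%, the left keeps 70%.
        ["tmux", "split-window", "-h", "-l", "30%", "-t", f"{session}:overview"],
    ]
    for name, path in projects.items():
        # "<session>:" targets the next free window index; -c sets the working directory.
        cmds.append(["tmux", "new-window", "-t", f"{session}:", "-n", name, "-c", path])
    return cmds
```

Each command can then be run in order with `subprocess.run(cmd, check=True)`.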

Token Awareness

Context token check:
The session-start hook estimates the remaining context from the current session length. A new session has the full context available.
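The estimate reduces to a percentage check against `context_remaining_pct`. This is a minimal sketch, assuming the caller already has a token count for the current session and a 200,000-token context window. The window size is an assumption, and how the count is derived from session length is not specified here.

```python
def context_remaining_pct(used_tokens: int, context_window_tokens: int = 200_000) -> float:
    """Percentage of the context window still free; a new session (0 tokens) gives 100."""
    if context_window_tokens <= 0:
        raise ValueError("context window must be positive")
    used = min(max(used_tokens, 0), context_window_tokens)  # clamp to [0, window]
    return 100.0 * (context_window_tokens - used) / context_window_tokens


def context_ok(used_tokens: int, threshold_pct: float, context_window_tokens: int = 200_000) -> bool:
    """True when strictly more than threshold_pct of the context remains (config: >60%)."""
    return context_remaining_pct(used_tokens, context_window_tokens) > threshold_pct
```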

API budget check:
The hook calls the Anthropic Usage API (/v1/usage) with the configured API key. The response reports the credits consumed so far, and the remaining budget is the configured limit minus that amount.
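The budget side is the same kind of comparison. This is a sketch, assuming the consumed amount has already been read from the usage response (the HTTP call is not shown). It also treats the total spending limit as a separate, hypothetical `limit_usd` value, since the config shown only holds the `api_budget_usd` threshold. `Decimal` avoids binary floating-point rounding in the dollar arithmetic.

```python
from decimal import Decimal


def remaining_budget_usd(limit_usd: Decimal, consumed_usd: Decimal) -> Decimal:
    """Delta between the (hypothetical) spending limit and the consumed credits."""
    return limit_usd - consumed_usd


def budget_ok(limit_usd: Decimal, consumed_usd: Decimal, min_remaining_usd: Decimal) -> bool:
    """True when more than min_remaining_usd (config: api_budget_usd) is left."""
    return remaining_budget_usd(limit_usd, consumed_usd) > min_remaining_usd
```

For the configured threshold, `Decimal(str(cfg_value))` converts the YAML float without picking up binary rounding noise.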

Automatic start:
When both conditions are met, the hook runs tmux new-session -d -s autoresearch and builds the windows. If an autoresearch session is already running, a second one is never started.
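The start decision can be sketched in Python. `tmux has-session -t <name>` exits with status 0 only when the named session exists, which gives the "never started twice" guard. The function names are hypothetical, and treating a missing tmux binary as "no session" is a design choice of this sketch.

```python
import subprocess


def session_exists(name: str = "autoresearch") -> bool:
    """Ask tmux whether the named session exists (exit status 0 means yes)."""
    try:
        result = subprocess.run(["tmux", "has-session", "-t", name],
                                stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    except FileNotFoundError:
        return False  # tmux is not installed, so no session can exist
    return result.returncode == 0


def should_start(context_ready: bool, budget_ready: bool, already_running: bool) -> bool:
    """Start only when both token conditions hold and no session is running yet."""
    return context_ready and budget_ready and not already_running
```

A hook would call `should_start(ctx, budget, session_exists())` and build the windows only when it returns True.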

Stuck Detection

If the same set of files appears in the last 3 consecutive experiment log entries, the loop pauses and shows a visible tmux message asking the user how to proceed:

Increase time limit
Narrow scope (focus on a subdirectory)
Skip this project for now
Continue anyway
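This is a minimal sketch of the stuck check against `log.jsonl`. It assumes each log line is a JSON object with hypothetical `project` and `files` fields and that the 3-entry window is counted per project. File order is ignored because the check compares sets.

```python
import json
from pathlib import Path


def append_log(log_path: Path, entry: dict) -> None:
    """Append one experiment record as a single JSON line (the log is append-only)."""
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry, sort_keys=True) + "\n")


def is_stuck(log_path: Path, project: str, window: int = 3) -> bool:
    """True if the last `window` entries for `project` all touched the same set of files."""
    if not log_path.exists():
        return False
    lines = log_path.read_text(encoding="utf-8").splitlines()
    entries = [json.loads(line) for line in lines if line.strip()]
    recent = [e for e in entries if e.get("project") == project][-window:]
    if len(recent) < window:
        return False  # not enough history yet
    return len({frozenset(e.get("files", [])) for e in recent}) == 1
```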

Rollout Strategy

Start with ~/.claude optimization only (small scope, fast iterations)
Verify stuck detection and token awareness work correctly
Add first project (e.g. kundendoku)
Expand to remaining projects

Non-Goals

No distributed/multi-machine support
No web UI (tmux is the interface)
No automatic push to remote (merges stay local until manually reviewed)
