Add autoresearch design document

Autonomous research loop for Claude Code: iteratively improves Claude config and project code via tmux + ralph-loop + git worktrees.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 12:06:16 +02:00
commit 3dd72faed3
1 changed files with 142 additions and 0 deletions
@@ -0,0 +1,142 @@
+# autoresearch for Claude Code — Design Document
+
+**Date:** 2026-04-04  
+**Status:** Approved
+
+## Overview
+
+An autonomous research loop that runs inside Claude Code via tmux. It iteratively improves Claude Code configuration (skills, prompts, CLAUDE.md files) and project code by running experiments in isolated git worktrees, evaluating them against benchmarks, and keeping only improvements.
+
+Inspired by [karpathy/autoresearch](https://github.com/karpathy/autoresearch).
+
+---
+
+## Components & Files
+
+```
+~/.claude/
+├── autoresearch.yaml           # Main config (projects, thresholds)
+├── autoresearch/
+│   ├── log.jsonl               # Experiment log (append-only)
+│   └── experiments/            # Per-experiment metadata (JSON)
+└── skills/
+    └── autoresearch/
+        ├── skill.md            # /autoresearch skill (start/stop/status)
+        └── loop.md             # Internal loop skill (ralph-loop payload)
+
+~/work/<project>/.worktrees/
+└── exp-NNNN-<description>/     # Isolated experiment worktree
+```
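
The `exp-NNNN-<description>` naming can be sketched as below. This is a minimal illustration, not the skill's actual implementation: the four-digit zero padding is inferred from `NNNN`, while the slug rule, the helper names, and reusing the directory name as the branch name are all assumptions.

```python
import re
from pathlib import Path


def worktree_name(number: int, description: str) -> str:
    """Build an exp-NNNN-<description> directory name (slug rule is assumed)."""
    # Lowercase, then collapse every run of non-alphanumerics into one "-".
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    return f"exp-{number:04d}-{slug}"


def worktree_add_command(project: Path, number: int, description: str) -> list[str]:
    """Return the git command that creates the worktree on a new branch."""
    name = worktree_name(number, description)
    path = project / ".worktrees" / name
    # `git worktree add -b <branch> <path>` creates the branch and checks it
    # out in the new directory. Using `name` as the branch is an assumption.
    return ["git", "-C", str(project), "worktree", "add", "-b", name, str(path)]
```

The command is returned as an argument list rather than executed, so a caller can log it before passing it to `subprocess.run`.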
+
+**Claude Code Hook** in `settings.json`:  
+`PostSessionStart` → checks token thresholds → starts tmux if conditions are met.
+
+---
+
+## Configuration (`~/.claude/autoresearch.yaml`)
+
+```yaml
+projects:
+  - path: ~/.claude
+    benchmarks: []
+    time_limit_minutes: 5
+  - path: ~/work/kundendoku
+    benchmarks: ["go test ./...", "go build ./..."]
+    time_limit_minutes: 10
+  - path: ~/work/tueit-x
+    benchmarks: ["go test ./..."]
+    time_limit_minutes: 10
+
+token_threshold:
+  context_remaining_pct: 60   # start when >60% context remaining
+  api_budget_usd: 5.00        # start when >$5 API budget remaining
+```
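
One way to validate the `projects` list after the YAML has been parsed is sketched below. It assumes the file is already loaded into a plain dict (for example with PyYAML's `yaml.safe_load`). The `Project` dataclass, the `parse_projects` helper, and the rule that `time_limit_minutes` must be positive are hypothetical and do not come from the design.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Project:
    path: Path
    time_limit_minutes: int
    benchmarks: list[str] = field(default_factory=list)


def parse_projects(config: dict) -> list[Project]:
    """Validate the `projects` list of an already-parsed autoresearch.yaml."""
    projects = []
    for raw in config.get("projects", []):
        limit = int(raw["time_limit_minutes"])
        # Assumed rule: a zero or negative limit would make every run time out.
        if limit <= 0:
            raise ValueError(f"time_limit_minutes must be positive: {raw['path']}")
        projects.append(
            Project(
                path=Path(raw["path"]).expanduser(),  # resolve a leading "~"
                time_limit_minutes=limit,
                benchmarks=list(raw.get("benchmarks", [])),
            )
        )
    return projects
```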
+
+---
+
+## Experiment Loop (per iteration)
+
+```
+Start
+ │
+ ├─ Token check (context% + API budget)
+ │   └─ below threshold → sleep, next round
+ │
+ ├─ Stuck check (same files in last 3 iterations?)
+ │   └─ yes → pause + tmux notification to user
+ │
+ ├─ Create worktree: exp-NNNN-<auto-description>
+ │
+ ├─ Claude analyzes + modifies (time_limit_minutes)
+ │
+ ├─ Run benchmarks (timeout = time_limit_minutes)
+ │
+ ├─ Compare metrics: before vs. after
+ │   ├─ better → merge into main, delete worktree
+ │   └─ worse/neutral → delete worktree
+ │
+ └─ Append log entry → log.jsonl
+```
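
The last two steps of the loop (compare, then log) might look like the sketch below. The design does not define the metrics, so this assumes a dict of named numbers where higher is better. The rule "no metric regresses and at least one improves" and the log field names are likewise assumptions.

```python
import json
from pathlib import Path


def decide(before: dict[str, float], after: dict[str, float]) -> str:
    """Return "merge" for a strict improvement, otherwise "discard"."""
    shared = before.keys() & after.keys()
    no_regression = all(after[k] >= before[k] for k in shared)
    some_gain = any(after[k] > before[k] for k in shared)
    # Neutral results (nothing changed, or nothing comparable) are discarded.
    return "merge" if no_regression and some_gain else "discard"


def append_log_entry(log_path: Path, entry: dict) -> None:
    """Append one experiment record to log.jsonl as a single JSON line."""
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry, sort_keys=True) + "\n")
```

Opening the file in append mode keeps earlier entries untouched, which matches the append-only intent of `log.jsonl`.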
+
+---
+
+## tmux Layout
+
+```
+Session: autoresearch
+│
+├── Window 0: [overview]
+│   ├── Pane left (70%):   tail -f ~/.claude/autoresearch/log.jsonl (formatted)
+│   └── Pane right (30%):  metrics summary (experiments today, success rate, budget used)
+│
+├── Window 1: [claude-config]
+│   └── Full: claude (ralph-loop) — optimizes ~/.claude/
+│
+├── Window 2: [kundendoku]
+│   └── Full: claude (ralph-loop) — optimizes ~/work/kundendoku/
+│
+└── Window N: [<project>]
+    └── Full: claude (ralph-loop)
+```
+
+Attach with `tmux attach -t autoresearch` and switch windows with `Ctrl+b <number>`.
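
The layout above could be built from a handful of tmux commands, sketched here as argument lists. The function name and the `projects` mapping are hypothetical, and `-l 30%` assumes tmux 3.1 or newer. The commands that fill the panes and launch `claude` in each window are deliberately left out.

```python
def layout_commands(projects: dict[str, str], session: str = "autoresearch") -> list[list[str]]:
    """Return tmux commands that create the session and one window per project.

    `projects` maps a window name to its working directory.
    """
    commands = [
        # Detached session whose first window is the overview.
        ["tmux", "new-session", "-d", "-s", session, "-n", "overview"],
        # Split side by side; the new right-hand pane gets 30% of the width.
        ["tmux", "split-window", "-h", "-l", "30%", "-t", f"{session}:overview"],
    ]
    for name, path in projects.items():
        # The trailing ":" targets the session itself; -c sets the start directory.
        commands.append(["tmux", "new-window", "-t", f"{session}:", "-n", name, "-c", path])
    return commands
```

Targeting the overview window by name rather than by index avoids depending on the user's `base-index` setting.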
+
+---
+
+## Token Awareness
+
+**Context token check:**  
+The hook at session start estimates the remaining context from the current session length. A new session has its full context available.
+
+**API budget check:**  
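+Query Anthropic's Usage and Cost Admin API (for example `/v1/organizations/cost_report`, which requires an Admin API key rather than a standard API key). The reported spend is subtracted from the configured limit to get the remaining budget.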
+
+**Automatic start:**  
+Both conditions met → `tmux new-session -d -s autoresearch` + build windows. An already-running session is never started twice.
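
The start condition can be sketched as below, assuming the two measurements have already been obtained. The function names are hypothetical. The strict `>` comparison follows the comments in `autoresearch.yaml`, and `tmux has-session` is used for the never-start-twice check.

```python
import shutil
import subprocess


def thresholds_met(context_remaining_pct: float, budget_remaining_usd: float,
                   thresholds: dict) -> bool:
    """True when both values are strictly above the configured thresholds."""
    return (context_remaining_pct > thresholds["context_remaining_pct"]
            and budget_remaining_usd > thresholds["api_budget_usd"])


def session_exists(session: str = "autoresearch") -> bool:
    """True when a tmux session with this name is already running."""
    if shutil.which("tmux") is None:
        return False  # tmux is not installed, so no session can exist
    # `tmux has-session` exits with status 0 only if the session exists.
    result = subprocess.run(["tmux", "has-session", "-t", session],
                            capture_output=True)
    return result.returncode == 0


def should_start(context_remaining_pct: float, budget_remaining_usd: float,
                 thresholds: dict, already_running: bool) -> bool:
    """Start only when both thresholds are met and no session is running."""
    return (not already_running
            and thresholds_met(context_remaining_pct, budget_remaining_usd, thresholds))
```

A caller would pass `session_exists()` as `already_running`. Keeping it a parameter makes the decision itself testable without tmux.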
+
+---
+
+## Stuck Detection
+
+If the same set of files appears in the last 3 consecutive experiment log entries, the loop pauses and displays a tmux message asking the user how to proceed:
+- Increase time limit
+- Narrow scope (focus on a subdirectory)
+- Skip this project for now
+- Continue anyway
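
The check itself is small. The sketch below assumes each log entry carries a `files` list (a hypothetical field name) and that the caller passes only the entries for one project, oldest first.

```python
def is_stuck(entries: list[dict], window: int = 3) -> bool:
    """True when the last `window` entries all touched the same set of files."""
    if len(entries) < window:
        return False  # not enough history to judge
    recent = [frozenset(entry.get("files", [])) for entry in entries[-window:]]
    # An empty file set means nothing was modified, which is not "stuck".
    return bool(recent[0]) and all(files == recent[0] for files in recent)
```

Comparing sets rather than lists means the order in which files were recorded does not matter.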
+
+---
+
+## Rollout Strategy
+
+1. Start with `~/.claude` optimization only (small scope, fast iterations)
+2. Verify stuck detection and token awareness work correctly
+3. Add first project (e.g. `kundendoku`)
+4. Expand to remaining projects
+
+---
+
+## Non-Goals
+
+- No distributed/multi-machine support
+- No web UI (tmux is the interface)
+- No automatic push to remote (merges stay local until manually reviewed)