autoresearch Implementation Plan

For Claude: REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.

Goal: Autonomous research loop that iteratively improves Claude Code config and project code, running inside tmux with one claude session (ralph-loop) per project.

Architecture: A bash orchestrator (bin/start.sh) creates a tmux session with one window per project. Each window starts an interactive claude session and invokes /autoresearch-loop which uses ralph-loop to repeat experiments indefinitely. Each iteration: create worktree → make changes → run benchmarks → compare metric → merge or discard → log. A PostSessionStart hook auto-starts the tmux session when token/budget thresholds are met.
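
Schematically, one iteration reduces to the sketch below (illustrative stub names, not the real implementation; the authoritative steps live in the /autoresearch-loop skill, Task 7):

```python
# Shape of one experiment iteration (illustrative sketch; run_benchmarks is a stub)
import subprocess

def run_benchmarks(tree: str) -> float:
    """Stub metric; stands in for the project's real benchmark commands (lower is better)."""
    return 0.0

def iteration(repo: str, branch: str) -> bool:
    subprocess.run(['git', 'worktree', 'add', f'.worktrees/{branch}', '-b', branch],
                   cwd=repo, check=True)
    # ... make one focused, planned change inside .worktrees/<branch> ...
    before = run_benchmarks(repo)                          # baseline on main
    after = run_benchmarks(f'{repo}/.worktrees/{branch}')  # candidate tree
    kept = after < before
    if kept:
        subprocess.run(['git', 'merge', branch, '--no-ff', '-m', f'exp: {branch}'],
                       cwd=repo, check=True)
    subprocess.run(['git', 'worktree', 'remove', f'.worktrees/{branch}', '--force'],
                   cwd=repo, check=True)
    return kept  # the real loop also appends an entry to log.jsonl
```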

Tech Stack: Bash (orchestration, worktree management), Python 3 + rich (log viewer, metrics display), ralph-loop plugin (iteration driver), claude CLI (interactive sessions), tmux 3.x, jq (JSON processing)


Task 1: Project scaffolding

Files:

  • Create: ~/work/autoresearch/bin/ (directory)
  • Create: ~/work/autoresearch/tests/ (directory)
  • Create: ~/work/autoresearch/CLAUDE.md
  • Create: ~/work/autoresearch/.gitignore

Step 1: Create directories

mkdir -p ~/work/autoresearch/bin
mkdir -p ~/work/autoresearch/tests

Step 2: Write CLAUDE.md

# autoresearch

Autonomous research loop for Claude Code. Runs inside tmux, one window per project.

## Key files
- `bin/start.sh` — tmux session orchestrator (start/stop/status)
- `bin/check_budget.py` — Anthropic API budget checker
- `bin/log_view.py` — formatted log viewer (overview pane, left)
- `bin/metrics.py` — metrics summary (overview pane, right)
- `~/.claude/autoresearch.yaml` — main config
- `~/.claude/skills/autoresearch/skill.md` — /autoresearch skill
- `~/.claude/skills/autoresearch-loop/skill.md` — /autoresearch-loop (ralph-loop payload)

## Run
```bash
bin/start.sh          # start tmux session
bin/start.sh stop     # stop all windows
bin/start.sh status   # show running experiments
```

## Logs

  • ~/.claude/autoresearch/log.jsonl — append-only experiment log
  • Each line: { "ts": "ISO8601", "project": "...", "exp": "NNNN", "metric_before": 0.0, "metric_after": 0.0, "kept": true, "files_changed": ["..."] }

Step 3: Write .gitignore

__pycache__/
*.pyc
.worktrees/


Step 4: Commit

cd ~/work/autoresearch
git add CLAUDE.md .gitignore bin/ tests/
git commit -m "feat: project scaffolding"

Task 2: Configuration file + schema

Files:

  • Create: ~/.claude/autoresearch.yaml
  • Create: ~/work/autoresearch/bin/config.py
  • Create: ~/work/autoresearch/tests/test_config.py

Step 1: Write failing test

# tests/test_config.py
import sys
sys.path.insert(0, 'bin')
from config import load_config, ConfigError
import pytest, tempfile, os, yaml

def test_load_valid_config():
    data = {
        'projects': [{'path': '/tmp', 'benchmarks': ['echo ok'], 'time_limit_minutes': 5}],
        'token_threshold': {'context_remaining_pct': 60, 'api_budget_usd': 5.0}
    }
    with tempfile.NamedTemporaryFile('w', suffix='.yaml', delete=False) as f:
        yaml.dump(data, f)
        name = f.name
    cfg = load_config(name)
    assert cfg['projects'][0]['time_limit_minutes'] == 5
    os.unlink(name)

def test_missing_projects_raises():
    with tempfile.NamedTemporaryFile('w', suffix='.yaml', delete=False) as f:
        yaml.dump({}, f)
        name = f.name
    with pytest.raises(ConfigError):
        load_config(name)
    os.unlink(name)

def test_expands_tilde_in_path():
    data = {
        'projects': [{'path': '~/.claude', 'benchmarks': [], 'time_limit_minutes': 5}],
        'token_threshold': {'context_remaining_pct': 60, 'api_budget_usd': 5.0}
    }
    with tempfile.NamedTemporaryFile('w', suffix='.yaml', delete=False) as f:
        yaml.dump(data, f)
        name = f.name
    cfg = load_config(name)
    assert '~' not in cfg['projects'][0]['path']
    os.unlink(name)

Step 2: Run test to verify it fails

cd ~/work/autoresearch && python3 -m pytest tests/test_config.py -v

Expected: FAIL — ModuleNotFoundError: No module named 'config'

Step 3: Write implementation

# bin/config.py
import os
import yaml

CONFIG_PATH = os.path.expanduser('~/.claude/autoresearch.yaml')
DEFAULTS = {'time_limit_minutes': 10, 'benchmarks': []}

class ConfigError(Exception):
    pass

def load_config(path=CONFIG_PATH):
    with open(path) as f:
        cfg = yaml.safe_load(f)
    if not cfg or 'projects' not in cfg or not cfg['projects']:
        raise ConfigError(f"'projects' list is required in {path}")
    for p in cfg['projects']:
        p.setdefault('benchmarks', DEFAULTS['benchmarks'])
        p.setdefault('time_limit_minutes', DEFAULTS['time_limit_minutes'])
        p['path'] = os.path.expanduser(p['path'])
    cfg.setdefault('token_threshold', {'context_remaining_pct': 60, 'api_budget_usd': 5.0})
    return cfg

Step 4: Write default config

# ~/.claude/autoresearch.yaml
projects:
  - path: ~/.claude
    benchmarks: []
    time_limit_minutes: 5

  - path: ~/work/kundendoku
    benchmarks:
      - "go test ./..."
      - "go build ./..."
    time_limit_minutes: 10

token_threshold:
  context_remaining_pct: 60
  api_budget_usd: 5.00

Step 5: Run tests to verify they pass

python3 -m pytest tests/test_config.py -v

Expected: 3 PASSED

Step 6: Commit

git add bin/config.py tests/test_config.py
git commit -m "feat: config loader with validation"

Task 3: API budget checker

Files:

  • Create: ~/work/autoresearch/bin/check_budget.py
  • Create: ~/work/autoresearch/tests/test_check_budget.py

Step 1: Write failing tests

# tests/test_check_budget.py
import sys
sys.path.insert(0, 'bin')
from unittest.mock import patch, MagicMock
import pytest

# Import after path setup
import importlib
check_budget = importlib.import_module('check-budget')

def test_budget_ok_when_used_less_than_limit():
    with patch('check-budget.get_used_usd', return_value=2.0):
        assert check_budget.is_budget_ok(limit_usd=5.0) is True

def test_budget_not_ok_when_used_exceeds_limit():
    with patch('check-budget.get_used_usd', return_value=5.5):
        assert check_budget.is_budget_ok(limit_usd=5.0) is False

def test_budget_ok_on_api_error():
    # Fail open: if we can't check, assume OK
    with patch('check-budget.get_used_usd', side_effect=Exception("network error")):
        assert check_budget.is_budget_ok(limit_usd=5.0) is True

Note: a file named check-budget.py cannot be imported with a plain import statement (hyphens are not valid in Python identifiers), hence the importlib workaround above. Name the module check_budget.py and simplify the test imports (Step 4).

Step 2: Run tests to verify they fail

python3 -m pytest tests/test_check_budget.py -v

Expected: FAIL — import error

Step 3: Write implementation

# bin/check_budget.py
"""Check Anthropic API budget usage. Fails open (returns True) on errors."""
import os
import sys
import urllib.request
import json

USAGE_API = "https://api.anthropic.com/v1/usage"
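# NOTE: this endpoint and the response shape below are assumptions; verify them
# against the actual Anthropic usage/billing API. is_budget_ok fails open if they differ.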

def get_used_usd() -> float:
    api_key = os.environ.get('ANTHROPIC_API_KEY', '')
    if not api_key:
        raise EnvironmentError("ANTHROPIC_API_KEY not set")
    req = urllib.request.Request(
        USAGE_API,
        headers={"x-api-key": api_key, "anthropic-version": "2023-06-01"}
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        data = json.loads(resp.read())
    # Sum all usage costs
    return sum(item.get('cost_usd', 0.0) for item in data.get('data', []))

def is_budget_ok(limit_usd: float) -> bool:
    try:
        used = get_used_usd()
        return used < limit_usd
    except Exception:
        return True  # fail open

if __name__ == '__main__':
    limit = float(sys.argv[1]) if len(sys.argv) > 1 else 5.0
    ok = is_budget_ok(limit)
    print("ok" if ok else "low")
    sys.exit(0 if ok else 1)

Step 4: Fix test imports (use check_budget not check-budget)

Update tests/test_check_budget.py:

import sys
sys.path.insert(0, 'bin')
from unittest.mock import patch
from check_budget import is_budget_ok, get_used_usd

def test_budget_ok_when_used_less_than_limit():
    with patch('check_budget.get_used_usd', return_value=2.0):
        assert is_budget_ok(limit_usd=5.0) is True

def test_budget_not_ok_when_used_exceeds_limit():
    with patch('check_budget.get_used_usd', return_value=5.5):
        assert is_budget_ok(limit_usd=5.0) is False

def test_budget_ok_on_api_error():
    with patch('check_budget.get_used_usd', side_effect=Exception("network")):
        assert is_budget_ok(limit_usd=5.0) is True

Step 5: Run tests

python3 -m pytest tests/test_check_budget.py -v

Expected: 3 PASSED

Step 6: Commit

git add bin/check_budget.py tests/test_check_budget.py
git commit -m "feat: Anthropic API budget checker, fails open on error"

Task 4: Experiment log utilities

Files:

  • Create: ~/work/autoresearch/bin/log_append.py
  • Create: ~/work/autoresearch/bin/log_view.py
  • Create: ~/work/autoresearch/tests/test_log.py
  • Create dir: ~/.claude/autoresearch/

Log entry schema (one JSON per line in ~/.claude/autoresearch/log.jsonl):

{
  "ts": "2026-04-04T22:00:00Z",
  "project": "kundendoku",
  "exp": "0001",
  "description": "Optimize SQL query in handler.go",
  "metric_before": 1.234,
  "metric_after": 1.187,
  "kept": true,
  "files_changed": ["internal/handler.go"],
  "stuck": false,
  "error": null
}

Step 1: Write failing tests

# tests/test_log.py
import sys, os, json
sys.path.insert(0, 'bin')
from log_append import append_entry

def test_append_creates_file(tmp_path):
    log = str(tmp_path / "log.jsonl")
    append_entry({"ts": "2026-01-01T00:00:00Z", "project": "test"}, path=log)
    assert os.path.exists(log)

def test_append_writes_valid_json(tmp_path):
    log = str(tmp_path / "log.jsonl")
    append_entry({"project": "x", "kept": True}, path=log)
    with open(log) as f:
        data = json.loads(f.read().strip())
    assert data["project"] == "x"

def test_append_multiple_entries(tmp_path):
    log = str(tmp_path / "log.jsonl")
    for i in range(3):
        append_entry({"n": i}, path=log)
    with open(log) as f:
        lines = [l for l in f if l.strip()]
    assert len(lines) == 3

Step 2: Run tests to verify they fail

python3 -m pytest tests/test_log.py -v

Step 3: Write log_append.py

# bin/log_append.py
import json, os
from datetime import datetime, timezone

LOG_PATH = os.path.expanduser('~/.claude/autoresearch/log.jsonl')

def append_entry(entry: dict, path: str = LOG_PATH) -> None:
    entry.setdefault('ts', datetime.now(timezone.utc).isoformat())
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, 'a') as f:
        f.write(json.dumps(entry) + '\n')

Step 4: Write log_view.py

# bin/log_view.py
"""Live log viewer for overview pane. Run: python3 bin/log_view.py --follow"""
import json, sys, time, os
from rich.console import Console
from rich.table import Table
from rich import box

LOG_PATH = os.path.expanduser('~/.claude/autoresearch/log.jsonl')
console = Console()

def read_entries(path=LOG_PATH):
    if not os.path.exists(path):
        return []
    with open(path) as f:
        entries = []
        for line in f:
            line = line.strip()
            if line:
                try:
                    entries.append(json.loads(line))
                except json.JSONDecodeError:
                    pass
    return entries

def render_table(entries):
    table = Table(box=box.SIMPLE, show_header=True, header_style="bold #DA251C")
    table.add_column("Time", style="dim", width=8)
    table.add_column("Project", width=14)
    table.add_column("Exp", width=6)
    table.add_column("Description", width=40)
    table.add_column("Δ Metric", width=10, justify="right")
    table.add_column("", width=3)
    for e in entries[-30:]:
        ts = e.get('ts', '')[11:16] or '?'  # HH:MM slice of the ISO timestamp
        before = e.get('metric_before')
        after = e.get('metric_after')
        if before is not None and after is not None:
            delta = f"{after - before:+.3f}"
            delta_style = "green" if after < before else "red"
        else:
            delta = "n/a"
            delta_style = "dim"
        kept = "✓" if e.get('kept') else "✗"
        kept_style = "green" if e.get('kept') else "red"
        table.add_row(
            ts,
            e.get('project', '?')[:14],
            str(e.get('exp', '?')),
            (e.get('description') or '')[:40],
            f"[{delta_style}]{delta}[/{delta_style}]",
            f"[{kept_style}]{kept}[/{kept_style}]",
        )
    return table

if __name__ == '__main__':
    follow = '--follow' in sys.argv
    while True:
        entries = read_entries()
        console.clear()
        console.print(f"[bold #DA251C]autoresearch[/bold #DA251C] — {len(entries)} experiments", justify="center")
        console.print(render_table(entries))
        if not follow:
            break
        time.sleep(5)

Step 5: Run tests

python3 -m pytest tests/test_log.py -v

Expected: 3 PASSED

Step 6: Commit

git add bin/log_append.py bin/log_view.py tests/test_log.py
git commit -m "feat: experiment log append and rich viewer"

Task 5: Stuck detection

Files:

  • Create: ~/work/autoresearch/bin/stuck_check.py
  • Create: ~/work/autoresearch/tests/test_stuck.py

Step 1: Write failing tests

# tests/test_stuck.py
import sys
sys.path.insert(0, 'bin')
from stuck_check import is_stuck

def test_not_stuck_with_different_files():
    entries = [
        {'project': 'x', 'files_changed': ['a.go']},
        {'project': 'x', 'files_changed': ['b.go']},
        {'project': 'x', 'files_changed': ['c.go']},
    ]
    assert is_stuck('x', entries) is False

def test_stuck_when_same_files_3_times():
    entries = [
        {'project': 'x', 'files_changed': ['a.go', 'b.go']},
        {'project': 'x', 'files_changed': ['a.go', 'b.go']},
        {'project': 'x', 'files_changed': ['a.go', 'b.go']},
    ]
    assert is_stuck('x', entries) is True

def test_not_stuck_with_fewer_than_3_entries():
    entries = [
        {'project': 'x', 'files_changed': ['a.go']},
        {'project': 'x', 'files_changed': ['a.go']},
    ]
    assert is_stuck('x', entries) is False

def test_only_checks_matching_project():
    entries = [
        {'project': 'y', 'files_changed': ['a.go']},
        {'project': 'y', 'files_changed': ['a.go']},
        {'project': 'y', 'files_changed': ['a.go']},
        {'project': 'x', 'files_changed': ['a.go']},
        {'project': 'x', 'files_changed': ['a.go']},
    ]
    assert is_stuck('x', entries) is False  # only 2 entries for x

Step 2: Run tests to verify they fail

python3 -m pytest tests/test_stuck.py -v

Step 3: Write implementation

# bin/stuck_check.py
"""Detect if experiments are cycling over the same files."""
import sys
from log_append import read_entries  # reuse the log reader (added to log_append in Step 4)

def is_stuck(project: str, entries: list, window: int = 3) -> bool:
    project_entries = [e for e in entries if e.get('project') == project]
    if len(project_entries) < window:
        return False
    recent = project_entries[-window:]
    file_sets = [frozenset(e.get('files_changed', [])) for e in recent]
    return len(set(file_sets)) == 1 and file_sets[0] != frozenset()

if __name__ == '__main__':
    project = sys.argv[1] if len(sys.argv) > 1 else ''
    entries = read_entries()
    if is_stuck(project, entries):
        print("stuck")
        sys.exit(1)
    print("ok")
    sys.exit(0)

Note: stuck_check.py imports read_entries from log_append. Update log_append.py to export read_entries (move the function there from log_view.py).

Step 4: Move read_entries to log_append.py

Add to bin/log_append.py:

def read_entries(path: str = LOG_PATH) -> list:
    if not os.path.exists(path):
        return []
    entries = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:
                try:
                    entries.append(json.loads(line))
                except json.JSONDecodeError:
                    pass
    return entries

Update bin/log_view.py to import read_entries from log_append instead of defining it locally.

Step 5: Run all tests

python3 -m pytest tests/ -v

Expected: all PASSED

Step 6: Commit

git add bin/stuck_check.py bin/log_append.py bin/log_view.py tests/test_stuck.py
git commit -m "feat: stuck detection — alerts when same files repeat 3 times"

Task 6: Metrics summary script (overview pane, right)

Files:

  • Create: ~/work/autoresearch/bin/metrics.py

No tests required here: this is mostly display logic. (An optional smoke test for the pure summarize() helper appears after Step 2.)

Step 1: Write metrics.py

# bin/metrics.py
"""Compact metrics panel for overview pane right side. Run: watch -n5 python3 bin/metrics.py"""
import os, sys
from datetime import datetime, timezone
from rich.console import Console
from rich.panel import Panel
from rich.text import Text
sys.path.insert(0, os.path.dirname(__file__))
from log_append import read_entries

console = Console(width=40)

def summarize(entries):
    today = datetime.now(timezone.utc).date().isoformat()
    today_entries = [e for e in entries if e.get('ts', '').startswith(today)]
    total = len(today_entries)
    kept = sum(1 for e in today_entries if e.get('kept'))
    projects = {}
    for e in today_entries:
        p = e.get('project', 'unknown')
        projects.setdefault(p, {'total': 0, 'kept': 0})
        projects[p]['total'] += 1
        if e.get('kept'):
            projects[p]['kept'] += 1
    return total, kept, projects

def render():
    entries = read_entries()
    total, kept, projects = summarize(entries)
    rate = f"{kept/total*100:.0f}%" if total else "—"
    text = Text()
    text.append(f"Today: {total} experiments\n", style="bold")
    text.append(f"Kept:  {kept} ({rate})\n\n", style="green")
    for p, s in projects.items():
        r = f"{s['kept']}/{s['total']}"
        text.append(f"  {p[:16]:<16} {r}\n")
    console.print(Panel(text, title="[bold #DA251C]Metrics[/bold #DA251C]", border_style="#FFD802"))

if __name__ == '__main__':
    render()

Step 2: Test manually

python3 bin/metrics.py

Expected: renders a panel (empty if no log yet)
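
Optional: summarize() is pure computation, so if you ever want a cheap regression guard, a minimal smoke test (hypothetical tests/test_metrics.py, not part of this task's file list) could pin down its math:

```python
# tests/test_metrics.py — optional smoke test for the pure summarize() logic
import sys
from datetime import datetime, timezone
sys.path.insert(0, 'bin')
from metrics import summarize

def test_summarize_counts_today_only():
    today = datetime.now(timezone.utc).date().isoformat()
    entries = [
        {'ts': f'{today}T01:00:00Z', 'project': 'x', 'kept': True},
        {'ts': f'{today}T02:00:00Z', 'project': 'x', 'kept': False},
        {'ts': '2000-01-01T00:00:00Z', 'project': 'x', 'kept': True},  # not today: ignored
    ]
    total, kept, projects = summarize(entries)
    assert (total, kept) == (2, 1)
    assert projects == {'x': {'total': 2, 'kept': 1}}
```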

Step 3: Commit

git add bin/metrics.py
git commit -m "feat: metrics summary panel for overview pane"

Task 7: /autoresearch-loop skill (ralph-loop payload)

Files:

  • Create: ~/.claude/skills/autoresearch-loop/skill.md

This is the prompt that ralph-loop repeats on every iteration. It runs in the project directory.

Step 1: Create skill

mkdir -p ~/.claude/skills/autoresearch-loop

Write ~/.claude/skills/autoresearch-loop/skill.md:

---
description: "One autoresearch experiment iteration (called by ralph-loop)"
hide-from-slash-command-tool: "true"
---

# autoresearch — one experiment iteration

You are the autoresearch agent for this project. Execute exactly one experiment:

## 1. Pre-flight checks

Run these checks first. If either fails, output `<promise>STOP</promise>` immediately.

```bash
# Budget check
python3 ~/work/autoresearch/bin/check_budget.py $(python3 -c "
import os, yaml; c=yaml.safe_load(open(os.path.expanduser('~/.claude/autoresearch.yaml'))); print(c['token_threshold']['api_budget_usd'])
" 2>/dev/null || echo 5.0)
```

```bash
# Stuck check
python3 ~/work/autoresearch/bin/stuck_check.py "$(basename $PWD)"
```

If stuck: print a clear message explaining which files keep appearing, then output `<promise>STOP</promise>`.
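
A minimal sketch for that message (assumes the Task 4 log schema; `read_entries` comes from log_append):

```python
# Sketch: name the files that keep repeating in recent experiments
import os, sys
sys.path.insert(0, os.path.expanduser('~/work/autoresearch/bin'))
from log_append import read_entries

project = os.path.basename(os.getcwd())
recent = [e for e in read_entries() if e.get('project') == project][-3:]
if recent:
    files = sorted(set(recent[-1].get('files_changed') or []))
    print(f"stuck: the last {len(recent)} experiments all touched {files}")
```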

## 2. Plan your experiment

- Read recent log entries for this project to understand what has been tried
- Identify ONE concrete improvement to make (code quality, config, performance, prompt clarity)
- Be specific: name the file and the change

## 3. Create worktree

```bash
# Number experiments by log length; counting .worktrees/ would reset after cleanup
EXP=$(printf "%04d" "$(wc -l < ~/.claude/autoresearch/log.jsonl 2>/dev/null || echo 0)")
BRANCH="exp-${EXP}-$(echo '<short-description>' | tr ' ' '-' | tr '[:upper:]' '[:lower:]')"
git worktree add .worktrees/${BRANCH} -b ${BRANCH}
cd .worktrees/${BRANCH}
```

## 4. Make changes

Work in the worktree. Make exactly the planned change. Keep it focused.

## 5. Run benchmarks (with timeout)

```bash
# Read benchmarks from config
python3 -c "
import yaml, os
cfg = yaml.safe_load(open(os.path.expanduser('~/.claude/autoresearch.yaml')))
project_path = os.getcwd()
for p in cfg['projects']:
    if os.path.expanduser(p['path']) in project_path:
        for b in p['benchmarks']:
            print(b)
        break
"
```

Run each benchmark with `timeout <time_limit_seconds>`. Capture output and exit codes.
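
One way to do that, sketched in Python (assumes the YAML config above):

```python
# Sketch: run each configured benchmark under the project's time limit
import os, subprocess, yaml

cfg = yaml.safe_load(open(os.path.expanduser('~/.claude/autoresearch.yaml')))
cwd = os.getcwd()
proj = next((p for p in cfg['projects']
             if os.path.expanduser(p['path']) in cwd), None)
for bench in (proj or {}).get('benchmarks', []):
    try:
        r = subprocess.run(bench, shell=True, capture_output=True, text=True,
                           timeout=proj.get('time_limit_minutes', 10) * 60)
        print(f"{bench!r} -> exit {r.returncode}")
    except subprocess.TimeoutExpired:
        print(f"{bench!r} -> TIMEOUT")
```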

## 6. Evaluate

- If benchmarks fail: discard (worktree is broken)
- If benchmarks pass: compare to baseline on main branch
- For projects with no benchmarks: use your judgment (is this objectively better?)

## 7. Merge or discard

```bash
# If keeping:
cd ~/work/<project>
git merge "${BRANCH}" --no-ff -m "exp: ${BRANCH}"   # merge the branch itself, not the worktree path
git worktree remove .worktrees/${BRANCH}
git branch -d "${BRANCH}"

# If discarding:
git worktree remove .worktrees/${BRANCH} --force
git branch -D ${BRANCH}
```

## 8. Log the result

Append the entry with a short inline Python script (log_append.py has no CLI entrypoint):

```python
import os, sys; sys.path.insert(0, os.path.expanduser('~/work/autoresearch/bin'))
from log_append import append_entry
append_entry({
    "project": "<project-name>",
    "exp": "<NNNN>",
    "description": "<what was changed>",
    "metric_before": <float or null>,
    "metric_after": <float or null>,
    "kept": <true/false>,
    "files_changed": ["<file1>", "<file2>"],
    "stuck": False,
    "error": None
})
```

## 9. Continue

Do NOT output `<promise>STOP</promise>` unless a pre-flight check failed.
The loop will automatically feed this prompt back for the next iteration.

Step 2: Verify skill file exists

ls ~/.claude/skills/autoresearch-loop/skill.md

Step 3: Commit

cd ~/.claude
git add skills/autoresearch-loop/skill.md
git commit -m "feat: autoresearch-loop skill (ralph-loop payload)"

(Note: ~/.claude skills live in the claude-config repo, so this commit happens there, not in autoresearch.)


Task 8: /autoresearch skill (start/stop/status interface)

Files:

  • Create: ~/.claude/skills/autoresearch/skill.md

Step 1: Write skill

mkdir -p ~/.claude/skills/autoresearch

Write ~/.claude/skills/autoresearch/skill.md:

---
description: "Start, stop, or show status of the autoresearch tmux session"
argument-hint: "[start|stop|status]"
---

# autoresearch

Manage the autoresearch tmux session.

## start (default)

Run:
```bash
~/work/autoresearch/bin/start.sh start
```

Report: which tmux windows were created, which projects are running.

## stop

Run:
```bash
~/work/autoresearch/bin/start.sh stop
```

## status

Run:
```bash
~/work/autoresearch/bin/start.sh status
```

Show: which windows are running, last log entry per project.

Step 2: Verify

ls ~/.claude/skills/autoresearch/skill.md

Task 9: tmux orchestrator (bin/start.sh)

Files:

  • Create: ~/work/autoresearch/bin/start.sh

Step 1: Write start.sh

#!/usr/bin/env bash
# autoresearch tmux orchestrator
# Usage: start.sh [start|stop|status]
set -euo pipefail

SESSION="autoresearch"
CONFIG="${HOME}/.claude/autoresearch.yaml"
LOGDIR="${HOME}/.claude/autoresearch"
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"

# Read projects from config
get_projects() {
  python3 -c "
import yaml, os, sys
cfg = yaml.safe_load(open('${CONFIG}'))
for p in cfg['projects']:
    print(os.path.expanduser(p['path']))
"
}

cmd_start() {
  if tmux has-session -t "$SESSION" 2>/dev/null; then
    echo "autoresearch: session '$SESSION' already running"
    echo "  attach: tmux attach -t $SESSION"
    exit 0
  fi

  mkdir -p "$LOGDIR"

  # Window 0: overview (log viewer left + metrics right)
  tmux new-session -d -s "$SESSION" -n "overview" -x 220 -y 50
  tmux split-window -h -t "${SESSION}:overview" -p 30
  tmux send-keys -t "${SESSION}:overview.0" \
    "watch -n5 python3 ${SCRIPT_DIR}/log_view.py" Enter
  tmux send-keys -t "${SESSION}:overview.1" \
    "watch -n10 python3 ${SCRIPT_DIR}/metrics.py" Enter

  # One window per project
  WIN=1
  while IFS= read -r project_path; do
    project_name=$(basename "$project_path")
    tmux new-window -t "${SESSION}:${WIN}" -n "$project_name"
    tmux send-keys -t "${SESSION}:${WIN}" \
      "cd '${project_path}' && claude --name 'autoresearch-${project_name}'" Enter
    # Wait briefly then send the ralph-loop invocation
    sleep 2
    tmux send-keys -t "${SESSION}:${WIN}" \
      "/ralph-loop Run one autoresearch experiment iteration. --completion-promise STOP" Enter
    WIN=$((WIN + 1))
  done < <(get_projects)

  tmux select-window -t "${SESSION}:0"
  echo "autoresearch: started session '$SESSION' with $((WIN-1)) project windows"
  echo "  attach: tmux attach -t $SESSION"
}

cmd_stop() {
  if ! tmux has-session -t "$SESSION" 2>/dev/null; then
    echo "autoresearch: session '$SESSION' not running"
    exit 0
  fi
  tmux kill-session -t "$SESSION"
  echo "autoresearch: session stopped"
}

cmd_status() {
  if ! tmux has-session -t "$SESSION" 2>/dev/null; then
    echo "autoresearch: not running"
    exit 1
  fi
  echo "autoresearch: running"
  tmux list-windows -t "$SESSION"
  echo ""
  echo "Last entries per project:"
  python3 -c "
import json, os
from collections import defaultdict
log = os.path.expanduser('~/.claude/autoresearch/log.jsonl')
if not os.path.exists(log):
    print('  (no experiments yet)')
    exit()
latest = {}
with open(log) as f:
    for line in f:
        try:
            e = json.loads(line)
            latest[e.get('project')] = e
        except json.JSONDecodeError: pass
for proj, e in latest.items():
    kept = '✓' if e.get('kept') else '✗'
    print(f'  {proj}: exp {e.get(\"exp\",\"?\")} {kept} — {e.get(\"description\",\"\")[:50]}')
"
}

case "${1:-start}" in
  start)  cmd_start ;;
  stop)   cmd_stop ;;
  status) cmd_status ;;
  *)      echo "Usage: $0 [start|stop|status]"; exit 1 ;;
esac

Step 2: Make executable

chmod +x ~/work/autoresearch/bin/start.sh

Step 3: Dry-run test (no actual tmux)

# Verify syntax
bash -n ~/work/autoresearch/bin/start.sh && echo "syntax OK"
# Verify status when not running
~/work/autoresearch/bin/start.sh status || echo "expected: not running"

Step 4: Commit

cd ~/work/autoresearch
git add bin/start.sh
git commit -m "feat: tmux orchestrator — start/stop/status autoresearch session"

Task 10: PostSessionStart hook

Files:

  • Modify: ~/.claude/settings.json
  • Create: ~/work/autoresearch/bin/session-start-hook.sh

Step 1: Write hook script

#!/usr/bin/env bash
# PostSessionStart hook: auto-start autoresearch when tokens + budget available
set -euo pipefail

CONFIG="${HOME}/.claude/autoresearch.yaml"
SESSION="autoresearch"

# Only run if config exists
[[ -f "$CONFIG" ]] || exit 0

# Skip if autoresearch session already running
tmux has-session -t "$SESSION" 2>/dev/null && exit 0

# Check API budget
BUDGET_LIMIT=$(python3 -c "
import yaml; c=yaml.safe_load(open('${CONFIG}'))
print(c.get('token_threshold', {}).get('api_budget_usd', 5.0))
" 2>/dev/null || echo "5.0")

python3 "${HOME}/work/autoresearch/bin/check_budget.py" "$BUDGET_LIMIT" || exit 0

# Budget OK → start autoresearch in background
mkdir -p "${HOME}/.claude/autoresearch"   # hook.log's directory may not exist yet
nohup "${HOME}/work/autoresearch/bin/start.sh" start \
  >> "${HOME}/.claude/autoresearch/hook.log" 2>&1 &

exit 0

Make the hook executable:

chmod +x ~/work/autoresearch/bin/session-start-hook.sh

Step 2: Add hook to settings.json

Read ~/.claude/settings.json first, then add the PostSessionStart hook.

New hooks section (merge with existing; a merge sketch follows the JSON):

"PostSessionStart": [
  {
    "hooks": [
      {
        "type": "command",
        "command": "~/work/autoresearch/bin/session-start-hook.sh",
        "async": true
      }
    ]
  }
]
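
A one-off merge sketch (assumes ~/.claude/settings.json exists and is valid JSON; back it up first):

```python
# Splice the PostSessionStart hook (shape shown above) into settings.json
import json, os

path = os.path.expanduser('~/.claude/settings.json')
with open(path) as f:
    settings = json.load(f)

entry = {"hooks": [{"type": "command",
                    "command": "~/work/autoresearch/bin/session-start-hook.sh",
                    "async": True}]}
settings.setdefault("hooks", {}).setdefault("PostSessionStart", []).append(entry)

with open(path, 'w') as f:
    json.dump(settings, f, indent=2)
```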

Step 3: Verify settings.json is valid JSON

python3 -m json.tool ~/.claude/settings.json > /dev/null && echo "valid JSON"

Step 4: Test hook manually

~/work/autoresearch/bin/session-start-hook.sh && echo "hook OK"

Step 5: Commit

cd ~/work/autoresearch
git add bin/session-start-hook.sh
git commit -m "feat: PostSessionStart hook — auto-start when budget available"

Task 11: Integration dry-run

Goal: Start the tmux session, verify all windows open correctly, verify log viewer and metrics panel render.

Step 1: Start session

~/work/autoresearch/bin/start.sh start

Expected output: autoresearch: started session 'autoresearch' with N project windows

Step 2: Verify tmux windows

tmux list-windows -t autoresearch

Expected: 0: overview, 1: claude-config (or first project name), etc.

Step 3: Check overview pane renders

tmux attach -t autoresearch
# Navigate to window 0, verify log viewer and metrics panel are visible
# Ctrl+b d to detach

Step 4: Verify claude session started in project window

Attach and switch to window 1, verify claude prompt is visible.

Step 5: Stop session

~/work/autoresearch/bin/start.sh stop

Step 6: Register the project in the workspace CLAUDE.md

Update /home/templis/work/CLAUDE.md projects table to include autoresearch/.

Step 7: Final commit + push to Gitea

cd ~/work/autoresearch
git add -A
git commit -m "feat: integration verified — autoresearch MVP complete"
git remote add origin git@git.tueit.de:tueit_GmbH/autoresearch.git
git push -u origin main

Summary of created files

| File | Purpose |
| --- | --- |
| `~/work/autoresearch/bin/config.py` | Load + validate autoresearch.yaml |
| `~/work/autoresearch/bin/check_budget.py` | Anthropic API budget check |
| `~/work/autoresearch/bin/log_append.py` | Append experiment log entries |
| `~/work/autoresearch/bin/log_view.py` | Rich log viewer (overview pane, left) |
| `~/work/autoresearch/bin/metrics.py` | Metrics summary (overview pane, right) |
| `~/work/autoresearch/bin/stuck_check.py` | Stuck-file detection |
| `~/work/autoresearch/bin/start.sh` | tmux session orchestrator |
| `~/work/autoresearch/bin/session-start-hook.sh` | PostSessionStart hook |
| `~/.claude/autoresearch.yaml` | Main config (projects + thresholds) |
| `~/.claude/skills/autoresearch/skill.md` | /autoresearch user skill |
| `~/.claude/skills/autoresearch-loop/skill.md` | /autoresearch-loop ralph-loop payload |
| `~/.claude/settings.json` | + PostSessionStart hook |