# autoresearch Implementation Plan

> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.

**Goal:** Autonomous research loop that iteratively improves Claude Code config and project code, running inside tmux with one claude session (ralph-loop) per project.

**Architecture:** A bash orchestrator (`bin/start.sh`) creates a tmux session with one window per project. Each window starts an interactive `claude` session and invokes `/autoresearch-loop`, which uses ralph-loop to repeat experiments indefinitely. Each iteration: create worktree → make changes → run benchmarks → compare metric → merge or discard → log. A PostSessionStart hook auto-starts the tmux session when token/budget thresholds are met.
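
The merge-or-discard git mechanics of one iteration can be sketched in a throwaway repo (all names here — the temp project, branch name, and `BENCH_OK` stand-in — are illustrative; the real loop derives them from the experiment plan and benchmark results):

```shell
# Throwaway demo of one iteration's git mechanics (all names illustrative).
set -euo pipefail
PROJECT=$(mktemp -d)
cd "$PROJECT"
git init -q
git config user.email demo@example.com
git config user.name demo
git commit -q --allow-empty -m "init"

BRANCH="exp-0001-demo"
git worktree add ".worktrees/$BRANCH" -b "$BRANCH" >/dev/null 2>&1  # isolated checkout
( cd ".worktrees/$BRANCH"
  echo "change" > notes.txt                        # the experiment's change
  git add notes.txt && git commit -q -m "exp: demo change" )

BENCH_OK=true                                      # stand-in for the benchmark result
if "$BENCH_OK"; then
  git merge -q --no-ff "$BRANCH" -m "exp: $BRANCH" # keep: merge into the main line
  git worktree remove ".worktrees/$BRANCH"
else
  git worktree remove --force ".worktrees/$BRANCH" # discard: drop worktree + branch
  git branch -D "$BRANCH"
fi
```

The worktree keeps the main checkout clean while the experiment runs, so a discard is a pure delete with no reset needed.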

**Tech Stack:** Bash (orchestration, worktree management), Python 3 + rich (log viewer, metrics display), ralph-loop plugin (iteration driver), claude CLI (interactive sessions), tmux 3.x, jq (JSON processing)

---

## Task 1: Project scaffolding

**Files:**

- Create: `~/work/autoresearch/bin/` (directory)
- Create: `~/work/autoresearch/tests/` (directory)
- Create: `~/work/autoresearch/CLAUDE.md`
- Create: `~/work/autoresearch/.gitignore`

**Step 1: Create directories**

```bash
mkdir -p ~/work/autoresearch/bin
mkdir -p ~/work/autoresearch/tests
```

**Step 2: Write CLAUDE.md**

````markdown
# autoresearch

Autonomous research loop for Claude Code. Runs inside tmux, one window per project.

## Key files

- `bin/start.sh` — tmux session orchestrator (start/stop/status)
- `bin/check_budget.py` — Anthropic API budget checker
- `bin/log_view.py` — formatted log viewer (overview pane, left)
- `bin/metrics.py` — metrics summary (overview pane, right)
- `~/.claude/autoresearch.yaml` — main config
- `~/.claude/skills/autoresearch/skill.md` — /autoresearch skill
- `~/.claude/skills/autoresearch-loop/skill.md` — /autoresearch-loop (ralph-loop payload)

## Run

```bash
bin/start.sh        # start tmux session
bin/start.sh stop   # stop all windows
bin/start.sh status # show running experiments
```

## Logs

- `~/.claude/autoresearch/log.jsonl` — append-only experiment log
- Each line: `{ "ts": "ISO8601", "project": "...", "exp": "NNNN", "metric_before": 0.0, "metric_after": 0.0, "kept": true, "files_changed": ["..."] }`
````

**Step 3: Write .gitignore**

```
__pycache__/
*.pyc
.worktrees/
```

**Step 4: Commit**

```bash
cd ~/work/autoresearch
git add CLAUDE.md .gitignore bin/ tests/
git commit -m "feat: project scaffolding"
```

---

## Task 2: Configuration file + schema

**Files:**

- Create: `~/.claude/autoresearch.yaml`
- Create: `~/work/autoresearch/bin/config.py`
- Create: `~/work/autoresearch/tests/test_config.py`

**Step 1: Write failing test**

```python
# tests/test_config.py
import sys
sys.path.insert(0, 'bin')
from config import load_config, ConfigError
import pytest, tempfile, os, yaml

def test_load_valid_config():
    data = {
        'projects': [{'path': '/tmp', 'benchmarks': ['echo ok'], 'time_limit_minutes': 5}],
        'token_threshold': {'context_remaining_pct': 60, 'api_budget_usd': 5.0}
    }
    with tempfile.NamedTemporaryFile('w', suffix='.yaml', delete=False) as f:
        yaml.dump(data, f)
        name = f.name
    cfg = load_config(name)
    assert cfg['projects'][0]['time_limit_minutes'] == 5
    os.unlink(name)

def test_missing_projects_raises():
    with tempfile.NamedTemporaryFile('w', suffix='.yaml', delete=False) as f:
        yaml.dump({}, f)
        name = f.name
    with pytest.raises(ConfigError):
        load_config(name)
    os.unlink(name)

def test_expands_tilde_in_path():
    data = {
        'projects': [{'path': '~/.claude', 'benchmarks': [], 'time_limit_minutes': 5}],
        'token_threshold': {'context_remaining_pct': 60, 'api_budget_usd': 5.0}
    }
    with tempfile.NamedTemporaryFile('w', suffix='.yaml', delete=False) as f:
        yaml.dump(data, f)
        name = f.name
    cfg = load_config(name)
    assert '~' not in cfg['projects'][0]['path']
    os.unlink(name)
```

**Step 2: Run test to verify it fails**

```bash
cd ~/work/autoresearch && python3 -m pytest tests/test_config.py -v
```

Expected: FAIL — `ModuleNotFoundError: No module named 'config'`

**Step 3: Write implementation**

```python
# bin/config.py
import os
import yaml

CONFIG_PATH = os.path.expanduser('~/.claude/autoresearch.yaml')
DEFAULTS = {'time_limit_minutes': 10, 'benchmarks': []}

class ConfigError(Exception):
    pass

def load_config(path=CONFIG_PATH):
    with open(path) as f:
        cfg = yaml.safe_load(f)
    if not cfg or 'projects' not in cfg or not cfg['projects']:
        raise ConfigError(f"'projects' list is required in {path}")
    for p in cfg['projects']:
        p.setdefault('benchmarks', DEFAULTS['benchmarks'])
        p.setdefault('time_limit_minutes', DEFAULTS['time_limit_minutes'])
        p['path'] = os.path.expanduser(p['path'])
    cfg.setdefault('token_threshold', {'context_remaining_pct': 60, 'api_budget_usd': 5.0})
    return cfg
```

**Step 4: Write default config**

```yaml
# ~/.claude/autoresearch.yaml
projects:
  - path: ~/.claude
    benchmarks: []
    time_limit_minutes: 5

  - path: ~/work/kundendoku
    benchmarks:
      - "go test ./..."
      - "go build ./..."
    time_limit_minutes: 10

token_threshold:
  context_remaining_pct: 60
  api_budget_usd: 5.00
```

**Step 5: Run tests to verify they pass**

```bash
python3 -m pytest tests/test_config.py -v
```

Expected: 3 PASSED

**Step 6: Commit**

```bash
git add bin/config.py tests/test_config.py
git commit -m "feat: config loader with validation"
```

---

## Task 3: API budget checker

**Files:**

- Create: `~/work/autoresearch/bin/check_budget.py`
- Create: `~/work/autoresearch/tests/test_check_budget.py`

Note: the filename uses an underscore — a hyphenated name like `check-budget.py` cannot be imported as a Python module, which would complicate testing.

**Step 1: Write failing tests**

```python
# tests/test_check_budget.py
import sys
sys.path.insert(0, 'bin')
from unittest.mock import patch
from check_budget import is_budget_ok

def test_budget_ok_when_used_less_than_limit():
    with patch('check_budget.get_used_usd', return_value=2.0):
        assert is_budget_ok(limit_usd=5.0) is True

def test_budget_not_ok_when_used_exceeds_limit():
    with patch('check_budget.get_used_usd', return_value=5.5):
        assert is_budget_ok(limit_usd=5.0) is False

def test_budget_ok_on_api_error():
    # Fail open: if we can't check, assume OK
    with patch('check_budget.get_used_usd', side_effect=Exception("network error")):
        assert is_budget_ok(limit_usd=5.0) is True
```

**Step 2: Run tests to verify they fail**

```bash
python3 -m pytest tests/test_check_budget.py -v
```

Expected: FAIL — `ModuleNotFoundError: No module named 'check_budget'`

**Step 3: Write implementation**

```python
# bin/check_budget.py
"""Check Anthropic API budget usage. Fails open (returns True) on errors."""
import os
import sys
import urllib.request
import json

USAGE_API = "https://api.anthropic.com/v1/usage"

def get_used_usd() -> float:
    api_key = os.environ.get('ANTHROPIC_API_KEY', '')
    if not api_key:
        raise EnvironmentError("ANTHROPIC_API_KEY not set")
    req = urllib.request.Request(
        USAGE_API,
        headers={"x-api-key": api_key, "anthropic-version": "2023-06-01"}
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        data = json.loads(resp.read())
    # Sum all usage costs
    return sum(item.get('cost_usd', 0.0) for item in data.get('data', []))

def is_budget_ok(limit_usd: float) -> bool:
    try:
        used = get_used_usd()
        return used < limit_usd
    except Exception:
        return True  # fail open

if __name__ == '__main__':
    limit = float(sys.argv[1]) if len(sys.argv) > 1 else 5.0
    ok = is_budget_ok(limit)
    print("ok" if ok else "low")
    sys.exit(0 if ok else 1)
```

**Step 4: Run tests**

```bash
python3 -m pytest tests/test_check_budget.py -v
```

Expected: 3 PASSED

**Step 5: Commit**

```bash
git add bin/check_budget.py tests/test_check_budget.py
git commit -m "feat: Anthropic API budget checker, fails open on error"
```

---

## Task 4: Experiment log utilities

**Files:**

- Create: `~/work/autoresearch/bin/log_append.py`
- Create: `~/work/autoresearch/bin/log_view.py`
- Create: `~/work/autoresearch/tests/test_log.py`
- Create dir: `~/.claude/autoresearch/`

**Log entry schema (one JSON object per line in `~/.claude/autoresearch/log.jsonl`):**

```json
{
  "ts": "2026-04-04T22:00:00Z",
  "project": "kundendoku",
  "exp": "0001",
  "description": "Optimize SQL query in handler.go",
  "metric_before": 1.234,
  "metric_after": 1.187,
  "kept": true,
  "files_changed": ["internal/handler.go"],
  "stuck": false,
  "error": null
}
```
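
The JSONL format pairs well with jq for ad-hoc analysis; for example, a per-project keep count (the log file here is a temporary sample, not the real log):

```shell
# Count kept vs. total experiments per project from a JSONL log with jq.
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
{"project":"kundendoku","exp":"0001","kept":true}
{"project":"kundendoku","exp":"0002","kept":false}
{"project":"claude-config","exp":"0001","kept":true}
EOF
jq -s 'group_by(.project)
       | map({project: .[0].project,
              kept: map(select(.kept)) | length,
              total: length})' "$LOG"
```

With the sample entries above this prints two objects, one per project.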

**Step 1: Write failing tests**

```python
# tests/test_log.py
import sys, os, json
sys.path.insert(0, 'bin')
from log_append import append_entry

def test_append_creates_file(tmp_path):
    log = str(tmp_path / "log.jsonl")
    append_entry({"ts": "2026-01-01T00:00:00Z", "project": "test"}, path=log)
    assert os.path.exists(log)

def test_append_writes_valid_json(tmp_path):
    log = str(tmp_path / "log.jsonl")
    append_entry({"project": "x", "kept": True}, path=log)
    with open(log) as f:
        data = json.loads(f.read().strip())
    assert data["project"] == "x"

def test_append_multiple_entries(tmp_path):
    log = str(tmp_path / "log.jsonl")
    for i in range(3):
        append_entry({"n": i}, path=log)
    with open(log) as f:
        lines = [l for l in f if l.strip()]
    assert len(lines) == 3
```

**Step 2: Run tests to verify they fail**

```bash
python3 -m pytest tests/test_log.py -v
```

**Step 3: Write log_append.py**

```python
# bin/log_append.py
import json, os
from datetime import datetime, timezone

LOG_PATH = os.path.expanduser('~/.claude/autoresearch/log.jsonl')

def append_entry(entry: dict, path: str = LOG_PATH) -> None:
    entry.setdefault('ts', datetime.now(timezone.utc).isoformat())
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, 'a') as f:
        f.write(json.dumps(entry) + '\n')
```

**Step 4: Write log_view.py**

```python
# bin/log_view.py
"""Live log viewer for overview pane. Run: python3 bin/log_view.py --follow"""
import json, sys, time, os
from rich.console import Console
from rich.table import Table
from rich import box

LOG_PATH = os.path.expanduser('~/.claude/autoresearch/log.jsonl')
console = Console()

def read_entries(path=LOG_PATH):
    if not os.path.exists(path):
        return []
    entries = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:
                try:
                    entries.append(json.loads(line))
                except json.JSONDecodeError:
                    pass
    return entries

def render_table(entries):
    table = Table(box=box.SIMPLE, show_header=True, header_style="bold #DA251C")
    table.add_column("Time", style="dim", width=8)
    table.add_column("Project", width=14)
    table.add_column("Exp", width=6)
    table.add_column("Description", width=40)
    table.add_column("Δ Metric", width=10, justify="right")
    table.add_column("", width=3)
    for e in entries[-30:]:
        ts = e.get('ts', '')[11:16] or '?'  # HH:MM from the ISO 8601 timestamp
        before = e.get('metric_before')
        after = e.get('metric_after')
        if before is not None and after is not None:
            delta = f"{after - before:+.3f}"
            delta_style = "green" if after < before else "red"
        else:
            delta = "n/a"
            delta_style = "dim"
        kept = "✓" if e.get('kept') else "✗"
        kept_style = "green" if e.get('kept') else "red"
        table.add_row(
            ts,
            e.get('project', '?')[:14],
            str(e.get('exp', '?')),
            (e.get('description') or '')[:40],
            f"[{delta_style}]{delta}[/{delta_style}]",
            f"[{kept_style}]{kept}[/{kept_style}]",
        )
    return table

if __name__ == '__main__':
    follow = '--follow' in sys.argv
    while True:
        entries = read_entries()
        console.clear()
        console.print(f"[bold #DA251C]autoresearch[/bold #DA251C] — {len(entries)} experiments", justify="center")
        console.print(render_table(entries))
        if not follow:
            break
        time.sleep(5)
```

**Step 5: Run tests**

```bash
python3 -m pytest tests/test_log.py -v
```

Expected: 3 PASSED

**Step 6: Commit**

```bash
git add bin/log_append.py bin/log_view.py tests/test_log.py
git commit -m "feat: experiment log append and rich viewer"
```

---

## Task 5: Stuck detection

**Files:**

- Create: `~/work/autoresearch/bin/stuck_check.py`
- Create: `~/work/autoresearch/tests/test_stuck.py`

**Step 1: Write failing tests**

```python
# tests/test_stuck.py
import sys
sys.path.insert(0, 'bin')
from stuck_check import is_stuck

def test_not_stuck_with_different_files():
    entries = [
        {'project': 'x', 'files_changed': ['a.go']},
        {'project': 'x', 'files_changed': ['b.go']},
        {'project': 'x', 'files_changed': ['c.go']},
    ]
    assert is_stuck('x', entries) is False

def test_stuck_when_same_files_3_times():
    entries = [
        {'project': 'x', 'files_changed': ['a.go', 'b.go']},
        {'project': 'x', 'files_changed': ['a.go', 'b.go']},
        {'project': 'x', 'files_changed': ['a.go', 'b.go']},
    ]
    assert is_stuck('x', entries) is True

def test_not_stuck_with_fewer_than_3_entries():
    entries = [
        {'project': 'x', 'files_changed': ['a.go']},
        {'project': 'x', 'files_changed': ['a.go']},
    ]
    assert is_stuck('x', entries) is False

def test_only_checks_matching_project():
    entries = [
        {'project': 'y', 'files_changed': ['a.go']},
        {'project': 'y', 'files_changed': ['a.go']},
        {'project': 'y', 'files_changed': ['a.go']},
        {'project': 'x', 'files_changed': ['a.go']},
        {'project': 'x', 'files_changed': ['a.go']},
    ]
    assert is_stuck('x', entries) is False  # only 2 entries for x
```

**Step 2: Run tests to verify they fail**

```bash
python3 -m pytest tests/test_stuck.py -v
```

**Step 3: Write implementation**

```python
# bin/stuck_check.py
"""Detect if experiments are cycling over the same files."""
import sys
from log_append import read_entries  # reuse log reader

def is_stuck(project: str, entries: list, window: int = 3) -> bool:
    project_entries = [e for e in entries if e.get('project') == project]
    if len(project_entries) < window:
        return False
    recent = project_entries[-window:]
    file_sets = [frozenset(e.get('files_changed', [])) for e in recent]
    return len(set(file_sets)) == 1 and file_sets[0] != frozenset()

if __name__ == '__main__':
    project = sys.argv[1] if len(sys.argv) > 1 else ''
    if is_stuck(project, read_entries()):
        print("stuck")
        sys.exit(1)
    print("ok")
    sys.exit(0)
```

Note: `stuck_check.py` imports `read_entries` from `log_append`, which does not define it yet. Move the function there from `log_view.py`.

**Step 4: Move read_entries to log_append.py**

Add to `bin/log_append.py`:

```python
def read_entries(path: str = LOG_PATH) -> list:
    if not os.path.exists(path):
        return []
    entries = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:
                try:
                    entries.append(json.loads(line))
                except json.JSONDecodeError:
                    pass
    return entries
```

Update `bin/log_view.py` to import `read_entries` from `log_append` instead of defining it locally.

**Step 5: Run all tests**

```bash
python3 -m pytest tests/ -v
```

Expected: all PASSED

**Step 6: Commit**

```bash
git add bin/stuck_check.py bin/log_append.py bin/log_view.py tests/test_stuck.py
git commit -m "feat: stuck detection — alerts when same files repeat 3 times"
```

---

## Task 6: Metrics summary script (overview pane, right)

**Files:**

- Create: `~/work/autoresearch/bin/metrics.py`

No tests needed — this is pure display logic.

**Step 1: Write metrics.py**

```python
# bin/metrics.py
"""Compact metrics panel for overview pane right side. Run: watch -n5 python3 bin/metrics.py"""
import os, sys
from datetime import datetime, timezone
from rich.console import Console
from rich.panel import Panel
from rich.text import Text
sys.path.insert(0, os.path.dirname(__file__))
from log_append import read_entries

console = Console(width=40)

def summarize(entries):
    today = datetime.now(timezone.utc).date().isoformat()
    today_entries = [e for e in entries if e.get('ts', '').startswith(today)]
    total = len(today_entries)
    kept = sum(1 for e in today_entries if e.get('kept'))
    projects = {}
    for e in today_entries:
        p = e.get('project', 'unknown')
        projects.setdefault(p, {'total': 0, 'kept': 0})
        projects[p]['total'] += 1
        if e.get('kept'):
            projects[p]['kept'] += 1
    return total, kept, projects

def render():
    entries = read_entries()
    total, kept, projects = summarize(entries)
    rate = f"{kept/total*100:.0f}%" if total else "—"
    text = Text()
    text.append(f"Today: {total} experiments\n", style="bold")
    text.append(f"Kept: {kept} ({rate})\n\n", style="green")
    for p, s in projects.items():
        r = f"{s['kept']}/{s['total']}"
        text.append(f"  {p[:16]:<16} {r}\n")
    console.print(Panel(text, title="[bold #DA251C]Metrics[/bold #DA251C]", border_style="#FFD802"))

if __name__ == '__main__':
    render()
```

**Step 2: Test manually**

```bash
python3 bin/metrics.py
```

Expected: renders a panel (empty if no log yet)

**Step 3: Commit**

```bash
git add bin/metrics.py
git commit -m "feat: metrics summary panel for overview pane"
```

---

## Task 7: /autoresearch-loop skill (ralph-loop payload)

**Files:**

- Create: `~/.claude/skills/autoresearch-loop/skill.md`

This is the prompt that ralph-loop repeats on every iteration. It runs in the project directory.

**Step 1: Create skill**

```bash
mkdir -p ~/.claude/skills/autoresearch-loop
```

Write `~/.claude/skills/autoresearch-loop/skill.md`:

````markdown
---
description: "One autoresearch experiment iteration (called by ralph-loop)"
hide-from-slash-command-tool: "true"
---

# autoresearch — one experiment iteration

You are the autoresearch agent for this project. Execute exactly one experiment:

## 1. Pre-flight checks

Run these checks first. If either fails, output `<promise>STOP</promise>` immediately.

```bash
# Budget check
python3 ~/work/autoresearch/bin/check_budget.py $(python3 -c "
import yaml, os; c=yaml.safe_load(open(os.path.expanduser('~/.claude/autoresearch.yaml'))); print(c['token_threshold']['api_budget_usd'])
" 2>/dev/null || echo 5.0)
```

```bash
# Stuck check
python3 ~/work/autoresearch/bin/stuck_check.py "$(basename $PWD)"
```

If stuck: print a clear message explaining which files keep appearing, then output `<promise>STOP</promise>`.

## 2. Plan your experiment

- Read recent log entries for this project to understand what has been tried
- Identify ONE concrete improvement to make (code quality, config, performance, prompt clarity)
- Be specific: name the file and the change

## 3. Create worktree

```bash
EXP=$(printf "%04d" $(ls .worktrees/ 2>/dev/null | wc -l))
BRANCH="exp-${EXP}-$(echo '<short-description>' | tr ' ' '-' | tr '[:upper:]' '[:lower:]')"
git worktree add .worktrees/${BRANCH} -b ${BRANCH}
cd .worktrees/${BRANCH}
```

## 4. Make changes

Work in the worktree. Make exactly the planned change. Keep it focused.

## 5. Run benchmarks (with timeout)

```bash
# Read benchmarks from config
python3 -c "
import yaml, os
cfg = yaml.safe_load(open(os.path.expanduser('~/.claude/autoresearch.yaml')))
project_path = os.getcwd()
for p in cfg['projects']:
    if os.path.expanduser(p['path']) in project_path:
        for b in p['benchmarks']:
            print(b)
        break
"
```

Run each benchmark with `timeout <time_limit_seconds>`. Capture output and exit codes.
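
A sketch of that benchmark runner (`TIME_LIMIT_SECONDS`, `FAILED`, and the here-doc benchmarks are illustrative stand-ins — the real loop reads the benchmark list from the config above):

```bash
# Run each benchmark under a time limit; record failures without aborting early.
TIME_LIMIT_SECONDS=5
FAILED=0
while IFS= read -r bench; do
  [ -z "$bench" ] && continue
  if timeout "$TIME_LIMIT_SECONDS" bash -c "$bench" > /tmp/bench.out 2>&1; then
    echo "PASS: $bench"
  else
    rc=$?                              # exit 124 means the time limit was hit
    echo "FAIL (exit $rc): $bench"
    FAILED=1
  fi
done <<'EOF'
echo ok
false
EOF
echo "failed=$FAILED"
```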

## 6. Evaluate

- If benchmarks fail: discard (the worktree is broken)
- If benchmarks pass: compare to baseline on main branch
- For projects with no benchmarks: use your judgment (is this objectively better?)

## 7. Merge or discard

```bash
# If keeping:
cd ~/work/<project>
git merge ${BRANCH} --no-ff -m "exp: ${BRANCH}"
git worktree remove .worktrees/${BRANCH}

# If discarding:
git worktree remove .worktrees/${BRANCH} --force
git branch -D ${BRANCH}
```

## 8. Log the result

Call `append_entry` directly:

```python
import sys; sys.path.insert(0, '/home/templis/work/autoresearch/bin')
from log_append import append_entry
append_entry({
    "project": "<project-name>",
    "exp": "<NNNN>",
    "description": "<what was changed>",
    "metric_before": <float or null>,
    "metric_after": <float or null>,
    "kept": <true/false>,
    "files_changed": ["<file1>", "<file2>"],
    "stuck": False,
    "error": None
})
```

## 9. Continue

Do NOT output `<promise>STOP</promise>` unless a pre-flight check failed.
The loop will automatically feed this prompt back for the next iteration.
````

**Step 2: Verify skill file exists**

```bash
ls ~/.claude/skills/autoresearch-loop/skill.md
```

**Step 3: Commit**

```bash
cd ~/work/autoresearch
git add -A  # skill is in ~/.claude, handled separately
git commit -m "feat: autoresearch-loop skill (ralph-loop payload)"
```

(Note: ~/.claude skills are managed by the claude-config repo separately)

---

## Task 8: /autoresearch skill (start/stop/status interface)

**Files:**

- Create: `~/.claude/skills/autoresearch/skill.md`

**Step 1: Write skill**

```bash
mkdir -p ~/.claude/skills/autoresearch
```

Write `~/.claude/skills/autoresearch/skill.md`:

````markdown
---
description: "Start, stop, or show status of the autoresearch tmux session"
argument-hint: "[start|stop|status]"
---

# autoresearch

Manage the autoresearch tmux session.

## start (default)

Run:

```bash
~/work/autoresearch/bin/start.sh start
```

Report: which tmux windows were created, which projects are running.

## stop

Run:

```bash
~/work/autoresearch/bin/start.sh stop
```

## status

Run:

```bash
~/work/autoresearch/bin/start.sh status
```

Show: which windows are running, last log entry per project.
````

**Step 2: Verify**

```bash
ls ~/.claude/skills/autoresearch/skill.md
```

---

## Task 9: tmux orchestrator (bin/start.sh)

**Files:**

- Create: `~/work/autoresearch/bin/start.sh`

**Step 1: Write start.sh**

```bash
#!/usr/bin/env bash
# autoresearch tmux orchestrator
# Usage: start.sh [start|stop|status]
set -euo pipefail

SESSION="autoresearch"
CONFIG="${HOME}/.claude/autoresearch.yaml"
LOGDIR="${HOME}/.claude/autoresearch"
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"

# Read projects from config
get_projects() {
  python3 -c "
import yaml, os
cfg = yaml.safe_load(open('${CONFIG}'))
for p in cfg['projects']:
    print(os.path.expanduser(p['path']))
"
}

cmd_start() {
  if tmux has-session -t "$SESSION" 2>/dev/null; then
    echo "autoresearch: session '$SESSION' already running"
    echo "  attach: tmux attach -t $SESSION"
    exit 0
  fi

  mkdir -p "$LOGDIR"

  # Window 0: overview (log viewer left + metrics right)
  tmux new-session -d -s "$SESSION" -n "overview" -x 220 -y 50
  tmux split-window -h -t "${SESSION}:overview" -p 30
  tmux send-keys -t "${SESSION}:overview.0" \
    "watch -n5 python3 ${SCRIPT_DIR}/log_view.py" Enter
  tmux send-keys -t "${SESSION}:overview.1" \
    "watch -n10 python3 ${SCRIPT_DIR}/metrics.py" Enter

  # One window per project
  WIN=1
  while IFS= read -r project_path; do
    project_name=$(basename "$project_path")
    tmux new-window -t "${SESSION}:${WIN}" -n "$project_name"
    tmux send-keys -t "${SESSION}:${WIN}" \
      "cd '${project_path}' && claude --name 'autoresearch-${project_name}'" Enter
    # Wait briefly then send the ralph-loop invocation
    sleep 2
    tmux send-keys -t "${SESSION}:${WIN}" \
      "/ralph-loop Run one autoresearch experiment iteration. --completion-promise STOP" Enter
    WIN=$((WIN + 1))
  done < <(get_projects)

  tmux select-window -t "${SESSION}:0"
  echo "autoresearch: started session '$SESSION' with $((WIN-1)) project windows"
  echo "  attach: tmux attach -t $SESSION"
}

cmd_stop() {
  if ! tmux has-session -t "$SESSION" 2>/dev/null; then
    echo "autoresearch: session '$SESSION' not running"
    exit 0
  fi
  tmux kill-session -t "$SESSION"
  echo "autoresearch: session stopped"
}

cmd_status() {
  if ! tmux has-session -t "$SESSION" 2>/dev/null; then
    echo "autoresearch: not running"
    exit 1
  fi
  echo "autoresearch: running"
  tmux list-windows -t "$SESSION"
  echo ""
  echo "Last entries per project:"
  python3 -c "
import json, os
log = os.path.expanduser('~/.claude/autoresearch/log.jsonl')
if not os.path.exists(log):
    print('  (no experiments yet)')
    raise SystemExit
latest = {}
with open(log) as f:
    for line in f:
        try:
            e = json.loads(line)
            latest[e.get('project')] = e
        except ValueError:
            pass
for proj, e in latest.items():
    kept = '✓' if e.get('kept') else '✗'
    print(f'  {proj}: exp {e.get(\"exp\",\"?\")} {kept} — {e.get(\"description\",\"\")[:50]}')
"
}

case "${1:-start}" in
  start)  cmd_start ;;
  stop)   cmd_stop ;;
  status) cmd_status ;;
  *) echo "Usage: $0 [start|stop|status]"; exit 1 ;;
esac
```

**Step 2: Make executable**

```bash
chmod +x ~/work/autoresearch/bin/start.sh
```

**Step 3: Dry-run test (no actual tmux)**

```bash
# Verify syntax
bash -n ~/work/autoresearch/bin/start.sh && echo "syntax OK"
# Verify status when not running
~/work/autoresearch/bin/start.sh status || echo "expected: not running"
```

**Step 4: Commit**

```bash
cd ~/work/autoresearch
git add bin/start.sh
git commit -m "feat: tmux orchestrator — start/stop/status autoresearch session"
```

---

## Task 10: PostSessionStart hook

**Files:**

- Modify: `~/.claude/settings.json`
- Create: `~/work/autoresearch/bin/session-start-hook.sh`

**Step 1: Write hook script**

```bash
#!/usr/bin/env bash
# PostSessionStart hook: auto-start autoresearch when tokens + budget available
set -euo pipefail

CONFIG="${HOME}/.claude/autoresearch.yaml"
SESSION="autoresearch"
LOGDIR="${HOME}/.claude/autoresearch"

# Only run if config exists
[[ -f "$CONFIG" ]] || exit 0

# Skip if autoresearch session already running
tmux has-session -t "$SESSION" 2>/dev/null && exit 0

# Check API budget
BUDGET_LIMIT=$(python3 -c "
import yaml; c=yaml.safe_load(open('${CONFIG}'))
print(c.get('token_threshold', {}).get('api_budget_usd', 5.0))
" 2>/dev/null || echo "5.0")

python3 "${HOME}/work/autoresearch/bin/check_budget.py" "$BUDGET_LIMIT" || exit 0

# Budget OK → start autoresearch in background
mkdir -p "$LOGDIR"  # ensure the log directory exists before redirecting
nohup "${HOME}/work/autoresearch/bin/start.sh" start \
  >> "${LOGDIR}/hook.log" 2>&1 &

exit 0
```

```bash
chmod +x ~/work/autoresearch/bin/session-start-hook.sh
```

**Step 2: Add hook to settings.json**

Read `~/.claude/settings.json` first, then add the PostSessionStart hook.

New hooks section (merge with existing):

```json
"PostSessionStart": [
  {
    "hooks": [
      {
        "type": "command",
        "command": "~/work/autoresearch/bin/session-start-hook.sh",
        "async": true
      }
    ]
  }
]
```
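
A small helper (hypothetical — not part of this task's file list) can perform that merge idempotently instead of hand-editing the JSON:

```python
# Merge a hook entry into settings.json without clobbering other hooks.
# "PostSessionStart" is the event name this plan uses; helper name is illustrative.
import json
import os

def add_hook(settings_path: str, event: str, command: str) -> None:
    cfg = {}
    if os.path.exists(settings_path):
        with open(settings_path) as f:
            cfg = json.load(f)
    hooks = cfg.setdefault("hooks", {})
    entries = hooks.setdefault(event, [])
    entry = {"hooks": [{"type": "command", "command": command, "async": True}]}
    if entry not in entries:  # idempotent: skip if the exact entry already exists
        entries.append(entry)
    with open(settings_path, "w") as f:
        json.dump(cfg, f, indent=2)

# Example:
# add_hook(os.path.expanduser("~/.claude/settings.json"),
#          "PostSessionStart",
#          "~/work/autoresearch/bin/session-start-hook.sh")
```

Running it twice leaves a single entry, so the hook can be re-applied safely during setup.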

**Step 3: Verify settings.json is valid JSON**

```bash
python3 -m json.tool ~/.claude/settings.json > /dev/null && echo "valid JSON"
```

**Step 4: Test hook manually**

```bash
~/work/autoresearch/bin/session-start-hook.sh && echo "hook OK"
```

**Step 5: Commit**

```bash
cd ~/work/autoresearch
git add bin/session-start-hook.sh
git commit -m "feat: PostSessionStart hook — auto-start when budget available"
```

---

## Task 11: Integration dry-run

**Goal:** Start the tmux session, verify all windows open correctly, verify log viewer and metrics panel render.

**Step 1: Start session**

```bash
~/work/autoresearch/bin/start.sh start
```

Expected output: `autoresearch: started session 'autoresearch' with N project windows`

**Step 2: Verify tmux windows**

```bash
tmux list-windows -t autoresearch
```

Expected: `0: overview`, `1: claude-config` (or first project name), etc.

**Step 3: Check overview pane renders**

```bash
tmux attach -t autoresearch
# Navigate to window 0, verify log viewer and metrics panel are visible
# Ctrl+b d to detach
```

**Step 4: Verify claude session started in project window**

Attach and switch to window 1, verify the `claude` prompt is visible.

**Step 5: Stop session**

```bash
~/work/autoresearch/bin/start.sh stop
```

**Step 6: Add project to workspace CLAUDE.md**

Update the projects table in `/home/templis/work/CLAUDE.md` to include `autoresearch/`.

**Step 7: Final commit + push to Gitea**

```bash
cd ~/work/autoresearch
git add -A
git commit -m "feat: integration verified — autoresearch MVP complete"
git remote add origin git@git.tueit.de:tueit_GmbH/autoresearch.git
git push -u origin main
```

---

## Summary of created files

| File | Purpose |
|------|---------|
| `~/work/autoresearch/bin/config.py` | Load + validate autoresearch.yaml |
| `~/work/autoresearch/bin/check_budget.py` | Anthropic API budget check |
| `~/work/autoresearch/bin/log_append.py` | Append experiment log entries |
| `~/work/autoresearch/bin/log_view.py` | Rich log viewer (overview pane left) |
| `~/work/autoresearch/bin/metrics.py` | Metrics summary (overview pane right) |
| `~/work/autoresearch/bin/stuck_check.py` | Stuck file detection |
| `~/work/autoresearch/bin/start.sh` | tmux session orchestrator |
| `~/work/autoresearch/bin/session-start-hook.sh` | PostSessionStart hook |
| `~/.claude/autoresearch.yaml` | Main config (projects + thresholds) |
| `~/.claude/skills/autoresearch/skill.md` | `/autoresearch` user skill |
| `~/.claude/skills/autoresearch-loop/skill.md` | `/autoresearch-loop` ralph-loop payload |
| `~/.claude/settings.json` | +PostSessionStart hook |