# Design: tüit Transkriptor

**Date:** 2026-04-01
**Status:** Approved
**Platform:** Arch Linux, KDE Plasma (Wayland), AMD RX 6800 XT (RDNA2 / ROCm)

## Goal

A local AI transcription tool that runs as a system tray application, monitors audio input, and produces LLM-refined Markdown transcripts saved directly into the Nextcloud-synced notes folder. Designed as a personal secretary: the user can provide instructions alongside the recording to guide the LLM output.

## Architecture

```
tueit_Transkriptor/
├── main.py            # Entry point: starts FastAPI + pystray
├── api/
│   ├── router.py      # REST endpoints + WebSocket
│   └── state.py       # Global app state (recording, transcript, ...)
├── audio.py           # sounddevice → PCM buffer
├── transcription.py   # faster-whisper wrapper (ROCm-capable)
├── llm.py             # Ollama httpx client
├── output.py          # Render Markdown + write to Nextcloud folder
├── config.py          # TOML config (~/.config/tueit-transcriber/config.toml)
├── frontend/
│   ├── index.html     # Single-page UI (tüit CI: dark mode, #DA251C, #FFD802, Overpass)
│   └── app.js         # WebSocket client for live status
├── install.sh         # Check deps (ROCm, Ollama, Python packages), set up systemd user service
└── requirements.txt
```

## Data Flow

```
SIGUSR1 / tray click / API POST /toggle
  → sounddevice captures PCM (16 kHz mono)
  → on stop: WAV → faster-whisper → raw text
  → raw text + user instructions (from UI) → Ollama (gemma3:12b via ROCm)
  → Markdown with tüit CI (frontmatter, headings, highlights)
  → file: ~/cloud.shron.de/Hetzner Storagebox/work/YYYY-MM-DD-HHmm-<slugified-title>.md
```

## API Endpoints

| Method | Path | Purpose |
|--------|------|---------|
| `POST` | `/toggle` | Start/stop recording (also triggered via SIGUSR1) |
| `GET` | `/status` | Current state: recording / processing / idle |
| `GET` | `/transcripts` | List of recent transcripts |
| `WS` | `/ws` | Live updates to frontend |
| `GET` | `/config` | Current configuration |
| `PUT` | `/config` | Update config (model, output path, ...) |
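The shared toggle path (SIGUSR1 / tray click / `POST /toggle`) can be sketched as follows. This is a minimal illustration, not the actual implementation: `AppState` and `install_signal_trigger` are hypothetical names; the real state object lives in `api/state.py` and is shared with the FastAPI routes.

```python
import os
import signal
from pathlib import Path


class AppState:
    """Hypothetical stand-in for the shared state in api/state.py."""

    def __init__(self):
        self.recording = False

    def toggle(self) -> bool:
        """Flip between recording and idle; returns the new state."""
        self.recording = not self.recording
        return self.recording


def install_signal_trigger(state: AppState) -> None:
    """Write the PID file and route SIGUSR1 to the same toggle as /toggle."""
    pid_file = Path.home() / ".local/run/tueit-transcriber.pid"
    pid_file.parent.mkdir(parents=True, exist_ok=True)
    pid_file.write_text(str(os.getpid()))
    # The KDE shortcut `pkill -USR1 -f main.py` lands here.
    signal.signal(signal.SIGUSR1, lambda signum, frame: state.toggle())
```

Because the tray click, the signal handler, and the REST endpoint all funnel into one `toggle()`, the three triggers cannot drift apart in behavior.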
A future Thunderbird integration is planned: `POST /compose` generates a mail draft from the transcript. The API foundation is already in place.

## UI

A permanent browser window is opened at startup (`http://localhost:8765`), in dark mode with tüit CI colors and the Overpass font.

- **Top:** large record toggle button (red when active, grey when idle) plus status display
- **Middle:** instruction text field; persistent, included as LLM context on every processing run. Examples: "highlight the key points", "create a ticket for this", "draft an offer"
- **Bottom:** live transcript preview during processing; list of recent transcripts (clickable: opens the file)

## Trigger Mechanism

- **Tray icon click** toggles recording
- **SIGUSR1** toggles recording (Wayland-compatible hotkey workaround)
  - PID written to `~/.local/run/tueit-transcriber.pid`
  - KDE custom shortcut: `pkill -USR1 -f main.py`
  - Works on any desktop environment, independent of Wayland

## Ollama Model

The RX 6800 XT has 16 GB GDDR6 VRAM; ROCm has supported RDNA2 since ROCm 5.x.

| Component | Model | VRAM |
|-----------|-------|------|
| LLM | `gemma3:12b` (default) | ~8–9 GB |
| Whisper | `large-v3` (ROCm) | ~3 GB |
| Fallback Whisper | `medium` | ~1.5 GB |

Both models are configurable via `config.toml` and the settings UI.

## Output Format

Markdown file with YAML frontmatter:

```markdown
---
date: 2026-04-01T14:32:00
tags: [transkript]
---

# <LLM-generated title>

<LLM-refined transcript content>
```

File naming: `YYYY-MM-DD-HHmm-<slugified-title>.md`
Output path: `~/cloud.shron.de/Hetzner Storagebox/work/`

## Dependencies

- `faster-whisper`: Whisper inference
- `sounddevice`: audio capture
- `httpx`: Ollama API client
- `fastapi` + `uvicorn`: local HTTP/WebSocket server
- `pystray` + `Pillow`: system tray icon
- `tomli` / `tomllib`: TOML config
- Ollama (system package, with ROCm)
- ROCm (system package, via pacman: `rocm-hip-sdk`)

## Future Extensions

- Thunderbird integration via `POST /compose`
- Zammad ticket creation via `POST /ticket`
- Template system (e.g. "offer", "reminder", "meeting notes")