Design: tüit Transkriptor
Date: 2026-04-01
Status: Approved
Platform: Arch Linux, KDE Plasma (Wayland), AMD RX 6800 XT (RDNA2 / ROCm)
Goal
A local AI transcription tool that runs as a system tray application, monitors audio input, and produces LLM-refined Markdown transcripts saved directly into the Nextcloud-synced notes folder. Designed as a personal secretary — the user can provide instructions alongside the recording to guide the LLM output.
Architecture
tueit_Transkriptor/
├── main.py # Entry point: starts FastAPI + pystray
├── api/
│ ├── router.py # REST endpoints + WebSocket
│ └── state.py # Global app state (recording, transcript, ...)
├── audio.py # sounddevice → PCM buffer
├── transcription.py # faster-whisper wrapper (ROCm-capable)
├── llm.py # Ollama httpx client
├── output.py # Render Markdown + write to Nextcloud folder
├── config.py # TOML config (~/.config/tueit-transcriber/config.toml)
├── frontend/
│ ├── index.html # Single-page UI (tüit CI: dark mode, #DA251C, #FFD802, Overpass)
│ └── app.js # WebSocket client for live status
├── install.sh # Check deps (ROCm, Ollama, Python packages), set up systemd user service
└── requirements.txt
Data Flow
SIGUSR1 / Tray click / API POST /toggle
→ sounddevice captures PCM (16kHz mono)
→ on stop: WAV → faster-whisper → raw text
→ raw text + user instructions (from UI) → Ollama (gemma3:12b via ROCm)
→ Markdown with tüit CI (frontmatter, headings, highlights)
→ file: ~/cloud.shron.de/Hetzner Storagebox/work/YYYY-MM-DD-HHmm-<title>.md
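The "on stop: WAV" step above can be sketched with the standard library alone. This is a minimal sketch, not the project's actual audio.py: it assumes the PCM buffer holds 16-bit signed little-endian samples (the design only states 16 kHz mono), and the function names are hypothetical.

```python
import io
import wave

SAMPLE_RATE_HZ = 16_000   # from the data flow: 16 kHz
CHANNELS = 1              # from the data flow: mono
SAMPLE_WIDTH_BYTES = 2    # assumption: 16-bit signed PCM


def pcm_to_wav_bytes(pcm: bytes) -> bytes:
    """Wrap a raw PCM buffer in a WAV container held in memory."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH_BYTES)
        wav.setframerate(SAMPLE_RATE_HZ)
        wav.writeframes(pcm)
    return buffer.getvalue()


def duration_seconds(pcm: bytes) -> float:
    """Length of the recording implied by the buffer size."""
    return len(pcm) / (SAMPLE_RATE_HZ * CHANNELS * SAMPLE_WIDTH_BYTES)
```

Keeping the WAV in memory avoids a temporary file; whether the transcription step prefers a path or a file-like object is left open here.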
API Endpoints
| Method | Path | Purpose |
|---|---|---|
| POST | /toggle | Start/stop recording (also triggered via SIGUSR1) |
| GET | /status | Current state: recording / processing / idle |
| GET | /transcripts | List of recent transcripts |
| WS | /ws | Live updates to frontend |
| GET | /config | Current configuration |
| PUT | /config | Update config (model, output path, ...) |
Future Thunderbird integration: POST /compose — generates a draft from the transcript. The API foundation is already in place.
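The three states reported by /status and the behaviour of /toggle suggest a small state machine behind the endpoints. The following is a sketch of what api/state.py might contain, framework-free so the logic is visible. The class and method names are hypothetical, and ignoring a toggle while processing is an assumption the design does not spell out.

```python
from dataclasses import dataclass, field
from enum import Enum


class Phase(str, Enum):
    IDLE = "idle"
    RECORDING = "recording"
    PROCESSING = "processing"


@dataclass
class AppState:
    phase: Phase = Phase.IDLE
    transcripts: list[str] = field(default_factory=list)  # newest first

    def toggle(self) -> Phase:
        """Start recording when idle; stop and begin processing when recording."""
        if self.phase is Phase.IDLE:
            self.phase = Phase.RECORDING
        elif self.phase is Phase.RECORDING:
            self.phase = Phase.PROCESSING
        # Assumption: a toggle that arrives during processing is ignored.
        return self.phase

    def finish_processing(self, transcript_path: str) -> None:
        """Record the finished transcript and return to idle."""
        if self.phase is not Phase.PROCESSING:
            raise RuntimeError("finish_processing called outside processing")
        self.transcripts.insert(0, transcript_path)
        self.phase = Phase.IDLE

    def status(self) -> dict[str, str]:
        """Payload a GET /status handler could return."""
        return {"state": self.phase.value}
```

A POST /toggle handler, the tray click, and the SIGUSR1 handler could all call the same `toggle()` so the three triggers cannot drift apart.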
UI
Permanent browser window opened at startup (http://localhost:8765). Dark mode, tüit CI colors and Overpass font.
- Top: Large record toggle button (red when active, grey when idle) + status display
- Middle: Instruction text field — persistent, included as LLM context on every processing run. Examples: "highlight the key points", "create a ticket for this", "draft an offer"
- Bottom: Live transcript preview during processing; list of recent transcripts (clickable → opens file)
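One way the persistent instruction text could reach the LLM is as part of the system prompt of an Ollama generate request. The sketch below builds such a request for Ollama's /api/generate endpoint on its default port. The wording of the system prompt and the function names are hypothetical, and urllib from the standard library stands in for the httpx client named in this design, purely to keep the example dependency-free.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
DEFAULT_MODEL = "gemma3:12b"


def build_payload(raw_text: str, instructions: str, model: str = DEFAULT_MODEL) -> dict:
    """Combine the raw transcript and the UI instructions into one request body."""
    # Hypothetical base prompt; the real wording is not part of this design.
    system = "You turn raw speech transcripts into clean Markdown notes."
    if instructions.strip():
        system += "\nFollow these user instructions:\n" + instructions.strip()
    return {"model": model, "system": system, "prompt": raw_text, "stream": False}


def refine(raw_text: str, instructions: str, timeout_s: float = 300.0) -> str:
    """Send the request and return the generated text (needs a running Ollama)."""
    body = json.dumps(build_payload(raw_text, instructions)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.loads(response.read())["response"]
```

Putting the instructions into the system prompt rather than the user prompt keeps the transcript itself unmodified, which may make it easier to debug what the model was actually asked.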
Trigger Mechanism
- Tray icon click — toggles recording
- SIGUSR1 — toggles recording (Wayland-compatible hotkey workaround)
  - PID written to ~/.local/run/tueit-transcriber.pid
  - KDE custom shortcut: pkill -USR1 -f main.py
  - Works on any DE, Wayland-independent
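The SIGUSR1 trigger and the PID file could be wired up roughly as follows. This is a sketch under assumptions: `Recorder` is a hypothetical stand-in for the real recording controller, and the handler is registered with the standard `signal` module, which requires the main thread.

```python
import os
import signal
from pathlib import Path

PID_FILE = Path.home() / ".local" / "run" / "tueit-transcriber.pid"


class Recorder:
    """Hypothetical stand-in for the real recording controller."""

    def __init__(self) -> None:
        self.recording = False

    def toggle(self) -> bool:
        self.recording = not self.recording
        return self.recording


def write_pid_file(path: Path = PID_FILE) -> Path:
    """Write the current process ID so external tools can signal this process."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(f"{os.getpid()}\n")
    return path


def install_trigger(recorder: Recorder) -> None:
    """Toggle recording on SIGUSR1; must be called from the main thread."""

    def on_sigusr1(signum, frame):
        # Keep the handler small: only flip the state here.
        recorder.toggle()

    signal.signal(signal.SIGUSR1, on_sigusr1)
```

Since the PID file exists, the KDE shortcut could alternatively run `kill -USR1 "$(cat ~/.local/run/tueit-transcriber.pid)"`, which avoids signalling an unrelated process whose command line happens to contain main.py.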
Ollama Model
The RX 6800 XT has 16 GB of GDDR6 VRAM. ROCm has supported RDNA2 since ROCm 5.x.
| Component | Model | VRAM |
|---|---|---|
| LLM | gemma3:12b (default) | ~8–9 GB |
| Whisper | large-v3 (ROCm) | ~3 GB |
| Fallback Whisper | medium | ~1.5 GB |
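Both models are configurable via config.toml and the settings UI. With the defaults, the LLM and Whisper together need roughly 11–12 GB if both are resident at once, which fits within the 16 GB of VRAM.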
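Loading config.toml with built-in defaults might look like the sketch below. The section and key names (`llm.model`, `whisper.model`, `whisper.fallback_model`, `output.path`) are hypothetical, since the design does not define the config schema. Only the default values come from this document.

```python
try:
    import tomllib  # standard library from Python 3.11
except ModuleNotFoundError:
    import tomli as tomllib  # backport with the same API

# Hypothetical schema; the values are the defaults named in this design.
DEFAULTS = {
    "llm": {"model": "gemma3:12b"},
    "whisper": {"model": "large-v3", "fallback_model": "medium"},
    "output": {"path": "~/cloud.shron.de/Hetzner Storagebox/work/"},
}


def load_config(toml_text: str) -> dict:
    """Parse TOML text and overlay it on a copy of the defaults."""
    user = tomllib.loads(toml_text)
    merged = {section: dict(values) for section, values in DEFAULTS.items()}
    for section, values in user.items():
        if isinstance(values, dict):
            merged.setdefault(section, {}).update(values)
        else:
            merged[section] = values
    return merged
```

For example, a config file that contains only a `[whisper]` section with `model = "medium"` would switch Whisper to the fallback size while leaving the LLM and output path at their defaults.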
Output Format
Markdown file with YAML frontmatter:
---
date: 2026-04-01T14:32:00
tags: [transkript]
---
# <LLM-generated title>
<LLM-refined transcript content>
File naming: YYYY-MM-DD-HHmm-<slugified-title>.md
Output path: ~/cloud.shron.de/Hetzner Storagebox/work/
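The frontmatter layout and the file naming rule above can be sketched as two small helpers. The function names are hypothetical, and the slug rule (ASCII-fold, lowercase, hyphen-separated) is an assumption, since the design only says "slugified".

```python
import re
import unicodedata
from datetime import datetime


def slugify(title: str) -> str:
    """ASCII-fold, lowercase, and hyphenate a title (assumed slug rule)."""
    folded = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    slug = re.sub(r"[^a-z0-9]+", "-", folded.lower()).strip("-")
    return slug or "untitled"


def build_filename(title: str, when: datetime) -> str:
    """YYYY-MM-DD-HHmm-<slugified-title>.md"""
    return f"{when:%Y-%m-%d-%H%M}-{slugify(title)}.md"


def render_markdown(title: str, body: str, when: datetime) -> str:
    """YAML frontmatter, then the title heading, then the refined transcript."""
    return (
        "---\n"
        f"date: {when.isoformat(timespec='seconds')}\n"
        "tags: [transkript]\n"
        "---\n"
        f"# {title}\n"
        f"{body.strip()}\n"
    )
```

With a timestamp of 2026-04-01 14:32, the title "Angebot für Müller GmbH" would yield the file name 2026-04-01-1432-angebot-fur-muller-gmbh.md.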
Dependencies
- faster-whisper: Whisper inference
- sounddevice: audio capture
- httpx: Ollama API client
- fastapi + uvicorn: local HTTP/WebSocket server
- pystray + Pillow: system tray icon
- tomli / tomllib: TOML config
- Ollama (system, with ROCm)
- ROCm (system, via pacman: rocm-hip-sdk)
Future Extensions
- Thunderbird integration via POST /compose
- Zammad ticket creation via POST /ticket
- Template system (e.g. "offer", "reminder", "meeting notes")