79 Commits

Author SHA1 Message Date
thomas.kopp 8ec9044c75 fix: whisper repetition loops, meeting transcript punctuation
- transcription: add temperature_inc=0 to whispercpp to disable fallback (prevents loops)
- pipeline: punctuate meeting transcript in one pass (parallel with summarize)
- output: write_meeting_docs accepts pre-built transcript_text
- llm: punctuate prompt preserves speaker labels

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-02 12:34:11 +02:00
thomas.kopp 658f9be47f fix: punctuate raw transcript, strip JSON code fences, filter null speaker names
- llm: punctuate() adds punctuation/capitalisation without changing words
- llm: _strip_code_fences() handles markdown-wrapped JSON from gemma3
- llm: filter string 'null' from identify_speakers result
- pipeline: punctuate raw_text in parallel with refine for solo recordings

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-02 12:23:25 +02:00
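The fence stripping and 'null' filtering described in 658f9be47f can be sketched roughly as follows. The log does not show the real _strip_code_fences(), so strip_code_fences, drop_null_names, the SPEAKER_00 labels and the sample reply are all hypothetical; this is a minimal illustration, not the project's implementation.

```python
import json
import re

# Build the fence marker programmatically so this example contains no literal fence.
FENCE = "`" * 3

# Optional language tag after the opening fence, lazy body, then the closing fence.
_FENCE_RE = re.compile(
    r"^\s*" + FENCE + r"[\w-]*[ \t]*\r?\n(.*?)\s*" + FENCE + r"\s*$",
    re.DOTALL,
)


def strip_code_fences(text: str) -> str:
    """Return the body of a fenced reply, or the stripped text if it is not fenced."""
    match = _FENCE_RE.match(text)
    return match.group(1).strip() if match else text.strip()


def drop_null_names(mapping: dict) -> dict:
    """Drop entries whose value is None or the literal string 'null'."""
    return {
        label: name
        for label, name in mapping.items()
        if name is not None and str(name).strip().lower() != "null"
    }


# Hypothetical model reply: JSON wrapped in a markdown fence, one name given as 'null'.
reply = FENCE + 'json\n{"SPEAKER_00": "Anna", "SPEAKER_01": "null"}\n' + FENCE
speakers = drop_null_names(json.loads(strip_code_fences(reply)))
```

Unfenced replies pass through unchanged apart from whitespace trimming, so the helper can be applied to every response before json.loads().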
thomas.kopp d3582eaeb7 feat: tab navigation in modal (Index/Transkript/Zusammenfassung)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-02 12:10:50 +02:00
thomas.kopp 336628341b feat: AI-generated title+tldr, subfolder structure, backlinks in transkript/zusammenfassung
- llm: generate_title_and_tldr() returns concise title and 2-3 sentence summary
- output: index in root, transkript+zusammenfassung in {base}/ subdir with backlinks
- pipeline: call generate_title_and_tldr for both solo and meeting recordings
- router: mirror subdir structure when copying to Obsidian vault

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-02 12:07:54 +02:00
thomas.kopp 1cfb9c127b fix: use vault+file URI format for Obsidian, more reliable than path=
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-02 11:57:47 +02:00
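For reference, the vault+file form mentioned in 1cfb9c127b can be built as in the sketch below. The helper name obsidian_uri and the example vault and file names are assumptions; the only thing taken from Obsidian itself is the obsidian://open?vault=...&file=... shape, with values percent-encoded (including "/" as %2F).

```python
from urllib.parse import quote


def obsidian_uri(vault: str, file: str) -> str:
    """Build an obsidian://open URI from a vault name and a vault-relative file path."""
    # safe="" makes quote() encode "/" as %2F as well as spaces as %20.
    return f"obsidian://open?vault={quote(vault, safe='')}&file={quote(file, safe='')}"


# Hypothetical vault name and note path, for illustration only.
uri = obsidian_uri("My Vault", "transkriptor/2026-04-02 Meeting.md")
```

Addressing the note by vault name and relative path avoids depending on the absolute filesystem location, which is presumably why the commit calls it more reliable than path=.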
thomas.kopp fe8b8bb125 fix: auto-include transkript/zusammenfassung siblings when copying index to Obsidian vault
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-02 11:47:29 +02:00
thomas.kopp ca10cbb20b fix: call obsidian binary directly instead of xdg-open for URI handling
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-02 11:42:23 +02:00
thomas.kopp 180fe43df7 fix: handle pyannote 4.x DiarizeOutput wrapper in diarize()
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-02 11:34:02 +02:00
thomas.kopp 8ee11a31a1 fix: use token= instead of use_auth_token= for pyannote Pipeline.from_pretrained
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-02 11:15:14 +02:00
thomas.kopp 06f7361004 feat: write 3 files per solo recording (index + transkript + zusammenfassung)
- pipeline: call write_solo_docs() instead of save_transcript(); broadcast paths dict
- router: /open accepts paths list for Obsidian mode, copies all 3 files to vault
- app.js: store _modalPaths from saved event; Obsidian button sends all paths
- tests: test_write_solo_docs_creates_three_files added

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-02 11:10:28 +02:00
thomas.kopp a37e09fb4e feat: copy transcript to Obsidian vault on open
Config: adds obsidian.vault (path to the vault). On Obsidian button click, the file is
copied to the vault directory and then opened via an obsidian:// URI. The vault path is configurable in settings.
2026-04-02 11:00:55 +02:00
thomas.kopp 6f718f0753 feat: add Obsidian open button; fix folder button using dolphin --select 2026-04-02 10:55:19 +02:00
thomas.kopp 348ce332c7 feat: add folder button to transcript modal 2026-04-02 10:47:08 +02:00
thomas.kopp 7e0851fc95 fix: pass whisper backend to solo pipeline transcribe_file call 2026-04-02 09:18:20 +02:00
thomas.kopp 11dee75ab3 fix: record at 48000 Hz — PipeWire virtual sinks reject 16 kHz resampling
Whisper and faster-whisper both handle arbitrary sample rates internally.
2026-04-02 09:14:34 +02:00
thomas.kopp b4e7e08918 fix: update audio devices test to mock sounddevice instead of pactl 2026-04-02 07:52:34 +02:00
thomas.kopp 04b655e664 fix: use sounddevice names for audio device list and combined source
- /audio/devices now returns sounddevice device names (not pactl source names)
  so the stored device name works directly with sd.InputStream
- /audio/combined maps sounddevice names back to pactl source names via
  description matching for the loopback commands
- Combined sink description set to 'transkriptor-combined' (no spaces) so
  sounddevice name matches the value stored in config
- Add _pactl_source_for_sd_name() helper for the mapping
2026-04-02 07:51:42 +02:00
thomas.kopp 251f9c238d fix: restore PipeWire combined source automatically on startup
Save mic/monitor device names to pipewire-modules.json alongside module IDs.
On startup, recreate transkriptor-combined if not already loaded.
2026-04-02 01:46:19 +02:00
thomas.kopp 1a61b53027 fix: serve /settings without auth header — JS handles token check 2026-04-02 01:38:17 +02:00
thomas.kopp c7cad4bb2a feat: add whisper.cpp ROCm backend support for AMD GPU acceleration
- transcription.py: new _transcribe_remote_whispercpp() using /inference endpoint
- transcription.py: backend param routes to openai or whispercpp remote path
- config.py: whisper.backend default 'openai', alt 'whispercpp'
- pipeline.py: passes backend from config to transcribe_file
- settings: backend dropdown (OpenAI-compat / whisper.cpp)
- SETUP.md: whisper.cpp ROCm build and systemd setup instructions

whisper-cpp-server running on beastix :8080 (ROCm0, gfx1030, RX 6800 XT)
2026-04-02 01:33:32 +02:00
thomas.kopp 56d41b8620 docs: add HuggingFace diarization setup instructions to SETUP.md 2026-04-02 01:18:55 +02:00
thomas.kopp 5f384af6cf feat: add diarization section to settings page
Adds a "Diarisierung" section with an enabled/disabled toggle,
HuggingFace token input, and a help link to pyannote/speaker-diarization-3.1.
loadConfig() and the save handler now persist diarization settings.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-02 01:18:26 +02:00
thomas.kopp 0eb85b98f1 feat: add frontend speaker naming card for diarization
Shows a card with excerpt navigation and name inputs when the backend
emits speakers_unknown. Submitting posts the mapping to /speakers or
leaves speakers anonymous; handles awaiting_speakers status label.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-02 01:17:23 +02:00
thomas.kopp e04816fce6 feat: meeting pipeline — parallel diarization, speaker ID, 3-doc output 2026-04-02 01:13:24 +02:00
thomas.kopp 37e432f7fa feat: POST /speakers — resolves pipeline pause with speaker name mapping 2026-04-02 01:07:41 +02:00
thomas.kopp dbb35ce71d feat: AppState gains speaker pause fields and AWAITING_SPEAKERS status 2026-04-02 01:06:30 +02:00
thomas.kopp 033c1fc486 feat: write_meeting_docs() — creates index, transkript, zusammenfassung 2026-04-02 01:05:07 +02:00
thomas.kopp 9b5b89e159 feat: OllamaClient.identify_speakers() and summarize() for diarization pipeline 2026-04-02 01:03:40 +02:00
thomas.kopp b8cc8a3b33 feat: align_segments() — map Whisper timestamps to pyannote speakers 2026-04-02 01:00:58 +02:00
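One plausible way to map Whisper timestamps to diarization speakers is to give each segment the speaker whose turn overlaps it the longest. The sketch below assumes that rule; the log does not say how align_segments() actually decides. The Turn dataclass, the segment dict layout and the function name align_by_overlap are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Turn:
    """One diarization turn: a speaker label active between start and end (seconds)."""
    start: float
    end: float
    speaker: str


def align_by_overlap(segments: list[dict], turns: list[Turn]) -> list[dict]:
    """Attach to each segment the speaker with the largest time overlap (None if no overlap)."""
    aligned = []
    for seg in segments:
        best_speaker: Optional[str] = None
        best_overlap = 0.0
        for turn in turns:
            # Length of the intersection of the two intervals; negative when disjoint.
            overlap = min(seg["end"], turn.end) - max(seg["start"], turn.start)
            if overlap > best_overlap:
                best_speaker, best_overlap = turn.speaker, overlap
        aligned.append({**seg, "speaker": best_speaker})
    return aligned


# Illustrative data only.
segments = [
    {"start": 0.0, "end": 2.0, "text": "hello"},
    {"start": 2.0, "end": 5.0, "text": "hi there"},
    {"start": 9.0, "end": 10.0, "text": "bye"},
]
turns = [Turn(0.0, 2.5, "SPEAKER_00"), Turn(2.5, 6.0, "SPEAKER_01")]
aligned = align_by_overlap(segments, turns)
```

Segments that fall entirely outside every turn keep speaker=None, so a later step can decide whether to drop them or label them as unknown.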
thomas.kopp 1a9d0eacc2 feat: Diarizer class wrapping pyannote/speaker-diarization-3.1 2026-04-02 00:59:50 +02:00
thomas.kopp 47909637a8 feat: transcribe_file returns timestamped segments when with_segments=True 2026-04-02 00:55:53 +02:00
thomas.kopp 7dfc0e0c5f feat: add diarization config defaults (enabled=false, hf_token) 2026-04-02 00:53:53 +02:00
thomas.kopp 7cd6c2a848 docs: diarization implementation plan (13 tasks) 2026-04-02 00:50:57 +02:00
thomas.kopp 8d1af32ef3 docs: diarization + speaker identification design 2026-04-02 00:46:18 +02:00
thomas.kopp 80ce1aa77c docs: add setup guide for Beastix server and client installation 2026-04-02 00:01:05 +02:00
thomas.kopp 52ba53bec4 fix: validate Ollama URL protocol before fetching api/tags 2026-04-01 20:51:23 +02:00
thomas.kopp 0bdc0a5e42 feat: settings page — PipeWire audio device + remote Whisper/Ollama config
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-01 20:48:56 +02:00
thomas.kopp 81fbbfb56e feat: status includes is_admin, gear icon in header for admins
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-01 20:45:33 +02:00
thomas.kopp 2376bf5d71 fix: PUT /config deep-merges nested config instead of shallow update
Replaces cfg.update(body) with _deep_merge so partial updates (e.g.
setting whisper.base_url) no longer wipe sibling keys. Also persists
the merged config back to disk via tomli_w. Adds test_put_config_deep_merges.
2026-04-01 20:40:40 +02:00
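The difference between the shallow update and the deep merge in 2376bf5d71 can be illustrated with a minimal sketch. The project's _deep_merge is not shown in the log, so deep_merge below is a hypothetical stand-in, and the config values (including the URL) are made up for the example.

```python
from typing import Any


def deep_merge(base: dict[str, Any], patch: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict with patch merged recursively into base; base is not mutated."""
    merged = dict(base)  # shallow copy; nested dicts are replaced below, never edited in place
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


cfg = {
    "whisper": {"base_url": "", "backend": "openai"},
    "audio": {"device": "default"},
}
body = {"whisper": {"base_url": "http://localhost:8080"}}

# Shallow update: the whole "whisper" table is replaced, so "backend" disappears.
shallow = dict(cfg)
shallow.update(body)

# Deep merge: only "base_url" changes, the sibling key "backend" survives.
merged = deep_merge(cfg, body)
```

Returning a new dict rather than mutating the existing config keeps the old state intact if writing the merged result to disk fails.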
thomas.kopp ff68827280 fix: module_ids as integers in response, add 403 test for POST /audio/combined 2026-04-01 20:38:43 +02:00
thomas.kopp 478a1ac9d0 feat: GET /audio/devices, POST /audio/combined — PipeWire source management 2026-04-01 20:36:27 +02:00
thomas.kopp ef4aa2a840 feat: AudioRecorder accepts device param — reads audio.device from config
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-01 20:32:44 +02:00
thomas.kopp 5e7faa8844 fix: use get_running_loop() instead of deprecated get_event_loop() 2026-04-01 20:30:06 +02:00
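As a small illustration of the change in 5e7faa8844: asyncio.get_running_loop() returns the loop that is executing the current coroutine and raises RuntimeError when called outside one, whereas get_event_loop() can emit a DeprecationWarning in recent Python versions when no loop is running. The functions blocking_transcribe and transcribe_async are hypothetical placeholders, not code from this repository.

```python
import asyncio


def blocking_transcribe(path: str) -> str:
    """Placeholder for a blocking call such as a transcription request."""
    return f"transcript of {path}"


async def transcribe_async(path: str) -> str:
    # Inside a coroutine a loop is always running, so this call cannot fail here.
    loop = asyncio.get_running_loop()
    # Run the blocking function in the default thread pool without stalling the loop.
    return await loop.run_in_executor(None, blocking_transcribe, path)


result = asyncio.run(transcribe_async("a.wav"))

# Outside a coroutine there is no running loop, and the call says so explicitly.
try:
    asyncio.get_running_loop()
    outside_ok = True
except RuntimeError:
    outside_ok = False
```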
thomas.kopp 8300851e77 feat: remote Whisper via whisper.base_url — OpenAI-compatible upload 2026-04-01 20:28:31 +02:00
thomas.kopp 912b333124 feat: add audio.device and whisper.base_url to config defaults 2026-04-01 20:25:48 +02:00
thomas.kopp 3f9abc6a89 docs: settings page + remote whisper design 2026-04-01 20:11:38 +02:00
thomas.kopp d8c6fc790b fix: define _guest_user() for tray/hotkey-triggered recording 2026-04-01 16:00:33 +02:00
thomas.kopp ccdc75c74c feat: show date and time in transcript list items 2026-04-01 14:40:01 +02:00
thomas.kopp b74147967b feat: tüit logo in header, clean transcript item layout with grouped action buttons 2026-04-01 14:37:03 +02:00
thomas.kopp 2ab6e7d73b fix: move reprocess button to transcript list item, remove from modal 2026-04-01 14:30:28 +02:00