feat: add whisper.cpp ROCm backend support for AMD GPU acceleration

- transcription.py: new _transcribe_remote_whispercpp() using /inference endpoint
- transcription.py: backend param routes to openai or whispercpp remote path
- config.py: whisper.backend default 'openai', alt 'whispercpp'
- pipeline.py: passes backend from config to transcribe_file
- settings: backend dropdown (OpenAI-compat / whisper.cpp)
- SETUP.md: whisper.cpp ROCm build and systemd setup instructions

whisper-cpp-server running on beastix:8080 (ROCm0, gfx1030, RX 6800 XT)
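
Quick smoke test of the new remote path (a sketch: the WAV path is a placeholder; form fields per the whisper.cpp server README):

```bash
# POST audio to the whisper.cpp server; returns JSON like {"text": "..."}
# /tmp/sample.wav is a placeholder. The server's --convert flag resamples
# non-16kHz input (needs ffmpeg on the server side).
curl http://beastix:8080/inference \
  -F file=@/tmp/sample.wav \
  -F temperature=0.0 \
  -F response_format=json
```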
2026-04-02 01:33:32 +02:00
parent 56d41b8620
commit c7cad4bb2a
6 changed files with 75 additions and 19 deletions
SETUP.md (+29 -18)
@@ -20,34 +20,41 @@ Settings page.
 ## Beastix (server setup, one-time)
-### 1. Install faster-whisper-server
+### 1. Build whisper.cpp with ROCm/GPU support
+Prerequisite: ROCm installed (Arch: `sudo pacman -S rocm-hip-sdk`).
 ```bash
-sudo pacman -S python-pipx  # Arch Linux
-pipx install faster-whisper-server
-pipx ensurepath
+mkdir -p ~/src && cd ~/src
+git clone https://github.com/ggml-org/whisper.cpp.git --depth=1
+cd whisper.cpp
+# For AMD RX 6800 XT (gfx1030); adjust the gfx target if needed
+cmake -B build -DGGML_HIP=ON -DAMDGPU_TARGETS=gfx1030 -DCMAKE_BUILD_TYPE=Release -DWHISPER_BUILD_SERVER=ON
+cmake --build build -j$(nproc)
+# Download the large-v3 model (~2.9 GB)
+bash models/download-ggml-model.sh large-v3
 ```
-**Known bug in version 0.0.2**: missing `pyproject.toml` in the pipx venv:
-```bash
-cat > ~/.local/share/pipx/venvs/faster-whisper-server/lib/python*/site-packages/pyproject.toml << 'EOF'
-[project]
-name = "faster-whisper-server"
-version = "0.0.2"
-EOF
-```
+`gfx1030` = RX 6800 XT. Other AMD GPUs: `rocminfo | grep gfx`
 ### 2. Set up as a systemd user service
 ```bash
-cat > ~/.config/systemd/user/faster-whisper-server.service << 'EOF'
+cat > ~/.config/systemd/user/whisper-cpp-server.service << 'EOF'
 [Unit]
-Description=faster-whisper-server (OpenAI-compatible Whisper API)
+Description=whisper.cpp Server (ROCm/GPU)
 After=network.target
 [Service]
-ExecStart=%h/.local/bin/faster-whisper-server --host 0.0.0.0 --port 8000 --model large-v3
+ExecStart=%h/src/whisper.cpp/build/bin/whisper-server \
+  --host 0.0.0.0 \
+  --port 8080 \
+  --model %h/src/whisper.cpp/models/ggml-large-v3.bin \
+  --language de \
+  --threads 4 \
+  --convert
 Restart=on-failure
 RestartSec=5
@@ -56,9 +63,12 @@ WantedBy=default.target
 EOF
 systemctl --user daemon-reload
-systemctl --user enable --now faster-whisper-server.service
+systemctl --user enable --now whisper-cpp-server.service
 ```
+Check the logs: `journalctl --user -u whisper-cpp-server -f`
+GPU usage is confirmed when the logs contain `using ROCm0 backend`
 ### 3. Install Ollama (if not already present)
 ```bash
@@ -105,7 +115,8 @@ Log in as admin → gear icon in the header → Settings:
 | Field | Value (example) |
 |------|-----------------|
-| Whisper Server URL | `http://beastix:8000` |
+| Whisper Backend | `whisper.cpp Server` |
+| Whisper Server URL | `http://beastix:8080` |
 | Whisper Model | `large-v3` |
 | Ollama Server URL | `http://beastix:11434` |
 | Ollama Model | `gemma3:12b` (select from dropdown) |
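
To sanity-check the values above from a client machine, a minimal reachability test (assumes `curl` on the client; any HTTP status from the whisper port proves the service is up, and Ollama's `/api/tags` lists installed models):

```bash
# Whisper server: any HTTP status code means the port is reachable
curl -s -o /dev/null -w 'whisper-cpp-server: HTTP %{http_code}\n' http://beastix:8080/
# Ollama: /api/tags returns the installed models (should include gemma3:12b)
curl -s http://beastix:11434/api/tags
```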