feat: add whisper.cpp ROCm backend support for AMD GPU acceleration
- transcription.py: new _transcribe_remote_whispercpp() using /inference endpoint
- transcription.py: backend param routes to openai or whispercpp remote path
- config.py: whisper.backend default 'openai', alt 'whispercpp'
- pipeline.py: passes backend from config to transcribe_file
- settings: backend dropdown (OpenAI-compat / whisper.cpp)
- SETUP.md: whisper.cpp ROCm build and systemd setup instructions

whisper-cpp-server running on beastix :8080 (ROCm0, gfx1030, RX 6800 XT)
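The new remote path talks to whisper.cpp's built-in HTTP server. A rough sketch of the `/inference` call the backend wraps (multipart field names follow whisper.cpp's server example; `sample.wav` is a placeholder file, and `beastix:8080` is the host configured below):

```shell
# POST an audio file to whisper.cpp's /inference endpoint and extract the text.
# Assumes the whisper-cpp-server from this commit is reachable at beastix:8080.
curl -s http://beastix:8080/inference \
  -F file=@sample.wav \
  -F temperature=0.0 \
  -F response_format=json \
| jq -r '.text'
```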
## Beastix (server setup, one-time)

### 1. Build whisper.cpp with ROCm/GPU
Prerequisite: ROCm installed (Arch: `sudo pacman -S rocm-hip-sdk`).
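Before configuring the build on a different card, the correct `AMDGPU_TARGETS` value can be read straight from ROCm (a sketch; `rocminfo` ships with the ROCm SDK):

```shell
# List the gfx targets ROCm sees; use the value reported for the discrete GPU
rocminfo | grep -o 'gfx[0-9a-f]*' | sort -u
```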
```bash
mkdir -p ~/src && cd ~/src
git clone https://github.com/ggml-org/whisper.cpp.git --depth=1
cd whisper.cpp

# For AMD RX 6800 XT (gfx1030); adjust the gfx target if needed
cmake -B build -DGGML_HIP=ON -DAMDGPU_TARGETS=gfx1030 -DCMAKE_BUILD_TYPE=Release -DWHISPER_BUILD_SERVER=ON
cmake --build build -j$(nproc)

# Download the large-v3 model (~2.9 GB)
bash models/download-ggml-model.sh large-v3
```
`gfx1030` = RX 6800 XT. For other AMD GPUs: `rocminfo | grep gfx`
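Before wiring up the service, a one-off local run against the sample clip bundled with the repo can confirm that the build and model work (a sketch; the binary is named `whisper-cli` in current whisper.cpp checkouts, `main` in older ones):

```shell
cd ~/src/whisper.cpp
# Transcribe the bundled JFK sample; "using ROCm0 backend" in the
# startup output confirms the GPU path is active
./build/bin/whisper-cli -m models/ggml-large-v3.bin -f samples/jfk.wav
```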
### 2. Set up as a systemd user service

```bash
cat > ~/.config/systemd/user/whisper-cpp-server.service << 'EOF'
[Unit]
Description=whisper.cpp Server (ROCm/GPU)
After=network.target

[Service]
ExecStart=%h/src/whisper.cpp/build/bin/whisper-server \
  --host 0.0.0.0 \
  --port 8080 \
  --model %h/src/whisper.cpp/models/ggml-large-v3.bin \
  --language de \
  --threads 4 \
  --convert
Restart=on-failure
RestartSec=5

[Install]
WantedBy=default.target
EOF

systemctl --user daemon-reload
systemctl --user enable --now whisper-cpp-server.service
```

Check the logs: `journalctl --user -u whisper-cpp-server -f`

GPU usage is confirmed when the logs show `using ROCm0 backend`.
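One caveat with systemd user services on a headless box: they stop when the user's last session ends unless lingering is enabled. This is a standard `loginctl` feature, not specific to this setup:

```shell
# Keep user services running without an active login session
# (may prompt for authentication, depending on polkit policy)
loginctl enable-linger "$USER"
systemctl --user status whisper-cpp-server.service
```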
### 3. Install Ollama (if not already present)

Log in as admin → gear icon in the header → Settings:
| Field | Value (example) |
|-------|-----------------|
| Whisper Backend | `whisper.cpp Server` |
| Whisper Server URL | `http://beastix:8080` |
| Whisper Model | `large-v3` |
| Ollama Server URL | `http://beastix:11434` |
| Ollama Model | `gemma3:12b` (select from dropdown) |
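To verify the Ollama side of the table, Ollama's documented REST API can list the installed models (GET `/api/tags`; `beastix:11434` is the host from the table above):

```shell
# List models available on the Ollama server; gemma3:12b should appear here
curl -s http://beastix:11434/api/tags | jq -r '.models[].name'
```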