17 - Troubleshooting

Issues observed during development and how they were resolved. Use as a checklist when something breaks.

Setup

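llama-cpp-python installs a CPU-only build by default. Force the CUDA 12.1 (cu121) wheel from PowerShell; if a CPU-only build is already installed, also pass --force-reinstall --no-cache-dir so pip does not reuse it: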

venvs\venv_main\Scripts\pip install llama-cpp-python `
  --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu121

GPT-SoVITS server fails to start with import errors. Make sure PYTHONPATH includes both vendor\GPT-SoVITS and vendor\GPT-SoVITS\GPT_SoVITS before launching api_v2.py. The launch script handles this; if running manually, replicate it (commit 857f3fb).
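
For a manual launch, the environment setup might look like the sketch below. It is not the launch script itself: sovits_env is a hypothetical helper, and it assumes the vendor directory sits directly under the repository root.

```python
import os
from pathlib import Path


def sovits_env(repo_root, base_env=None):
    """Return a copy of the environment with both GPT-SoVITS dirs on PYTHONPATH.

    Hypothetical helper; the directory layout is assumed from the text above.
    """
    env = dict(os.environ if base_env is None else base_env)
    vendor = Path(repo_root) / "vendor" / "GPT-SoVITS"
    entries = [str(vendor), str(vendor / "GPT_SoVITS")]
    existing = env.get("PYTHONPATH")
    if existing:
        # Keep whatever was already configured, after the two required dirs.
        entries.append(existing)
    env["PYTHONPATH"] = os.pathsep.join(entries)
    return env
```

Pass the returned mapping as env= to subprocess.Popen (or subprocess.run) when starting api_v2.py by hand.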

Port 9880 already in use. The launch script skips starting a new SoVITS server in that case. Either stop the existing process or pick a different port via tts.api_port.
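
To confirm whether something is already listening before changing tts.api_port, a quick probe along these lines can help. This is a sketch using only the standard library; port_in_use is a hypothetical name, the loopback host is an assumption, and the launch script's own check may work differently.

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout=0.5):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0
```

Calling port_in_use(9880) before launching tells you whether the skip path described above will be taken.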

Runtime

STT prints repeated phrases or boilerplate. Two layers exist: edit config/filters.txt to add literal hallucination phrases (commit 890b782), and trust the adjacent-dedup / short-phrase repeat suppression in vocal10n.stt.filters (commit d6116c5). If loops persist, check that stt.beam_size = 1 and that input audio is not being clipped.
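
The two layers can be pictured roughly as follows. This is an illustrative sketch only, not the code in vocal10n.stt.filters: the function name, the case-insensitive comparison, and the exact-match blocklist are all assumptions.

```python
def drop_adjacent_repeats(segments, blocklist=()):
    """Remove blocklisted phrases and consecutive duplicate segments.

    Hypothetical illustration of a literal-phrase filter plus adjacent dedup.
    """
    blocked = {phrase.strip().lower() for phrase in blocklist}
    kept = []
    for text in segments:
        norm = text.strip().lower()
        if not norm or norm in blocked:
            continue  # layer 1: empty text or a known hallucination phrase
        if kept and kept[-1].strip().lower() == norm:
            continue  # layer 2: same text as the previously kept segment
        kept.append(text)
    return kept
```

A phrase that repeats with other text in between is deliberately kept here; only back-to-back repeats are dropped.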

Subtitles confirm too late or split mid-sentence. Tune stt.max_segment_age (raised from 2 s to 4 s in commit 53e3cbe). Lower it for snappier confirmation; raise it for cleaner sentence boundaries.

Audio device selection is flaky / duplicates appear. PortAudio reports the same device under multiple host APIs. The UI deduplicates per-name (commit 74be837). If a previously-used device fails, re-select it from the dropdown to retrigger device opening.
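
Per-name deduplication can be sketched as below. The dictionaries mimic the shape returned by the python-sounddevice query_devices() call (keys name, hostapi, max_input_channels); whether the UI uses that library is an assumption, and dedupe_by_name is a hypothetical helper rather than the code from commit 74be837.

```python
def dedupe_by_name(devices):
    """Keep the first input-capable entry for each device name.

    Returns (index, name) pairs; the index is the position in the input list.
    """
    seen = set()
    unique = []
    for index, dev in enumerate(devices):
        if dev.get("max_input_channels", 0) < 1:
            continue  # output-only device, not usable as a microphone
        name = dev["name"].strip()
        if name in seen:
            continue  # same device reported again under another host API
        seen.add(name)
        unique.append((index, name))
    return unique
```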

TTS produces no audio but the queue drains. Confirm tts.ref_audio_path resolves to an existing file. The client uses absolute path resolution (commit 4a19037); a typo or moved file fails silently in older builds.
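
A quick way to rule this out is to resolve the configured value yourself and fail loudly if the file is missing. The sketch below is a stand-alone check, not the client's code; resolve_ref_audio and the base_dir parameter are hypothetical.

```python
from pathlib import Path


def resolve_ref_audio(path_value, base_dir="."):
    """Return an absolute path to the reference audio, or raise if it is missing."""
    path = Path(path_value).expanduser()
    if not path.is_absolute():
        # Relative values are interpreted against base_dir (an assumption).
        path = Path(base_dir) / path
    path = path.resolve()
    if not path.is_file():
        raise FileNotFoundError(f"tts.ref_audio_path does not exist: {path}")
    return path
```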

TTS first response is slow. Warm-up runs on server start in a background thread (commits 0f22fcc, 142f915). If you hear cold-start latency, watch the logs to confirm the warm-up call returned successfully before user-driven synthesis.

Qwen3-TTS server output looks corrupted. Stdout is reserved for the binary protocol and stderr is drained separately (commit 2311042). Do not print() to stdout from qwen3_server.py.
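
If you need diagnostics inside the server process, one option is to route them through a logger bound to stderr. This is a minimal sketch with a hypothetical helper name; qwen3_server.py may already configure logging differently.

```python
import logging
import sys


def make_stderr_logger(name="qwen3_server"):
    """Return a logger that writes only to stderr, leaving stdout untouched."""
    logger = logging.getLogger(name)
    if not logger.handlers:
        handler = logging.StreamHandler(sys.stderr)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    # Avoid passing records to the root logger, whose handlers are unknown here.
    logger.propagate = False
    return logger
```

For one-off debugging, print("...", file=sys.stderr) has the same effect of keeping stdout clean.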

Echo loop: TTS gets re-transcribed by STT. Ensure aec.enabled = true. If the room has very long reverb, raise aec.filter_taps (default 2048 ≈ 128 ms at 16 kHz). If user speech is being attenuated during TTS playback, the double-talk threshold may be too low; raise aec.dt_threshold (commit f334773).
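
The relation between tap count and echo tail length is simple arithmetic: taps divided by the sample rate. The sketch below shows it in both directions. The helper names are hypothetical, and rounding up to a power of two is an assumption; nothing above says aec.filter_taps must be one.

```python
import math


def taps_to_ms(taps, sample_rate_hz=16_000):
    """Echo tail length in milliseconds covered by a filter with this many taps."""
    return taps * 1000.0 / sample_rate_hz


def taps_for_tail(tail_ms, sample_rate_hz=16_000):
    """Smallest power-of-two tap count that covers tail_ms (assumed convention)."""
    needed = math.ceil(tail_ms * sample_rate_hz / 1000.0)
    return 1 << max(needed - 1, 0).bit_length()
```

For example, taps_to_ms(2048) gives 128.0, matching the default, and a 250 ms tail at 16 kHz needs 4000 samples, which rounds up to 4096 taps.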

UI

Combobox arrows or buttons are cut off / invisible. Resolved across commits 46675dc through 2e3bcef. If a custom theme re-introduces the issue, replicate the explicit min sizes used in stt_tab.py's model selector.

Section A looks wrong after switching to Simple mode. Pro and Simple modes have separate Section A layouts (commit e96bcb9). Toggling the mode should rebuild the section; if it does not, restart the app.

Simple mode "Start All" hangs. Each stage has its own timeout. Check the status pill that is stuck and inspect the logs; Qwen3-TTS warm-up has the longest budget (commit 81cedf4). The panel rolls back automatically on timeout.

Files / Output

No files appear in output/. File writer is created only if at least one output.* flag is true. Check the Output tab and verify output.directory is writable.
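
Both conditions can be checked in one pass, as in the sketch below. It assumes the output.* settings are available as a flat dictionary whose boolean entries are the flags; the key names used in the usage note are invented for illustration and may not match the real config.

```python
import os


def output_problems(output_cfg):
    """List reasons why nothing would be written to the output directory."""
    problems = []
    # Treat every boolean entry as an output.* flag (an assumption).
    flags = {key: val for key, val in output_cfg.items() if isinstance(val, bool)}
    if not any(flags.values()):
        problems.append("no output.* flag is enabled")
    directory = output_cfg.get("directory", "output")
    if not os.path.isdir(directory):
        problems.append(f"{directory} is not a directory")
    elif not os.access(directory, os.W_OK):
        problems.append(f"{directory} is not writable")
    return problems
```

An empty list means both checks passed, for example with {"directory": "output", "save_srt": True} when output/ exists and is writable.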

SRT timestamps look wrong. Timestamps come from the latency tracker / segment times, not wall-clock. If you start the session significantly after the first mic input, expect a negative-looking offset relative to wall-clock, but within-file timing should still be correct.
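
For reference, an SRT timestamp is an offset from the start of the file in HH:MM:SS,mmm form, so it carries no wall-clock information. The sketch below shows that conversion from session-relative seconds. It is not the project's file writer; srt_timestamp is a hypothetical helper, and clamping negative offsets to zero is a choice made here.

```python
def srt_timestamp(seconds):
    """Format a session-relative offset in seconds as HH:MM:SS,mmm."""
    # SRT cannot represent negative times, so clamp at zero (a choice made here).
    total_ms = max(0, round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"
```

For instance, srt_timestamp(3661.5) yields "01:01:01,500" regardless of what time of day the session ran.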