17. Troubleshooting
Issues observed during development and how they were resolved. Use as a checklist when something breaks.
Setup
llama-cpp-python installs CPU-only.
Force the CUDA wheel:
venvs\venv_main\Scripts\pip install llama-cpp-python `
--extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu121
GPT-SoVITS server fails to start with import errors.
Make sure PYTHONPATH includes both vendor\GPT-SoVITS and
vendor\GPT-SoVITS\GPT_SoVITS before launching api_v2.py. The launch
script handles this; if running manually, replicate it (commit
857f3fb).
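For a manual launch, the PYTHONPATH setup described above can be sketched as follows. This is a minimal illustration, not the launch script itself: the helper names `build_sovits_env` and `launch_sovits` are hypothetical, api_v2.py is assumed to sit at the top of vendor\GPT-SoVITS, and any command-line arguments the server needs are omitted.

```python
import os
import subprocess
import sys
from pathlib import Path


def build_sovits_env(repo_root, base_env=None):
    """Return a copy of the environment with both GPT-SoVITS dirs on PYTHONPATH."""
    root = Path(repo_root)
    extra = [
        str(root / "vendor" / "GPT-SoVITS"),
        str(root / "vendor" / "GPT-SoVITS" / "GPT_SoVITS"),
    ]
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("PYTHONPATH", "")
    # Put the vendored directories first, then keep whatever was already set.
    parts = extra + ([existing] if existing else [])
    env["PYTHONPATH"] = os.pathsep.join(parts)
    return env


def launch_sovits(repo_root, python_exe=sys.executable):
    """Start api_v2.py with the adjusted environment (script location is assumed)."""
    cwd = Path(repo_root) / "vendor" / "GPT-SoVITS"
    return subprocess.Popen(
        [python_exe, "api_v2.py"], cwd=cwd, env=build_sovits_env(repo_root)
    )
```

Building the environment separately from the launch makes it easy to print and compare PYTHONPATH against what the launch script sets when imports still fail.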
Port 9880 already in use.
The launch script skips starting a new SoVITS server in that case.
Either stop the existing process or pick a different port via tts.api_port.
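To confirm whether something is already listening before changing tts.api_port, a quick probe such as the one below can help. It is a generic TCP check using only the standard library; the function name `port_in_use` is hypothetical, and the host is assumed to be 127.0.0.1.

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout=0.5):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    print("9880 in use:", port_in_use(9880))
```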
Runtime
STT prints repeated phrases or boilerplate.
Two layers exist: edit config/filters.txt to add literal hallucination
phrases (commit 890b782), and trust the adjacent-dedup / short-phrase
repeat suppression in vocal10n.stt.filters (commit d6116c5). If
loops persist, check that stt.beam_size = 1 and that input audio is
not being clipped.
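The two layers can be pictured with the sketch below. It is not the code in vocal10n.stt.filters: the function names are hypothetical, config/filters.txt is assumed to hold one literal phrase per line, and phrase matching is assumed to be a case-insensitive substring test.

```python
def load_phrases(text):
    """Parse filter-file contents: one literal phrase per line, blanks ignored."""
    return [line.strip().lower() for line in text.splitlines() if line.strip()]


def normalize(segment):
    """Lowercase and collapse whitespace so trivial variations compare equal."""
    return " ".join(segment.lower().split())


def filter_segments(segments, blocked_phrases):
    """Drop segments containing a blocked phrase, then collapse adjacent repeats."""
    kept = []
    for seg in segments:
        norm = normalize(seg)
        if any(phrase in norm for phrase in blocked_phrases):
            continue  # layer 1: literal hallucination phrase
        if kept and normalize(kept[-1]) == norm:
            continue  # layer 2: same text as the previous kept segment
        kept.append(seg)
    return kept
```

Running recent STT output through a function like this offline is a cheap way to check whether a new line in the filter file would catch a recurring phrase.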
Subtitles confirm too late or split mid-sentence.
Tune stt.max_segment_age (raised from 2 s to 4 s in commit 53e3cbe).
Lower it for snappier confirmation; raise it for cleaner sentence
boundaries.
Audio device selection is flaky / duplicates appear.
PortAudio reports the same device under multiple host APIs. The UI
deduplicates per-name (commit 74be837). If a previously-used device
fails, re-select it from the dropdown to retrigger device opening.
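Per-name deduplication can be sketched as below. This is an illustration, not the UI code from commit 74be837: the function name is hypothetical, and the input is assumed to be a list of dicts shaped like the entries returned by sounddevice.query_devices() (only the 'name' and 'max_input_channels' keys are used).

```python
def dedupe_input_devices(devices):
    """Keep the first input-capable entry for each device name.

    Returns a list of (index, name) pairs, where index is the position
    of the entry in the original device list.
    """
    seen = set()
    unique = []
    for index, dev in enumerate(devices):
        if dev.get("max_input_channels", 0) <= 0:
            continue  # output-only device
        name = dev["name"].strip()
        if name in seen:
            continue  # same device reported by another host API
        seen.add(name)
        unique.append((index, name))
    return unique
```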
TTS produces no audio but the queue drains.
Confirm tts.ref_audio_path resolves to an existing file. The client
uses absolute path resolution (commit 4a19037); a typo or moved file
fails silently in older builds.
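A quick way to rule out a bad path is to resolve it and check for the file explicitly. The sketch below assumes relative paths are taken relative to a base directory you pass in; the function name `resolve_ref_audio` is hypothetical and is not the client's own code.

```python
from pathlib import Path


def resolve_ref_audio(ref_audio_path, base_dir="."):
    """Return the absolute path of the reference audio, or raise if it is missing."""
    path = Path(ref_audio_path).expanduser()
    if not path.is_absolute():
        path = Path(base_dir) / path
    path = path.resolve()
    if not path.is_file():
        # Fail loudly instead of letting synthesis produce silence.
        raise FileNotFoundError(f"tts.ref_audio_path does not exist: {path}")
    return path
```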
TTS first response is slow.
Warm-up runs on server start in a background thread (commits 0f22fcc,
142f915). If you hear cold-start latency, watch the logs to confirm
the warm-up call returned successfully before user-driven synthesis.
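The pattern of warming up in a background thread and checking that it finished can be sketched as follows. This is a generic illustration and not the server's implementation: the `WarmUp` class is hypothetical, and `warm_up_fn` stands in for whatever call performs the first synthesis.

```python
import threading


class WarmUp:
    """Run a warm-up callable in a background thread and record the outcome."""

    def __init__(self, warm_up_fn):
        self._fn = warm_up_fn
        self.done = threading.Event()
        self.error = None

    def start(self):
        thread = threading.Thread(target=self._run, daemon=True)
        thread.start()
        return thread

    def _run(self):
        try:
            self._fn()
        except Exception as exc:  # keep the failure so callers can log it
            self.error = exc
        finally:
            self.done.set()

    def ready(self, timeout=None):
        """True only if warm-up finished within the timeout without raising."""
        return self.done.wait(timeout) and self.error is None
```

Checking `ready()` before the first user-driven request distinguishes "warm-up still running" from "warm-up failed", which look the same from the outside when only latency is observed.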
Qwen3-TTS server output looks corrupted.
Stdout is reserved for the binary protocol and stderr is drained
separately (commit 2311042). Do not print() to stdout from
qwen3_server.py.
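One way to keep diagnostics off stdout is to route all logging to stderr and write protocol bytes through a single helper. The sketch below illustrates that separation only; the helper names are hypothetical and the actual frame format used by qwen3_server.py is not shown here.

```python
import logging
import sys


def make_stderr_logger(name="qwen3_server"):
    """Return a logger whose only handler writes to stderr, never stdout."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid root handlers that might target stdout
    if not logger.handlers:
        handler = logging.StreamHandler(sys.stderr)
        handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger


def write_frame(payload, stream=None):
    """Write raw bytes to the binary channel (stdout by default) and flush."""
    out = sys.stdout.buffer if stream is None else stream
    out.write(payload)
    out.flush()
```

If a quick debug message is unavoidable, `print(..., file=sys.stderr)` is the safe form.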
Echo loop: TTS gets re-transcribed by STT.
Ensure aec.enabled = true. If the room has very long reverb, raise
aec.filter_taps (default 2048, i.e. 128 ms at 16 kHz). If user speech is
being attenuated during TTS playback, the double-talk threshold may be
too low; raise aec.dt_threshold (commit f334773).
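The relation between filter length and echo tail is simple arithmetic: taps divided by sample rate. The helpers below are a hypothetical convenience for picking a value; whether the AEC implementation requires a particular tap count (for example a power of two) is not stated here, so treat the result as a lower bound.

```python
import math


def taps_to_ms(filter_taps, sample_rate_hz=16000):
    """Echo tail covered by an adaptive filter of the given length, in ms."""
    return filter_taps * 1000.0 / sample_rate_hz


def taps_for_tail(tail_ms, sample_rate_hz=16000):
    """Smallest tap count that covers an echo tail of tail_ms milliseconds."""
    return math.ceil(tail_ms * sample_rate_hz / 1000.0)


if __name__ == "__main__":
    print(taps_to_ms(2048))      # default filter length
    print(taps_for_tail(250))    # taps needed for a 250 ms tail
```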
UI
Combobox arrows or buttons are cut off / invisible.
Resolved across commits 46675dc through 2e3bcef. If a custom theme
re-introduces the issue, replicate the explicit min sizes used in
stt_tab.py's model selector.
Section A looks wrong after switching to Simple mode.
Pro and Simple modes have separate Section A layouts (commit e96bcb9).
Toggling the mode should rebuild the section; if it does not, restart
the app.
Simple mode "Start All" hangs.
Each stage has its own timeout. Check the status pill that is stuck and
inspect the logs; Qwen3-TTS warm-up has the longest budget (commit
81cedf4). The panel rolls back automatically on timeout.
Files / Output
No files appear in output/.
The file writer is created only if at least one output.* flag is true.
Check the Output tab and verify output.directory is writable.
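Both conditions can be checked from a Python prompt with something like the sketch below. The function names are hypothetical, and the flag dictionary is an assumed stand-in for however the output.* settings are stored; the writability test simply tries to create a temporary file in the directory.

```python
import tempfile
from pathlib import Path


def writer_needed(output_flags):
    """True if at least one output flag is enabled (flag names are assumed)."""
    return any(bool(value) for value in output_flags.values())


def directory_is_writable(directory):
    """True if the directory exists and a file can be created inside it."""
    path = Path(directory)
    if not path.is_dir():
        return False
    try:
        # The temporary file is removed automatically when the block exits.
        with tempfile.TemporaryFile(dir=path):
            pass
    except OSError:
        return False
    return True
```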
SRT timestamps look wrong.
Timestamps come from the latency tracker / segment times, not wall-clock.
If you start the session significantly after the first mic input, expect
a negative-looking offset relative to wall-clock, but within-file timing
should still be correct.
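When checking a suspicious file, it helps to recompute what a cue should look like from session-relative segment times. The sketch below formats seconds since session start in the standard SRT form HH:MM:SS,mmm; the function names are hypothetical and this is not the project's file writer.

```python
def srt_timestamp(seconds):
    """Format session-relative seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = max(0, round(seconds * 1000))  # clamp negatives to zero
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def srt_cue(index, start_s, end_s, text):
    """Build one SRT cue block from segment start/end times in seconds."""
    return f"{index}\n{srt_timestamp(start_s)} --> {srt_timestamp(end_s)}\n{text}\n"
```

Because the inputs are offsets from the start of the session rather than clock times, the first cue of a file may start well after 00:00:00 without anything being wrong.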