# 04 – Environments and Setup
## Why Multiple Virtual Environments
GPT-SoVITS pins a number of older ML packages (torch, transformers,
librosa, etc.) that conflict with the more recent stack used by
faster-whisper and llama-cpp-python. Forcing both into one environment
is brittle, so the project uses two mandatory venvs plus an optional
third for the Qwen3-TTS backend:
| Venv | Python | Purpose | Created by |
|---|---|---|---|
| `venvs/venv_main` | 3.11 | Vocal10n app: UI, STT, LLM, pipeline | `setup_env.ps1` |
| `venvs/venv_tts` | 3.11 | GPT-SoVITS HTTP server (`api_v2.py`) | `setup_env.ps1` |
| `venvs/venv_qwen3tts` | 3.11 | Qwen3-TTS HTTP server | manual |
The launch script (`start.ps1`) starts the GPT-SoVITS server from
`venv_tts` and the main application from `venv_main`. The two communicate
over `127.0.0.1:9880` (TTS) and `127.0.0.1:5124` (OBS overlay).
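The split is easy to sanity-check at runtime. A minimal sketch (assuming only the two loopback ports named above; `port_open` and `SERVICES` are illustrative names, not part of the project):

```python
import socket

def port_open(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if something is listening on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Ports used by the two processes started from start.ps1.
SERVICES = {
    "GPT-SoVITS TTS": ("127.0.0.1", 9880),
    "OBS overlay": ("127.0.0.1", 5124),
}

if __name__ == "__main__":
    for name, (host, port) in SERVICES.items():
        state = "up" if port_open(host, port) else "down"
        print(f"{name} ({host}:{port}): {state}")
```

Running this before and after `start.ps1` quickly shows whether the TTS server actually came up.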
## One-Time Setup
Prerequisites:
- Windows 10/11.
- Python 3.11 reachable via `py -3.11`.
- NVIDIA driver + CUDA 12.x runtime libraries.
- Visual C++ build tools (required by `llama-cpp-python` if it falls back to a source build).
Steps:
```powershell
git clone https://github.com/itsLittleKevin/Vocal10n.git
cd Vocal10n

# Creates venvs/venv_main and venvs/venv_tts and installs requirements
.\setup_env.ps1

# Drop models into:
#   models/llm/Qwen3-4B-Instruct-2507.Q4_K_M.gguf
#   vendor/GPT-SoVITS/GPT_SoVITS/pretrained_models/...
#   reference_audio/<your_clone_sample>.wav (+ matching .txt)
```
If `llama-cpp-python` fails to install with CUDA support, install it
explicitly from the matching wheel index:

```powershell
venvs\venv_main\Scripts\pip install llama-cpp-python `
  --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu121
```
The setup script accepts:

- `-SkipMain` / `-SkipTTS` to update only one venv.
- `-Force` to wipe and recreate.
## Models You Need
| File | Path | Notes |
|---|---|---|
| Whisper large-v3-turbo | downloaded on first use into `~/.cache/huggingface` | controlled by `stt.model_size` |
| Qwen3-4B GGUF | `models/llm/Qwen3-4B-Instruct-2507.Q4_K_M.gguf` | path is in `translation.model_path` |
| GPT-SoVITS pretrained | `vendor/GPT-SoVITS/GPT_SoVITS/pretrained_models/` | follow upstream README |
| Reference WAV + text | `reference_audio/` and `vendor/GPT-SoVITS/reference_audio/` | path in `tts.ref_audio_path` |
## Launching

```powershell
.\start.ps1
```
The script:
- Verifies `venv_main` exists.
- If port 9880 is free and `vendor/GPT-SoVITS/api_v2.py` exists, starts the SoVITS server hidden in the background and saves the PID.
- Sets `PYTHONPATH` to include both `vendor/GPT-SoVITS/` and `vendor/GPT-SoVITS/GPT_SoVITS/` so the API module can resolve its relative imports.
- Runs `python -m vocal10n.app` from `venv_main`.
- On exit, terminates the SoVITS process if it is still running.
`start.bat` is a thin wrapper that invokes the PowerShell script.
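The `PYTHONPATH` handling is the easiest part to get wrong when launching the server by hand. A sketch of how the environment could be assembled in Python, mirroring what `start.ps1` does (assuming the repo layout above; `sovits_env` is an illustrative name):

```python
import os
from pathlib import Path

def sovits_env(repo_root: Path) -> dict[str, str]:
    """Build the environment for api_v2.py: both the vendored repo and
    its inner GPT_SoVITS package must be on PYTHONPATH so the API
    module can resolve its relative imports."""
    sovits = repo_root / "vendor" / "GPT-SoVITS"
    extra = [str(sovits), str(sovits / "GPT_SoVITS")]
    env = dict(os.environ)
    existing = env.get("PYTHONPATH", "")
    env["PYTHONPATH"] = os.pathsep.join(extra + ([existing] if existing else []))
    return env
```

The server process would then be started with this environment, e.g. via `subprocess.Popen(..., env=sovits_env(repo_root))`, which is roughly what the PowerShell script does before spawning `api_v2.py` in the background.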