04 — Environments and Setup

Why Multiple Virtual Environments

GPT-SoVITS pins older versions of several ML packages (torch, transformers, librosa, etc.) that conflict with the newer stack used by faster-whisper and llama-cpp-python. Forcing both into one environment is brittle, so the project uses two mandatory venvs and an optional third for the Qwen3-TTS backend:

Venv	Python	Purpose	Created by
`venvs/venv_main`	3.11	Vocal10n app: UI, STT, LLM, pipeline	`setup_env.ps1`
`venvs/venv_tts`	3.11	GPT-SoVITS HTTP server (`api_v2.py`)	`setup_env.ps1`
`venvs/venv_qwen3tts`	3.11	Qwen3-TTS HTTP server	manual

The launch script (start.ps1) starts the GPT-SoVITS server from venv_tts and the main application from venv_main. All communication stays on localhost: 127.0.0.1:9880 for TTS and 127.0.0.1:5124 for the OBS overlay.
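When something does not connect, it helps to check whether anything is actually listening on those two ports. The following is a minimal sketch using only the standard library. The addresses come from the paragraph above; the helper name `is_listening` is hypothetical and not part of Vocal10n.

```python
import socket

# Addresses taken from the text above.
TTS_ADDR = ("127.0.0.1", 9880)
OVERLAY_ADDR = ("127.0.0.1", 5124)


def is_listening(addr, timeout=0.5):
    """Return True if something accepts TCP connections at addr."""
    try:
        with socket.create_connection(addr, timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    for name, addr in (("TTS", TTS_ADDR), ("OBS overlay", OVERLAY_ADDR)):
        state = "up" if is_listening(addr) else "not reachable"
        print(f"{name} at {addr[0]}:{addr[1]}: {state}")
```

A successful TCP connect only shows that a process owns the port. It does not confirm that the process is the expected server.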

One-Time Setup

Prerequisites:

Windows 10/11.
Python 3.11 reachable via py -3.11.
NVIDIA driver + CUDA 12.x runtime libraries.
Visual C++ build tools (required by llama-cpp-python if it falls back to a source build).
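Two of these prerequisites can be probed before running the setup script. The sketch below is illustrative only: the function names are hypothetical, and it assumes that `nvidia-smi` being on PATH and exiting successfully is a reasonable proxy for a working NVIDIA driver. It does not check the CUDA runtime libraries or the Visual C++ build tools.

```python
import shutil
import subprocess


def python_311_available():
    """True if the Windows `py` launcher can start a 3.11 interpreter."""
    if shutil.which("py") is None:
        return False
    try:
        result = subprocess.run(
            ["py", "-3.11", "--version"], capture_output=True, text=True
        )
    except OSError:
        return False
    # Older interpreters print the version to stderr, so check both streams.
    return result.returncode == 0 and "3.11" in (result.stdout + result.stderr)


def nvidia_driver_visible():
    """True if nvidia-smi is on PATH and exits successfully."""
    if shutil.which("nvidia-smi") is None:
        return False
    try:
        return subprocess.run(["nvidia-smi"], capture_output=True).returncode == 0
    except OSError:
        return False


if __name__ == "__main__":
    print("Python 3.11 via py launcher:", python_311_available())
    print("NVIDIA driver (nvidia-smi): ", nvidia_driver_visible())
```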

Steps:

git clone https://github.com/itsLittleKevin/Vocal10n.git
cd Vocal10n

# Creates venvs/venv_main and venvs/venv_tts and installs requirements
.\setup_env.ps1

# Drop models into:
#   models/llm/Qwen3-4B-Instruct-2507.Q4_K_M.gguf
#   vendor/GPT-SoVITS/GPT_SoVITS/pretrained_models/...
#   reference_audio/<your_clone_sample>.wav (+ matching .txt)

If llama-cpp-python fails to install with CUDA support, install it explicitly from the prebuilt wheel index that matches your CUDA version (cu121, used here, targets CUDA 12.1):

venvs\venv_main\Scripts\pip install llama-cpp-python `
  --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu121
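To confirm that the installed build can offload to the GPU, a short probe can be run with the venv_main interpreter (venvs\venv_main\Scripts\python). This is a sketch under assumptions: it relies on recent llama-cpp-python releases exposing the low-level `llama_supports_gpu_offload` function, and it returns None when that cannot be confirmed. The wrapper name `gpu_offload_supported` is hypothetical.

```python
def gpu_offload_supported():
    """Return True/False as reported by llama.cpp, or None if unknown."""
    try:
        import llama_cpp
    except (ImportError, OSError, RuntimeError):
        # Package missing, or its native library failed to load.
        return None
    # Assumption: this low-level helper exists in the installed version.
    probe = getattr(llama_cpp, "llama_supports_gpu_offload", None)
    if probe is None:
        return None
    return bool(probe())


if __name__ == "__main__":
    print("GPU offload supported:", gpu_offload_supported())
```

A result of False suggests a CPU-only build was installed, in which case reinstalling from the CUDA wheel index is the likely fix.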

The setup script accepts:

-SkipMain or -SkipTTS to skip that venv and update only the other one.
-Force to wipe and recreate the venvs.

Models You Need

Model / asset	Location	Notes
Whisper large-v3-turbo	downloaded on first use into `~/.cache/huggingface`	controlled by `stt.model_size`
Qwen3-4B GGUF	`models/llm/Qwen3-4B-Instruct-2507.Q4_K_M.gguf`	path is in `translation.model_path`
GPT-SoVITS pretrained	`vendor/GPT-SoVITS/GPT_SoVITS/pretrained_models/`	follow upstream README
Reference WAV + text	`reference_audio/` and `vendor/GPT-SoVITS/reference_audio/`	path in `tts.ref_audio_path`
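A quick pre-flight check can catch missing assets before the first launch. The sketch below uses the paths from the table above, relative to the repository root. The Whisper model is left out because it is downloaded on first use. The function name `missing_assets` and the rule that a directory must be non-empty are assumptions made for illustration, not project behavior.

```python
from pathlib import Path

# Paths copied from the table above, relative to the repository root.
REQUIRED = {
    "Qwen3-4B GGUF": Path("models/llm/Qwen3-4B-Instruct-2507.Q4_K_M.gguf"),
    "GPT-SoVITS pretrained models": Path(
        "vendor/GPT-SoVITS/GPT_SoVITS/pretrained_models"
    ),
    "Reference audio": Path("reference_audio"),
}


def missing_assets(repo_root):
    """Return the labels of required assets not found under repo_root."""
    root = Path(repo_root)
    missing = []
    for label, rel in REQUIRED.items():
        target = root / rel
        if target.is_dir():
            # A directory counts only if it contains at least one entry.
            present = any(target.iterdir())
        else:
            present = target.is_file()
        if not present:
            missing.append(label)
    return missing


if __name__ == "__main__":
    for label in missing_assets("."):
        print("missing:", label)
```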

Launching

.\start.ps1

The script:

Verifies venv_main exists.
If port 9880 is free and vendor/GPT-SoVITS/api_v2.py exists, starts the GPT-SoVITS server as a hidden background process and records its PID.
Sets PYTHONPATH to include both vendor/GPT-SoVITS/ and vendor/GPT-SoVITS/GPT_SoVITS/ so the API module can resolve its relative imports.
Runs python -m vocal10n.app from venv_main.
On exit, terminates the GPT-SoVITS process if it is still running.
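The PYTHONPATH step is the easiest one to get wrong when starting the server by hand. As a rough Python equivalent (not the contents of start.ps1), the sketch below builds the environment and command line for the TTS server. The function names are hypothetical, the interpreter path assumes the standard Windows venv layout, and any arguments the real script passes to api_v2.py are not shown because the text does not specify them.

```python
import os
from pathlib import Path


def tts_server_env(repo_root, base_env=None):
    """Copy the environment and prepend both GPT-SoVITS dirs to PYTHONPATH."""
    env = dict(os.environ if base_env is None else base_env)
    vendor = Path(repo_root) / "vendor" / "GPT-SoVITS"
    entries = [str(vendor), str(vendor / "GPT_SoVITS")]
    if env.get("PYTHONPATH"):
        # Keep any existing entries after the two vendor directories.
        entries.append(env["PYTHONPATH"])
    env["PYTHONPATH"] = os.pathsep.join(entries)
    return env


def tts_server_command(repo_root):
    """Build the argv for api_v2.py using the venv_tts interpreter."""
    root = Path(repo_root)
    python = root / "venvs" / "venv_tts" / "Scripts" / "python.exe"
    return [str(python), str(root / "vendor" / "GPT-SoVITS" / "api_v2.py")]


if __name__ == "__main__":
    print(tts_server_command("."))
    print(tts_server_env(".")["PYTHONPATH"])
```

Passing both results to `subprocess.Popen(command, env=env)` and calling `terminate()` on the returned process at shutdown would mirror the start and cleanup steps listed above.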

start.bat is a thin wrapper that invokes the PowerShell script.