04 - Environments and Setup

Why Multiple Virtual Environments

GPT-SoVITS pins older versions of several ML packages (torch, transformers, librosa, etc.) that conflict with the more recent stack used by faster-whisper and llama-cpp-python. Forcing both into one environment is brittle. The project therefore uses two mandatory venvs and an optional third for the Qwen3-TTS backend:

| Venv | Python | Purpose | Created by |
|---|---|---|---|
| venvs/venv_main | 3.11 | Vocal10n app: UI, STT, LLM, pipeline | setup_env.ps1 |
| venvs/venv_tts | 3.11 | GPT-SoVITS HTTP server (api_v2.py) | setup_env.ps1 |
| venvs/venv_qwen3tts | 3.11 | Qwen3-TTS HTTP server | manual |

The launch script (start.ps1) starts the GPT-SoVITS server from venv_tts and the main application from venv_main. The two communicate over 127.0.0.1:9880 (TTS) and 127.0.0.1:5124 (OBS overlay).
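To confirm that both endpoints are up after launch, a small TCP probe is enough. The sketch below is not part of the project; `is_listening` is a hypothetical helper, and it only checks that something accepts connections on each port, not that the right service answered.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # Ports taken from the paragraph above.
    for name, port in (("TTS", 9880), ("OBS overlay", 5124)):
        state = "up" if is_listening("127.0.0.1", port) else "down"
        print(f"{name} on 127.0.0.1:{port}: {state}")
```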

One-Time Setup

Prerequisites:

  • Windows 10/11.
  • Python 3.11 reachable via py -3.11.
  • NVIDIA driver + CUDA 12.x runtime libraries.
  • Visual C++ build tools (required by llama-cpp-python if it falls back to a source build).

Steps:

git clone https://github.com/itsLittleKevin/Vocal10n.git
cd Vocal10n

# Creates venvs/venv_main and venvs/venv_tts and installs requirements
.\setup_env.ps1

# Drop models into:
#   models/llm/Qwen3-4B-Instruct-2507.Q4_K_M.gguf
#   vendor/GPT-SoVITS/GPT_SoVITS/pretrained_models/...
#   reference_audio/<your_clone_sample>.wav (+ matching .txt)

If llama-cpp-python fails to install with CUDA support, install it explicitly with the matching wheel:

venvs\venv_main\Scripts\pip install llama-cpp-python `
  --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu121
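After an install like the one above, it can help to check whether the resulting build reports GPU offload support. The following is a sketch, assuming the installed llama-cpp-python exposes the low-level `llama_supports_gpu_offload` binding; the `gpu_offload_supported` wrapper is hypothetical and returns `None` when it cannot tell. Run it with the interpreter from venvs\venv_main.

```python
def gpu_offload_supported():
    """Return True/False if llama-cpp-python reports GPU offload, else None."""
    try:
        import llama_cpp  # only importable inside venv_main
    except (ImportError, OSError, RuntimeError):
        # Package missing, or its native library failed to load.
        return None
    # Assumption: this binding is present in the installed version.
    probe = getattr(llama_cpp, "llama_supports_gpu_offload", None)
    return bool(probe()) if callable(probe) else None


if __name__ == "__main__":
    print("GPU offload supported:", gpu_offload_supported())
```

A result of `False` suggests a CPU-only build was installed; `None` means the check itself could not run.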

The setup script accepts:

  • -SkipMain / -SkipTTS to skip one venv and update only the other.
  • -Force to wipe and recreate existing venvs.

Models You Need

| File | Path | Notes |
|---|---|---|
| Whisper large-v3-turbo | downloaded on first use into ~/.cache/huggingface | controlled by stt.model_size |
| Qwen3-4B GGUF | models/llm/Qwen3-4B-Instruct-2507.Q4_K_M.gguf | path is in translation.model_path |
| GPT-SoVITS pretrained | vendor/GPT-SoVITS/GPT_SoVITS/pretrained_models/ | follow upstream README |
| Reference WAV + text | reference_audio/ and vendor/GPT-SoVITS/reference_audio/ | path in tts.ref_audio_path |
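A quick pre-flight check can catch a missing model before the first launch. The sketch below is illustrative and not shipped with the project: `missing_assets` and `REQUIRED` are hypothetical names, the paths are copied from the table above, and the Whisper model is left out because it is downloaded on first use. An empty directory is treated as missing.

```python
from pathlib import Path

# Paths relative to the repository root, as listed in the table above.
REQUIRED = (
    "models/llm/Qwen3-4B-Instruct-2507.Q4_K_M.gguf",
    "vendor/GPT-SoVITS/GPT_SoVITS/pretrained_models",
    "reference_audio",
)


def missing_assets(repo_root, required=REQUIRED):
    """Return the required paths that are absent or are empty directories."""
    root = Path(repo_root)
    missing = []
    for rel in required:
        path = root / rel
        if not path.exists():
            missing.append(rel)
        elif path.is_dir() and not any(path.iterdir()):
            missing.append(rel)
    return missing


if __name__ == "__main__":
    for rel in missing_assets("."):
        print(f"missing: {rel}")
```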

Launching

.\start.ps1

The script:

  1. Verifies venv_main exists.
  2. If port 9880 is free and vendor/GPT-SoVITS/api_v2.py exists, starts the SoVITS server as a hidden background process and saves its PID.
  3. Sets PYTHONPATH to include both vendor/GPT-SoVITS/ and vendor/GPT-SoVITS/GPT_SoVITS/ so the API module can resolve its relative imports.
  4. Runs python -m vocal10n.app from venv_main.
  5. On exit, terminates the SoVITS process if it is still running.
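The real launcher is a PowerShell script. The Python sketch below only mirrors the shape of steps 3 to 5, and every name in it (`sovits_env`, `run_with_background_server`) is hypothetical. It also assumes the same environment is passed to both processes, which the steps above do not state.

```python
import os
import subprocess
from pathlib import Path


def sovits_env(repo_root, base_env=None):
    """Copy the environment and prepend both GPT-SoVITS dirs to PYTHONPATH."""
    env = dict(os.environ if base_env is None else base_env)
    sovits = Path(repo_root) / "vendor" / "GPT-SoVITS"
    entries = [str(sovits), str(sovits / "GPT_SoVITS")]
    if env.get("PYTHONPATH"):
        entries.append(env["PYTHONPATH"])  # keep any existing entries last
    env["PYTHONPATH"] = os.pathsep.join(entries)
    return env


def run_with_background_server(server_cmd, app_cmd, env=None):
    """Start server_cmd, run app_cmd to completion, then stop the server."""
    server = subprocess.Popen(server_cmd, env=env)
    try:
        # Blocks until the foreground application exits.
        return subprocess.call(app_cmd, env=env)
    finally:
        if server.poll() is None:  # server is still running
            server.terminate()
            server.wait(timeout=10)
```

The `try`/`finally` is the important part: the background server is stopped even if the foreground application raises or is interrupted.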

start.bat is a thin wrapper that invokes the PowerShell script.