03 - Repository Layout

Vocal10n/
├── README.md                  # User-facing quick start.
├── LICENSE                    # MIT.
├── pyproject.toml             # Package metadata for `vocal10n`.
├── setup_env.ps1              # Creates venv_main + venv_tts, installs deps.
├── start.ps1 / start.bat      # Launches GPT-SoVITS subprocess + main app.
├── initialplan.md             # Original project plan (Chinese + English).
├── promptfile.md              # Prompt history / scratchpad.
├── simple_ui_plan.md          # Simple/Pro mode design doc.
├── simple_ui_validation.md    # Manual QA checklist for Simple mode.
├── current.logs               # Last run log capture (transient).
│
├── config/
│   ├── default.yaml           # Single source of truth for runtime config.
│   ├── context_gaming.txt     # Example domain context for translation.
│   └── filters.txt            # Hallucination filter list.
│
├── knowledge_base/
│   └── glossary_general.txt   # Default glossary used by the corrector.
│
├── stt_terms/
│   └── context_gaming.txt     # Example STT initial-prompt term list.
│
├── reference_audio/           # User-provided voice clone references.
│   ├── audio_03.txt
│   └── README.md
│
├── models/                    # Local model storage (git-ignored).
│   ├── stt/                   # FasterWhisper caches.
│   ├── llm/Qwen3-4B-Instruct-2507.Q4_K_M.gguf
│   └── tts/                   # GPT-SoVITS pretrained weights.
│
├── output/
│   ├── audio/                 # WAV recordings (when save_wav).
│   ├── subtitles/             # *_source.srt / *_target.srt pairs.
│   └── training_data/         # Future training output.
│
├── requirements/
│   ├── requirements-main.txt  # venv_main deps.
│   └── requirements-tts.txt   # venv_tts deps (in addition to vendor reqs).
│
├── src/vocal10n/              # Application package; see chapter 02.
├── src/vocal10n.egg-info/     # Generated by editable install.
│
├── vendor/
│   ├── GPT-SoVITS/            # Vendored upstream repository.
│   └── Qwen3/                 # Vendored Qwen3-TTS source.
│
├── venvs/
│   ├── venv_main/             # Python 3.11: STT + LLM + UI.
│   ├── venv_tts/              # Python 3.11: GPT-SoVITS server.
│   └── venv_qwen3tts/         # Python 3.10/3.11: Qwen3-TTS server.
│
├── temp_qwen3tts/             # Scratch dir used during Qwen3-TTS bring-up.
├── training/                  # Reserved for future training tooling.
│
├── Vocal10n-prebuild/         # Legacy reference implementation. Not packaged.
└── doc/                       # This documentation set.
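
As a rough illustration of how code might address this layout, the sketch below builds a few of the paths named in the tree relative to a repository root. The function name and the dictionary keys are hypothetical and are not taken from the vocal10n package; only the directory and file names come from the tree above.

```python
from pathlib import Path


def layout_paths(repo_root: Path) -> dict[str, Path]:
    """Map a few well-known locations from the tree (hypothetical helper)."""
    return {
        # Runtime configuration file.
        "config": repo_root / "config" / "default.yaml",
        # Default glossary used by the corrector.
        "glossary": repo_root / "knowledge_base" / "glossary_general.txt",
        # Local model storage (git-ignored).
        "stt_models": repo_root / "models" / "stt",
        "tts_models": repo_root / "models" / "tts",
        # Session output.
        "audio": repo_root / "output" / "audio",
        "subtitles": repo_root / "output" / "subtitles",
    }
```

Called as layout_paths(Path("Vocal10n")), this returns paths such as Vocal10n/config/default.yaml. Nothing is created on disk; the helper only composes paths.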

Notable Conventions

  • src/ layout with pyproject.toml. Install with pip install -e . inside the venv. The setup script does this for you, because the requirements file pulls in the package.
  • One YAML config. config/default.yaml is the only config file loaded at runtime. UI changes that should persist are written back to it.
  • Vendored models, not submodules. GPT-SoVITS lives under vendor/GPT-SoVITS/ to pin a known-good revision and to avoid a network dependency at install time.
  • Subtitle output files are named YYYY-MM-DD_HH-MM-SS_source.srt and YYYY-MM-DD_HH-MM-SS_target.srt, so the two files from one session share its start timestamp.
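
The subtitle naming convention can be sketched as follows. This is a minimal illustration rather than the application's actual code: the function name and parameters are hypothetical, and only the filename pattern and the output/subtitles directory come from this chapter.

```python
from datetime import datetime
from pathlib import Path


def subtitle_paths(subtitle_dir: Path, started_at: datetime) -> tuple[Path, Path]:
    """Build the source/target .srt pair for one session (hypothetical helper)."""
    # Zero-padded fields, with hyphens in the time part so the name is
    # valid on Windows, where ':' is not allowed in filenames.
    stamp = started_at.strftime("%Y-%m-%d_%H-%M-%S")
    return (
        subtitle_dir / f"{stamp}_source.srt",
        subtitle_dir / f"{stamp}_target.srt",
    )
```

For a session started at 2025-01-02 03:04:05, the pair would be 2025-01-02_03-04-05_source.srt and 2025-01-02_03-04-05_target.srt. Because every field is zero-padded and ordered from year to second, a plain alphabetical sort of the directory also lists sessions chronologically.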