02 - Architecture
Process Model
Vocal10n runs as two cooperating processes:
flowchart LR
App["vocal10n.app<br/>(venv_main)<br/>PySide6 UI · STT · LLM"]
SoVITS["GPT-SoVITS api_v2.py<br/>(venv_tts subprocess)"]
Qwen3["Qwen3-TTS server<br/>(venv_qwen3tts subprocess)"]
OBS["OBS Browser Source"]
App <-->|"HTTP :9880"| SoVITS
App <-.->|"HTTP (alt backend)"| Qwen3
App -->|"HTTP :5124"| OBS
The Qwen3-TTS backend, when selected, runs as another managed subprocess
launched from vocal10n.tts.qwen3_server. Only one TTS backend is active
at a time; the UI swaps between them via tts_container_tab.
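
The one-active-backend rule can be sketched as a small switcher that stops the running backend before starting the next one. This is a minimal sketch under assumptions, not the project's code: `BackendSwitcher`, `FakeBackend`, and the `start`/`stop` method names are hypothetical stand-ins for whatever `server_manager` and `qwen3_server` actually expose.

```python
from typing import Dict, Optional, Protocol


class TTSBackend(Protocol):
    """Hypothetical interface; the real server managers may differ."""

    def start(self) -> None: ...

    def stop(self) -> None: ...


class BackendSwitcher:
    """Keeps at most one TTS backend running at a time."""

    def __init__(self, backends: Dict[str, TTSBackend]) -> None:
        self._backends = backends
        self._active: Optional[str] = None

    @property
    def active(self) -> Optional[str]:
        return self._active

    def select(self, name: str) -> None:
        if name not in self._backends:
            raise KeyError(f"unknown TTS backend: {name}")
        if name == self._active:
            return  # already running, nothing to do
        if self._active is not None:
            # Stop the old backend first so the two never overlap.
            self._backends[self._active].stop()
            self._active = None
        self._backends[name].start()
        self._active = name


class FakeBackend:
    """Test double that records calls instead of spawning a subprocess."""

    def __init__(self, name: str, log: list) -> None:
        self.name = name
        self.log = log

    def start(self) -> None:
        self.log.append(f"start {self.name}")

    def stop(self) -> None:
        self.log.append(f"stop {self.name}")
```

Stopping before starting means a failed launch leaves no backend active rather than two, which is the safer state if both backends compete for the same GPU memory.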
Threading Model
The main process is split into Qt's main thread plus a small set of
worker threads. The boundary is the EventDispatcher (pub/sub) defined in
vocal10n.pipeline.events.
| Thread | Owner | Responsibility |
|---|---|---|
| Qt main | QApplication | UI, signal/slot delivery, SystemState reads/writes |
| STTWorker | vocal10n.stt.worker | Pull audio frames, run Whisper, publish text events |
| AudioCapture callback | sounddevice | Mic chunks → ring buffer + AEC |
| LLMWorker (in LLMController) | vocal10n.llm.controller | Translate confirmed and pending text |
| TTSQueue worker | vocal10n.tts.queue | Pull translated text, call TTS HTTP API |
| Audio playback | vocal10n.tts.audio_output | sounddevice output stream + crossfade |
| File writer | vocal10n.pipeline.file_writer | Async SRT/TXT/WAV I/O |
| GPU monitor | vocal10n.utils.gpu | Periodic VRAM / util sampling |
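
The pub/sub boundary between these threads can be illustrated with a small thread-safe dispatcher. This is a sketch under assumptions: the event names are taken from the data-flow diagram in this document, but the `subscribe` and `publish` method names and the payload shape are hypothetical, and the real `EventDispatcher` in vocal10n.pipeline.events may look different.

```python
import threading
from enum import Enum, auto
from typing import Any, Callable, Dict, List


class EventType(Enum):
    # Names from the data-flow diagram; the real enum may hold more.
    STT_PARTIAL = auto()
    STT_CONFIRMED = auto()
    TRANSLATION_PARTIAL = auto()
    TRANSLATION_CONFIRMED = auto()


Handler = Callable[[Any], None]


class EventDispatcher:
    """Thread-safe publish/subscribe hub (illustrative only)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._handlers: Dict[EventType, List[Handler]] = {}

    def subscribe(self, event_type: EventType, handler: Handler) -> None:
        with self._lock:
            self._handlers.setdefault(event_type, []).append(handler)

    def publish(self, event_type: EventType, payload: Any) -> None:
        # Copy the handler list under the lock and call outside it, so a
        # handler can subscribe or publish again without deadlocking.
        with self._lock:
            handlers = list(self._handlers.get(event_type, []))
        for handler in handlers:
            handler(payload)
```

In this sketch handlers run on the publishing thread, so anything that touches widgets would still need to hop to the Qt main thread.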
SystemState (vocal10n.state.SystemState) is a QObject with a
threading.RLock. Every property setter emits a Qt signal so UI widgets
can bind without polling. Worker threads write state through these
properties; the Qt event loop delivers signals to widgets in the UI
thread.
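
The lock-plus-signal pattern described above can be sketched in plain Python. This is only an analogy: `Signal` here is a hypothetical stand-in for a Qt signal, `SystemStateSketch` and its `stt_status` property are invented for illustration, and unlike Qt's queued connections this version calls slots on the writer's thread.

```python
import threading
from typing import Callable, List


class Signal:
    """Tiny stand-in for a Qt signal: connect callbacks, then emit."""

    def __init__(self) -> None:
        self._slots: List[Callable[[object], None]] = []

    def connect(self, slot: Callable[[object], None]) -> None:
        self._slots.append(slot)

    def emit(self, value: object) -> None:
        for slot in list(self._slots):
            slot(value)


class SystemStateSketch:
    """One lock-guarded property whose setter always emits a signal."""

    def __init__(self) -> None:
        self._lock = threading.RLock()
        self._stt_status = "unloaded"  # hypothetical field
        self.stt_status_changed = Signal()

    @property
    def stt_status(self) -> str:
        with self._lock:
            return self._stt_status

    @stt_status.setter
    def stt_status(self, value: str) -> None:
        with self._lock:
            self._stt_status = value
        # Emit after releasing the lock so a slow slot cannot block writers.
        self.stt_status_changed.emit(value)
```

A worker thread would then write `state.stt_status = "ready"` and any connected widget callback receives the new value without polling.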
Data Flow (Happy Path)
flowchart TD
Mic(["mic chunk"]) --> Cap["AudioCapture<br/>16 kHz mono · 0.2 s"]
Cap --> AEC["AEC<br/>NLMS + double-talk detector"]
AEC --> Whisper["Whisper sliding window<br/>window_seconds = 6.5"]
Whisper --> Segs["STTEngine.transcribe<br/>SegmentResult[]"]
Segs --> Filt["HallucinationFilter<br/>dedup · phonetic correction"]
Filt --> Partial{{"STT_PARTIAL"}}
Filt --> Confirmed{{"STT_CONFIRMED"}}
Partial --> OBSlive["OBS overlay<br/>(live)"]
Confirmed --> LLMp["LLMController.translate_pending"]
Confirmed --> LLMc["LLMController.translate_confirmed"]
LLMp --> TPartial{{"TRANSLATION_PARTIAL"}}
LLMc --> TConfirmed{{"TRANSLATION_CONFIRMED"}}
TPartial --> OBSlive
TConfirmed --> Q["TTSQueue.enqueue"]
TConfirmed --> FW["FileWriter<br/>TXT · SRT"]
TConfirmed --> OBS2["OBS overlay<br/>(confirmed)"]
Q --> HTTP["TTS HTTP"] --> Play["AudioPlayer"]
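
The capture numbers in the diagram imply some simple buffer arithmetic: at 16 kHz, a 0.2 s chunk is 3200 samples and a 6.5 s window is 104000 samples, or 32.5 chunks. The sketch below shows one way to keep such a window, assuming a plain `deque` of floats; `SlidingWindow` is a hypothetical name, and the real capture path likely uses array buffers instead.

```python
from collections import deque
from typing import Deque, List

SAMPLE_RATE = 16_000    # Hz, from the diagram
CHUNK_SECONDS = 0.2     # from the diagram
WINDOW_SECONDS = 6.5    # window_seconds in the diagram

CHUNK_SAMPLES = round(SAMPLE_RATE * CHUNK_SECONDS)    # 3200
WINDOW_SAMPLES = round(SAMPLE_RATE * WINDOW_SECONDS)  # 104000


class SlidingWindow:
    """Keeps only the most recent max_samples samples."""

    def __init__(self, max_samples: int = WINDOW_SAMPLES) -> None:
        # A bounded deque drops the oldest samples automatically.
        self._buf: Deque[float] = deque(maxlen=max_samples)

    def push(self, chunk: List[float]) -> None:
        self._buf.extend(chunk)

    def snapshot(self) -> List[float]:
        # Copy so the caller can transcribe while capture keeps writing.
        return list(self._buf)
```

Because the window is not a whole number of chunks, its oldest edge usually falls in the middle of a chunk.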
Module Map
src/vocal10n/
├── app.py                    # Entry point. Creates QApplication, theme, state, MainWindow.
├── config.py                 # YAML loader, dotted-key access, get_config singleton.
├── constants.py              # Enums: Language, ModelStatus, EventType.
├── state.py                  # SystemState QObject (thread-safe, signal-emitting).
│
├── stt/
│   ├── audio_capture.py      # sounddevice mic input, ring buffer.
│   ├── playback_aec.py       # PlaybackTimeline + NLMS AEC + DTD.
│   ├── engine.py             # FasterWhisper wrapper.
│   ├── filters.py            # Hallucination filter, dedup, phonetic correction.
│   ├── transcript.py         # Segment confirmation logic.
│   ├── diarizer.py           # Optional speaker tagger.
│   ├── controller.py         # Orchestrates load/unload, language changes.
│   └── worker.py             # Background thread driving the engine.
│
├── llm/
│   ├── engine.py             # llama-cpp-python loader.
│   ├── api_backend.py        # OpenAI-compatible HTTP client.
│   ├── corrector.py          # Glossary / RAG-based source correction.
│   ├── translator.py         # Prompt templates, ChatML, generation.
│   ├── rag.py                # Vector retrieval for large glossaries.
│   └── controller.py         # Wires events, debounce, context window.
│
├── tts/
│   ├── client.py             # GPT-SoVITS HTTP client.
│   ├── server_manager.py     # Subprocess launcher / health check for SoVITS.
│   ├── queue.py              # TTS queue with pruning.
│   ├── audio_output.py       # Streaming playback + crossfade.
│   ├── controller.py         # Wires events for GPT-SoVITS path.
│   ├── qwen3_server.py       # Qwen3-TTS subprocess server.
│   ├── qwen3_client.py       # Qwen3-TTS protocol client.
│   └── qwen3_controller.py   # Wires events for Qwen3-TTS path.
│
├── pipeline/
│   ├── events.py             # Event types, EventDispatcher, dataclass payloads.
│   ├── coordinator.py        # Session lifecycle, FileWriter ownership.
│   ├── latency.py            # Latency tracker (per-stage stamps).
│   └── file_writer.py        # Async TXT/SRT/WAV writers.
│
├── obs/
│   ├── server.py             # HTTP server on 127.0.0.1:5124.
│   └── overlay.html          # Browser-source HTML/CSS template.
│
├── ui/
│   ├── main_window.py        # Window, menu, mode toggle.
│   ├── section_a.py          # Top: live text + metrics (Pro variant).
│   ├── section_b.py          # Bottom: tab container.
│   ├── tabs/                 # stt_tab, translation_tab, tts_container_tab,
│   │                         # tts_tab, qwen3_tts_tab, output_tab, obs_tab,
│   │                         # kb_tab, training_tab.
│   ├── widgets/              # param_slider, model_selector, stream_text,
│   │                         # term_file_list, filter_list_editor,
│   │                         # simple_mode_panel.
│   └── styles/theme.qss      # Dark theme.
│
└── utils/
    ├── gpu.py                # pynvml wrapper.
    └── logger.py             # Logging configuration.
See 03 - Repository Layout for the full tree including non-source folders.
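
The dotted-key access noted for config.py in the module map can be sketched as a walk over nested mappings. This is a minimal illustration, not the project's loader: `get_dotted` is a hypothetical helper, the key names in the usage example are invented, and only the values 6.5 and 5124 come from this document.

```python
from collections.abc import Mapping
from typing import Any


def get_dotted(data: Mapping, key: str, default: Any = None) -> Any:
    """Resolve 'a.b.c' against nested mappings; return default if absent."""
    node: Any = data
    for part in key.split("."):
        # Stop as soon as the path leaves the nested-mapping structure.
        if not isinstance(node, Mapping) or part not in node:
            return default
        node = node[part]
    return node


# Usage with a dict shaped like a parsed YAML file (keys are made up).
cfg = {"stt": {"window_seconds": 6.5}, "obs": {"port": 5124}}
window = get_dotted(cfg, "stt.window_seconds")
voice = get_dotted(cfg, "tts.voice", "fallback")
```

Returning a default instead of raising keeps call sites short, at the cost of hiding typos in key names; a stricter variant could raise `KeyError` when no default is given.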