02 — Architecture

Process Model

Vocal10n runs as two cooperating processes at a time, the main application and one TTS backend subprocess:

flowchart LR
    App["vocal10n.app<br/>(venv_main)<br/>PySide6 UI · STT · LLM"]
    SoVITS["GPT-SoVITS api_v2.py<br/>(venv_tts subprocess)"]
    Qwen3["Qwen3-TTS server<br/>(venv_qwen3tts subprocess)"]
    OBS["OBS Browser Source"]

    App <-->|"HTTP :9880"| SoVITS
    App <-.->|"HTTP (alt backend)"| Qwen3
    App -->|"HTTP :5124"| OBS

The Qwen3-TTS backend, when selected, runs as another managed subprocess launched from vocal10n.tts.qwen3_server. Only one TTS backend is active at a time; the UI swaps between them via tts_container_tab.
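
The "one backend at a time" rule can be sketched as a small switcher that stops the running subprocess before starting the next one. This is a minimal illustration, not the project's actual launcher: the class name `TTSBackendSwitcher`, its methods, and the placeholder command are all hypothetical, and a trivial sleeping Python process stands in for the real server commands.

```python
import subprocess
import sys
from typing import Dict, List, Optional


class TTSBackendSwitcher:
    """Hypothetical helper: keeps at most one backend subprocess alive."""

    def __init__(self, commands: Dict[str, List[str]]) -> None:
        self._commands = commands  # backend name -> argv (assumed layout)
        self._active: Optional[str] = None
        self._proc: Optional[subprocess.Popen] = None

    @property
    def active(self) -> Optional[str]:
        return self._active

    def stop(self) -> None:
        """Terminate the current backend, escalating to kill on timeout."""
        if self._proc is not None and self._proc.poll() is None:
            self._proc.terminate()
            try:
                self._proc.wait(timeout=5)
            except subprocess.TimeoutExpired:
                self._proc.kill()
                self._proc.wait()
        self._proc = None
        self._active = None

    def switch_to(self, name: str) -> subprocess.Popen:
        """Start `name`, stopping whichever backend was running before."""
        argv = self._commands[name]  # raises KeyError before touching state
        if name == self._active and self._proc and self._proc.poll() is None:
            return self._proc
        self.stop()
        self._proc = subprocess.Popen(argv)
        self._active = name
        return self._proc


# Placeholder command standing in for the real server launch lines.
PLACEHOLDER = [sys.executable, "-c", "import time; time.sleep(30)"]
```

Calling `switch_to("qwen3")` while the SoVITS placeholder is running terminates the first process before the second starts, so the two backends never hold resources at the same time.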

Threading Model

The main process is split into Qt’s main thread plus a small set of worker threads. Threads communicate across that boundary through the EventDispatcher (pub/sub) defined in vocal10n.pipeline.events.

Thread	Owner	Responsibility
Qt main	`QApplication`	UI, signal/slot delivery, `SystemState` reads/writes
`STTWorker`	`vocal10n.stt.worker`	Pull audio frames, run Whisper, publish text events
`AudioCapture` callback	`sounddevice`	Mic chunks → ring buffer + AEC
`LLMWorker` (in `LLMController`)	`vocal10n.llm.controller`	Translate confirmed and pending text
`TTSQueue` worker	`vocal10n.tts.queue`	Pull translated text, call TTS HTTP API
Audio playback	`vocal10n.tts.audio_output`	`sounddevice` output stream + crossfade
File writer	`vocal10n.pipeline.file_writer`	Async SRT/TXT/WAV I/O
GPU monitor	`vocal10n.utils.gpu`	Periodic VRAM / util sampling
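
To make the pub/sub boundary concrete, here is a minimal thread-safe dispatcher sketch. The event names come from the data-flow diagram in this document; the `Event` dataclass and the `subscribe` / `publish` method names are assumptions made for illustration and may not match vocal10n.pipeline.events.

```python
import threading
from dataclasses import dataclass
from enum import Enum, auto
from typing import Any, Callable, Dict, List


class EventType(Enum):
    STT_PARTIAL = auto()
    STT_CONFIRMED = auto()
    TRANSLATION_PARTIAL = auto()
    TRANSLATION_CONFIRMED = auto()


@dataclass(frozen=True)
class Event:
    """Assumed payload shape: an event type plus arbitrary data."""
    type: EventType
    payload: Any


Handler = Callable[[Event], None]


class EventDispatcher:
    """Illustrative pub/sub hub that worker threads can share."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._subs: Dict[EventType, List[Handler]] = {}

    def subscribe(self, event_type: EventType, handler: Handler) -> None:
        with self._lock:
            self._subs.setdefault(event_type, []).append(handler)

    def publish(self, event: Event) -> None:
        # Copy the handler list under the lock, then call handlers outside
        # it so a handler may subscribe or publish without deadlocking.
        with self._lock:
            handlers = list(self._subs.get(event.type, ()))
        for handler in handlers:
            handler(event)
```

In this sketch a handler runs on the publisher's thread. Anything that touches widgets would still need to hop to the Qt main thread, for example through a signal.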

SystemState (vocal10n.state.SystemState) is a QObject with a threading.RLock. Every property setter emits a Qt signal so UI widgets can bind without polling. Worker threads write state through these properties; the Qt event loop delivers signals to widgets in the UI thread.
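
The lock-plus-notification pattern can be sketched without Qt. In the example below, plain Python callbacks stand in for Qt signals so the code runs with the standard library alone; the class name `SystemStateSketch` and the `stt_status` property are hypothetical and not taken from vocal10n.state.

```python
import threading
from typing import Callable, List


class SystemStateSketch:
    """Stand-in for a signal-emitting state object guarded by an RLock."""

    def __init__(self) -> None:
        self._lock = threading.RLock()
        self._stt_status = "unloaded"
        # Callbacks play the role of slots connected to a Qt signal.
        self._listeners: List[Callable[[str], None]] = []

    def connect_stt_status_changed(self, listener: Callable[[str], None]) -> None:
        with self._lock:
            self._listeners.append(listener)

    @property
    def stt_status(self) -> str:
        with self._lock:
            return self._stt_status

    @stt_status.setter
    def stt_status(self, value: str) -> None:
        # Mutate under the lock, then notify outside it so a listener
        # that reads the state back cannot block other writers.
        with self._lock:
            self._stt_status = value
            listeners = list(self._listeners)
        for listener in listeners:
            listener(value)
```

A worker thread assigns `state.stt_status = "ready"` and every connected listener is notified. With a real QObject the notification would be a signal, which Qt can deliver to receivers on the UI thread.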

Data Flow (Happy Path)

flowchart TD
    Mic(["mic chunk"]) --> Cap["AudioCapture<br/>16 kHz mono · 0.2 s"]
    Cap --> AEC["AEC<br/>NLMS + double-talk detector"]
    AEC --> Whisper["Whisper sliding window<br/>window_seconds = 6.5"]
    Whisper --> Segs["STTEngine.transcribe<br/>SegmentResult[]"]
    Segs --> Filt["HallucinationFilter<br/>dedup · phonetic correction"]

    Filt --> Partial{{"STT_PARTIAL"}}
    Filt --> Confirmed{{"STT_CONFIRMED"}}

    Partial --> OBSlive["OBS overlay<br/>(live)"]

    Confirmed --> LLMp["LLMController.translate_pending"]
    Confirmed --> LLMc["LLMController.translate_confirmed"]

    LLMp --> TPartial{{"TRANSLATION_PARTIAL"}}
    LLMc --> TConfirmed{{"TRANSLATION_CONFIRMED"}}

    TPartial --> OBSlive
    TConfirmed --> Q["TTSQueue.enqueue"]
    TConfirmed --> FW["FileWriter<br/>TXT · SRT"]
    TConfirmed --> OBS2["OBS overlay<br/>(confirmed)"]

    Q --> HTTP["TTS HTTP"] --> Play["AudioPlayer"]
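
The capture and window figures in the diagram imply some simple arithmetic: at 16 kHz, a 0.2 s chunk is 3,200 samples and a 6.5 s window is 104,000 samples, or 32.5 chunks. The sketch below shows one way to keep such a window with a bounded deque. The `SlidingWindow` class is a hypothetical illustration, not the project's ring buffer.

```python
from collections import deque
from typing import Iterable, List

SAMPLE_RATE = 16_000      # Hz, from the diagram
CHUNK_SECONDS = 0.2       # capture chunk length, from the diagram
WINDOW_SECONDS = 6.5      # window_seconds, from the diagram

CHUNK_SAMPLES = round(SAMPLE_RATE * CHUNK_SECONDS)    # 3200
WINDOW_SAMPLES = round(SAMPLE_RATE * WINDOW_SECONDS)  # 104000


class SlidingWindow:
    """Keeps only the most recent WINDOW_SAMPLES samples."""

    def __init__(self, max_samples: int = WINDOW_SAMPLES) -> None:
        # A deque with maxlen silently drops the oldest samples on overflow.
        self._buf: deque = deque(maxlen=max_samples)

    def push(self, chunk: Iterable[float]) -> None:
        self._buf.extend(chunk)

    def snapshot(self) -> List[float]:
        """Return a copy of the current window for the recognizer."""
        return list(self._buf)

    @property
    def seconds(self) -> float:
        return len(self._buf) / SAMPLE_RATE
```

Pushing 40 chunks (8 s of audio) leaves exactly 6.5 s in the window. Because the window is not a whole number of chunks, the oldest retained chunk is cut in half.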

Module Map

src/vocal10n/
├── app.py              # Entry point. Creates QApplication, theme, state, MainWindow.
├── config.py           # YAML loader, dotted-key access, get_config singleton.
├── constants.py        # Enums: Language, ModelStatus, EventType.
├── state.py            # SystemState QObject (thread-safe, signal-emitting).
│
├── stt/
│   ├── audio_capture.py    # sounddevice mic input, ring buffer.
│   ├── playback_aec.py     # PlaybackTimeline + NLMS AEC + DTD.
│   ├── engine.py           # FasterWhisper wrapper.
│   ├── filters.py          # Hallucination filter, dedup, phonetic correction.
│   ├── transcript.py       # Segment confirmation logic.
│   ├── diarizer.py         # Optional speaker tagger.
│   ├── controller.py       # Orchestrates load/unload, language changes.
│   └── worker.py           # Background thread driving the engine.
│
├── llm/
│   ├── engine.py           # llama-cpp-python loader.
│   ├── api_backend.py      # OpenAI-compatible HTTP client.
│   ├── corrector.py        # Glossary / RAG-based source correction.
│   ├── translator.py       # Prompt templates, ChatML, generation.
│   ├── rag.py              # Vector retrieval for large glossaries.
│   └── controller.py       # Wires events, debounce, context window.
│
├── tts/
│   ├── client.py               # GPT-SoVITS HTTP client.
│   ├── server_manager.py       # Subprocess launcher / health check for SoVITS.
│   ├── queue.py                # TTS queue with pruning.
│   ├── audio_output.py         # Streaming playback + crossfade.
│   ├── controller.py           # Wires events for GPT-SoVITS path.
│   ├── qwen3_server.py         # Qwen3-TTS subprocess server.
│   ├── qwen3_client.py         # Qwen3-TTS protocol client.
│   └── qwen3_controller.py     # Wires events for Qwen3-TTS path.
│
├── pipeline/
│   ├── events.py           # Event types, EventDispatcher, dataclass payloads.
│   ├── coordinator.py      # Session lifecycle, FileWriter ownership.
│   ├── latency.py          # Latency tracker (per-stage stamps).
│   └── file_writer.py      # Async TXT/SRT/WAV writers.
│
├── obs/
│   ├── server.py           # HTTP server on 127.0.0.1:5124.
│   └── overlay.html        # Browser-source HTML/CSS template.
│
├── ui/
│   ├── main_window.py      # Window, menu, mode toggle.
│   ├── section_a.py        # Top: live text + metrics (Pro variant).
│   ├── section_b.py        # Bottom: tab container.
│   ├── tabs/               # stt_tab, translation_tab, tts_container_tab,
│   │                       # tts_tab, qwen3_tts_tab, output_tab, obs_tab,
│   │                       # kb_tab, training_tab.
│   ├── widgets/            # param_slider, model_selector, stream_text,
│   │                       # term_file_list, filter_list_editor,
│   │                       # simple_mode_panel.
│   └── styles/theme.qss    # Dark theme.
│
└── utils/
    ├── gpu.py              # pynvml wrapper.
    └── logger.py           # Logging configuration.
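
As an illustration of the dotted-key access mentioned for config.py, the sketch below walks a nested mapping one key segment at a time. The function name `get_dotted` and the key paths in the example are assumptions. A plain dict stands in for the parsed YAML so the snippet needs no third-party loader.

```python
from collections.abc import Mapping
from typing import Any


def get_dotted(config: Mapping, key: str, default: Any = None) -> Any:
    """Resolve 'a.b.c' against nested mappings, or return `default`."""
    node: Any = config
    for part in key.split("."):
        if not isinstance(node, Mapping) or part not in node:
            return default
        node = node[part]
    return node


# Hypothetical config layout; the values echo figures used in this document.
example_config = {
    "stt": {"window_seconds": 6.5},
    "obs": {"host": "127.0.0.1", "port": 5124},
}
```

With this layout, `get_dotted(example_config, "obs.port")` walks `obs` then `port`, while a missing path such as `"tts.voice"` returns the supplied default instead of raising.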

See 03 — Repository Layout for the full tree including non-source folders.