02 - Architecture

Process Model

Vocal10n runs as two cooperating processes, the main application (vocal10n.app) and one TTS server subprocess:

flowchart LR
    App["vocal10n.app<br/>(venv_main)<br/>PySide6 UI · STT · LLM"]
    SoVITS["GPT-SoVITS api_v2.py<br/>(venv_tts subprocess)"]
    Qwen3["Qwen3-TTS server<br/>(venv_qwen3tts subprocess)"]
    OBS["OBS Browser Source"]

    App <-->|"HTTP :9880"| SoVITS
    App <-.->|"HTTP (alt backend)"| Qwen3
    App -->|"HTTP :5124"| OBS

The Qwen3-TTS backend, when selected, runs as another managed subprocess launched from vocal10n.tts.qwen3_server. Only one TTS backend is active at a time; the UI swaps between them via tts_container_tab.
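The "one active backend at a time" rule can be sketched as a small switcher that stops the current server before starting the next. This is only an illustration under assumptions: Backend and BackendSwitcher are hypothetical names, and start/stop are placeholders for the real subprocess launch and health check, which this document does not detail.

```python
from typing import Dict, Optional


class Backend:
    """Hypothetical stand-in for a managed TTS server subprocess."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.running = False

    def start(self) -> None:
        # A real manager would spawn the subprocess and wait for a health check.
        self.running = True

    def stop(self) -> None:
        self.running = False


class BackendSwitcher:
    """Keeps at most one backend running at any time."""

    def __init__(self, backends: Dict[str, Backend]) -> None:
        self._backends = backends
        self._active: Optional[Backend] = None

    @property
    def active(self) -> Optional[str]:
        return self._active.name if self._active else None

    def select(self, name: str) -> None:
        target = self._backends[name]  # raises KeyError for unknown names
        if target is self._active:
            return  # already active, nothing to do
        if self._active is not None:
            self._active.stop()  # stop the old server before starting the new one
        target.start()
        self._active = target


backends = {"sovits": Backend("sovits"), "qwen3": Backend("qwen3")}
switcher = BackendSwitcher(backends)
switcher.select("sovits")
switcher.select("qwen3")
print(switcher.active)             # qwen3
print(backends["sovits"].running)  # False
```

Stopping before starting keeps the invariant simple: there is never a moment when both servers are running.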

Threading Model

The main process is split into Qt’s main thread plus a small set of worker threads. The boundary between them is the EventDispatcher (pub/sub) defined in vocal10n.pipeline.events.

| Thread | Owner | Responsibility |
| --- | --- | --- |
| Qt main | QApplication | UI, signal/slot delivery, SystemState reads/writes |
| STTWorker | vocal10n.stt.worker | Pull audio frames, run Whisper, publish text events |
| AudioCapture callback | sounddevice | Mic chunks → ring buffer + AEC |
| LLMWorker (in LLMController) | vocal10n.llm.controller | Translate confirmed and pending text |
| TTSQueue worker | vocal10n.tts.queue | Pull translated text, call TTS HTTP API |
| Audio playback | vocal10n.tts.audio_output | sounddevice output stream + crossfade |
| File writer | vocal10n.pipeline.file_writer | Async SRT/TXT/WAV I/O |
| GPU monitor | vocal10n.utils.gpu | Periodic VRAM / util sampling |
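To make the pub/sub boundary concrete, here is a minimal sketch of a thread-safe dispatcher with a worker thread publishing into it. It is not the actual vocal10n.pipeline.events code: the subscribe/publish method names, the enum layout, and the string payloads are assumptions for illustration, and the uppercase "translation" is a placeholder for the LLM stage.

```python
import threading
from collections import defaultdict
from enum import Enum, auto
from typing import Any, Callable, DefaultDict, List

Handler = Callable[[Any], None]


class EventType(Enum):
    # Names mirror events mentioned in this document; the layout is assumed.
    STT_CONFIRMED = auto()
    TRANSLATION_CONFIRMED = auto()


class EventDispatcher:
    """Minimal thread-safe pub/sub. Handlers run on the publishing thread."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._handlers: DefaultDict[EventType, List[Handler]] = defaultdict(list)

    def subscribe(self, event_type: EventType, handler: Handler) -> None:
        with self._lock:
            self._handlers[event_type].append(handler)

    def publish(self, event_type: EventType, payload: Any) -> None:
        # Copy under the lock, call outside it so handlers may publish too.
        with self._lock:
            handlers = list(self._handlers[event_type])
        for handler in handlers:
            handler(payload)


dispatcher = EventDispatcher()
received: List[str] = []

# Placeholder "LLM" stage: turn confirmed source text into a fake translation.
dispatcher.subscribe(
    EventType.STT_CONFIRMED,
    lambda text: dispatcher.publish(EventType.TRANSLATION_CONFIRMED, text.upper()),
)
# Placeholder "TTS" stage: collect translated text.
dispatcher.subscribe(EventType.TRANSLATION_CONFIRMED, received.append)

# A worker thread standing in for STTWorker publishes one confirmed segment.
worker = threading.Thread(
    target=dispatcher.publish, args=(EventType.STT_CONFIRMED, "hello")
)
worker.start()
worker.join()
print(received)  # ['HELLO']
```

In this sketch handlers execute on whichever thread calls publish. A real pipeline would more likely have each handler push onto its own worker's queue so that slow stages do not block the publisher.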

SystemState (vocal10n.state.SystemState) is a QObject with a threading.RLock. Every property setter emits a Qt signal so UI widgets can bind without polling. Worker threads write state through these properties; the Qt event loop delivers signals to widgets in the UI thread.
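The lock-plus-signal pattern described above can be sketched without Qt. In this stand-in, a plain callback list plays the role of a Qt signal so the example runs with the standard library alone. The names StateSketch, stt_text, and stt_text_changed are hypothetical and are not taken from vocal10n.state.

```python
import threading
from typing import Callable, List


class Signal:
    """Tiny stand-in for a Qt signal: connect callbacks, then emit to them."""

    def __init__(self) -> None:
        self._slots: List[Callable[[str], None]] = []

    def connect(self, slot: Callable[[str], None]) -> None:
        self._slots.append(slot)

    def emit(self, value: str) -> None:
        for slot in list(self._slots):
            slot(value)


class StateSketch:
    """One lock-guarded property whose setter notifies listeners."""

    def __init__(self) -> None:
        self._lock = threading.RLock()
        self._stt_text = ""
        self.stt_text_changed = Signal()

    @property
    def stt_text(self) -> str:
        with self._lock:
            return self._stt_text

    @stt_text.setter
    def stt_text(self, value: str) -> None:
        with self._lock:
            self._stt_text = value
        # Emit after releasing the lock to keep the critical section short.
        self.stt_text_changed.emit(value)


state = StateSketch()
seen: List[str] = []
state.stt_text_changed.connect(seen.append)  # a "widget" binding to the signal


def writer() -> None:
    state.stt_text = "hello"  # a worker thread writes through the property


t = threading.Thread(target=writer)
t.start()
t.join()
print(seen)            # ['hello']
print(state.stt_text)  # hello
```

One difference matters: here the callback runs on the writer thread. With real Qt signals, an emission from a worker thread to a receiver that lives in the main thread is queued by default and delivered by the main event loop, which is what lets widgets update safely.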

Data Flow (Happy Path)

flowchart TD
    Mic(["mic chunk"]) --> Cap["AudioCapture<br/>16 kHz mono · 0.2 s"]
    Cap --> AEC["AEC<br/>NLMS + double-talk detector"]
    AEC --> Whisper["Whisper sliding window<br/>window_seconds = 6.5"]
    Whisper --> Segs["STTEngine.transcribe<br/>SegmentResult[]"]
    Segs --> Filt["HallucinationFilter<br/>dedup · phonetic correction"]

    Filt --> Partial{{"STT_PARTIAL"}}
    Filt --> Confirmed{{"STT_CONFIRMED"}}

    Partial --> OBSlive["OBS overlay<br/>(live)"]

    Confirmed --> LLMp["LLMController.translate_pending"]
    Confirmed --> LLMc["LLMController.translate_confirmed"]

    LLMp --> TPartial{{"TRANSLATION_PARTIAL"}}
    LLMc --> TConfirmed{{"TRANSLATION_CONFIRMED"}}

    TPartial --> OBSlive
    TConfirmed --> Q["TTSQueue.enqueue"]
    TConfirmed --> FW["FileWriter<br/>TXT · SRT"]
    TConfirmed --> OBS2["OBS overlay<br/>(confirmed)"]

    Q --> HTTP["TTS HTTP"] --> Play["AudioPlayer"]
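The capture numbers in the diagram imply concrete buffer sizes: 0.2 s chunks at 16 kHz are 3,200 samples each, and a 6.5 s window holds 104,000 samples (32.5 chunks). The sketch below illustrates that arithmetic with a bounded deque. The document does not say how the real ring buffer is built, so the SlidingWindow class is purely hypothetical.

```python
from collections import deque
from typing import Deque, List

SAMPLE_RATE = 16_000   # Hz, from the diagram
CHUNK_SECONDS = 0.2    # from the diagram
WINDOW_SECONDS = 6.5   # window_seconds in the diagram

CHUNK_SAMPLES = round(SAMPLE_RATE * CHUNK_SECONDS)    # 3200
WINDOW_SAMPLES = round(SAMPLE_RATE * WINDOW_SECONDS)  # 104000


class SlidingWindow:
    """Keeps only the most recent WINDOW_SAMPLES samples."""

    def __init__(self, max_samples: int) -> None:
        self._buf: Deque[float] = deque(maxlen=max_samples)

    def push(self, chunk: List[float]) -> None:
        # Once the deque is full, the oldest samples fall off the left.
        self._buf.extend(chunk)

    def snapshot(self) -> List[float]:
        return list(self._buf)


window = SlidingWindow(WINDOW_SAMPLES)
for i in range(40):  # 40 chunks = 8.0 s of audio, each chunk tagged by index
    window.push([float(i)] * CHUNK_SAMPLES)

snap = window.snapshot()
print(len(snap))  # 104000
print(snap[0])    # 7.0  (the first 24000 samples, i.e. 7.5 chunks, were dropped)
print(snap[-1])   # 39.0
```

Because 6.5 s is not a whole number of 0.2 s chunks, the window boundary falls mid-chunk, which is why the oldest retained sample comes from partway through chunk 7.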

Module Map

src/vocal10n/
├── app.py              # Entry point. Creates QApplication, theme, state, MainWindow.
├── config.py           # YAML loader, dotted-key access, get_config singleton.
├── constants.py        # Enums: Language, ModelStatus, EventType.
├── state.py            # SystemState QObject (thread-safe, signal-emitting).
│
├── stt/
│   ├── audio_capture.py    # sounddevice mic input, ring buffer.
│   ├── playback_aec.py     # PlaybackTimeline + NLMS AEC + DTD.
│   ├── engine.py           # FasterWhisper wrapper.
│   ├── filters.py          # Hallucination filter, dedup, phonetic correction.
│   ├── transcript.py       # Segment confirmation logic.
│   ├── diarizer.py         # Optional speaker tagger.
│   ├── controller.py       # Orchestrates load/unload, language changes.
│   └── worker.py           # Background thread driving the engine.
│
├── llm/
│   ├── engine.py           # llama-cpp-python loader.
│   ├── api_backend.py      # OpenAI-compatible HTTP client.
│   ├── corrector.py        # Glossary / RAG-based source correction.
│   ├── translator.py       # Prompt templates, ChatML, generation.
│   ├── rag.py              # Vector retrieval for large glossaries.
│   └── controller.py       # Wires events, debounce, context window.
│
├── tts/
│   ├── client.py               # GPT-SoVITS HTTP client.
│   ├── server_manager.py       # Subprocess launcher / health check for SoVITS.
│   ├── queue.py                # TTS queue with pruning.
│   ├── audio_output.py         # Streaming playback + crossfade.
│   ├── controller.py           # Wires events for GPT-SoVITS path.
│   ├── qwen3_server.py         # Qwen3-TTS subprocess server.
│   ├── qwen3_client.py         # Qwen3-TTS protocol client.
│   └── qwen3_controller.py     # Wires events for Qwen3-TTS path.
│
├── pipeline/
│   ├── events.py           # Event types, EventDispatcher, dataclass payloads.
│   ├── coordinator.py      # Session lifecycle, FileWriter ownership.
│   ├── latency.py          # Latency tracker (per-stage stamps).
│   └── file_writer.py      # Async TXT/SRT/WAV writers.
│
├── obs/
│   ├── server.py           # HTTP server on 127.0.0.1:5124.
│   └── overlay.html        # Browser-source HTML/CSS template.
│
├── ui/
│   ├── main_window.py      # Window, menu, mode toggle.
│   ├── section_a.py        # Top: live text + metrics (Pro variant).
│   ├── section_b.py        # Bottom: tab container.
│   ├── tabs/               # stt_tab, translation_tab, tts_container_tab,
│   │                       # tts_tab, qwen3_tts_tab, output_tab, obs_tab,
│   │                       # kb_tab, training_tab.
│   ├── widgets/            # param_slider, model_selector, stream_text,
│   │                       # term_file_list, filter_list_editor,
│   │                       # simple_mode_panel.
│   └── styles/theme.qss    # Dark theme.
│
└── utils/
    ├── gpu.py              # pynvml wrapper.
    └── logger.py           # Logging configuration.
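As a small illustration of the "dotted-key access" noted for config.py, the sketch below walks a nested dict using a key such as "tts.sovits.port". The function name get_dotted and the key layout are assumptions for illustration. Only the values (6.5, 5124, 9880) come from this document, and the real loader reads YAML rather than an inline dict.

```python
from typing import Any, Dict


def get_dotted(config: Dict[str, Any], key: str, default: Any = None) -> Any:
    """Resolve 'a.b.c' against nested dicts, returning default if any part is missing."""
    node: Any = config
    for part in key.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node


# Hypothetical layout; a real config would be parsed from a YAML file.
config = {
    "stt": {"window_seconds": 6.5},
    "obs": {"port": 5124},
    "tts": {"sovits": {"port": 9880}},
}

print(get_dotted(config, "stt.window_seconds"))   # 6.5
print(get_dotted(config, "tts.sovits.port"))      # 9880
print(get_dotted(config, "llm.model", "unset"))   # unset
```

Returning a default instead of raising keeps call sites short, at the cost of hiding typos in key names. A stricter variant could raise KeyError when no default is supplied.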

See 03 - Repository Layout for the full tree, including non-source folders.