06 - Pipeline and State

This chapter covers the cross-cutting plumbing that lets the modules cooperate without depending on each other directly.

SystemState - Single Source of Truth

vocal10n.state.SystemState is a QObject carrying every piece of state that the UI cares about: model statuses, enable flags, languages, and the four live text buffers (current/accumulated × source/translation).

Key properties:

  • stt_status, llm_status, tts_status, tts_qwen3_status: ModelStatus enum (UNLOADED, LOADING, LOADED, ERROR).
  • stt_enabled, llm_enabled, tts_enabled, tts_source_enabled, tts_target_enabled, speaker_tagging.
  • source_language, target_language: Language enum.
  • current_*_text, accumulated_*_text: strings updated by workers.

Each property has a matching *_changed Qt signal that fires only on real change. UI widgets connect to these signals; worker threads write through the property setters under an internal RLock.
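
The "fires only on real change" behaviour can be sketched without Qt. The example below is a minimal illustration, not the actual SystemState code: StateSketch and on_stt_status_changed are hypothetical names, and plain Python callbacks stand in for the *_changed Qt signals.

```python
import threading
from enum import Enum
from typing import Callable, List


class ModelStatus(Enum):
    UNLOADED = "unloaded"
    LOADING = "loading"
    LOADED = "loaded"
    ERROR = "error"


class StateSketch:
    """Hypothetical stand-in for SystemState; callbacks replace Qt signals."""

    def __init__(self) -> None:
        self._lock = threading.RLock()
        self._stt_status = ModelStatus.UNLOADED
        self._listeners: List[Callable[[ModelStatus], None]] = []

    def on_stt_status_changed(self, callback: Callable[[ModelStatus], None]) -> None:
        with self._lock:
            self._listeners.append(callback)

    @property
    def stt_status(self) -> ModelStatus:
        with self._lock:
            return self._stt_status

    @stt_status.setter
    def stt_status(self, value: ModelStatus) -> None:
        with self._lock:
            if value == self._stt_status:
                return  # no real change, so nothing is notified
            self._stt_status = value
            listeners = list(self._listeners)
        # Notify outside the lock so a slow listener cannot block other writers.
        for callback in listeners:
            callback(value)


state = StateSketch()
seen: List[ModelStatus] = []
state.on_stt_status_changed(seen.append)

state.stt_status = ModelStatus.LOADING
state.stt_status = ModelStatus.LOADING  # duplicate write, ignored
state.stt_status = ModelStatus.LOADED
```

After these three writes, seen holds only LOADING and LOADED; the duplicate write produced no notification.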

Event Dispatcher

vocal10n.pipeline.events defines:

  • EventType (in vocal10n.constants): values such as STT_PARTIAL, STT_CONFIRMED, TRANSLATION_PARTIAL, TRANSLATION_CONFIRMED, TTS_STARTED, TTS_FINISHED, PIPELINE_READY.
  • Dataclass payloads: Event, TextEvent, TranslationEvent.
  • EventDispatcher: a thread-safe pub/sub with subscribe(type, callback) and publish(event), accessed through get_dispatcher().

Modules publish into the dispatcher and subscribe to whichever event types they need. This is what keeps the wiring loose: the STT module does not know about the LLM or TTS modules.
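
To make the decoupling concrete, here is a minimal sketch of a thread-safe pub/sub in the same shape as subscribe(type, callback) / publish(event). DispatcherSketch, the two-member EventType, and the TextEvent field names are assumptions made for illustration; the real classes may differ.

```python
import threading
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable, DefaultDict, List


class EventType(Enum):
    # Only two of the event types are reproduced here.
    STT_CONFIRMED = auto()
    TRANSLATION_CONFIRMED = auto()


@dataclass
class TextEvent:
    type: EventType  # field names are assumptions
    text: str


class DispatcherSketch:
    """Hypothetical pub/sub; not the actual EventDispatcher."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._subscribers: DefaultDict[EventType, List[Callable]] = defaultdict(list)

    def subscribe(self, event_type: EventType, callback: Callable) -> None:
        with self._lock:
            self._subscribers[event_type].append(callback)

    def publish(self, event: TextEvent) -> None:
        # Copy the callback list under the lock, then call outside it,
        # so a callback may itself publish without deadlocking.
        with self._lock:
            callbacks = list(self._subscribers.get(event.type, ()))
        for callback in callbacks:
            callback(event)


dispatcher = DispatcherSketch()
received: List[str] = []


def fake_translator(event: TextEvent) -> None:
    # Stands in for the LLM module: reacts to STT output, publishes its own event.
    dispatcher.publish(TextEvent(EventType.TRANSLATION_CONFIRMED, event.text.upper()))


dispatcher.subscribe(EventType.STT_CONFIRMED, fake_translator)
dispatcher.subscribe(EventType.TRANSLATION_CONFIRMED, lambda e: received.append(e.text))

# The "STT side" only publishes; it never references the translator.
dispatcher.publish(TextEvent(EventType.STT_CONFIRMED, "hello"))
```

The publisher of STT_CONFIRMED has no reference to fake_translator, which is the property the dispatcher exists to provide.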

Latency Tracker

vocal10n.pipeline.latency.LatencyTracker records per-stage timestamps keyed by an utterance id, and exposes rolling-window aggregates that the UI uses for the Section A metrics:

  • stt_partial_ms: mic-frame arrival → first partial.
  • stt_confirmed_ms: mic-frame arrival → confirmed segment.
  • translation_ms: confirmed → translation produced.
  • tts_ttfa_ms: translated → first audio frame heard (TTFA).
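
As a rough illustration of per-utterance timestamps feeding a rolling-window aggregate, consider the sketch below. LatencySketch, its method names, the stage labels, and the window size are all hypothetical; the real LatencyTracker API is not shown in this chapter.

```python
from collections import defaultdict, deque
from statistics import mean
from typing import DefaultDict, Deque, Dict, Optional


class LatencySketch:
    """Hypothetical tracker: stage timestamps per utterance, rolling averages per metric."""

    def __init__(self, window: int = 50) -> None:
        self._marks: DefaultDict[str, Dict[str, float]] = defaultdict(dict)
        # deque(maxlen=...) silently drops the oldest sample once the window is full.
        self._samples: DefaultDict[str, Deque[float]] = defaultdict(
            lambda: deque(maxlen=window)
        )

    def mark(self, utterance_id: str, stage: str, timestamp_s: float) -> None:
        self._marks[utterance_id][stage] = timestamp_s

    def measure(self, utterance_id: str, metric: str, start: str, end: str) -> float:
        marks = self._marks[utterance_id]
        elapsed_ms = (marks[end] - marks[start]) * 1000.0
        self._samples[metric].append(elapsed_ms)
        return elapsed_ms

    def average_ms(self, metric: str) -> Optional[float]:
        samples = self._samples.get(metric)
        return mean(samples) if samples else None


tracker = LatencySketch(window=2)
# Fixed timestamps (in seconds) keep the example deterministic;
# real code would take them from a monotonic clock.
for utterance_id, confirmed_at in (("u1", 0.25), ("u2", 0.5), ("u3", 0.75)):
    tracker.mark(utterance_id, "mic_frame", 0.0)
    tracker.mark(utterance_id, "stt_confirmed", confirmed_at)
    tracker.measure(utterance_id, "stt_confirmed_ms", "mic_frame", "stt_confirmed")

rolling = tracker.average_ms("stt_confirmed_ms")
```

With a window of two, the first sample (250 ms) has already been evicted, so rolling averages only the last two measurements.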

PipelineCoordinator

vocal10n.pipeline.coordinator.PipelineCoordinator owns session lifecycle rather than per-event flow:

  • start_session() creates a FileWriter if any output.* flag is on, subscribes it to STT_CONFIRMED and TRANSLATION_CONFIRMED, and publishes PIPELINE_READY.
  • stop_session() flushes pending buffers, unsubscribes the writer, finalises SRT / TXT / WAV files.
  • It can hold a reference to AudioCapture so the WAV writer can read the mic ring buffer at the right offset.

The coordinator deliberately does not load or unload models; that lives in each module’s controller.
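
The session lifecycle described above can be sketched as follows. Everything here is a simplified assumption: TinyDispatcher, MemoryWriter, and CoordinatorSketch are hypothetical stand-ins, event types are plain strings, and the writer keeps lines in memory instead of producing SRT / TXT / WAV files.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List, Optional


class TinyDispatcher:
    """Hypothetical single-threaded dispatcher, just enough for the example."""

    def __init__(self) -> None:
        self._subs: DefaultDict[str, List[Callable]] = defaultdict(list)

    def subscribe(self, event_type: str, callback: Callable) -> None:
        self._subs[event_type].append(callback)

    def unsubscribe(self, event_type: str, callback: Callable) -> None:
        self._subs[event_type].remove(callback)

    def publish(self, event_type: str, payload: Optional[str] = None) -> None:
        for callback in list(self._subs[event_type]):
            callback(payload)


class MemoryWriter:
    """Stands in for FileWriter; collects text instead of writing files."""

    def __init__(self) -> None:
        self.lines: List[str] = []
        self.finalised = False

    def on_text(self, text: str) -> None:
        self.lines.append(text)

    def finalise(self) -> None:
        self.finalised = True


class CoordinatorSketch:
    WRITER_EVENTS = ("STT_CONFIRMED", "TRANSLATION_CONFIRMED")

    def __init__(self, dispatcher: TinyDispatcher, output_flags: Dict[str, bool]) -> None:
        self._dispatcher = dispatcher
        self._output_flags = output_flags
        self.writer: Optional[MemoryWriter] = None

    def start_session(self) -> None:
        # A writer exists only when at least one output flag is on.
        if any(self._output_flags.values()):
            self.writer = MemoryWriter()
            for event_type in self.WRITER_EVENTS:
                self._dispatcher.subscribe(event_type, self.writer.on_text)
        self._dispatcher.publish("PIPELINE_READY")

    def stop_session(self) -> None:
        if self.writer is None:
            return
        for event_type in self.WRITER_EVENTS:
            self._dispatcher.unsubscribe(event_type, self.writer.on_text)
        self.writer.finalise()


dispatcher = TinyDispatcher()
coordinator = CoordinatorSketch(dispatcher, {"srt": True, "txt": False, "wav": False})
coordinator.start_session()
dispatcher.publish("STT_CONFIRMED", "hello")
dispatcher.publish("TRANSLATION_CONFIRMED", "bonjour")
coordinator.stop_session()
dispatcher.publish("STT_CONFIRMED", "ignored after stop")
```

Note that the sketch never touches model loading; it only wires and unwires the writer around a session.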

Module Controllers

Each backend module exposes a controller that the UI talks to:

  • vocal10n.stt.controller.STTController: load/unload, language switching, enable/disable, owns STTWorker.
  • vocal10n.llm.controller.LLMController: load/unload, prompt updates, KB hot-reload, debounced translation.
  • vocal10n.tts.controller.TTSController: GPT-SoVITS path (server health, queue, playback).
  • vocal10n.tts.qwen3_controller.Qwen3TTSController: same surface for the Qwen3-TTS backend.

Controllers are the only objects the UI tabs hold references to. They mediate writes to SystemState and subscribe to dispatcher events.
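
One way a controller might mediate state writes during a model load is sketched below. This is an assumption about the pattern, not the actual controller code: ControllerSketch, StateStub, and the load_model callable are hypothetical.

```python
from enum import Enum
from typing import Any, Callable, List, Optional


class ModelStatus(Enum):
    UNLOADED = "unloaded"
    LOADING = "loading"
    LOADED = "loaded"
    ERROR = "error"


class StateStub:
    """Hypothetical stand-in for SystemState that records every status write."""

    def __init__(self) -> None:
        self.stt_status = ModelStatus.UNLOADED
        self.history: List[ModelStatus] = []

    def set_stt_status(self, status: ModelStatus) -> None:
        self.stt_status = status
        self.history.append(status)


class ControllerSketch:
    """The UI would call load()/unload(); only the controller writes to state."""

    def __init__(self, state: StateStub, load_model: Callable[[], Any]) -> None:
        self._state = state
        self._load_model = load_model
        self._model: Optional[Any] = None

    def load(self) -> bool:
        self._state.set_stt_status(ModelStatus.LOADING)
        try:
            self._model = self._load_model()
        except Exception:
            # Surface the failure through state instead of raising into the UI.
            self._state.set_stt_status(ModelStatus.ERROR)
            return False
        self._state.set_stt_status(ModelStatus.LOADED)
        return True

    def unload(self) -> None:
        self._model = None
        self._state.set_stt_status(ModelStatus.UNLOADED)


def broken_loader() -> Any:
    raise RuntimeError("model file missing")


good_state = StateStub()
good_ok = ControllerSketch(good_state, lambda: object()).load()

bad_state = StateStub()
bad_ok = ControllerSketch(bad_state, broken_loader).load()
```

In both runs the UI-facing result is a status transition on the state object: LOADING then LOADED for the working loader, LOADING then ERROR for the broken one.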