05 β€” Configuration

All runtime configuration lives in config/default.yaml. It is loaded once by vocal10n.config.get_config() which exposes a small Config object supporting dotted-key access (cfg.get("stt.model_size")) and section views (cfg.section("stt")).

This chapter is a reference for every section. Defaults shown reflect the shipped values.

pipeline

Top-level switches and pacing.

KeyDefaultMeaning
name"Vocal10n"Display name.
target_latency_ms2500Soft end-to-end latency target.
enable_stt / enable_translation / enable_ttsfalseModule toggles at startup. UI may flip these.
enable_pending_translationtrueTranslate uncommitted text for display only.
enable_confirmed_translationtrueTranslate committed text for TTS / files.
tts_source"confirmed"Which text feeds TTS: confirmed, pending, or both.
translation_debounce_ms150Debounce window for partial-translation calls.
confirmed_batch_delay_ms400Delay before flushing confirmed text into a batch.
tts_queue_max_size10Hard cap on queued TTS jobs.
tts_queue_max_pending3Drop-oldest threshold to keep latency bounded.
max_buffer_age_s2.0Max age before an unconfirmed buffer is force-flushed.
min_clause_chars8Minimum clause length before clause-end triggers translation.

stt β€” FasterWhisper

KeyDefaultMeaning
model_sizelarge-v3-turboHF model id or local path.
devicecudaPassed to WhisperModel.
compute_typeint8_float16Mixed-precision compute mode.
window_seconds6.5Sliding decode window.
confirm_threshold0.3Time tail (s) below which segments stay β€œpending”.
min_transcribe_duration0.3Minimum audio length before a transcribe call.
max_segment_age4.0Force-confirm any segment older than this.
sample_rate16000Mic capture rate.
channels / chunk_duration1 / 0.2Capture chunking.
languagenullnull = auto-detect; or "zh", "en".
use_simplified_chinesetrueConvert traditional output to simplified.
initial_prompt_capacity200Cap on terms injected via initial_prompt.
beam_size1Greedy by default for speed.

translation β€” Qwen3 / OpenAI-compatible

KeyDefaultMeaning
backendlocallocal = llama-cpp GGUF, api = OpenAI-compatible HTTP.
model_pathmodels/llm/Qwen3-4B-Instruct-2507.Q4_K_M.ggufUsed when backend=local.
n_gpu_layers / n_ctx / n_batch / n_threads-1, 512, 8, 4llama.cpp tuning.
api_url / api_model / api_key / api_timeoutlocal LM Studio defaultsUsed when backend=api.
temperature / top_k / top_p / max_tokens0.0, 1, 1.0, 64Deterministic short outputs.
target_latency_ms200Soft per-call budget.
target_languageEnglishDisplay language; mapped to code via languages.
auto_detect_sourcetrueDetect source per call rather than relying on STT lang.
context_window_size2Number of prior translation pairs prepended for context.
rag_threshold100Switch to vector retrieval when glossary exceeds this.

tts β€” GPT-SoVITS

KeyDefaultMeaning
api_host / api_port / api_timeout127.0.0.1, 9880, 60HTTP endpoint of the server subprocess.
ref_audio_path / ref_audio_text / ref_audio_langreference clip + transcript + autoVoice cloning reference.
output_langenSynthesis language code.
streaming_mode3SoVITS streaming chunk size preset.
speed_factor1.3Playback speed scaler.
top_k / top_p / temperature5, 0.7, 0.5Sampling.
text_split_methodcut0Server-side chunking strategy.
batch_size1Per-request batch.

tts_qwen3 β€” Qwen3-TTS Backend

Voice modes:

voice_modeRequired keys
cloneref_audio_path, ref_audio_text, ref_audio_lang
speakerspeaker (built-in id) and optional speaker_instruct
designdesign_instruct (free-form description)

Other parameters mirror typical generation knobs (top_k, top_p, temperature, max_new_tokens, dtype, use_flash_attn).

audio_output

Playback device, sample rate, buffer size, crossfade in milliseconds. The crossfade smooths the boundary between consecutive TTS chunks emitted by the streaming player.

aec β€” Acoustic Echo Cancellation

KeyDefaultMeaning
enabledtrueMaster switch.
filter_taps2048NLMS length. At 16 kHz this is 128 ms of impulse response.
step_size0.01NLMS ΞΌ; 0.005–0.05 is the safe range.
dt_threshold3.0Double-talk gate; freeze adaptation when mic ≫ echo estimate.
max_delay_ms300.0Max delay searched by cross-correlation.

See 07 β€” STT Module for theory.

languages

Display-name β†’ ISO code map used by language pickers.

obs

Overlay server bind, per-language font family, font size, colour, stroke, and shadow. The browser source URL is http://127.0.0.1:5124/.

output

Per-format toggles: save_source_txt, save_source_srt, save_target_txt, save_target_srt, save_wav, plus the destination directory.

logging

level (INFO, DEBUG, …) and the show_latency / show_vram flags that toggle the metrics surfaces.