Configuration is the second page in the console workflow. It takes the selected draft agent from Agent Studio and turns it into an explicit voice runtime configuration.
[Figure: OpenDot Configuration screen]
The page is intentionally organized around the same four pipeline stages that the runtime executes:
VAD -> STT -> LLM -> TTS

What you see in the UI

Each stage card has the same shape:
  • stage number and icon
  • stage label and purpose
  • provider name
  • model selector
  • stage settings
  • emitted runtime events
If no agent is selected, Configuration shows an empty state. Create or select an agent in Agent Studio first.

Stage 1: Voice activity

Field | Default | Purpose
Provider | Deepgram | Uses Deepgram live listen options for turn detection.
Model | endpointing-vad | Keeps VAD visible as a product stage.
Endpointing | 900 ms | Controls how quickly silence closes a turn.
Utterance end | 1000 ms | Adds a safer pause for device conversations.
Turn events | vad_events, interim_results, speech_final | Enables runtime feedback and turn close events.
Noise floor | 2 characters | Ignores tiny transcript fragments.
Return device to wake word | enabled | Lets firmware leave active listening after a completed turn.
If turns close too early, increase Endpointing or Utterance end. If tiny noises trigger turns, raise Noise floor.
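
As a rough illustration of the Stage 1 defaults, the sketch below writes them out as live listen query options and adds a noise floor check. The names endpointing, utterance_end_ms, vad_events, and interim_results are Deepgram live listen options; how the OpenDot runtime maps UI fields onto them is an assumption here, as are the helper names vad_query and passes_noise_floor and the reading of Noise floor as a minimum transcript length. speech_final is left out of the query because it is reported on transcript results rather than set as a request option.

```python
from urllib.parse import urlencode

# Stage 1 defaults from the table above (the mapping to option names is assumed).
VAD_DEFAULTS = {
    "endpointing": 900,         # ms of silence before a turn closes
    "utterance_end_ms": 1000,   # longer pause used for device conversations
    "vad_events": "true",
    "interim_results": "true",
}
NOISE_FLOOR_CHARS = 2  # assumption: transcripts shorter than this are ignored


def vad_query(overrides=None):
    """Return the turn-detection settings as a URL query string."""
    params = {**VAD_DEFAULTS, **(overrides or {})}
    return urlencode(params)


def passes_noise_floor(transcript, floor=NOISE_FLOOR_CHARS):
    """Hypothetical filter: keep transcripts with at least `floor` characters after trimming."""
    return len(transcript.strip()) >= floor
```

For example, vad_query({"endpointing": 1200}) would model raising Endpointing when turns close too early, and passes_noise_floor(" a ") returns False for a one-character fragment.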

Stage 2: Speech to text

Field | Default | Purpose
Provider | Deepgram | Streams microphone or device audio to STT.
Model | nova-3 | Default live transcription model.
Language | en-US | Recognition language.
Encoding | linear16 | Browser runtime audio format.
Sample rate | 16000 Hz | Input sample rate sent to the runtime.
Features | smart_format | Transcript formatting options.
The Browser Test panel shows interim and final transcripts from this stage.
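
The Encoding and Sample rate defaults determine how many bytes of audio the runtime receives per unit of time. The arithmetic below is a small worked example, assuming mono input (the channel count is not stated on this page); frame_bytes is a hypothetical helper name.

```python
BYTES_PER_SAMPLE = 2   # linear16 is 16-bit signed PCM
SAMPLE_RATE = 16000    # Stage 2 default, in Hz
CHANNELS = 1           # assumption: mono microphone input


def frame_bytes(duration_ms, sample_rate=SAMPLE_RATE, channels=CHANNELS):
    """Bytes of linear16 audio contained in a frame of the given duration."""
    samples = sample_rate * duration_ms // 1000
    return samples * channels * BYTES_PER_SAMPLE
```

Under these assumptions a 20 ms frame is 640 bytes and one second of audio is 32000 bytes, which is a useful sanity check when a client buffers microphone audio before sending it.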

Stage 3: Language model

Field | Default | Purpose
Provider | OpenAI | OpenAI-compatible response generation.
Model | gpt-5.4-mini | Default starter model.
API | Responses API | Streaming text generation path.
System prompt and chunk rules | voice assistant prompt plus chunk instructions | Shapes the assistant response and TTS chunking.
Reasoning effort | configurable | Controls model reasoning behavior where supported.
Verbosity | configurable | Controls response density where supported.
The runtime expects assistant responses in XML-like chunks:
<chunk>First spoken phrase.</chunk><chunk>Next spoken phrase.</chunk>
Each closed chunk can be sent to TTS while the rest of the answer is still streaming.
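
The following is a minimal sketch of how a client could pull closed chunks out of a streaming response, assuming the text arrives as arbitrary string deltas that may split a tag in the middle. ChunkStream and feed are hypothetical names used for illustration; this is not the OpenDot runtime's parser.

```python
import re

# Non-greedy match so each <chunk>...</chunk> pair is captured separately.
CHUNK_RE = re.compile(r"<chunk>(.*?)</chunk>", re.DOTALL)


class ChunkStream:
    """Buffers streamed text and releases each chunk once its closing tag arrives."""

    def __init__(self):
        self._buffer = ""

    def feed(self, delta):
        """Add a streamed text delta and return the chunks it completed."""
        self._buffer += delta
        closed = []
        last_end = 0
        for match in CHUNK_RE.finditer(self._buffer):
            text = match.group(1).strip()
            if text:
                closed.append(text)
            last_end = match.end()
        # Keep only the unfinished tail for the next delta.
        self._buffer = self._buffer[last_end:]
        return closed
```

Each string returned by feed could be handed to TTS immediately, while an open chunk or a partially received tag stays in the buffer until the rest of it streams in.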

Stage 4: Text to speech

Field | Default | Purpose
Provider | Deepgram | Synthesizes assistant text into audio.
Model | aura-2-thalia-en | Default starter voice.
Encoding | mp3 | Output audio encoding.
Sample rate | 24000 Hz | Runtime TTS sample rate.
Browser delivery | chunked audio files | Browser playback mode.
Chunk style | fast phrases | Controls how short streamed TTS chunks should be.
Use Linear16 PCM with Direct PCM stream when you want raw PCM playback in the browser. Other encodings are retained as chunked audio files for playback.
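
The delivery rule above can be read as a simple decision. The sketch below is one interpretation, with hypothetical names (browser_delivery and the two returned labels). It assumes that Linear16 without Direct PCM stream also falls back to chunked audio files, which this page does not state.

```python
def browser_delivery(encoding, direct_pcm_stream=False):
    """Hypothetical helper: pick the browser playback mode for a TTS encoding."""
    # Raw PCM playback needs both the linear16 encoding and the Direct PCM stream option.
    if encoding.lower() == "linear16" and direct_pcm_stream:
        return "direct_pcm_stream"
    # Every other combination is treated as chunked audio files (assumption for linear16).
    return "chunked_audio_files"
```

With the Stage 4 defaults, browser_delivery("mp3") selects chunked audio files, matching the Browser delivery row in the table.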

What gets persisted

Changing Configuration updates the selected agent through the platform API. The API writes a new draft agent version and a new draft pipeline version in PostgreSQL. The runtime later loads the authorized version when Browser Test opens a voice session or a Dot device connects with valid credentials. See Platform architecture for the full system boundary and Browser Test for the next step in the UI workflow.