If you have ever tried to dictate a long thought inside a Discord voice channel and watched the agent commit the first six words to memory while your sentence was still going, the new voice.captureSilenceGraceMs setting in v2026.5.7 is for you. The default post-speech silence grace is now 2.5 seconds, up from the previous shorter window, and the setting is exposed so you can tune it for your deployment when your room is noisy or your speaking cadence is unusual.
This post walks through the actual fix, what voice.captureSilenceGraceMs does, why the related channels capabilities permission audit shipped alongside it, and a sane starting configuration for the three voice setups that break most often.
What the Discord voice capture problem looked like
Before v2026.5.7, the voice pipeline treated short pauses inside a sentence as the end of the utterance. It transcribed the partial fragment and fed it to the agent, which started replying even though you were still talking. The result was the kind of conversation where you say “Pica, can you check whether the…” and pause to think, and the agent immediately answers a question you never finished asking.
The root cause was a too-aggressive end-of-utterance threshold. STT systems use a silence detector to decide when a speaker has finished. If you make that window too short, normal speech rhythms — thinking pauses, breaths, mid-sentence “ums” — trigger false endings. Make it too long and the agent feels unresponsive.
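To make the tradeoff concrete, here is a minimal sketch of an end-of-utterance detector. It is not the project's actual pipeline: the function name `split_utterances`, the 100 ms frame size, and the per-frame boolean speech flags are all assumptions made for illustration.

```python
from typing import List, Tuple

FRAME_MS = 100  # assumed frame length; real VAD frame sizes vary


def split_utterances(speech_flags: List[bool], grace_ms: int) -> List[Tuple[int, int]]:
    """Group frames into utterances, ending one only after grace_ms of silence.

    speech_flags[i] is True when frame i contains speech (hypothetical VAD output).
    Returns (start_frame, end_frame) pairs, with end_frame inclusive.
    """
    utterances = []
    start = None        # first frame of the utterance in progress
    last_speech = None  # most recent frame that contained speech
    for i, is_speech in enumerate(speech_flags):
        if is_speech:
            if start is None:
                start = i
            last_speech = i
        elif start is not None:
            # Silence inside an open utterance: commit once the grace is used up.
            silence_ms = (i - last_speech) * FRAME_MS
            if silence_ms >= grace_ms:
                utterances.append((start, last_speech))
                start = None
    if start is not None:
        # Audio ended while an utterance was still open.
        utterances.append((start, last_speech))
    return utterances


# One second of speech, a 1.5 s thinking pause, then one more second of speech.
frames = [True] * 10 + [False] * 15 + [True] * 10

short_grace = split_utterances(frames, grace_ms=1000)  # pause splits the sentence
long_grace = split_utterances(frames, grace_ms=2500)   # pause is absorbed
```

With the 1000 ms grace the thinking pause produces two fragments; with 2500 ms the same audio is committed as a single utterance. The cost is that the commit always waits out the full grace after the last spoken frame.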
The 2026.5.7 release set the default grace window to 2.5 seconds of trailing silence, which is long enough to absorb most thinking pauses but short enough to keep turn-taking conversational. The change is credited to @vincentkoc in the release notes.
What voice.captureSilenceGraceMs actually controls
voice.captureSilenceGraceMs is a millisecond value in your config that controls how long the voice pipeline waits after detected speech ends before committing the utterance. A higher value means longer pauses are allowed inside one turn. A lower value means faster but more interrupt-prone capture.
Reasonable starting points:
| Environment | Recommended value | Why |
|---|---|---|
| Quiet 1:1 voice DM | 1500 | Lower latency, no background noise to confuse VAD |
| Default mixed-use Discord channel | 2500 (default) | Balanced for normal conversation |
| Noisy server, multi-speaker channel | 3500–4000 | Background voices keep flipping VAD; longer grace prevents fragment commits |
| Dictation / long-form thinking | 5000 | You’re talking to the agent, not with it; pauses are normal |
Set it in your config:
config set voice.captureSilenceGraceMs 3500
Or, if you edit the JSON config file directly, add it under the voice block alongside your existing voice settings. Restart your daemon and the new value applies on the next voice session.
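If you would rather script the edit than make it by hand, the following is a sketch of the change. The nesting of `captureSilenceGraceMs` under a top-level `voice` object follows the description above; the function name and the sample contents (including the `enabled` key) are hypothetical, and a real config file will contain other keys.

```python
import json


def set_silence_grace(config_text: str, grace_ms: int) -> str:
    """Return config JSON with voice.captureSilenceGraceMs set to grace_ms."""
    if grace_ms <= 0:
        raise ValueError("grace_ms must be a positive number of milliseconds")
    config = json.loads(config_text)
    # Create the voice block if it is missing; existing voice settings are kept.
    voice = config.setdefault("voice", {})
    voice["captureSilenceGraceMs"] = grace_ms
    return json.dumps(config, indent=2)


# Hypothetical existing config with one unrelated voice setting.
before = '{"voice": {"enabled": true}}'
after = set_silence_grace(before, 3500)
print(after)
```

The helper works on text rather than a file path so it makes no assumption about where the config lives.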
Audit Discord voice permissions before you blame the config
The other half of the release matters because most “voice doesn’t work” reports turn out to be permission issues, not VAD issues. v2026.5.7 added voice-channel permission auditing to channels capabilities and channels status --probe, including auto-join targets. If your bot is missing Connect, Speak, or Read Message History on the voice channel, the audit will tell you before /vc join silently fails.
Run the audit on every voice-enabled Discord channel:
channels capabilities discord
channels status --probe
The output flags missing voice permissions per channel. Fix those at the Discord role level before tuning captureSilenceGraceMs. There’s no point lengthening the silence grace if the bot can’t open the audio stream in the first place.
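Conceptually, the check is a set difference between what the bot needs and what its role grants. This is a sketch only: the audit's real output format is not shown here, so the permission names are taken from the text above and the `granted` input is a hypothetical stand-in for whatever the audit reports.

```python
REQUIRED_VOICE_PERMISSIONS = ("Connect", "Speak", "Read Message History")


def missing_voice_permissions(granted):
    """Return the required permissions absent from granted, in a stable order."""
    granted_set = set(granted)
    return [p for p in REQUIRED_VOICE_PERMISSIONS if p not in granted_set]


# Hypothetical bot role that can join and read history but cannot speak.
missing = missing_voice_permissions(["Connect", "Read Message History", "View Channel"])
print(missing)
```

An empty result means the permission layer is clean and it is worth moving on to the silence grace.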
A three-step debugging order
When Discord voice misbehaves, work through these in order. The order matters because each later step is harder to diagnose without the earlier ones being clean.
- Run channels capabilities discord and channels status --probe. Confirm Connect, Speak, and Read Message History are present on the channel. Fix any missing permission at the Discord server level.
- Listen to a single utterance with the default 2.5s grace. If the agent still cuts you off, your environment is noisy or your cadence has long pauses. Bump voice.captureSilenceGraceMs to 3500 and retry.
- Check the spoken-output prompt. v2026.5.7 also tightened the prompt around live STT fragments. If responses feel verbose against partial transcripts, the tightened prompt should help; if you have a custom voice prompt, review whether it still matches the new behavior.
If you make it through all three and capture is still choppy, the next layer is your microphone hardware and Discord input sensitivity, not the config. Lower the Discord input sensitivity threshold; the voice pipeline can only work with the audio Discord actually sends it.
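The debugging order can be written down as a small decision function. This is purely an illustration of the sequence described in this section, not a tool that ships with the release; the function name, its parameters, and the returned strings are all hypothetical.

```python
from typing import List


def next_debug_step(missing_permissions: List[str], grace_ms: int,
                    still_cut_off: bool, prompt_reviewed: bool) -> str:
    """Return the next action, checking the layers in the order given above."""
    if missing_permissions:
        # Step 1: permissions come first, since nothing else works without them.
        return "fix permissions: " + ", ".join(missing_permissions)
    if still_cut_off and grace_ms < 3500:
        # Step 2: the default grace is too short for this room or cadence.
        return "raise voice.captureSilenceGraceMs to 3500 and retry"
    if not prompt_reviewed:
        # Step 3: make sure a custom voice prompt matches the new behavior.
        return "review the spoken-output prompt"
    if still_cut_off:
        # All three steps are clean, so look outside the config.
        return "check microphone hardware and Discord input sensitivity"
    return "no further action"


print(next_debug_step(["Speak"], 2500, True, False))
print(next_debug_step([], 3500, True, True))
```

Each branch returns early, so a later layer is only considered once the earlier ones are clean.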
Why this is more than a parameter tweak
The captureSilenceGraceMs change is a small surface, but it touches a long-standing complaint in voice-first AI agents. Hosted voice assistants like Alexa or Siri have spent a decade tuning VAD thresholds against millions of utterances, and they still cut people off. An open-source agent doesn’t have that data, so making the threshold configurable per deployment is the right move. You know your room and your cadence better than any default can.
The release notes credit a community contributor for the change, which fits the pattern of recent voice improvements. The agent doesn’t run its own STT model; it stitches together Discord’s audio pipe, an STT backend, and the agent runtime. The integration points are exactly where rough edges accumulate. Making them tunable is how you ship a voice product without a vertical hardware integration.
FAQ
What’s the default value of voice.captureSilenceGraceMs in v2026.5.7?
2.5 seconds (2500 ms). This is up from the previous shorter default.
Does this only apply to Discord, or to all voice channels?
It applies to the voice capture pipeline generally, but the choppiness complaint and the related permission auditing were Discord-specific in v2026.5.7’s release notes. WhatsApp, Telegram, and voice-call plugin behavior aren’t affected by this setting.
Will lengthening the grace window slow down responses?
Yes, by the amount you add to the window. A 3.5s setting adds 1 extra second of perceived latency over the 2.5s default. That’s the tradeoff: fewer interruptions, slightly slower turn-taking.
Where do I see what value is currently set?
Run config get voice.captureSilenceGraceMs, or check the voice block in your .json config.
Is there a way to set it per-channel instead of globally?
Not directly via voice.captureSilenceGraceMs in v2026.5.7. If you need per-channel behavior, the workaround is multiple profile configs, one per voice context.
Putting a Discord voice capture fix together
If you have one minute, do this: run channels capabilities discord, fix any missing voice permission at the Discord role level, then leave voice.captureSilenceGraceMs at its 2.5s default. If you have noisy channels or speak in long deliberate sentences, push it to 3500 and re-test. That covers the 90% case. The other 10% is microphone hardware, not config.
For the full picture of how channels work, see the overview of how the agent works. If you’re new to voice, the Discord integration page is the right starting point, and the alternatives comparison covers how its voice approach differs from hosted assistants.