The <Stream> element streams raw audio from active calls over a WebSocket connection in near real-time. Use it for real-time speech processing, transcription, or AI voice applications.
Basic Usage
<Response>
    <Stream>wss://yourserver.example.com/audiostream</Stream>
</Response>
Python
from plivo import plivoxml

# Build a <Response> containing a single <Stream> element.
response = plivoxml.ResponseElement()
response.add(plivoxml.StreamElement('wss://yourserver.example.com/audiostream'))
print(response.to_string())
Attributes
| Attribute | Type | Default | Description |
| --- | --- | --- | --- |
| bidirectional | boolean | false | Enable two-way audio (read/write) |
| audioTrack | string | inbound | Which audio to stream: inbound, outbound, both |
| streamTimeout | integer | 86400 | Max stream duration in seconds |
| contentType | string | audio/x-l16;rate=8000 | Audio codec and sample rate |
| keepCallAlive | boolean | false | Continue call only after stream ends |
| extraHeaders | string | - | Custom key-value pairs for WebSocket |
| statusCallbackUrl | URL | - | URL for stream status events |
| statusCallbackMethod | string | POST | HTTP method for callback |
| noiseCancellation | string | "false" | Enable noise cancellation: "true" or "false" |
| noiseCancellationLevel | integer | 85 | Noise reduction intensity (60–100). Only applies when noiseCancellation is "true" |
| Content Type | Description |
| --- | --- |
| audio/x-l16;rate=8000 | Linear PCM, 8kHz (default) |
| audio/x-l16;rate=16000 | Linear PCM, 16kHz |
| audio/x-mulaw;rate=8000 | G.711 mu-law, 8kHz |
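The content type determines how many bytes each second of audio occupies, which can help when sizing buffers on the WebSocket server. The following is a minimal sketch, assuming single-channel audio with 16-bit samples for audio/x-l16 and 8-bit samples for audio/x-mulaw. The helper names `parse_content_type` and `bytes_per_second` are hypothetical and not part of any Plivo SDK.

```python
# Bytes per sample for each supported encoding (mono audio assumed).
BYTES_PER_SAMPLE = {
    "audio/x-l16": 2,    # 16-bit linear PCM
    "audio/x-mulaw": 1,  # 8-bit G.711 mu-law
}


def parse_content_type(content_type):
    """Split 'audio/x-l16;rate=8000' into ('audio/x-l16', 8000)."""
    encoding, _, params = content_type.partition(";")
    encoding = encoding.strip()
    rate = None
    for param in params.split(";"):
        key, _, value = param.partition("=")
        if key.strip() == "rate":
            rate = int(value)
    if encoding not in BYTES_PER_SAMPLE or rate is None:
        raise ValueError(f"unsupported content type: {content_type!r}")
    return encoding, rate


def bytes_per_second(content_type):
    """Return the raw (pre-base64) audio byte rate for a content type."""
    encoding, rate = parse_content_type(content_type)
    return BYTES_PER_SAMPLE[encoding] * rate
```

Under these assumptions, the default content type carries twice as many bytes per second as mu-law at the same sample rate.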
Bidirectional Streaming
Enable two-way audio for voice AI applications:
<Response>
    <Stream bidirectional="true" keepCallAlive="true">
        wss://ai.example.com/voice-agent
    </Stream>
</Response>
When bidirectional="true", your WebSocket server can send audio back:
{
    "event": "playAudio",
    "media": {
        "contentType": "audio/x-l16",
        "sampleRate": "8000",
        "payload": "<base64-encoded-audio>"
    }
}
When bidirectional is true, audioTrack cannot be outbound or both.
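As an illustration of the message format above, the following sketch wraps raw audio bytes in a playAudio event and checks the audioTrack restriction before any XML is generated. The function names `build_play_audio` and `check_track` are hypothetical. The sketch only produces the JSON text; sending it over the WebSocket is left to whichever server library you use.

```python
import base64
import json


def build_play_audio(pcm_bytes, sample_rate=8000, content_type="audio/x-l16"):
    """Wrap raw audio bytes in a playAudio event, serialized as JSON text."""
    message = {
        "event": "playAudio",
        "media": {
            "contentType": content_type,
            "sampleRate": str(sample_rate),  # the example above sends a string
            "payload": base64.b64encode(pcm_bytes).decode("ascii"),
        },
    }
    return json.dumps(message)


def check_track(bidirectional, audio_track):
    """Reject audioTrack values that conflict with bidirectional streaming."""
    if bidirectional and audio_track in ("outbound", "both"):
        raise ValueError("audioTrack must be 'inbound' when bidirectional is true")
```

For example, `build_play_audio(b"\x00" * 320)` would encode 20 ms of silence, assuming 16-bit mono samples at 8 kHz.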
Stream Both Directions
Capture audio from both parties:
<Response>
    <Stream audioTrack="both" streamTimeout="3600">
        wss://transcription.example.com/stream
    </Stream>
    <Speak>This call is being transcribed for quality purposes.</Speak>
</Response>
Status Callbacks
Monitor stream connection status:
<Response>
    <Stream
        statusCallbackUrl="https://example.com/stream-status/"
        statusCallbackMethod="POST">
        wss://yourserver.example.com/audiostream
    </Stream>
</Response>
Callback Events
Notifications are sent when:
- The audio stream is connected
- The audio stream is stopped (intentionally or by timeout)
- The audio stream fails or is disconnected
Callback Parameters
| Parameter | Description |
| --- | --- |
| bidirectional | Whether stream is bidirectional |
| audioTrack | Which audio tracks are streamed |
| streamTimeout | Max stream duration |
| contentType | Audio codec used |
| extraHeaders | Custom headers sent |
| keepCallAlive | Whether call waits for stream |
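A status callback handler might extract these parameters as sketched below. This assumes the callback arrives as a form-encoded body, which this page does not confirm, so check the actual payload format before relying on it. The name `parse_stream_callback` is hypothetical.

```python
from urllib.parse import parse_qs

# Parameter names taken from the table above.
STREAM_FIELDS = (
    "bidirectional", "audioTrack", "streamTimeout",
    "contentType", "extraHeaders", "keepCallAlive",
)


def parse_stream_callback(body):
    """Extract the documented stream parameters from a form-encoded body."""
    form = parse_qs(body, keep_blank_values=True)
    # parse_qs returns a list per key; keep the first value of each field.
    return {name: form[name][0] for name in STREAM_FIELDS if name in form}
```

Fields that are not listed in the table are dropped, so the returned dictionary only contains the documented parameters.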
Extra Headers
Pass metadata to your WebSocket server with the extraHeaders attribute:
<Response>
    <Stream extraHeaders="userId=12345,sessionId=abc123">
        wss://yourserver.example.com/audiostream
    </Stream>
</Response>
Constraints:
- Max length: 512 bytes
- Allowed characters: [A-Z], [a-z], [0-9]
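The constraints above can be checked before the XML is generated. The sketch below makes two assumptions that this page does not spell out: the character rule applies to each key and each value individually (the `=` and `,` separators are exempt), and the 512-byte limit applies to the whole joined string. The name `build_extra_headers` is hypothetical.

```python
import re

MAX_EXTRA_HEADERS_BYTES = 512
_ALNUM = re.compile(r"[A-Za-z0-9]+")


def build_extra_headers(pairs):
    """Join a dict into 'key=value,key=value', enforcing the constraints above."""
    for key, value in pairs.items():
        for token in (key, str(value)):
            if not _ALNUM.fullmatch(token):
                raise ValueError(f"only letters and digits are allowed: {token!r}")
    joined = ",".join(f"{key}={value}" for key, value in pairs.items())
    if len(joined.encode("utf-8")) > MAX_EXTRA_HEADERS_BYTES:
        raise ValueError("extraHeaders exceeds 512 bytes")
    return joined
```

Calling `build_extra_headers({"userId": 12345, "sessionId": "abc123"})` yields the attribute value used in the example above.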
Keep Call Alive
Wait for stream to end before continuing:
<Response>
    <Stream keepCallAlive="true">
        wss://ai.example.com/conversation
    </Stream>
    <Speak>Thank you for using our AI assistant.</Speak>
</Response>
When keepCallAlive="true":
- The Stream element runs exclusively
- Subsequent XML executes only after the stream disconnects
Noise Cancellation
Filter out background noise in real-time to improve voice clarity and transcription accuracy for voice agent applications in noisy environments.
Noise cancellation is an account-level feature. Contact your Plivo account manager or email support@plivo.com to enable it before using these attributes.
<Response>
    <Stream bidirectional="true"
            keepCallAlive="true"
            noiseCancellation="true"
            noiseCancellationLevel="85">
        wss://ai.example.com/voice-agent
    </Stream>
</Response>
Choosing a cancellation level:
| Level Range | Environment | Notes |
| --- | --- | --- |
| 60–70 | Quiet (home, office) | Light filtering, preserves voice detail |
| 70–85 | Moderate noise | Good balance for most use cases (default: 85) |
| 85–100 | Heavy noise (traffic, crowds) | Aggressive filtering, may introduce minor artifacts |
Start with the default value of 85. Increase toward 100 for heavy background noise. Decrease toward 60 if you notice audio artifacts or voice distortion.
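If the level is tuned per deployment, it may be worth validating it against the documented 60–100 range before emitting XML. A minimal sketch follows; `noise_attrs` is a hypothetical helper that returns attribute values as strings ready to place on the Stream element.

```python
MIN_LEVEL, MAX_LEVEL, DEFAULT_LEVEL = 60, 100, 85


def noise_attrs(enabled, level=DEFAULT_LEVEL):
    """Return Stream attributes for noise cancellation as XML-ready strings."""
    if not enabled:
        # The level only applies when noise cancellation is on, so omit it.
        return {"noiseCancellation": "false"}
    if not MIN_LEVEL <= level <= MAX_LEVEL:
        raise ValueError(f"noiseCancellationLevel must be {MIN_LEVEL}-{MAX_LEVEL}")
    return {"noiseCancellation": "true", "noiseCancellationLevel": str(level)}
```

Rejecting out-of-range values early is a design choice in this sketch. Clamping to the nearest bound would be an equally reasonable alternative.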
Use Cases
| Scenario | Configuration |
| --- | --- |
| Real-time transcription | audioTrack="both", contentType="audio/x-l16;rate=16000" |
| Voice AI agent | bidirectional="true", keepCallAlive="true" |
| Voice AI in noisy environments | bidirectional="true", keepCallAlive="true", noiseCancellation="true" |
| Call monitoring | audioTrack="inbound" |
| Quality analysis | audioTrack="both" |
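The configurations above could be turned into response XML with the Python standard library alone, as in the sketch below. The scenario keys (`transcription`, `voice_ai`, and so on) and the function `stream_xml` are hypothetical names chosen for illustration. The attribute sets are copied from the use-case table.

```python
import xml.etree.ElementTree as ET

# Attribute sets copied from the use-case table above; keys are illustrative.
SCENARIOS = {
    "transcription": {"audioTrack": "both", "contentType": "audio/x-l16;rate=16000"},
    "voice_ai": {"bidirectional": "true", "keepCallAlive": "true"},
    "voice_ai_noisy": {
        "bidirectional": "true",
        "keepCallAlive": "true",
        "noiseCancellation": "true",
    },
    "monitoring": {"audioTrack": "inbound"},
    "quality": {"audioTrack": "both"},
}


def stream_xml(scenario, url):
    """Render <Response><Stream ...>url</Stream></Response> for a scenario."""
    response = ET.Element("Response")
    stream = ET.SubElement(response, "Stream", SCENARIOS[scenario])
    stream.text = url
    return ET.tostring(response, encoding="unicode")
```

For instance, `stream_xml("voice_ai", "wss://ai.example.com/voice-agent")` produces a Response with a single bidirectional Stream element pointing at that URL.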
WebSocket Events
Your WebSocket server receives:
| Event | Description |
| --- | --- |
| Connection | Initial metadata about the stream and call |
| Media | Base64-encoded audio chunks with contentType, sampleRate, payload |
| Stop | Notification when stream ends |
For the detailed event protocol, see Stream Event Protocol.
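A server-side handler for these events might look like the sketch below. The wire format is an assumption: the event names `start`, `media`, and `stop`, and the nesting of `payload` under a `media` key (mirroring the playAudio message), are placeholders that should be confirmed against the Stream Event Protocol. `StreamSession` is a hypothetical class, and the sketch handles already-received text frames rather than managing the WebSocket connection itself.

```python
import base64
import json


class StreamSession:
    """Collects decoded audio from one stream; event names are assumed."""

    def __init__(self):
        self.metadata = None
        self.audio = bytearray()
        self.stopped = False

    def handle(self, raw_message):
        """Process one JSON text frame received from the stream."""
        message = json.loads(raw_message)
        event = message.get("event")
        if event == "start":      # assumed name for the connection event
            self.metadata = message
        elif event == "media":    # assumed name for audio chunks
            self.audio += base64.b64decode(message["media"]["payload"])
        elif event == "stop":     # assumed name for the end-of-stream event
            self.stopped = True
        # Unknown events are ignored so protocol additions do not crash the server.
```

Keeping one session object per connection lets the accumulated audio be handed to a transcriber or voice agent once enough data has arrived.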