Audio Output

This page covers the XML elements for audio output: converting text to speech, playing audio files, and sending DTMF tones.

Speak

The <Speak> element converts text to speech and plays it to the caller. Use it for dynamic messages that can’t be prerecorded.

Basic Usage

<Response>
    <Speak>Hello! Welcome to our service.</Speak>
</Response>

from plivo import plivoxml

response = plivoxml.ResponseElement()
response.add(plivoxml.SpeakElement('Hello! Welcome to our service.'))
print(response.to_string())

Speak Attributes

Attribute	Type	Default	Description
`voice`	string	`WOMAN`	Voice tone. Allowed: `WOMAN`, `MAN`, or a Polly voice such as `Polly.Joey` (see SSML Support)
`language`	string	`en-US`	Language for speech. See supported languages below
`loop`	integer	`1`	Number of times to repeat. `0` = infinite

Change Voice and Language

<Response>
    <Speak voice="MAN" language="en-GB">
        Good day! This message uses a British male voice.
    </Speak>
</Response>

from plivo import plivoxml

response = plivoxml.ResponseElement()
response.add(plivoxml.SpeakElement(
    'Good day! This message uses a British male voice.',
    voice='MAN',
    language='en-GB'
))
print(response.to_string())

Loop a Message

Play a message multiple times:

<Response>
    <Speak loop="3">Please hold. Your call is important to us.</Speak>
</Response>

Set loop="0" to repeat indefinitely until the call ends:

<Response>
    <Speak loop="0">Please wait while we connect you.</Speak>
</Response>
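
The loop values above can also be generated in code. The following is a minimal sketch that uses only Python's standard library (xml.etree.ElementTree) rather than the Plivo SDK, so it runs without extra dependencies. The helper name speak_xml is hypothetical, and the check that loop is a non-negative integer is an assumption drawn from the attribute table.

```python
import xml.etree.ElementTree as ET


def speak_xml(text, loop=1):
    """Return a <Response> document containing a single <Speak> element."""
    # Assumption: loop must be a non-negative integer (0 repeats until hangup).
    if not isinstance(loop, int) or isinstance(loop, bool) or loop < 0:
        raise ValueError("loop must be a non-negative integer")
    response = ET.Element("Response")
    speak = ET.SubElement(response, "Speak", {"loop": str(loop)})
    speak.text = text
    return ET.tostring(response, encoding="unicode")


print(speak_xml("Please hold. Your call is important to us.", loop=3))
```

Rejecting negative values before the XML is returned keeps a bad loop count from reaching the call flow.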

Supported Languages

Language	Code	Woman	Man
Danish	`da-DK`	Yes	No
Dutch	`nl-NL`	Yes	Yes
English (Australian)	`en-AU`	Yes	Yes
English (British)	`en-GB`	Yes	Yes
English (USA)	`en-US`	Yes	Yes
French	`fr-FR`	Yes	Yes
French (Canadian)	`fr-CA`	Yes	No
German	`de-DE`	Yes	Yes
Italian	`it-IT`	Yes	Yes
Polish	`pl-PL`	Yes	Yes
Portuguese	`pt-PT`	No	Yes
Portuguese (Brazilian)	`pt-BR`	Yes	Yes
Russian	`ru-RU`	Yes	No
Spanish	`es-ES`	Yes	Yes
Spanish (USA)	`es-US`	Yes	Yes
Swedish	`sv-SE`	Yes	No
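
Because not every language offers both voices, it can help to check a voice and language pair before generating XML. The sketch below copies the table into a dictionary; the names SUPPORTED and voice_available are hypothetical and not part of any SDK.

```python
# Voice availability copied from the table above: code -> (woman, man).
SUPPORTED = {
    "da-DK": (True, False),
    "nl-NL": (True, True),
    "en-AU": (True, True),
    "en-GB": (True, True),
    "en-US": (True, True),
    "fr-FR": (True, True),
    "fr-CA": (True, False),
    "de-DE": (True, True),
    "it-IT": (True, True),
    "pl-PL": (True, True),
    "pt-PT": (False, True),
    "pt-BR": (True, True),
    "ru-RU": (True, False),
    "es-ES": (True, True),
    "es-US": (True, True),
    "sv-SE": (True, False),
}


def voice_available(language, voice):
    """Return True if the table lists the given voice for the language code."""
    if language not in SUPPORTED or voice not in ("WOMAN", "MAN"):
        return False
    woman, man = SUPPORTED[language]
    return woman if voice == "WOMAN" else man


print(voice_available("en-GB", "MAN"))    # True
print(voice_available("pt-PT", "WOMAN"))  # False
```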

SSML Support

Speech Synthesis Markup Language (SSML) provides fine-grained control over pronunciation, pitch, rate, and pauses. To use SSML, set voice to a Polly voice such as Polly.Joey.

<Response>
    <Speak voice="Polly.Joey" language="en-US">
        <prosody rate="medium">
            Hello and welcome to Plivo.
            <break time="500ms"/>
            The word <say-as interpret-as="spell-out">SSML</say-as>
            stands for Speech Synthesis Markup Language.
        </prosody>
    </Speak>
</Response>

The Python SDK provides helper methods for SSML tags. The example below builds a different message from the XML above, using <say-as>, <s>, and <w>:

from plivo import plivoxml

response = plivoxml.ResponseElement()
speak = plivoxml.SpeakElement(
    content="The word",
    voice="Polly.Joey",
    language="en-US"
)
speak.add_say_as("read", interpret_as="characters")
speak.add_s("may be interpreted as either the present simple form")
speak.add_w("read", role="amazon:VB")
speak.add_s("or the past participle form")
speak.add_w("read", role="amazon:VBD")
response.add(speak)
print(response.to_string())

Common SSML Tags

Tag	Description	Example
`<break>`	Add a pause	`<break time="500ms"/>`
`<say-as>`	Control pronunciation	`<say-as interpret-as="spell-out">ABC</say-as>`
`<prosody>`	Modify pitch, rate, volume	`<prosody rate="slow">Slowly</prosody>`
`<emphasis>`	Add emphasis	`<emphasis level="strong">Important</emphasis>`
`<p>`	Paragraph pause	`<p>First paragraph.</p>`
`<s>`	Sentence pause	`<s>First sentence.</s>`
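
SSML mixes plain text with tags, which is easy to get wrong when concatenating strings. As an illustration, the sketch below assembles a <Speak> element containing <prosody>, <break>, and <say-as> with Python's standard xml.etree.ElementTree instead of the Plivo SDK. The variable names are hypothetical.

```python
import xml.etree.ElementTree as ET

speak = ET.Element("Speak", {"voice": "Polly.Joey", "language": "en-US"})
prosody = ET.SubElement(speak, "prosody", {"rate": "medium"})
prosody.text = "Hello and welcome to Plivo. "

# In ElementTree, text that follows a child tag is stored in that child's tail.
pause = ET.SubElement(prosody, "break", {"time": "500ms"})
pause.tail = " The word "

say_as = ET.SubElement(prosody, "say-as", {"interpret-as": "spell-out"})
say_as.text = "SSML"
say_as.tail = " stands for Speech Synthesis Markup Language."

ssml = ET.tostring(speak, encoding="unicode")
print(ssml)
```

Building the tree this way guarantees matching open and close tags and escapes any special characters in the spoken text.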

Speak Nesting

<Speak> can be nested inside:

<GetDigits> - Play message while collecting input
<GetInput> - Play message while collecting speech/digits
<PreAnswer> - Play message before answering

<Response>
    <GetDigits action="/handle-input/" numDigits="1">
        <Speak>Press 1 for sales, press 2 for support.</Speak>
    </GetDigits>
</Response>
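
The nesting rules above can be enforced when building XML in code. This is a sketch using the standard library only; nest_speak and SPEAK_PARENTS are hypothetical names, and Response is included as a parent only because the earlier examples place <Speak> directly under it.

```python
import xml.etree.ElementTree as ET

# Containers for <Speak>: the three listed above, plus the top-level Response.
SPEAK_PARENTS = {"Response", "GetDigits", "GetInput", "PreAnswer"}


def nest_speak(parent, text):
    """Append a <Speak> child to parent, rejecting unsupported parents."""
    if parent.tag not in SPEAK_PARENTS:
        raise ValueError("<Speak> cannot be nested inside <%s>" % parent.tag)
    speak = ET.SubElement(parent, "Speak")
    speak.text = text
    return speak


response = ET.Element("Response")
digits = ET.SubElement(
    response, "GetDigits", {"action": "/handle-input/", "numDigits": "1"}
)
nest_speak(digits, "Press 1 for sales, press 2 for support.")
print(ET.tostring(response, encoding="unicode"))
```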

Play

The <Play> element plays an audio file to the caller. Use it for prerecorded messages, music, or sound effects.

Basic Usage

<Response>
    <Play>https://example.com/audio/welcome.mp3</Play>
</Response>

from plivo import plivoxml

response = plivoxml.ResponseElement()
response.add(plivoxml.PlayElement('https://example.com/audio/welcome.mp3'))
print(response.to_string())

Play Attributes

Attribute	Type	Default	Description
`loop`	integer	`1`	Number of times to play the audio. `0` = infinite loop

Loop Audio

Play hold music on repeat:

<Response>
    <Play loop="0">https://example.com/audio/hold-music.mp3</Play>
</Response>

from plivo import plivoxml

response = plivoxml.ResponseElement()
response.add(plivoxml.PlayElement(
    'https://example.com/audio/hold-music.mp3',
    loop=0
))
print(response.to_string())

Supported Formats

Format	Extension	Notes
MP3	`.mp3`	Recommended for smaller file sizes
WAV	`.wav`	Highest quality, larger files

Requirements:

Audio must be served over HTTPS
Maximum file size: 10 MB
Recommended: 8 kHz or 16 kHz sample rate, mono
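
The requirements above can be checked before a URL is placed in a <Play> element. The following sketch is illustrative only: audio_problems is a hypothetical helper, the file size is passed in by the caller rather than fetched, and "10 MB" is read as 10 × 1024 × 1024 bytes, which is an assumption.

```python
from urllib.parse import urlparse

MAX_BYTES = 10 * 1024 * 1024          # assumption: 10 MB treated as 10 MiB
ALLOWED_EXTENSIONS = (".mp3", ".wav")  # formats from the table above


def audio_problems(url, size_bytes):
    """Return a list of requirement violations; an empty list means it passes."""
    problems = []
    parsed = urlparse(url)
    if parsed.scheme != "https":
        problems.append("URL must use HTTPS")
    if not parsed.path.lower().endswith(ALLOWED_EXTENSIONS):
        problems.append("file must be .mp3 or .wav")
    if size_bytes > MAX_BYTES:
        problems.append("file exceeds 10 MB")
    return problems


print(audio_problems("https://example.com/audio/welcome.mp3", 2_000_000))
print(audio_problems("http://example.com/audio/welcome.ogg", 20_000_000))
```

Returning every violation at once, rather than stopping at the first, makes it easier to fix an upload in a single pass.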

Combine with Speak

<Response>
    <Play>https://example.com/audio/intro-jingle.mp3</Play>
    <Speak>Welcome to Acme Corporation. How can we help you today?</Speak>
</Response>

Play During IVR

Nest <Play> inside <GetDigits> to play audio while collecting input:

<Response>
    <GetDigits action="/handle-input/" numDigits="1" timeout="10">
        <Play>https://example.com/audio/menu-options.mp3</Play>
    </GetDigits>
    <Speak>We didn't receive any input. Goodbye.</Speak>
</Response>

Play Nesting

<Play> can be nested inside:

<GetDigits> - Play while collecting digits
<GetInput> - Play while collecting speech/digits
<PreAnswer> - Play before answering the call

Play Best Practices

Use HTTPS - Audio URLs must use HTTPS
Optimize file size - Compress audio for faster loading
Host reliably - Use a CDN for audio file hosting
Test audio quality - Ensure audio is clear at phone quality (8 kHz)
Provide fallback - Use <Speak> as backup if audio fails to load

DTMF

The <DTMF> element sends DTMF (Dual-Tone Multi-Frequency) tones on the current call. Use it to navigate IVR systems, enter PINs, or interact with telephony systems.

Basic Usage

<Response>
    <DTMF>1234</DTMF>
</Response>

from plivo import plivoxml

response = plivoxml.ResponseElement()
response.add(plivoxml.DTMFElement('1234'))
print(response.to_string())

DTMF Attributes

Attribute	Type	Default	Description
`async`	boolean	`true`	Send asynchronously and continue to next element

Allowed Characters

Character	Description
`0-9`	Digit tones
`*`	Star key
`#`	Pound/hash key
`w`	Wait 0.5 seconds
`W`	Wait 1 second

With Pauses

Use w (0.5s) or W (1s) to add delays between tones:

<Response>
    <DTMF>1ww2ww3ww4</DTMF>
</Response>

This sends 1, waits 1 second, sends 2, waits 1 second, etc.
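
The timing can be worked out by adding up the wait characters that precede each tone. The sketch below does that arithmetic; dtmf_schedule is a hypothetical helper, and it counts only the w and W waits, ignoring the time each tone itself takes to play.

```python
WAIT_SECONDS = {"w": 0.5, "W": 1.0}


def dtmf_schedule(digits):
    """Return (tone, seconds_waited_so_far) pairs for each tone in digits."""
    elapsed = 0.0
    schedule = []
    for ch in digits:
        if ch in WAIT_SECONDS:
            elapsed += WAIT_SECONDS[ch]   # a wait marker adds delay, sends nothing
        else:
            schedule.append((ch, elapsed))
    return schedule


print(dtmf_schedule("1ww2ww3ww4"))
# [('1', 0.0), ('2', 1.0), ('3', 2.0), ('4', 3.0)]
```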

Navigate External IVR

When dialing an external number with an IVR:

<Response>
    <Dial>
        <Number sendDigits="wwww1234#">+14155559999</Number>
    </Dial>
</Response>

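For the dialed leg, the sendDigits attribute on <Number> is typically used instead of the <DTMF> element. With w as a 0.5-second wait, the leading wwww delays the digits by 2 seconds, giving the remote IVR time to answer before 1234# is sent.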

Send During Call

Send tones during an active call:

<Response>
    <Speak>Sending your confirmation code now.</Speak>
    <DTMF>5678</DTMF>
    <Speak>Code sent.</Speak>
</Response>

Synchronous vs Asynchronous

Async (default): The tones are sent while the next element starts executing

<DTMF async="true">123</DTMF>
<Speak>Processing...</Speak>

Sync: The next element waits until all tones have been sent

<DTMF async="false">123</DTMF>
<Speak>DTMF complete.</Speak>
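
When generating this attribute from Python, note that async is a reserved word and cannot be passed as a keyword argument. The sketch below sidesteps that by supplying the attribute through a dictionary, using the standard library's xml.etree.ElementTree rather than the Plivo SDK. The helper name dtmf_xml and its wait_for_completion parameter are hypothetical.

```python
import xml.etree.ElementTree as ET


def dtmf_xml(digits, wait_for_completion=False):
    """Return a <Response> with one <DTMF>; sync when wait_for_completion is True."""
    response = ET.Element("Response")
    # "async" is a Python keyword, so it is set via a dict, not a keyword argument.
    attrs = {"async": "false" if wait_for_completion else "true"}
    dtmf = ET.SubElement(response, "DTMF", attrs)
    dtmf.text = digits
    return ET.tostring(response, encoding="unicode")


print(dtmf_xml("123", wait_for_completion=True))
```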

DTMF Use Cases

Scenario	Example
Enter PIN	`<DTMF>1234#</DTMF>`
Navigate IVR menu	`<DTMF>1</DTMF>`
Enter extension	`<DTMF>wwww5678</DTMF>`
Star code	`<DTMF>*67</DTMF>`

Combined with Dial

When using with <Dial>, prefer sendDigits on the <Number> element:

<Response>
    <Dial>
        <Number sendDigits="wwww123#">+14155551234</Number>
    </Dial>
</Response>
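
The leading waits in sendDigits can be derived from a desired delay instead of being typed by hand. This sketch assumes that w in sendDigits means a 0.5-second wait, as it does for <DTMF>. The helper dial_with_digits and its lead_seconds parameter are hypothetical, and the XML is built with the standard library rather than the Plivo SDK.

```python
import math
import xml.etree.ElementTree as ET


def dial_with_digits(number, digits, lead_seconds=2.0):
    """Return <Response><Dial><Number sendDigits=...> with a leading delay."""
    # Assumption: each "w" waits 0.5 s, so round the delay up to whole "w"s.
    waits = "w" * math.ceil(lead_seconds / 0.5)
    response = ET.Element("Response")
    dial = ET.SubElement(response, "Dial")
    target = ET.SubElement(dial, "Number", {"sendDigits": waits + digits})
    target.text = number
    return ET.tostring(response, encoding="unicode")


print(dial_with_digits("+14155551234", "123#"))
```

With the default 2-second delay this produces the same sendDigits value as the XML above.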

Related

Input Collection - GetDigits, GetInput
Call Routing - Dial, Redirect, Hangup, Wait
SSML Concepts - Advanced speech control
