This page covers the XML elements for audio output: converting text to speech, playing audio files, and sending DTMF tones.

Speak

The <Speak> element converts text to speech and plays it to the caller. Use it for dynamic messages that can’t be prerecorded.

Basic Usage

<Response>
    <Speak>Hello! Welcome to our service.</Speak>
</Response>
from plivo import plivoxml

response = plivoxml.ResponseElement()
response.add(plivoxml.SpeakElement('Hello! Welcome to our service.'))
print(response.to_string())

Speak Attributes

Attribute | Type | Default | Description
voice | string | WOMAN | Voice to use. Allowed: WOMAN, MAN (SSML uses Polly voices; see SSML Support below)
language | string | en-US | Language for speech. See supported languages below
loop | integer | 1 | Number of times to repeat. 0 = infinite
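If your application assembles XML without the Plivo SDK, a small helper can check attribute values against the table before emitting them. This is a minimal sketch using Python's standard xml.etree.ElementTree; build_speak is a hypothetical name, and accepting any voice prefixed with "Polly." is an assumption based on the SSML example later on this page.

```python
import xml.etree.ElementTree as ET

BASIC_VOICES = {"WOMAN", "MAN"}  # allowed values from the attributes table


def build_speak(text, voice="WOMAN", language="en-US", loop=1):
    """Return a <Response> string with one <Speak> element (hypothetical helper)."""
    # Assumption: Polly voices such as "Polly.Joey" are also accepted.
    if voice not in BASIC_VOICES and not voice.startswith("Polly."):
        raise ValueError(f"unsupported voice: {voice!r}")
    if not isinstance(loop, int) or loop < 0:
        raise ValueError("loop must be a non-negative integer (0 = repeat forever)")
    response = ET.Element("Response")
    speak = ET.SubElement(
        response, "Speak", voice=voice, language=language,
        loop=str(loop),  # XML attribute values must be strings
    )
    speak.text = text  # ElementTree escapes &, <, > automatically
    return ET.tostring(response, encoding="unicode")


print(build_speak("Hello! Welcome to our service.", voice="MAN", language="en-GB"))
```

Rejecting bad values at build time surfaces mistakes in your own logs instead of during a live call.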

Change Voice and Language

<Response>
    <Speak voice="MAN" language="en-GB">
        Good day! This message uses a British male voice.
    </Speak>
</Response>
from plivo import plivoxml

response = plivoxml.ResponseElement()
response.add(plivoxml.SpeakElement(
    'Good day! This message uses a British male voice.',
    voice='MAN',
    language='en-GB'
))
print(response.to_string())

Loop a Message

Play a message multiple times:
<Response>
    <Speak loop="3">Please hold. Your call is important to us.</Speak>
</Response>
Set loop="0" to repeat indefinitely until the call ends:
<Response>
    <Speak loop="0">Please wait while we connect you.</Speak>
</Response>

Supported Languages

Language | Code | Woman | Man
Danish | da-DK | Yes | No
Dutch | nl-NL | Yes | Yes
English (Australian) | en-AU | Yes | Yes
English (British) | en-GB | Yes | Yes
English (USA) | en-US | Yes | Yes
French | fr-FR | Yes | Yes
French (Canadian) | fr-CA | Yes | No
German | de-DE | Yes | Yes
Italian | it-IT | Yes | Yes
Polish | pl-PL | Yes | Yes
Portuguese | pt-PT | No | Yes
Portuguese (Brazilian) | pt-BR | Yes | Yes
Russian | ru-RU | Yes | No
Spanish | es-ES | Yes | Yes
Spanish (USA) | es-US | Yes | Yes
Swedish | sv-SE | Yes | No
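Several languages offer only one voice, so a request such as voice="MAN" with language="sv-SE" has no matching voice. The sketch below transcribes the table into a dictionary so unsupported pairs can be caught before a call. supports and pick_voice are hypothetical helpers, and falling back to the other voice is one possible design choice, not documented platform behavior.

```python
# Voice availability per language code, transcribed from the table above.
# Each value is (woman_available, man_available).
VOICES = {
    "da-DK": (True, False),
    "nl-NL": (True, True),
    "en-AU": (True, True),
    "en-GB": (True, True),
    "en-US": (True, True),
    "fr-FR": (True, True),
    "fr-CA": (True, False),
    "de-DE": (True, True),
    "it-IT": (True, True),
    "pl-PL": (True, True),
    "pt-PT": (False, True),
    "pt-BR": (True, True),
    "ru-RU": (True, False),
    "es-ES": (True, True),
    "es-US": (True, True),
    "sv-SE": (True, False),
}


def supports(language, voice):
    """Return True if the (language, voice) pair is marked Yes in the table."""
    woman, man = VOICES.get(language, (False, False))
    return {"WOMAN": woman, "MAN": man}.get(voice, False)


def pick_voice(language, preferred="WOMAN"):
    """Return the preferred voice if available, otherwise the other one."""
    if language not in VOICES:
        raise ValueError(f"unsupported language: {language}")
    if supports(language, preferred):
        return preferred
    # Every language in the table has at least one voice.
    return "MAN" if preferred == "WOMAN" else "WOMAN"


print(pick_voice("sv-SE", "MAN"))
```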

SSML Support

Speech Synthesis Markup Language (SSML) provides fine-grained control over pronunciation, pitch, rate, and pauses. SSML requires an Amazon Polly voice, selected with the voice attribute (for example, voice="Polly.Joey").
<Response>
    <Speak voice="Polly.Joey" language="en-US">
        <prosody rate="medium">
            Hello and welcome to Plivo.
            <break time="500ms"/>
            The word <say-as interpret-as="spell-out">SSML</say-as>
            stands for Speech Synthesis Markup Language.
        </prosody>
    </Speak>
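</Response>
The Python SDK example below builds a different SSML passage than the XML above, using <say-as>, <s>, and <w> tags: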
from plivo import plivoxml

response = plivoxml.ResponseElement()
speak = plivoxml.SpeakElement(
    content="The word",
    voice="Polly.Joey",
    language="en-US"
)
speak.add_say_as("read", interpret_as="characters")
speak.add_s("may be interpreted as either the present simple form")
speak.add_w("read", role="amazon:VB")
speak.add_s("or the past participle form")
speak.add_w("read", role="amazon:VBD")
response.add(speak)
print(response.to_string())

Common SSML Tags

Tag | Description | Example
<break> | Add a pause | <break time="500ms"/>
<say-as> | Control pronunciation | <say-as interpret-as="spell-out">ABC</say-as>
<prosody> | Modify pitch, rate, volume | <prosody rate="slow">Slowly</prosody>
<emphasis> | Add emphasis | <emphasis level="strong">Important</emphasis>
<p> | Paragraph pause | <p>First paragraph.</p>
<s> | Sentence pause | <s>First sentence.</s>
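Because the SDK snippet builds a different passage, here is a sketch that reproduces the SSML XML example itself with Python's standard library rather than the Plivo SDK. The detail worth noticing is mixed content: in ElementTree, text that follows a child element such as <break/> is stored in that child's tail, not in the parent's text.

```python
import xml.etree.ElementTree as ET

response = ET.Element("Response")
speak = ET.SubElement(response, "Speak", voice="Polly.Joey", language="en-US")
prosody = ET.SubElement(speak, "prosody", rate="medium")
prosody.text = "Hello and welcome to Plivo."

# Text after a child element goes in that child's .tail.
pause = ET.SubElement(prosody, "break", time="500ms")
pause.tail = "The word "

# Hyphenated attribute names must be passed as a dict, not keywords.
say_as = ET.SubElement(prosody, "say-as", {"interpret-as": "spell-out"})
say_as.text = "SSML"
say_as.tail = " stands for Speech Synthesis Markup Language."

ssml_xml = ET.tostring(response, encoding="unicode")
print(ssml_xml)
```

Building SSML this way also escapes characters such as & in dynamic text, which hand-concatenated strings can get wrong.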

Speak Nesting

<Speak> can be nested inside:
  • <GetDigits> - Play message while collecting input
  • <GetInput> - Play message while collecting speech/digits
  • <PreAnswer> - Play message before answering
<Response>
    <GetDigits action="/handle-input/" numDigits="1">
        <Speak>Press 1 for sales, press 2 for support.</Speak>
    </GetDigits>
</Response>

Play

The <Play> element plays an audio file to the caller. Use it for pre-recorded messages, music, or sound effects.

Basic Usage

<Response>
    <Play>https://example.com/audio/welcome.mp3</Play>
</Response>
from plivo import plivoxml

response = plivoxml.ResponseElement()
response.add(plivoxml.PlayElement('https://example.com/audio/welcome.mp3'))
print(response.to_string())

Play Attributes

Attribute | Type | Default | Description
loop | integer | 1 | Number of times to play the audio. 0 = infinite loop

Loop Audio

Play hold music on repeat:
<Response>
    <Play loop="0">https://example.com/audio/hold-music.mp3</Play>
</Response>
from plivo import plivoxml

response = plivoxml.ResponseElement()
response.add(plivoxml.PlayElement(
    'https://example.com/audio/hold-music.mp3',
    loop=0
))
print(response.to_string())

Supported Formats

Format | Extension | Notes
MP3 | .mp3 | Recommended for smaller file sizes
WAV | .wav | Highest quality, larger files
Requirements:
  • Audio must be served over HTTPS
  • Maximum file size: 10 MB
  • Recommended: 8 kHz or 16 kHz sample rate, mono
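The requirements above can be checked before a URL is placed in a <Play> element. This is a minimal sketch, assuming "10 MB" means 10 × 1024 × 1024 bytes and that the file extension reflects the format; audio_problems and wav_matches_recommendation are hypothetical names. The WAV check uses Python's built-in wave module; MP3 headers would need a third-party parser.

```python
import posixpath
import wave
from urllib.parse import urlparse

MAX_BYTES = 10 * 1024 * 1024  # assumption: 10 MB as binary megabytes
SUPPORTED_EXTENSIONS = {".mp3", ".wav"}


def audio_problems(url, size_bytes=None):
    """Return a list of requirement violations (empty list means OK)."""
    problems = []
    parsed = urlparse(url)
    if parsed.scheme != "https":
        problems.append("URL must use HTTPS")
    # Use the URL path only, so query strings don't affect the extension.
    ext = posixpath.splitext(parsed.path)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension {ext or '(none)'}")
    if size_bytes is not None and size_bytes > MAX_BYTES:
        problems.append("file exceeds 10 MB")
    return problems


def wav_matches_recommendation(fileobj):
    """True if a WAV file is mono at 8 kHz or 16 kHz."""
    with wave.open(fileobj, "rb") as wav:
        return wav.getnchannels() == 1 and wav.getframerate() in (8000, 16000)


print(audio_problems("http://example.com/audio/welcome.ogg"))
```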

Combine with Speak

<Response>
    <Play>https://example.com/audio/intro-jingle.mp3</Play>
    <Speak>Welcome to Acme Corporation. How can we help you today?</Speak>
</Response>

Play During IVR

Nest <Play> inside <GetDigits> to play audio while collecting input:
<Response>
    <GetDigits action="/handle-input/" numDigits="1" timeout="10">
        <Play>https://example.com/audio/menu-options.mp3</Play>
    </GetDigits>
    <Speak>We didn't receive any input. Goodbye.</Speak>
</Response>
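When the menu audio URL or action path varies per customer, the IVR response above can be generated instead of hand-written. This sketch uses ElementTree and copies the attribute names from the example; ivr_menu is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def ivr_menu(audio_url, action, timeout=10,
             no_input_message="We didn't receive any input. Goodbye."):
    """Build <GetDigits> with a nested <Play>, followed by a fallback <Speak>."""
    response = ET.Element("Response")
    get_digits = ET.SubElement(
        response, "GetDigits",
        action=action, numDigits="1", timeout=str(timeout),
    )
    ET.SubElement(get_digits, "Play").text = audio_url
    # As in the example above, this element is reached when no digits are collected.
    ET.SubElement(response, "Speak").text = no_input_message
    return ET.tostring(response, encoding="unicode")


print(ivr_menu("https://example.com/audio/menu-options.mp3", "/handle-input/"))
```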

Play Nesting

<Play> can be nested inside:
  • <GetDigits> - Play while collecting digits
  • <GetInput> - Play while collecting speech/digits
  • <PreAnswer> - Play before answering the call
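The nesting rules for <Speak> and <Play> can be checked automatically, for example in a unit test for generated XML. This sketch is based only on the two nesting lists here plus top-level use under <Response>; the platform may accept other combinations, so treat ALLOWED_PARENTS and nesting_errors (both hypothetical) as a conservative lint, not a validator.

```python
import xml.etree.ElementTree as ET

# Parents allowed for <Speak> and <Play>, per the nesting lists,
# plus <Response> for top-level use.
ALLOWED_PARENTS = {
    "Speak": {"Response", "GetDigits", "GetInput", "PreAnswer"},
    "Play": {"Response", "GetDigits", "GetInput", "PreAnswer"},
}


def nesting_errors(xml_text):
    """Return '<child> inside <parent>' strings for disallowed nesting."""
    root = ET.fromstring(xml_text)
    errors = []
    for parent in root.iter():
        for child in parent:
            allowed = ALLOWED_PARENTS.get(child.tag)
            # Tags not in the map (e.g. SSML inside <Speak>) are ignored.
            if allowed is not None and parent.tag not in allowed:
                errors.append(f"<{child.tag}> inside <{parent.tag}>")
    return errors


print(nesting_errors("<Response><Dial><Speak>Hi</Speak></Dial></Response>"))
```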

Play Best Practices

  1. Use HTTPS - Audio URLs must use HTTPS
  2. Optimize file size - Compress audio for faster loading
  3. Host reliably - Use a CDN for audio file hosting
  4. Test audio quality - Ensure audio is clear at phone quality (8 kHz)
  5. Provide fallback - Use <Speak> as backup if audio fails to load

DTMF

The <DTMF> element sends DTMF (Dual-Tone Multi-Frequency) tones on the current call. Use it to navigate IVR systems, enter PINs, or interact with telephony systems.

Basic Usage

<Response>
    <DTMF>1234</DTMF>
</Response>
from plivo import plivoxml

response = plivoxml.ResponseElement()
response.add(plivoxml.DTMFElement('1234'))
print(response.to_string())

DTMF Attributes

Attribute | Type | Default | Description
async | boolean | true | If true, continue to the next element while the tones are sent; if false, wait until all tones are sent

Allowed Characters

Character | Description
0-9 | Digit tones
* | Star key
# | Pound/hash key
w | Wait 0.5 seconds
W | Wait 1 second
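Since only the characters in the table are allowed, a DTMF string can be validated, and its built-in delay computed, before it is sent. This sketch uses a regular expression built from the table; validate_dtmf, total_pause, and tones are hypothetical helpers.

```python
import re

# Characters allowed in <DTMF>, per the table above.
DTMF_PATTERN = re.compile(r"[0-9*#wW]+")
PAUSE_SECONDS = {"w": 0.5, "W": 1.0}


def validate_dtmf(digits):
    """Return digits unchanged, or raise ValueError for disallowed characters."""
    if not DTMF_PATTERN.fullmatch(digits):
        raise ValueError(f"invalid DTMF string: {digits!r}")
    return digits


def total_pause(digits):
    """Total wait in seconds encoded by w (0.5 s) and W (1 s)."""
    validate_dtmf(digits)
    return sum(PAUSE_SECONDS.get(ch, 0.0) for ch in digits)


def tones(digits):
    """The tone characters only, with pause markers removed."""
    return validate_dtmf(digits).translate(str.maketrans("", "", "wW"))


print(total_pause("1ww2ww3ww4"), tones("1ww2ww3ww4"))
```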

With Pauses

Use w (0.5s) or W (1s) to add delays between tones:
<Response>
    <DTMF>1ww2ww3ww4</DTMF>
</Response>
This sends 1, waits 1 second, sends 2, waits 1 second, and so on through 4.
To send tones to an external IVR after the dialed party answers, use the sendDigits attribute on <Number> rather than the <DTMF> element:
<Response>
    <Dial>
        <Number sendDigits="wwww1234#">+14155559999</Number>
    </Dial>
</Response>
Here wwww waits 2 seconds before sending 1234#.
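A delay in seconds can be converted to pause characters for sendDigits. Since W is 1 second and w is 0.5 seconds, a 2-second wait can be written as WW instead of wwww; per the Allowed Characters table the two should be equivalent. pause_prefix and dial_with_digits are hypothetical helpers built on ElementTree.

```python
import xml.etree.ElementTree as ET


def pause_prefix(seconds):
    """Encode a non-negative multiple of 0.5 s as W (1 s) and w (0.5 s)."""
    half_steps = round(seconds * 2)
    if half_steps < 0 or abs(half_steps - seconds * 2) > 1e-9:
        raise ValueError("seconds must be a non-negative multiple of 0.5")
    return "W" * (half_steps // 2) + "w" * (half_steps % 2)


def dial_with_digits(number, digits, wait_seconds=2.0):
    """Build <Dial><Number sendDigits=...> with a leading pause."""
    response = ET.Element("Response")
    dial = ET.SubElement(response, "Dial")
    num = ET.SubElement(dial, "Number", sendDigits=pause_prefix(wait_seconds) + digits)
    num.text = number
    return ET.tostring(response, encoding="unicode")


print(dial_with_digits("+14155559999", "1234#"))
```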

Send During Call

Send tones during an active call:
<Response>
    <Speak>Sending your confirmation code now.</Speak>
    <DTMF>5678</DTMF>
    <Speak>Code sent.</Speak>
</Response>

Synchronous vs Asynchronous

Async (default): the tones are sent while the next element starts.
<Response>
    <DTMF async="true">123</DTMF>
    <Speak>Processing...</Speak>
</Response>
Sync: all tones are sent before the next element runs.
<Response>
    <DTMF async="false">123</DTMF>
    <Speak>DTMF complete.</Speak>
</Response>
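Generating the DTMF element from Python has two pitfalls: async is a reserved keyword in Python 3.7+, so it cannot be passed as a keyword argument, and ElementTree raises a TypeError when serializing a bool attribute value. This sketch passes attributes as a dict with lowercase string values; dtmf_then_speak is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def dtmf_then_speak(digits, message, wait_for_tones=False):
    """Build <DTMF> followed by <Speak>; wait_for_tones=True means async="false"."""
    response = ET.Element("Response")
    # Convert the Python bool to the "true"/"false" strings used in the examples.
    async_value = str(not wait_for_tones).lower()
    ET.SubElement(response, "DTMF", {"async": async_value}).text = digits
    ET.SubElement(response, "Speak").text = message
    return ET.tostring(response, encoding="unicode")


print(dtmf_then_speak("123", "DTMF complete.", wait_for_tones=True))
```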

DTMF Use Cases

Scenario | Example
Enter PIN | <DTMF>1234#</DTMF>
Navigate IVR menu | <DTMF>1</DTMF>
Enter extension | <DTMF>wwww5678</DTMF>
Star code | <DTMF>*67</DTMF>

Combined with Dial

When used with <Dial>, prefer the sendDigits attribute on the <Number> element:
<Response>
    <Dial>
        <Number sendDigits="wwww123#">+14155551234</Number>
    </Dial>
</Response>