What: Use SSML markup in Plivo’s <Speak> XML element to control TTS voice, pitch, volume, rate, pronunciation, and pauses. Powered by Amazon Polly (27 languages, 40+ voices)
How: Set voice="Polly.<VoiceName>" in your Speak tag, then use SSML tags like <prosody>, <break>, <emphasis>, <phoneme>, <say-as>
Cost: Currently free (beta). Will eventually be billed per character synthesized
Limitation: Max 3,000 characters per Speak tag. Only works with Polly voices (not legacy Plivo voices). Some Amazon Polly tags are not supported (<amazon:effect>, <amazon:auto-breaths>)
The World Wide Web Consortium (W3C) created Speech Synthesis Markup Language (SSML), an XML-based markup language, to help generate natural-sounding synthesized speech. The Plivo Speak XML element can generate SSML-based speech, powered by Amazon Polly. It supports 27 languages and more than 40 voices, and lets developers control pronunciation, pitch, and volume.
Here's how SSML appears within Plivo Speak XML elements:
<Response>
    <!-- Basic text-to-speech -->
    <Speak voice="MAN">Go Green, Go Plivo</Speak>
    <!-- Text-to-speech using SSML -->
    <Speak voice="Polly.Joey">
        <emphasis level="moderate">Go Green, Go Plivo</emphasis>
    </Speak>
</Response>
To synthesize SSML speech on Plivo, specify one of the Amazon Polly voices in the voice attribute of Plivo’s <Speak> XML tag. Polly voice names must be namespaced with the Polly. prefix. For example:
<Response>
    <Speak voice="Polly.Joey">
        <emphasis level="moderate">Go Green, Go Plivo</emphasis>
    </Speak>
</Response>
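The same document can also be generated programmatically. The sketch below uses only Python's standard library (xml.etree.ElementTree) rather than the Plivo SDK; the helper name build_ssml_speak is hypothetical, and the prefix check simply mirrors the rule that SSML requires a voice namespaced with Polly.

```python
import xml.etree.ElementTree as ET


def build_ssml_speak(text, voice="Polly.Joey", level="moderate"):
    """Return a Plivo <Response> string wrapping `text` in an SSML <emphasis> tag.

    Hypothetical helper for illustration; not part of the Plivo SDK.
    """
    # SSML is described as working only with Polly voices, so reject others early.
    if not voice.startswith("Polly."):
        raise ValueError("SSML requires a voice of the form 'Polly.<VoiceName>'")

    response = ET.Element("Response")
    speak = ET.SubElement(response, "Speak", {"voice": voice})
    emphasis = ET.SubElement(speak, "emphasis", {"level": level})
    emphasis.text = text

    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(response, encoding="unicode")


if __name__ == "__main__":
    print(build_ssml_speak("Go Green, Go Plivo"))
```

Passing a legacy voice such as "MAN" raises a ValueError instead of producing markup that the SSML engine would not handle.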
Support for SSML-based speech synthesis is currently in beta and free for all Plivo users. We expect to eventually charge for text-to-speech on the basis of the number of characters synthesized.
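Because both the eventual billing and the 3,000-character limit per Speak tag are counted in characters, it can help to split long text before building the XML. The following is a minimal sketch, assuming the limit applies to the text you pass in; how SSML markup counts toward the limit is not stated here, so leave headroom if you wrap the chunks in tags. The function name split_for_speak is hypothetical.

```python
MAX_SPEAK_CHARS = 3000  # limit per Speak tag; whether markup counts toward it is an assumption


def split_for_speak(text, limit=MAX_SPEAK_CHARS):
    """Split text into chunks of at most `limit` characters, breaking on whitespace."""
    chunks = []
    current = ""
    for word in text.split():
        if len(word) > limit:
            raise ValueError("a single word is longer than the limit")
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= limit:
            current = candidate
        else:
            # The current chunk is full: store it and start a new one.
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    long_text = "Go Green, Go Plivo. " * 400  # about 8,000 characters
    for chunk in split_for_speak(long_text):
        print(len(chunk))
```

Each returned chunk can then go into its own Speak element, and the summed chunk lengths give a rough estimate of the characters that would be synthesized.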
The w tag lets you customize the pronunciation of a word by specifying its part of speech.
from flask import Flask, Response
from plivo import plivoxml

app = Flask(__name__)


@app.route("/ssml/", methods=["GET", "POST"])
def ssml():
    element = plivoxml.ResponseElement()
    # Build a Speak element that reads "read" first letter by letter,
    # then as a present-tense verb (VB), then as a past-tense verb (VBD).
    response = (
        element.add(
            plivoxml.SpeakElement(content="The word", voice="Polly.Joey", language="en-US")
            .add_say_as("read", interpret_as="characters")
            .add_s("may be interpreted as either the present simple form")
            .add_w("read", role="amazon:VB")
            .add_s("or the past participle form")
            .add_w("read", role="amazon:VBD")
        )
        .to_string(False)
    )
    print(response)
    return Response(response, mimetype="text/xml")


if __name__ == "__main__":
    app.run(host="0.0.0.0", debug=True)
The rendered XML document would be:
<Response>
    <Speak voice="Polly.Joey">The word
        <say-as interpret-as="characters">read</say-as>
        <s>may be interpreted as either the present simple form</s>
        <w role="amazon:VB">read</w>
        <s>or the past participle form</s>
        <w role="amazon:VBD">read</w>
    </Speak>
</Response>
The <prosody> tag controls the pitch, rate, and volume of the spoken text, as in these three examples:

<Response>
    <Speak voice="Polly.Joey">
        I can speak in a <prosody pitch="high">higher pitched voice</prosody>,
        or I can speak <prosody pitch="low">in a lower pitched voice</prosody>
    </Speak>
</Response>

<Response>
    <Speak voice="Polly.Joey">
        I can speak <prosody rate="x-slow">really slowly</prosody>,
        or I can speak <prosody rate="x-fast">really fast</prosody>
    </Speak>
</Response>

<Response>
    <Speak voice="Polly.Joey">
        I can also speak <prosody volume="x-loud">very loudly</prosody>,
        or I can speak <prosody volume="x-soft">very quietly</prosody>.
    </Speak>
</Response>
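Prosody markup like this mixes plain text with tagged spans, which is easy to get wrong by hand. The sketch below assembles such a Speak element from a list of segments using only the standard library; the helper name prosody_speak and the segment format are assumptions made for illustration, not part of the Plivo SDK.

```python
import xml.etree.ElementTree as ET


def prosody_speak(segments, voice="Polly.Joey"):
    """Build a <Response><Speak> string from (text, attrs) segments.

    `attrs` is None for plain text, or a dict such as {"rate": "x-slow"}
    for text that should be wrapped in a <prosody> tag.
    """
    response = ET.Element("Response")
    speak = ET.SubElement(response, "Speak", {"voice": voice})
    last = None  # most recently added <prosody> element, if any
    for text, attrs in segments:
        if attrs:
            last = ET.SubElement(speak, "prosody", attrs)
            last.text = text
        elif last is None:
            # Plain text before the first tag belongs to <Speak> itself.
            speak.text = (speak.text or "") + text
        else:
            # Plain text after a tag is stored as that tag's tail.
            last.tail = (last.tail or "") + text
    return ET.tostring(response, encoding="unicode")


if __name__ == "__main__":
    print(prosody_speak([
        ("I can speak ", None),
        ("really slowly", {"rate": "x-slow"}),
        (", or I can speak ", None),
        ("really fast", {"rate": "x-fast"}),
    ]))
```

The same helper covers pitch and volume by changing the attribute dict, for example {"pitch": "high"} or {"volume": "x-soft"}.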