Introducing Automatic Speech Recognition

Aug 10, 2020

Speech recognition has become increasingly pervasive in recent years. This technology helps businesses provide a better customer experience and reduce organizational costs.

You can now take advantage of automatic speech recognition (ASR) with Plivo’s GetInput XML, an easy way to configure engaging, voice-driven user experiences.

Plivo’s ASR technology eliminates the heavy lifting that’s often associated with building AI-driven voice interactions. Our ASR helps you build responsive applications that act on partial recognition results as your customer speaks, and it makes voice transcriptions available to your application in real time.

Here are some ways ASR can enhance your end-user experience and help your customer service agents work more efficiently.

Conversational IVR: Upgrade a manual, traditional IVR menu to a speech-driven experience that gets callers the answers they seek faster. Conversational IVR can do more than just say “Press 1.”
Voice search: Build virtual assistants that intelligently provide relevant information based on the user’s query.
Surveys and form fills: Prompt users with questions and automatically capture and transcribe their answers to fill out forms and surveys.

How does it work?

Plivo collects the user’s response to a prompt, either as speech or as a digit press, and relays it to the action URL you specify. When collecting speech, Plivo transcribes the spoken phrases and relays the transcription to the action URL in real time. When collecting digit presses, Plivo relays the digits the user entered. For more information, see our documentation.
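As a rough illustration of this flow, the sketch below builds a GetInput XML response using only Python’s standard library. The attribute names (action, method, inputType), the inputType value, the nested Speak element, and the example.com URL are assumptions made for illustration; check them against the GetInput documentation before relying on them.

```python
import xml.etree.ElementTree as ET


def build_get_input_xml(action_url: str, prompt: str) -> str:
    """Return an XML document that asks the caller for speech or digit input.

    Element and attribute names here are assumptions for this sketch;
    verify them against the official GetInput XML reference.
    """
    response = ET.Element("Response")
    get_input = ET.SubElement(
        response,
        "GetInput",
        {
            "action": action_url,        # URL that receives the collected input
            "method": "POST",
            "inputType": "dtmf speech",  # assumed value: accept either kind of input
        },
    )
    # The prompt played to the caller while input is being collected.
    speak = ET.SubElement(get_input, "Speak")
    speak.text = prompt
    return ET.tostring(response, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical action URL, used only for demonstration.
    print(build_get_input_xml("https://example.com/input", "Press 1 or say yes to accept."))
```

Your application would return this XML when Plivo requests instructions for a call, and would then receive the caller’s input at the action URL.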

How much does it cost?

The amount you’re charged for ASR is based on the duration of the analyzed speech. Charges are calculated at USD $0.02 per 15-second pulse, with the duration rounded up to the next full pulse. For example, if speech was recognized for 35 seconds, the account would be billed for three pulses, or 45 seconds (15 * 3) of speech, for a total of $0.06.
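The pulse arithmetic can be sketched in a few lines of Python. The function name is hypothetical, and treating a zero or negative duration as free is an assumption of this sketch rather than a stated billing rule.

```python
import math
from decimal import Decimal
from typing import Tuple

PULSE_SECONDS = 15                       # billing pulse length from the pricing above
PRICE_PER_PULSE_USD = Decimal("0.02")    # price per pulse from the pricing above


def asr_charge(speech_seconds: float) -> Tuple[int, Decimal]:
    """Return (billed_seconds, charge_usd) for a recognized speech duration."""
    if speech_seconds <= 0:
        # Assumption: no recognized speech means no charge.
        return 0, Decimal("0.00")
    # Round the duration up to a whole number of pulses.
    pulses = math.ceil(speech_seconds / PULSE_SECONDS)
    return pulses * PULSE_SECONDS, pulses * PRICE_PER_PULSE_USD


if __name__ == "__main__":
    billed, charge = asr_charge(35)
    print(billed, charge)  # prints: 45 0.06
```

Using Decimal rather than float keeps the dollar amounts exact when pulses are multiplied by the per-pulse price.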

What are some key features of Plivo’s ASR functionality?

Extensive language support: Plivo supports speech recognition for 27 major languages and their regional variants.

Speech adaptation with hints: Improve speech recognition accuracy by providing a set of hint words and phrases expected from the speaker. This feature can improve transcription accuracy of proper nouns, homophones (one, won), and domain-specific words rarely used in everyday conversation.

Prebuilt models: Reduce the amount of time spent configuring an IVR system and select from a range of prebuilt models, depending on your use case.

Profanity filter: Keep your transcriptions clean, and identify and monitor the use of profanity. The profanity filter masks specific words in the transcriptions programmatically forwarded to your application.

Simultaneous input detection: Augment your existing IVR applications with speech by enabling the simultaneous detection of DTMF and speech input. In other words, you can now prompt, “Press 1 or say ‘yes’ to accept.”

Advanced end-of-speech detection: Automatically detect the end of the user’s speech. Advanced timeout controls let you configure end-of-speech detection behavior.

Interim transcription results: Reduce response times by receiving transcription results in real time with each word spoken by the caller.
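To make simultaneous input detection concrete, here is a minimal sketch of how an application might handle the input relayed to its action URL for the “Press 1 or say ‘yes’ to accept” prompt. The parameter names (InputType, Digits, Speech) and the function name are assumptions made for this example; confirm the actual request parameters in the documentation.

```python
from typing import Mapping


def route_input(params: Mapping[str, str]) -> str:
    """Decide what to do with input relayed to the action URL.

    The parameter names (InputType, Digits, Speech) are assumptions
    for this sketch, not confirmed names from the documentation.
    """
    input_type = params.get("InputType", "").lower()
    if input_type == "dtmf":
        # The caller pressed a key: only "1" counts as acceptance.
        accepted = params.get("Digits", "") == "1"
    elif input_type == "speech":
        # The caller spoke: compare the transcription, ignoring case and spacing.
        accepted = params.get("Speech", "").strip().lower() == "yes"
    else:
        # No recognizable input: ask again.
        return "reprompt"
    return "accept" if accepted else "decline"


if __name__ == "__main__":
    print(route_input({"InputType": "speech", "Speech": "Yes"}))  # prints: accept
```

Keeping the decision in a plain function like this makes it easy to call from whichever web framework serves the action URL.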

Getting started

Getting started with ASR is easy. Head over to our product guide for detailed references and code samples. All Plivo server SDKs come with helper functions to work with GetInput XML. Read our language-specific guides on getting started with Plivo Voice API.

Not using Plivo yet? Getting started takes just five minutes. Sign up today!