You can use speech input or dual-tone multi-frequency (DTMF) tones (a.k.a. Touch-Tone) to route callers or otherwise change call flows for applications such as interactive voice response (IVR), virtual assistants, and mobile surveys.
To get started, you need a Plivo account — sign up with your work email address if you don’t have one already. You must have a voice-enabled Plivo phone number to receive incoming calls; you can rent numbers from the Numbers page of the Plivo console, or by using the Numbers API. If this is your first time using Plivo APIs, follow our instructions to set up a PHP development environment and a web server and safely expose that server to the internet.
This example shows a multilevel IVR phone application that uses digit press input captured using the GetInput XML element. A virtual assistant answers incoming calls and offers the caller three choices: “Press 1 for your account balance. Press 2 for your account status. Press 3 to speak to a representative.” If the caller enters 1 or 2, the application will retrieve the requested information and play the caller a text-to-speech message. If the caller presses 3, the application will redirect the caller to the second branch, which offers two new choices: “Press 1 for sales. Press 2 for support.” The application then connects the caller with the requested department.
Change to the project directory and run
$ php artisan make:controller MultilevelivrController
Edit app/Http/Controllers/MultilevelivrController.php and paste this code into it.
<?php

namespace App\Http\Controllers;

// Laravel already loads Composer's autoloader, so no manual require is needed.
use Illuminate\Http\Request;
use Plivo\XML\Response;

class MultilevelivrController extends Controller
{
    // Answer URL: plays the first-level menu and collects DTMF input
    public function detectDtmf()
    {
        $welcome_message = "Welcome to the demo. Press 1 for your account balance. Press 2 for your account status. Press 3 to speak to a representative"; // Welcome message, first branch
        $no_input = "Sorry, I didn't catch that. Please hang up and try again"; // Message that Plivo reads when the caller does nothing
        $response = new Response();
        $get_input = $response->addGetInput(
            [
                'action' => "https://<ngrok_identifier>.ngrok.io/firstbranch", // Must match the route path in routes/web.php
                'method' => "POST",
                'digitEndTimeout' => "5",
                'inputType' => "dtmf",
                'redirect' => "true",
            ]);
        $get_input->addSpeak($welcome_message, ['language'=>"en-US", 'voice'=>"Polly.Salli"]);
        $response->addSpeak($no_input);
        $xml_response = $response->toXML();
        return response($xml_response, 200)->header('Content-Type', 'application/xml');
    }

    // Action URL block for DTMF
    public function firstBranch(Request $request)
    {
        $representative_branch = "Press 1 for sales. Press 2 for support"; // Message for second branch
        $no_input = "Sorry, I didn't catch that. Please hang up and try again"; // Message that Plivo reads when the caller does nothing
        // input() reads both POST body and query string parameters; query() would miss POSTed values
        $digit = $request->input('Digits');
        $response = new Response();
        if ($digit=="1") {
            $bal_message = 'Your account balance is $20'; // Single quotes keep the dollar sign literal
            $response->addSpeak($bal_message);
        } elseif($digit=="2") {
            $stat_message = "Your account status is active";
            $response->addSpeak($stat_message);
        } elseif($digit=="3") {
            $get_input = $response->addGetInput(
                [
                    'action' => "https://<ngrok_identifier>.ngrok.io/secondbranch", // Must match the route path in routes/web.php
                    'method' => "POST",
                    'digitEndTimeout' => "5",
                    'inputType' => "dtmf",
                    'redirect' => "true",
                ]);
            $get_input->addSpeak($representative_branch, ['language'=>"en-US", 'voice'=>"Polly.Salli"]);
        } else {
            $response->addSpeak($no_input);
        }
        $xml_response = $response->toXML();
        return response($xml_response, 200)->header('Content-Type', 'application/xml');
    }

    // Action URL block for sales and support branch
    public function secondBranch(Request $request)
    {
        $wrong_input = "Sorry, that's not a valid input"; // Message that Plivo reads when the caller inputs a wrong digit
        $digit = $request->input('Digits');
        $from_number = $request->input('From');
        $response = new Response();
        // Show the original caller's number to the department that receives the call
        $params = array(
            'callerId' => $from_number
        );
        if ($digit=="1") {
            $dial = $response->addDial($params);
            $number = "<number_1>";
            $dial->addNumber($number);
        } elseif($digit=="2") {
            $dial = $response->addDial($params);
            $number = "<number_2>";
            $dial->addNumber($number);
        } else {
            $response->addSpeak($wrong_input);
        }
        $xml_response = $response->toXML();
        return response($xml_response, 200)->header('Content-Type', 'application/xml');
    }
}
Add a route for each function in the MultilevelivrController class. Edit routes/web.php and add these lines at the end of the file.
Route::match(['get', 'post'], '/detectdtmf', 'MultilevelivrController@detectDtmf');
Route::match(['get', 'post'], '/firstbranch', 'MultilevelivrController@firstBranch');
Route::match(['get', 'post'], '/secondbranch', 'MultilevelivrController@secondBranch');
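The first-level branching in firstBranch can also be expressed as a lookup table, which may be easier to extend as the menu grows. The following is a minimal sketch in plain PHP, independent of Laravel and the Plivo SDK; the FIRST_MENU constant, the resolveFirstMenu function, and the 'speak'/'menu' type labels are hypothetical names introduced only for illustration.

```php
<?php
// Hypothetical lookup table: maps each first-level digit to what the IVR should do next.
const FIRST_MENU = [
    '1' => ['type' => 'speak', 'text' => 'Your account balance is $20'],
    '2' => ['type' => 'speak', 'text' => 'Your account status is active'],
    '3' => ['type' => 'menu',  'text' => 'Press 1 for sales. Press 2 for support'],
];

// Returns the menu entry for the digits received, or a fallback message
// when the caller pressed nothing or pressed an unknown key.
function resolveFirstMenu(?string $digits): array
{
    $key = trim((string) $digits);

    return FIRST_MENU[$key] ?? [
        'type' => 'speak',
        'text' => "Sorry, I didn't catch that. Please hang up and try again",
    ];
}

echo resolveFirstMenu('3')['type'], PHP_EOL; // prints "menu"
```

A controller could call resolveFirstMenu with the Digits value and then add either a Speak element or a nested GetInput element, depending on the returned type.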
You can improve DTMF collection by using attributes available for the GetInput XML element, such as digitEndTimeout, numDigits, finishOnKey, and executionTimeout.
digitEndTimeout sets the maximum time interval between successive digit inputs. The default value is auto, and other allowed values are 2 to 10 seconds. If the user provides no new digits within the digitEndTimeout period, the digits entered to that point will be processed.
numDigits sets the maximum number of digits the user can provide on the current call. The default value is 32 and the allowed values are 1 to 32.
If the user provides more digits than the value of numDigits, Plivo will send only the number of digits specified as numDigits to the action URL; additional digit inputs will be ignored. For example, if numDigits is specified as “4” and the user enters five digits, the last digit will be ignored.
finishOnKey defines a key that users can press to submit the digits they entered. The default value is # and additional allowed values are 0-9, *, <empty string>, and “none.” When you set the value to <empty string> or “none,” DTMF input collection ends depending on the digitEndTimeout or the numDigits attribute.
executionTimeout sets the maximum time during which Plivo detects input. You can use this timeout to tell the application to process the next element in the XML response when a user doesn’t provide input during the call. The default value is 15 seconds, and allowed values are 5 to 60 seconds.
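Because each of these attributes accepts only a limited range of values, it can help to check them before generating XML. The sketch below encodes the ranges described above in plain PHP; validateDtmfAttributes and isIntegerInRange are hypothetical helper names, not part of the Plivo SDK, and the error messages are illustrative.

```php
<?php
// True when $value is a whole number between $min and $max, inclusive.
function isIntegerInRange(string $value, int $min, int $max): bool
{
    return preg_match('/^\d+\z/', $value) === 1
        && (int) $value >= $min
        && (int) $value <= $max;
}

// Returns a list of problems found in a set of GetInput DTMF attributes.
// Attributes that are absent are skipped, since Plivo applies its defaults.
function validateDtmfAttributes(array $attrs): array
{
    $errors = [];

    if (isset($attrs['digitEndTimeout'])) {
        $value = (string) $attrs['digitEndTimeout'];
        if ($value !== 'auto' && !isIntegerInRange($value, 2, 10)) {
            $errors[] = 'digitEndTimeout must be "auto" or 2 to 10';
        }
    }
    if (isset($attrs['numDigits']) && !isIntegerInRange((string) $attrs['numDigits'], 1, 32)) {
        $errors[] = 'numDigits must be 1 to 32';
    }
    if (isset($attrs['finishOnKey'])) {
        $value = (string) $attrs['finishOnKey'];
        if ($value !== '' && $value !== 'none' && preg_match('/^[0-9*#]\z/', $value) !== 1) {
            $errors[] = 'finishOnKey must be 0-9, *, #, an empty string, or "none"';
        }
    }
    if (isset($attrs['executionTimeout']) && !isIntegerInRange((string) $attrs['executionTimeout'], 5, 60)) {
        $errors[] = 'executionTimeout must be 5 to 60';
    }

    return $errors;
}

// An out-of-range timeout is reported; valid attributes produce an empty list.
print_r(validateDtmfAttributes(['digitEndTimeout' => '1', 'numDigits' => '4']));
```

An empty array means the attributes fall within the documented ranges and can be passed on to addGetInput.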
The GetInput XML element can also capture speech input.
This example shows how to implement a simple IVR phone tree. A virtual assistant answers the call and offers the caller two choices: “Say sales to talk to a sales representative. Say support to talk to a support representative.”
If the caller says “sales,” the caller will be connected to a sales representative; if the caller says “support,” they will be connected to a support representative.
Change to the project directory and run
$ php artisan make:controller SpeechdetectionController
Edit app/Http/Controllers/SpeechdetectionController.php and paste this code into it.
<?php

namespace App\Http\Controllers;

// Laravel already loads Composer's autoloader, so no manual require is needed.
use Illuminate\Http\Request;
use Plivo\XML\Response;

class SpeechdetectionController extends Controller
{
    // Answer URL: plays the prompt and listens for speech input
    public function detectSpeech()
    {
        $welcome_message = "Welcome to the demo. Say sales to talk to a sales representative. Say support to talk to a support representative"; // Welcome message, first branch
        $no_input = "Sorry, I didn't catch that. Please hang up and try again"; // Message that Plivo reads when the caller does nothing
        $response = new Response();
        $get_input = $response->addGetInput(
            [
                'action' => "https://<ngrok_identifier>.ngrok.io/repbranch", // Must match the route path in routes/web.php
                'method' => "POST",
                'interimSpeechResultsCallback' => 'https://<ngrok_identifier>.ngrok.io/repbranch',
                'interimSpeechResultsCallbackMethod' => 'POST',
                'inputType' => "speech",
                'redirect' => "true",
            ]);
        $get_input->addSpeak($welcome_message, ['language'=>"en-US", 'voice'=>"Polly.Salli"]);
        $response->addSpeak($no_input);
        $xml_response = $response->toXML();
        return response($xml_response, 200)->header('Content-Type', 'application/xml');
    }

    // Action URL block for sales and support branch
    public function repBranch(Request $request)
    {
        $wrong_input = "Sorry, that's not a valid input"; // Message that Plivo reads when the caller speaks something unrecognized
        // input() reads both POST body and query string parameters; query() would miss POSTed values
        $speech = $request->input('Speech');
        $from_number = $request->input('From');
        $response = new Response();
        // Show the original caller's number to the department that receives the call
        $params = array(
            'callerId' => $from_number
        );
        if ($speech=="sales") {
            $dial = $response->addDial($params);
            $number = "<number_1>";
            $dial->addNumber($number);
        } elseif($speech=="support") {
            $dial = $response->addDial($params);
            $number = "<number_2>";
            $dial->addNumber($number);
        } else {
            $response->addSpeak($wrong_input);
        }
        $xml_response = $response->toXML();
        return response($xml_response, 200)->header('Content-Type', 'application/xml');
    }
}
Add a route for each function in the SpeechdetectionController class. Edit routes/web.php and add these lines at the end of the file.
Route::match(['get', 'post'], '/detectspeech', 'SpeechdetectionController@detectSpeech');
Route::match(['get', 'post'], '/repbranch', 'SpeechdetectionController@repBranch');
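The exact-match comparisons in repBranch succeed only when the transcript is exactly "sales" or "support". The sketch below shows one way to make the match more tolerant, on the assumption (not confirmed by this guide) that transcripts can arrive capitalized, punctuated, or embedded in a longer phrase. The normalizeSpeech and matchDepartment functions are hypothetical helpers in plain PHP.

```php
<?php
// Lowercases the transcript, strips punctuation, and collapses whitespace.
function normalizeSpeech(?string $speech): string
{
    $text = strtolower(trim((string) $speech));
    // Keep only letters, digits, and spaces (drops a trailing period, commas, and so on).
    $text = preg_replace('/[^a-z0-9 ]+/', '', $text);

    return trim(preg_replace('/\s+/', ' ', $text));
}

// Returns 'sales' or 'support' if that word appears in the transcript, otherwise null.
// If both words appear, the first entry in the department list wins.
function matchDepartment(?string $speech): ?string
{
    $words = explode(' ', normalizeSpeech($speech));
    foreach (['sales', 'support'] as $department) {
        if (in_array($department, $words, true)) {
            return $department;
        }
    }

    return null;
}

echo matchDepartment('I need Support, please.'), PHP_EOL; // prints "support"
```

Matching on whole words rather than substrings avoids false positives such as "supportive", at the cost of missing variants like "supporting".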
Different applications may benefit from different automatic speech recognition (ASR) models, which you can specify using the GetInput XML element’s speechModel attribute. Its default value is default, which is suitable for long-form audio such as dictation. You can also try command_and_search for shorter audio clips, such as when you expect callers to use voice commands or voice search, or phone_call if you want to transcribe audio from a phone call. Explore the models to see which works best for your use case.
Example XML:
<Response>
<GetInput action="https://<yourdomain>.com/action/" method="POST" inputType="speech" speechModel="command_and_search" redirect="true">
<Speak>Welcome to the demo. Say sales to talk to a sales representative. Say support to talk to a support representative</Speak>
</GetInput>
<Speak>Sorry, I didn't catch that. Please hang up and try again later.</Speak>
</Response>
You can use the hints attribute to potentially improve speech transcription results by defining words and phrases that are common in your use case. For example, a call center where callers use voice commands to connect to various departments can use the names of the departments as hints.
Example XML:
<Response>
<GetInput action="https://<yourdomain>.com/action/" method="POST" inputType="speech" hints="sales,support" redirect="true">
<Speak>Welcome to the demo. Say sales to talk to a sales representative. Say support to talk to a support representative</Speak>
</GetInput>
<Speak>Sorry, I didn't catch that. Please hang up and try again later.</Speak>
</Response>
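If the department names live in a PHP array, the hints value can be assembled from it. This is a small sketch that assumes the comma-separated format shown in the XML above; buildHints is a hypothetical helper, not an SDK function.

```php
<?php
// Builds a comma-separated hints string from a list of phrases,
// trimming whitespace and skipping empty or duplicate entries.
function buildHints(array $phrases): string
{
    $clean = [];
    foreach ($phrases as $phrase) {
        // A comma inside a phrase would split it into two hints, so replace it with a space.
        $phrase = trim(str_replace(',', ' ', (string) $phrase));
        if ($phrase !== '' && !in_array($phrase, $clean, true)) {
            $clean[] = $phrase;
        }
    }

    return implode(',', $clean);
}

echo buildHints(['sales', ' support ', 'sales']), PHP_EOL; // prints "sales,support"
```

The resulting string can be passed as the 'hints' entry of the attribute array given to addGetInput.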
You can improve the functionality of speech input collection by using GetInput XML attributes such as speechEndTimeout, language, profanityFilter, and executionTimeout.
speechEndTimeout sets the time that Plivo waits for more speech input after silence is detected. The default value is auto; other allowed values are 2 to 10 seconds. If the user doesn’t provide new speech input within the speechEndTimeout period, the speech collected to that point will be processed.
language specifies the language and national/regional dialect of the audio to be recognized on calls. The default language for speech detection is en-US. You can choose your preferred language from the list of supported languages.
profanityFilter tells Plivo whether to filter profane words out of the transcription. Set this attribute to true to enable filtering. The profanity filter applies only to single words; it doesn’t work for a combination of words. The default value is false.
executionTimeout sets the maximum time during which Plivo detects input. You can use this timeout to tell the application to process the next element in the XML response when a user doesn’t provide input during the call. The default value is 15 seconds, and allowed values are 5 to 60 seconds.
Example XML:
<Response>
<GetInput action="https://<yourdomain>.com/action/" method="POST" inputType="speech" speechEndTimeout="5" language="en-IN" profanityFilter="true" executionTimeout="25" redirect="true">
<Speak>Welcome to the demo. Say sales to talk to a sales representative. Say support to talk to a support representative</Speak>
</GetInput>
<Speak>Sorry, I didn't catch that. Please hang up and try again later.</Speak>
</Response>
You can use the interimSpeechResultsCallback attribute to perform real-time speech recognition. If you set this attribute to a URL on your application server, Plivo sends callbacks with the user’s recognized speech while the user is still speaking on the call. Each callback carries the transcribed result along with attributes such as UnstableSpeech, Stability, StableSpeech, and SequenceNumber.
Example XML:
<Response>
<GetInput action="https://<yourdomain>.com/action/" method="POST" interimSpeechResultsCallback="https://<yourdomain>.com/interimcallback/" interimSpeechResultsCallbackMethod="POST" inputType="speech" redirect="true">
<Speak>Welcome to the demo. Say sales to talk to a sales representative. Say support to talk to a support representative</Speak>
</GetInput>
<Speak>Sorry, I didn't catch that. Please hang up and try again later.</Speak>
</Response>
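A callback handler typically needs to decide which interim results are worth acting on. The sketch below is one possible filter in plain PHP. It assumes, without confirmation from this guide, that Stability is a number between 0 and 1 and that SequenceNumber increases with each callback. MIN_STABILITY and pickInterimTranscript are hypothetical names.

```php
<?php
// Hypothetical threshold; tune it for your application.
const MIN_STABILITY = 0.8;

// Returns the stable part of an interim result when it is newer than the last
// one processed and stable enough to act on; otherwise returns null.
function pickInterimTranscript(array $params, int $lastSequence): ?string
{
    $sequence = (int) ($params['SequenceNumber'] ?? -1);
    if ($sequence <= $lastSequence) {
        return null; // Stale or duplicate callback
    }

    $stability = (float) ($params['Stability'] ?? 0);
    $stable = trim((string) ($params['StableSpeech'] ?? ''));
    if ($stable === '' || $stability < MIN_STABILITY) {
        return null; // Nothing reliable yet
    }

    return $stable;
}

$callback = ['SequenceNumber' => '2', 'Stability' => '0.9', 'StableSpeech' => 'sales'];
echo pickInterimTranscript($callback, 1), PHP_EOL; // prints "sales"
```

In a Laravel controller, the $params array could come from $request->all(), with the last processed sequence number stored per call.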
You can use the GetInput XML element’s log attribute to manage input logging preferences. It defaults to true; if you set it to false, Plivo will not log digit or speech input.