Build an AI Voice Agent by Integrating OpenAI’s Real-time Speech API with Plivo

Plivo’s Audio Streaming API integrates with OpenAI’s Real-time Speech-to-Speech (S2S) API, so you can build AI voice assistants that hold natural conversations, handle interruptions gracefully, and answer user queries in real time.

Get started with Plivo

Before beginning your AI voice assistant development, sign up for Plivo or sign in to your existing account. You’ll need to purchase a voice-enabled number through the Voice API or Plivo console.

Prerequisites

Ensure you have the following before starting:

Node.js version 22.6.0 or later (if you use the Node.js sample)
Python version 3.10.5 or later (if you use the Python sample)
A Plivo account with a voice-enabled number
An OpenAI account with:
- Valid API key
- Access to OpenAI’s Real-time API
ngrok installed for local development testing

Clone the Plivo audio stream integration guides repository

If you are using Python:

git clone https://github.com/plivo/AI-Voice-Agents.git
cd AI-Voice-Agents/Openai-realtime-api/Python

If you are using Node.js:

git clone https://github.com/plivo/AI-Voice-Agents.git
cd AI-Voice-Agents/Openai-realtime-api/NodeJS

Set Up Your Local Environment

1. Create a Tunnel with ngrok

For local development, you’ll need a public URL to receive webhooks. Open a terminal and run:

ngrok http 5000

Copy the Forwarding URL (format: https://[your-ngrok-subdomain].ngrok.app). You’ll need this for the Plivo Answer XML.

Note: Port 5000 is this application’s default. If you change PORT in index.js (Node.js) or server.py (Python), update the ngrok command to match. Each new ngrok session may produce a new Forwarding URL; when it does, update the Answer XML and any other configuration that uses it.

2. Install Required Packages

If you are using Python:

pip install -r requirements.txt

If you are using Node.js:

npm install

3. Configure Environment Variables

Create a .env file in your project root and set up the following:

Add Plivo Credentials

PLIVO_AUTH_ID=<YOUR_PLIVO_AUTH_ID>
PLIVO_AUTH_TOKEN=<YOUR_PLIVO_AUTH_TOKEN>
PLIVO_FROM_NUMBER=<YOUR_PLIVO_NUMBER>
PLIVO_TO_NUMBER=<CALLER_PHONE_NUMBER>

Add OpenAI API Key

OPENAI_API_KEY=<YOUR_OPEN_AI_API_KEY>
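
Before launching, it can help to confirm that none of these values is missing or still a template placeholder. The following is a minimal sketch, not part of the sample repository: the helper name missing_vars is hypothetical, and it assumes the .env values have already been loaded into a mapping such as os.environ.

```python
from typing import Mapping

# Variable names taken from the .env steps above.
REQUIRED_VARS = (
    "PLIVO_AUTH_ID",
    "PLIVO_AUTH_TOKEN",
    "PLIVO_FROM_NUMBER",
    "PLIVO_TO_NUMBER",
    "OPENAI_API_KEY",
)


def missing_vars(env: Mapping[str, str]) -> list[str]:
    """Return required names that are unset, blank, or still a <PLACEHOLDER>."""
    missing = []
    for name in REQUIRED_VARS:
        value = env.get(name, "").strip()
        # Treat values like <YOUR_PLIVO_AUTH_ID> as not yet filled in.
        if not value or (value.startswith("<") and value.endswith(">")):
            missing.append(name)
    return missing
```

Calling missing_vars(os.environ) at startup and stopping when the returned list is non-empty surfaces configuration mistakes before any call is placed.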

Configure Answer XML

Use this template for your Plivo application’s Answer XML:

<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <Speak>Connected to AI Assistant. You may begin speaking.</Speak>
    <Stream keepCallAlive="true" audioTrack="both">
        wss://[your-ngrok-subdomain].ngrok.app/stream
    </Stream>
</Response>
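
Because the ngrok URL can change between sessions, you may prefer to generate this XML rather than edit it by hand. The sketch below uses only Python’s standard library; the function names to_stream_url and build_answer_xml are hypothetical, and the /stream path and element attributes are copied from the template above.

```python
import xml.etree.ElementTree as ET


def to_stream_url(forwarding_url: str, path: str = "/stream") -> str:
    """Turn an https:// forwarding URL into the matching wss:// stream URL."""
    if not forwarding_url.startswith("https://"):
        raise ValueError("expected an https:// forwarding URL")
    host = forwarding_url[len("https://"):].rstrip("/")
    return f"wss://{host}{path}"


def build_answer_xml(forwarding_url: str) -> str:
    """Build the Answer XML from the template, with the stream URL filled in."""
    response = ET.Element("Response")
    speak = ET.SubElement(response, "Speak")
    speak.text = "Connected to AI Assistant. You may begin speaking."
    # Attributes mirror the template; adjust them if your setup differs.
    stream = ET.SubElement(
        response, "Stream", keepCallAlive="true", audioTrack="both"
    )
    stream.text = to_stream_url(forwarding_url)
    body = ET.tostring(response, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

For example, build_answer_xml("https://[your-ngrok-subdomain].ngrok.app") yields a document whose Stream element contains the corresponding wss:// URL ending in /stream.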

Set the PLIVO_ANSWER_XML variable in your .env file to your Answer URL, the URL that serves this XML.

Launch Your Application

Ensure ngrok is running and you’ve noted the Forwarding URL
Verify all environment variables are properly configured
Start the application. If you are using Python:

python server.py

If you are using Node.js:

node index.js

The application will automatically initiate a call to the number specified in PLIVO_TO_NUMBER. Once the call is answered, you can begin interacting with your AI assistant.
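
To illustrate what initiating that call can look like, here is a rough sketch that uses the Plivo Python SDK (pip install plivo). It is an assumption about how the environment variables map onto an outbound call, not a copy of the sample application’s code; the helper names call_params and place_call are hypothetical, and the GET answer method is an assumption you should match to however your Answer URL is served.

```python
from typing import Mapping


def call_params(env: Mapping[str, str]) -> dict:
    """Map the .env values onto keyword arguments for an outbound call."""
    return {
        "from_": env["PLIVO_FROM_NUMBER"],     # your Plivo number
        "to_": env["PLIVO_TO_NUMBER"],         # the number to be called
        "answer_url": env["PLIVO_ANSWER_XML"], # URL that returns the Answer XML
        "answer_method": "GET",                # assumption; Plivo also supports POST
    }


def place_call(env: Mapping[str, str]):
    """Place the call through Plivo's REST API. Requires the plivo package."""
    import plivo  # imported lazily so call_params() works without the SDK

    client = plivo.RestClient(env["PLIVO_AUTH_ID"], env["PLIVO_AUTH_TOKEN"])
    return client.calls.create(**call_params(env))
```

Keeping the parameter mapping separate from the network call makes it possible to inspect what would be sent without placing a real, billable call.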

Key Features

Your AI voice assistant includes:

Real-time audio streaming through Plivo’s WebSocket
Natural voice communication using OpenAI’s Real-time model
Intelligent interruption handling for natural conversation flow
Function calling support for enhanced capabilities
Bi-directional audio streaming for seamless interaction
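
The bi-directional streaming and interruption handling listed above come down to translating messages between two WebSockets. The sketch below shows that translation as pure functions. The message shapes are assumptions based on Plivo’s audio streaming events (media, playAudio, clearAudio) and OpenAI Realtime event names (input_audio_buffer.append, response.audio.delta, input_audio_buffer.speech_started), which can differ between API versions. It also assumes both sides are configured for the same audio format (8 kHz mu-law here), so the base64 payload passes through unchanged.

```python
import json
from typing import Optional


def plivo_to_openai(raw: str) -> Optional[str]:
    """Convert a Plivo 'media' frame into an OpenAI audio-append event."""
    msg = json.loads(raw)
    if msg.get("event") != "media":
        return None  # other frames (for example 'start') carry no audio
    return json.dumps({
        "type": "input_audio_buffer.append",
        "audio": msg["media"]["payload"],  # base64 audio, passed through as-is
    })


def openai_to_plivo(raw: str) -> Optional[str]:
    """Convert an OpenAI event into a Plivo playback or interruption message."""
    event = json.loads(raw)
    kind = event.get("type")
    if kind == "response.audio.delta":
        # Assistant audio: ask Plivo to play it to the caller.
        return json.dumps({
            "event": "playAudio",
            "media": {
                "contentType": "audio/x-mulaw",  # assumed stream format
                "sampleRate": 8000,
                "payload": event["delta"],
            },
        })
    if kind == "input_audio_buffer.speech_started":
        # Caller started talking: drop any queued assistant audio.
        return json.dumps({"event": "clearAudio"})
    return None
```

A server would call plivo_to_openai on each frame received from Plivo and openai_to_plivo on each event received from OpenAI, forwarding any non-None result to the other socket.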

Troubleshooting Guide

If you encounter issues:

Check WebSocket Connection:
- Verify ngrok is running
- Confirm the WebSocket URL in your Answer XML matches your ngrok URL
- Check for WebSocket connection errors in your logs
Verify Environment Setup:
- Confirm all environment variables are correctly set
- Ensure OpenAI API key is valid
- Verify Plivo credentials are correct
Audio Issues:
- Check audio stream configuration in Answer XML
- Verify audio format compatibility
- Monitor WebSocket data transfer logs
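
A stale ngrok URL in the Answer XML is an easy mistake to miss, so the URL check can be automated. This is a small sketch using only the standard library; the function names stream_host and matches_tunnel are hypothetical, and it assumes the Answer XML has the Response/Stream layout shown earlier.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse


def stream_host(answer_xml: str) -> str:
    """Return the host of the WebSocket URL inside the <Stream> element."""
    root = ET.fromstring(answer_xml.encode("utf-8"))
    stream = root.find("Stream")
    if stream is None or not (stream.text or "").strip():
        raise ValueError("Answer XML has no <Stream> URL")
    return urlparse(stream.text.strip()).hostname or ""


def matches_tunnel(answer_xml: str, forwarding_url: str) -> bool:
    """True when the stream URL points at the current ngrok forwarding host."""
    return stream_host(answer_xml) == (urlparse(forwarding_url).hostname or "")
```

If matches_tunnel returns False, the Answer XML is still pointing at an earlier ngrok session and needs the current Forwarding URL.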

Next Steps

Consider these enhancements for your AI assistant:

Implement custom conversation flows
Add specific business logic through function calling
Create detailed conversation logs
Add support for multiple languages
Implement analytics and monitoring

For additional support:

Visit Plivo Documentation
Check OpenAI API Documentation
Contact Plivo Support for technical assistance
