Build an AI Voice Agent by Integrating OpenAI’s Real-time Speech API with Plivo

Plivo's Audio Streaming API can be connected to OpenAI's Real-time Speech-to-Speech (S2S) API to build AI voice assistants. The resulting agent can hold natural conversations, handle interruptions gracefully, and answer user queries in real time.

Get started with Plivo

Before beginning your AI voice assistant development, sign up for Plivo or sign in to your existing account. You’ll need to purchase a voice-enabled number through the Voice API or Plivo console.

Prerequisites

Ensure you have the following before starting:

  • Node.js version 22.6.0 or later, if you use the Node.js version of the agent
  • Python version 3.10.5 or later, if you use the Python version of the agent
  • A Plivo account with a voice-enabled number
  • An OpenAI account
    • Valid API key
    • Access to OpenAI’s Real-time API
  • ngrok installed for local development testing

Clone the Plivo audio stream integration guides repository

If you are using Python:

git clone https://github.com/plivo/AI-Voice-Agents.git
cd AI-Voice-Agents/Openai-realtime-api/Python

If you are using Node.js:

git clone https://github.com/plivo/AI-Voice-Agents.git
cd AI-Voice-Agents/Openai-realtime-api/NodeJS

Set Up Your Local Environment

1. Create a Tunnel with ngrok

For local development, you'll need a public URL to receive webhooks. Open a terminal and run:

ngrok http 5000

Copy the Forwarding URL (format: https://[your-ngrok-subdomain].ngrok.app). You’ll need this for the Plivo Answer XML.

Note: Port 5000 is this application's default. If you change the PORT in index.js (for Node.js) or server.py (for Python), update the ngrok command to match. Each new ngrok session creates a new URL, so you must update the Answer XML and your .env file whenever you restart ngrok.
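Because the Forwarding URL changes with every ngrok session, it can help to derive the WebSocket address from it rather than editing it by hand. The following is a minimal sketch, not part of the repository: the function name stream_url_from_forwarding is hypothetical, and the /stream path is taken from the Answer XML template in this guide.

```python
from urllib.parse import urlsplit, urlunsplit


def stream_url_from_forwarding(forwarding_url: str, path: str = "/stream") -> str:
    """Turn an ngrok Forwarding URL into the matching WebSocket URL.

    https://host -> wss://host/stream, http://host -> ws://host/stream.
    """
    parts = urlsplit(forwarding_url.strip())
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not an http(s) URL: {forwarding_url!r}")
    # Use the secure WebSocket scheme whenever the tunnel is served over HTTPS.
    scheme = "wss" if parts.scheme == "https" else "ws"
    return urlunsplit((scheme, parts.netloc, path, "", ""))


if __name__ == "__main__":
    # Replace with the Forwarding URL printed by your own ngrok session.
    print(stream_url_from_forwarding("https://abc123.ngrok.app"))
```

The host name abc123.ngrok.app is only an example value.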

2. Install Required Packages

If you are using Python:

pip install -r requirements.txt

If you are using Node.js:

npm install

3. Configure Environment Variables

Create a .env file in your project root and set up the following:

Add Plivo Credentials

PLIVO_AUTH_ID=<YOUR_PLIVO_AUTH_ID>
PLIVO_AUTH_TOKEN=<YOUR_PLIVO_AUTH_TOKEN>
PLIVO_FROM_NUMBER=<YOUR_PLIVO_NUMBER>
PLIVO_TO_NUMBER=<CALLER_PHONE_NUMBER>

Add OpenAI API Key

OPENAI_API_KEY=<YOUR_OPEN_AI_API_KEY>
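Before launching, you may want to confirm that none of these values is missing or still a placeholder. The sketch below is an illustration only: missing_settings is a hypothetical helper, and it assumes the .env file has already been loaded into the process environment (for example with python-dotenv's load_dotenv, which the repository may or may not use).

```python
import os

# Variable names as listed in this guide.
REQUIRED_VARS = (
    "PLIVO_AUTH_ID",
    "PLIVO_AUTH_TOKEN",
    "PLIVO_FROM_NUMBER",
    "PLIVO_TO_NUMBER",
    "OPENAI_API_KEY",
)


def missing_settings(env=None):
    """Return the names of required variables that are unset, blank,
    or still hold a <PLACEHOLDER> value copied from the template."""
    env = os.environ if env is None else env
    missing = []
    for name in REQUIRED_VARS:
        value = (env.get(name) or "").strip()
        if not value or value.startswith("<"):
            missing.append(name)
    return missing


if __name__ == "__main__":
    problems = missing_settings()
    if problems:
        print("Missing or placeholder settings:", ", ".join(problems))
    else:
        print("All required settings are present.")
```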

Configure Answer XML

Use this template for your Plivo application’s Answer XML:

<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <Speak>Connected to AI Assistant. You may begin speaking.</Speak>
    <Stream keepCallAlive="true" audioTrack="both">
        wss://[your-ngrok-subdomain].ngrok.app/stream
    </Stream>
</Response>
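If you prefer to generate this XML instead of pasting it, the template can be reproduced with Python's standard library. This is a sketch under assumptions: build_answer_xml is a hypothetical helper, not a function from the repository or the Plivo SDK, and it simply mirrors the elements and attributes shown in the template.

```python
import xml.etree.ElementTree as ET


def build_answer_xml(
    stream_url: str,
    greeting: str = "Connected to AI Assistant. You may begin speaking.",
) -> str:
    """Build the Answer XML from this guide for a given WebSocket URL."""
    response = ET.Element("Response")
    ET.SubElement(response, "Speak").text = greeting
    # Attribute values copied from the template in this guide.
    stream = ET.SubElement(
        response, "Stream", {"keepCallAlive": "true", "audioTrack": "both"}
    )
    stream.text = stream_url
    body = ET.tostring(response, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Example host only; use your own ngrok subdomain.
    print(build_answer_xml("wss://abc123.ngrok.app/stream"))
```

Building the document with ElementTree also escapes any special characters in the greeting text.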

Set the PLIVO_ANSWER_XML variable in your .env file to your Answer URL, the public URL that returns this XML to Plivo.

Launch Your Application

  1. Ensure ngrok is running and you’ve noted the Forwarding URL
  2. Verify all environment variables are properly configured
  3. Start the application:

If you are using Python:

python server.py

If you are using Node.js:

node index.js

The application will automatically initiate a call to the number specified in PLIVO_TO_NUMBER. Once the call is answered, you can begin interacting with your AI assistant.
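The outbound call itself can be sketched with the Plivo Python SDK (pip install plivo). The repository's own code may differ; outbound_call_params and place_call are hypothetical names, and the GET answer method is an assumption that you should match to however your Answer URL is served.

```python
import os


def outbound_call_params(env) -> dict:
    """Collect the arguments for the outbound call from the .env values."""
    return {
        "from_": env["PLIVO_FROM_NUMBER"],
        "to_": env["PLIVO_TO_NUMBER"],
        # PLIVO_ANSWER_XML holds the Answer URL, as described in this guide.
        "answer_url": env["PLIVO_ANSWER_XML"],
        "answer_method": "GET",  # assumption; use POST if your URL expects it
    }


def place_call(env=None):
    """Place the call through Plivo's REST API."""
    env = os.environ if env is None else env
    import plivo  # imported lazily so the helper above works without the SDK

    client = plivo.RestClient(
        auth_id=env["PLIVO_AUTH_ID"], auth_token=env["PLIVO_AUTH_TOKEN"]
    )
    return client.calls.create(**outbound_call_params(env))


if __name__ == "__main__":
    print(place_call())
```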

Key Features

Your AI voice assistant includes:

  • Real-time audio streaming through Plivo’s WebSocket
  • Natural voice communication using OpenAI’s Real-time model
  • Intelligent interruption handling for natural conversation flow
  • Function calling support for enhanced capabilities
  • Bi-directional audio streaming for seamless interaction
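To make the interruption handling concrete, here is a rough sketch of how events from OpenAI might be translated into messages for Plivo's stream. Everything named here is an assumption to verify against the current OpenAI and Plivo documentation: the OpenAI event types response.audio.delta and input_audio_buffer.speech_started, the Plivo playAudio and clearAudio messages, and the 8 kHz mu-law audio format. The function plivo_messages_for is hypothetical and is not taken from the repository.

```python
def plivo_messages_for(openai_event: dict, stream_id: str) -> list[dict]:
    """Map one OpenAI event to the Plivo stream messages it should trigger."""
    kind = openai_event.get("type")

    if kind == "response.audio.delta":
        # Forward the model's audio chunk (already base64) to the caller.
        return [{
            "event": "playAudio",
            "media": {
                "contentType": "audio/x-mulaw",  # assumed audio format
                "sampleRate": 8000,
                "payload": openai_event["delta"],
            },
        }]

    if kind == "input_audio_buffer.speech_started":
        # The caller started talking: drop any queued assistant audio
        # so the agent stops speaking over them.
        return [{"event": "clearAudio", "streamId": stream_id}]

    # Other event types need no action on the Plivo side in this sketch.
    return []


if __name__ == "__main__":
    print(plivo_messages_for({"type": "input_audio_buffer.speech_started"}, "demo"))
```

In a real server each returned message would be JSON-encoded and sent over the Plivo WebSocket.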

Troubleshooting Guide

If you encounter issues:

  1. Check WebSocket Connection:
    • Verify ngrok is running
    • Confirm the WebSocket URL in your Answer XML matches your ngrok URL
    • Check for WebSocket connection errors in your logs
  2. Verify Environment Setup:
    • Confirm all environment variables are correctly set
    • Ensure the OpenAI API key is valid
    • Verify Plivo credentials are correct
  3. Audio Issues:
    • Check audio stream configuration in Answer XML
    • Verify audio format compatibility
    • Monitor WebSocket data transfer logs
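The first check in this list, that the WebSocket URL in the Answer XML matches the current ngrok URL, is easy to automate. The following is a minimal sketch using only the standard library; stream_url_problems is a hypothetical helper and the host names are example values.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit


def stream_url_problems(answer_xml: str, forwarding_url: str) -> list[str]:
    """Return a list of mismatches between the Answer XML and the ngrok URL."""
    root = ET.fromstring(answer_xml.encode("utf-8"))
    stream = root.find("Stream")
    if stream is None or not (stream.text or "").strip():
        return ["Answer XML has no <Stream> URL"]

    target = urlsplit(stream.text.strip())
    expected_host = urlsplit(forwarding_url.strip()).netloc

    problems = []
    if target.scheme not in ("ws", "wss"):
        problems.append(f"Stream URL scheme is {target.scheme!r}, expected ws or wss")
    if target.netloc != expected_host:
        problems.append(
            f"Stream host {target.netloc!r} does not match ngrok host {expected_host!r}"
        )
    return problems


if __name__ == "__main__":
    xml_text = (
        "<Response><Stream>wss://old123.ngrok.app/stream</Stream></Response>"
    )
    for line in stream_url_problems(xml_text, "https://abc123.ngrok.app"):
        print(line)
```

An empty list means the Answer XML points at the tunnel that is currently running.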

Next Steps

Consider these enhancements for your AI assistant:

  • Implement custom conversation flows
  • Add specific business logic through function calling
  • Create detailed conversation logs
  • Add support for multiple languages
  • Implement analytics and monitoring

For additional support: