Prerequisites
| Service | What You Need |
|---|---|
| Plivo | Auth ID, Auth Token, Voice-enabled phone number |
| Deepgram | API key from console.deepgram.com |
| API key from AI Studio | |
| Cartesia | API key from play.cartesia.ai |
Installation
Environment Variables
Pipeline Configuration
Service Details
Deepgram STT
Real-time speech recognition with interim results and language detection.| Option | Description |
|---|---|
DeepgramSTTService | Standard WebSocket transcription |
DeepgramFluxSTTService | Enhanced turn detection for conversations |
Google Gemini LLM
Streaming responses with function calling and multimodal input support.| Model | Description |
|---|---|
gemini-1.5-flash | Fast, cost-effective |
gemini-1.5-pro | Most capable |
gemini-2.0-flash-exp | Latest experimental |
- Streaming responses
- Function calling
- Multimodal inputs (text, images)
- OpenAI-compatible context format
Cartesia TTS
Real-time voice synthesis with word-level timing and interruption handling.| Feature | Method |
|---|---|
| Spell out text | SPELL("ABC") |
| Add emotion | EMOTION_TAG("SARCASM") |
| Insert pause | PAUSE_TAG(0.5) |
| Adjust speed | SPEED_TAG(1.2) |
| Adjust volume | VOLUME_TAG(0.8) |
Quick Start
Inbound Calls
Outbound Calls
Related
- Pipecat Overview - Architecture and setup
- Deepgram Docs - STT configuration
- Gemini Docs - LLM configuration
- Cartesia Docs - TTS configuration