Model picker guide
Every Rymi agent runs a language model (the brain) and a voice (the sound). All models are available on every billing tier — tier only changes per-minute call cost.
Quick recommendations
I want the cheapest agent that still feels good
Pick GPT-4o Mini or Claude Haiku 4.5 for the LLM, OpenAI TTS for voice. Latency and cost are both low, and quality is fine for most short flows.
I want the most natural-sounding agent
Pick Claude Sonnet 4.6 for the LLM (or Opus if budget allows), ElevenLabs for voice. Best for high-stakes calls — concierge, executive support, premium sales.
I need the lowest latency possible
Pick a realtime path: GPT-4o Realtime + native voice, or Gemini 2.5 Flash with native audio. Avoid stacking separate TTS providers — each hop adds 100–200 ms.
My users speak Hindi or other Indic languages
Pick Sarvam 30B or 105B for the LLM, Sarvam Bulbul v3 for voice. The full Sarvam stack is tuned together and runs from Indian regions for lower latency.
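The four recommendations above can be summarized as a lookup table. The sketch below is illustrative only: the goal keys, the `Preset` type, and the `recommend` helper are hypothetical names, not part of any Rymi API, and the model and voice strings simply repeat the display names used in this guide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Preset:
    llm_options: tuple[str, ...]  # LLM choices, in the order this guide lists them
    voice: str                    # voice option named in the recommendation


# Hypothetical goal keys; the values repeat the quick recommendations above.
PRESETS = {
    "cheapest": Preset(("GPT-4o Mini", "Claude Haiku 4.5"), "OpenAI TTS"),
    "most_natural": Preset(("Claude Sonnet 4.6", "Claude Opus"), "ElevenLabs"),
    "lowest_latency": Preset(("GPT-4o Realtime", "Gemini 2.5 Flash"), "native audio"),
    "indic": Preset(("Sarvam 30B", "Sarvam 105B"), "Sarvam Bulbul v3"),
}


def recommend(goal: str) -> Preset:
    """Return the preset for a goal, or raise ValueError for an unknown goal."""
    try:
        return PRESETS[goal]
    except KeyError:
        raise ValueError(
            f"unknown goal {goal!r}; expected one of {sorted(PRESETS)}"
        ) from None
```

For example, `recommend("indic").voice` yields the Sarvam voice, and the first entry of `llm_options` is the first model the guide names for that goal.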
Language models
Anthropic (Claude)
Strong reasoning, careful tone — good default for most production agents.
| Model | Best for | Notes |
|---|---|---|
| claude-haiku-4-5 | Fast, friendly tone, handles 80% of support / qualification flows | Cheapest Claude |
| claude-sonnet-4-6 | Balanced quality + speed. Solid for sales discovery and multi-step playbooks | |
| claude-opus-4-6 | Highest reasoning. Use when nuance matters: complex objection handling, escalations | Most expensive |
OpenAI (GPT)
Wide tool support, strong realtime variant for low-latency calls.
| Model | Best for | Notes |
|---|---|---|
| gpt-4o-mini | Short verification or routing flows | Cheapest |
| gpt-4o | General-purpose flagship. Reliable for most agent shapes | |
| gpt-realtime-mini | Low-latency native-audio voice on a budget. Strong default for production realtime calls | Realtime |
| gpt-realtime-1.5 | The most natural-sounding voice agent on the market. Pick for premium concierge experiences | Realtime · premium |
Google (Gemini)
Native multimodal audio path — strong default for voice-first agents.
| Model | Best for | Notes |
|---|---|---|
| gemini-1.5-flash | High-volume top-of-funnel | |
| gemini-2.0-flash | Better quality with similar speed | |
| gemini-2.5-flash | Newest Gemini Flash. Pair with native audio for low end-to-end latency | Native audio |
Sarvam (India-optimized)
Tuned for Indian English, Hindi, and other Indic languages. Lower latency in India.
| Model | Best for |
|---|---|
| sarvam-m | Routing and simple dialogs |
| sarvam-30b | Mid-tier quality. Solid for most India support flows |
| sarvam-105b | Highest Sarvam quality. Use for nuanced Indic conversations |
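For quick validation in your own tooling, the model IDs from the tables above can be collected into a small catalog. This is a sketch under assumptions: `CATALOG`, `REALTIME_MODELS`, `provider_of`, and `is_realtime` are hypothetical helper names, and the IDs are copied from this page rather than fetched from Rymi.

```python
# Model IDs as listed in the tables on this page, grouped by provider.
CATALOG = {
    "Anthropic": ["claude-haiku-4-5", "claude-sonnet-4-6", "claude-opus-4-6"],
    "OpenAI": ["gpt-4o-mini", "gpt-4o", "gpt-realtime-mini", "gpt-realtime-1.5"],
    "Google": ["gemini-1.5-flash", "gemini-2.0-flash", "gemini-2.5-flash"],
    "Sarvam": ["sarvam-m", "sarvam-30b", "sarvam-105b"],
}

# Models whose Notes column marks them as "Realtime".
REALTIME_MODELS = {"gpt-realtime-mini", "gpt-realtime-1.5"}


def provider_of(model_id: str) -> str:
    """Return the provider that lists model_id, or raise ValueError."""
    for provider, models in CATALOG.items():
        if model_id in models:
            return provider
    raise ValueError(f"model {model_id!r} is not listed in this guide")


def is_realtime(model_id: str) -> bool:
    """True if the guide marks the model as a realtime variant."""
    return model_id in REALTIME_MODELS
```

A check like `provider_of(model_id)` before saving an agent config catches typos in model IDs early; keep the catalog in sync with this page if the lists change.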
Voices
| Provider | What it is | Best for |
|---|---|---|
| Gemini native audio | Built into the Gemini stack with no extra hop | Lowest end-to-end latency. Default if you pick a Gemini model. |
| OpenAI TTS | 8 neutral voices (alloy, echo, shimmer, ash, ballad, coral, sage, verse) | Cheap, fast, consistent. Limited expressive range. |
| ElevenLabs | 90+ voices, multilingual, accent control, BYO supported | Highest perceived quality and variety. Best for brand-sensitive deployments. |
| Deepgram Aura 2 | 100+ voices across 60+ languages | Wide language coverage, strong fallback option. |
| Sarvam Bulbul v3 | Indic-optimized TTS | Hindi and other Indic languages with natural prosody. |
| Cartesia Sonic | Premium TTS with a 4.7 MOS (mean opinion score), rated above ElevenLabs in blind tests | Newest premium voice tier. |
Bring your own keys
Connect your own provider keys (OpenAI, Anthropic, ElevenLabs, Cartesia, Groq, and others) under Settings → BYO Providers to route calls through your own accounts. If no key is connected for a provider, Rymi falls back to the platform default. See Voice Providers API for the full BYOK list.
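The fallback rule can be pictured as a simple lookup: use your own key when one is connected for the provider, otherwise use the platform default. The sketch below models that rule only; `resolve_credentials`, the `byo_keys` mapping, and the `"platform-default"` marker are hypothetical and do not reflect Rymi's internal implementation or API.

```python
from typing import Mapping, Optional

# Hypothetical marker meaning "route through the platform's own account".
PLATFORM_DEFAULT = "platform-default"


def resolve_credentials(provider: str, byo_keys: Mapping[str, Optional[str]]) -> str:
    """Return the caller's key for a provider, or the platform default marker.

    byo_keys is assumed to be keyed by lowercase provider name; a missing,
    None, or empty entry counts as "no key connected".
    """
    key = byo_keys.get(provider.lower())
    return key if key else PLATFORM_DEFAULT
```

With `{"elevenlabs": "sk-test"}` connected, ElevenLabs calls resolve to that key while every other provider resolves to the platform default.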
What's next
- Get your number on a Rymi agent
- Custom Personas — tune voice and identity in detail
- API: Voice Providers

