Model picker guide

Every Rymi agent runs a language model (the brain) and a voice (the sound). All models are available on every billing tier — tier only changes per-minute call cost.

Quick recommendations

I want the cheapest agent that still feels good

Pick GPT-4o Mini or Claude Haiku 4.5 for the LLM and OpenAI TTS for voice. Latency and cost are both low, and quality is fine for most short flows.

I want the most natural-sounding agent

Pick Claude Sonnet 4.6 for the LLM (or Claude Opus 4.6 if budget allows) and ElevenLabs for voice. Best for high-stakes calls such as concierge, executive support, and premium sales.

I need the lowest latency possible

Pick a realtime path: GPT-4o Realtime + native voice, or Gemini 2.5 Flash with native audio. Avoid stacking a separate TTS provider on top of the model, because each extra hop adds 100–200 ms.
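
To make the per-hop figure concrete, here is a minimal sketch of the arithmetic. It assumes the 100–200 ms range quoted above applies to every extra hop; the function name and constant are hypothetical and not part of Rymi.

```python
# Per-hop latency range quoted in this guide, in milliseconds.
HOP_LATENCY_MS = (100, 200)


def added_latency_ms(extra_hops: int) -> tuple[int, int]:
    """Return the (best case, worst case) latency added by extra provider hops."""
    if extra_hops < 0:
        raise ValueError("extra_hops must be zero or greater")
    low, high = HOP_LATENCY_MS
    return (extra_hops * low, extra_hops * high)


# A native-audio path adds no extra hop; a separate TTS provider adds one.
print(added_latency_ms(0))  # (0, 0)
print(added_latency_ms(1))  # (100, 200)
```

Under this assumption, a stack with two extra hops would add somewhere between 200 and 400 ms to each turn.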

My users speak Hindi or other Indic languages

Pick Sarvam 30B or 105B for the LLM, Sarvam Bulbul v3 for voice. The full Sarvam stack is tuned together and runs from Indian regions for lower latency.
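
The four recommendations above can be summarized as a small lookup table. This is only a reading aid: the dictionary keys and the `recommend` helper are hypothetical, and nothing here reflects Rymi's actual configuration format or API.

```python
# Hypothetical summary of the quick recommendations in this guide.
# Keys and structure are illustrative; values restate the text above.
RECOMMENDATIONS = {
    "cheapest": {
        "llm": ["GPT-4o Mini", "Claude Haiku 4.5"],
        "voice": "OpenAI TTS",
    },
    "most_natural": {
        "llm": ["Claude Sonnet 4.6", "Claude Opus"],
        "voice": "ElevenLabs",
    },
    "lowest_latency": {
        "llm": ["GPT-4o Realtime", "Gemini 2.5 Flash"],
        "voice": "native audio",
    },
    "indic": {
        "llm": ["Sarvam 30B", "Sarvam 105B"],
        "voice": "Sarvam Bulbul v3",
    },
}


def recommend(priority: str) -> dict:
    """Return the suggested LLM options and voice for a given priority."""
    if priority not in RECOMMENDATIONS:
        options = ", ".join(sorted(RECOMMENDATIONS))
        raise ValueError(f"unknown priority {priority!r}; expected one of: {options}")
    return RECOMMENDATIONS[priority]


choice = recommend("indic")
print(choice["llm"], choice["voice"])
```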

Language models

Anthropic (Claude)

Strong reasoning, careful tone — good default for most production agents.

Model	Best for	Notes
`claude-haiku-4-5`	Fast, friendly tone, handles 80% of support / qualification flows	Cheapest Claude
`claude-sonnet-4-6`	Balanced quality + speed. Solid for sales discovery and multi-step playbooks
`claude-opus-4-6`	Highest reasoning. Use when nuance matters — complex objection handling, escalations	Most expensive

OpenAI (GPT)

Wide tool support, strong realtime variant for low-latency calls.

Model	Best for	Notes
`gpt-4o-mini`	Short verification or routing flows	Cheapest
`gpt-4o`	General-purpose flagship. Reliable for most agent shapes
`gpt-realtime-mini`	Low-latency native-audio voice on a budget. Strong default for production realtime calls	Realtime
`gpt-realtime-1.5`	The most natural-sounding voice agent on the market. Pick for premium concierge experiences	Realtime · premium

Google (Gemini)

Native multimodal audio path — strong default for voice-first agents.

Model	Best for	Notes
`gemini-2.5-flash-lite`	High-volume top-of-funnel	Cheapest
`gemini-2.5-flash`	Balanced quality and speed. Pair with native audio for low end-to-end latency	Native audio
`gemini-2.5-pro`	Highest Gemini quality for nuanced, multi-step flows

Sarvam (India-optimized)

Tuned for Indian English, Hindi, and other Indic languages. Lower latency in India.

Model	Best for
`sarvam-m`	Routing and simple dialogs
`sarvam-30b`	Mid-tier quality. Solid for most India support flows
`sarvam-105b`	Highest Sarvam quality. Use for nuanced Indic conversations

Voices

Provider	What it is	Best for
Gemini native audio	Built into the Gemini stack with no extra hop	Lowest end-to-end latency. Default if you pick a Gemini model.
OpenAI TTS	8 neutral voices (alloy, echo, shimmer, ash, ballad, coral, sage, verse)	Cheap, fast, consistent. Limited expressive range.
ElevenLabs	90+ voices, multilingual, accent control, BYO supported	Highest perceived quality and variety. Best for brand-sensitive deployments.
Deepgram Aura 2	100+ voices across 60+ languages	Wide language coverage, strong fallback option.
Sarvam Bulbul v3	Indic-optimized TTS	Hindi and other Indic languages with natural prosody.
Cartesia Sonic	Low-latency premium TTS with high naturalness, BYO key supported	Newest premium voice tier.

Bring your own keys

Connect your own OpenAI, Anthropic, ElevenLabs, Cartesia, Groq, or other provider keys under Settings → BYO Providers to route calls through your own accounts. If no key is connected for a provider, Rymi falls back to the platform default. See Voice Providers API for the full BYOK list.
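
The fallback rule can be pictured with a short sketch. It assumes only what is stated above (a connected key is used, otherwise the platform default applies); the function, the return values, and the key strings are hypothetical and do not describe Rymi's internals.

```python
from typing import Optional


def resolve_route(provider: str, connected_keys: dict[str, str]) -> tuple[str, Optional[str]]:
    """Decide whether a provider call uses a customer key or the platform default.

    Returns ("byo", key) when a non-empty key is connected for the provider,
    otherwise ("platform", None).
    """
    key = connected_keys.get(provider)
    if key:
        return ("byo", key)
    return ("platform", None)


# Hypothetical example: only an ElevenLabs key has been connected.
keys = {"elevenlabs": "example-key"}
print(resolve_route("elevenlabs", keys))  # ('byo', 'example-key')
print(resolve_route("openai", keys))      # ('platform', None)
```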

What's next

Get your number on a Rymi agent
Custom Personas — tune voice and identity in detail
API: Voice Providers
