Model picker guide

Every Rymi agent runs a language model (the brain) and a voice (the sound). All models are available on every billing tier; the tier only changes the per-minute call cost.

Quick recommendations

I want the cheapest agent that still feels good

Pick GPT-4o Mini or Claude Haiku 4.5 for the LLM and OpenAI TTS for voice. Latency and cost are both low, and quality is fine for most short flows.

I want the most natural-sounding agent

Pick Claude Sonnet 4.6 for the LLM (or Claude Opus 4.6 if budget allows) and ElevenLabs for voice. Best for high-stakes calls such as concierge, executive support, and premium sales.

I need the lowest latency possible

Pick a realtime path: GPT-4o Realtime + native voice, or Gemini 2.5 Flash with native audio. Avoid stacking separate TTS providers — each hop adds 100–200 ms.
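
As a rough illustration of the hop cost, the sketch below multiplies the 100–200 ms per-hop figure quoted above by the number of extra provider hops. The function name and the idea of counting hops as an integer are assumptions made for illustration; these are not measured Rymi numbers.

```python
# Per-hop overhead range quoted in this guide, in milliseconds.
HOP_OVERHEAD_MS = (100, 200)


def added_latency_ms(extra_hops: int) -> tuple[int, int]:
    """Return the (min, max) latency added by `extra_hops` separate provider hops."""
    if extra_hops < 0:
        raise ValueError("extra_hops must be non-negative")
    low, high = HOP_OVERHEAD_MS
    return (extra_hops * low, extra_hops * high)


# A realtime model with native audio adds no separate TTS hop.
print(added_latency_ms(0))  # (0, 0)
# An LLM paired with one separate TTS provider adds one hop.
print(added_latency_ms(1))  # (100, 200)
```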

My users speak Hindi or other Indic languages

Pick Sarvam 30B or 105B for the LLM, Sarvam Bulbul v3 for voice. The full Sarvam stack is tuned together and runs from Indian regions for lower latency.
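
The four recommendations above can be summarized as a small lookup table. This is only a sketch: the goal keys, the `Stack` type, and the `recommend` helper are hypothetical names, not part of any Rymi API. Where the guide offers two options, the table stores the first one listed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stack:
    llm: str    # model identifier as listed in this guide
    voice: str  # voice provider as listed in this guide


# Goal keys are hypothetical labels; model and voice names come from the guide.
RECOMMENDED = {
    "cheapest": Stack(llm="gpt-4o-mini", voice="OpenAI TTS"),
    "most_natural": Stack(llm="claude-sonnet-4-6", voice="ElevenLabs"),
    "lowest_latency": Stack(llm="gemini-2.5-flash", voice="Gemini native audio"),
    "indic": Stack(llm="sarvam-30b", voice="Sarvam Bulbul v3"),
}


def recommend(goal: str) -> Stack:
    """Look up the suggested LLM and voice pairing for a goal."""
    try:
        return RECOMMENDED[goal]
    except KeyError:
        options = ", ".join(sorted(RECOMMENDED))
        raise ValueError(f"unknown goal {goal!r}; expected one of: {options}") from None


print(recommend("indic"))
```

Keeping the pairing in one place makes it easy to swap in the alternative from each recommendation, for example `claude-haiku-4-5` for the cheapest stack or `sarvam-105b` for Indic languages.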

Language models

Anthropic (Claude)

Strong reasoning, careful tone — good default for most production agents.

| Model | Best for | Notes |
| --- | --- | --- |
| claude-haiku-4-5 | Fast, friendly tone, handles 80% of support / qualification flows | Cheapest Claude |
| claude-sonnet-4-6 | Balanced quality + speed. Solid for sales discovery and multi-step playbooks | |
| claude-opus-4-6 | Highest reasoning. Use when nuance matters: complex objection handling, escalations | Most expensive |

OpenAI (GPT)

Wide tool support, strong realtime variant for low-latency calls.

| Model | Best for | Notes |
| --- | --- | --- |
| gpt-4o-mini | Short verification or routing flows | Cheapest |
| gpt-4o | General-purpose flagship. Reliable for most agent shapes | |
| gpt-realtime-mini | Low-latency native-audio voice on a budget. Strong default for production realtime calls | Realtime |
| gpt-realtime-1.5 | The most natural-sounding voice agent on the market. Pick for premium concierge experiences | Realtime · premium |

Google (Gemini)

Native multimodal audio path — strong default for voice-first agents.

| Model | Best for | Notes |
| --- | --- | --- |
| gemini-1.5-flash | High-volume top-of-funnel | |
| gemini-2.0-flash | Better quality with similar speed | |
| gemini-2.5-flash | Newest Gemini Flash. Pair with native audio for low end-to-end latency | Native audio |

Sarvam (India-optimized)

Tuned for Indian English, Hindi, and other Indic languages. Lower latency in India.

| Model | Best for |
| --- | --- |
| sarvam-m | Routing and simple dialogs |
| sarvam-30b | Mid-tier quality. Solid for most India support flows |
| sarvam-105b | Highest Sarvam quality. Use for nuanced Indic conversations |

Voices

| Provider | What it is | Best for |
| --- | --- | --- |
| Gemini native audio | Built into the Gemini stack with no extra hop | Lowest end-to-end latency. Default if you pick a Gemini model. |
| OpenAI TTS | 8 neutral voices (alloy, echo, shimmer, ash, ballad, coral, sage, verse) | Cheap, fast, consistent. Limited expressive range. |
| ElevenLabs | 90+ voices, multilingual, accent control, BYO supported | Highest perceived quality and variety. Best for brand-sensitive deployments. |
| Deepgram Aura 2 | 100+ voices across 60+ languages | Wide language coverage, strong fallback option. |
| Sarvam Bulbul v3 | Indic-optimized TTS | Hindi and other Indic languages with natural prosody. |
| Cartesia Sonic | 4.7 MOS (rated above ElevenLabs in blind tests) | Newest premium voice tier. |

Bring your own keys

Connect your own provider keys (OpenAI, Anthropic, ElevenLabs, Cartesia, Groq, and others) under Settings → BYO Providers to route calls through your own accounts. If no key is connected, Rymi falls back to the platform default. See Voice Providers API for the full BYOK list.
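
The fallback rule can be pictured with a minimal sketch. It assumes connected keys are held in a plain dictionary keyed by provider name; `resolve_key_source` and the `"byo"` / `"platform"` labels are hypothetical and do not describe how Rymi implements routing internally.

```python
def resolve_key_source(provider: str, connected_keys: dict[str, str]) -> str:
    """Return "byo" when a non-empty key is connected for `provider`, else "platform"."""
    # Treat a missing entry and a blank string the same way: no key connected.
    key = connected_keys.get(provider, "").strip()
    return "byo" if key else "platform"


# Hypothetical example: only an ElevenLabs key has been connected.
keys = {"elevenlabs": "example-key"}
print(resolve_key_source("elevenlabs", keys))  # byo
print(resolve_key_source("openai", keys))      # platform
```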
