Agent Identity Guide
Agent Identity is who the agent is and how it sounds on the call — the parts callers experience first. Get this layer right and everything else has a foundation to build on.
You configure Identity in the Agent Studio under the "Agent Identity" section. It has five fields, covered below.
Your callers
What it is: A plain-language description of the people your agent talks to.
Why it matters: This is the single biggest lever for how the agent speaks. A dentist's patients and a SaaS CTO don't want to be talked to the same way. When the agent knows who's on the other end, it adjusts vocabulary, pacing, and assumed expertise.
Good examples
Appointment booking:
"People booking dental appointments — patients of all ages, some nervous, most non-technical."
Customer support:
"Users of our SaaS product. Mixed technical level. Usually frustrated and want a quick resolution."
Sales outreach:
"Heads of RevOps at mid-market B2B SaaS companies. Technical, busy, skeptical of cold calls."
Tips
- Include emotional context. "Nervous", "frustrated", "skeptical" — these shape delivery as much as vocabulary.
- Describe technical level. "Non-technical" vs. "technical" changes whether the agent uses jargon.
- Two to three sentences is plenty. More is fine, but this isn't a marketing persona — just enough for the agent to calibrate.
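As a rough sketch, here is how the sales outreach description above might look in config form. It assumes the `persona.audienceDescription` key from the worked example at the end of this guide is where the Your callers field is stored; that mapping is inferred from the example, not confirmed.

```yaml
# Assumption: "Your callers" is stored under persona.audienceDescription,
# as the worked example in this guide suggests.
persona:
  audienceDescription: "Heads of RevOps at mid-market B2B SaaS companies. Technical, busy, skeptical of cold calls."
```

Role, technical level, and emotional context all fit in two short sentences here.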
Language
What it is: The agent's primary language, the one it opens the call in.
Why it matters: This setting picks the baseline language for STT (speech recognition), TTS (speech synthesis), and the LLM. Setting it to Auto-detect (inbound only) lets the agent switch to whichever language the caller speaks first, which is useful for multilingual businesses.
When to use "Auto-detect"
- You serve customers who speak multiple languages and you don't know which they'll use.
- Inbound calls only — outbound calls still need an explicit language because there's no incoming speech to detect.
When to pick a specific language
- Most of your callers speak one language.
- You need consistent voice selection (voice catalogs differ by language).
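A minimal sketch of pinning a specific language for an outbound agent, assuming the top-level `language` key from the worked example corresponds to this field. The locale `es-MX` is only an illustration, not a confirmed supported value, and the config value for Auto-detect is not shown in this guide, so it is left out here.

```yaml
# Outbound agent: there is no incoming speech to detect, so set the language explicitly.
# Assumption: `language` takes a locale code like the en-US used in the worked example.
language: es-MX
```

The worked example also repeats the language under `voiceConfig`. If your config does the same, it is probably safest to keep the two values in sync.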
Voice tone
What it is: The personality your agent projects across every sentence.
Why it matters: Voice tone is the cross-cutting flavor on top of what the agent says. Same script, different tone → different caller experience. A "Warm & Friendly" confirmation and a "Professional & Direct" confirmation feel completely different.
Presets
Rymi ships six preset tones that cover most cases:
- Warm & Friendly — approachable, conversational. Good default for consumer-facing agents (booking, restaurants, wellness).
- Professional & Direct — clear, efficient. Good for B2B, sales, and anything where callers value their time.
- Casual & Conversational — relaxed, like a friend. Good for lifestyle brands and modern consumer apps.
- Empathetic & Patient — supportive, takes its time. Good for support, healthcare, and grief-sensitive contexts.
- Energetic & Confident — upbeat, assertive. Good for fitness, sales outreach, and motivational contexts.
- Formal & Precise — measured, authoritative. Good for legal, financial, and regulated industries.
Custom tone
If none of the presets fit, type your own in the "Or type your own…" field. Good custom tones are two or three adjectives: "Warm, efficient, and accommodating" or "Direct and confident, with a dry sense of humor."
Tips
- Don't overthink it. A preset that's 80% right beats a custom description that tries to be perfect.
- Mismatch tone and audience and the call will feel off. A legal agent with an "Energetic & Confident" tone will sound like an ambulance-chaser ad.
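In config form, a tone might be written as follows. This sketch assumes the `persona.toneOverride` key from the worked example holds the Voice tone field and accepts either a preset name or a custom string. The second assumption is a guess based on the key's name.

```yaml
persona:
  # A preset, named exactly as it appears in the preset list.
  toneOverride: "Professional & Direct"
  # Or a custom tone (assumed to go in the same key); use one or the other:
  # toneOverride: "Direct and confident, with a dry sense of humor."
```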
Voice
What it is: The TTS (text-to-speech) voice identity your agent uses — the actual audio on the call.
Why it matters: The voice is the part callers hear most. A clear, natural voice makes everything easier; a robotic or inconsistent one undermines trust no matter how good the script is.
Availability by tier
- Operator tier — Standard voice catalog. Good for reminders, FAQs, and high-volume simple calls.
- Specialist tier — Premium voice catalog. Higher naturalness, wider selection, multiple languages.
- Executive tier — Studio-grade real-time audio. Voice is bound to the selected real-time AI model; you don't pick a voice independently.
Deepgram Aura
If your TTS provider is Deepgram Aura, the voice is tied to the selected model. Change the model in Advanced Settings to switch voices.
ElevenLabs
If you've connected an ElevenLabs account, voices you've created in ElevenLabs show up in the selector. Use the "Refresh Voices" button next to the Voice field to pull newly added voices without waiting for the next automatic sync.
Tips
- Listen before you commit. The preview button plays a short sample of the selected voice — use it.
- Match voice to audience. A clinical, neutral voice works for support; a warmer voice works for booking.
- Voice consistency matters. Once callers hear your agent, they'll map that voice to your brand. Changing it later can feel jarring.
Speaking style
What it is: An optional hint passed to the TTS engine — accent, delivery style, pacing preference.
Why it matters: Most agents don't need this, which is why the field is collapsed by default. But when you do need it, it gives a fine-grained lever over voice delivery that goes beyond the tone preset.
When to use it
- Accent: "Neutral American", "British Professional", "Southern US with gentle pacing".
- Delivery style: "Theatrical delivery", "Measured, deliberate pacing", "Conversational with natural pauses".
- Context-specific: "Like a news anchor reading the weather", "Relaxed and reassuring, no medical jargon".
Tips
- Only fill this in if you have a specific delivery in mind. Leaving it empty uses the TTS default, which is usually fine.
- Not all TTS engines honor all hints. Treat this field as experimental: test each hint on a real call and iterate.
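A short sketch of a speaking-style hint in config form. It assumes the `voiceConfig.accent` key from the worked example is where this field is stored, which the example implies but does not state.

```yaml
voiceConfig:
  voiceId: sarah-v2   # voice ID taken from the worked example
  # Optional hint. Assumed: leaving this key out is equivalent to
  # leaving the field empty, which uses the TTS default.
  accent: "Measured, deliberate pacing"
```

After changing the hint, run a test call to confirm your TTS engine actually applies it.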
Putting it together: a worked example
A complete Identity block for a dental booking agent:
name: Maya
persona:
  audienceDescription: "Patients of all ages booking dental appointments. Some nervous, most non-technical."
  toneOverride: "Warm & Friendly"
voiceConfig:
  voiceId: sarah-v2
  language: en-US
  accent: "Relaxed and reassuring. No medical jargon."
language: en-US
voice: sarah-v2
Next steps
- Write Your callers first — it's the field with the biggest ripple effect on the rest of the config.
- Pick a tone preset and a voice, then run a test call to hear how they sound together.
- Iterate. The first voice you pick rarely stays. Listen to 10 real calls and refine.

