Custom Personas (Agents)
Agents are the reusable voice personas that power your calls. Each agent stores its identity, conversation structure, AI stack preferences, runtime behavior, and post-call intelligence settings.
Agent Structure
An agent is built from layered configuration:
┌──────────────────────────┐
│ System Prompt (raw)      │ ← Simple mode: just a string
├──────────────────────────┤
│ Persona                  │ ← Structured: role, tone, audience, voice
├──────────────────────────┤
│ Playbook                 │ ← Conversation flow: opener, scripts, CTAs
├──────────────────────────┤
│ Settings                 │ ← Runtime controls: barge-in, silence, turns
├──────────────────────────┤
│ Features                 │ ← Toggles: recording, transcription
├──────────────────────────┤
│ AI Stack                 │ ← LLM, STT, TTS provider/model selection
├──────────────────────────┤
│ Post-Call Intelligence   │ ← Summary, extraction, evaluation config
└──────────────────────────┘
Simple Mode vs Structured Mode
Simple Mode
Pass a system_prompt string and Rymi uses it directly as the LLM context:
{
"name": "Alex - Support Agent",
"system_prompt": "You are Alex, a friendly customer support agent for TechCorp..."
}
Structured Mode
Use persona and playbook objects for more control. Rymi's Prompt Compiler merges these into an optimized system prompt at call time.
{
"name": "Priya - Sales Specialist",
"persona": {
"name": "Priya",
"role": "Insurance sales specialist",
"toneOverride": "Warm and confident",
"audienceDescription": "Small business owners in India",
"companyName": "Acme Insurance",
"successCriteria": ["Qualify the lead", "Book a follow-up call"],
"voiceConfig": {
"voiceId": "Aoede",
"language": "en-US"
},
"callerPersonas": [
{ "type": "interested", "approach": "Mirror enthusiasm, move to qualification" },
{ "type": "skeptical", "approach": "Lead with social proof and case studies" }
]
},
"playbook": {
"opener": "Hi, this is Priya from Acme Insurance. Is this a good time?",
"qualificationFlow": [
{ "question": "How many employees does your company have?", "listensFor": "Company size" },
{ "question": "What's your current insurance provider?", "listensFor": "Current provider" }
],
"objectionHandlers": [
{ "trigger": "too expensive", "response": "I understand cost is important. Our plans start at just..." }
],
"closingCTA": "I'd love to set up a quick demo. Does Thursday work for you?",
"fallbackCTA": "Can I send you some information to review at your convenience?"
}
}
Rymi compiles the agent instructions from the persona and playbook fields and returns the compiled prompt with the agent.
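To make the idea of compiling concrete, here is a minimal Python sketch of how persona and playbook fields could be flattened into one prompt string. It is not Rymi's Prompt Compiler, whose real output format is not described in this guide; the function name `compile_prompt_sketch` and the output layout are assumptions made purely for illustration.

```python
def compile_prompt_sketch(persona: dict, playbook: dict) -> str:
    """Illustrative only: flatten persona and playbook fields into one prompt.

    Hypothetical helper, not part of the Rymi API.
    """
    lines = [f"You are {persona['name']}. Role: {persona['role']}."]
    if "companyName" in persona:
        lines.append(f"Company: {persona['companyName']}.")
    if "toneOverride" in persona:
        lines.append(f"Tone: {persona['toneOverride']}.")
    for goal in persona.get("successCriteria", []):
        lines.append(f"Goal: {goal}")
    if "opener" in playbook:
        lines.append(f"Open with: {playbook['opener']}")
    for step in playbook.get("qualificationFlow", []):
        lines.append(f"Ask: {step['question']} (listen for: {step['listensFor']})")
    for handler in playbook.get("objectionHandlers", []):
        lines.append(f"On '{handler['trigger']}': {handler['response']}")
    if "closingCTA" in playbook:
        lines.append(f"Close with: {playbook['closingCTA']}")
    return "\n".join(lines)


# A trimmed version of the structured-mode example above.
persona = {
    "name": "Priya",
    "role": "Insurance sales specialist",
    "companyName": "Acme Insurance",
    "toneOverride": "Warm and confident",
    "successCriteria": ["Qualify the lead", "Book a follow-up call"],
}
playbook = {
    "opener": "Hi, this is Priya from Acme Insurance. Is this a good time?",
    "qualificationFlow": [
        {"question": "How many employees does your company have?", "listensFor": "Company size"},
    ],
    "closingCTA": "I'd love to set up a quick demo. Does Thursday work for you?",
}

prompt = compile_prompt_sketch(persona, playbook)
print(prompt)
```

The point of the sketch is only that structured fields are kept separate in the agent definition and merged into text later, which is why individual fields can be edited without rewriting a whole prompt.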
AI Stack Configuration
Each agent can be configured with specific LLM, STT, and TTS providers. The AI stack is organized by agent role:
The Executive role uses the executive API value.
| Role | Pipeline | Best For |
|---|---|---|
| operator | Separate STT → LLM → TTS | Cost-efficient, flexible provider mix |
| specialist | Separate STT → LLM → TTS | Higher-quality models with Google Gemini Pro TTS by default; ElevenLabs can be selected as a premium/custom voice override |
| executive | Bundled realtime (Gemini Live / OpenAI Realtime) | Lowest latency, end-to-end |
Setting the AI Stack
{
"agent_role": "operator",
"llm_model": "gemini-2.5-flash",
"stt_provider": "google",
"tts_provider": "google",
"tts_model": "gemini-2.5-flash-preview-tts",
"voice": "Aoede"
}
For the Executive role (executive API value), STT and TTS are handled by the realtime LLM itself:
{
"agent_role": "executive",
"llm_model": "gemini-live"
}
Use GET /v1/agents/llm-options to fetch the catalog of available models and voices.
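The two payload shapes above can be produced by one small client-side helper. The sketch below is hypothetical: `build_ai_stack` is not a Rymi SDK function, and rejecting STT/TTS settings for the executive role on the client is an assumption, since this guide does not say how the API treats such fields.

```python
REALTIME_ROLES = {"executive"}                # bundled realtime pipeline
PIPELINE_ROLES = {"operator", "specialist"}   # separate STT -> LLM -> TTS


def build_ai_stack(agent_role, llm_model, stt_provider=None,
                   tts_provider=None, tts_model=None, voice=None):
    """Hypothetical helper: assemble the AI stack fields for an agent payload."""
    if agent_role not in REALTIME_ROLES | PIPELINE_ROLES:
        raise ValueError(f"unknown agent_role: {agent_role!r}")

    payload = {"agent_role": agent_role, "llm_model": llm_model}
    speech = {
        "stt_provider": stt_provider,
        "tts_provider": tts_provider,
        "tts_model": tts_model,
    }

    if agent_role in REALTIME_ROLES:
        # The realtime LLM handles STT and TTS itself, so separate speech
        # settings are treated as a mistake here (client-side assumption).
        if any(value is not None for value in speech.values()):
            raise ValueError("executive agents do not take separate STT/TTS settings")
    else:
        payload.update({k: v for k, v in speech.items() if v is not None})

    if voice is not None:
        payload["voice"] = voice
    return payload


operator_stack = build_ai_stack(
    "operator", "gemini-2.5-flash",
    stt_provider="google", tts_provider="google",
    tts_model="gemini-2.5-flash-preview-tts", voice="Aoede",
)
executive_stack = build_ai_stack("executive", "gemini-live")
print(operator_stack)
print(executive_stack)
```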
Language Routing
Rymi derives the provider route from the agent role, primary language, supported languages, and provider capabilities. Set language to a locale such as en-US or hi-IN, and list every language the agent may run in under supported_languages. Before the agent is saved, Rymi resolves the role-safe STT, LLM, and TTS stack for each selected language. Automatic language detection is not part of the default MVP behavior.
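As a rough illustration, the language fields could be assembled and sanity-checked on the client before the agent is submitted. Several things here are assumptions rather than documented behavior: that supported_languages is a list of locale strings, that the primary language should also appear in it, and that locales follow the `ll-CC` shape of the examples. The helper name `language_fields` is hypothetical.

```python
import re

# Matches locales shaped like the examples in this guide (en-US, hi-IN).
LOCALE_RE = re.compile(r"[a-z]{2,3}-[A-Z]{2}")


def language_fields(language, supported_languages=()):
    """Hypothetical helper: build the language portion of an agent payload."""
    ordered = []
    # Put the primary language first, then the rest, dropping duplicates.
    for locale in [language, *supported_languages]:
        if not LOCALE_RE.fullmatch(locale):
            raise ValueError(f"expected a locale like en-US, got {locale!r}")
        if locale not in ordered:
            ordered.append(locale)
    return {"language": language, "supported_languages": ordered}


fields = language_fields("en-US", ["hi-IN", "en-US"])
print(fields)
```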
Runtime Controls
The advanced object tunes how the agent behaves during calls:
{
"advanced": {
"bargeInEnabled": true,
"maxTurnLength": 30,
"postSilenceHangup": 15,
"endpointingThreshold": 500
}
}
| Control | Type | Description |
|---|---|---|
| bargeInEnabled | boolean | Allow user to interrupt the agent mid-response |
| maxTurnLength | number | Maximum agent response duration in seconds |
| postSilenceHangup | number | End call after this many seconds of user silence |
| endpointingThreshold | number | Silence duration (ms) before treating speech as complete |
TIP
These controls are enforced at runtime by the gateway — they override any contradictory instructions in the system prompt.
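Because the controls mix units (seconds for maxTurnLength and postSilenceHangup, milliseconds for endpointingThreshold), a client-side check can catch unit mix-ups early. The following is a sketch only: `validate_advanced` is a hypothetical helper, and the rules that values must be positive and that the endpointing threshold should be shorter than the hangup timeout are assumptions, not documented API constraints.

```python
NUMERIC_KEYS = ("maxTurnLength", "postSilenceHangup", "endpointingThreshold")
KNOWN_KEYS = {"bargeInEnabled", *NUMERIC_KEYS}


def validate_advanced(advanced: dict) -> dict:
    """Hypothetical pre-flight check for the `advanced` object."""
    unknown = set(advanced) - KNOWN_KEYS
    if unknown:
        raise ValueError(f"unknown controls: {sorted(unknown)}")

    if "bargeInEnabled" in advanced and not isinstance(advanced["bargeInEnabled"], bool):
        raise ValueError("bargeInEnabled must be a boolean")

    for key in NUMERIC_KEYS:
        if key in advanced:
            value = advanced[key]
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)) or value <= 0:
                raise ValueError(f"{key} must be a positive number")

    # endpointingThreshold is in milliseconds, postSilenceHangup in seconds.
    if "endpointingThreshold" in advanced and "postSilenceHangup" in advanced:
        if advanced["endpointingThreshold"] / 1000 >= advanced["postSilenceHangup"]:
            raise ValueError("endpointingThreshold (ms) should be shorter than postSilenceHangup (s)")
    return advanced


advanced = validate_advanced({
    "bargeInEnabled": True,
    "maxTurnLength": 30,
    "postSilenceHangup": 15,
    "endpointingThreshold": 500,
})
print(advanced)
```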
Feature Flags
Toggle capabilities per agent:
{
"features": {
"recording": { "enabled": true },
"transcription_enabled": true
}
}
| Feature | Effect When Disabled |
|---|---|
| recording | No call recording is started |
| transcription_enabled | No transcript persistence, no transcript data packets, no post-call transcript analysis |
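Note that the two flags have different shapes: recording is an object with an enabled field, while transcription_enabled is a bare boolean. The sketch below reads both shapes and lists the effects from the table above for whatever is explicitly switched off. `disabled_effects` is a hypothetical helper, and because default values are not stated in this guide, missing keys are deliberately left unreported.

```python
DISABLED_EFFECTS = {
    "recording": ["No call recording is started"],
    "transcription_enabled": [
        "No transcript persistence",
        "No transcript data packets",
        "No post-call transcript analysis",
    ],
}


def disabled_effects(features: dict) -> list[str]:
    """Hypothetical helper: list the consequences of explicitly disabled flags."""
    effects = []
    recording = features.get("recording")
    # recording is an object: {"enabled": bool}
    if isinstance(recording, dict) and recording.get("enabled") is False:
        effects += DISABLED_EFFECTS["recording"]
    # transcription_enabled is a plain boolean
    if features.get("transcription_enabled") is False:
        effects += DISABLED_EFFECTS["transcription_enabled"]
    return effects


effects = disabled_effects({
    "recording": {"enabled": True},
    "transcription_enabled": False,
})
print(effects)
```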
Post-Call Intelligence
Configure what analysis runs after each call ends. See the Post-Call Intelligence guide for full details.
{
"post_call": {
"recording": { "enabled": true },
"summary": { "enabled": true },
"structured_extraction": {
"json_schema": {
"type": "object",
"properties": {
"appointment_booked": { "type": "boolean" },
"follow_up_date": { "type": "string" }
}
}
},
"evaluation": {
"rubric": "Did the agent successfully qualify the lead and book a follow-up?"
}
}
}
Auto-Generation
Describe your ideal agent in plain English and let Rymi generate the full persona/playbook bundle:
curl -X POST https://api.rymi.live/v1/agents/generate \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"prompt": "A friendly female sales agent who speaks English with an American accent and sells insurance plans",
"options": { "llm_provider": "gemini", "voice": "Aoede" }
}'
The response includes a draft object and a compiled_prompt_preview you can review before creating the agent.
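The same request can be built from Python with only the standard library. This is a sketch that mirrors the curl call above; `build_generate_request` is a hypothetical helper, and the commented-out response handling assumes draft and compiled_prompt_preview are top-level keys of the JSON response, which this guide does not spell out.

```python
import json
import urllib.request

API_BASE = "https://api.rymi.live"


def build_generate_request(api_key: str, prompt: str, options: dict) -> urllib.request.Request:
    """Hypothetical helper: build the POST /v1/agents/generate request."""
    body = json.dumps({"prompt": prompt, "options": options}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/v1/agents/generate",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_generate_request(
    "YOUR_API_KEY",
    "A friendly female sales agent who speaks English with an American accent "
    "and sells insurance plans",
    {"llm_provider": "gemini", "voice": "Aoede"},
)

# Sending is left commented out so the sketch runs without network access:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
#     draft = result["draft"]                        # assumed top-level key
#     preview = result["compiled_prompt_preview"]    # assumed top-level key
```

Keeping request construction separate from sending makes it easy to inspect the payload before spending an API call on generation.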

