Custom Personas (Agents)
Agents are the reusable voice personas that power your calls. Each agent stores its identity, conversation structure, AI stack preferences, runtime behavior, and post-call intelligence settings.
Agent Structure
An agent is built from layered configuration:
```
┌──────────────────────────┐
│ System Prompt (raw)      │ ← Simple mode: just a string
├──────────────────────────┤
│ Persona                  │ ← Structured: role, tone, audience, voice
├──────────────────────────┤
│ Playbook                 │ ← Conversation flow: opener, scripts, CTAs
├──────────────────────────┤
│ Advanced                 │ ← Runtime controls: barge-in, silence, turns
├──────────────────────────┤
│ Features                 │ ← Toggles: recording, transcription
├──────────────────────────┤
│ AI Stack                 │ ← LLM, STT, TTS provider/model selection
├──────────────────────────┤
│ Post-Call Intelligence   │ ← Summary, extraction, evaluation config
└──────────────────────────┘
```
Simple Mode vs Structured Mode
Simple Mode
Pass a `system_prompt` string and Rymi uses it directly as the LLM context:
```json
{
  "name": "Alex - Support Agent",
  "system_prompt": "You are Alex, a friendly customer support agent for TechCorp..."
}
```
Structured Mode
Use `persona` and `playbook` objects for more control. Rymi's Prompt Compiler merges these into an optimized system prompt at call time.
```json
{
  "name": "Priya - Sales Specialist",
  "persona": {
    "name": "Priya",
    "role": "Insurance sales specialist",
    "toneOverride": "Warm and confident",
    "audienceDescription": "Small business owners in India",
    "companyName": "Acme Insurance",
    "successCriteria": ["Qualify the lead", "Book a follow-up call"],
    "voiceConfig": {
      "voiceId": "Aoede",
      "language": "en-US"
    },
    "callerPersonas": [
      { "type": "interested", "approach": "Mirror enthusiasm, move to qualification" },
      { "type": "skeptical", "approach": "Lead with social proof and case studies" }
    ]
  },
  "playbook": {
    "opener": "Hi, this is Priya from Acme Insurance. Is this a good time?",
    "qualificationFlow": [
      { "question": "How many employees does your company have?", "listensFor": "Company size" },
      { "question": "What's your current insurance provider?", "listensFor": "Current provider" }
    ],
    "objectionHandlers": [
      { "trigger": "too expensive", "response": "I understand cost is important. Our plans start at just..." }
    ],
    "closingCTA": "I'd love to set up a quick demo. Does Thursday work for you?",
    "fallbackCTA": "Can I send you some information to review at your convenience?"
  }
}
```
The Prompt Compiler output is stored as `compiled_prompt` on the agent and returned in `GET /agents/:id`.
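As a mental model only, the compilation step can be pictured as flattening persona and playbook fields into labeled prompt sections. This sketch is an illustration of the idea, not Rymi's actual Prompt Compiler:

```python
def compile_prompt(persona: dict, playbook: dict) -> str:
    """Illustrative sketch of persona/playbook -> system prompt merging.

    NOT Rymi's real Prompt Compiler; it only shows the concept of
    flattening structured config into prompt text.
    """
    sections = [
        f"You are {persona['name']}, a {persona['role']} at {persona['companyName']}."
    ]
    if tone := persona.get("toneOverride"):
        sections.append(f"Tone: {tone}.")
    if audience := persona.get("audienceDescription"):
        sections.append(f"Audience: {audience}.")
    if goals := persona.get("successCriteria"):
        sections.append("Success criteria: " + "; ".join(goals) + ".")
    if opener := playbook.get("opener"):
        sections.append(f'Open the call with: "{opener}"')
    for handler in playbook.get("objectionHandlers", []):
        sections.append(
            f'If the caller says something like "{handler["trigger"]}", '
            f'respond: "{handler["response"]}"'
        )
    if cta := playbook.get("closingCTA"):
        sections.append(f'Close with: "{cta}"')
    return "\n".join(sections)
```

Feeding the Priya example above through a merger like this would yield one flat system prompt covering identity, tone, objection handling, and the closing CTA.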
AI Stack Configuration
Each agent can be configured with specific LLM, STT, and TTS providers. The AI stack is organized by agent role:
| Role | Pipeline | Best For |
|---|---|---|
| `operator` | Separate STT → LLM → TTS | Cost-efficient, flexible provider mix |
| `specialist` | Separate STT → LLM → TTS | Higher-quality models with Google Gemini Pro TTS by default; ElevenLabs can be selected as a premium/custom voice override |
| `executive` | Bundled realtime (Gemini Live / OpenAI Realtime) | Lowest latency, end-to-end |

The Executive role is selected with the `executive` API value.
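The practical difference between roles is whether audio handling is split across providers or bundled into a realtime model. A sketch of that routing decision, inferred from the table above (not from Rymi's internals):

```python
# Which provider slots each agent role needs configured.
# Role names match the API values; the pipeline split is taken from the table.
PIPELINES = {
    "operator":   {"bundled": False, "components": ["stt", "llm", "tts"]},
    "specialist": {"bundled": False, "components": ["stt", "llm", "tts"]},
    "executive":  {"bundled": True,  "components": ["llm"]},  # realtime LLM handles STT/TTS
}

def required_components(agent_role: str) -> list[str]:
    """Return which provider slots a given agent role needs configured."""
    try:
        return PIPELINES[agent_role]["components"]
    except KeyError:
        raise ValueError(f"unknown agent_role: {agent_role}") from None
```

This is why the Executive example below sets only `llm_model`, while Operator agents also choose STT and TTS providers.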
Setting the AI Stack
```json
{
  "agent_role": "operator",
  "llm_model": "gemini-2.5-flash",
  "stt_provider": "google",
  "tts_provider": "google",
  "tts_model": "gemini-2.5-flash-preview-tts",
  "voice": "Aoede"
}
```
For the Executive role (`executive` API value), STT and TTS are handled by the realtime LLM itself:
```json
{
  "agent_role": "executive",
  "llm_model": "gemini-live"
}
```
Use `GET /v1/agents/llm-options` to fetch the catalog of available models and voices.
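Before saving an agent, you might sanity-check a stack selection against that catalog. The catalog shape below is an assumption for illustration only; check the actual `llm-options` response for real field names:

```python
# Hypothetical catalog shape -- the real llm-options response may differ.
CATALOG = {
    "llm_models": ["gemini-2.5-flash", "gemini-live"],
    "voices": ["Aoede"],
}

def validate_stack(stack: dict, catalog: dict) -> list[str]:
    """Collect human-readable problems with an AI-stack selection."""
    problems = []
    if stack.get("llm_model") not in catalog["llm_models"]:
        problems.append(f"unknown llm_model: {stack.get('llm_model')}")
    if "voice" in stack and stack["voice"] not in catalog["voices"]:
        problems.append(f"unknown voice: {stack['voice']}")
    return problems
```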
Provider Config (Advanced)
For fine-grained control, use `provider_config` to define the full provider routing configuration:
```json
{
  "provider_config": {
    "active_role": "operator",
    "tiers": {
      "operator": {
        "llm": { "primary": { "provider": "google", "model": "gemini-2.5-flash" } },
        "stt": { "primary": { "provider": "deepgram", "model": "nova-2-phonecall" } },
        "tts": { "primary": { "provider": "openai", "model": "tts-1" } }
      }
    }
  }
}
```
Runtime Controls
The `advanced` object tunes how the agent behaves during calls:
```json
{
  "advanced": {
    "bargeInEnabled": true,
    "maxTurnLength": 30,
    "postSilenceHangup": 15,
    "endpointingThreshold": 500
  }
}
```
| Control | Type | Description |
|---|---|---|
| `bargeInEnabled` | boolean | Allow the user to interrupt the agent mid-response |
| `maxTurnLength` | number | Maximum agent response duration in seconds |
| `postSilenceHangup` | number | End the call after this many seconds of user silence |
| `endpointingThreshold` | number | Silence duration (ms) before treating speech as complete |
> **TIP:** These controls are enforced at runtime by the gateway; they override any contradictory instructions in the system prompt.
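To make the two silence controls concrete, here is a sketch of how a gateway might react to a stretch of continuous user silence. Note the unit mismatch the config implies: `postSilenceHangup` is in seconds while `endpointingThreshold` is in milliseconds. This is an illustration, not Rymi's actual gateway logic:

```python
def silence_decision(silence_ms: int, advanced: dict) -> str:
    """Decide what to do after `silence_ms` of continuous user silence.

    Illustrative only -- the real enforcement logic is internal to Rymi.
    """
    hangup_after_ms = advanced["postSilenceHangup"] * 1000   # config value is seconds
    endpoint_after_ms = advanced["endpointingThreshold"]     # config value is ms
    if silence_ms >= hangup_after_ms:
        return "hangup"        # user has gone quiet for too long
    if silence_ms >= endpoint_after_ms:
        return "end_of_turn"   # treat the user's speech as complete
    return "wait"              # keep listening
```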
Feature Flags
Toggle capabilities per agent:
```json
{
  "features": {
    "recording": { "enabled": true },
    "transcription_enabled": true
  }
}
```
| Feature | Effect When Disabled |
|---|---|
| `recording` | No LiveKit Egress recording is started |
| `transcription_enabled` | No transcript persistence, no transcript data packets, no post-call transcript analysis |
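These flags effectively gate downstream pipeline steps. A sketch of that gating, derived from the "Effect When Disabled" column (the step names themselves are illustrative):

```python
def enabled_steps(features: dict) -> list[str]:
    """Which call-pipeline steps run, given the agent's feature flags.

    Step names are hypothetical labels for the effects listed in the docs.
    """
    steps = []
    if features.get("recording", {}).get("enabled"):
        steps.append("start_egress_recording")
    if features.get("transcription_enabled"):
        steps += ["persist_transcript", "emit_transcript_packets", "analyze_transcript"]
    return steps
```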
Post-Call Intelligence
Configure what analysis runs after each call ends. See the Post-Call Intelligence guide for full details.
```json
{
  "post_call": {
    "recording": { "enabled": true },
    "summary": { "enabled": true },
    "structured_extraction": {
      "json_schema": {
        "type": "object",
        "properties": {
          "appointment_booked": { "type": "boolean" },
          "follow_up_date": { "type": "string" }
        }
      }
    },
    "evaluation": {
      "rubric": "Did the agent successfully qualify the lead and book a follow-up?"
    }
  }
}
```
Auto-Generation
Describe your ideal agent in plain English and let Rymi generate the full persona/playbook bundle:
```bash
curl -X POST https://api.rymi.live/v1/agents/generate \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "A friendly female sales agent who speaks English with an American accent and sells insurance plans",
    "options": { "llm_provider": "gemini", "voice": "Aoede" }
  }'
```
The response includes a `draft` object and a `compiled_prompt_preview` you can review before creating the agent.
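Since generation returns a draft rather than a live agent, a typical client flow is generate → review the preview → create. A sketch of the review step (the `draft` and `compiled_prompt_preview` fields come from the response described above; the emptiness guard is just an example check, not an API requirement):

```python
def review_draft(generate_response: dict) -> dict:
    """Return the draft agent config if its compiled prompt preview looks usable.

    Example guard only -- substitute review criteria that fit your use case.
    """
    preview = generate_response.get("compiled_prompt_preview", "")
    if not preview.strip():
        raise ValueError("empty compiled prompt preview; regenerate or edit the draft")
    return generate_response["draft"]
```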

