AI Voice Guidelines
Designing best practices for persona-aligned AI voice
AI voices often work functionally, but they rarely feel right. At Piramal, AI-powered voice experiences lacked emotional depth and a human touch. There was no existing research or framework to answer a fundamental question.
How should an AI sound when its role, intent, or persona changes?
As a result, voices felt robotic, generic, and disconnected from user expectations. This project set out to change that by treating voice as a designed system, not just an output.
Why This Was Needed
As AI rapidly evolved, most efforts focused on capability — not communication.
Voice decisions were made intuitively, often through trial and error, which made them difficult to justify, replicate, or scale.
Without a structured approach, “human-sounding AI” remained subjective and unrepeatable.
Problem Statement
Instead of asking “What should an AI voice sound like?”, I reframed the problem to:
How can we systematically define an AI voice that aligns with a persona’s role and responsibilities while still feeling human?


Research & Discovery
How Do We Define a ‘Human’ Voice?
Before defining AI voices, I needed to understand how real humans communicate.
To do this, I analysed 5–6 outbound call recordings, focusing not just on what was said, but on how it was said.


These transcripts captured more than words — they revealed emotional cues that shape subconscious user responses.
Identified Human Voice Traits
From Conversations to Voice Traits
By closely observing agent behaviour across these conversations, I identified 72 distinct voice traits.

This gave us confidence to design the experience around real conversational data, not assumptions.
Why Traits Alone Were Not Enough
While these traits described how humans speak, they didn't yet answer which traits mattered for a given persona, or how strongly each should be expressed.
In other words, the traits lacked persona context.
Defining Trait Levers Based on Persona Needs
To bridge this gap, I introduced voice levers (settings) for each voice trait: High, Medium, or Low.

These levers defined when and how strongly a trait should be expressed, depending on:
Persona role
Responsibility

This helped translate abstract traits into practical voice decisions.
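As a minimal sketch, a lever-annotated trait can be pictured as a simple data structure. The trait names and the persona below are illustrative only, not the actual 72 traits from the research:

```python
from dataclasses import dataclass
from enum import Enum

class Lever(Enum):
    """Intensity setting for how strongly a voice trait is expressed."""
    LOW = "Low"
    MEDIUM = "Medium"
    HIGH = "High"

@dataclass
class VoiceTrait:
    name: str     # e.g. "warmth", "assertiveness" (hypothetical names)
    lever: Lever  # how strongly the trait should come through

# Hypothetical persona: a customer-support voice agent.
persona_voice = [
    VoiceTrait("warmth", Lever.HIGH),
    VoiceTrait("assertiveness", Lever.MEDIUM),
    VoiceTrait("formality", Lever.LOW),
]

for trait in persona_voice:
    print(f"{trait.name}: {trait.lever.value}")
```

Keeping the lever as an explicit field, rather than baking it into prose guidelines, is what later makes the mapping automatable.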
The Remaining Problem: Manual Effort
Even with traits and levers defined, one challenge remained: every new persona still required manually mapping traits to the right intensity levels.

This made the process hard to scale and inconsistent over time.
Automating Persona-to-Voice Mapping
To remove manual decision-making, I built a custom GPT trained on the voice settings for each trait.

Now, when a persona’s role and responsibilities are defined, the system automatically generates a voice blueprint — a structured set of traits with recommended intensity levels.
This turned voice design from a manual exercise into a repeatable system.
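The shape of that persona-to-blueprint step can be sketched with a simplified, rule-based stand-in for the custom GPT. The keyword rules, trait names, and levels here are assumptions for illustration; the actual system infers them from the trained trait settings:

```python
def generate_voice_blueprint(role: str, responsibilities: list[str]) -> dict[str, str]:
    """Map a persona's role and responsibilities to recommended trait intensities.

    A toy, keyword-based stand-in for the custom GPT: real inference is
    far richer, but the input/output contract is the same.
    """
    text = (role + " " + " ".join(responsibilities)).lower()
    # Default blueprint (illustrative traits and levels).
    blueprint = {"warmth": "Medium", "clarity": "High", "pace": "Medium"}
    if "support" in text or "care" in text:
        blueprint["warmth"] = "High"  # caring roles lean warmer
    if "collections" in text or "compliance" in text:
        blueprint["assertiveness"] = "High"  # firm but professional
    return blueprint

blueprint = generate_voice_blueprint(
    "Customer support agent",
    ["resolve loan queries", "guide users with care"],
)
print(blueprint)
```

The key design choice is the structured output: a trait-to-intensity mapping that downstream tools can consume directly, rather than free-form prose guidelines.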
Was the Voice Actually Working?
Using ElevenLabs, I created a voice agent to test the GPT-generated voice blueprints and evaluate how the defined traits translated into real audio output.
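Translating a blueprint into synthesis parameters might look like the sketch below. The parameter names (stability, similarity_boost, style) follow the ElevenLabs voice-settings schema, but the lever-to-value mapping is an assumption for illustration, not the tuning used in the project:

```python
# Illustrative lever-to-value scale (assumed, not project tuning).
LEVER_TO_VALUE = {"High": 0.8, "Medium": 0.5, "Low": 0.3}

def to_voice_settings(blueprint: dict[str, str]) -> dict[str, float]:
    """Convert a trait blueprint into ElevenLabs-style voice settings."""
    # Assumed heuristic: formal personas get higher stability,
    # warmer personas get more style exaggeration.
    composure = blueprint.get("formality", "Medium")
    expressiveness = blueprint.get("warmth", "Medium")
    return {
        "stability": LEVER_TO_VALUE[composure],
        "similarity_boost": 0.75,
        "style": LEVER_TO_VALUE[expressiveness],
    }

settings = to_voice_settings({"warmth": "High", "formality": "Low"})
print(settings)  # {'stability': 0.3, 'similarity_boost': 0.75, 'style': 0.8}
```

Hearing these settings rendered as actual audio is what closed the loop between abstract traits and how the voice really sounded.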

Final Outcome
Through 3–4 iterative refinements, I arrived at an AI voice that felt human, expressive, and aligned with the intended persona.

Real World Impact
This project established a best-practice system for AI voice design at Piramal.

By simply defining persona responsibilities, teams can now generate, test, and refine AI voices with clarity and confidence.

