Voice AI Glossary · 2026

What Is Speaker Diarization?

Speaker diarization is the process of automatically segmenting an audio recording to identify 'who spoke when.' It labels each speech segment with a speaker identifier, enabling transcripts that show which person said what.

Try Lucy OS1 →

Definition in Full

Diarization is valuable for meeting transcription, podcast notes, interview documentation, and any other setting with multiple speakers. A diarization system does not know speaker names by default; it assigns anonymous labels such as Speaker 1 and Speaker 2. Speaker identification, a related but distinct technology, matches voice segments to known voice profiles.
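
To make "which person said what" concrete, here is a minimal sketch of how diarization output might be combined with a timestamped transcript. It assumes two inputs that are not described in this glossary: word timings from a speech-to-text system and speaker segments from a diarizer. The `Segment`, `Word`, and `label_words` names are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    speaker: str   # anonymous label such as "Speaker 1"
    start: float   # seconds
    end: float     # seconds


@dataclass
class Word:
    text: str
    start: float
    end: float


def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the overlap between two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_words(words, segments):
    """Give each word the speaker whose segment overlaps it most,
    then merge consecutive words by the same speaker into turns."""
    turns = []
    for w in words:
        best = max(
            segments,
            key=lambda s: overlap(w.start, w.end, s.start, s.end),
            default=None,
        )
        if best is None or overlap(w.start, w.end, best.start, best.end) == 0:
            speaker = "Unknown"  # word falls outside every speaker segment
        else:
            speaker = best.speaker
        if turns and turns[-1][0] == speaker:
            turns[-1][1].append(w.text)
        else:
            turns.append((speaker, [w.text]))
    return [(speaker, " ".join(texts)) for speaker, texts in turns]


segments = [Segment("Speaker 1", 0.0, 2.0), Segment("Speaker 2", 2.0, 4.0)]
words = [Word("Hello", 0.1, 0.5), Word("there", 0.6, 1.0), Word("Hi", 2.2, 2.5)]
for speaker, text in label_words(words, segments):
    print(f"{speaker}: {text}")
```

Assigning by maximum overlap is one simple choice; production systems may handle words that straddle a speaker change differently.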

How Lucy OS1 Uses Speaker Diarization

Lucy OS1 does not use diarization, because it is designed as a one-on-one voice AI for a single user. Diarization is most useful in meeting recording tools. Lucy OS1 can summarise and process meeting notes you paste in, but it is not a meeting recorder itself.

Key Concepts

Speaker segmentation

Splitting audio into segments where each segment contains only one speaker. The foundation of diarization.
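
One simple way to picture segmentation is speaker change detection over short frames. The sketch below assumes an upstream model has already produced one embedding vector per frame (or `None` for non-speech); that model, the 0.5 s frame hop, and the 0.7 similarity threshold are all illustrative assumptions, not values from any particular system.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def segment(frames, hop_s=0.5, min_similarity=0.7):
    """Split frames into single-speaker spans.

    frames: list of embedding vectors, with None marking non-speech.
    A span ends at silence or where consecutive frames stop sounding alike.
    Returns a list of (start_seconds, end_seconds) tuples.
    """
    segments = []
    start = None   # index of the first frame in the open span
    prev = None    # embedding of the previous frame, if it was speech
    for i, emb in enumerate(list(frames) + [None]):  # sentinel flushes the tail
        boundary = emb is None or (
            prev is not None and cosine(prev, emb) < min_similarity
        )
        if start is not None and boundary:
            segments.append((start * hop_s, i * hop_s))
            start = None
        if emb is not None and start is None:
            start = i
        prev = emb
    return segments


frames = [[1.0, 0.0], [0.9, 0.1], None, [0.0, 1.0], [0.1, 0.9], [1.0, 0.0]]
print(segment(frames))
```

Note that this step only finds boundaries. Deciding which spans belong to the same person is the job of the clustering step.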

Speaker embedding clustering

Grouping segments by speaker based on voice similarity. Segments from the same speaker cluster together.
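
As a rough illustration of clustering by voice similarity, the sketch below greedily assigns each segment embedding to the most similar existing cluster, or opens a new cluster when nothing is similar enough. Real systems often use other methods (for example agglomerative or spectral clustering); the `cluster_segments` name, the toy 2-D embeddings, and the 0.8 threshold are assumptions for illustration only.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_segments(embeddings, threshold=0.8):
    """Greedy clustering: one pass, each embedding joins its closest cluster
    if the similarity is at least `threshold`, else starts a new cluster."""
    sums = []     # per-cluster sum of member embeddings (same direction as the mean)
    labels = []
    for emb in embeddings:
        best, best_sim = None, threshold
        for idx, centroid in enumerate(sums):
            sim = cosine(emb, centroid)
            if sim >= best_sim:
                best, best_sim = idx, sim
        if best is None:
            sums.append(list(emb))
            labels.append(len(sums) - 1)
        else:
            sums[best] = [x + y for x, y in zip(sums[best], emb)]
            labels.append(best)
    return [f"Speaker {i + 1}" for i in labels]


embeddings = [[1.0, 0.0], [0.0, 1.0], [0.95, 0.05], [0.1, 0.9]]
print(cluster_segments(embeddings))
```

The labels are relative: "Speaker 1" simply means the first voice encountered in this recording.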

Speaker count estimation

Automatically detecting how many unique speakers are in the audio — useful when the number of participants is not known in advance.
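
One common way to estimate the speaker count is to let the clustering decide: keep merging the two most similar groups until no pair is similar enough, then count what remains. The sketch below does this with a simple centroid-based agglomerative merge. The function name, the toy embeddings, and the 0.8 stopping threshold are illustrative assumptions; in practice the threshold would need tuning on real data.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def estimate_speaker_count(embeddings, merge_threshold=0.8):
    """Merge the most similar pair of clusters until the best remaining
    similarity falls below `merge_threshold`; return the cluster count."""
    clusters = [list(e) for e in embeddings]  # each cluster kept as a summed vector
    while len(clusters) > 1:
        (i, j), sim = max(
            (
                ((a, b), cosine(clusters[a], clusters[b]))
                for a, b in combinations(range(len(clusters)), 2)
            ),
            key=lambda pair: pair[1],
        )
        if sim < merge_threshold:
            break  # remaining clusters are all considered different speakers
        merged = [x + y for x, y in zip(clusters[i], clusters[j])]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return len(clusters)


embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.95], [0.7, -0.7]]
print(estimate_speaker_count(embeddings))
```

A lower threshold merges more aggressively and reports fewer speakers; a higher one risks splitting one person into several.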

Speaker identification

Matching speaker clusters to known voice profiles. Requires a pre-enrolled voice database. This differs from diarization, which only distinguishes speakers from one another within a recording.
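
The matching step can be sketched as a nearest-profile lookup. The example assumes each diarized cluster and each enrolled person is represented by a single embedding vector; the `identify` function, the names, and the 0.75 threshold are hypothetical and only illustrate the idea.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def identify(cluster_centroids, enrolled_profiles, min_similarity=0.75):
    """Map each anonymous cluster label to the best-matching enrolled name.
    Clusters with no sufficiently similar profile keep their anonymous label."""
    names = {}
    for label, centroid in cluster_centroids.items():
        best_name, best_sim = None, min_similarity
        for name, profile in enrolled_profiles.items():
            sim = cosine(centroid, profile)
            if sim >= best_sim:
                best_name, best_sim = name, sim
        names[label] = best_name if best_name is not None else label
    return names


clusters = {
    "Speaker 1": [0.9, 0.1],
    "Speaker 2": [0.0, 1.0],
    "Speaker 3": [-1.0, 0.0],   # a voice nobody enrolled
}
enrolled = {"Alice": [1.0, 0.0], "Bob": [0.0, 1.0]}
print(identify(clusters, enrolled))
```

Keeping the anonymous label as a fallback means an unenrolled guest still appears in the transcript, just without a name.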

Frequently Asked Questions

What tools offer speaker diarization?

Deepgram, AssemblyAI, and Amazon Transcribe (AWS) all offer diarization as an optional feature. Otter.ai and Fireflies.ai use diarization in their meeting transcription products.
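
Because diarization is optional in these services, it usually has to be switched on per request. The sketch below only builds request options and makes no network calls. The option names reflect our understanding of each vendor's public documentation at the time of writing and should be verified there; the audio URL and the speaker cap of 4 are placeholders.

```python
from urllib.parse import urlencode


def deepgram_listen_url(diarize=True):
    """Build a Deepgram pre-recorded transcription URL with diarization toggled.
    Assumes the `diarize` query parameter; check Deepgram's docs before use."""
    base = "https://api.deepgram.com/v1/listen"
    return f"{base}?{urlencode({'diarize': str(diarize).lower()})}"


# AssemblyAI: JSON body for a transcript request (assumed field: speaker_labels).
assemblyai_body = {
    "audio_url": "https://example.com/meeting.mp3",  # placeholder URL
    "speaker_labels": True,
}

# Amazon Transcribe: Settings for a transcription job
# (assumed keys: ShowSpeakerLabels, MaxSpeakerLabels).
transcribe_settings = {
    "ShowSpeakerLabels": True,
    "MaxSpeakerLabels": 4,  # placeholder upper bound on speakers
}

print(deepgram_listen_url())
```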

How accurate is AI speaker diarization?

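In clean audio with well-separated speakers, a diarization error rate (DER) of 5-10% is typical. DER is the amount of time that is missed, falsely detected as speech, or attributed to the wrong speaker, measured relative to the total speech time in the reference. Accuracy drops significantly with overlapping speech, similar voices, or poor audio quality.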
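
To show what a diarization error rate measures, here is a simplified frame-level sketch. It assumes the reference and the system output are already aligned frame by frame and that the system's labels have already been mapped to the reference speakers. Standard scoring tools also search for the best label mapping and may ignore a short collar around boundaries, neither of which is done here.

```python
def diarization_error_rate(reference, hypothesis):
    """Frame-level DER: (missed + false alarm + confusion) / reference speech.

    reference, hypothesis: equal-length lists of speaker labels per frame,
    with None marking non-speech.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")
    missed = false_alarm = confusion = speech = 0
    for ref, hyp in zip(reference, hypothesis):
        if ref is not None:
            speech += 1
            if hyp is None:
                missed += 1        # speech the system did not detect
            elif hyp != ref:
                confusion += 1     # speech given to the wrong speaker
        elif hyp is not None:
            false_alarm += 1       # system heard speech where there was none
    if speech == 0:
        raise ValueError("reference contains no speech")
    return (missed + false_alarm + confusion) / speech


reference = ["A"] * 10 + [None] * 2 + ["B"] * 8
hypothesis = ["A"] * 9 + [None, "A", None] + ["B"] * 7 + ["A"]
print(f"DER = {diarization_error_rate(reference, hypothesis):.1%}")
```

Because false alarms are counted against reference speech time, DER can exceed 100% on very poor output.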

Can diarization identify speakers by name?

Standard diarization assigns anonymous labels (Speaker 1, 2...). To identify speakers by name, you need speaker identification with enrolled voice profiles for each person.

Related Terms

Speech-to-Text (STT) · Speech Recognition · Voice AI

Experience Voice AI in Action

Lucy OS1 runs a real, streaming voice AI pipeline for one-on-one conversation: Deepgram STT, GPT-4o-mini, and Cartesia TTS delivering natural voice conversation.

Start talking to Lucy →