Voice AI Glossary · 2026

What Is a Token in AI?

A token is the basic unit of text that large language models process. Rather than working with whole words, LLMs break text into tokens — roughly corresponding to word fragments, common words, or punctuation. One token is approximately 0.75 words in English.
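The 0.75 words-per-token figure is only a rule of thumb, but it makes quick capacity math easy. A minimal sketch in Python (the function name and ratio are illustrative, not part of any particular API):

```python
def estimate_tokens(text: str, words_per_token: float = 0.75) -> int:
    """Rough token estimate for English text via the ~0.75 words-per-token rule."""
    return round(len(text.split()) / words_per_token)

print(estimate_tokens("A token is the basic unit of text that language models process."))  # 16
```

Real counts depend on the model's tokenizer, so treat this as a planning estimate, not a billing-accurate number.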

Try Lucy OS1 →

Definition in Full

The tokenization process (converting text to token IDs) is done by a tokenizer specific to each model family. Common words like 'the', 'is', and 'you' are single tokens. Longer or less common words are split into multiple tokens — 'programming' might be 'program' + 'ming'. This approach gives models efficiency across diverse vocabularies and languages.
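To make the 'program' + 'ming' split concrete, here is a toy greedy longest-match tokenizer. The tiny hand-picked vocabulary is an assumption for illustration; production tokenizers (e.g. BPE-based ones) learn their subword vocabularies from data:

```python
# Toy vocabulary; real tokenizers learn tens of thousands of entries from data.
VOCAB = {"program", "ming", "the", "is", "you", "token", "s"}

def toy_tokenize(word: str) -> list[str]:
    """Greedy longest-match subword split against a fixed vocabulary."""
    tokens, i = [], 0
    while i < len(word):
        # Take the longest vocabulary entry that matches at position i.
        for j in range(len(word), i, -1):
            if word[i:j] in VOCAB:
                tokens.append(word[i:j])
                i = j
                break
        else:
            tokens.append(word[i])  # unknown character falls back to itself
            i += 1
    return tokens

print(toy_tokenize("programming"))  # ['program', 'ming']
```

The single-character fallback is why tokenizers never fail on unseen input: any string can always be decomposed into known pieces.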

How Lucy OS1 Uses Tokens

Lucy OS1 uses GPT-4o-mini, a model with high token throughput, enabling real-time streaming responses. Understanding token economics also helps Lucy stay efficient: shorter, more targeted responses where appropriate reduce cost and latency without sacrificing quality.

Try Lucy OS1 →

Key Concepts

Tokenization

The process of splitting text into tokens using a tokenizer. Different model families (GPT, Claude, Gemini) use different tokenizers with different vocabularies.

Token cost

LLM APIs charge per token. Input tokens (what you send) and output tokens (what the model generates) are typically priced separately. Output tokens cost 3-5x more than input tokens.
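Per-token billing is simple arithmetic once input and output are counted separately. A sketch with placeholder prices (the rates below are hypothetical; check your provider's current pricing page):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 in_price_per_m: float, out_price_per_m: float) -> float:
    """Dollar cost of one API call, given prices per million tokens."""
    return (input_tokens * in_price_per_m
            + output_tokens * out_price_per_m) / 1_000_000

# Hypothetical rates with a 4x output premium: $0.15/M input, $0.60/M output.
print(f"${request_cost(1200, 300, 0.15, 0.60):.6f}")  # $0.000360
```

Note that even a short reply can dominate the bill when the output rate carries a multiple of the input rate.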

Tokens per second (TPS)

How fast a model generates output tokens. Higher TPS = lower latency in streaming responses. GPT-4o-mini generates 100-200 TPS; larger models generate 30-80 TPS.
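Streaming time is just output length divided by generation rate. A minimal sketch (the TPS figures in the usage lines are examples, not benchmarks):

```python
def generation_time(output_tokens: int, tps: float) -> float:
    """Seconds to stream a full response at a given tokens-per-second rate."""
    return output_tokens / tps

# The same 150-token reply at a fast vs. a slower model:
print(generation_time(150, 150))  # 1.0 seconds
print(generation_time(150, 50))   # 3.0 seconds
```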

Context length in tokens

Context windows are measured in tokens. A 128k token window = approximately 96,000 words or 380 pages of text.
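The conversion above can be reproduced directly. Assuming 0.75 words per token and roughly 250 words per page (both assumptions, chosen to match the figures cited here):

```python
def context_capacity(context_tokens: int, words_per_token: float = 0.75,
                     words_per_page: int = 250) -> tuple[float, float]:
    """Convert a context window size in tokens to approximate words and pages."""
    words = context_tokens * words_per_token
    return words, words / words_per_page

words, pages = context_capacity(128_000)
print(f"{words:,.0f} words, about {pages:.0f} pages")  # 96,000 words, about 384 pages
```

The page count shifts with the words-per-page assumption, which is why estimates like "about 380 pages" are approximate.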

Frequently Asked Questions

How many tokens is a typical conversation?

A single conversational turn (question + answer) is typically 100-500 tokens. An hour-long voice conversation might be 5,000-15,000 tokens total.
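Under those per-turn figures, a whole-conversation estimate is a single multiplication. A sketch using a mid-range 300 tokens per turn (an assumption within the 100-500 range above):

```python
def conversation_tokens(turns: int, tokens_per_turn: int = 300) -> int:
    """Rough conversation total at an assumed mid-range tokens-per-turn figure."""
    return turns * tokens_per_turn

# Roughly 30 turns in an hour of voice conversation:
print(conversation_tokens(30))  # 9000
```

The result lands inside the 5,000-15,000 range cited for an hour-long session.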

Why are tokens used instead of words?

Tokenization handles the full range of language — including rare words, code, multiple languages, and emoji — without requiring every word to exist in a fixed vocabulary. Because any string can be broken into known subword pieces, the tokenizer handles arbitrary input gracefully.

How does token count affect AI speed?

More input tokens = more to process before generating. More output tokens = longer to generate. Keeping prompts concise and responses appropriately brief improves latency.
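Both effects can be captured in a simple two-phase latency model: read the prompt (prefill), then generate the reply token by token (decode). The rates below are illustrative assumptions, not measured figures for any specific model:

```python
def response_latency(input_tokens: int, output_tokens: int,
                     prefill_tps: float = 2000, decode_tps: float = 150) -> float:
    """Latency estimate: prompt processing time plus token-by-token generation."""
    return input_tokens / prefill_tps + output_tokens / decode_tps

# A 1,000-token prompt with a 150-token reply:
print(response_latency(1000, 150))  # 1.5 seconds
```

Because decoding is much slower per token than prefill, trimming the response usually buys more latency than trimming the prompt.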

Related Terms

Large Language Model (LLM) · Context Window · AI Latency · Real-Time AI

Experience Tokens in Action

Lucy OS1 puts these concepts to work in a real, streaming voice AI pipeline — Deepgram STT, GPT-4o-mini, and Cartesia TTS delivering natural voice conversation.

Start talking to Lucy →
