A token is the basic unit of text that large language models process. Rather than working with whole words, LLMs break text into tokens — roughly corresponding to word fragments, common words, or punctuation. One token is approximately 0.75 words in English.
The tokenization process (converting text to token IDs) is done by a tokenizer specific to each model family. Common words like 'the', 'is', and 'you' are single tokens. Longer or less common words are split into multiple tokens — 'programming' might be 'program' + 'ming'. This approach gives models efficiency across diverse vocabularies and languages.
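The subword splitting described above can be sketched with a toy greedy longest-match tokenizer. The vocabulary here is a tiny hypothetical one chosen for illustration; real tokenizers (such as BPE variants) learn vocabularies of tens of thousands of entries from data.

```python
# Toy greedy longest-match tokenizer over a tiny, made-up vocabulary.
# Illustrative only: real model tokenizers learn their vocabularies
# from data and map each piece to a numeric token ID.
VOCAB = {"the", "is", "you", "program", "ming", "token", "izer"}

def tokenize(text: str) -> list[str]:
    tokens = []
    for word in text.lower().split():
        i = 0
        while i < len(word):
            # Take the longest vocabulary entry starting at position i.
            for j in range(len(word), i, -1):
                if word[i:j] in VOCAB:
                    tokens.append(word[i:j])
                    i = j
                    break
            else:
                tokens.append(word[i])  # fall back to a single character
                i += 1
    return tokens

print(tokenize("the programming tokenizer"))
# ['the', 'program', 'ming', 'token', 'izer']
```

Note how 'programming' splits into 'program' + 'ming', exactly as described above, while common words stay whole.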
Lucy OS1 uses GPT-4o-mini, which processes at high token throughput — enabling real-time streaming responses. Understanding token economics also helps Lucy stay efficient: shorter, more targeted responses where appropriate reduce cost and latency without sacrificing quality.
Tokenization: the process of splitting text into tokens using a tokenizer. Different model families (GPT, Claude, Gemini) use different tokenizers with different vocabularies.
LLM APIs charge per token. Input tokens (what you send) and output tokens (what the model generates) are typically priced separately, with output tokens usually costing 3-5x more than input tokens.
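Separate input and output pricing makes per-request cost a simple weighted sum. A minimal sketch, using illustrative placeholder rates (not any provider's actual prices — check your provider's pricing page):

```python
# Rough per-request cost estimate. The default rates are illustrative
# placeholders, not real prices.
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float = 0.15,    # $ per 1M input tokens
                 output_price_per_m: float = 0.60) -> float:  # $ per 1M output tokens
    return (input_tokens * input_price_per_m +
            output_tokens * output_price_per_m) / 1_000_000

# e.g. 1,000 input tokens and 500 output tokens:
print(f"${request_cost(1000, 500):.6f}")  # $0.000450
```

Because output tokens carry the higher rate, trimming response length usually saves more than trimming the prompt by the same number of tokens.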
Tokens per second (TPS): how fast a model generates output tokens. Higher TPS = lower latency in streaming responses. GPT-4o-mini generates roughly 100-200 TPS; larger models generate 30-80 TPS.
Context windows are measured in tokens. A 128k token window = approximately 96,000 words or 380 pages of text.
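The token-to-words conversion above follows directly from the ~0.75 words-per-token rule of thumb. A quick sketch, assuming ~250 words per page (an assumption; page density varies, which is why the figure above rounds to roughly 380 pages):

```python
# Convert a token budget to approximate English words and pages.
# Both ratios are rules of thumb, not exact conversions.
WORDS_PER_TOKEN = 0.75   # ~0.75 English words per token
WORDS_PER_PAGE = 250     # assumed typical page density

def window_capacity(tokens: int) -> tuple[int, int]:
    words = int(tokens * WORDS_PER_TOKEN)
    pages = round(words / WORDS_PER_PAGE)
    return words, pages

print(window_capacity(128_000))  # (96000, 384)
```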
How many tokens is a typical conversation?
A single conversational turn (question + answer) is typically 100-500 tokens. An hour-long voice conversation might be 5,000-15,000 tokens total.
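Those per-turn figures make conversation budgeting straightforward multiplication. A back-of-envelope sketch using the 100-500 tokens-per-turn range stated above:

```python
# Estimate total tokens for a conversation from the per-turn range
# given in the text (100-500 tokens per question + answer).
TOKENS_PER_TURN = (100, 500)

def conversation_tokens(turns: int) -> tuple[int, int]:
    low, high = TOKENS_PER_TURN
    return turns * low, turns * high

print(conversation_tokens(20))  # (2000, 10000)
```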
Why are tokens used instead of words?
Tokenization handles the full range of language — including rare words, code, multiple languages, and emoji — without requiring every word to appear in a fixed vocabulary. Unfamiliar strings simply fall back to smaller subword pieces, so the tokenizer handles any input gracefully.
How does token count affect AI speed?
More input tokens = more to process before generating. More output tokens = longer to generate. Keeping prompts concise and responses appropriately brief improves latency.
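The two effects above can be combined into a simple latency model: prompt processing (prefill) plus token-by-token generation. The rates below are illustrative assumptions, not measured figures for any particular model.

```python
# Simple two-term latency model: prefill time for the prompt plus
# decode time for the response. Rates are illustrative assumptions.
def response_latency(input_tokens: int, output_tokens: int,
                     prefill_tps: float = 2000.0,  # assumed prompt-processing rate
                     decode_tps: float = 150.0) -> float:  # assumed generation rate
    return input_tokens / prefill_tps + output_tokens / decode_tps

# A long prompt with a long answer vs. a concise pair:
print(response_latency(4000, 600))  # 2.0 + 4.0 = 6.0 seconds
print(response_latency(500, 150))   # 0.25 + 1.0 = 1.25 seconds
```

Under this model, generation dominates for typical chat traffic, which is why keeping responses appropriately brief pays off most.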
Lucy OS1 puts these concepts to work in a real, streaming voice AI pipeline — Deepgram STT, GPT-4o-mini, and Cartesia TTS delivering natural voice conversation.
Start talking to Lucy →