Voice AI Glossary · 2026

What Is a Large Language Model (LLM)?

A large language model (LLM) is an AI system trained on vast quantities of text to predict the most likely next word in a sequence. By doing this at enormous scale with sophisticated architectures, LLMs develop the ability to reason, summarise, translate, write, and hold extended conversations.

Try Lucy OS1 →

Definition in Full

LLMs are trained through a process called self-supervised learning: the model is shown raw text and learns to predict held-out words, with no human labelling required. Most modern LLMs learn by next-token prediction, repeatedly guessing the word that comes next; some earlier architectures instead mask words mid-sentence and predict the missing ones. After training on trillions of tokens of text (books, websites, code, conversations), the model develops an internal representation of language, knowledge, and reasoning. The 'large' in LLM refers to the number of parameters (the learned weights in the neural network), which range from millions to hundreds of billions.
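Next-token prediction can be sketched with a toy example. The "model" below is just a hand-written table of scores standing in for billions of learned parameters; the words and scores are purely illustrative.

```python
import math

# Toy stand-in for an LLM: a lookup table of logits (raw scores) for
# which token follows a given two-word context. A real LLM computes
# these logits with a neural network rather than a table.
TOY_LOGITS = {
    ("the", "cat"): {"sat": 2.0, "ran": 1.0, "banana": -3.0},
}

def next_token_probs(context):
    """Turn logits into a probability distribution via softmax."""
    logits = TOY_LOGITS[context]
    z = sum(math.exp(v) for v in logits.values())  # softmax normaliser
    return {tok: math.exp(v) / z for tok, v in logits.items()}

probs = next_token_probs(("the", "cat"))
print(max(probs, key=probs.get))  # "sat" is the most likely next token
```

Training adjusts the scores so that tokens which actually follow a context in the data get higher probability; generation simply repeats this predict-one-token step.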

How Lucy OS1 Uses a Large Language Model (LLM)

Lucy OS1 uses GPT-4o-mini as its reasoning engine — a frontier-quality LLM optimised for speed and conversational naturalness. It generates responses within 200-400ms in streaming mode, making voice conversation feel live rather than transactional.


Key Concepts

Context window

The amount of text an LLM can 'see' at once. Modern models have context windows of 128k-1M tokens — enough for hours of conversation without losing track of early context.

Temperature

A setting that controls how creative vs. deterministic the LLM's outputs are. Lower temperature = more predictable; higher = more varied and creative.
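Mechanically, temperature divides the model's logits before the softmax: low values sharpen the distribution toward the top token, high values flatten it. A minimal sketch:

```python
import math

def sample_probs(logits, temperature=1.0):
    # Low temperature sharpens the distribution (more deterministic);
    # high temperature flattens it (more varied and creative).
    scaled = [l / temperature for l in logits]
    z = sum(math.exp(s) for s in scaled)
    return [math.exp(s) / z for s in scaled]

logits = [2.0, 1.0, 0.0]                      # illustrative scores
cold = sample_probs(logits, temperature=0.2)  # top token dominates
hot = sample_probs(logits, temperature=2.0)   # probability spreads out
print(round(cold[0], 2), round(hot[0], 2))    # 0.99 0.51
```

At temperature 0.2 the best-scoring token takes nearly all the probability mass; at 2.0 the alternatives stay plausibly in play, which is what makes high-temperature output feel more varied.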

System prompt

Instructions given to the LLM before the user conversation begins. This is how Lucy OS1 instils Lucy's personality, memory context, and behavioural guidelines into every session.
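Structurally, a system prompt is just the first message in the request. The sketch below follows the common OpenAI-style chat format; the persona text and `memory_notes` parameter are illustrative, not Lucy OS1's actual prompt.

```python
def build_messages(user_turn, memory_notes):
    # Hypothetical persona text for illustration only.
    system_prompt = (
        "You are Lucy, a warm, concise voice assistant. "
        "Known about this user: " + "; ".join(memory_notes)
    )
    return [
        # The system message is set before the conversation begins and
        # shapes every response in the session.
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_turn},
    ]

msgs = build_messages("What's on my calendar?", ["prefers short answers"])
print(msgs[0]["role"])  # system
```

Because the system message is rebuilt per session, personality, memory context, and behavioural guidelines can all be injected without retraining the model.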

Token generation speed

Measured in tokens per second. Faster token generation directly reduces conversation latency. GPT-4o-mini generates tokens 3-5x faster than GPT-4 at comparable quality for conversational tasks.
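Back-of-envelope arithmetic shows why generation speed dominates perceived latency. The speeds below are hypothetical round numbers for illustration, not measured figures for any model:

```python
def response_latency_ms(n_tokens, tokens_per_sec, time_to_first_token_ms=200):
    """Total time to finish streaming a reply of n_tokens.

    In streaming mode the user hears output as soon as the first
    tokens arrive, so time-to-first-token matters most; total time
    still bounds how long the turn takes.
    """
    return time_to_first_token_ms + n_tokens / tokens_per_sec * 1000

# Hypothetical speeds: a 60-token spoken reply at 30 vs 120 tokens/sec.
slow = response_latency_ms(60, tokens_per_sec=30)
fast = response_latency_ms(60, tokens_per_sec=120)
print(round(slow), round(fast))  # 2200 700
```

A 4x speedup turns a two-second pause into well under a second, which is the difference between a conversation that feels live and one that feels transactional.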

Frequently Asked Questions

What is the difference between GPT-4 and GPT-4o-mini?

GPT-4o-mini is a smaller, faster version optimised for tasks where speed matters more than maximum reasoning depth. For natural conversation, GPT-4o-mini is often preferred over full GPT-4 because the latency difference is more noticeable than the quality difference.

Do LLMs actually understand, or just predict?

This is a genuine philosophical debate. LLMs produce outputs that behave as if they understand context, nuance, and intent. Whether this constitutes 'understanding' in a deep sense is contested — but for practical applications, the distinction matters less than the capability.

Can LLMs be wrong?

Yes. LLMs can hallucinate: confidently stating incorrect information. This is reduced (but not eliminated) by retrieval-augmented generation (RAG), which grounds the model in trusted or real-time data, and by attending to the model's own uncertainty signals.
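The RAG idea can be sketched in a few lines: retrieve the most relevant snippet, then prepend it so the model answers from supplied facts rather than memory. Toy keyword overlap stands in here for real embedding-based search, and the documents are made up for illustration.

```python
# Hypothetical knowledge snippets for the sketch.
DOCS = [
    "Lucy OS1 streams replies token by token.",
    "The context window caps how much text the model sees at once.",
]

def retrieve(query, docs):
    # Toy relevance score: count shared words. Real systems rank by
    # embedding similarity instead.
    q = set(query.lower().split())
    return max(docs, key=lambda d: len(q & set(d.lower().split())))

def grounded_prompt(query):
    # Prepend the retrieved snippet so the model answers from it.
    return f"Answer using this context:\n{retrieve(query, DOCS)}\n\nQ: {query}"

print("context window" in grounded_prompt("what is a context window?"))  # True
```

Grounding narrows the model's job from "recall the fact" to "read the fact and phrase an answer", which is a much easier task to get right.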

Why do different AI tools feel so different if they use the same model?

The system prompt, memory integration, retrieval pipeline, and post-processing all shape the experience significantly. Two products using GPT-4o-mini can feel completely different based on how they prompt and contextualise the model.

Related Terms

Conversational AI · AI Memory · Voice AI · Real-Time AI

Experience a Large Language Model (LLM) in Action

Lucy OS1 puts these concepts to work in a real, streaming voice AI pipeline — Deepgram STT, GPT-4o-mini, and Cartesia TTS delivering natural voice conversation.

Start talking to Lucy →
