How does a Large Language Model work?

Large Language Models (LLMs) are a type of generative AI model that generates text. Here is a five-step overview of how they "think".

[Diagram: how an LLM processes text]
1. Tokenization

Input text is tokenized — broken down into units (words or sub-words), each represented by a numeric ID.
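A minimal sketch of this step, using an invented word-level vocabulary (real LLMs use learned sub-word schemes such as BPE with vocabularies of tens of thousands of entries):

```python
# Toy word-level tokenizer. The vocabulary below is invented for
# illustration; production tokenizers learn sub-word units from data.
VOCAB = {"the": 0, "cat": 1, "sat": 2, "on": 3, "mat": 4}

def tokenize(text: str) -> list[int]:
    """Split on whitespace and map each word to its numeric ID."""
    return [VOCAB[word] for word in text.lower().split()]

print(tokenize("The cat sat on the mat"))  # [0, 1, 2, 3, 0, 4]
```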

2. Embedding

The tokens are embedded — converted into multi-dimensional lists of numbers (vectors) that represent their semantic meaning.
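Conceptually, embedding is a table lookup from token ID to vector. The 3-dimensional values below are invented; real models learn these vectors during training and use hundreds or thousands of dimensions:

```python
# Toy embedding table: each token ID maps to a small vector.
# Values are made up for illustration; real embeddings are learned.
EMBEDDINGS = {
    0: [0.1, -0.3, 0.7],   # "the"
    1: [0.9, 0.2, -0.1],   # "cat"
    2: [-0.4, 0.8, 0.3],   # "sat"
}

def embed(token_ids: list[int]) -> list[list[float]]:
    """Look up the vector for each token ID."""
    return [EMBEDDINGS[t] for t in token_ids]

print(embed([0, 1]))  # [[0.1, -0.3, 0.7], [0.9, 0.2, -0.1]]
```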

3. Prediction

Based on patterns learned during training, the model predicts the most likely next token given the input context.
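Under the hood, the model's forward pass produces one raw score (a logit) per vocabulary entry, and a softmax turns those scores into probabilities. A sketch with invented logits:

```python
import math

def softmax(logits: list[float]) -> list[float]:
    """Convert raw scores into probabilities that sum to 1."""
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Invented scores for vocabulary entries 0, 1, 2.
logits = [2.0, 0.5, -1.0]
probs = softmax(logits)

# Greedy decoding picks the highest-probability token; real systems
# often sample instead (temperature, top-k, top-p).
next_token = max(range(len(probs)), key=probs.__getitem__)
print(next_token)  # 0
```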

4. The "Stop" Signal

The model continues generating one token at a time, feeding each new token back into the context, until it predicts a special "stop token" indicating the response is complete.
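The generation loop above can be sketched as follows. Here `predict_next` is a stand-in for a real model's forward pass, scripted with invented outputs so the example is self-contained:

```python
STOP = 5  # invented ID for the special stop token

def predict_next(context: tuple[int, ...]) -> int:
    """Hypothetical model call: a fixed script for illustration."""
    script = {(): 1, (1,): 2, (1, 2): STOP}
    return script[context]

generated: list[int] = []
while True:
    token = predict_next(tuple(generated))
    if token == STOP:
        break  # stop token predicted: the response is complete
    generated.append(token)  # otherwise, append and predict again

print(generated)  # [1, 2]
```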

5. Detokenization

The generated numeric IDs are converted back into human-readable text.
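Detokenization is the inverse of step 1: invert the vocabulary and join the recovered units back into text. Continuing with the same invented word-level vocabulary:

```python
# Invert the toy vocabulary from the tokenization sketch.
VOCAB = {"the": 0, "cat": 1, "sat": 2, "on": 3, "mat": 4}
ID_TO_WORD = {i: w for w, i in VOCAB.items()}

def detokenize(token_ids: list[int]) -> str:
    """Map each ID back to its word and join into a string."""
    return " ".join(ID_TO_WORD[t] for t in token_ids)

print(detokenize([0, 1, 2, 3, 0, 4]))  # "the cat sat on the mat"
```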

Important Note

This is an accurate but simplified explanation. In step 3, modern transformers use attention mechanisms to weigh the relevance of every token in the context when predicting the next one.