How does a Large Language Model work?

Large Language Models (LLMs) are a type of generative AI model that generates text. Here is a five-step overview of how they "think".

[Diagram: how an LLM processes text]
1. Tokenization

Input text is tokenized — broken down into units (words or sub-words), each represented by a numeric ID.
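A minimal sketch of this step, using an invented word-level vocabulary (real LLMs use learned sub-word schemes such as BPE with vocabularies of tens of thousands of entries):

```python
# Toy word-level tokenizer. The vocabulary below is invented for
# illustration; production tokenizers learn sub-word units from data.
VOCAB = {"the": 0, "cat": 1, "sat": 2, "on": 3, "mat": 4}

def tokenize(text: str) -> list[int]:
    """Split on whitespace and map each word to its numeric ID."""
    return [VOCAB[word] for word in text.lower().split()]

print(tokenize("The cat sat on the mat"))  # [0, 1, 2, 3, 0, 4]
```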

2. Embedding

The tokens are embedded — converted into multi-dimensional lists of numbers (vectors) that represent their semantic meaning.
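Conceptually, embedding is a table lookup from token ID to vector. The 3-dimensional values below are invented; real models learn these vectors during training and use hundreds or thousands of dimensions:

```python
# Toy embedding table: each token ID maps to a small vector.
# Values are made up for illustration; real embeddings are learned.
EMBEDDINGS = {
    0: [0.1, -0.3, 0.7],   # "the"
    1: [0.9, 0.2, -0.1],   # "cat"
    2: [-0.4, 0.8, 0.3],   # "sat"
}

def embed(token_ids: list[int]) -> list[list[float]]:
    """Look up the vector for each token ID."""
    return [EMBEDDINGS[t] for t in token_ids]

print(embed([0, 1]))  # [[0.1, -0.3, 0.7], [0.9, 0.2, -0.1]]
```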

3. Prediction

Based on patterns learned during training, the model predicts the most likely next token given the input context.
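Under the hood, the model's forward pass produces one raw score (a logit) per vocabulary entry, and a softmax turns those scores into probabilities. A sketch with invented logits:

```python
import math

def softmax(logits: list[float]) -> list[float]:
    """Convert raw scores into probabilities that sum to 1."""
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Invented scores for vocabulary entries 0, 1, 2.
logits = [2.0, 0.5, -1.0]
probs = softmax(logits)

# Greedy decoding picks the highest-probability token; real systems
# often sample instead (temperature, top-k, top-p).
next_token = max(range(len(probs)), key=probs.__getitem__)
print(next_token)  # 0
```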

4. The "Stop" Signal

The model continues generating one token at a time, feeding each new token back into the context, until it predicts a special "stop token" indicating the response is complete.
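The generation loop above can be sketched as follows. Here `predict_next` is a stand-in for a real model's forward pass, scripted with invented outputs so the example is self-contained:

```python
STOP = 5  # invented ID for the special stop token

def predict_next(context: tuple[int, ...]) -> int:
    """Hypothetical model call: a fixed script for illustration."""
    script = {(): 1, (1,): 2, (1, 2): STOP}
    return script[context]

generated: list[int] = []
while True:
    token = predict_next(tuple(generated))
    if token == STOP:
        break  # stop token predicted: the response is complete
    generated.append(token)  # otherwise, append and predict again

print(generated)  # [1, 2]
```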

5. Detokenization

The generated numeric IDs are converted back into human-readable text.
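Detokenization is the inverse of step 1: invert the vocabulary and join the recovered units back into text. Continuing with the same invented word-level vocabulary:

```python
# Invert the toy vocabulary from the tokenization sketch.
VOCAB = {"the": 0, "cat": 1, "sat": 2, "on": 3, "mat": 4}
ID_TO_WORD = {i: w for w, i in VOCAB.items()}

def detokenize(token_ids: list[int]) -> str:
    """Map each ID back to its word and join into a string."""
    return " ".join(ID_TO_WORD[t] for t in token_ids)

print(detokenize([0, 1, 2, 3, 0, 4]))  # "the cat sat on the mat"
```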

Important Note

This is an accurate but simplified explanation. In step 3, modern transformers use attention mechanisms to weigh the relevance of every token in the context when predicting the next one.