
Tokens: pieces of words; roughly 1 token ≈ 4 characters of English text.

The GPT family of models processes text as tokens, which are common sequences of characters found in text. The models learn the statistical relationships between these tokens and excel at predicting the next token in a sequence.
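As a rough illustration of "statistical relationships between tokens", the toy sketch below counts which token tends to follow which and predicts the most frequent successor. It uses whole words as tokens for readability; real GPT models use subword tokens and a learned neural network rather than simple counts, so this is only an analogy.

```python
from collections import Counter, defaultdict

# Toy illustration only: count which token tends to follow which,
# then predict the most frequent successor.
tokens = ["the", "cat", "sat", "on", "the", "mat", "the", "cat", "slept"]

successors = defaultdict(Counter)
for current, nxt in zip(tokens, tokens[1:]):
    successors[current][nxt] += 1

def predict_next(token: str) -> str | None:
    """Return the most common token seen after `token`, if any."""
    counts = successors.get(token)
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("the"))  # -> "cat" (seen twice after "the", vs "mat" once)
```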

Tokenization is the process of breaking text down into tokens.
See the OpenAI Tokenizer.
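Programmatically, OpenAI's tiktoken library exposes the same encodings its models use. A minimal sketch (assuming `tiktoken` is installed and using the `cl100k_base` encoding) might look like this:

```python
# pip install tiktoken
import tiktoken

# cl100k_base is the encoding used by GPT-3.5-turbo and GPT-4 models.
enc = tiktoken.get_encoding("cl100k_base")

text = "Tokenization breaks text into tokens."
token_ids = enc.encode(text)          # list of integer token IDs
print(len(text), len(token_ids))      # roughly 4 characters per token in English
print(enc.decode(token_ids) == text)  # decoding restores the original text -> True
```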
