Tokens are model units
Before a model can work with text, a tokenizer turns that text into numbered pieces the model has learned to handle.
AI token guide
An AI token is a piece of text after it has been prepared for a language model. Token counts decide how much context fits in one request, how much an API call may cost, and how much room is left for the model's answer.
Context windows, prompt checks, and answer limits are measured in tokens, so word count can only get you part of the way.
LLM APIs generally price input and output by token volume. Bigger prompts and longer answers tend to cost more.
Different model families use different tokenizers, so the exact count can move around. The examples below show why a simple word count often misses what the model will actually process.
| Text | Why it matters |
|---|---|
| hello world | Short everyday phrases tend to stay compact after tokenization. |
| internationalization | Long words may be broken into smaller subword pieces. |
| { "city": "Tokyo", "unit": "celsius" } | JSON adds quotes, braces, keys, and punctuation, not just human-readable words. |
| 你好,世界 | Non-English text, especially in non-Latin scripts, can have a very different ratio of tokens to characters than English. |
Words are written for people. Tokens are the units a model receives after text has been converted into something it can process. A common English word may become one token, while a long word can be split into several pieces. Punctuation, URLs, numbers, code, and JSON can add tokens even when the human word count looks small.
That is why word count is best treated as an early planning shortcut. If a prompt has to fit a context window or stay inside a cost budget, it is worth measuring the actual text.
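To make the gap between word count and processed pieces concrete, here is a small sketch that compares a whitespace word count with a crude regex split. The regex is only a stand-in chosen for this illustration, and the names `word_count` and `rough_piece_count` are hypothetical. Real tokenizers use learned subword vocabularies and will give different numbers, so read the output as a trend, not a measurement.

```python
import re

# Crude stand-in for a tokenizer (an assumption for illustration only):
# each run of letters/digits is one piece, and each punctuation mark or
# symbol is its own piece. Real tokenizers split text differently.
PIECE_PATTERN = re.compile(r"\w+|[^\w\s]")


def word_count(text):
    """Count whitespace-separated words, the way a person might."""
    return len(text.split())


def rough_piece_count(text):
    """Count pieces under the crude regex split above."""
    return len(PIECE_PATTERN.findall(text))


samples = [
    "hello world",
    '{ "city": "Tokyo", "unit": "celsius" }',
]

for sample in samples:
    print(word_count(sample), rough_piece_count(sample), sample)
```

Even under this simple split, the JSON sample produces far more pieces than words, because every quote, colon, comma, and brace counts separately. For a real budget, replace `rough_piece_count` with the provider's own tokenizer.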
A context window is the maximum number of tokens a model can consider in one request. System instructions, user messages, retrieved documents, prior chat history, tool results, and the generated answer all draw from that budget. When the combined request gets too large, you may need to trim it, split it, or summarize older material.
In practice, token counts help teams decide when to shorten instructions, chunk a document, summarize old messages, or move to a model with more room.
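The budgeting described above can be sketched as simple arithmetic. All numbers here are hypothetical: the per-part token counts, the 8,000-token window, and the 1,000 tokens reserved for the answer are assumptions for illustration, not values from any real model.

```python
# Hypothetical token counts for each part of one request.
# In practice these would come from the provider's tokenizer.
request_parts = {
    "system_instructions": 400,
    "chat_history": 2500,
    "retrieved_documents": 4000,
    "user_message": 300,
}

CONTEXT_WINDOW = 8000       # assumed limit, not a real model's value
RESERVED_FOR_ANSWER = 1000  # room we want to keep for the generated answer


def tokens_used(parts):
    """Total input tokens across all parts of the request."""
    return sum(parts.values())


def overflow(parts, window, reserved):
    """Tokens to trim so input plus the reserved answer fits; 0 if it fits."""
    return max(0, tokens_used(parts) + reserved - window)


print(tokens_used(request_parts))
print(overflow(request_parts, CONTEXT_WINDOW, RESERVED_FOR_ANSWER))
```

A nonzero overflow is the signal to shorten instructions, summarize older messages, or drop some retrieved material before sending the request.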
Commercial LLM APIs typically separate input tokens from output tokens. Input is the text you send in; output is the text the model produces. Some providers also discount repeated prompt prefixes through cached-input pricing.
A useful estimate starts with a count, then applies the provider's rate per million tokens. Estimate input and output separately because the two sides are often priced differently.
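As a minimal sketch of that estimate, the function below prices input and output separately and adds them. The function name `estimate_cost` and the example rates are hypothetical placeholders; substitute the current per-million rates from your provider's price list.

```python
def estimate_cost(input_tokens, output_tokens,
                  input_rate_per_million, output_rate_per_million):
    """Estimate the cost of one request in the provider's currency.

    Rates are expressed per one million tokens, and input and output
    are priced separately because they often differ.
    """
    input_cost = input_tokens / 1_000_000 * input_rate_per_million
    output_cost = output_tokens / 1_000_000 * output_rate_per_million
    return input_cost + output_cost


# Hypothetical example: 12,000 input tokens and 800 output tokens,
# at made-up rates of 3.00 (input) and 15.00 (output) per million tokens.
cost = estimate_cost(12_000, 800, 3.00, 15.00)
print(f"{cost:.4f}")
```

Cached-input discounts, where a provider offers them, could be modeled the same way by splitting the input count into cached and uncached portions with their own rates.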
An AI token is a piece of text after it has been converted for a language model. It might be a word, part of a word, punctuation, whitespace, a number, or a symbol.
Models use tokens rather than whole words because words are too uneven for model input. Tokens handle languages, code, punctuation, misspellings, and rare words in a more consistent way.
Most LLM APIs bill by input and output token volume. Longer prompts, more context, and longer answers usually mean a larger bill.
Token counts are not exactly the same across models. OpenAI, Claude, Gemini, and other model families can split the same text in different ways. Use counts as estimates unless you are using the provider's exact tokenizer.
For English prose, 1,000 AI tokens is often about 670 to 770 words. Code, JSON, punctuation-heavy text, and non-English content can change the ratio.
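The range above can be turned into a quick planning helper. This sketch assumes English prose and simply applies the 670 to 770 words per 1,000 tokens figure in both directions; the function names are hypothetical, and the result is a rough range, not a measured count.

```python
import math

# Rule-of-thumb range for English prose, taken from the text above.
WORDS_PER_1000_TOKENS_LOW = 670
WORDS_PER_1000_TOKENS_HIGH = 770


def words_for_tokens(tokens):
    """Approximate (low, high) word range that a token budget can hold."""
    low = tokens * WORDS_PER_1000_TOKENS_LOW // 1000
    high = tokens * WORDS_PER_1000_TOKENS_HIGH // 1000
    return low, high


def tokens_for_words(words):
    """Approximate (low, high) token range for a given word count."""
    low = math.ceil(words * 1000 / WORDS_PER_1000_TOKENS_HIGH)
    high = math.ceil(words * 1000 / WORDS_PER_1000_TOKENS_LOW)
    return low, high


print(words_for_tokens(1000))
print(tokens_for_words(1500))
```

When planning against a hard limit, the upper end of the token range is the safer number to budget for.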