AI Token Calculator - Free LLM Token Counter & Cost Estimator
Calculate the real cumulative token usage and cost of AI conversations. Supports 34+ models including GPT-5.4, Claude Opus 4.6, Gemini 2.5 Pro, Grok 4, DeepSeek, Mistral, and Llama 4.
How It Works
Each API call to an LLM resends the entire conversation history, so cumulative token usage and cost grow quadratically with conversation length, not linearly. Our calculator shows the true cumulative cost per turn.
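The accumulation can be sketched in a few lines. This is an illustrative model, not the calculator's actual implementation: it assumes every user message and every reply have a fixed token size, and counts the input tokens billed across all calls when each call resends the full history.

```javascript
// Cumulative input tokens when every API call resends the full history.
// userTokens / replyTokens are fixed per-turn sizes (illustrative only).
function cumulativeInputTokens(turns, userTokens, replyTokens) {
  let history = 0;    // tokens already in the conversation
  let totalInput = 0; // input tokens billed across all API calls
  for (let t = 0; t < turns; t++) {
    totalInput += history + userTokens;  // this call sends history + new message
    history += userTokens + replyTokens; // the reply is appended for the next call
  }
  return totalInput;
}

// 10 turns of 100-token messages with 300-token replies:
// 19,000 input tokens billed, versus 1,000 if history were never resent.
```
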
Supported Models
OpenAI: GPT-5.4, GPT-5.4 Mini, GPT-5.4 Nano, GPT-5.2, GPT-5, o4 Mini, o3, GPT-4.1
Anthropic: Claude Opus 4.6, Claude Sonnet 4.6, Claude Haiku 4.5
Google: Gemini 3.1 Pro, Gemini 3 Flash, Gemini 2.5 Pro, Gemini 2.5 Flash
xAI: Grok 4, Grok 4.1 Fast
DeepSeek: V3.2, R1 | Mistral: Large 3, Medium 3, Nemo | Meta: Llama 4 Maverick, Llama 4 Scout | Alibaba: Qwen 3.5 Plus
Learn About AI Tokens
What Are Tokens?
Tokens are the fundamental units that large language models use to read and generate text. Most LLMs use Byte Pair Encoding (BPE) to break text into tokens. A common rule of thumb: 1 token ≈ 4 characters in English, or 100 tokens ≈ 75 words.
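The rules of thumb above translate directly into a quick estimator. These functions are approximations only; real BPE tokenizers vary by model and language, so treat the results as ballpark figures.

```javascript
// Rough token estimates from the rules of thumb: 1 token ≈ 4 characters,
// 100 tokens ≈ 75 words (English text; real tokenizers differ per model).
function estimateTokensFromChars(text) {
  return Math.ceil(text.length / 4);
}

function estimateTokensFromWords(wordCount) {
  return Math.ceil(wordCount * (100 / 75));
}
```

For example, a 100-character string estimates to 25 tokens, and a 3-word phrase to about 4 tokens.
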
Input Tokens vs Output Tokens
Input tokens are everything you send to the model. Output tokens are what the model generates. Output tokens typically cost 5-8x more because they must be generated sequentially, one at a time. GPT-5.4: $2.50/M input vs $15.00/M output. Claude Opus 4.6: $5.00/M input vs $25.00/M output.
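Given per-million-token rates like those above, the cost of a single request is a simple weighted sum. A minimal sketch, using the GPT-5.4 rates from the text as an example (verify current pricing before relying on it):

```javascript
// Cost of one request from per-million-token prices.
function requestCostUSD(inputTokens, outputTokens, inputPerM, outputPerM) {
  return (inputTokens / 1e6) * inputPerM + (outputTokens / 1e6) * outputPerM;
}

// 10,000 input + 1,000 output tokens at $2.50/M in, $15.00/M out ≈ $0.04
const cost = requestCostUSD(10000, 1000, 2.5, 15.0);
```
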
What Is a Context Window?
The context window is the maximum tokens an LLM handles per request. GPT-5.4: 272K (1M extended). Claude Opus 4.6: 1M. Gemini 2.5 Pro: 1M. Grok 4: 256K (2M extended).
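A context window also bounds how long a conversation can run before truncation. A rough sketch, assuming each turn adds a fixed number of tokens (an illustrative size, not a model-specific figure):

```javascript
// Rough count of turns that fit in a context window, assuming each turn
// (user message + reply) adds a fixed number of tokens to the history.
function maxTurns(contextWindow, tokensPerTurn) {
  return Math.floor(contextWindow / tokensPerTurn);
}

// e.g. a 272K window at ~1,000 tokens per turn → 272 turns before truncation
```
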
Prompt Caching
Prompt caching reduces costs for repeated request prefixes. Anthropic: 90% off. OpenAI: 50% off (automatic). Google: 90% off. xAI: 75% off.
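The effective input cost with caching splits the prompt into a discounted cached prefix and a full-price remainder. This sketch ignores cache-write surcharges and minimum cacheable sizes that some providers impose:

```javascript
// Effective input cost when a prefix of the prompt is served from cache.
// discount is the cache-read discount, e.g. 0.9 for a 90% reduction.
function cachedInputCostUSD(cachedTokens, freshTokens, pricePerM, discount) {
  const cached = (cachedTokens / 1e6) * pricePerM * (1 - discount);
  const fresh = (freshTokens / 1e6) * pricePerM;
  return cached + fresh;
}

// 50K cached + 2K fresh tokens at $2.50/M with a 90% discount ≈ $0.0175
```
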
Why Costs Grow Quadratically
Chat APIs are stateless — each turn resends the full conversation. A 10-turn conversation can use ~3.6x the tokens of ten independent first turns; by turn 20, the multiplier is ~13x.
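The quadratic shape is easy to see in an idealized model where every turn adds one fixed-size unit of tokens: turn k processes k units, so n turns process n(n+1)/2 units and the multiplier over n independent first turns is (n+1)/2. The multipliers quoted above come from more realistic message-size and pricing assumptions; this sketch only demonstrates the growth pattern.

```javascript
// Idealized quadratic-growth model: turn k resends k fixed-size units.
// Returns total units processed divided by `turns` independent first turns.
function tokenMultiplier(turns) {
  let cumulative = 0;
  for (let k = 1; k <= turns; k++) cumulative += k;
  return cumulative / turns; // equals (turns + 1) / 2
}

// tokenMultiplier(10) → 5.5, tokenMultiplier(20) → 10.5 in this simple model
```
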
How to Reduce Costs
Shorten system prompts, summarize conversation history, use cheaper models for simple tasks, enable prompt caching, use batch APIs, request concise output.
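One of these tactics, summarizing conversation history, can be sketched as a sliding window: keep the most recent messages verbatim and collapse everything older into a short summary. `summarize` here is a hypothetical stand-in; in practice you would call a cheap model to produce the summary.

```javascript
// Sliding-window history trimming: keep the last `keepLast` messages and
// replace older ones with a single summary message. `summarize` is a
// caller-supplied function (e.g. a call to a cheaper model).
function trimHistory(messages, keepLast, summarize) {
  if (messages.length <= keepLast) return messages;
  const older = messages.slice(0, messages.length - keepLast);
  const recent = messages.slice(messages.length - keepLast);
  return [{ role: "system", content: summarize(older) }, ...recent];
}
```

Because the summary replaces the full transcript in every subsequent call, the per-turn input stops growing with conversation length, trading some context fidelity for a roughly flat cost per turn.
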
This calculator requires JavaScript to run. Please enable JavaScript in your browser.