Prompt Caching

Usage & Pricing

Chat Completions API

Cached tokens appear in usage.prompt_tokens_details.cached_tokens:

{
  "usage": {
    "prompt_tokens": 125,
    "completion_tokens": 48,
    "total_tokens": 173,
    "prompt_tokens_details": {
      "text_tokens": 125,
      "audio_tokens": 0,
      "image_tokens": 0,
      "cached_tokens": 98
    },
    "completion_tokens_details": {
      "reasoning_tokens": 0,
      "audio_tokens": 0,
      "accepted_prediction_tokens": 0,
      "rejected_prediction_tokens": 0
    }
  }
}

Responses API

Cached tokens appear in usage.input_tokens_details.cached_tokens:

{
  "usage": {
    "input_tokens": 125,
    "output_tokens": 48,
    "total_tokens": 173,
    "input_tokens_details": {
      "cached_tokens": 98
    },
    "output_tokens_details": {
      "reasoning_tokens": 0
    }
  }
}
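Because the two APIs report the count under different keys, client code that handles both may want a single accessor. The following is a minimal sketch, assuming the usage object has already been parsed into a plain dictionary; `cached_token_count` is a hypothetical helper name, not part of any SDK.

```python
from typing import Any, Mapping


def cached_token_count(usage: Mapping[str, Any]) -> int:
    """Return cached_tokens from either usage shape, or 0 if it is absent.

    Hypothetical helper: looks under prompt_tokens_details (Chat Completions)
    first, then under input_tokens_details (Responses API).
    """
    for details_key in ("prompt_tokens_details", "input_tokens_details"):
        details = usage.get(details_key) or {}
        if "cached_tokens" in details:
            return int(details["cached_tokens"])
    return 0


# Usage objects trimmed from the two examples above.
chat_usage = {
    "prompt_tokens": 125,
    "completion_tokens": 48,
    "prompt_tokens_details": {"text_tokens": 125, "cached_tokens": 98},
}
responses_usage = {
    "input_tokens": 125,
    "output_tokens": 48,
    "input_tokens_details": {"cached_tokens": 98},
}

print(cached_token_count(chat_usage))
print(cached_token_count(responses_usage))
```

Returning 0 when the field is missing treats an absent count the same as a cache miss, which may or may not be the behavior you want when monitoring.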

Verifying cache hits

To determine whether your request benefited from prompt caching, check the cached_tokens value in the response:

`cached_tokens` value	What it means
`0`	Cache miss: the entire prompt was computed from scratch. This is expected on the first request or after cache eviction.
`> 0`	Cache hit: some or all of your prompt prefix was served from cache. The number indicates how many tokens were reused.
Equal to `prompt_tokens` (`input_tokens` in the Responses API)	Full cache hit: your entire prompt was served from cache. This is rare and typically happens when resending the exact same request.
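The three cases in the table can be expressed as a small classifier. This is an illustrative sketch only; `classify_cache_result` and its return labels are hypothetical names chosen for this example.

```python
def classify_cache_result(prompt_tokens: int, cached_tokens: int) -> str:
    """Map a (prompt_tokens, cached_tokens) pair onto the cases in the table."""
    if cached_tokens <= 0:
        return "miss"          # nothing was reused
    if cached_tokens >= prompt_tokens:
        return "full hit"      # the entire prompt came from cache
    return "partial hit"       # only a prefix came from cache


print(classify_cache_result(125, 0))
print(classify_cache_result(125, 98))
print(classify_cache_result(125, 125))
```

For the Responses API, pass `input_tokens` as the first argument.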

A typical multi-turn conversation shows increasing cached_tokens over time:

Turn 1: prompt_tokens=50,  cached_tokens=0    # First request, cache established
Turn 2: prompt_tokens=120, cached_tokens=50   # Previous 50 tokens cached
Turn 3: prompt_tokens=200, cached_tokens=120  # Previous 120 tokens cached
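To track this trend in practice, one option is to compute the fraction of prompt tokens served from cache on each turn. The sketch below uses the three turns from the example above; `hit_rate` is a hypothetical helper, and the numbers are illustrative rather than measured.

```python
def hit_rate(prompt_tokens: int, cached_tokens: int) -> float:
    """Fraction of the prompt that was served from cache (0.0 to 1.0)."""
    return cached_tokens / prompt_tokens if prompt_tokens else 0.0


# (prompt_tokens, cached_tokens) for each turn of the example conversation.
turns = [(50, 0), (120, 50), (200, 120)]

for number, (prompt_tokens, cached_tokens) in enumerate(turns, start=1):
    rate = hit_rate(prompt_tokens, cached_tokens)
    print(f"Turn {number}: {rate:.1%} of prompt tokens cached")

# Aggregate rate across the whole conversation.
overall = sum(c for _, c in turns) / sum(p for p, _ in turns)
print(f"Overall: {overall:.1%}")
```

A rate that stays at zero after the first turn is the signal to check for the issues described in the note that follows.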

[!NOTE]

If cached_tokens is consistently 0 across multiple requests in the same conversation, verify that you're setting x-grok-conv-id (or prompt_cache_key) and that you're not modifying earlier messages between requests.

Pricing

Cached tokens are billed at the cached prompt token price, which is substantially lower than the regular prompt token price. The exact rates vary by model — check the Pricing page for current prices.

Token type	Billing rate
Cached prompt tokens	Reduced cached prompt token price
Completion tokens	Full completion token price
Prompt tokens (non-cached)	Full prompt token price
Reasoning tokens	Full completion token price
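The billing rules in the table can be turned into a rough cost estimate. The sketch below makes several assumptions: the `Prices` values are placeholders and not real rates (check the Pricing page), reasoning tokens are passed separately from completion tokens (pass 0 if your completion count already includes them), and long-context rates are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prices:
    """Per-million-token prices. All values used below are placeholders."""
    prompt: float
    cached_prompt: float
    completion: float


def estimate_cost(prompt_tokens: int, cached_tokens: int,
                  completion_tokens: int, prices: Prices,
                  reasoning_tokens: int = 0) -> float:
    """Estimate the cost of one request from its usage counts."""
    non_cached = prompt_tokens - cached_tokens
    total = (
        non_cached * prices.prompt              # full prompt price
        + cached_tokens * prices.cached_prompt  # reduced cached price
        # Reasoning tokens are billed at the completion price.
        + (completion_tokens + reasoning_tokens) * prices.completion
    )
    return total / 1_000_000


# Hypothetical prices, chosen only to make the arithmetic easy to follow.
example_prices = Prices(prompt=2.00, cached_prompt=0.50, completion=10.00)

with_cache = estimate_cost(125, 98, 48, example_prices)
without_cache = estimate_cost(125, 0, 48, example_prices)
print(f"with cache:    {with_cache:.6f}")
print(f"without cache: {without_cache:.6f}")
```

Comparing the two figures shows how much of the saving comes from the cached portion of the prompt under whatever rates you plug in.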

[!NOTE]

Long-context pricing applies when total prompt tokens (cached tokens included) exceed the model's long-context threshold. In that case, cached and non-cached tokens are each billed at their respective long-context rates.
