• Tech Dev NotesTech Dev Notes
Apps
  • App lookup
  • App compare
Market movement
  • App charts
  • App rankings
Visual proof
  • App screens
  • App listing screenshots
  • App icons
Build intelligence
  • App tech stacks
  • Tool releases
  • Developers
More
  • X feature flags
  • Grokipedia
  • Blog
  • Follow on X
Skip to content
All content/ filesChangelog

xai-docs/latest/content · Jun 27, 00:17 UTC

pages/developers/advanced-api-usage/prompt-caching/usage-and-pricing.md

MD·2.9 KB·91 lines

content/

  • .

    • llms.txt
  • pages

    • overview.md
  • pages/build

    • enterprise.md
    • modes-and-commands.md
    • overview.md
    • settings.md
  • pages/build/cli

    • headless-scripting.md
  • pages/build/features

    • skills-plugins-marketplaces.md
  • pages/console

    • billing.md
    • collections.md
    • usage.md
  • pages/console/faq

    • accounts.md
    • billing.md
    • security.md
  • pages/developers

    • community.md
    • cost-tracking.md
    • debugging.md
    • docs-mcp.md
    • files.md
    • grpc-api-reference.md
    • management-api-guide.md
    • models.md
    • pricing.md
    • quickstart.md
    • rate-limits.md
    • release-notes.md
  • pages/developers/advanced-api-usage

    • async.md
    • batch-api.md
    • context-compaction.md
    • deferred-chat-completions.md
    • mtls.md
    • priority-processing.md
    • prompt-caching.md
    • websocket-mode.md
  • pages/developers/advanced-api-usage/prompt-caching

    • best-practices.md
    • how-it-works.md
    • maximizing-cache-hits.md
    • multi-turn.md
    • usage-and-pricing.md
  • pages/developers/faq

    • accounts.md
    • billing.md
    • general.md
    • security.md
    • team-management.md
  • pages/developers/files

    • collections.md
    • managing-files.md
    • public-urls.md
  • pages/developers/files/collections

    • api.md
    • metadata.md
  • pages/developers/migration

    • may-15-retirement.md
  • pages/developers/model-capabilities

    • imagine.md
  • pages/developers/model-capabilities/audio

    • custom-voices.md
    • ephemeral-tokens.md
    • speech-to-text.md
    • text-to-speech.md
    • voice-agent.md
    • voice.md
  • pages/developers/model-capabilities/audio/voice-agent

    • sip.md
  • pages/developers/model-capabilities/files

    • chat-with-files.md
  • pages/developers/model-capabilities/images

    • editing.md
    • generation.md
    • multi-image-editing.md
    • understanding.md
  • pages/developers/model-capabilities/imagine

    • files.md
  • pages/developers/model-capabilities/imagine/files

    • inputs.md
    • outputs.md
  • pages/developers/model-capabilities/legacy

    • chat-completions.md
  • pages/developers/model-capabilities/text

    • comparison.md
    • generate-text.md
    • multi-agent.md
    • reasoning.md
    • streaming.md
    • structured-outputs.md
  • pages/developers/model-capabilities/video

    • editing.md
    • extension.md
    • generation.md
    • image-to-video.md
    • reference-to-video.md
  • pages/developers/models

    • speech-to-text.md
    • text-to-speech.md
    • voice-agent-api.md
  • pages/developers/rest-api-reference

    • collections.md
    • files.md
    • inference.md
    • management.md
  • pages/developers/rest-api-reference/collections

    • collection.md
    • search.md
  • pages/developers/rest-api-reference/files

    • download.md
    • manage.md
    • upload.md
  • pages/developers/rest-api-reference/inference

    • batches.md
    • chat.md
    • images.md
    • legacy.md
    • models.md
    • other.md
    • speech-to-text.md
    • videos.md
    • voice.md
  • pages/developers/rest-api-reference/management

    • audit.md
    • auth.md
    • billing.md
  • pages/developers/tools

    • advanced-usage.md
    • citations.md
    • code-execution.md
    • collections-search.md
    • function-calling.md
    • overview.md
    • remote-mcp.md
    • streaming-and-sync.md
    • tool-usage-details.md
    • web-search.md
    • x-search.md
  • pages/grok

    • connector-management.md
    • connectors.md
    • faq.md
    • management.md
    • organization.md
    • user-guide.md
  • pages/grok/connectors

    • custom-mcp-tunneling.md
    • gmail-google-calendar.md
    • google-drive.md
    • microsoft-teams.md
    • onedrive.md
    • outlook.md
    • salesforce.md
    • sharepoint.md
  • pages/grok/faq

    • team-management.md
  • pages/integrations

    • hubspot-mcp-setup.md

Prompt Caching

Usage & Pricing

Chat Completions API

Cached tokens appear in usage.prompt_tokens_details.cached_tokens:

{
  "usage": {
    "prompt_tokens": 125,
    "completion_tokens": 48,
    "total_tokens": 173,
    "prompt_tokens_details": {
      "text_tokens": 125,
      "audio_tokens": 0,
      "image_tokens": 0,
      "cached_tokens": 98
    },
    "completion_tokens_details": {
      "reasoning_tokens": 0,
      "audio_tokens": 0,
      "accepted_prediction_tokens": 0,
      "rejected_prediction_tokens": 0
    }
  }
}

Responses API

Cached tokens appear in usage.input_tokens_details.cached_tokens:

{
  "usage": {
    "input_tokens": 125,
    "output_tokens": 48,
    "total_tokens": 173,
    "input_tokens_details": {
      "cached_tokens": 98
    },
    "output_tokens_details": {
      "reasoning_tokens": 0
    }
  }
}

Verifying cache hits

To determine whether your request benefitted from prompt caching, check the cached_tokens value in the response:

cached_tokens value What it means
Equal to prompt_tokens Full cache hit — your entire prompt was served from cache (rare, typically happens when resending the exact same request).
0 Cache miss — the entire prompt was computed from scratch. This is expected on the first request or after cache eviction.
> 0 Cache hit — some or all of your prompt prefix was served from cache. The number indicates how many tokens were reused.

A typical multi-turn conversation shows increasing cached_tokens over time:

Turn 1: prompt_tokens=50,  cached_tokens=0    # First request, cache established
Turn 2: prompt_tokens=120, cached_tokens=50   # Previous 50 tokens cached
Turn 3: prompt_tokens=200, cached_tokens=120  # Previous 120 tokens cached

[!NOTE]

If cached_tokens is consistently 0 across multiple requests in the same conversation, verify that you're setting x-grok-conv-id (or prompt_cache_key) and that you're not modifying earlier messages between requests.

Pricing

Cached tokens are billed at the cached prompt token price, which is substantially lower than the regular prompt token price. The exact rates vary by model — check the Pricing page for current prices.

Token type Billing rate
Cached prompt tokens Reduced cached prompt token price
Completion tokens Full completion token price
Prompt tokens (non-cached) Full prompt token price
Reasoning tokens Full completion token price

[!NOTE]

Long context pricing applies when total prompt tokens (including cached tokens) exceed the model's long context threshold. Both cached and non-cached tokens use their respective long-context rates in this case.

Next

  • Best Practices & FAQ
Previouspages/developers/advanced-api-usage/prompt-caching/multi-turn.mdNextpages/developers/advanced-api-usage/websocket-mode.md

© 2026 Tech Dev Notes

RSSAboutAPIPrivacyTermsSitemap@techdevnotes