Advanced API Usage

Prompt Caching

When consecutive requests share the same starting messages, the xAI API automatically caches them. On the next request, messages at the beginning that match exactly are served from cache:

Faster time-to-first-token — the model skips re-computing cached messages
Lower cost — cached tokens are billed at a reduced rate

[!NOTE]

The xAI API performs prompt caching automatically. However, we recommend setting the x-grok-conv-id HTTP header to maximize your cache hit rate.

In this section

How It Works — Understand how caching works from the start of your messages array
Maximizing Cache Hits — Set up x-grok-conv-id and prompt_cache_key for optimal caching
What Breaks Caching — Common mistakes that cause cache misses
Usage & Pricing — Read cached token counts and understand billing
Best Practices & FAQ — Tips, supported models, and common questions

Advanced API Usage

Prompt Caching

When consecutive requests share the same starting messages, the xAI API automatically caches them. On the next request, messages at the beginning that match exactly are served from cache:

Faster time-to-first-token — the model skips re-computing cached messages
Lower cost — cached tokens are billed at a reduced rate

[!NOTE]

The xAI API performs prompt caching automatically. However, we recommend setting the x-grok-conv-id HTTP header to maximize your cache hit rate.

In this section

How It Works — Understand how caching works from the start of your messages array
Maximizing Cache Hits — Set up x-grok-conv-id and prompt_cache_key for optimal caching
What Breaks Caching — Common mistakes that cause cache misses
Usage & Pricing — Read cached token counts and understand billing
Best Practices & FAQ — Tips, supported models, and common questions

pages/developers/advanced-api-usage/prompt-caching.md

Advanced API Usage

Prompt Caching

In this section

pages/developers/advanced-api-usage/prompt-caching.md

Advanced API Usage

Prompt Caching

In this section