xai-docs/latest/content · Jun 27, 00:17 UTC

pages/developers/advanced-api-usage/prompt-caching/multi-turn.md

MD·7.8 KB·201 lines

Prompt Caching

What Breaks Caching

Any change to earlier messages breaks the cache. Only append new messages at the end.
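
One way to enforce the append-only rule in client code is to keep the history behind a small wrapper that offers no way to edit, remove, or reorder earlier messages. The sketch below is illustrative only: `AppendOnlyHistory` and its methods are hypothetical names, not part of any xAI or OpenAI SDK.

```python
from copy import deepcopy


class AppendOnlyHistory:
    """Hypothetical helper: a message list that can only grow at the end."""

    def __init__(self, system_prompt: str) -> None:
        self._messages = [{"role": "system", "content": system_prompt}]

    def append(self, role: str, content: str) -> None:
        # New messages always go at the end; earlier ones are never touched.
        self._messages.append({"role": role, "content": content})

    def as_request_messages(self) -> list[dict]:
        # Return a deep copy so callers cannot mutate the stored prefix.
        return deepcopy(self._messages)


history = AppendOnlyHistory("You are Grok, a helpful and truthful AI assistant built by xAI.")
history.append("user", "What is prompt caching?")
first_request = history.as_request_messages()

history.append("assistant", "Prompt caching reuses unchanged prompt prefixes.")
history.append("user", "Show me a code example.")
second_request = history.as_request_messages()

# The second request starts with exactly the messages of the first one.
print(second_request[: len(first_request)] == first_request)
```

Returning a copy from `as_request_messages` is a deliberate choice: code that post-processes the request body cannot then change the stored prefix by accident.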

[!WARNING]

Keep messages unchanged. For cache hits in multi-turn conversations, never edit, remove, or reorder earlier messages — only append new ones. For reasoning models, you must include reasoning_content from previous responses; omitting it is the top cause of cache misses.

For reasoning models, you can maintain cache hits by either:

  • Sending back the encrypted reasoning content — Include the reasoning_content from the previous response. See Encrypted Reasoning Content for details.
  • Using stateful responses — Use previous_response_id to automatically continue the conversation. See Chaining the Conversation for details.
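
For the first option, the important step is copying the assistant message back into the history without dropping its reasoning field. A minimal sketch, assuming the previous assistant message is available as a plain dict (for example via the SDK's `model_dump()`). `to_history_message` is a hypothetical helper, and the exact shape of the reasoning field is an assumption to check against the Encrypted Reasoning Content page.

```python
def to_history_message(assistant_message: dict) -> dict:
    """Hypothetical helper: build the history entry for a previous assistant reply."""
    entry = {
        "role": "assistant",
        "content": assistant_message.get("content"),
    }
    # Keep reasoning_content exactly as returned; dropping or altering it
    # changes the prompt prefix on the next turn.
    if assistant_message.get("reasoning_content") is not None:
        entry["reasoning_content"] = assistant_message["reasoning_content"]
    return entry


# Placeholder values for illustration, not real API output.
previous_reply = {
    "role": "assistant",
    "content": "Prompt caching reuses unchanged prompt prefixes.",
    "reasoning_content": "<opaque-encrypted-blob>",
    "refusal": None,
}

entry = to_history_message(previous_reply)
print(sorted(entry))
```

This sketch keeps only the text and reasoning fields. A reply that also carries other fields the API expects back (tool calls, for instance) would need those preserved as well.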

Cache hit — appending a new message

The prompt prefix is identical to the previous request, with only a new user message appended:

# Turn 1: Initial request (establishes the cache)
curl https://api.x.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $XAI_API_KEY" \
  -H "x-grok-conv-id: conv_abc123" \
  -d '{
    "model": "grok-4.3",
    "messages": [
      {"role": "system", "content": "You are Grok, a helpful and truthful AI assistant built by xAI."},
      {"role": "user", "content": "What is prompt caching?"},
      {"role": "assistant", "content": "Prompt caching stores KV pairs from unchanged prompt prefixes so they can be reused on subsequent requests. This makes responses faster and cheaper."}
    ]
  }'

# Turn 2: Cache HIT — exact prefix preserved, new message appended
curl https://api.x.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $XAI_API_KEY" \
  -H "x-grok-conv-id: conv_abc123" \
  -d '{
    "model": "grok-4.3",
    "messages": [
      {"role": "system", "content": "You are Grok, a helpful and truthful AI assistant built by xAI."},
      {"role": "user", "content": "What is prompt caching?"},
      {"role": "assistant", "content": "Prompt caching stores KV pairs from unchanged prompt prefixes so they can be reused on subsequent requests. This makes responses faster and cheaper."},
      {"role": "user", "content": "Show me a code example."}
    ]
  }'

# Python (OpenAI SDK pointed at the xAI endpoint)
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_XAI_API_KEY",
    base_url="https://api.x.ai/v1",
)

conversation_id = "conv_abc123"
messages = [
    {"role": "system", "content": "You are Grok, a helpful and truthful AI assistant built by xAI."},
    {"role": "user", "content": "What is prompt caching?"},
]

# Turn 1: Initial request (establishes the cache)
response = client.chat.completions.create(
    model="grok-4.3",
    messages=messages,
    extra_headers={"x-grok-conv-id": conversation_id},
)
print(f"Turn 1 — Cached tokens: {response.usage.prompt_tokens_details.cached_tokens}")

# Append the assistant's reply and the next user message
messages.append({"role": "assistant", "content": response.choices[0].message.content})
messages.append({"role": "user", "content": "Show me a code example."})

# Turn 2: Cache HIT — prefix is unchanged, only new messages appended
response = client.chat.completions.create(
    model="grok-4.3",
    messages=messages,
    extra_headers={"x-grok-conv-id": conversation_id},
)
print(f"Turn 2 — Cached tokens: {response.usage.prompt_tokens_details.cached_tokens}")

// JavaScript (OpenAI SDK pointed at the xAI endpoint)
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'YOUR_XAI_API_KEY',
  baseURL: 'https://api.x.ai/v1',
});

const conversationId = 'conv_abc123';
const messages = [
  {
    role: 'system',
    content:
      'You are Grok, a helpful and truthful AI assistant built by xAI.',
  },
  { role: 'user', content: 'What is prompt caching?' },
];

// Turn 1: Initial request (establishes the cache)
const turn1 = await client.chat.completions.create(
  { model: 'grok-4.3', messages },
  { headers: { 'x-grok-conv-id': conversationId } },
);
console.log(
  `Turn 1 — Cached tokens: ${turn1.usage.prompt_tokens_details.cached_tokens}`,
);

// Append the assistant reply and next user message
messages.push({ role: 'assistant', content: turn1.choices[0].message.content });

messages.push({ role: 'user', content: 'Show me a code example.' });

// Turn 2: Cache HIT — prefix unchanged, new message appended
const turn2 = await client.chat.completions.create(
  { model: 'grok-4.3', messages },
  { headers: { 'x-grok-conv-id': conversationId } },
);
console.log(
  `Turn 2 — Cached tokens: ${turn2.usage.prompt_tokens_details.cached_tokens}`,
);
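
To compare turns at a glance, the cached-token count can be turned into a hit ratio. A small sketch, assuming the usage object has been converted to a dict with `prompt_tokens` and `prompt_tokens_details.cached_tokens` fields as printed in the examples above. `cache_hit_ratio` is a hypothetical helper, and the numbers are made up.

```python
def cache_hit_ratio(usage: dict) -> float:
    """Return cached prompt tokens as a fraction of all prompt tokens."""
    prompt_tokens = usage.get("prompt_tokens") or 0
    details = usage.get("prompt_tokens_details") or {}
    cached_tokens = details.get("cached_tokens") or 0
    if prompt_tokens == 0:
        return 0.0
    return cached_tokens / prompt_tokens


# Illustrative numbers only, not real API output.
turn1_usage = {"prompt_tokens": 40, "prompt_tokens_details": {"cached_tokens": 0}}
turn2_usage = {"prompt_tokens": 80, "prompt_tokens_details": {"cached_tokens": 60}}

print(f"Turn 1 hit ratio: {cache_hit_ratio(turn1_usage):.0%}")
print(f"Turn 2 hit ratio: {cache_hit_ratio(turn2_usage):.0%}")
```

A ratio that stays near zero on later turns of the same conversation is a hint that something in the prefix is changing between requests.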

Cache miss — editing an earlier message

Changing the content of any earlier message breaks the prefix match:

# Cache MISS — editing the assistant message content
curl https://api.x.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $XAI_API_KEY" \
  -H "x-grok-conv-id: conv_abc123" \
  -d '{
    "model": "grok-4.3",
    "messages": [
      {"role": "system", "content": "You are Grok, a helpful and truthful AI assistant built by xAI."},
      {"role": "user", "content": "What is prompt caching?"},
      {"role": "assistant", "content": "It stores KV pairs."},
      {"role": "user", "content": "Show me a code example."}
    ]
  }'

What changed: The assistant response was shortened from its original text to "It stores KV pairs."

Cache miss — removing a message

Removing any message from the conversation breaks the prefix:

# Cache MISS — the assistant message was removed
curl https://api.x.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $XAI_API_KEY" \
  -H "x-grok-conv-id: conv_abc123" \
  -d '{
    "model": "grok-4.3",
    "messages": [
      {"role": "system", "content": "You are Grok, a helpful and truthful AI assistant built by xAI."},
      {"role": "user", "content": "What is prompt caching?"},
      {"role": "user", "content": "Show me a code example."}
    ]
  }'

What changed: The assistant message between the two user messages was removed entirely.

Cache miss — reordering messages

Changing the order of messages also breaks the prefix:

# Cache MISS — user and system messages are swapped
curl https://api.x.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $XAI_API_KEY" \
  -H "x-grok-conv-id: conv_abc123" \
  -d '{
    "model": "grok-4.3",
    "messages": [
      {"role": "user", "content": "What is prompt caching?"},
      {"role": "system", "content": "You are Grok, a helpful and truthful AI assistant built by xAI."},
      {"role": "assistant", "content": "Prompt caching stores KV pairs from unchanged prompt prefixes so they can be reused on subsequent requests. This makes responses faster and cheaper."},
      {"role": "user", "content": "Show me a code example."}
    ]
  }'

What changed: The first two messages were swapped, so the user message now comes before the system message.
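
When a turn unexpectedly reports few cached tokens, comparing the previous and current request bodies shows where the prefix diverged. The sketch below is a client-side check only. `first_divergence` is a hypothetical helper, and the service may match prefixes at a finer granularity than whole messages, so treat the result as an approximation.

```python
from typing import Optional


def first_divergence(previous: list[dict], current: list[dict]) -> Optional[int]:
    """Return the index of the first message where `current` stops matching
    `previous`, or None if `current` only appends to `previous`."""
    for index, old in enumerate(previous):
        if index >= len(current) or current[index] != old:
            return index
    return None


system = {"role": "system", "content": "You are Grok, a helpful and truthful AI assistant built by xAI."}
question = {"role": "user", "content": "What is prompt caching?"}
answer = {"role": "assistant", "content": "Prompt caching stores KV pairs from unchanged prompt prefixes."}
follow_up = {"role": "user", "content": "Show me a code example."}

previous = [system, question, answer]

appended = [system, question, answer, follow_up]
edited = [system, question, {"role": "assistant", "content": "It stores KV pairs."}, follow_up]
removed = [system, question, follow_up]
reordered = [question, system, answer, follow_up]

for name, request in [("appended", appended), ("edited", edited),
                      ("removed", removed), ("reordered", reordered)]:
    print(name, first_divergence(previous, request))
```

Only the appended request returns `None`. The edited and removed variants diverge at the assistant message, and the reordered variant diverges at the very first message.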
