Prompt Caching

What Breaks Caching

Any change to earlier messages breaks the cache. Only append new messages at the end.

[!WARNING]

Keep messages unchanged. For cache hits in multi-turn conversations, never edit, remove, or reorder earlier messages — only append new ones. For reasoning models, you must include reasoning_content from previous responses; omitting it is the top cause of cache misses.

For reasoning models, you can maintain cache hits by either:

Sending back the encrypted reasoning content — Include the reasoning_content from the previous response. See Encrypted Reasoning Content for details.
Using stateful responses — Use previous_response_id to automatically continue the conversation. See Chaining the Conversation for details.
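
The first option can be sketched as a small helper that appends the assistant turn to the history without dropping its reasoning payload. This is a minimal sketch, assuming the assistant message is available as a plain dict and that the reasoning payload sits under a `reasoning_content` key as described above. The helper name `append_assistant_turn` is hypothetical and not part of any SDK.

```python
from typing import Any


def append_assistant_turn(
    messages: list[dict[str, Any]], assistant_message: dict[str, Any]
) -> list[dict[str, Any]]:
    """Return a new history with the assistant turn appended unmodified."""
    turn: dict[str, Any] = {
        "role": "assistant",
        "content": assistant_message.get("content"),
    }
    # Assumption: reasoning models return their reasoning under this key.
    # It is copied verbatim so the next request keeps the same prefix.
    if assistant_message.get("reasoning_content") is not None:
        turn["reasoning_content"] = assistant_message["reasoning_content"]
    # Build a new list instead of mutating, so earlier turns stay untouched.
    return [*messages, turn]
```

The helper returns a new list rather than editing the old one, which makes it harder to accidentally alter an earlier message while building the next request.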

Cache hit — appending a new message

The prompt prefix is identical to the previous request, with only a new user message appended:

# Turn 1: Initial request (establishes the cache)
curl https://api.x.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $XAI_API_KEY" \
  -H "x-grok-conv-id: conv_abc123" \
  -d '{
    "model": "grok-4.3",
    "messages": [
      {"role": "system", "content": "You are Grok, a helpful and truthful AI assistant built by xAI."},
      {"role": "user", "content": "What is prompt caching?"},
      {"role": "assistant", "content": "Prompt caching stores KV pairs from unchanged prompt prefixes so they can be reused on subsequent requests. This makes responses faster and cheaper."}
    ]
  }'

# Turn 2: Cache HIT — exact prefix preserved, new message appended
curl https://api.x.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $XAI_API_KEY" \
  -H "x-grok-conv-id: conv_abc123" \
  -d '{
    "model": "grok-4.3",
    "messages": [
      {"role": "system", "content": "You are Grok, a helpful and truthful AI assistant built by xAI."},
      {"role": "user", "content": "What is prompt caching?"},
      {"role": "assistant", "content": "Prompt caching stores KV pairs from unchanged prompt prefixes so they can be reused on subsequent requests. This makes responses faster and cheaper."},
      {"role": "user", "content": "Show me a code example."}
    ]
  }'

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_XAI_API_KEY",
    base_url="https://api.x.ai/v1",
)

conversation_id = "conv_abc123"
messages = [
    {"role": "system", "content": "You are Grok, a helpful and truthful AI assistant built by xAI."},
    {"role": "user", "content": "What is prompt caching?"},
]

# Turn 1: Initial request (establishes the cache)
response = client.chat.completions.create(
    model="grok-4.3",
    messages=messages,
    extra_headers={"x-grok-conv-id": conversation_id},
)
print(f"Turn 1 — Cached tokens: {response.usage.prompt_tokens_details.cached_tokens}")

# Append the assistant's reply and the next user message
messages.append({"role": "assistant", "content": response.choices[0].message.content})
messages.append({"role": "user", "content": "Show me a code example."})

# Turn 2: Cache HIT — prefix is unchanged, only new messages appended
response = client.chat.completions.create(
    model="grok-4.3",
    messages=messages,
    extra_headers={"x-grok-conv-id": conversation_id},
)
print(f"Turn 2 — Cached tokens: {response.usage.prompt_tokens_details.cached_tokens}")

import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'YOUR_XAI_API_KEY',
  baseURL: 'https://api.x.ai/v1',
});

const conversationId = 'conv_abc123';
const messages = [
  {
    role: 'system',
    content:
      'You are Grok, a helpful and truthful AI assistant built by xAI.',
  },
  { role: 'user', content: 'What is prompt caching?' },
];

// Turn 1: Initial request (establishes the cache)
const turn1 = await client.chat.completions.create(
  { model: 'grok-4.3', messages },
  { headers: { 'x-grok-conv-id': conversationId } },
);
console.log(
  `Turn 1 — Cached tokens: ${turn1.usage.prompt_tokens_details.cached_tokens}`,
);

// Append the assistant reply and next user message
messages.push({ role: 'assistant', content: turn1.choices[0].message.content });

messages.push({ role: 'user', content: 'Show me a code example.' });

// Turn 2: Cache HIT — prefix unchanged, new message appended
const turn2 = await client.chat.completions.create(
  { model: 'grok-4.3', messages },
  { headers: { 'x-grok-conv-id': conversationId } },
);
console.log(
  `Turn 2 — Cached tokens: ${turn2.usage.prompt_tokens_details.cached_tokens}`,
);
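
The examples above print the raw `cached_tokens` count. To judge how much of a prompt was reused, it can help to compare that count against the total prompt size. The following is a minimal sketch, assuming the usage object also exposes a `prompt_tokens` total, as OpenAI-compatible responses usually do. The function name `cached_fraction` is hypothetical.

```python
def cached_fraction(prompt_tokens: int, cached_tokens: int) -> float:
    """Fraction of prompt tokens served from the cache, between 0 and 1."""
    if prompt_tokens <= 0:
        # No prompt tokens reported, so there is nothing to compare against.
        return 0.0
    # Clamp in case the reported cached count exceeds the prompt total.
    return min(cached_tokens, prompt_tokens) / prompt_tokens
```

With the Python example above, this might be called as `cached_fraction(response.usage.prompt_tokens, response.usage.prompt_tokens_details.cached_tokens)`. A value near zero on a later turn suggests the prefix changed.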

Cache miss — editing an earlier message

Changing the content of any earlier message breaks the prefix match:

# Cache MISS — editing the assistant message content
curl https://api.x.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $XAI_API_KEY" \
  -H "x-grok-conv-id: conv_abc123" \
  -d '{
    "model": "grok-4.3",
    "messages": [
      {"role": "system", "content": "You are Grok, a helpful and truthful AI assistant built by xAI."},
      {"role": "user", "content": "What is prompt caching?"},
      {"role": "assistant", "content": "It stores KV pairs."},
      {"role": "user", "content": "Show me a code example."}
    ]
  }'

What changed: The assistant response was shortened to "It stores KV pairs.", so the prompt no longer matches the cached prefix from that message onward.

Cache miss — removing a message

Removing any message from the conversation breaks the prefix:

# Cache MISS — the assistant message was removed
curl https://api.x.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $XAI_API_KEY" \
  -H "x-grok-conv-id: conv_abc123" \
  -d '{
    "model": "grok-4.3",
    "messages": [
      {"role": "system", "content": "You are Grok, a helpful and truthful AI assistant built by xAI."},
      {"role": "user", "content": "What is prompt caching?"},
      {"role": "user", "content": "Show me a code example."}
    ]
  }'

What changed: The assistant message was removed entirely, so the second user message now directly follows the first.

Cache miss — reordering messages

Changing the order of messages also breaks the prefix:

# Cache MISS — user and system messages are swapped
curl https://api.x.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $XAI_API_KEY" \
  -H "x-grok-conv-id: conv_abc123" \
  -d '{
    "model": "grok-4.3",
    "messages": [
      {"role": "user", "content": "What is prompt caching?"},
      {"role": "system", "content": "You are Grok, a helpful and truthful AI assistant built by xAI."},
      {"role": "assistant", "content": "Prompt caching stores KV pairs from unchanged prompt prefixes so they can be reused on subsequent requests. This makes responses faster and cheaper."},
      {"role": "user", "content": "Show me a code example."}
    ]
  }'

What changed: The first two messages were swapped, so the user message now comes before the system message.
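
The three miss cases share one property: the new message list no longer starts with the previous one. A client-side check before sending can catch this. The following is a minimal sketch that compares whole messages, which is only a proxy, since it assumes nothing about how the server matches prefixes internally. The function names are hypothetical.

```python
from typing import Any

Message = dict[str, Any]


def shared_prefix_length(previous: list[Message], current: list[Message]) -> int:
    """Count the leading messages that are identical in both lists."""
    count = 0
    for old, new in zip(previous, current):
        if old != new:
            break  # first edited, removed, or reordered position
        count += 1
    return count


def is_append_only(previous: list[Message], current: list[Message]) -> bool:
    """True when `current` keeps every earlier message and only adds new ones."""
    return shared_prefix_length(previous, current) == len(previous)
```

Running `is_append_only(last_sent_messages, messages)` before each request flags an edit, removal, or reorder, and `shared_prefix_length` shows the position of the first differing message.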
