• Tech Dev NotesTech Dev Notes
Apps
  • App lookup
  • App compare
Market movement
  • App charts
  • App rankings
Visual proof
  • App screens
  • App listing screenshots
  • App icons
Build intelligence
  • App tech stacks
  • Tool releases
  • Developers
More
  • X feature flags
  • Grokipedia
  • Blog
  • Follow on X
Skip to content
All content/ filesChangelog

xai-docs/latest/content · Jun 27, 00:17 UTC

pages/developers/advanced-api-usage/priority-processing.md

MD·4.1 KB·114 lines

content/

  • .

    • llms.txt
  • pages

    • overview.md
  • pages/build

    • enterprise.md
    • modes-and-commands.md
    • overview.md
    • settings.md
  • pages/build/cli

    • headless-scripting.md
  • pages/build/features

    • skills-plugins-marketplaces.md
  • pages/console

    • billing.md
    • collections.md
    • usage.md
  • pages/console/faq

    • accounts.md
    • billing.md
    • security.md
  • pages/developers

    • community.md
    • cost-tracking.md
    • debugging.md
    • docs-mcp.md
    • files.md
    • grpc-api-reference.md
    • management-api-guide.md
    • models.md
    • pricing.md
    • quickstart.md
    • rate-limits.md
    • release-notes.md
  • pages/developers/advanced-api-usage

    • async.md
    • batch-api.md
    • context-compaction.md
    • deferred-chat-completions.md
    • mtls.md
    • priority-processing.md
    • prompt-caching.md
    • websocket-mode.md
  • pages/developers/advanced-api-usage/prompt-caching

    • best-practices.md
    • how-it-works.md
    • maximizing-cache-hits.md
    • multi-turn.md
    • usage-and-pricing.md
  • pages/developers/faq

    • accounts.md
    • billing.md
    • general.md
    • security.md
    • team-management.md
  • pages/developers/files

    • collections.md
    • managing-files.md
    • public-urls.md
  • pages/developers/files/collections

    • api.md
    • metadata.md
  • pages/developers/migration

    • may-15-retirement.md
  • pages/developers/model-capabilities

    • imagine.md
  • pages/developers/model-capabilities/audio

    • custom-voices.md
    • ephemeral-tokens.md
    • speech-to-text.md
    • text-to-speech.md
    • voice-agent.md
    • voice.md
  • pages/developers/model-capabilities/audio/voice-agent

    • sip.md
  • pages/developers/model-capabilities/files

    • chat-with-files.md
  • pages/developers/model-capabilities/images

    • editing.md
    • generation.md
    • multi-image-editing.md
    • understanding.md
  • pages/developers/model-capabilities/imagine

    • files.md
  • pages/developers/model-capabilities/imagine/files

    • inputs.md
    • outputs.md
  • pages/developers/model-capabilities/legacy

    • chat-completions.md
  • pages/developers/model-capabilities/text

    • comparison.md
    • generate-text.md
    • multi-agent.md
    • reasoning.md
    • streaming.md
    • structured-outputs.md
  • pages/developers/model-capabilities/video

    • editing.md
    • extension.md
    • generation.md
    • image-to-video.md
    • reference-to-video.md
  • pages/developers/models

    • speech-to-text.md
    • text-to-speech.md
    • voice-agent-api.md
  • pages/developers/rest-api-reference

    • collections.md
    • files.md
    • inference.md
    • management.md
  • pages/developers/rest-api-reference/collections

    • collection.md
    • search.md
  • pages/developers/rest-api-reference/files

    • download.md
    • manage.md
    • upload.md
  • pages/developers/rest-api-reference/inference

    • batches.md
    • chat.md
    • images.md
    • legacy.md
    • models.md
    • other.md
    • speech-to-text.md
    • videos.md
    • voice.md
  • pages/developers/rest-api-reference/management

    • audit.md
    • auth.md
    • billing.md
  • pages/developers/tools

    • advanced-usage.md
    • citations.md
    • code-execution.md
    • collections-search.md
    • function-calling.md
    • overview.md
    • remote-mcp.md
    • streaming-and-sync.md
    • tool-usage-details.md
    • web-search.md
    • x-search.md
  • pages/grok

    • connector-management.md
    • connectors.md
    • faq.md
    • management.md
    • organization.md
    • user-guide.md
  • pages/grok/connectors

    • custom-mcp-tunneling.md
    • gmail-google-calendar.md
    • google-drive.md
    • microsoft-teams.md
    • onedrive.md
    • outlook.md
    • salesforce.md
    • sharepoint.md
  • pages/grok/faq

    • team-management.md
  • pages/integrations

    • hubspot-mcp-setup.md

Advanced API Usage

Priority Processing

Priority Processing gives your xAI API requests higher scheduling priority, which typically results in lower time-to-first-token (TTFT) and faster inter-token latency (ITL), especially during periods of high demand. Add service_tier: "priority" to any request body to opt in—no capacity reservations or advance provisioning required. The parameter is supported on text inference endpoints: Chat Completions and Responses.

When priority capacity is available, requests are scheduled ahead of standard traffic. The response always includes a service_tier field indicating whether priority was granted; check it to confirm.

How it works

Add the service_tier field to any supported request. The API returns the tier that was actually used in the response, so you can confirm the upgrade took effect.

The service_tier field accepts the following values:

Value Meaning
"default" Standard processing. This is the same as omitting the field entirely.
"priority" Request higher scheduling priority at a premium token price.

Priority requests are billed at a premium per-token rate. Cache discounts still apply to cached input tokens before the multiplier. For current per-model rates and the exact priority premium, see the Pricing page.

Quick start

Pass service_tier: "priority" in your request body. The response includes a service_tier field confirming which tier was used.

curl https://api.x.ai/v1/responses \
  -H "Authorization: Bearer $XAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "grok-4.3",
    "input": "Explain the Riemann hypothesis in one paragraph.",
    "service_tier": "priority"
  }'
import os

from xai_sdk import Client
from xai_sdk.chat import user

client = Client(api_key=os.getenv("XAI_API_KEY"))

chat = client.chat.create(
    model="grok-4.3",
    service_tier="priority",
)
chat.append(user("Explain the Riemann hypothesis in one paragraph."))

response = chat.sample()

print(response.content)
print(f"Tier used: {response.service_tier}")
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("XAI_API_KEY"),
    base_url="https://api.x.ai/v1",
)

response = client.responses.create(
    model="grok-4.3",
    input="Explain the Riemann hypothesis in one paragraph.",
    service_tier="priority",
)

print(response.output_text)
print(f"Tier used: {response.service_tier}")
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.XAI_API_KEY,
  baseURL: "https://api.x.ai/v1",
});

const response = await client.responses.create({
  model: "grok-4.3",
  input: "Explain the Riemann hypothesis in one paragraph.",
  service_tier: "priority",
});

console.log(response.output_text);
console.log(`Tier used: ${response.service_tier}`);

The response includes "service_tier": "priority" when the request was served at the priority tier, or "service_tier": "default" if it was served at the default tier instead. You are only billed at the priority rate when the response confirms "priority".

{
  "id": "resp_abc123",
  "model": "grok-4.3",
  "service_tier": "priority",
  "usage": {
    "input_tokens": 42,
    "output_tokens": 156,
    "cost_in_usd_ticks": 37756000
  }
}

Best practices

  • Latency-sensitive paths first — Priority Processing is most valuable for user-facing requests where response time directly affects experience. Background jobs, evaluations, and bulk processing are better served by the Batch API.
  • Monitor the service_tier field — Log the returned tier to track how often your requests are served at priority versus default and to correlate with your latency metrics.
  • Combine with prompt caching — Cached input tokens are discounted before the priority multiplier is applied, so prompt caching and priority processing complement each other well.
Previouspages/developers/advanced-api-usage/mtls.mdNextpages/developers/advanced-api-usage/prompt-caching.md

© 2026 Tech Dev Notes

RSSAboutAPIPrivacyTermsSitemap@techdevnotes