Developers – API Performance
- New page: Priority Processing — explains how to request higher scheduling priority for Chat Completions and Responses endpoints to reduce time-to-first-token and inter-token latency during high demand. Includes quick start examples (cURL, Python, JavaScript) and best practices.
Developers – Pricing
- Added Priority Processing Pricing section: requests using
service_tier: "priority"are billed at 2× standard rates (applies to all token types). Standard rates apply if the request falls back to the default tier. Prompt caching discounts are applied before the multiplier. - Clarified that Priority Processing is available only for Chat Completions and Responses; not supported for image generation, video generation, or Batch API.
Developers – Batch API
- Added cross-reference to the new Priority Processing page for users seeking lower latency on real-time requests instead of batch processing.