September 20, 2025

Grok 4 Fast - xAI

Grok 4 Fast is a cost-efficient reasoning model by xAI that delivers frontier-level performance across Enterprise and Consumer domains with exceptional token efficiency, featuring a 2M token context window and unified architecture for reasoning and non-reasoning modes.


Grok 4 Fast - Cost-Efficient Intelligence

What is Grok 4 Fast?

Grok 4 Fast is a cost-efficient reasoning model by xAI that delivers frontier-level performance across Enterprise and Consumer domains with exceptional token efficiency. This model pushes the boundaries for smaller and faster AI, making high-quality reasoning accessible to more users and developers. Grok 4 Fast features state-of-the-art (SOTA) cost-efficiency, cutting-edge web and X search capabilities, a 2M token context window, and a unified architecture that blends reasoning and non-reasoning modes in one model.

Key Stats

Model Names and Aliases: grok-4-fast, grok-4-fast-reasoning, grok-4-fast-non-reasoning, grok-4-fast-reasoning-latest, grok-4-fast-non-reasoning-latest
Context Window: 2,000,000 tokens
Input Token Price: $0.20 per million (<128k), $0.40 per million (≥128k)
Output Token Price: $0.50 per million (<128k), $1.00 per million (≥128k)
Cached Input Token Price: $0.05 per million
Live Search Price: $25.00 per 1K sources
API Rate Limits: 480 requests/min, 4,000,000 tokens/min
Key Features: Function calling, Structured outputs, Reasoning, Unified architecture, Native tool use, SOTA search, Live search
Modalities: Text, Image (input)

Technical Overview

Grok 4 Fast is an efficiency-focused model from xAI that offers reasoning capabilities near the level of Grok 4 at much lower latency and cost, and it can skip reasoning entirely for the lowest-latency applications. The model, codenamed "tahoe", was developed with a focus on maximizing intelligence density through large-scale reinforcement learning techniques, including human feedback, verifiable rewards, and model grading, along with supervised fine-tuning of specific capabilities. Grok 4 Fast was pre-trained on a general-purpose data corpus, then post-trained on various tasks and tool use, as well as on demonstrations of correct refusal behaviors according to the default safety policy.

Grok 4 Fast was trained end-to-end with tool-use reinforcement learning (RL). It excels at deciding when to invoke tools like code execution or web browsing. For instance, Grok 4 Fast exhibits frontier agentic search capabilities, browsing the web and X to augment queries with real-time data. It follows links across pages, ingests media (including images and videos on X), and synthesizes the findings quickly.

Performance Metrics

Cost-Efficient Intelligence

Grok 4 Fast sets a new frontier in cost-efficient intelligence, outperforming Grok 3 Mini across reasoning benchmarks while slashing token costs. Large-scale reinforcement learning was used to maximize the intelligence density of Grok 4 Fast. In evaluations, Grok 4 Fast achieves comparable performance to Grok 4 on benchmarks while using 40% fewer thinking tokens on average.

Benchmark Results

Benchmark	Grok 4 Fast	Grok 4	Grok 3 Mini (High)	GPT-5 (High)	GPT-5 Mini (High)
GPQA Diamond	85.7%	87.5%	79.0%	85.7%	82.3%
AIME 2025 (no tools)	92.0%	91.7%	83.0%	94.6%	91.1%
HMMT 2025 (no tools)	93.3%	90.0%	74.0%	93.3%	87.8%
HLE (no tools)	20.0%	25.4%	11.0%	24.8%	16.7%
LiveCodeBench (Jan-May)	80.0%	79.0%	70.0%	86.8%	77.4%

Using 40% fewer thinking tokens, combined with a significantly lower price per token, results in a 98% reduction in the price needed to reach the same performance on frontier benchmarks as Grok 4. As verified by an independent review from Artificial Analysis, Grok 4 Fast exhibits a state-of-the-art (SOTA) price-to-intelligence ratio compared to other publicly available models on the Artificial Analysis Intelligence Index.
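As a rough sanity check of the 98% figure, the arithmetic can be sketched as below. Only the $0.50 output price and the 40% token reduction come from this post. The Grok 4 output price of $15.00 per million tokens is not stated here and is an assumption, as is the simplification that output (thinking) tokens dominate the cost of a benchmark run.

```python
# Back-of-the-envelope check of the "98% cheaper" claim.
GROK_4_FAST_OUTPUT_PRICE = 0.50  # USD per million output tokens (from this post)
GROK_4_OUTPUT_PRICE = 15.00      # USD per million output tokens (ASSUMPTION, not in this post)
TOKEN_RATIO = 1.0 - 0.40         # Grok 4 Fast uses 40% fewer thinking tokens


def relative_cost(fast_price: float, base_price: float, token_ratio: float) -> float:
    """Cost of Grok 4 Fast as a fraction of the Grok 4 cost for the same task."""
    return token_ratio * fast_price / base_price


fraction = relative_cost(GROK_4_FAST_OUTPUT_PRICE, GROK_4_OUTPUT_PRICE, TOKEN_RATIO)
reduction_percent = round((1.0 - fraction) * 100, 1)
print(f"Estimated price reduction: {reduction_percent}%")
```

Under these assumptions the two effects multiply: a 30x lower price per token times 0.6x as many tokens gives roughly one fiftieth of the original cost.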

Native Tool Use with SOTA Search

Grok 4 Fast exhibits frontier agentic search capabilities:

Benchmark	Grok 4 Fast	Grok 4	Grok 3 (No Reasoning)
BrowseComp	44.9%	43.0%	—
SimpleQA	95.0%	94.0%	82.0%
Reka Research Eval	66.0%	58.0%	37.0%
BrowseComp (zh)	51.2%	45.0%	10.8%
X Bench Deepsearch (zh)	74.0%	66.0%	27.0%
X Browse*	58.0%	53.2%	20.8%

*X Browse is an internal benchmark that evaluates an agent's multi-hop search and browsing capabilities on X.

General Post-training

Grok 4 Fast also establishes a new cost-effective frontier in the general domain. Its results on LMArena include:

Search Arena: grok-4-fast-search (codename: menlo) claims #1 with 1163 Elo
Text Arena: grok-4-fast (codename: tahoe) ranks #8, performing on par with grok-4-0709

Key Features

Unified Architecture: Reasoning and Non-Reasoning

Previously, separate reasoning modes required distinct models. Grok 4 Fast introduces a unified architecture where reasoning (long chain-of-thought) and non-reasoning (quick responses) are handled by the same model weights, steered via system prompts. This unification reduces end-to-end latency as well as token costs, making Grok 4 Fast ideal for real-time applications.

Native Tool Mastery

Grok 4 Fast excels at deciding when to invoke tools like code execution or web browsing, exhibiting frontier agentic search capabilities that seamlessly browse the web and X to augment queries with real-time data.

Frontier Agentic Search

Grok 4 Fast hops through links, ingests media (including images and videos on X), and synthesizes findings quickly, enabling sophisticated information gathering and analysis.

Capabilities

Grok 4 Fast delivers strong performance across a wide range of tasks, from simple queries to complex reasoning challenges. The model excels at:

Frontier-level reasoning on mathematical and scientific problems
Advanced web and X search capabilities
Real-time information synthesis from multiple sources
Tool-augmented problem solving
Cost-effective performance for enterprise applications

Pricing

Grok 4 Fast is designed to be widely accessible with competitive pricing that reflects its exceptional token efficiency:

Input tokens (<128k): $0.20 per million
Input tokens (≥128k): $0.40 per million
Output tokens (<128k): $0.50 per million
Output tokens (≥128k): $1.00 per million
Cached input tokens: $0.05 per million
Live search: $25.00 per 1K sources
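
The price list above can be turned into a per-request estimate. The following is a minimal sketch, not an official billing formula: the `Usage` class and `estimate_cost_usd` function are hypothetical names, and it assumes the 128k tier is selected by the total prompt size of a single request, which this post does not spell out.

```python
from dataclasses import dataclass

LONG_CONTEXT_THRESHOLD = 128_000  # tokens; tier boundary from the price list


@dataclass(frozen=True)
class Usage:
    input_tokens: int              # uncached prompt tokens
    output_tokens: int
    cached_input_tokens: int = 0
    live_search_sources: int = 0


def estimate_cost_usd(usage: Usage) -> float:
    """Estimate the cost of one request in USD from the listed prices."""
    # ASSUMPTION: the tier is chosen by total prompt size (cached + uncached).
    prompt_tokens = usage.input_tokens + usage.cached_input_tokens
    long_context = prompt_tokens >= LONG_CONTEXT_THRESHOLD
    input_price = 0.40 if long_context else 0.20   # USD per million tokens
    output_price = 1.00 if long_context else 0.50  # USD per million tokens
    cost = (
        usage.input_tokens * input_price / 1_000_000
        + usage.output_tokens * output_price / 1_000_000
        + usage.cached_input_tokens * 0.05 / 1_000_000
        + usage.live_search_sources * 25.00 / 1_000
    )
    return round(cost, 6)


short_cost = estimate_cost_usd(Usage(input_tokens=10_000, output_tokens=2_000))
long_cost = estimate_cost_usd(Usage(input_tokens=200_000, output_tokens=2_000))
print(short_cost, long_cost)
```

Note that live search is priced per source rather than per token, so a request that consults many sources can cost more in search fees than in tokens.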

API Details

Model names: grok-4-fast-reasoning, grok-4-fast-non-reasoning
Aliases: grok-4-fast, grok-4-fast-reasoning-latest, grok-4-fast-non-reasoning-latest
Context window: 2,000,000 tokens
Region: us-east-1
Rate limits: 480 requests/min, 4,000,000 tokens/min
Features: Function calling, Structured outputs, Reasoning, Unified architecture, Native tool use, Live search
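
A client can stay under the listed rate limits by tracking its own traffic. The sketch below is one possible approach, a sliding 60-second window. The `SlidingWindowLimiter` class is a hypothetical helper, and how the server actually measures its window is not described in this post, so treat the windowing as an assumption. Actual limits may also differ per account.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Client-side guard for a requests/min and tokens/min budget."""

    def __init__(self, max_requests=480, max_tokens=4_000_000,
                 window_s=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.max_tokens = max_tokens
        self.window_s = window_s
        self.clock = clock           # injectable so the limiter can be tested
        self.events = deque()        # (timestamp, tokens) per admitted request
        self.tokens_in_window = 0

    def _evict_expired(self, now):
        # Drop requests that have aged out of the window.
        while self.events and now - self.events[0][0] >= self.window_s:
            _, tokens = self.events.popleft()
            self.tokens_in_window -= tokens

    def try_acquire(self, tokens):
        """Record the request and return True if it fits in the current window."""
        now = self.clock()
        self._evict_expired(now)
        if len(self.events) >= self.max_requests:
            return False
        if self.tokens_in_window + tokens > self.max_tokens:
            return False
        self.events.append((now, tokens))
        self.tokens_in_window += tokens
        return True


# Demonstration with a fake clock instead of real waiting.
fake_now = [0.0]
limiter = SlidingWindowLimiter(clock=lambda: fake_now[0])
admitted = sum(limiter.try_acquire(1_000) for _ in range(500))
fake_now[0] = 61.0  # one minute later the window has emptied
admitted_later = limiter.try_acquire(1_000)
print(admitted, admitted_later)
```

When `try_acquire` returns False, the caller would typically sleep briefly and retry rather than send the request and receive a rate-limit error.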

Cost Efficiency

Grok 4 Fast reaches the same performance as Grok 4 on frontier benchmarks at a 98% lower price, which represents a state-of-the-art price-to-intelligence ratio.

Availability & Access

The model is generally available via the xAI API and integrated across xAI's platforms:

grok.com: Available in Fast and Auto modes for all users
iOS App: Available for iOS users
Android App: Available for Android users
xAI API: https://x.ai/api | https://console.x.ai/


Getting Started

xAI API: https://x.ai/api | https://console.x.ai/
Documentation: Available at https://docs.x.ai/docs/models/grok-4-fast
Model Card: https://data.x.ai/2025-09-19-grok-4-fast-model-card.pdf

Best Practices & Usage Guidelines

For Developers Using Reasoning Models

Grok 4 Fast offers two distinct modes optimized for different use cases:

grok-4-fast-reasoning

Use for complex reasoning tasks requiring extended chain-of-thought
Ideal for mathematical problems, scientific analysis, and multi-step problem solving
Provides deeper, more accurate responses for challenging queries

grok-4-fast-non-reasoning

Use for quick responses and simple queries
Optimized for low-latency applications
Perfect for real-time interactions and straightforward questions
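
Choosing between the two modes can be reduced to picking a model name per request. The sketch below assumes the same request body shape as the curl example in this post. The `build_chat_request` helper and its `complex_task` flag are hypothetical, and the routing rule is illustrative rather than an xAI recommendation.

```python
REASONING_MODEL = "grok-4-fast-reasoning"
NON_REASONING_MODEL = "grok-4-fast-non-reasoning"


def build_chat_request(prompt: str, complex_task: bool) -> dict:
    """Build a chat completions body, routing by task complexity."""
    # Multi-step problems go to the reasoning mode; everything else
    # goes to the low-latency non-reasoning mode.
    model = REASONING_MODEL if complex_task else NON_REASONING_MODEL
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }


math_request = build_chat_request("Prove that sqrt(2) is irrational.", complex_task=True)
chat_request = build_chat_request("What is the capital of France?", complex_task=False)
print(math_request["model"], chat_request["model"])
```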

Unified Architecture Benefits

The unified architecture allows seamless transitions between reasoning modes within the same model, reducing latency and token costs compared to switching between separate models.

Tool Integration

Grok 4 Fast was trained end-to-end with tool-use reinforcement learning. Leverage its native tool-calling capabilities for:

Web browsing and information gathering
Code execution and analysis
Search augmentation with real-time data
Live search with real-time web access

Function Calling

Grok 4 Fast supports native function calling, allowing you to connect the model to external tools and systems seamlessly.
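
A function-calling round trip might look like the sketch below. It assumes the API accepts an OpenAI-style `tools` field and returns tool calls with a `function.name` and a JSON-encoded `function.arguments` string. This post does not show that schema, so verify it against the xAI documentation. The `get_weather` tool and the sample tool call are hypothetical.

```python
import json


def get_weather(city: str) -> dict:
    """Hypothetical local tool; a real one would query a weather service."""
    return {"city": city, "forecast": "sunny"}


# Tool description sent with the request (ASSUMED OpenAI-style schema).
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

REGISTRY = {"get_weather": get_weather}


def build_request(prompt: str) -> dict:
    """Build a chat completions body that advertises the tools."""
    return {
        "model": "grok-4-fast-reasoning",
        "messages": [{"role": "user", "content": prompt}],
        "tools": TOOLS,
    }


def run_tool_call(tool_call: dict) -> dict:
    """Execute one tool call and wrap the result as a 'tool' role message."""
    name = tool_call["function"]["name"]
    arguments = json.loads(tool_call["function"]["arguments"])
    result = REGISTRY[name](**arguments)
    return {
        "role": "tool",
        "tool_call_id": tool_call["id"],
        "content": json.dumps(result),
    }


# Hand-written stand-in for a tool call the model might return.
sample_call = {
    "id": "call_1",
    "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'},
}
tool_message = run_tool_call(sample_call)
print(tool_message["content"])
```

The resulting tool message would be appended to `messages` and sent back so the model can compose its final answer from the tool output.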

Structured Outputs

The model can return responses in organized, specific formats to ensure consistent and parseable output for your applications.
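
Structured outputs are typically requested by attaching a JSON Schema to the request and then validating the reply. The sketch below assumes an OpenAI-style `response_format` field of type `json_schema`, which this post does not confirm. The invoice schema, the sample reply, and the `parse_structured` helper are all hypothetical.

```python
import json

# Hypothetical schema describing the fields the application expects.
INVOICE_SCHEMA = {
    "type": "object",
    "properties": {
        "vendor": {"type": "string"},
        "total": {"type": "number"},
    },
    "required": ["vendor", "total"],
}


def build_request(prompt: str) -> dict:
    """Build a chat completions body that asks for schema-shaped JSON."""
    return {
        "model": "grok-4-fast-non-reasoning",
        "messages": [{"role": "user", "content": prompt}],
        # ASSUMED field shape; check the xAI docs before relying on it.
        "response_format": {
            "type": "json_schema",
            "json_schema": {"name": "invoice", "schema": INVOICE_SCHEMA, "strict": True},
        },
    }


def parse_structured(content: str, schema: dict) -> dict:
    """Parse the reply and check that every required key is present."""
    data = json.loads(content)
    missing = [key for key in schema.get("required", []) if key not in data]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return data


# Hand-written stand-in for the message content the model might return.
invoice = parse_structured('{"vendor": "Acme", "total": 12.5}', INVOICE_SCHEMA)
print(invoice["vendor"], invoice["total"])
```

Validating on the client side is a cheap safeguard even when the API enforces the schema, since it catches truncated or otherwise malformed replies early.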

Future Plans

Model improvements to Grok 4 Fast are planned based on user feedback. Future updates may include enhanced multimodal capabilities and agentic features.

Model Card

For detailed technical specifications and safety evaluations, visit the Grok 4 Fast Model Card.

Training and Safety

Grok 4 Fast was pre-trained on a general purpose data corpus, then post-trained on various tasks and tool use, as well as demonstrations of correct refusal behaviors according to the default safety policy. The model is deployed with a fixed system prompt prefix that reminds it of the safety policy, in addition to input filters to safeguard against abuse.

Safety Evaluations

Prior to release, various specific safety-relevant behaviors of Grok 4 Fast were evaluated: abuse potential (Section 2.1), concerning propensities (Section 2.2), and dual-use capabilities (Section 2.3). The approach to model evaluations varies depending on the specific behavior under assessment.

To reduce the potential for abuse of Grok 4 Fast that might lead to serious injury to people, property, or national security interests, safety training is applied so that the model refuses requests that may lead to foreseeable harm.

Various propensities of Grok 4 Fast that might make it difficult to control are also reduced, such as being deceptive, power-seeking, manipulative, or biased.

Finally, the dual-use capabilities of Grok 4 Fast are evaluated; they remain below those of Grok 4.

Abuse Potential

To improve robustness, measures are applied to refuse requests that may lead to foreseeable harm and to prevent adversarial requests from circumventing safeguards.

Refusals: The standard refusal evaluation measures willingness to assist with serious crimes prohibited by the safety policy, including:

Creating or distributing child sexual abuse material
Child sexual exploitation
Enticing or soliciting children
Violent crimes or terrorist acts
Social engineering attacks
Unlawfully hacking into computer systems
Producing, modifying, or distributing weapons or explosives
Producing or distributing DEA Schedule I controlled substances
Damaging or destroying physical infrastructure in critical sectors
Hacking or disrupting digital infrastructure in critical sectors
Creating or planning chemical, biological, radiological, or nuclear weapons
Conducting cyber attacks, including ransomware and DDoS attacks

Agentic Abuse: The agentic tool-calling abilities introduce additional risks. The AgentHarm benchmark is used to quantify these risks.

Hijacking: Susceptibility to model hijacking is measured with the AgentDojo benchmark.

Concerning Propensities

AI models may contain propensities that reduce their controllability, such as deception, power-seeking, manipulation, and sycophancy.

Deception: Grok 4 Fast is run on the MASK dataset to measure dishonesty rate.

Political Bias: "Soft bias" is evaluated on politically salient topics using an internal evaluation.

Sycophancy: Measured with Anthropic's answer sycophancy evaluation.

Dual-use Capabilities

The possibility of Grok 4 Fast enabling malicious actors to design, synthesize, acquire, or use chemical and biological weapons or offensive cyber operations is evaluated.

Chemical/Biological Knowledge: Performance assessed on WMDP, VCT, and BioLP-Bench.

Cyber Knowledge: Cybersecurity capabilities are evaluated on WMDP Cyber and CyBench.

Persuasiveness: Measured with OpenAI's MakeMeSay evaluation.

Safety Mitigations

Refusal Policy: A basic refusal policy instructs Grok 4 Fast to decline queries demonstrating clear intent to engage in activities that threaten severe, imminent harm.

Safety Training: Training data includes demonstrations of declining overtly malicious requests.

System Prompt: A fixed system prompt prefix reminds the model of the safety policy.

Input Filters: Model-based input filters reject additional classes of harmful requests.

Benchmark Performance

Safety Benchmarks

Refusals Answer Rate: 0.00 (with system prompt)
AgentHarm Answer Rate: 0.08
AgentDojo Attack Success Rate: 0.00

Concerning Propensities Benchmarks

MASK Dishonesty Rate: 0.47 (reasoning), 0.63 (non-reasoning)
Soft Bias (Internal): 0.79 (reasoning), 0.89 (non-reasoning)
Sycophancy Rate: 0.10 (reasoning), 0.13 (non-reasoning)

Dual-use Capability Benchmarks

BioLP-Bench Accuracy: 39.0
VCT Accuracy: 54.5
WMDP Bio Accuracy: 85.2
WMDP Chem Accuracy: 77.5
WMDP Cyber Accuracy: 81.4
CyBench Unguided Success Rate: 30.0
MakeMeSay Win Rate: 0.12

How to use Grok 4 Fast?

curl https://api.x.ai/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $XAI_API_KEY" \
-m 3600 \
-d '{
    "messages": [
        {
            "role": "system",
            "content": "You are Grok, a highly intelligent, helpful AI assistant."
        },
        {
            "role": "user",
            "content": "What is the meaning of life, the universe, and everything?"
        }
    ],
    "model": "grok-4-fast-reasoning",
    "stream": false
}'
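
The same request can be sent from Python using only the standard library. This is a minimal sketch that mirrors the curl command above. The `build_request` and `send` helpers are hypothetical names, and the `choices[0].message.content` path used to read the reply is an assumption about the response shape, which this post does not show.

```python
import json
import os
import urllib.request

API_URL = "https://api.x.ai/v1/chat/completions"


def build_request(prompt: str, api_key: str,
                  model: str = "grok-4-fast-reasoning") -> urllib.request.Request:
    """Build the POST request with the same body as the curl example."""
    payload = {
        "messages": [
            {"role": "system",
             "content": "You are Grok, a highly intelligent, helpful AI assistant."},
            {"role": "user", "content": prompt},
        ],
        "model": model,
        "stream": False,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout_s: float = 3600.0) -> dict:
    """Send the request and decode the JSON reply (same timeout as curl -m 3600)."""
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.load(response)


# Only contacts the API when a key is configured in the environment.
if __name__ == "__main__" and os.environ.get("XAI_API_KEY"):
    req = build_request(
        "What is the meaning of life, the universe, and everything?",
        os.environ["XAI_API_KEY"],
    )
    reply = send(req)
    # ASSUMED response shape; adjust if the actual reply differs.
    print(reply["choices"][0]["message"]["content"])
```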

Grok 4 Fast represents xAI's commitment to democratizing advanced AI through exceptional cost-efficiency, combining frontier-level performance with accessible pricing and unified reasoning capabilities.
