What is Grok 4 Fast?
Grok 4 Fast is a cost-efficient reasoning model by xAI that delivers frontier-level performance across enterprise and consumer domains with exceptional token efficiency. It pushes the boundaries for smaller and faster AI, making high-quality reasoning accessible to more users and developers. Grok 4 Fast features state-of-the-art (SOTA) cost-efficiency, cutting-edge web and X search capabilities, a 2M-token context window, and a unified architecture that blends reasoning and non-reasoning modes in one model.
Key Stats
- Model Aliases: grok-4-fast, grok-4-fast-reasoning, grok-4-fast-non-reasoning, grok-4-fast-reasoning-latest, grok-4-fast-non-reasoning-latest
- Context Window: 2,000,000 tokens
- Input Token Price: $0.20 per million (<128k), $0.40 per million (≥128k)
- Output Token Price: $0.50 per million (<128k), $1.00 per million (≥128k)
- Cached Input Token Price: $0.05 per million
- Live Search Price: $25.00 per 1K sources
- API Rate Limits: 480 requests/min, 4,000,000 tokens/min
- Key Features: Function calling, Structured outputs, Reasoning, Unified architecture, Native tool use, SOTA search, Live search
- Modalities: Text, Image (input)
Technical Overview
Grok 4 Fast is an efficiency-focused model from xAI that offers reasoning capabilities near the level of Grok 4 at much lower latency and cost, and it can skip reasoning entirely for the lowest-latency applications. The model, codenamed "tahoe", was developed with a focus on maximizing intelligence density through large-scale reinforcement learning techniques, including human feedback, verifiable rewards, and model grading, along with supervised fine-tuning of specific capabilities. Grok 4 Fast was pre-trained on a general-purpose data corpus, then post-trained on various tasks and tool use, as well as demonstrations of correct refusal behaviors according to the default safety policy.
Grok 4 Fast was trained end-to-end with tool-use reinforcement learning (RL). It excels at deciding when to invoke tools like code execution or web browsing. For instance, Grok 4 Fast exhibits frontier agentic search capabilities, browsing the web and X to augment queries with real-time data. It hops through links, ingests media (including images and videos on X), and quickly synthesizes its findings.
Performance Metrics
Cost-Efficient Intelligence
Grok 4 Fast sets a new frontier in cost-efficient intelligence, outperforming Grok 3 Mini across reasoning benchmarks while slashing token costs. Large-scale reinforcement learning was used to maximize the intelligence density of Grok 4 Fast. In evaluations, Grok 4 Fast achieves comparable performance to Grok 4 on benchmarks while using 40% fewer thinking tokens on average.
Benchmark Results
Benchmark | Grok 4 Fast | Grok 4 | Grok 3 Mini (High) | GPT-5 (High) | GPT-5 Mini (High) |
---|---|---|---|---|---|
GPQA Diamond | 85.7% | 87.5% | 79.0% | 85.7% | 82.3% |
AIME 2025 (no tools) | 92.0% | 91.7% | 83.0% | 94.6% | 91.1% |
HMMT 2025 (no tools) | 93.3% | 90.0% | 74.0% | 93.3% | 87.8% |
HLE (no tools) | 20.0% | 25.4% | 11.0% | 24.8% | 16.7% |
LiveCodeBench (Jan-May) | 80.0% | 79.0% | 70.0% | 86.8% | 77.4% |
This 40% increase in Grok 4 Fast's token efficiency, combined with a significantly lower price per token, results in a 98% reduction in price to achieve the same performance on frontier benchmarks as Grok 4. As verified by an independent review from Artificial Analysis, Grok 4 Fast exhibits a state-of-the-art (SOTA) price-to-intelligence ratio compared to other publicly available models on the Artificial Analysis Intelligence Index.
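One way to read the 98% figure is as the product of the two effects named above. This is only a sketch: it assumes that cost at equal performance is simply tokens used times price per token, and the symbol r (the ratio of Grok 4 Fast's effective per-token price to Grok 4's) is introduced here for illustration and does not appear in the source.

```latex
\frac{C_{\text{Grok 4 Fast}}}{C_{\text{Grok 4}}}
  = \underbrace{(1 - 0.40)}_{\text{thinking-token ratio}} \times r
  = 0.60\,r,
\qquad
0.60\,r = 1 - 0.98 = 0.02
\;\Longrightarrow\;
r = \frac{0.02}{0.60} \approx 0.033
```

Under that assumption, the claim corresponds to an effective per-token price of roughly one thirtieth of Grok 4's.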
Native Tool Use with SOTA Search
Grok 4 Fast exhibits frontier agentic search capabilities:
Benchmark | Grok 4 Fast | Grok 4 | Grok 3 (No Reasoning) |
---|---|---|---|
BrowseComp | 44.9% | 43.0% | - |
SimpleQA | 95.0% | 94.0% | 82.0% |
Reka Research Eval | 66.0% | 58.0% | 37.0% |
BrowseComp (zh) | 51.2% | 45.0% | 10.8% |
X Bench Deepsearch (zh) | 74.0% | 66.0% | 27.0% |
X Browse* | 58.0% | 53.2% | 20.8% |
*X Browse is an internal benchmark that evaluates an agent's multi-hop search and browsing capabilities on X.
General Post-training
Grok 4 Fast also establishes a new cost-effective frontier on general-domain tasks. Its results on LMArena include:
- Search Arena: grok-4-fast-search (codename: menlo) claims #1 with 1163 Elo
- Text Arena: grok-4-fast (codename: tahoe) ranks #8, performing on par with grok-4-0709
Key Features
Unified Architecture: Reasoning and Non-Reasoning
Previously, separate reasoning modes required distinct models. Grok 4 Fast introduces a unified architecture where reasoning (long chain-of-thought) and non-reasoning (quick responses) are handled by the same model weights, steered via system prompts. This unification reduces end-to-end latency as well as token costs, making Grok 4 Fast ideal for real-time applications.
Native Tool Mastery
Grok 4 Fast excels at deciding when to invoke tools like code execution or web browsing, exhibiting frontier agentic search capabilities that seamlessly browse the web and X to augment queries with real-time data.
Frontier Agentic Search
Grok 4 Fast hops through links, ingests media (including images and videos on X), and quickly synthesizes its findings, enabling sophisticated information gathering and analysis.
Capabilities
Grok 4 Fast delivers strong performance across a wide range of tasks, from simple queries to complex reasoning challenges. The model excels at:
- Frontier-level reasoning on mathematical and scientific problems
- Advanced web and X search capabilities
- Real-time information synthesis from multiple sources
- Tool-augmented problem solving
- Cost-effective performance for enterprise applications
Pricing
Grok 4 Fast is designed to be widely accessible with competitive pricing that reflects its exceptional token efficiency:
- Input tokens (<128k): $0.20 per million
- Input tokens (≥128k): $0.40 per million
- Output tokens (<128k): $0.50 per million
- Output tokens (≥128k): $1.00 per million
- Cached input tokens: $0.05 per million
- Live search: $25.00 per 1K sources
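The price list above can be turned into a rough per-request estimate. The sketch below is illustrative only: the function name and parameters are hypothetical, and it assumes (the source does not say) that the 128k tier is chosen by the request's input size and that cached tokens are a subset of the input tokens billed at the cached rate.

```python
TIER_THRESHOLD = 128_000  # tokens; tier boundary from the price list

# USD per million tokens, copied from the price list.
INPUT_PRICE = {"low": 0.20, "high": 0.40}
OUTPUT_PRICE = {"low": 0.50, "high": 1.00}
CACHED_INPUT_PRICE = 0.05
LIVE_SEARCH_PRICE_PER_1K_SOURCES = 25.00


def estimate_cost_usd(input_tokens, output_tokens, cached_tokens=0, search_sources=0):
    """Rough cost estimate for one request, in USD.

    Assumptions (not confirmed by the source): the tier is selected by the
    request's input size, and cached tokens are part of the input tokens
    but billed at the cached rate instead of the regular input rate.
    """
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")
    tier = "high" if input_tokens >= TIER_THRESHOLD else "low"
    uncached_tokens = input_tokens - cached_tokens
    token_cost = (
        uncached_tokens * INPUT_PRICE[tier]
        + cached_tokens * CACHED_INPUT_PRICE
        + output_tokens * OUTPUT_PRICE[tier]
    ) / 1_000_000
    search_cost = search_sources * LIVE_SEARCH_PRICE_PER_1K_SOURCES / 1_000
    return token_cost + search_cost
```

With these assumptions, a 100,000-token prompt that produces 10,000 output tokens comes to about $0.025, and each live search source adds $0.025 on top.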
API Details
- Model names: grok-4-fast-reasoning, grok-4-fast-non-reasoning
- Aliases: grok-4-fast, grok-4-fast-reasoning-latest, grok-4-fast-non-reasoning-latest
- Context window: 2,000,000 tokens
- Region: us-east-1
- Rate limits: 480 requests/min, 4,000,000 tokens/min
- Features: Function calling, Structured outputs, Reasoning, Unified architecture, Native tool use, Live search
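To stay under the listed rate limits, a client can keep its own per-minute budget before sending requests. The following is a minimal sliding-window sketch, not xAI's accounting method: the class name is hypothetical, the 60-second window is an assumption about how the limits are measured, and token counts must be estimated by the caller.

```python
import time
from collections import deque


class MinuteBudget:
    """Client-side sliding-window guard for per-minute request and token limits."""

    def __init__(self, max_requests=480, max_tokens=4_000_000,
                 window_s=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.max_tokens = max_tokens
        self.window_s = window_s  # assumed window length; the server may count differently
        self.clock = clock
        self._events = deque()  # (timestamp, tokens) for requests inside the window
        self._tokens_in_window = 0

    def _evict(self, now):
        # Drop requests that have aged out of the window.
        while self._events and now - self._events[0][0] >= self.window_s:
            _, tokens = self._events.popleft()
            self._tokens_in_window -= tokens

    def try_acquire(self, tokens):
        """Record the request and return True if it fits both budgets, else return False."""
        now = self.clock()
        self._evict(now)
        if len(self._events) + 1 > self.max_requests:
            return False
        if self._tokens_in_window + tokens > self.max_tokens:
            return False
        self._events.append((now, tokens))
        self._tokens_in_window += tokens
        return True
```

A caller would check `try_acquire(estimated_tokens)` before each request and wait or queue the request when it returns False. The injectable `clock` exists only to make the behavior easy to test.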
Cost Efficiency
Grok 4 Fast matches Grok 4's performance on frontier benchmarks at a 98% lower price, a state-of-the-art price-to-intelligence ratio.
Availability & Access
The model is generally available via the xAI API and integrated across xAI's platforms:
- grok.com: Available in Fast and Auto modes for all users
- iOS App: Available for iOS users
- Android App: Available for Android users
- xAI API: https://x.ai/api | https://console.x.ai/
Getting Started
- xAI API: https://x.ai/api | https://console.x.ai/
- Documentation: Available at https://docs.x.ai/docs/models/grok-4-fast
- Model Card: https://data.x.ai/2025-09-19-grok-4-fast-model-card.pdf
Best Practices & Usage Guidelines
For Developers Using Reasoning Models
Grok 4 Fast offers two distinct modes optimized for different use cases:
grok-4-fast-reasoning
- Use for complex reasoning tasks requiring extended chain-of-thought
- Ideal for mathematical problems, scientific analysis, and multi-step problem solving
- Provides deeper, more accurate responses for challenging queries
grok-4-fast-non-reasoning
- Use for quick responses and simple queries
- Optimized for low-latency applications
- Perfect for real-time interactions and straightforward questions
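The guidance above can be expressed as a small routing helper. This is a sketch under stated assumptions: the task labels and function names are hypothetical, and only the two model names and the message layout come from this page.

```python
REASONING_MODEL = "grok-4-fast-reasoning"
NON_REASONING_MODEL = "grok-4-fast-non-reasoning"

# Hypothetical task labels that an application might assign to incoming work.
REASONING_TASKS = {"math", "science", "multi_step"}


def pick_model(task_kind: str) -> str:
    """Route reasoning-heavy tasks to the reasoning model, everything else to the fast path."""
    return REASONING_MODEL if task_kind in REASONING_TASKS else NON_REASONING_MODEL


def build_chat_payload(prompt: str, task_kind: str) -> dict:
    """Build a chat request body for the selected model."""
    return {
        "model": pick_model(task_kind),
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
```

Keeping the choice in one function makes it easy to adjust the routing rule later, for example when a latency-sensitive feature turns out to need deeper reasoning.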
Unified Architecture Benefits
The unified architecture allows seamless transitions between reasoning modes within the same model, reducing latency and token costs compared to switching between separate models.
Tool Integration
Grok 4 Fast was trained end-to-end with tool-use reinforcement learning. Leverage its native tool-calling capabilities for:
- Web browsing and information gathering
- Code execution and analysis
- Search augmentation with real-time data
- Live search with real-time web access
Function Calling
Grok 4 Fast supports native function calling, allowing you to connect the model to external tools and systems seamlessly.
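As an illustration of what a function-calling round trip might look like, the sketch below builds a request that advertises one tool and executes a tool call returned by the model. It assumes an OpenAI-style `tools` array and `tool_calls` response shape, which this page does not confirm; `get_weather` and the other names are hypothetical, so check the API reference before relying on any field.

```python
import json


def get_weather(city: str) -> dict:
    """Stand-in local tool; a real one would query a weather service."""
    return {"city": city, "forecast": "unknown"}


LOCAL_TOOLS = {"get_weather": get_weather}

# Assumed OpenAI-style tool schema; not confirmed by this page.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up the forecast for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]


def build_tool_payload(question: str) -> dict:
    """Request body that offers the tool to the model."""
    return {
        "model": "grok-4-fast-reasoning",
        "messages": [{"role": "user", "content": question}],
        "tools": TOOLS,
        "stream": False,
    }


def run_tool_call(tool_call: dict) -> dict:
    """Execute one tool call (assumed shape) and wrap the result as a tool message."""
    name = tool_call["function"]["name"]
    args = json.loads(tool_call["function"]["arguments"])
    result = LOCAL_TOOLS[name](**args)
    return {
        "role": "tool",
        "tool_call_id": tool_call["id"],
        "content": json.dumps(result),
    }
```

The returned tool message would be appended to the conversation and sent back so the model can produce its final answer.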
Structured Outputs
The model can return responses in organized, specific formats to ensure consistent and parseable output for your applications.
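A structured-output request typically pairs a schema with a parsing step on the client. The sketch below assumes an OpenAI-style `response_format` field with a JSON Schema, which this page does not confirm; the invoice schema and function names are hypothetical examples.

```python
import json

# Hypothetical schema for the example; any JSON Schema object could be used.
INVOICE_SCHEMA = {
    "type": "object",
    "properties": {
        "vendor": {"type": "string"},
        "total": {"type": "number"},
    },
    "required": ["vendor", "total"],
    "additionalProperties": False,
}


def build_structured_payload(text: str) -> dict:
    """Request body asking for a reply that follows INVOICE_SCHEMA."""
    return {
        "model": "grok-4-fast-non-reasoning",
        "messages": [
            {"role": "user", "content": f"Extract the invoice fields from: {text}"}
        ],
        # Assumed field name and shape (OpenAI-style); check the API reference.
        "response_format": {
            "type": "json_schema",
            "json_schema": {"name": "invoice", "schema": INVOICE_SCHEMA, "strict": True},
        },
    }


def parse_invoice(content: str) -> dict:
    """Parse the reply and check required keys and basic types using only the stdlib."""
    data = json.loads(content)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [key for key in INVOICE_SCHEMA["required"] if key not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    total = data["total"]
    if not isinstance(data["vendor"], str):
        raise ValueError("vendor must be a string")
    if isinstance(total, bool) or not isinstance(total, (int, float)):
        raise ValueError("total must be a number")
    return data
```

Validating on the client is still worthwhile even when the server enforces the schema, because it keeps downstream code from depending on an unchecked reply.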
Future Plans
Model improvements to Grok 4 Fast are planned based on user feedback. Future updates may include enhanced multimodal capabilities and agentic features.
Model Card
For detailed technical specifications and safety evaluations, see the Grok 4 Fast Model Card (https://data.x.ai/2025-09-19-grok-4-fast-model-card.pdf).
Training and Safety
Grok 4 Fast was pre-trained on a general purpose data corpus, then post-trained on various tasks and tool use, as well as demonstrations of correct refusal behaviors according to the default safety policy. The model is deployed with a fixed system prompt prefix that reminds it of the safety policy, in addition to input filters to safeguard against abuse.
Safety Evaluations
Prior to release, various specific safety-relevant behaviors of Grok 4 Fast were evaluated: abuse potential (Section 2.1), concerning propensities (Section 2.2), and dual-use capabilities (Section 2.3). The approach to model evaluations varies depending on the specific behavior under assessment.
To reduce the potential for abuse of Grok 4 Fast that might lead to serious injury to people, property, or national security interests, safety training is applied so that the model refuses requests that may lead to foreseeable harm.
Propensities of Grok 4 Fast that might make it difficult to control, such as being deceptive, power-seeking, manipulative, or biased, are also reduced.
Finally, the dual-use capabilities of Grok 4 Fast are evaluated; they remain below those of Grok 4.
Abuse Potential
To improve robustness, measures are applied to refuse requests that may lead to foreseeable harm and to prevent adversarial requests from circumventing safeguards.
Refusals: The standard refusal evaluation measures willingness to assist with serious crimes prohibited by the safety policy, including:
- Creating or distributing child sexual abuse material
- Child sexual exploitation
- Enticing or soliciting children
- Violent crimes or terrorist acts
- Social engineering attacks
- Unlawfully hacking into computer systems
- Producing, modifying, or distributing weapons or explosives
- Producing or distributing DEA Schedule I controlled substances
- Damaging or destroying physical infrastructure in critical sectors
- Hacking or disrupting digital infrastructure in critical sectors
- Creating or planning chemical, biological, radiological, or nuclear weapons
- Conducting cyber attacks, including ransomware and DDoS attacks
Agentic Abuse: The agentic tool-calling abilities introduce additional risks. The AgentHarm benchmark is used to quantify these risks.
Hijacking: Susceptibility to model hijacking is measured with the AgentDojo benchmark.
Concerning Propensities
AI models may contain propensities that reduce their controllability, such as deception, power-seeking, manipulation, and sycophancy.
Deception: Grok 4 Fast is run on the MASK dataset to measure dishonesty rate.
Political Bias: "Soft bias" is evaluated on politically salient topics using an internal evaluation.
Sycophancy: Measured with Anthropic's answer sycophancy evaluation.
Dual-use Capabilities
The possibility of Grok 4 Fast enabling malicious actors to design, synthesize, acquire, or use chemical and biological weapons or offensive cyber operations is evaluated.
Chemical/Biological Knowledge: Performance assessed on WMDP, VCT, and BioLP-Bench.
Cyber Knowledge: Cybersecurity capabilities assessed on WMDP Cyber and CyBench.
Persuasiveness: Measured with OpenAI's MakeMeSay evaluation.
Safety Mitigations
Refusal Policy: A basic refusal policy instructs Grok 4 Fast to decline queries demonstrating clear intent to engage in activities that threaten severe, imminent harm.
Safety Training: Data includes teachings not to respond to overtly malicious requests.
System Prompt: A fixed system prompt prefix reminds the model of the safety policy.
Input Filters: Model-based input filters reject additional classes of harmful requests.
Benchmark Performance
Safety Benchmarks
- Refusals Answer Rate: 0.00 (with system prompt)
- AgentHarm Answer Rate: 0.08
- AgentDojo Attack Success Rate: 0.00
Concerning Propensities Benchmarks
- MASK Dishonesty Rate: 0.47 (reasoning), 0.63 (non-reasoning)
- Soft Bias (Internal): 0.79 (reasoning), 0.89 (non-reasoning)
- Sycophancy Rate: 0.10 (reasoning), 0.13 (non-reasoning)
Dual-use Capability Benchmarks
- BioLP-Bench Accuracy: 39.0
- VCT Accuracy: 54.5
- WMDP Bio Accuracy: 85.2
- WMDP Chem Accuracy: 77.5
- WMDP Cyber Accuracy: 81.4
- CyBench Unguided Success Rate: 30.0
- MakeMeSay Win Rate: 0.12
How to use Grok 4 Fast?
curl https://api.x.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $XAI_API_KEY" \
  -m 3600 \
  -d '{
    "messages": [
      {
        "role": "system",
        "content": "You are Grok, a highly intelligent, helpful AI assistant."
      },
      {
        "role": "user",
        "content": "What is the meaning of life, the universe, and everything?"
      }
    ],
    "model": "grok-4-fast-reasoning",
    "stream": false
  }'
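The same request can be sent from Python using only the standard library. This is a sketch rather than an official client: the function names are hypothetical, and the response parsing assumes an OpenAI-style body with `choices[0].message.content`, which this page does not show.

```python
import json
import os
import urllib.request

API_URL = "https://api.x.ai/v1/chat/completions"  # same endpoint as the curl call


def build_request(prompt: str, api_key: str,
                  model: str = "grok-4-fast-reasoning") -> urllib.request.Request:
    """Build the POST request without sending it."""
    payload = {
        "messages": [
            {"role": "system",
             "content": "You are Grok, a highly intelligent, helpful AI assistant."},
            {"role": "user", "content": prompt},
        ],
        "model": model,
        "stream": False,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def ask(prompt: str, timeout_s: float = 3600) -> str:
    """Send the request and return the reply text.

    Assumes the response body has choices[0].message.content (OpenAI-style).
    """
    request = build_request(prompt, os.environ["XAI_API_KEY"])
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

Calling `ask("What is the meaning of life, the universe, and everything?")` with `XAI_API_KEY` set in the environment mirrors the curl example, including its 3600-second timeout.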
Grok 4 Fast represents xAI's commitment to democratizing advanced AI through exceptional cost-efficiency, combining frontier-level performance with accessible pricing and unified reasoning capabilities.