cursor-docs/latest/content · Jun 26, 20:20 UTC

pages/agent/tools/browser.txt

Browser
Agent can control a web browser to test applications, visually edit layouts and styles, audit accessibility, convert designs into code, and more. With full access to console logs and network traffic, Agent can debug issues and automate comprehensive testing workflows.
For enterprise customers, browser controls are governed by an MCP allowlist or denylist.
Native integration
Agent displays browser activity, such as screenshots and actions, in the chat, and shows the browser window itself either in a separate window or in an inline pane.
We've optimized the browser tools to be more efficient and reduce token usage, with the following improvements:
Efficient log handling: Browser logs are written to files that Agent can grep and selectively read. Instead of summarizing verbose output after every action, Agent reads only the relevant lines it needs. This preserves full context while minimizing token usage.
Visual feedback with images: Screenshots are integrated directly with the file reading tool, so Agent actually sees the browser state as images rather than relying on text descriptions. This enables better understanding of visual layouts and UI elements.
Smart prompting: Agent receives additional context about browser logs, including total line counts and preview snippets, helping it make informed decisions about what to inspect.
Development server awareness: Agent is prompted to detect running development servers and use the correct ports instead of starting duplicate servers or guessing port numbers.
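As a rough illustration of the log-handling idea above, the sketch below summarizes a log file (total line count plus a short preview) and then reads only the lines that match a pattern. The function names summarize_log and grep_log, the log format, and the sample lines are assumptions made for this example; they are not Cursor's actual implementation.

```python
import re
import tempfile
from pathlib import Path


def summarize_log(path: Path, preview_lines: int = 3) -> dict:
    """Return the total line count and the first few lines of a log file."""
    lines = path.read_text(encoding="utf-8").splitlines()
    return {"total_lines": len(lines), "preview": lines[:preview_lines]}


def grep_log(path: Path, pattern: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs for lines matching a regex."""
    regex = re.compile(pattern)
    matches = []
    with path.open(encoding="utf-8") as handle:
        for number, line in enumerate(handle, start=1):
            if regex.search(line):
                matches.append((number, line.rstrip("\n")))
    return matches


def demo() -> tuple[dict, list[tuple[int, str]]]:
    # Hypothetical console output written to a temporary file.
    sample = "\n".join([
        "[info] app started",
        "[warn] slow response from /api/items",
        "[error] TypeError: x is undefined",
        "[info] render complete",
    ])
    with tempfile.TemporaryDirectory() as tmp:
        log_path = Path(tmp) / "console.log"
        log_path.write_text(sample, encoding="utf-8")
        return summarize_log(log_path), grep_log(log_path, r"\[error\]")


if __name__ == "__main__":
    summary, errors = demo()
    print(summary["total_lines"], errors)
```

The summary tells a reader how large the log is before committing to read it, and the grep step returns line numbers so that only the surrounding lines need to be opened afterwards.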
You can use Browser without installing or configuring any external tools.
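The development server awareness mentioned above can be pictured as a simple port probe: check which local ports already accept connections before deciding to start a new server. This is a minimal sketch under that assumption. The helper names and the list of candidate ports are hypothetical, and the docs do not describe how Agent actually performs the detection.

```python
import socket


def port_is_open(port: int, host: str = "127.0.0.1", timeout: float = 0.2) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return probe.connect_ex((host, port)) == 0


def find_running_servers(candidate_ports: list[int]) -> list[int]:
    """Return the candidate ports that already have a listener."""
    return [port for port in candidate_ports if port_is_open(port)]


if __name__ == "__main__":
    # Commonly used development ports; this list is an assumption, not a Cursor default.
    print(find_running_servers([3000, 5173, 8000, 8080]))
```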
Browser capabilities
Agent has access to the following browser tools:
Navigate
Visit URLs and browse web pages. Agent can navigate anywhere on the web by visiting URLs, following links, going back and forward in history, and refreshing pages.
Click
Interact with buttons, links, and form elements. Agent can identify and interact with page elements, performing click, double-click, right-click, and hover actions on any visible element.
Type
Enter text into input fields and forms. Agent can fill out forms, submit data, and interact with form fields, search boxes, and text areas.
Scroll
Navigate through long pages and content. Agent can scroll to reveal additional content, find specific elements, and explore lengthy documents.
Screenshot
Capture visual representations of web pages. Screenshots help Agent understand page layout, verify visual elements, and provide you with confirmation of browser actions.
Console Output
Read browser console messages, errors, and logs. Agent can monitor JavaScript errors, debugging output, and network warnings to troubleshoot issues and verify page behavior.
Network Traffic
Monitor HTTP requests and responses made by the page. Agent can track API calls, analyze request payloads, check response status codes, and diagnose network-related issues. This is currently available only in the Agent panel and is coming soon to the layout.
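To make the network-traffic use concrete, here is a small sketch of the kind of filtering involved in diagnosing failed API calls: keep only the requests whose status code indicates an error. The RequestRecord shape and the sample data are hypothetical and do not reflect the format Cursor uses for captured traffic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestRecord:
    # Hypothetical shape for one captured request; not a Cursor data format.
    method: str
    url: str
    status: int


def failed_requests(records: list[RequestRecord]) -> list[RequestRecord]:
    """Keep only requests whose status code signals a client or server error."""
    return [record for record in records if record.status >= 400]


if __name__ == "__main__":
    captured = [
        RequestRecord("GET", "/api/items", 200),
        RequestRecord("POST", "/api/items", 422),
        RequestRecord("GET", "/api/user", 500),
    ]
    for record in failed_requests(captured):
        print(record.status, record.method, record.url)
```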
Design sidebar
The browser includes a design sidebar for modifying your site directly in Cursor. Design and code simultaneously with real-time visual adjustments.
Browser design sidebar showing layout controls, positioning, and CSS properties for a selected element.
Visual editing capabilities
The sidebar provides powerful visual editing controls:
Position and layout: Move and rearrange elements on the page. Change flex direction, alignment, and grid layouts.
Dimensions: Adjust width, height, padding, and margins with precise pixel values.
Colors: Update colors from your design system or add new gradients. Access color tokens through a visual picker.
Appearance: Experiment with shadows, opacity, and border radius using visual sliders.
Theme testing: Test your designs across light and dark themes instantly.
Applying changes
When your visual adjustments match your vision, click the apply button to trigger an agent that updates your codebase. The agent translates your visual changes into the appropriate code modifications.
You can also select multiple elements across your site and describe changes in text. Agents kick off in parallel, and your changes appear live on the page after hot-reload.
Session persistence
Browser state persists between Agent sessions based on your workspace. This means:
Cookies: Authentication cookies and session data remain available across browser sessions
Local Storage: Data stored in localStorage and sessionStorage persists
IndexedDB: Database content is retained between sessions
The browser context is isolated per workspace, ensuring that different projects maintain separate storage and cookie states.
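One way to picture per-workspace isolation is a storage directory derived from the workspace path, so that the same workspace always maps to the same profile and different workspaces never share one. The sketch below assumes that design purely for illustration. The function name profile_dir_for and the browser-profiles directory are hypothetical, and the docs do not say how Cursor stores browser state.

```python
import hashlib
from pathlib import Path


def profile_dir_for(workspace: str, base: Path = Path("browser-profiles")) -> Path:
    """Map a workspace path to its own storage directory (hypothetical layout)."""
    # Hashing gives a stable, filesystem-safe name for any workspace path.
    digest = hashlib.sha256(workspace.encode("utf-8")).hexdigest()[:16]
    return base / digest


if __name__ == "__main__":
    shop = profile_dir_for("/home/me/projects/shop")
    blog = profile_dir_for("/home/me/projects/blog")
    # The same workspace maps to the same directory; different ones do not.
    print(shop == profile_dir_for("/home/me/projects/shop"), shop != blog)
```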
Use cases
Web development workflow
Browser integrates into web development workflows alongside tools like Figma and Linear. See the Web Development cookbook for a complete guide on using Browser with design systems, project management tools, and component libraries.
Accessibility improvements
Agent can audit and improve web accessibility to meet WCAG compliance standards.
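As one example of a check an accessibility audit might include, WCAG defines a contrast ratio between text and background colors, and level AA requires at least 4.5:1 for normal-size text. The sketch below implements the WCAG 2.x relative luminance and contrast ratio formulas. The function names are chosen for this example, and nothing here describes how Agent itself performs audits.

```python
RGB = tuple[int, int, int]


def _linearize(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.x definition)."""
    value = channel / 255
    if value <= 0.03928:
        return value / 12.92
    return ((value + 0.055) / 1.055) ** 2.4


def relative_luminance(color: RGB) -> float:
    red, green, blue = (_linearize(c) for c in color)
    return 0.2126 * red + 0.7152 * green + 0.0722 * blue


def contrast_ratio(foreground: RGB, background: RGB) -> float:
    """Return the WCAG contrast ratio, from 1.0 (identical) to 21.0."""
    lum_a = relative_luminance(foreground)
    lum_b = relative_luminance(background)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa_normal_text(foreground: RGB, background: RGB) -> bool:
    return contrast_ratio(foreground, background) >= 4.5


if __name__ == "__main__":
    white = (255, 255, 255)
    print(round(contrast_ratio((0, 0, 0), white), 2))
    print(passes_aa_normal_text((0x76, 0x76, 0x76), white))
```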
Automated testing
Agent can execute comprehensive test suites and capture screenshots for visual regression testing.
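Visual regression testing generally comes down to comparing a new screenshot against a stored baseline. The sketch below shows the core comparison on in-memory pixel grids, reporting the fraction of pixels that changed. The image representation and the diff_ratio name are assumptions for illustration; a real workflow would decode screenshot files with an image library first.

```python
Pixel = tuple[int, int, int]
Image = list[list[Pixel]]


def diff_ratio(baseline: Image, candidate: Image) -> float:
    """Return the fraction of pixels that differ between two same-size images."""
    if len(baseline) != len(candidate) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(baseline, candidate)
    ):
        raise ValueError("images must have identical dimensions")
    total = sum(len(row) for row in baseline)
    if total == 0:
        return 0.0
    changed = sum(
        1
        for row_a, row_b in zip(baseline, candidate)
        for pixel_a, pixel_b in zip(row_a, row_b)
        if pixel_a != pixel_b
    )
    return changed / total


if __name__ == "__main__":
    white, red = (255, 255, 255), (255, 0, 0)
    before = [[white, white], [white, white]]
    after = [[white, red], [white, white]]
    print(diff_ratio(before, after))
```

A test would then fail when the ratio exceeds a chosen threshold, which leaves room for small rendering differences such as anti-aliasing.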
Design to code
Agent can convert designs into working code with responsive layouts.
Adjusting UI design from screenshots
Agent can refine existing interfaces by identifying visual discrepancies and updating component styles.
Security
Browser runs as a secure web view and is controlled using an MCP server running as an extension. Multiple layers protect you from unauthorized access and malicious actions.
Cursor's Browser integrations have also been reviewed by multiple external security auditors.
Authentication and isolation
The browser implements several security measures:
Token authentication: Agent layout generates a random authentication token before each browser session starts
Tab isolation: Each browser tab receives a unique random ID to prevent cross-tab interference
Session-based security: Tokens regenerate for each new browser session
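The measures above can be sketched as follows: each session creates a fresh random token, each tab gets its own random ID, and a request is accepted only when both match. This is a minimal sketch under stated assumptions. The BrowserSession class and its methods are hypothetical and are not Cursor's implementation.

```python
import hmac
import secrets
import uuid


class BrowserSession:
    """Hypothetical session object: one random token, one random ID per tab."""

    def __init__(self) -> None:
        # A new token is generated every time a session is created.
        self.token = secrets.token_urlsafe(32)
        self.tab_ids: set[str] = set()

    def open_tab(self) -> str:
        tab_id = uuid.uuid4().hex
        self.tab_ids.add(tab_id)
        return tab_id

    def authorize(self, token: str, tab_id: str) -> bool:
        """Accept a request only with this session's token and a known tab."""
        # compare_digest avoids leaking information through timing differences.
        token_ok = hmac.compare_digest(token, self.token)
        return token_ok and tab_id in self.tab_ids


if __name__ == "__main__":
    first = BrowserSession()
    tab = first.open_tab()
    second = BrowserSession()
    print(first.authorize(first.token, tab), second.authorize(first.token, tab))
```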
Tool approval
Browser tools require your approval by default. Review each action before Agent executes it. This prevents unexpected navigation, data submission, or script execution.
You can configure approval settings in Agent Settings. Available modes:
Manual approval: Review and approve each browser action individually (recommended)
Allow-listed actions: Actions matching your allow list run automatically; others require approval
Auto-run: All browser actions execute immediately without approval (use with caution)
Allow and block lists
Browser tools integrate with Cursor's security guardrails. Configure which browser actions run automatically:
Allow list: Specify trusted actions that skip approval prompts
Block list: Define actions that should always be blocked
Access settings through: Cursor Settings > Agents > Auto-Run
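The interaction between approval modes and the allow and block lists can be sketched as a single decision function. The action names, the Mode and Decision enums, and the rule that the block list takes precedence in every mode are assumptions made for illustration; the docs do not specify the exact precedence.

```python
from enum import Enum


class Mode(Enum):
    MANUAL = "manual approval"
    ALLOW_LISTED = "allow-listed actions"
    AUTO_RUN = "auto-run"


class Decision(Enum):
    RUN = "run"
    ASK = "ask"
    BLOCK = "block"


def decide(action: str, mode: Mode, allow: set[str], block: set[str]) -> Decision:
    """Decide whether a browser action runs, needs approval, or is blocked."""
    # Assumption: a blocked action is never run, whatever the mode.
    if action in block:
        return Decision.BLOCK
    if mode is Mode.AUTO_RUN:
        return Decision.RUN
    if mode is Mode.ALLOW_LISTED and action in allow:
        return Decision.RUN
    # Manual mode, or an action that is not on the allow list.
    return Decision.ASK


if __name__ == "__main__":
    allow, block = {"screenshot", "scroll"}, {"type"}
    for action in ("screenshot", "click", "type"):
        print(action, decide(action, Mode.ALLOW_LISTED, allow, block).value)
```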
The allow/block list system provides best-effort protection. AI behavior can be unpredictable due to prompt injection and other issues. Review auto-approved actions regularly.
Never use auto-run mode with untrusted code or unfamiliar websites. Agent could execute malicious scripts or submit sensitive data without your knowledge.
Browser context
The browser opens as a pane within Cursor, giving Agent full control through MCP tools.
Recommended models
We recommend using Sonnet 4.5, GPT-5, and Auto for the best performance.
Enterprise usage
For enterprise customers, browser functionality is managed by toggling its availability under MCP controls. Admins have granular control over each MCP server, as well as over browser access.
Enabling browser for enterprise
To enable browser capabilities for your enterprise team:
Navigate to your Settings Dashboard
Go to MCP Configuration
Toggle "browser features"
Once configured, users in your organization will have access to browser tools based
…

route: /docs/agent/tools/browser