The future of AI lies in autonomous web navigation agents. Major tech companies are investing in browser agents that automate web tasks and boost productivity. This article explores their application scenarios, current challenges, and the opportunities presented by Web3-native solutions. It is based on a piece written by Mario Chow, Figo, and @IOSG, and was organized, compiled, and written by BlockBeats.

In the past 12 months, the relationship between web browsers and automation has changed dramatically. Almost all major tech companies are racing to build autonomous browser agents. Since the end of 2024, the trend has become increasingly apparent: OpenAI launched its Agent mode (originally called Operator) in January 2025, Anthropic introduced the 'Computer Use' feature for the Claude model, Google DeepMind launched Project Mariner, Opera announced the agentic browser Neon, and Perplexity AI launched the Comet browser. The signal is clear: the future of AI lies in agents that can autonomously navigate web pages.

This trend is not just about adding smarter chatbots to browsers. It represents a fundamental shift in how machines interact with digital environments. Browser agents are AI systems capable of 'seeing' web pages and taking action: clicking links, filling out forms, scrolling pages, and entering text, just like human users. This model promises to unlock substantial productivity and economic value, because it can automate tasks that still require manual operation or are too complex for traditional scripts.
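The "see the page, then take an action" cycle described above can be illustrated with a minimal sketch. Everything here is a hypothetical stand-in and not any vendor's implementation: `FakePage` replaces a real browser, and the rule-based `decide` function replaces the model call that a real agent would make on a screenshot or DOM snapshot.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    kind: str            # "type", "click", or "done"
    target: str = ""     # label of the element the action applies to
    text: str = ""       # text to enter for "type" actions


@dataclass
class FakePage:
    """Toy stand-in for a web page (hypothetical, for illustration only)."""
    fields: dict = field(default_factory=dict)
    submitted: bool = False

    def observe(self) -> dict:
        # A real agent would capture a screenshot or DOM snapshot here.
        return {"fields": dict(self.fields), "submitted": self.submitted}

    def apply(self, action: Action) -> None:
        if action.kind == "type":
            self.fields[action.target] = action.text
        elif action.kind == "click" and action.target == "submit":
            self.submitted = True


def decide(observation: dict, goal: dict) -> Action:
    """Rule-based placeholder for the model that picks the next action."""
    for name, value in goal.items():
        if observation["fields"].get(name) != value:
            return Action("type", target=name, text=value)
    if not observation["submitted"]:
        return Action("click", target="submit")
    return Action("done")


def run_agent(page: FakePage, goal: dict, max_steps: int = 10) -> list:
    """Observe, decide, act; stop when done or when the step budget runs out."""
    history = []
    for _ in range(max_steps):
        action = decide(page.observe(), goal)
        history.append(action)
        if action.kind == "done":
            break
        page.apply(action)
    return history


page = FakePage()
history = run_agent(page, {"name": "Ada", "email": "ada@example.com"})
```

The step budget (`max_steps`) is a design choice assumed for this sketch: it keeps the loop from running indefinitely if the page never reaches the goal state.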
▲ GIF demonstration: an AI browser agent in action, following instructions, navigating to the target dataset page, automatically taking screenshots, and extracting the needed information.

Who will win the AI browser war?

Almost all major tech companies (and some startups) are developing their own browser AI agent solutions. Here are some of the most representative projects:

OpenAI – Agent Mode

OpenAI's Agent Mode (formerly known as Operator, launched in January 2025) is an AI agent with its own browser. It can handle a range of repetitive online tasks, such as filling out web forms, ordering groceries, and scheduling meetings, all through the standard web interfaces that humans use.

▲ AI agents schedule meetings like professional assistants: checking calendars, finding available time slots, creating events, sending confirmations, and generating .ics files for you.

Anthropic – Claude's 'Computer Use'

At the end of 2024, Anthropic introduced a new 'Computer Use' feature for Claude 3.5, giving it the ability to operate computers and browsers like a human. Claude can see the screen, move the cursor, click buttons, and enter text. It was the first large-model agent tool of its kind to enter public beta, allowing developers to have Claude navigate websites and applications automatically. Anthropic positions it as an experimental feature, aimed primarily at automating multi-step workflows on the web.

Perplexity – Comet

AI startup Perplexity (known for its Q&A engine) launched the Comet browser in mid-2025 as an AI-driven alternative to Chrome. The core of Comet is a conversational AI search engine built into the omnibox that provides real-time answers and summaries instead of traditional search links. Comet also includes Comet Assistant, a sidebar agent that can automatically execute daily tasks across websites.
For example, it can summarize your open emails, schedule meetings, manage browser tabs, or browse and scrape web information on your behalf. By letting the agent perceive the current page's content through the sidebar interface, Comet aims to integrate browsing and AI assistance seamlessly.

Real Application Scenarios of Browser Agents

The previous section reviewed how major tech companies (OpenAI, Anthropic, Perplexity, etc.) deliver browser agent functionality through various product forms. To better understand their value, we can look at how these capabilities are applied in everyday life and in business workflows.

Daily Web Automation

# E-commerce and Personal Shopping

A very practical scenario is delegating shopping and booking tasks to agents. An agent can fill your online shopping cart and place orders from a fixed list, or search multiple retailers for the lowest price and complete the checkout process on your behalf.

For travel, you can give the AI a task such as: "Help me book a flight to Tokyo next month (ticket price under $800) and a hotel with free Wi-Fi." The agent handles the entire process: searching for flights, comparing options, filling in passenger information, and completing the hotel booking, all through the airline and hotel websites. This goes well beyond existing travel bots: rather than only making recommendations, the agent executes the purchase directly.

# Enhancing Office Efficiency

Agents can automate many of the repetitive business operations that people perform in browsers, such as organizing emails and extracting to-do items, or checking openings across multiple calendars and scheduling meetings automatically. Perplexity's Comet Assistant can already summarize the contents of your inbox through the web interface or add new appointments for you. Once they have your authorization, agents can also log in to SaaS tools to generate regular reports, update spreadsheets, or submit forms.
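As a rough illustration of the calendar check mentioned above (finding an opening across multiple calendars), the slot search itself can be sketched as follows. This covers only the scheduling logic, not how an agent would read busy times from a calendar's web interface. The function name `first_free_slot` and the sample calendars are hypothetical.

```python
from datetime import datetime, timedelta


def first_free_slot(busy, day_start, day_end, duration):
    """Return (start, end) of the earliest gap of at least `duration`, or None.

    `busy` is a list of (start, end) datetime pairs pooled from all calendars.
    """
    cursor = day_start
    for start, end in sorted(busy):
        # The usable gap runs from the cursor to the next busy block,
        # clipped so it never extends past the end of the working day.
        gap_end = min(start, day_end)
        if gap_end - cursor >= duration:
            return cursor, cursor + duration
        cursor = max(cursor, end)
    if day_end - cursor >= duration:
        return cursor, cursor + duration
    return None


def at(hour, minute=0):
    """Helper for building sample times on one (assumed) day."""
    return datetime(2025, 1, 6, hour, minute)


# Hypothetical busy blocks from two people's calendars.
alice = [(at(9), at(10)), (at(13), at(14))]
bob = [(at(9, 30), at(11)), (at(11, 30), at(12))]

slot = first_free_slot(alice + bob, at(9), at(17), timedelta(minutes=30))
```

Pooling every calendar's busy blocks into one sorted list keeps the search to a single pass. A slot is accepted only if it is free for everyone.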
Imagine an HR agent that automatically logs in to various recruitment websites to post job openings, or a sales agent that updates lead information in a CRM system. These mundane daily tasks would otherwise consume a significant amount of employee time, but AI can complete them by automating web forms and page operations.

Beyond single tasks, agents can also chain complete workflows across multiple web systems. They can log into various dashboards for troubleshooting, or orchestrate processes such as onboarding a new employee (creating accounts on multiple SaaS websites). All of these steps require operations across different web interfaces, which is precisely the strength of browser agents. Essentially, any multi-step operation that currently requires navigating multiple websites can be delegated to agents.

Current Challenges and Limitations

Despite their immense potential, today's browser agents are still far from perfect. Current implementations reveal some long-standing technical and infrastructure issues:

Mismatched Architecture

Modern...