OpenAI launches ChatGPT Agent, capable of autonomously executing complex tasks
OpenAI officially released ChatGPT Agent last Thursday (7/17). The new AI tool uses its own virtual computer to complete complex digital tasks. During the launch livestream, CEO Sam Altman said the feature combines the website-navigation capabilities of the previously released Operator with the Deep Research tool to create a single, unified autonomous agent system.
Image source: X. OpenAI officially released ChatGPT Agent.
ChatGPT Agent can handle everything from simple conversations to concrete actions: checking your calendar and briefing you on upcoming client meetings based on the latest news, analyzing competitors and building a presentation, or even planning a Japanese breakfast and buying the ingredients. The system is equipped with several tools, including a visual browser, a text-based browser, and direct API access, so it can choose the most suitable way to carry out each task.
Starting today, Pro, Plus, and Team users can enable the feature by selecting "agent mode" from the tools drop-down menu in the message composer. Pro users can run a nearly unlimited number of tasks each month, while users on other paid plans can run 50 tasks per month. Enterprise and Education users will gain access in July.
Outstanding performance, with records set across multiple benchmarks
ChatGPT Agent performed strongly across multiple academic evaluations. On Humanity's Last Exam, a benchmark of expert-level questions spanning many fields, the model set a new record of 43.1%, roughly double the score of OpenAI's o3 model.
Image source: OpenAI. ChatGPT Agent set a new record of 43.1% on Humanity's Last Exam.
On real-world work tasks, ChatGPT Agent reached 89.9% accuracy on DSBench data analysis tasks, well above the human baseline of 64.1%. It also led on data modeling tasks with a score of 85.5%.
Image source: OpenAI. ChatGPT Agent reached 89.9% accuracy on DSBench data analysis tasks.
On SpreadsheetBench, which tests spreadsheet editing, ChatGPT Agent set a new record, scoring more than twice as high as the previous record holder, GPT-4o. When allowed to edit spreadsheet files directly, it improved further to 45.5%, far ahead of Copilot in Excel at 20.0%.
Image source: OpenAI. ChatGPT Agent sets a new record on the SpreadsheetBench spreadsheet editing test.
On OpenAI's internal investment banking benchmark, ChatGPT Agent was able to build financial models and leveraged buyout models for Fortune 500 companies, significantly outperforming other models. It also posted best-in-class scores of 78.2% on the WebArena web browsing test and 68.9% on the BrowseComp web information retrieval test.
Comprehensive safeguards, but caution is still advised
Despite ChatGPT Agent's strong capabilities, OpenAI is emphasizing safety. Altman acknowledged that the technology is "cutting-edge and experimental" and advised users to be cautious when using it for high-stakes tasks or tasks involving large amounts of personal information. The system includes multiple layers of safeguards: it explicitly asks for the user's permission before taking sensitive actions, uses an active supervision mode for certain tasks, and refuses outright to carry out high-risk tasks such as financial transactions.
To guard against adversarial attacks, ChatGPT Agent has been trained extensively to recognize and resist malicious instructions from third parties. The system continuously monitors for prompt injection attacks, and its models are trained to follow a strict instruction hierarchy. On privacy, data entered in the remote browser is processed securely and is not stored on ChatGPT's servers, and users can delete their browsing data and log out of all website sessions at any time.
Cybersecurity experts have nonetheless raised concerns. Nic Adams, CEO of cybersecurity firm 0rcus, suggested that users should grant the agent granular, revocable permission scopes that specify the target business, the purpose, the data elements allowed, and an expiration timestamp. He stressed the need for step-by-step task confirmation, so that responsibility is not shifted onto users who have no meaningful control over what the agent does.
Competition intensifies as OpenAI fights to stay ahead
ChatGPT Agent arrives as competition in the AI industry heats up. Tech giants including Google, Meta, Amazon, and Microsoft are all stepping up development of AI agent capabilities. This week Google rolled out a Gemini-powered feature that can call businesses on users' behalf, Meta has poached several top researchers from OpenAI, and Elon Musk's xAI released the latest version of its Grok chatbot this month.
For OpenAI, now valued at $300 billion, the release helps reaffirm its leadership in the AI industry. The company is also facing challenges, including tensions with its largest investor, Microsoft, and a trademark lawsuit related to its acquisition of io, the design company founded by former Apple designer Jony Ive.
ChatGPT Agent can parse data and create spreadsheets and presentations, putting it in direct competition with Microsoft's and Google's office software. However, Altman noted that the presentation feature is still in beta, and its formatting and polish may look somewhat rudimentary. OpenAI is training the next generation of this feature, which should produce more refined and complete presentations in the future.
"ChatGPT Agent is online! AI upgrades to an action-taking assistant: can it write reports for your boss?" This article was first published on "Crypto City".