Author: Li Dan
Source: Wall Street Journal
OpenAI's most anticipated product of the year has arrived.
On Thursday, August 7, Eastern Time, OpenAI announced the launch of its next-generation flagship AI model, GPT-5. It is OpenAI's first 'integrated' AI system: the first product to combine the reasoning capabilities of the o-series models with the fast responses of the GPT-series models.
OpenAI CEO Sam Altman praised GPT-5 at the new model launch, calling it "the best model in the world," a "significant upgrade" compared to previous models, and stated that its release marks an "important step" for OpenAI in the pursuit of Artificial General Intelligence (AGI).
OpenAI said GPT-5 performed strongly across multiple benchmarks, reaching state-of-the-art levels in programming, mathematics, health, and other areas. GPT-5 achieved 74.9% accuracy on the SWE-bench Verified coding benchmark, slightly ahead of Anthropic's newly released Claude Opus 4.1. GPT-5 also hallucinates far less, with an error rate of only 4.8%, well below the 20.6% of the previous model, GPT-4o.
Starting this Thursday, GPT-5 is available as the default model to all free ChatGPT users and to paid Plus, Pro, and Team subscribers, and it will roll out to the Enterprise and Edu paid plans within a week.
As with GPT-4o, the difference between the free and paid tiers of GPT-5 lies in usage limits. Plus users get higher limits, while Pro users get unlimited use plus access to the enhanced GPT-5 Pro. For free users, the full reasoning capability may take a few days to roll out completely. Once free users reach their GPT-5 usage limit, OpenAI will switch them to the smaller GPT-5 mini model.
OpenAI also announced on Wednesday that it would provide ChatGPT products to U.S. federal government agencies for a symbolic fee of $1 per year. Specifically, this refers to the enterprise version of ChatGPT, which includes enhanced safety and privacy features.
As soon as OpenAI officially announced GPT-5, Microsoft said that starting this Thursday it would integrate GPT-5 across a wide range of its products, including Microsoft 365 Copilot, Copilot, GitHub Copilot, and Azure AI Foundry, so that both enterprise and consumer users can immediately try GPT-5's advanced reasoning and coding strengths.
GPT-5 has three major advantages: programming, creative writing, and health.
OpenAI's GPT-5 release announcement begins by stating that GPT-5 is OpenAI's "most intelligent, fastest, and most practical model, with built-in cognitive abilities that allow everyone to possess expert-level wisdom."
According to OpenAI, GPT-5, its "most powerful model," has improved significantly in three key areas.
The first is programming. GPT-5 is OpenAI's most powerful coding model to date. It excels at complex front-end generation and at debugging large codebases, and it can create attractive, responsive websites, apps, and games from a single prompt. Early testers noted better design choices in areas such as spacing, typography, and white space.
On SWE-bench Verified, a benchmark built from real-world coding tasks drawn from GitHub, GPT-5 with reasoning reached 74.9% accuracy on its first attempt, surpassing OpenAI's reasoning model o3 at 69.1% and GPT-4o at 30.8%.
Commentators noted that this puts GPT-5 slightly ahead of Anthropic's Claude Opus 4.1 and well ahead of Google DeepMind's Gemini 2.5 Pro, which scored 74.5% and 59.6%, respectively, on SWE-bench Verified.
However, on Humanity's Last Exam, which measures expert-level ability across mathematics, the humanities, and the natural sciences, the enhanced version of GPT-5 with extended reasoning scored 42% when using tools. That is slightly below the 44.4% achieved by xAI's Grok 4 Heavy.
Altman said GPT-5 is particularly good at building entire software apps on demand, a practice known as "vibe coding," in which AI generates functional code from natural-language prompts to speed up development.
As an example, OpenAI researchers asked GPT-5 to create a web app that helps English speakers learn French. The app had to have an engaging theme and include flashcards, quizzes, a classic Snake game, and a way to track daily learning progress.
The researchers submitted the same prompt in two GPT-5 windows and, after a few minutes, had two different apps. OpenAI's lead said these apps "have some flaws," but users can adjust the AI-generated software to their own preferences, for example by changing backgrounds or adding more tabs.
In creative writing, GPT-5 can handle complex tasks such as writing unrhymed iambic pentameter or naturally flowing free verse. Nick Turley, OpenAI's vice president in charge of ChatGPT, said GPT-5 shows "better taste" in creative tasks, with responses that feel more natural.
Health consulting is the third important enhancement area.
GPT-5 can more actively flag potential health issues and help users interpret medical results, although OpenAI emphasizes that ChatGPT cannot replace medical professionals.
On a test called HealthBench Hard Hallucinations, GPT-5 with reasoning produced hallucinated misinformation at a rate of only 1.6%, far below the 15.8% for GPT-4o and 12.9% for o3.
Hallucinations drop significantly; a new approach to safety training.
OpenAI stated that GPT-5 is more reliable and practical compared to previous models, able to answer real-world questions more accurately, with significantly reduced chances of hallucinations.
With web search enabled on anonymized prompts representative of ChatGPT's production traffic, GPT-5 responses are about 45% less likely to contain factual errors than GPT-4o responses; with reasoning, GPT-5 responses are about 80% less likely to contain factual errors than o3 responses. OpenAI's chart shows an error rate of only 4.8% for GPT-5 responses, compared with 20.6% for GPT-4o and 22% for o3.
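The "percent lower" claims are relative reductions, which can be checked against the error rates quoted above. The short Python sketch below is only an illustration of that arithmetic; the function name and the dictionary are made up for this example. Under that reading, 4.8% against o3's 22% works out to roughly 78% lower, in line with the "about 80%" figure. The "about 45%" figure cannot be recomputed the same way, because the article does not give the rate it is based on.

```python
def relative_reduction(baseline: float, new: float) -> float:
    """Fraction by which `new` is lower than `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - new) / baseline


# Error rates in percent, as quoted in the article.
rates = {"GPT-5": 4.8, "GPT-4o": 20.6, "o3": 22.0}

vs_o3 = relative_reduction(rates["o3"], rates["GPT-5"])
vs_4o = relative_reduction(rates["GPT-4o"], rates["GPT-5"])

print(f"GPT-5 vs o3:     {vs_o3:.1%} lower")   # 78.2% lower
print(f"GPT-5 vs GPT-4o: {vs_4o:.1%} lower")   # 76.7% lower
```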
OpenAI also announced a new form of safety training for GPT-5 called safe completions. It teaches the model to give the most helpful answer possible within safety boundaries. Sometimes that means answering a user's question only partially or only at a high level.
If a refusal is necessary, the trained GPT-5 will transparently inform the user of the reason for the refusal and provide safe alternatives.
In controlled experiments and in its production models, OpenAI found that safe completions are more nuanced: they handle dual-use questions better, are more robust to ambiguous intent, and reduce unnecessary over-refusals.
OpenAI's post-training lead, Michelle Pokrass, stated: "GPT-5 has been trained to recognize when tasks cannot be completed, avoid guessing, and explain limitations more clearly, resulting in fewer baseless assertions compared to previous models."
Introducing four optional ChatGPT chat personality presets.
OpenAI said GPT-5 is better at following instructions, including custom instructions. OpenAI will launch a research preview of four preset personalities for all ChatGPT users.
The initial four personality options—Cynic, Robot, Listener, and Nerd—are optional, and users can adjust them at any time in settings to match ChatGPT's communication style with their own.
The four personalities will initially apply to text chat and later expand to voice chat, letting users set ChatGPT's interaction style, whether concise and professional, thoughtfully supportive, or slightly sarcastic, without writing custom prompts.
OpenAI said all of these new personalities meet or exceed its internal evaluation standards for reducing sycophantic behavior.
Altman praises historic breakthrough; returning to GPT-4 feels quite poor.
At Thursday's briefing, Altman gave GPT-5 high praise, positioning it as a significant milestone towards AGI. He stated:
"Throughout any period in history, having something like GPT-5 is unimaginable." "This is the first time it feels like talking to an expert in any field."
Altman even went so far as to disparage GPT-4 to elevate GPT-5. He said:
"I tried going back to GPT-4, but it felt pretty bad."
GPT-5 adopts a unified system architecture with a real-time router that automatically decides whether to respond quickly or engage in deeper "thinking," based on the type and complexity of the conversation and the tools it requires. Users no longer need to choose the appropriate setting themselves, which makes ChatGPT easier to use.
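To make the routing idea concrete, here is a toy Python sketch. It is purely illustrative: the article does not describe how OpenAI's router actually works, so the `Request` fields, the `route` function, the complexity score, and the threshold are all hypothetical stand-ins for the three signals mentioned above (type, complexity, and tool requirements).

```python
from dataclasses import dataclass


@dataclass
class Request:
    # All fields are hypothetical stand-ins, not OpenAI's real signals.
    kind: str          # e.g. "chat", "code", "math"
    complexity: float  # assumed score in [0, 1]
    needs_tools: bool  # whether the request requires tool use


def route(req: Request, threshold: float = 0.6) -> str:
    """Pick "thinking" for hard or tool-dependent requests, else "fast"."""
    if req.needs_tools or req.complexity >= threshold:
        return "thinking"
    # Assumption: technical request types switch to reasoning earlier.
    if req.kind in {"code", "math"} and req.complexity >= threshold / 2:
        return "thinking"
    return "fast"


print(route(Request("chat", 0.1, False)))  # fast
print(route(Request("math", 0.4, False)))  # thinking
print(route(Request("chat", 0.2, True)))   # thinking
```

A fixed rule set like this only shows the shape of the decision; a production router would presumably rely on learned signals rather than hand-picked thresholds.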
In internal benchmarks of economically valuable work, GPT-5 in reasoning mode matched or outperformed experts in about half of cases, across more than 40 occupations including law, logistics, sales, and engineering. OpenAI VP Nick Turley said: "This model feels really good."
Altman likened using GPT-5 to having a team of experts, each with a PhD, at your disposal. He also said: "In many new ways, people are limited by ideas, but not really by the ability to execute."
Microsoft integrates GPT-5 across its products to seize the opportunity.
On the day of GPT-5's release, Microsoft announced it would integrate the model across its broad product line. For enterprises, Microsoft 365 Copilot will use GPT-5 to better handle complex problems, stay on track in long conversations, and understand user context. Enterprise users can apply its reasoning capabilities to emails, documents, and files.
For consumers, Microsoft's new intelligent mode for Copilot will utilize GPT-5 to help users discover the best solutions. Users can experience GPT-5 for free through copilot.microsoft.com or the Copilot app on Windows, Mac, Android, and iOS devices.
Developers will get GPT-5 support through GitHub Copilot and Visual Studio Code for writing, testing, and deploying code. The Azure AI Foundry platform will offer all GPT-5 models along with an AI-driven model router that selects the optimal model based on task complexity, performance needs, and cost efficiency.
Microsoft's AI red team tested the GPT-5 reasoning model under strict safety protocols. The results showed that the model has one of the strongest AI safety profiles in OpenAI's history against attack modes such as malware generation and fraud automation.