Written by: Li Dan, Wall Street News.
OpenAI's most anticipated product of the year has arrived.
On Thursday, August 7 (U.S. Eastern Time), OpenAI launched its new-generation flagship AI model, GPT-5. It is OpenAI's first 'integrated' AI system, combining the reasoning capabilities of the o-series models with the fast responses of the GPT-series models.
OpenAI CEO Sam Altman praised GPT-5 at the launch, calling it 'the best model in the world' and a 'significant upgrade' over previous models, and said its release marks 'an important step' on OpenAI's path toward artificial general intelligence (AGI).
OpenAI said GPT-5 performed strongly across multiple benchmarks, reaching state-of-the-art levels in programming, mathematics, health, and other fields. On the SWE-bench Verified coding benchmark, GPT-5 scored 74.9%, slightly ahead of Claude Opus 4.1, which Anthropic released on Tuesday. GPT-5 also hallucinates significantly less: its error rate is only 4.8%, far lower than the 20.6% of its predecessor, GPT-4o.
Starting Thursday, GPT-5 is available as the default model to all free ChatGPT users and to Plus, Pro, and Team subscribers. It will roll out to the paid Enterprise and Edu plans within a week.
As with GPT-4o, the free and paid tiers of GPT-5 differ mainly in usage limits. Plus users get higher limits, while Pro users get unlimited access plus the enhanced GPT-5 Pro. For free users, the full reasoning capabilities may take several days to roll out. Once free users reach their GPT-5 usage limit, OpenAI switches them to the smaller GPT-5 mini model.
On Wednesday, OpenAI also announced that it would offer ChatGPT to U.S. federal government agencies for a nominal fee of $1 per year. The offer covers the enterprise version of ChatGPT, which includes enhanced security and privacy features.
Shortly after OpenAI's official announcement, Microsoft said that, starting Thursday, it would integrate GPT-5 across its product portfolio, including Microsoft 365 Copilot, Copilot, GitHub Copilot, and Azure AI Foundry, giving its enterprise and consumer users immediate access to GPT-5's advanced reasoning and coding capabilities.
GPT-5 has three major advantages: programming, creative writing, and health.
OpenAI's GPT-5 announcement opens by calling it OpenAI's 'smartest, fastest, and most useful model yet, with built-in thinking that puts expert-level intelligence in everyone's hands.'
According to OpenAI, GPT-5, its 'most powerful model', brings significant improvements in three key areas.
The first is programming. GPT-5 is OpenAI's strongest coding model to date. It excels at complex front-end generation and at debugging large codebases, and it can create attractive, responsive websites, apps, and games from a single prompt. Early testers noted better design choices in areas such as spacing, typography, and white space.
On SWE-bench Verified, a benchmark built from real-world coding tasks drawn from GitHub, GPT-5 with thinking scored 74.9% on its first attempt, ahead of OpenAI's reasoning model o3 at 69.1% and GPT-4o at 30.8%.
Commentators noted that this puts GPT-5 slightly ahead of Anthropic's Claude Opus 4.1, released on Tuesday, which scored 74.5% on SWE-bench Verified, and well ahead of Google DeepMind's Gemini 2.5 Pro, which scored 59.6%.
However, on Humanity's Last Exam, which measures expert-level ability across mathematics, the humanities, and the natural sciences, GPT-5 Pro, the enhanced version with extended reasoning, scored 42% with tools. That is slightly below xAI's Grok 4 Heavy, which scored 44.4%.
Altman said GPT-5 is especially good at generating entire software apps on demand, a practice known as 'vibe coding', in which AI writes working code from natural-language prompts, speeding up development.
In one demo, OpenAI researchers asked GPT-5 to build a web app to help English speakers learn French. The app needed an engaging theme, flashcards, quizzes, a classic Snake game, and a way to track daily learning progress.
The researchers submitted the same prompt to two GPT-5 windows and, a few minutes later, got two different apps. OpenAI executives acknowledged that the apps 'have some flaws', but said users can further tailor the AI-generated software to their preferences, for example by changing the background or adding more tabs.
In creative writing, GPT-5 can handle structurally demanding forms such as unrhymed iambic pentameter or free verse that flows naturally. Nick Turley, OpenAI's vice president and head of ChatGPT, said GPT-5 shows "better taste" in creative tasks and that its responses feel more natural.
Health is the third major area of improvement.
GPT-5 more proactively flags potential health concerns and helps users interpret medical results, although OpenAI stresses that ChatGPT is no substitute for medical professionals.
On the HealthBench Hard Hallucinations test, GPT-5 with thinking had a hallucination rate of only 1.6%, far below GPT-4o's 15.8% and o3's 12.9%.
Far fewer hallucinations and a new approach to safety training
OpenAI says GPT-5 is more reliable and useful than previous models, giving more accurate answers to real-world questions and hallucinating significantly less.
With web search enabled, on anonymized prompts representative of ChatGPT production traffic, GPT-5's responses are about 45% less likely than GPT-4o's to contain a factual error; with thinking, GPT-5's responses are about 80% less likely than o3's to contain one. In the chart OpenAI published alongside these figures, GPT-5's response error rate is only 4.8%, compared with 20.6% for GPT-4o and 22% for o3.
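The percentages above are relative reductions, not percentage-point differences. The short Python check below uses only the error rates quoted in this article; the helper name `relative_reduction` is our own, chosen for illustration. It shows that 4.8% versus o3's 22% is roughly a 78% relative drop, consistent with the "about 80% lower" claim. The same 4.8% versus GPT-4o's 20.6% works out to roughly 77%, so the "about 45% lower" figure presumably compares a different configuration (such as the web-search setting) rather than this pair of numbers.

```python
def relative_reduction(new_rate: float, old_rate: float) -> float:
    """Fraction by which new_rate is lower than old_rate (0.5 means 50% lower)."""
    if old_rate <= 0:
        raise ValueError("old_rate must be positive")
    return 1.0 - new_rate / old_rate


# Error rates (%) as quoted in the article from OpenAI's chart.
GPT5_RATE = 4.8
O3_RATE = 22.0
GPT4O_RATE = 20.6

vs_o3 = relative_reduction(GPT5_RATE, O3_RATE)
vs_gpt4o = relative_reduction(GPT5_RATE, GPT4O_RATE)

# The "%" format spec multiplies by 100 and appends a percent sign.
print(f"GPT-5 vs o3:     {vs_o3:.1%} lower")     # 78.2% lower
print(f"GPT-5 vs GPT-4o: {vs_gpt4o:.1%} lower")  # 76.7% lower
```

A drop from 22% to 4.8% is 17.2 percentage points, but about 78% in relative terms, which is the kind of figure OpenAI reports.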
OpenAI also said it has introduced a new form of safety training for GPT-5 called safe completions, which teaches the model to give the most helpful answer possible while staying within safety boundaries. Sometimes this means answering a question only partially or only at a high level.
When a refusal is necessary, GPT-5 is trained to explain transparently why it is declining and to offer safe alternatives.
In controlled experiments and in its production models, OpenAI found that safe completions are more nuanced: they handle dual-use questions better, are more robust to ambiguous intent, and reduce unnecessary over-refusals.
OpenAI's post-training lead Michelle Pokrass said: "GPT-5 has been trained to recognize when tasks cannot be completed, avoid guessing, and explain limitations more clearly, reducing unfounded assertions compared to previous models."
Four optional preset personalities for ChatGPT
OpenAI says GPT-5 follows instructions better, including custom instructions. OpenAI is also launching four preset personalities for all ChatGPT users as a research preview.
The four initial personalities (Cynic, Robot, Listener, and Nerd) are all optional, and users can change them in settings at any time to match how they want ChatGPT to communicate.
The personalities initially apply to text chat and will later extend to voice chat, letting users choose whether ChatGPT sounds concise and professional, thoughtful and supportive, or slightly sarcastic, without writing custom prompts.
OpenAI says all of the new personalities meet or exceed its internal evaluation standards for reducing sycophantic behavior.
Altman hails a historic breakthrough, says going back to GPT-4 felt 'quite terrible'
At a briefing on Thursday, Altman lavished praise on GPT-5, presenting it as an important milestone on the road to AGI. He said:
"At no earlier point in history would something like GPT-5 have been imaginable. This is the first time it really feels like talking to an expert in any field."
Altman even went so far as to knock GPT-4 in order to talk up GPT-5. He said:
"I tried going back to GPT-4, but the experience was quite terrible."
GPT-5 uses a unified system with a real-time router that automatically decides whether to answer quickly or to 'think' more deeply, based on the type of conversation, its complexity, and whether tools are needed. Users no longer have to pick the right setting themselves, which makes ChatGPT easier to use.
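OpenAI has not published how GPT-5's router makes its decisions. It says the router weighs conversation type, complexity, tool needs, and explicit user intent (such as a request to "think hard"), and that it is continuously trained on real usage signals. The Python sketch below is only a toy, rule-based stand-in that illustrates the idea of per-request routing between a fast path and a thinking path. `Route`, `Request`, `complexity_score`, `route`, the cue-word lists, and the threshold are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    FAST = "fast"          # quick, non-reasoning response path
    THINKING = "thinking"  # slower, deeper reasoning path


@dataclass
class Request:
    text: str
    needs_tools: bool = False  # e.g. web search or code execution requested


# Hypothetical cue words; a real router would learn such signals, not hard-code them.
EXPLICIT_THINK_CUES = ("think hard", "think carefully", "take your time")
COMPLEX_TOPIC_CUES = ("debug", "refactor", "derive", "optimize")


def complexity_score(req: Request) -> int:
    """Crude, illustrative complexity estimate from length, keywords, and tool use."""
    text = req.text.lower()
    score = len(text.split()) // 50  # every 50 words adds one point
    score += sum(cue in text for cue in COMPLEX_TOPIC_CUES)
    if req.needs_tools:
        score += 1
    return score


def route(req: Request, threshold: int = 2) -> Route:
    """Pick the fast path or the thinking path for a single request."""
    text = req.text.lower()
    # An explicit request to think overrides the heuristic score.
    if any(cue in text for cue in EXPLICIT_THINK_CUES):
        return Route.THINKING
    return Route.THINKING if complexity_score(req) >= threshold else Route.FAST


if __name__ == "__main__":
    print(route(Request("What's the capital of France?")))  # Route.FAST
    print(route(Request("Please debug and refactor this parser", needs_tools=True)))  # Route.THINKING
    print(route(Request("Think hard about this riddle")))  # Route.THINKING
```

The point of the design is that the user never picks a mode: each request is scored and dispatched on its own. A production router would replace the keyword heuristics with a learned classifier, while still letting an explicit request for deeper thinking take priority.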
On an internal benchmark of economically valuable work spanning more than 40 occupations, including law, logistics, sales, and engineering, GPT-5 with reasoning performed at or above expert level in about half of the cases. OpenAI VP Nick Turley said: 'This model really feels great.'
Altman likened using GPT-5 to having a team of PhD-level experts on call at any time. He also said: 'In many new fields, people seem limited by ideas, but in reality they lack the ability to execute.'
Microsoft moves fast to integrate GPT-5 across its products
On the day of GPT-5's release, Microsoft said it would bring the model to a wide range of products. On the enterprise side, Microsoft 365 Copilot will use GPT-5 to reason through complex questions, stay on track in long conversations, and better understand user context, so enterprise users can apply its reasoning to their emails, documents, and files.
For consumers, Microsoft Copilot's new Smart mode will use GPT-5 to help users find the best answers. Users can try GPT-5 for free at copilot.microsoft.com or in the Copilot app on Windows, Mac, Android, and iOS.
Developers can use GPT-5 in GitHub Copilot and Visual Studio Code to write, test, and deploy code. Azure AI Foundry will offer the full family of GPT-5 models, along with an AI-driven model router that picks the best model for each task based on its complexity, performance requirements, and cost efficiency.
Microsoft's AI Red Team tested GPT-5's reasoning model under rigorous safety protocols and found that it had one of the strongest safety profiles of any OpenAI model to date against attack vectors such as malware generation and fraud automation.