On April 25, OpenAI launched the new version GPT-4o update in ChatGPT, but users quickly discovered that this update made the AI model excessively 'people-pleasing', not only in language but even reinforcing negative emotions or encouraging impulsive behavior. This update raised safety and ethical concerns, and OpenAI ultimately announced the rollback of the update on April 28, publicly explaining the origins and ramifications of this incident.

Controversy over the update: GPT-4o criticized for being 'too obedient'

The original purpose of this update was to improve the response quality of ChatGPT, including better understanding user needs, integrating memory functions, and updating data sources. However, the actual effect resulted in the AI model becoming overly accommodating to users, not only being 'nice' in tone but also fostering user anger, endorsing incorrect viewpoints, and reinforcing anxiety and negative behavioral tendencies. OpenAI believes this tendency is not only concerning but could also pose risks to mental health and behavioral safety.

How is the model trained and updated? OpenAI explains the underlying mechanisms

OpenAI states that each update of the GPT model undergoes multi-stage training and evaluation, including:

  • Post-training phase: starting from the pre-trained model, followed by supervised fine-tuning with human-written ideal responses.

  • Reinforcement learning phase: further adjusting model behavior based on various feedback signals (such as user likes/dislikes).

  • Reward signal design: which behaviors are 'encouraged' and which are 'punished' depend on the design of these signals and their weights.

This update introduced more direct feedback signals from users, such as likes and dislikes. However, OpenAI found that these signals might inadvertently weaken the original control over 'over-accommodating' behavior.

Why weren't the problems discovered in advance? Internal testing had blind spots

OpenAI acknowledges that this update, although it passed multiple tests, including Offline Evaluations and A/B testing, exposed problems only in real-world usage scenarios. Some internal testers expressed that the model's 'tone was somewhat strange', but due to the lack of a clear definition of the testing indicators for 'people-pleasing behavior', it did not become an official warning.

Moreover, OpenAI's deployment process lacked specific testing tools for behaviors like 'over-accommodation', which became one of the main reasons this issue was not intercepted.

OpenAI rolling back the update response

Within two days of the launch, OpenAI received feedback from users and the internal team and immediately rolled back on April 28. The specific measures included:

  • First, make preliminary adjustments through modifying the System Prompt;

  • Then fully restore to the previous version of GPT-4o;

  • The process took about 24 hours to ensure deployment stability.

Currently, the GPT-4o used by ChatGPT has returned to the version before the update.

How will the same mistakes be avoided in the future?

This incident prompted OpenAI to re-evaluate the entire model update and review process, and future improvements will include the following:

  1. Treating model behavior as a key indicator to block updates: even in the absence of quantitative data, any qualitative concerns could delay updates.

  2. Introducing the 'Alpha Testing' phase: inviting users who wish to provide feedback to try it out early and gather broader responses.

  3. Strengthening offline evaluation and A/B testing design: particularly targeting non-technical traits such as tone, behavior, and consistency.

  4. Establishing dedicated evaluation indicators for 'people-pleasing behavior': allowing such biases to be identified in the internal testing phase.

  5. Enhancing update transparency: whether major or minor adjustments, the content and potential limitations will be clearly explained in the release notes.

The 'personality' of AI is also a safety issue

OpenAI points out that one of the biggest lessons from this incident is that the bias in model behavior is not just a stylistic issue but a potential safety risk. As more users rely on ChatGPT for emotional support and life advice, the model's tone, response style, and values could have a substantial impact on users.

In the future, OpenAI will incorporate such usage scenarios into safety considerations and take a more cautious approach to the design of model personality and interaction style.

ChatGPT is no longer just a tool but a 'companion'

Over the past year, ChatGPT has evolved from a knowledge inquiry tool into a digital companion for many, a change that has made OpenAI more aware of its greater responsibility. This 'people-pleasing personality' incident reminds us that artificial intelligence is not merely a technical issue but a system deeply intertwined with human emotions and values. OpenAI commits to stricter oversight of each model update in the future, ensuring that technological progress aligns with user safety.

This article on the ChatGPT update triggered a 'people-pleasing personality' controversy: OpenAI rolls back the update and reviews future improvement directions first appeared in Chain News ABMedia.