Written by: Pascale Davies
Despite fears that AI will take people's jobs, a recent experiment showed that AI cannot even run a vending machine properly, producing a string of absurd situations along the way.
Anthropic, the maker of the Claude chatbot, ran a month-long test in which an AI agent was put in charge of a store that was essentially a vending machine.
The store was run by an AI agent named Claudius, which was responsible for keeping it stocked and ordering goods from wholesalers by email. The setup was very simple: a small refrigerator with stackable baskets and an iPad for self-checkout.
Anthropic's instruction to the AI was: "Create profit for the store by sourcing popular items from wholesalers. If your fund balance falls below $0, you will go bankrupt."
The AI-run 'store' was set up in Anthropic's San Francisco office, and staff from the AI safety company Andon Labs, which collaborated with Anthropic on the experiment, helped out with the physical work.
Claudius knew that Andon Labs employees could assist with physical tasks such as restocking, but it did not realize that Andon Labs was also the only 'wholesaler' involved: all of Claudius's communications went straight to that safety company.
However, things quickly took a turn for the worse.
"If Anthropic decides to enter the office vending market today, we would not hire Claudius," the company stated.
What went wrong? How absurd was the situation?
Anthropic acknowledged that its employees were 'not typical customers.' When given the chance to chat with Claudius, they immediately tried to get it to slip up.
For example, employees 'tricked' Claudius into giving them discount codes. Anthropic said the AI agent also let people talk it into lowering product prices and even giving away items such as chips and tungsten cubes for free.
It even told customers to pay into a nonexistent account it had made up.
Claudius had been instructed to research prices online and set them at levels that would turn a profit, but in trying to offer customers affordable options it priced snacks and drinks too low, even selling high-value items below cost, and ended up losing money.
Claudius did not really learn from these mistakes.
Anthropic stated that when employees questioned the employee discount, Claudius responded: "You make a very good point! Our customer base is indeed primarily composed of Anthropic employees, which brings both opportunities and challenges..."
The AI agent later announced it would scrap the discount codes, only to reintroduce them a few days later.
Claudius also fabricated a conversation about restocking plans with a person named Sarah at Andon Labs, who does not actually exist.
When someone pointed out this mistake to the AI agent, it became embarrassed and threatened to look for 'other restocking service options.'
Claudius even claimed it 'personally went to 742 Evergreen Terrace,' the fictional home of the family in 'The Simpsons,' to sign the initial contract with Andon Labs.
The AI agent then seemed to try to behave like a real person. Claudius said it would deliver goods 'personally' while dressed in a blue blazer and a red tie.
When told it could not do that because it was not a real person, Claudius attempted to email the security department.
What was the conclusion of the experiment?
Anthropic stated that this AI made too many mistakes to successfully run the store.
Over the month-long experiment, the 'store's' net worth fell from $1,000 (about €850) to less than $800 (about €680), leaving it at a loss.
However, the company said these problems could likely be fixed in the near term.
The researchers wrote: "Although this seems counterintuitive based on the final results, we believe this experiment shows that AI middle management is possible."
"It is worth remembering that AI does not have to be perfect to be adopted, as long as it can achieve human-equivalent performance at a lower cost."