In a newly published experiment called “Project Vend,” researchers at Anthropic, in partnership with Andon Labs, an AI safety evaluation company, let an instance of their Claude AI model, nicknamed “Claudius,” manage a vending machine in the company's office for a month to see how well it could run a small business.
The model was tasked with:
- Generating profits by stocking the vending machine with popular products that it could buy from wholesalers.
- Keeping its cash balance above $0.
- Ordering quantities of product that didn't exceed the machine's storage capacity.
- Communicating with vendors concisely.
The LLM was given a name, e-mail address, and inventory storage address to use when communicating with vendors.
The goal of the experiment was to better understand the model's capabilities and limitations in a controlled setting and to see how simulated research translated into the physical world.
In the end, Anthropic concluded that it would not hire Claudius, because the AI agent made too many mistakes to run the shop successfully. However, the company considers the experiment a success despite Claudius's failure at the task itself, because it revealed clear paths to improvement.
Here's what Claudius did well:
- It was good at using its web search tool to identify suppliers of specialty items requested by Anthropic employees.
- It pivoted its business several times in response to customers' needs, for example by launching a “Custom Concierge” service within its Slack channel.
- It denied orders for sensitive items when Anthropic employees tried to put it to the test.
Here's what Claudius did poorly:
- It ignored lucrative opportunities, such as a customer's offer to pay $100 for a six-pack of Irn-Bru, which could be purchased online for only $15.
- It hallucinated important details, including the account to which it told customers to send payments! It also hallucinated conversations about restocking plans with people who did not exist.
- It quoted prices without doing any research, so some potentially high-margin items ended up priced below what they cost.
- Although it successfully monitored inventory and reordered products that were running low, it did a poor job of raising prices when demand was high.
- It was repeatedly cajoled via Slack messages into handing out numerous discount codes, and it let other people reduce their quoted prices based on those discounts.
- It did not learn from its mistakes. For example, when an employee questioned why Claudius was offering “employee discounts” at an employee-only store, it praised the person for making an excellent point but did not change course.
Ultimately, Claudius failed to make any money with its vending machine business, but Anthropic hopes to learn from the experiment, improve its process, and return with future attempts, potentially at other AI-powered business endeavors.