eBay, DoorDash, Instacart, and Uber are among the companies that have signed on to participate in the research preview of OpenAI's new Operator agent, which the company describes as “an agent that can use its own browser to perform tasks for you.”
Operator acts as a virtual assistant, autonomously performing tasks across the Internet, such as shopping, filling out forms, and booking travel, based on the user's instructions.
OpenAI wrote in its blog post announcing the launch:
“Today we’re releasing Operator, an agent that can go to the web to perform tasks for you. Using its own browser, it can look at a webpage and interact with it by typing, clicking, and scrolling. It is currently a research preview, meaning it has limitations and will evolve based on user feedback. Operator is one of our first agents, which are AIs capable of doing work for you independently—you give it a task and it will execute it.”
Here's how Operator works:
- It's powered by a new model called Computer-Using Agent (CUA), which combines GPT-4o's vision capabilities with advanced reasoning through reinforcement learning.
- CUA is trained to interact with graphical user interfaces like buttons, menus, and text fields that people normally interact with on a website.
- Operator uses screenshots to see webpages and interacts with them using all the same actions as a mouse and keyboard, which allows it to engage with websites without relying on an API.
- If Operator gets stuck and needs assistance, it simply hands control back to the user for a collaborative experience.
- To get started, users describe the task they'd like done and Operator handles the rest. Users can take over control of the remote browser at any point.
- Operator is trained to proactively ask the user to take over when a task requires logging in, entering payment details, or solving CAPTCHAs.
- It is also trained to decline certain sensitive tasks, like banking transactions, and those requiring high-stakes decisions, like applying for a job.
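The steps above amount to a loop: look at a screenshot, pick a mouse or keyboard action, and hand control to the user when needed. Below is a minimal Python sketch of that loop, not OpenAI's implementation. Every name in it (`run_task`, `Action`, `HANDOFF_KINDS`, `SENSITIVE_KEYWORDS`, and the callbacks) is hypothetical, and the keyword check is a crude stand-in for whatever the trained model actually does when it declines a task.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical names throughout; none of this is OpenAI's API.
SENSITIVE_KEYWORDS = ("bank transfer", "job application")  # tasks to decline
HANDOFF_KINDS = {"login", "payment", "captcha"}            # steps the user must do


@dataclass
class Action:
    kind: str         # e.g. "click", "type", "scroll", "login", "done"
    detail: str = ""  # e.g. the element to click or the text to type


def run_task(
    task: str,
    take_screenshot: Callable[[], bytes],
    choose_action: Callable[[str, bytes], Action],
    perform: Callable[[Action], None],
    ask_user: Callable[[Action], None],
    max_steps: int = 20,
) -> List[str]:
    """Run a screenshot -> action loop and return a log of what happened."""
    events: List[str] = []

    # Decline sensitive tasks up front (keyword match is an assumption).
    if any(word in task.lower() for word in SENSITIVE_KEYWORDS):
        events.append("declined")
        return events

    for _ in range(max_steps):
        screenshot = take_screenshot()            # "see" the page
        action = choose_action(task, screenshot)  # the model picks the next step

        if action.kind == "done":
            events.append("done")
            return events

        if action.kind in HANDOFF_KINDS:
            ask_user(action)                      # the user takes over this step
            events.append(f"handoff:{action.kind}")
            continue

        perform(action)                           # simulated mouse/keyboard input
        events.append(f"{action.kind}:{action.detail}")

    # Out of steps: treat the agent as stuck and hand control back.
    ask_user(Action("stuck"))
    events.append("handoff:stuck")
    return events


if __name__ == "__main__":
    # A scripted stand-in for the model, used only to exercise the loop.
    script = iter([
        Action("click", "search box"),
        Action("type", "oat milk"),
        Action("payment"),
        Action("done"),
    ])
    log = run_task(
        "order groceries",
        take_screenshot=lambda: b"",
        choose_action=lambda task, shot: next(script),
        perform=lambda action: None,
        ask_user=lambda action: None,
    )
    print(log)  # ['click:search box', 'type:oat milk', 'handoff:payment', 'done']
```

In this sketch the browser, the model, and the user are all passed in as callbacks, so the loop itself never depends on a website-specific integration, which is the point of the screenshot-based approach described above.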
Operator and other AI agents like it will be a game changer for the visually impaired when it comes to accessing online services that would otherwise take them a significant amount of time to navigate using existing text-to-voice solutions.
Operator is currently available to Pro users ($200/month) in the US, with plans to expand it to Plus, Team, and Enterprise users and to integrate its capabilities into ChatGPT. It is still an early research preview, and OpenAI warns users that while it’s already capable of handling a wide range of tasks, it’s still learning and evolving and may make mistakes.
A few weeks ago I reported that Perplexity, the AI-powered search engine backed by Jeff Bezos, Tobi Lütke, and other notable investors, debuted a new shopping feature for its paid customers in the US that offers shopping recommendations as well as the ability to place an order without going to a retailer's website. Perplexity powers the feature via integrations with websites like Shopify, Amazon, and Best Buy, whereas Operator uses visual browsing and can operate on most websites without an integration.