ChatGPT launches Operator web browser agent

Robert MacCloy

OpenAI launched its Operator web browser agent today. This is a launch I've been anticipating for a while, so I was excited to try it out.

Here are a few things to know about Operator so far, whether you're interested in task automation or AI agents, or you're a website operator curious about how these tools interact with your properties. There's some surprising stuff!

  1. Operator is a vision-based computer-use model. It runs a full, "regular-ish" version of Chrome in the cloud and can interact with arbitrary websites. This makes it very different from ChatGPT Search and earlier browser automation tools, and closer to Claude Computer Use, except that Operator comes with its own hosted computer rather than requiring you to provide one.

    (Operator can still get confused by particular web design and technology choices. More on this later.)
  2. Unlike traffic from traditional web crawlers and ChatGPT Search, Operator traffic is not immediately identifiable. It presents as a normal-ish Chrome user agent. In my testing, Operator requests come from IP addresses associated with Azure, which is typical for OpenAI. However, the specific IPs do not appear in the ranges OpenAI has previously published for its bot activity.

    Operator is not a fully automated experience: every Operator task must be "attended" by a human. Because Operator traffic comes from cloud infrastructure, Cloudflare, Akamai, and similar services flag some Operator requests as bot activity, usually resulting in a CAPTCHA screen. Unlike some web automation tools, Operator does not solve CAPTCHAs automatically; the user must click in and take over.

    This is a conservative choice that should make website admins happy, but it does add a lot of friction in some cases.
  3. Operator can use browser features like "Find on Page" (Control+F). In other words, it can automate interactions with the browser itself, not just with web pages.
  4. For most requests that don't reference a specific URL or one of the example tasks (e.g., booking on OpenTable), Operator starts with a web search, much as a human would. If you've followed ChatGPT Search, it won't surprise you that Operator uses Bing for this by default. However, it will use Google if you tell it to.
  5. Operator is designed with many guardrails to keep it from taking irreversible actions (e.g., buying a product or sending a message) without confirmation. Some UI design choices can trigger these guardrails when they shouldn't.

    For example, in one test run I had Operator find a specific product for my car. The website it landed on had a multi-step product search flow that required the user to click a large red "Continue" button after entering some information. The model paused and asked for confirmation, even though this was a perfectly safe action; I suspect the button's design triggered it. Something for UX designers to think about!
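
For website operators, point 2 means that checking request IPs against OpenAI's published bot ranges won't identify Operator traffic. As an illustration of what such a check looks like, here is a minimal sketch using Python's standard `ipaddress` module. The CIDR blocks are placeholders (RFC 5737 documentation ranges), not OpenAI's actual published ranges, and `in_published_ranges` is a hypothetical helper name.

```python
import ipaddress

# Placeholder CIDR blocks for illustration only (RFC 5737 documentation ranges).
# Assumption: in practice you would load the ranges OpenAI publishes for its bots.
PUBLISHED_BOT_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def in_published_ranges(ip: str, ranges=PUBLISHED_BOT_RANGES) -> bool:
    """Return True if `ip` falls inside any of the given CIDR blocks."""
    addr = ipaddress.ip_address(ip)
    # Membership across IP versions (v4 vs v6) simply evaluates to False.
    return any(addr in net for net in ranges)


print(in_published_ranges("192.0.2.17"))   # True: inside a listed block
print(in_published_ranges("203.0.113.5"))  # False: not in any listed block
```

An address outside every listed block, like the Azure IPs described in point 2, returns False, so a range check alone won't flag Operator traffic as a known bot.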

I'll follow up with additional research and some takeaways soon. In the meantime, I'd love to hear your take on this launch!