Updates to OpenAI’s world-conquering AI chatbot ChatGPT are coming thick and fast, but the latest might just be the most significant leap forward yet.

When ChatGPT launched just a little over two years ago, it was quite “bare bones” compared to today. Since then, it’s evolved to browse the web, understand images, remember things, reason more effectively, and even work while we’re offline.

That could all pale into insignificance compared to what’s about to come next, though.

ChatGPT’s latest upgrade – known as Operator – makes it capable of completing far more complicated tasks than ever before, including interacting with other web pages and services.

It manages all of this in an autonomous manner – without having to be hand-held by a human through every step.

In short, Operator is ChatGPT’s first attempt at becoming a true AI agent – a new form of AI tool with capabilities far beyond those of a relatively simple chatbot.

So, what is an agent, why are they considered to be the next major leap forward in the evolution of AI, and does Operator mark the arrival of a whole new generation of intelligent applications, tools and services?

What Is An AI Agent?

First of all, what do we mean when we talk about an AI agent, and why do so many people think they will be so significant?

OpenAI defines an agent as an AI tool that’s “capable of doing work for you.”

Can’t regular generative AI tools, like ChatGPT, already do this? They can certainly draft emails, summarize documents and translate languages. But agents are able to carry out much more complex tasks involving multi-stage instructions.

Here’s the difference: Regular ChatGPT generally executes a single instruction (known as a “prompt”), then passes control back to the human user to tell it what to do next.

An autonomous agent, on the other hand, can execute the prompt and then use the result to work out what it should do next without human intervention.

It will always be working towards achieving the goal it was originally given by a human, but it will use its own knowledge, logic and reasoning abilities to work out each of the different steps it needs to get there.
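
To make that difference concrete, here is a minimal Python sketch of the two patterns. It is purely illustrative: `toy_planner` and `toy_executor` are hypothetical stand-ins for a real AI model and real tools, and the three-step flight plan is invented for the example. It does not reflect how OpenAI actually implements ChatGPT or Operator.

```python
from typing import Optional

# Invented example plan; a real agent would work this out with a model.
PLAN = ["search flights", "compare prices", "book cheapest flight"]


def toy_planner(goal: str, history: list[str]) -> Optional[str]:
    """Hypothetical stand-in for a model: pick the next step, or None when done."""
    return PLAN[len(history)] if len(history) < len(PLAN) else None


def toy_executor(step: str) -> str:
    """Hypothetical stand-in for a tool call: pretend to carry out one step."""
    return f"done: {step}"


def run_chatbot(prompt: str) -> str:
    """Single turn: execute one instruction, then hand control back to the user."""
    return toy_executor(prompt)


def run_agent(goal: str, max_steps: int = 10) -> list[str]:
    """Multi-step: keep choosing the next step from earlier results until done."""
    history: list[str] = []
    for _ in range(max_steps):  # hard cap so the loop always terminates
        step = toy_planner(goal, history)
        if step is None:  # the planner judges the goal complete
            break
        history.append(toy_executor(step))
    return history


if __name__ == "__main__":
    print(run_chatbot("search flights"))
    for result in run_agent("book the cheapest flight"):
        print(result)
```

The `max_steps` cap is a design choice worth noting: because the agent decides for itself when to stop, a limit on the number of steps keeps a confused planner from running indefinitely.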

Microsoft – another big believer in the power of AI agents – describes a future where they will eventually become our AI colleagues, operating 24/7 on our behalf so we can dedicate our own time to tasks that require a human touch.

How Does Operator Work?

That’s all very exciting, but how does ChatGPT with Operator actually achieve any of this?

Well, essentially, it does it by combining ChatGPT’s already famous natural language and vision capabilities with the ability to interact with third-party tools and plugins through a web interface.

According to OpenAI’s announcement, it’s built around a new AI model known as a Computer-Using Agent. The CUA is trained to use graphical user interfaces – in this case, a web browser – with its GPT-4o-based vision capabilities, allowing it to navigate buttons and menus, as well as interpret text.
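
The general idea of an agent that looks at a screen, decides on an action and then carries it out can be sketched in a few lines of Python. Everything here is an assumption made for illustration: `FakePage`, `Action`, `decide` and `apply_action` are hypothetical names, the "screenshot" is a simple dictionary rather than pixels, and the hand-written rules in `decide` stand in for a trained model. It is not OpenAI's implementation.

```python
from dataclasses import dataclass


@dataclass
class FakePage:
    """Toy stand-in for a web page with a search box and a search button."""
    search_text: str = ""
    submitted: bool = False

    def screenshot(self) -> dict:
        # A real agent would receive pixels; here the observation is a dict.
        return {"search_text": self.search_text, "submitted": self.submitted}


@dataclass
class Action:
    kind: str  # "type", "click" or "stop"
    target: str = ""
    text: str = ""


def decide(observation: dict, query: str) -> Action:
    """Hypothetical policy standing in for the model: one action per observation."""
    if observation["submitted"]:
        return Action("stop")
    if observation["search_text"] != query:
        return Action("type", target="search_box", text=query)
    return Action("click", target="search_button")


def apply_action(page: FakePage, action: Action) -> None:
    """Carry out the chosen action on the toy page."""
    if action.kind == "type":
        page.search_text = action.text
    elif action.kind == "click" and action.target == "search_button":
        page.submitted = True


def run(query: str, max_steps: int = 5) -> list[str]:
    """Observe, decide, act, and repeat until the policy says stop."""
    page = FakePage()
    log: list[str] = []
    for _ in range(max_steps):
        action = decide(page.screenshot(), query)
        log.append(action.kind)
        if action.kind == "stop":
            break
        apply_action(page, action)
    return log


if __name__ == "__main__":
    print(run("cheap flights"))
```

Each pass through the loop starts from a fresh observation, so the agent reacts to what the page actually shows rather than to what it expected its last action to do.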

This means, for example, that it can browse and shop online, research travel plans, search for the cheapest available flights and make bookings, or plan a meal schedule and then arrange for all of the ingredients to be delivered.

Essentially, Operator allows ChatGPT to make the leap from simply reacting to user prompts to being able to proactively determine and deploy the instructions it needs in order to get the task done.

Towards AGI?

For me, however, the truly exciting thing about Operator is that it represents another step, albeit perhaps a small one, towards the current “holy grail” of AI development – Artificial General Intelligence.

Usually known as AGI, this refers to AIs that are capable of learning how to do just about any task. This is in contrast to most current AIs, which are considered to be “narrow” because they can only handle the range of tasks they were designed for.

To be clear, agentic AI is not the same as general AI. However, giving machines the capability to work out how to complete complex tasks themselves is clearly necessary to eventually create AGI.

OpenAI has made it clear that it considers progress towards the ultimate goal of AGI to be its number one priority. So, in this context, its current focus on agentic AI certainly isn’t surprising and is a good indicator of where we can expect to see further AI developments in the future.

So, What Does This Mean For Us Today?

Operator is currently available as a research preview to ChatGPT Pro subscribers in the US.

OpenAI hopes companies will use it to create their own agents, enabling agentic AI to become an everyday part of everybody’s workflows.

It’s already collaborating with DoorDash, Instacart, OpenTable and many others to create public-facing applications. And beyond the household names, there’s no reason many smaller businesses won’t build agents for their own internal use, just as they’ve been doing with OpenAI’s GPT API over the last two years.

Operator certainly isn’t the first AI agent to be launched. The open-source repository Hugging Face is home to a large number of models that have been developed over the last two years.

By integrating agents into its hugely popular ChatGPT platform, however, OpenAI will make agentic AI accessible to millions of individuals and businesses that may not have the technical skill to build agents on open-source technology.

It’s important to note that, at the time of writing, this is all at a very early stage, and early impressions are that there are still a lot of bugs to be squashed before agentic AI is genuinely ready for the mainstream.

And that’s without even considering the safety concerns that come with letting AIs go about business by themselves – making purchases and interacting with the world in ways that could potentially go wrong!

Nevertheless, this latest iteration of ChatGPT is, without a doubt, one of the most exciting developments we’ve seen in publicly available AI for some time and one that’s likely to open the doors to a great deal of further innovation.
