Autonomy

Recently I’ve been thinking a lot about, and experimenting with, autonomy in AI.

An agent can inspect a codebase, write code, run commands, use tools, open a PR, and keep working for a long time without human intervention. We call that autonomous.

This in itself is remarkable... but not all autonomy is the same.

What gets called autonomy today is often just an agent being able to execute more work with less supervision. It may plan, adapt, retry, persist across sessions, and make use of tools and memory. But in most cases it is still operating inside a frame that someone else defined. Agents like this have a task, a goal, a set of constraints, a bounded environment.

That is useful, and it is impressive, but it is not the same as having a genuinely autonomous teammate.

Agent autonomy

There are very different kinds of autonomy.

There is a big difference between:

  • An agent that can execute work on its own
  • An agent that can plan how to approach a goal
  • An agent that can judge what the next best action is in a changing environment

Those are not small differences… they are different capabilities.

What current agents are good at

Most current software engineering agents demonstrate strong execution autonomy.

Give them a task and they can often get surprisingly far without hand-holding. They can explore a repo, make changes, write good code based on specs, run tests, call tools, and iterate toward completion.

Agents also show signs of planning autonomy.

They do not just follow a fixed script. They can choose an approach, sequence work, and revise the plan when something breaks.

That is what we are seeing in tools like Cursor, Codex, Claude Code and OpenClaw.

The missing capability is judgment

In real engineering teams, autonomy is not just about doing work without interruption; it is about deciding what work should happen next.

A good architect or senior engineer is not simply working through tasks in a list. The ability to decide what the best next action is, based on a changing environment, is a trait that makes good engineers stand out.

They understand the outcomes we are trying to achieve. They understand the environment around them. They can see dependencies, constraints, risks, and trade-offs. They notice what has changed since yesterday. They can tell when the original plan no longer makes sense.

That is the kind of autonomy that creates additional value.

It is not just execution. It is not just planning. It is contextual judgment.

And that is where today’s agents still struggle. At least in my experience.

OpenClaw is a good example of both progress and limitation

OpenClaw is a useful case study because it gets closer than many simple copilots to what people imagine autonomy looks like.

It has tools, memory, sessions, background tasks, and multi-step workflows. It can persist over time and carry out work with much less prompting than a traditional assistant.

So it clearly represents progress.

But I still think it mostly sits in the world of execution autonomy, with perhaps some partial planning autonomy layered on top.

It can often decide how to do the work.

What it doesn’t really do is continuously interpret a changing environment and decide what matters next in the way a trusted human teammate does. And it rarely generates work that is truly surprising in its usefulness to broader outcomes, except when the outcome itself is already specified, explicitly or implicitly, in the instruction - for example: “build me something.”

And that distinction matters. If an agent is still operating inside a loop of context, inference, action, and persistence against a given frame, then it may be highly capable but it is still not exercising the kind of ongoing situational judgment that real teammates rely on.
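To make that loop concrete, here is a minimal sketch of the kind of frame-bound agent loop described above. Everything in it is illustrative rather than taken from any real framework: the goal (the frame) is handed in from outside, and the agent only ever decides how to pursue it, never whether it is still the right work.

```python
# Minimal sketch of a frame-bound agent loop: context -> inference ->
# action -> persistence. All names here are hypothetical, not a real API.

def run_agent(goal, tools, max_steps=10):
    """Pursue an externally supplied goal; the agent chooses how, not what."""
    context = {"goal": goal, "history": []}            # context
    for _ in range(max_steps):
        action = infer_next_action(context)            # inference (the "how")
        if action is None:                             # agent believes goal is met
            break
        result = tools[action["tool"]](action["arg"])  # action
        context["history"].append((action, result))    # persistence
    return context

def infer_next_action(context):
    # Toy policy: execute each step of the given goal once, then stop.
    # Note there is no step where the agent questions the goal itself.
    done = len(context["history"])
    steps = context["goal"]["steps"]
    return steps[done] if done < len(steps) else None

# The frame is given; the agent never asks whether this is the right work.
goal = {"steps": [{"tool": "echo", "arg": "run tests"},
                  {"tool": "echo", "arg": "open PR"}]}
tools = {"echo": lambda arg: f"did: {arg}"}
final = run_agent(goal, tools)
```

However capable the inner policy becomes, the missing piece in this sketch is exactly the judgment described above: nothing in the loop re-evaluates the goal against a changing environment.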

The same is broadly true of many of today’s software engineering agents.

They are increasingly good at answering: How should I do this?

They are still much weaker at answering: What should we do now?

Today's reality

Execution autonomy is already valuable enough to look transformative.

If an agent can own a bounded slice of work, operate within clear rules, escalate when needed, and handle most of the flow end-to-end, that is valuable.

But it is still a different thing from saying we have autonomous teammates.

What we often have instead are autonomous sub-process owners, essentially systems that can independently handle a constrained workflow inside a well-defined environment.

That is meaningful but not the same as broad contextual autonomy.

Why this distinction matters

If we define autonomy as “the agent can work for a long time without me,” then the roadmap is obvious: more tools, longer task chains, more memory, fewer approvals, more background execution.

That will keep delivering value.

However, if we define autonomy as “the agent can intelligently move delivery forward in a changing environment,” then the challenge is much deeper.

Now we are talking about systems that can:

  • Understand outcomes, not just tasks
  • Track change over time
  • Weigh dependencies, risk, and trade-offs
  • Reprioritise when the environment shifts
  • Decide not just how to do work, but whether it is still the right work

That is much closer to how senior humans operate.

And it is much closer to what organisations actually mean when they imagine an AI teammate rather than an AI worker.

The real gap

A system is not truly autonomous just because the human is temporarily out of the loop.

Real autonomy is not the absence of supervision; it is the presence of meaningful judgment.

Right now that's the gap.

Today’s agents can increasingly be autonomous within a task.

In a world being reshaped by AI, I keep coming back to where humans matter most: humans remain the ones who are truly autonomous across a changing environment.

That said, things are moving incredibly quickly. Agentic capabilities are improving at a pace that is hard to ignore. Can they truly close this gap? Time will tell.