Barry Zhang of Anthropic gave a short and punchy presentation at the AI Engineer Summit in New York in 2025 (see YouTube).
Below are elements that resonated:
"Should I build an agent" Checklist
Is the task complex enough?
No: use workflows. Yes: use agents.
Is the task valuable enough?
<$0.1: use workflows. >$1: use agents.
Are all parts of the task doable?
No: reduce scope. Yes: use agents.
What is the cost of error/error discovery?
High: execute as read-only OR human-in-the-loop. Low: use agents.
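The checklist reads naturally as a small decision function. Here is a minimal Python sketch of it, not anything shown in the talk: the class, field and function names are hypothetical, the third check is taken to be the one whose "No" branch is "reduce scope", and the "judgment call" answer for values between $0.1 and $1 is an assumption, since the checklist only gives the two ends.

```python
from dataclasses import dataclass


@dataclass
class Task:
    is_complex: bool        # is the task complex enough?
    value_usd: float        # rough value of completing one task
    all_parts_doable: bool  # third check: "No" means reduce scope
    error_cost_high: bool   # cost of error / error discovery


def recommend(task: Task) -> str:
    """Walk the four checks in order and return a recommendation."""
    if not task.is_complex:
        return "workflow"
    # The checklist only gives <$0.1 (workflows) and >$1 (agents).
    # Treating the gap in between as a judgment call is an assumption.
    if task.value_usd < 0.1:
        return "workflow"
    if task.value_usd <= 1.0:
        return "judgment call"
    if not task.all_parts_doable:
        return "reduce scope"
    if task.error_cost_high:
        return "agent (read-only or human-in-the-loop)"
    return "agent"


print(recommend(Task(True, 5.0, True, False)))  # agent
print(recommend(Task(True, 5.0, True, True)))   # agent (read-only or human-in-the-loop)
```

Coding, for example, passes every check: complex, valuable, doable, and with a cost of error that tests and CI bring down.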
Why is coding a great agent use case?
Complexity: High (the journey from design-doc to completed code)
Value: $$$
Viability: Claude is great at coding
Cost of Error: Significantly mitigated by decent unit tests, continuous integration and prompt recovery techniques
Dan Shapiro (serial entrepreneur, CEO of Photobucket and Senior Research Fellow in Wharton's Generative AI Lab) has determined that there are Five Levels of the Use of AI in Code Development (mirroring the Five Levels of Driving Automation).
Level 0: Spicy Auto-complete
The Engineer writes all the code, but uses AI to auto-complete the next variable_name or guess the next couple of lines of code. Tab to accept.
Level 1: Coding Intern
AI writes boilerplate code for standard procedures with well-documented instructions.
Level 2: Junior Developer
Human writes detailed specifications for requirements that may span multiple modules. You don't specify how it should do it, but what it should do. The AI cranks it out. You still review it.
Level 3: Developer
Human manages agents. Human feeds in specs, agents push code back to Human for review and approval.
Level 4: Engineering Team
Human writes product requirements and test thresholds, and only spot-checks output once system tests have passed. Human doesn't care too much about what the underlying code does - as long as it works AND doesn't fail.
Level 5: Dark Software Factory
No human writes the code or tests it - indeed it would be suicidal if they did. Nate Jones (see below) makes the point that the code generation doesn't know the test criteria during development (so not like Extreme Programming): the AI has to develop its own unit and system tests. Code-Gen submits its code to a separate testing entity - a scenario that I envisaged this time last year.
No-one is here yet (really).
How to apply to existing software
Here is Nate Jones on the migration path for existing software houses. This is an excellent video (and you should view the rest of it), but I jump to his commentary on how existing software providers need to implement this on their existing code base and development practices.
First, use AI at Levels 1 & 2 to refactor existing code.
Use AI to reverse engineer your code into the product specs that should have
been written first time around (and maintained - haha - product managers know
that specs are never maintained once the code gets into the hands of the
engineers!)
Then re-engineer your development process to be AI first: new testing
workflow (testing definition up front - how about that?!?) and different
review processes.
Next, use autonomous agent development for new development (and
maintain the old code base using the former methodology).
Finally, cast off the old way of doing things and restructure your organisation to solely adopt these new techniques.
I think the organisational behaviour of existing software houses will be a
HUGE barrier to adoption and there will be pitched battles between the AI-gen and
human-gen camps, which may well rip companies apart.
Does this mean that only start-ups can get to Level 4 and Level 5? Yes, I
think so.
Nate Jones (about, YouTube channel) is a well-regarded commentator on AI and pontificates daily - do bookmark!
Yesterday's analysis is a whopper - it's a must-listen.
I agree on lots and lots of points, far too many to mention:
The per-seat pricing model will struggle once an AI-first world is fully embedded - it's not here now, but it will be.
The threat of AI is enough to make people renegotiate deals (KPMG is the example given)
Having someone on a retainer to fix 'stuff' when it goes wrong is worth a lot
I disagree on one minor point: that organisations want to have software built that orientates around them and their processes.
NO! If the business process that the software is managing is NOT a differentiating feature to the organisation, then it wants to adopt moderately good practice (not leading or bleeding edge, just moderately good).
Why? Because they don't want the risk - middle of the road is just fine.
So they are quite happy to give up their existing practices and adopt the processes dictated by the software because they know that others have adopted it too.
No-one ever got fired for buying IBM
Herding vs being a lone sheep is worth a lot - an awful lot - the same amount that got wiped off stocks today? No, much more.
This topic has been a real head scratcher – how do you charge for AI?
By consumption (eg by tokens used or by transaction) – and the value of a token isn’t too transparent either
By conversation
Salesforce CEO Marc Benioff made a pronouncement in December: per-seat with some fair use limitations (see The Register).
AI is a significant plank in Salesforce’s strategy. It has made lots of investment in the tech and running AI has costs too of course, so it needs to evidence some return.
Given that AI is touted to reduce headcount, the fact that Benioff is charging per seat might be surprising. However, Forrester reports:
55 percent of employers regret laying off workers because of AI. More people in charge of AI investment expect it to increase headcount (57 percent) than to decrease it (15 percent) over the next year.
So Salesforce is signalling that AI is an enabler for employees rather than a separate profit centre. This decision is a balance between:
User-based pricing. AI is likely to be loved (and heavily used) by a minority
Consumption-based pricing. This is kinda an open-book charging mechanism and the boys in Finance have no control over how big a bill employees might be running up.
In the end Salesforce believe understandable pricing outweighs matching cost to benefit, meaning that they’ll make some profit out of some customers and a loss on others. Forrester agrees – see Outcome-Based Pricing Is Coming — Be Ready For It
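To see why a flat per-seat price means profit on some customers and a loss on others, here is a minimal Python sketch. Every figure in it is hypothetical (none comes from Salesforce, Forrester or The Register), and amounts are in cents to keep the arithmetic exact.

```python
# All figures are hypothetical, in cents to keep the arithmetic exact.
SEAT_PRICE = 5_000   # flat monthly price per seat
COST_PER_UNIT = 10   # vendor's cost to serve one unit of AI usage


def vendor_margin(usage_per_user: list[int]) -> int:
    """Per-seat revenue minus the vendor's usage cost for one customer."""
    revenue = len(usage_per_user) * SEAT_PRICE
    cost = sum(usage_per_user) * COST_PER_UNIT
    return revenue - cost


# Customer A: one enthusiast and nine light users.
customer_a = [2_000] + [100] * 9
# Customer B: all ten seats are heavy users.
customer_b = [2_000] * 10

print(vendor_margin(customer_a))  # 21000 (a profit)
print(vendor_margin(customer_b))  # -150000 (a loss)
```

Both customers pay the same bill, which is the understandable part; the vendor's margin swings with usage, which is presumably where the fair use limitations come in.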
Gartner’s best-case scenario predicts that agentic AI could drive approximately 30% of enterprise application software revenue by 2035, surpassing $450 billion, up from 2% in 2025.
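As a rough sanity check, the two headline numbers imply a size for the whole enterprise application software market in 2035. The total below is derived from the quoted figures, not stated by Gartner, and both inputs are approximate ("approximately 30%", "surpassing $450 billion").

```python
agentic_share_2035 = 0.30       # Gartner best case: share of revenue
agentic_revenue_2035_bn = 450   # Gartner best case: lower bound, in $bn

# Implied total market = agentic revenue / agentic share
implied_total_2035_bn = agentic_revenue_2035_bn / agentic_share_2035
print(round(implied_total_2035_bn))  # 1500, i.e. roughly $1.5 trillion
```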
Gartner sees five stages of Agentic AI evolution
This diagram needs some explanation - here's my understanding from Gartner's press release in August 2025.
Stage 1: AI Assistant
Gartner's prediction
By the end of 2025 most enterprise applications will have embedded
assistants.
Gartner's explanation
They simplify tasks and interactions for users but depend on human
input and do not operate independently.
Arthur's interpretation
Perhaps the application interface permits a user to type an
instruction into a bot interface and the app responds, rather than
clicking through multiple pages, eg 'Show me an exception report for
all grocery orders received since 5pm yesterday.'
Arthur's assessment
'Most'?? This definitely didn't happen!
Perhaps 10% by the end of 2026??
Stage 2: Task-Specific Agent Applications
Gartner's prediction
Up to 40% of enterprise applications will include integrated
task-specific agents by 2026, up from less than 5% today.
Gartner's explanation
These AI agents have the capacity to operate and perform complex,
end-to-end tasks.
An example is an AI-driven cybersecurity threat response agent that
scans network traffic, system logs and user behavior patterns in
real time. The agent then assesses and initiates a response as
appropriate.
Arthur's interpretation
Applications (perhaps from a single vendor?) can perform
multi-step operations autonomously. What's the difference from
today? Applications can follow a rules-defined process flow today. I
think the new shiny thing is that the app can use AI to determine
the process flow??
Arthur's assessment
Perhaps 10% by the end of 2026?? And it's likely that it is the
same 10% as have achieved Stage 1.
Stage 3: Collaborative AI Agents Within an Application
Gartner's prediction
By 2027, Gartner predicts one-third of agentic AI implementations
will combine agents with different skills to manage complex tasks
within application and data environments.
Gartner's explanation
Collaborative agents will offer more adaptable and scalable
solutions by learning from real-time data and adjusting to new
conditions.
Arthur's interpretation
Some predictive or forecasting capabilities that use Machine
Learning
Arthur's assessment
Err, I think this is an inevitable benefit from Stage 2
implementation??
Stage 4: AI Agent Ecosystems Across Applications
Gartner's prediction
By 2028, AI agent ecosystems will enable networks of specialized
agents to dynamically collaborate across multiple applications and
multiple business functions, allowing users to achieve goals without
interacting with each application individually.
Gartner's explanation
Arthur's interpretation
There will be an approved list of AI Agents (from multiple
vendors) that are permitted to collaborate together. Most likely I
envisage a marketplace where vendors list their agents and their
capabilities and customers pick and choose which Apps can play
together and in what circumstances.
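One way to picture such an approved list is a customer-side allowlist keyed on pairs of agents and the circumstances in which they may work together. The Python sketch below is purely illustrative: the class names, vendor names, agent names and purposes are all hypothetical, and it is not based on any real marketplace or protocol.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AgentListing:
    """What a vendor might publish in the marketplace."""
    vendor: str
    name: str
    capabilities: frozenset[str]


@dataclass
class CollaborationPolicy:
    """Customer-side allowlist: which agent pairs may collaborate, and for what."""
    allowed: dict[frozenset[str], set[str]] = field(default_factory=dict)

    def permit(self, a: AgentListing, b: AgentListing, purpose: str) -> None:
        # The pair is stored unordered, so permit(a, b) also covers (b, a).
        self.allowed.setdefault(frozenset({a.name, b.name}), set()).add(purpose)

    def may_collaborate(self, a: AgentListing, b: AgentListing, purpose: str) -> bool:
        return purpose in self.allowed.get(frozenset({a.name, b.name}), set())


invoicing = AgentListing("VendorA", "invoice-agent", frozenset({"raise_invoice"}))
stock = AgentListing("VendorB", "stock-agent", frozenset({"check_stock"}))

policy = CollaborationPolicy()
policy.permit(invoicing, stock, "order_fulfilment")

print(policy.may_collaborate(stock, invoicing, "order_fulfilment"))  # True
print(policy.may_collaborate(stock, invoicing, "refunds"))           # False
```

Anything not explicitly permitted is refused, which matches the "approved list" idea: collaboration is opt-in per pair and per circumstance.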
Arthur's assessment
Actually, I think there will be a completely new breed of
ecosystems that are native agentic AI first. (Indeed as a founding
member of the management team at Fetch.AI, I would say that
these ecosystems exist already.)
Here's an announcement about the formation of the Agentic AI
Foundation (AAIF) in December 2025 to do just that. Here's TechCrunch's
description:
Anthropic is donating its MCP
(Model Context Protocol), a standard
way to connect models and agents to tools and data; Block is
contributing Goose, its open source agent framework; and OpenAI is
bringing AGENTS.md to the table,
its simple instruction file developers can add to a repository to
tell AI coding tools how to behave. You can think of these tools as
the basic plumbing of the agent era.
The key question is how successful will this new approach be
and how will existing Enterprise Software providers respond?
Students of disruptive
innovation theory (like me!) are watching with great interest!
Stage 5: The “New Normal” for Democratized Enterprise Apps
Gartner's prediction
Gartner predicts that by 2029, at least 50% of knowledge workers
will develop new skills to work with, govern or create AI agents on
demand for complex tasks.
Gartner's explanation
As agentic AI matures, standardized protocols and frameworks will
enable seamless interoperability, allowing agents to sense their
environments, orchestrate projects and support a wide range of
business scenarios.
Agents will be created on the fly by humans, and humans and AI will
collaborate in new ways.
Arthur's interpretation
I'm not sure what to think about this: I do think that
most knowledge workers will use AI agents to help them with
complex tasks on a one-off basis.
I struggle with the idea that many enterprise employees will enable
an AI agent to exist permanently to achieve an objective (an objective
that goes beyond improving their personal productivity), for
reasons of control, security, liability and cost. As a result, I think
creating enterprise AI agents will be a specialist role
within an enterprise.
Arthur's assessment
Nope, I don't think that this will happen in the way that Gartner
articulates.