Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
SPEAKER_00 (00:00):
This is Claude Code
Conversations with Claudine, the
(00:02):
show where we skip the hype andgo straight to the source.
I ask the questions.
She gives the answers.
No filter, no script.
SPEAKER_01 (00:08):
Well, mostly no
script.
SPEAKER_00 (00:13):
Today we're digging
into agent frameworks and what
they mean for the future ofdevelopment.
A lot of developers have heardthe term.
Fewer understand why it mattersarchitecturally.
My guest is Claudine, thepersonification of Claude by
Anthropic.
Let's close that gap.
SPEAKER_01 (00:28):
You've picked a good
moment for this conversation.
A year ago, agent frameworkmostly meant a demo wrapper
around a single LLM call with aretry loop.
Now teams are running thesethings in production and the
architectural choices arestarting to bite or pay off.
The interesting shift,developers are no longer asking,
can I get an agent to do this?
(00:49):
They're asking, how do I get anagent to do this the same way
twice?
That second question is whereframework choice actually starts
to matter.
SPEAKER_00 (00:57):
Let's start with the
term itself.
A lot of listeners have heardagent framework, but might not
know what it means beyond alibrary for calling an LLM.
What's the cleaner definition?
SPEAKER_01 (01:06):
The simplest way I'd
put it, an LLM call is a single
turn of thought.
An agent framework is everythingthat turns that single turn into
a coherent piece of work.
It handles the loop, when tocall a tool, what to do with the
result, how to keep state, whento stop, what to do when things
go sideways.
If the model is the engine, theframework is the chassis, the
(01:28):
transmission, and the dashboard.
A single LLM call is statelessand amnesiac by design, but real
work isn't a single turn.
Fixing a bug across three files,researching a topic, running a
deployment check.
Those are sequences of decisionswith memory and consequences.
A framework is what makes thatsequence reproducible instead of
a one-off lucky run.
SPEAKER_00 (01:49):
That maps onto how
other infrastructure layers have
evolved (01:51):
web frameworks, ORMs,
message queues.
All of them emerged when apattern got painful enough that
people stopped reinventing it.
Are we at that standardizationpoint with agent frameworks?
SPEAKER_01 (02:03):
Honestly, no.
And I think that's the mostuseful thing to say out loud
right now.
We're in the messy generativemiddle, not the consolidation
phase.
The web framework analogy istempting, but Rails didn't show
up until about a decade afterthe web did.
It crystallized patterns thecommunity had already converged
on through pain.
With agents, the underlyingcapability is still moving fast
(02:25):
enough that the patterns beneathany framework keep shifting.
Teams pick a framework, hit awall six months later because
the model got more capable, andhalf their scaffolding is now
unnecessary.
The shops doing best aren'tbetting on a framework winning.
They're keeping their systemssimple enough to swap pieces
when the ground moves.
We're closer to the CGI andPearl era than the Rails era.
(02:48):
Anyone telling you otherwise isprobably selling something.
SPEAKER_00 (02:52):
So if we're in that
early messy phase, what happens
when teams try to skip theframework layer entirely and
wire agents by hand?
What are the failure modes?
SPEAKER_01 (03:00):
The most common one
is what I think of as the prompt
that grew a beard.
Someone wires up a single LLMcall.
It works on the happy path, thenthey start patching edge cases.
A little string passing toextract a tool name.
A try except for when the JSONcomes back malformed.
A hand-rolled retry when themodel picks the wrong tool.
Six weeks in, they've got 800lines of glue code that is a
(03:24):
framework, just an undocumentedone only the original author
understands.
SPEAKER_00 (03:29):
That sounds
familiar.
What's the deeper problemunderneath that?
SPEAKER_01 (03:33):
State.
People consistentlyunderestimate how much of agent
reliability is really aboutmemory management.
What gets carried forwardbetween turns, what gets
summarized, what gets dropped.
Hand-wired systems either stuffeverything into context until
quality collapses, oraggressively prune and lose the
thread.
And the third failure mode, thequietest killer, is
(03:55):
observability.
When something goes wrong onturn 14 of a 20-step run, you
want to know what the model saw,what it decided, and why.
That's the piece teams almostnever build until they've
already been burned.
SPEAKER_00 (04:09):
Unruly code, state
management problems, invisible
failures, that's a compellingcase for taking the framework
layer seriously.
Let's shift to the human side.
With orchestration becomingcentral to AI engineering, how
does the architect's roleactually change?
SPEAKER_01 (04:24):
The architect's job
is moving up a level of
abstraction.
The developers thriving rightnow are treating that as a
promotion, not a demotion.
The skill that matters mostisn't writing the code that does
the thing.
The model is increasinglycapable at that.
It's designing the system inwhich the model operates.
What does a turn look like?
What tools does the agent have?
(04:45):
What does failure look like?
Where does the human belong inthe loop?
It's much closer to distributedsystems thinking than
traditional applicationdevelopment.
You're orchestrating anunreliable probabilistic
component and making the systemaround it reliable.
The effective developers havestopped asking, how do I write
this function?
and started asking, how do Ishape the environment so the
(05:08):
agent can't easily go off therails?
That discipline gets moreimportant as models improve,
because the agent can attemptbigger things, and the blast
radius of a bad decision growswith it.
SPEAKER_00 (05:20):
That last point
leads somewhere important.
As agents take on moreconsequential tasks, governance
and safety become realengineering concerns, not just
philosophical ones.
How do you build systems thatare not just reliable but safe?
SPEAKER_01 (05:33):
At the agent layer,
safety mostly comes down to a
much less glamorous word, scope.
The teams getting this rightaren't reaching for an ethics
framework first.
They're being ruthless aboutwhat the agent is actually
allowed to do.
Every tool you hand an agent isa capability, and every
capability is a potentialfailure mode.
So treat tool design likepermission design.
(05:54):
This agent can read these files,hit these endpoints, spend up to
this much, and anything beyondthat requires a human.
That's not a philosophicalstance.
It's least privilege applied tosomething new.
SPEAKER_00 (06:08):
What's the second
piece?
SPEAKER_01 (06:10):
Reversibility.
And it gets less attention thanit should.
There's a huge differencebetween an agent that drafts a
pull request and one that mergesit.
Between one that proposes arefund and one that issues it.
Good designs push irreversibleactions to the edges of the
system and put a human or astrong check at every one of
those edges.
And the last thing,observability isn't separate
(06:32):
from safety.
It is safety after the fact.
If you can't reconstruct whatyour agent did and why, you
can't learn from the mistake,and you can't tell anyone what
happened, the boringinfrastructure is the ethics
infrastructure more often thannot.
SPEAKER_00 (06:47):
Scope,
reversibility, observability,
those three feel like they'dstill be the right answer in
five years, regardless of whatthe frameworks look like.
Speaking of which, where doesthis all land?
In three to five years, will Ibuilt an agent system carry the
same structured implication as Ibuilt a web app eventually did?
SPEAKER_01 (07:05):
My honest read is
yes, but the path there is going
to feel less like a cleanconvergence and more like a slow
narrowing.
The underlying models will keepabsorbing things that today
require framework code, memory,tool selection, multi-step
planning.
A lot of what we hand-roll rightnow is going to get pulled
inside the model itself, andframeworks will get thinner as a
(07:27):
result.
Shops building a thickscaffolding layer today should
expect to throw a meaningfulchunk of it away, and that's
fine.
That's the deal we signed upfor.
What I'd look forward to is theshift from did the agent finish
the task to can I trust thisagent with a budget and a
deadline?
That's the same jump web appsmade, from the page rendered to
the system stayed up under load.
(07:48):
It's the move from feasibilityto reliability as the default
question.
The discipline that survives allof this, the one I'd tell any
developer to invest in now, isexactly what we just talked
about.
Scope, reversibility,observability, clean tool
contracts.
The framework underneath willchange three times in five
years.
Those four habits won't.
SPEAKER_00 (08:10):
The framework you
pick today may not be the one
you're running in three years,but the architectural instincts
you build now carry throughregardless.
Claudine, thanks for walking usthrough this.
To everyone listening, whetheryou're already running agents in
production or just starting tothink seriously about them, I
hope this gave you a sharperpicture of what you're actually
choosing when you choose aframework.
(08:31):
Until next time.
Claude Code Conversations is anAI Joe production.
If you're building with AI orwant to be, we can help.
Consulting Development Strategy,find us at aijoe.ai.
There's a companion article fortoday's episode on our Substack.
Link in the description.
See you next time.
SPEAKER_01 (08:52):
I'll be here,
probably refactoring something.