August 26, 2025 54 mins

Join the Tool Use Discord: https://discord.gg/PnEGyXpjaX


Feeling intimidated by AI tooling? This is the perfect place to start. Join us for a deep dive into AI agent fundamentals with expert educator Hai Nghiem. We break down complex topics like context engineering, explaining why it's a more useful term than prompt engineering. Learn the key differences between an AI agent and a workflow, and discover when to use each for maximum efficiency. We demystify Retrieval Augmented Generation (RAG), explaining why it's fundamentally a search problem, not just a vector database issue.


Explore the future of AI interoperability with the Model Context Protocol (MCP), the "USB-C for agents," and see how it simplifies building with AI. For those intimidated by the command line, we offer tips on getting started with powerful tools like Claude Code. This episode covers the fundamentals of deep research, browser-use agents, and what makes a "deep agent" capable of producing high-value output. If you've heard these buzzwords online, this is the explanation you've been waiting for to help you integrate these powerful AI tools into your life.


Guest Links: Check out Hai's Agent Engineering Bootcamp: https://maven.com/agent-lab/agent-engineering-bootcamp

Check out AGI Ventures Canada on LinkedIn: https://www.linkedin.com/pulse/new-beginning-hai-nghiem-swnsc/


Connect with us 

https://x.com/ToolUseAI 

https://x.com/MikeBirdTech

https://x.com/haithehuman


00:00:00 - Intro

00:01:05 - Context Engineering vs Prompt Engineering

00:03:36 - AI Agents vs Workflows

00:20:56 - MCP (Model Context Protocol) Explained

00:29:58 - Getting Started with Claude Code

00:38:49 - The Power of Browser-Based AI Agents

00:48:50 - What Are Deep Agents?


Subscribe for more insights on AI tools, productivity, and AI Agents.


Tool Use is a weekly conversation with the top AI experts, brought to you by ToolHive.


Episode Transcript

Available transcripts are automatically generated. Complete accuracy is not guaranteed.
(00:00):
When you're building agents, is it actually like the model? Not necessarily. It's actually what goes inside the model's context. If you don't know something, you can literally ask it: what can I do? Can you help me do this and that? The topic today is all the stuff you heard about online, and we're just going to deep dive into explaining each of them in a way that you can understand. Everybody should be able to benefit from AI tooling, but unfortunately that's not the case.

(00:21):
Whether it's a steep learning curve or the intimidation of learning a new technology, a lot of people aren't able to reap the benefits of AI tooling. I wanted to start this channel to help fix that: to disseminate knowledge, teach people about tools, level people up. But a lot of my episodes get a little too technical. So on episode 54 of Tool Use, brought to you by ToolHive, I brought on one of the best educators I know, Hai Nghiem. He's a builder, an investor, one

(00:44):
of the co-founders of AGI Ventures Canada. And we're going to talk about AI tooling fundamentals. We'll cover things like context engineering, MCP, Claude Code, different types of agents. And we're going to try to do it in a way that allows you to start integrating these tools into your life. We're going to cover a bit more of the basics; we'll still get a little bit technical, but I think it's going to be a lot more approachable. So I hope you enjoy this episode with Hai Nghiem.

(01:05):
I think the term context engineering came about when either Andrej Karpathy tweeted about it, or the folks at Cognition put out an article or a blog about how, when building agents, it's actually context engineering, the term context engineering, that is more important than anything else. So when I read that, I'm like,

(01:26):
OK, so when you're building agents, is it actually the model? Not necessarily. It's actually what goes inside the model's context. And then Harrison Chase at LangChain also chimed in, and he was like, hey, you can build agents a certain way, you know, blah, blah, blah. So the argument between the two was actually between building, I think, sequential agents versus

(01:50):
parallel agents. And then, you know, there are certain tasks that are better for agents that are running together, and there are tasks that are better for agents that run after one another. But in both cases, it's the context that is shared between the agents that's the most important. So that's where the conversation started, at least from my understanding. And then people who are more on

(02:11):
the, I guess, mainstream side of things, like Tobi from Shopify, tweeted about this, and he was like, context engineering is a much better term than prompt engineering, da, da, da. And I think the conversation kind of shifted towards something more mainstream. But it came from people building agents and realizing that, hey, all the stuff that you do, figuring out what to put into the agent's prompts and

(02:35):
message history, that's context engineering.
Is it the relevant information, or is it the date and time of day, the user's personal preferences, etcetera? I feel like it's not really anything different. It's still just managing what you give to the LLM. It's just kind of changing, or evolving with how our use of these tools is evolving. Because before, a lot of usage was chat, so you just go back and forth,

(02:58):
but now they're incorporating function calling, tool use, all of these peripheral systems that pull in data, RAG systems and all of that. It's really just evolved into a more all-encompassing "make sure you understand what goes in," because a few people have quoted it: garbage in, garbage out. So you want to make sure that you're not just saying, OK, dump this entire log file into it and just have it processed,

(03:19):
when you can do some pre-processing ahead of time: try to make it a little bit cleaner, extract the important bits.
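
To make that concrete, here is a minimal sketch of the kind of pre-processing Mike describes before anything reaches the model. The keyword filter and the `ask_llm` mention are illustrative, not from any specific library:

```python
# Minimal sketch: trim a noisy log before it ever reaches the model,
# instead of dumping the whole file into the context window.
def extract_important_bits(log_text: str, max_lines: int = 50) -> str:
    """Keep only lines that look relevant (errors, warnings, tracebacks)."""
    keywords = ("ERROR", "WARN", "Traceback", "Exception")
    relevant = [line for line in log_text.splitlines()
                if any(k in line for k in keywords)]
    return "\n".join(relevant[-max_lines:])  # most recent hits only

with open("app.log") as f:
    context = extract_important_bits(f.read())

# The trimmed context then becomes part of the prompt you hand to
# whatever chat-completion client you use (an `ask_llm` call, say).
prompt = f"Here are the relevant log lines:\n{context}\n\nWhat went wrong?"
```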
Do you think that having... actually, let's take it back one step. So, agents. There's a lot of people talking about what the different definitions are, and then there's the third argument being: it doesn't really matter, it's just LLMs with tools. How do you view an agent? Because we're going to dive into a few different avenues that agents

(03:41):
can take care of. But the base-level chatbot versus agent, how do you differentiate? Yeah, so the topic today is all the stuff you heard about online, and we're just going to deep dive into explaining each of them in a way that you can understand. So there's agents, and what I'd like to do is look at different types of products and tools as agentic stuff.

(04:03):
So is it an agentic workflow, or is it an agentic loop? For example, a lot of applications don't have to be agents. An agent is a very open-ended kind of tool, where it's like: you are given these tools, do what you can given the task. But a workflow

(04:23):
is more like: you will always go through these five steps, and each step could be an LLM call, which means there's more control you can have over each of the steps in terms of output quality and stuff like that. So an agent is more like, you know, a person. If you give it a vague task, hopefully an agent can solve it. If you know that your result is

(04:44):
always going to be the same, you expect a certain type of result, then you should go with a workflow. So an agent is like a person, at least to me. It's very open-ended. It's not like a workflow. If you build something where you always expect a certain type of output, it's probably not an agent.
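
As a rough illustration of the distinction Hai draws, here is a minimal sketch; `call_llm` and the tool functions are hypothetical stand-ins for whatever model client and tools you actually use:

```python
def call_llm(prompt: str) -> str:
    """Stand-in for your chat-completion client (hypothetical)."""
    raise NotImplementedError

# Workflow: a fixed pipeline. Every run takes the same steps, so you
# control quality at each step.
def passport_workflow(application: str) -> str:
    fields = call_llm(f"Extract the fields from: {application}")
    checked = call_llm(f"Flag errors in these fields: {fields}")
    return call_llm(f"Draft an approve/deny letter from: {checked}")

# Agent: an LLM in a loop with tools. The model looks at the task,
# makes a plan, picks tools, and decides when it is done.
def agent(task: str, tools: dict, max_steps: int = 10) -> str:
    history = [f"Task: {task}. Tools: {sorted(tools)}"]
    for _ in range(max_steps):
        action = call_llm("\n".join(history))       # model picks next action
        if action.startswith("DONE:"):
            return action.removeprefix("DONE:")
        name, _, arg = action.partition(" ")
        result = tools[name](arg) if name in tools else f"no tool {name!r}"
        history.append(f"{action} -> {result}")     # feed the result back in
    return "ran out of steps"
```

The workflow's shape is fixed in code; the agent's shape is decided at runtime by the model, which is exactly why the workflow gives you more control and the agent more flexibility.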
Yeah, that makes sense. That's a good differentiator, because I find both pipelines and agents could have API calls and try to interact with any type of data.

(05:06):
But yeah, it's just a very predictable progression, like A, B, C, D, through a pipeline. And like you said, you can have LLM calls at every step along the way to take a fuzzy input and make it more standardized, or pull in extra data and manipulate it in some way. If someone's like, I have a task and it kind of falls between these two, there's not a clear delineation as to what it is. Should people try to just create

(05:29):
a mental model of it being a pipeline to improve the likelihood of success, or should they do an agent to have it be more exploratory? I think people should look at it from the perspective of whether or not you're trying to replace a whole job function or just a responsibility of a job. So, for example, let's say you're trying to replace somebody who's in the back office processing passports. You

(05:52):
could try to replicate the different workflows that the person goes through, right? And have some sort of router that can route the user's query to the right workflow. And when somebody's processing passports, you obviously want it to be controlled and, you know, high quality, that stuff, right? And the steps are usually the same. It's just that there are certain

(06:13):
kinds of decisions the person has to make at each of the sub-steps. Then a workflow is probably best, because you just pipe the output to the next node, and then the next node processes something and pipes it to the next one.
But if, let's say, you want to replace the person entirely, and one of the person's responsibilities is to determine, case by case, whether or not to

(06:36):
deny or accept the person's passport application, then maybe that's more of an agent kind of system, where there's more room for interpretation in terms of the input, and the agent can then create a plan. I think, for me at least, if there's a good sign

(06:57):
of a plan, that makes it an agent. If the plan has been heuristically coded into the workflow, then it's usually a workflow. But if, on the fly, it looks at the input and then comes up with a plan, that's usually an agent. So I like to keep that mental model for everything that I look at, in terms of differentiation between agents and other things like workflows.

(07:18):
In regards to the plan it makes, I feel that's where we start to lose people, in terms of the fear of losing control to the AI, where all of a sudden you're letting this computer program run off on its own and do it. What are your strategies, techniques, advice around things like observability, or being able to rein it in and take control? For someone to graduate

(07:41):
from a workflow to an agent, all of a sudden there are so many more possibilities. So what should they keep in mind when they're getting to that level?
For sure. Yeah, I'll answer that question real quick. I also want to point out that not everything has to be an agent, right? A lot of things in life should be a workflow, because you want that control over the quality. But if you really want to do an agent, my suggestion, from, you

(08:02):
know, consulting for a bunch of different companies and working on my own projects, is that you should talk to the person who does the job and figure out how they do their job: not just the steps that they take to do the job, but their thought process, like how do they come to the decisions that they make? And agents, like we talked about, are open-ended.

(08:22):
It's probably better to build agents for jobs that are more creative and open-ended. For example, we have an amazing use case of deep research, right? Researching is quite open-ended. You kind of have to do investigative work, and then you find information, and then you investigate further based on that information. So definitely talk to the

(08:44):
subject matter expert of the job that you're trying to replace, and see how they make decisions. Bake that into the prompt of the agent. That's the first step you should do. And then over time, you can run that agent across several different tasks that the real person does, and let the person decide whether or not the outputs of

(09:05):
your agent are good. And then maybe you have a golden data set of human-annotated "this is good, this is bad, I wouldn't do this myself," da, da, da. Then you can iterate on the prompt, iterate on what kind of information you put into the context window, and that's how you make a better agent. And I'd actually just pull it back a little bit, because you mentioned, you know, the job you want to replace.

(09:26):
I think also framing it around having someone take control of their own job, and being able to start enhancing the self, augmenting the self, with these different agents and workflows, where you, as the content expert or subject matter expert yourself, are able to say: OK, this is how I think about it. I can start implementing these processes, these pipelines, in order to make my life easier, or, you know, 10x my output, or

(09:48):
whatever there is these days. But the ability to have a mental model of how you do your job, and to just start taking incremental pieces of it and automating them. What a lot of people find intimidating is words like we've been saying: automating, iterating, context. All of this just sounds like a
lot. But really, you can use an AI, an LLM, a chatbot, to help you get this process started, where you

(10:10):
can say: this is what I do every single day, and it takes 5 minutes of my day. If I spend one hour right now and it can automate the job, then in two months, all of a sudden, I'm starting to save time, and everything from there on is compounding interest. So I think it's important people take the mental model that you don't just need to be an engineer trying to improve business processes. You as

(10:32):
an individual can make your life better, in your professional life or personal life, if you just start paying attention to the things that you do on a repeated basis in the digital realm, and then just chat with an AI about how to automate. I agree.
Yeah, you put it perfectly. You brought up deep research, so I wouldn't mind touching on that one a little bit, because I've used deep research. Mixed results. It depends on the topic, depends on the service. Sometimes I use Perplexity,

(10:53):
sometimes OpenAI. But when does deep research work for you? What advice do you have? Should someone just go and say, "I want to learn about sunscreens," and send it on its way? Or are there other nudges or context engineering that you can do to get better results? For sure. Yeah. So, for people who are listening, deep research actually falls in the category of RAG.

(11:15):
It falls in the category of RAG because it's retrieval. So we have, you know, full-text search, and then we have vector search, semantic search, and stuff like that. And then we have agentic search. And the more beefy version of agentic search is actually deep research, which runs the agent for longer, does more research, and is obviously more expensive to run.
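
A rough sketch of the escalation Hai describes, from one-shot search to an agentic research loop; `web_search` and `call_llm` are hypothetical stand-ins, not a specific API:

```python
def call_llm(prompt: str) -> str:
    """Stand-in for your chat-completion client (hypothetical)."""
    raise NotImplementedError

def web_search(query: str) -> str:
    """Stand-in for a search tool, e.g. a search API (hypothetical)."""
    raise NotImplementedError

# Plain agentic search: one query, one answer.
def search_and_answer(question: str) -> str:
    return call_llm(f"Question: {question}\nResults: {web_search(question)}")

# Deep research: search, read, refine, repeat, then write a report.
def deep_research(question: str, rounds: int = 3) -> str:
    notes: list[str] = []
    query = question
    for _ in range(rounds):
        notes.append(web_search(query))
        found = "\n".join(notes)
        # The model decides the next query from what it has found so far.
        query = call_llm(f"Notes so far:\n{found}\nWhat should we search next?")
    found = "\n".join(notes)
    return call_llm(f"Write a report answering {question!r} using:\n{found}")
```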

(11:38):
But for me, here's how I like to use Perplexity, for example. I'm trying to learn how to play the piano for the first time in my life. I'm 29. I've never played the piano in my life. But whenever I go to a public space, I see somebody banging out the Interstellar theme, and I'm like, I'll be that person one day. Nice. It looks so cool. So what I do is I go on

(12:00):
Perplexity and I ask, OK, in 2025, what are some of the best beginner-friendly piano models, and what should a beginner look for? For a traditional search query, it's actually quite difficult for me to remember all of my questions and sequentially ask them in three different Google search tabs.

(12:23):
So that's a very convoluted question. It's like, I have to look at the list of different models first, and try to remember that I have to go and find out what I should be looking for, what kind of features I should look for in each of these things. And sometimes I even look for stuff like what cases would fit my piano if I want to travel with it,

(12:43):
and then I've got to do all these kinds of math. But if you type that whole thing into Perplexity, it'll just do it for you, do the math, do all that for you, step by step. So I think that's the beauty of deep research. And that's a very light deep research as well, because Perplexity does two ways of doing deep research. There's "research online and then answer your question."

(13:04):
And then there's "research, and then research, and then research, and then research, and then write your report." That's the deep research we've been talking about. But at least for the light search thing, where it's a little bit agentic, it solves multiple questions for you in the same query. So you as a user can basically save like 5-10 minutes, 20 minutes.

(13:25):
And then deep research is more in the realm of: OK, I'm trying to do my job, and I need to write a report, or I need to do some heavy reading on a certain topic, then synthesize information and provide some sort of output. You can let AI do all that for you, because AI reads really fast and synthesizes information incredibly well.

(13:46):
And that's why I love deep research. It's also one of those patterns that people realized not only works for web content, but also for enterprise documents. That's why with, you know, OpenAI, you can hook up your company's database now and do deep research over the entire enterprise

(14:09):
documentation corpus. Or with Notion, you can do this deep research mode where it will research across your entire Notion workspace, all your pages and databases and stuff like that, and surface information that's relevant. So I think, at least from my perspective, going into 2025, or soon 2026, deep research is one of those use cases that kind of came out on top and

(14:32):
was actually providing value to regular users, or the casual business user, like an analyst, as opposed to something like, you know, autocomplete, or "write me a poem," that kind of deal. But we had that with ChatGPT like two years ago. It's asynchronous too, so you don't have to be constantly monitoring or babysitting it, which is really cool. Really quick,

(14:52):
just on RAG, retrieval augmented generation. I still view it as just putting things into the context, and I want to know if you feel differently, because I worry that a lot of people only hear "RAG, retrieval augmented generation" and think it's this overly complicated thing, and it really isn't. And it's really broad. Some people are like, oh, you know, it's using a vector database, and that's only a

(15:14):
subset of it. Even just querying a regular database still counts as RAG. Searching the web counts as RAG. What are your thoughts on that, either as a term or how you approach it? So RAG falls under context engineering as well, because it's all about finding what is the most relevant for the LLM or the AI in the moment, and then putting it in there and nothing else.

(15:37):
So that's the retrieval part of RAG. RAG stands for retrieval augmented generation. I never know what the A part, augmented, stands for, but I know what generation is. It's the LLM part, the part where you have enough context and then you generate an output. But the retrieval part is where most people have issues, right? And just like you said, querying a database and retrieving information like rows

(15:58):
and columns, or JSON, and stuffing it into the context of the LLM, counts as RAG. It doesn't have to be vector databases. The vector database being seen as the only thing for RAG, I think part of it is because of the marketing from vector database companies for the last two years, where it's like, "we are the memory for AI," or "we are RAG." You

(16:22):
know, it could be vector search, it could be graph database retrieval, Postgres. It could even be an MD file, like how Claude has a memory file that is a CLAUDE.md file. It's wherever you store data, and if you can retrieve just the right amount of data to

(16:44):
bring back to generate something with the LLM, that is considered RAG.
And I want to add one more thing, which is that when people think about RAG, you should think about it as a search problem. It's not necessarily a vector database problem. And when you solve a search problem, there's a whole heap of ways to solve that problem, right? You can combine metadata filtering, you can combine, you know, full-text search, you can

(17:06):
do a vector database search for more of a recommendation-engine-style kind of way, because it's very fuzzy, right? But yeah, it's a search problem, and you should try search techniques, as opposed to trying different vector databases, because you probably won't get better results switching from Pinecone to Weaviate to Qdrant. You'll probably have better results changing up your techniques: look at the top 20-50 queries that people are asking, optimize to get those ones accurate all the time, and make sure that the context in your database has that information, rather than, you know, messing around with different providers.
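
To illustrate "treat RAG as a search problem," here is a minimal sketch combining keyword matching with metadata filtering over an in-memory corpus; a real system would swap in proper full-text and vector indexes, but the shape of the technique is the same:

```python
# Treat retrieval as search: filter by metadata first, then rank by a
# simple keyword-overlap score. No vector database required.
docs = [
    {"text": "How to reset your password", "team": "support"},
    {"text": "Quarterly revenue report", "team": "finance"},
    {"text": "Password policy for contractors", "team": "support"},
]

def retrieve(query: str, team: str | None = None, k: int = 2) -> list[str]:
    candidates = [d for d in docs if team is None or d["team"] == team]
    q_words = set(query.lower().split())
    scored = sorted(
        candidates,
        key=lambda d: len(q_words & set(d["text"].lower().split())),
        reverse=True,
    )
    return [d["text"] for d in scored[:k]]

# The retrieved snippets become the "R" in RAG: context for the LLM.
print(retrieve("reset password", team="support"))
```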
I'm glad you brought up the markdown file.

(17:49):
That's one thing that I've been doing very consistently across almost all the tools I use. With any long-running task, you increase the likelihood of failure. But you can shorten the length of the conversation or interaction and have it jump-start from a checkpoint. So I have it go through a task, write a document to explain its

(18:09):
current state and progress in that task, then start a whole new conversation and say: review this file to understand the current state, and then proceed. You're able to take a larger corpus of information, a discovery of context, condense and distill it, and then start fresh, and you have less chance of it going off the rails or being steered in the wrong direction.
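
A minimal sketch of that checkpoint pattern; the file name and the `call_llm` helper are illustrative, not from any particular tool:

```python
from pathlib import Path

STATE = Path("STATE.md")  # illustrative checkpoint file name

def call_llm(prompt: str) -> str:
    """Stand-in for your chat-completion client (hypothetical)."""
    raise NotImplementedError

def checkpoint(progress_summary: str) -> None:
    """End of a work session: condense progress into a markdown file."""
    STATE.write_text(f"# Task state\n\n{progress_summary}\n")

def resume(task: str) -> str:
    """Start of a fresh conversation: rehydrate from the checkpoint."""
    prior = STATE.read_text() if STATE.exists() else "No prior state."
    return call_llm(f"{prior}\n\nReview the state above, then proceed with: {task}")
```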

(18:30):
And it's one of those techniques where I feel we still have room to improve the way it's done, but it works. In the meantime, do you have any strategies for how people can improve the performance of one of these long-running, long-horizon tasks? For long-horizon tasks, yeah, definitely. Like you said, being able to plan and then save

(18:51):
that somewhere, so that when you run it... an agent runs an LLM in a loop, right? So every step kind of pipes in whatever the result was from the last step, plus the history of the conversation. So if the history of the conversation, AKA the context (context engineering), has information about the things it needs to do and what has been

(19:11):
crossed off, that's great. Otherwise, your long-running task will forget what it was trying to do by the end. That always happened, especially in the early days with the Cursor agent, where people had to force it to write down the plan in a markdown file before it went and did its thing. But this may be a good segue into Claude Code, because Claude Code does that automatically, right? Every time you ask it to do

(19:33):
something, you can turn on this thing called plan mode, and it can write a plan, and then you approve the plan, and then it'll be like, OK, here's my to-do list. And this to-do list is literally just a tool for the agent to run. The agent is required to run the to-do list task first,

(19:56):
and then required to update that task list as it goes and does things. And I think part of the reason you find it effective, when you force it to write a file, or when Claude Code writes a to-do list, is that LLMs have a tendency to pay attention more towards the end of the prompt. So if the to-do list is a couple of messages away in the past,

(20:18):
it will probably forget about it. But if it keeps revisiting it, then it's fresher in the context, and the LLM hopefully looks at that and is more aware of where it's at, instead of being lost and being like, oh, I guess I'm done, even though it's only halfway through the to-do list.
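
A toy version of that to-do-list-as-a-tool pattern: the plan lives in mutable state that is re-serialized into the context on every step, so it stays near the end of the prompt. `call_llm` is again a hypothetical stand-in:

```python
def call_llm(prompt: str) -> str:
    """Stand-in for your chat-completion client (hypothetical)."""
    raise NotImplementedError

# The plan is ordinary state the agent can read and update via "tools".
todos: list[dict] = []

def write_todos(items: list[str]) -> None:
    todos.clear()
    todos.extend({"task": t, "done": False} for t in items)

def mark_done(index: int) -> None:
    todos[index]["done"] = True

def render_todos() -> str:
    return "\n".join(
        f"[{'x' if t['done'] else ' '}] {i}. {t['task']}"
        for i, t in enumerate(todos)
    )

def step(history: list[str]) -> str:
    # Re-append the current to-do list every turn so it sits at the
    # end of the prompt, where the model attends to it most.
    prompt = "\n".join(history) + f"\n\nCurrent to-do list:\n{render_todos()}"
    return call_llm(prompt)
```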
One aspect of context engineering, or just managing context in general, RAG if you will, is MCP.

(20:41):
Because it's starting to become accepted across the industry, more and more people are adopting it: Model Context Protocol. Some people don't understand why it's so important. Why can't we just use APIs like we have with traditional programming? What are your thoughts on MCP? What do you use it for? Any insight there? Yeah, I used to be in the camp

(21:03):
that didn't care, until it became very popular. And usually when something becomes popular and becomes a standard, you should probably care, right? So I did the right thing: I did some research, and then I ended up building a few servers and doing a few things with it. But essentially, when people look up MCP, what they usually see is the definition in the official

(21:25):
documentation, which says MCP is like the USB-C for agents. Like, what does that mean? I don't think that helps the case at all. But to me, MCP just means that, hey, instead of hitting API endpoints, which are made for developers, we hit endpoints where it's an entire workflow, and it's

(21:49):
described in a way that an AI or an agent can consume, as opposed to a granular thing like an endpoint. What do I mean by that? The team at Block, the company that the ex-founder of Twitter, Jack Dorsey, founded, wrote a really good blog about the lessons learned building MCP. And one of the lessons is:

(22:10):
don't make the MCP tools as granular as API endpoints. AKA, don't just turn all your API endpoints into MCP endpoints, because you want to encapsulate as many steps as possible into one MCP resource, and then let the agent use the resource, as opposed to hitting multiple endpoints and making

(22:32):
sequential calls to your server to do something, as if it were a developer trying to build an integration.
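
As a sketch of that advice, here is what a workflow-level tool might look like using the official MCP Python SDK's FastMCP helper (assuming the `mcp` package is installed; the server name, the `update_user` logic, and the three underlying API calls are hypothetical):

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("user-admin")  # server name is illustrative

@mcp.tool()
def update_user(user_id: str, email: str) -> str:
    """Update a user's email end to end.

    One workflow-level tool for the agent; behind the scenes the server
    would make the three granular API calls itself (all hypothetical):
    1. fetch the user, 2. validate the email, 3. save the user.
    """
    return f"Updated {user_id} to {email}"

if __name__ == "__main__":
    mcp.run()  # serves the tool over stdio by default
```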
So MCP has two components to it, right? There's the server and there's the client. The client is usually either your application, your product, or something that you would use as an end user, like Cursor, Claude Code, ChatGPT, stuff like that.

(22:57):
And inside of your client... I mean, I should probably call it a host. The technical definition is that your application is the host, and inside of the host there are multiple clients that can be connected to MCP servers. So we just mentioned one benefit of using MCP in general, which is that it's now

(23:18):
the de facto standard for interoperability between different services that have to do with AI. It can encapsulate entire workflows, and it's very friendly for an AI model or an agent to consume, compared to granular API endpoints, where an agent has to call multiple ones to get a task

(23:39):
done. A task usually takes multiple API calls to get done. You might have to read a user database, then read another table, then make an edit here, right? So it's three calls from the agent. Instead, the lesson learned by a lot of teams building MCP is that you should build MCP servers where there's

(24:00):
a workflow, which is "update user." You hit it; behind the scenes the server does three API calls, and the workflow, as far as the agent knows, is one endpoint, or one resource, that the agent can consume. So that's way better than endpoints, because now it's more in the realm of the business logic, as opposed to the developer world of API

(24:23):
endpoint integrations and stuff like that, which is too granular.
Another benefit of MCP is that, as the developer, your job is no longer to build integrations. That's the job of the integration provider now, because if an integration provider builds a really good MCP server, preferably hosted, then people will have a better

(24:48):
time using it, and that team now owns the definition of the tools available on the MCP server. Let me give a little bit of explanation. When your agent, let's say your Cursor agent or your product, hits an MCP server, usually what it does first is this little step we call discovery, where it's trying to see what kind of

(25:10):
resources are available on the MCP server. It could be a bunch of tools the agent can use, it could be prompts, it could be files, it could be a whole bunch of stuff. An MCP server, by the way, can also ask your agent to do things; there's a whole bunch of different protocols where that can happen. And of course, we have authentication and stuff like that as well, so you always, you know, keep your processes secure. But after the discovery stage,

(25:35):
the agent on your side, AKA the Cursor agent or your product, will make a decision whether or not to call a resource or a tool from the MCP server. And then the server just does its thing, sends back some resources, and that's it.
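
On the client side, discovery looks roughly like this with the same `mcp` Python SDK (the server command is illustrative, e.g. the earlier sketch saved as server.py):

```python
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Launch the server from the earlier sketch as a subprocess (illustrative).
server = StdioServerParameters(command="python", args=["server.py"])

async def main() -> None:
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # Discovery: ask the server what it offers before calling anything.
            tools = await session.list_tools()
            print([t.name for t in tools.tools])
            # Then the host decides whether to call a tool.
            result = await session.call_tool(
                "update_user", {"user_id": "42", "email": "a@b.co"}
            )
            print(result)

asyncio.run(main())
```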
So, going back to what I was saying earlier: your job as a product builder or an agent builder, you get a set of

(25:58):
tools available to your agent without you building anything. You don't have to think about the definition of the tool. You don't have to think about authentication. Well, we kind of do, but not to the granularity as before, because everything is on the MCP server side; the service provider now takes on the job

(26:18):
of providing those tools for you. So no longer do you have to think, oh, I need to integrate Google search, so I need to spend a couple of hours building the right tool with the right definition and arguments, and explain those arguments to my agent. No, just call the MCP server, because the team has built that, and they probably built it better than you would from day one,

(26:38):
because you just saw the documentation; they've been building this for a long time. So I think that's the right direction for the industry, which means less work for you as a developer building products, and a standardized protocol everywhere, which is great. If you need something, you just hook it up. No need to spend a whole day building integrations. And yeah, MCP servers are great, and they're always updating,

(26:59):
which is pretty awesome. And I think that, honestly, the biggest thing is that everyone's adopting it. That's why it's great. If nobody's adopting it, then the protocol is worthless, right? So I think that was the biggest part of it. Exactly. And it even adds value in small little efficiencies. One of the first use cases I adopted MCP for was the GitHub integration, where I can be

(27:21):
working in Cursor, going about, and I'm like, OK, done this issue, what's the next issue, the next highest priority issue? And then it's able to pull it in and just continue the chat, so I don't have to worry about switching over and losing context. It's not a huge deal, but it's one of those little things, such a small improvement, that I wouldn't build a tool for it. I would just, you know... Exactly. You wouldn't sit down and build that GitHub tool yourself, but

(27:43):
given this, you would use it, and you understand the value of it right away. So MCP is actually a great lead gen for a lot of these tech companies. Could you give some elaboration? Yeah.
So let's say there's a company called Tavily, which does web search, web scraping, all this good stuff. You could spend 10-20 minutes trying to build a Tavily integration into your product, or you could try it out

(28:06):
and just copy the URL of their hosted MCP server and paste it into your agent, whatever your agent is, whatever interface it has. And now, all of a sudden, at runtime it has access to like 20 more tools, all with the right definitions, with the right arguments, probably fine-tuned so that it always works, because whatever

(28:27):
errors other people have had using the same MCP server, they've probably tweaked it. You get the benefits. So I think it just aligns incentives the right way, so that service providers are incentivized to promote and maintain the MCP server so that you have a better experience, which means you in turn don't have to do it, and you still get the benefit of using the tool.

(28:49):
Hai is right. MCP is becoming more and more powerful, and it's starting to proliferate everywhere. And if you want to use MCP in your real workflows, you have to use real data, and that can be kind of scary. So that's why I've been using ToolHive. ToolHive makes it simple and secure to use MCP. It includes a registry of trusted MCP servers, it lets me containerize any other server with a single command, I can install it in a client in seconds, and secret

(29:11):
protection and network isolation are built in. You can try ToolHive as well. It's free and it's open source; learn more at toolhive.dev.
And back to the conversation with Hai. I'd like to dive a little bit more into Claude Code. We've had an episode where we did an advanced version of it, and I find anything in the terminal is inherently intimidating for people who aren't used to working in the terminal. I still remember back when I was at school for CS, and the first

(29:33):
real interaction with the terminal. Like, oh wow, I feel like I have such superpowers, because you're engaging with the computer on such a deep level. But for people who don't have that same experience, it's the little black box with just a blinking cursor, and it can do absolutely everything, but you don't really understand how. Claude Code has been one of the most widely adopted coding tools, and it's starting to integrate into other things.

(29:54):
For example, I use it in Obsidian. I'd love to know how you use Claude Code. If someone's brand new to it and they're not really comfortable in the terminal, what can they do to gain the confidence to use it? Because it's such a valuable tool that I really want more people to give it a try.
For sure, yeah. My advice is that you can treat it like ChatGPT. That way it's not, you know, intimidating

(30:15):
at all. If you don't know something, you can literally ask it: what can I do? And Claude Code has context of what Claude Code can do. Claude Code knows what its rules files mean, like what CLAUDE.md means. So you can literally ask it, what can I do with you? And, can you help me do this and that? It can also help you

(30:36):
navigate the terminal as well. You run it either in interactive mode, where you just type claude and the terminal spins up a nice orange terminal interface, or you run it headless: I think you just type -p, and then it'll not spin up the interactive interface and just run in your terminal by itself. But yeah, I agree.
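
For instance, a script can drive the headless mode Hai mentions; this sketch assumes the claude CLI is installed and on your PATH, with -p (print mode) behaving as described:

```python
import subprocess

# Run Claude Code headless: no interactive UI, just a one-shot answer
# printed to stdout, which makes it easy to embed in scripts or CI.
result = subprocess.run(
    ["claude", "-p", "Summarize what this repo does in two sentences."],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout)
```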

(30:57):
Like, when I was first learning how to code, the terminal was probably the most intimidating thing I'd ever seen in my life. I was always so scared of typing in the wrong commands from tutorials; if I typed in the wrong thing, I knew I had to delete the whole thing and start over, because I had no idea what I did. And all these logs just stream out on the screen,

(31:18):
drives me nuts. But on the market there's this tool called Warp, which is kind of like the AI terminal. You don't have to remember any commands for your terminal to work. You can just type in natural language, and it'll just do it for you. So, me and my friend had a conversation the

(31:39):
other day about why people like Claude Code, even though it's in the CLI. And this plays really well into a tweet I saw recently from swyx, where he says there are people using Claude Code to automate go-to-market or marketing automations. Like, what? Claude Code? But I think, for me personally,

(32:01):
I used Cursor 24/7 for the whole of 2024. And once I switched to Claude Code, I couldn't explain it at first, but when I use Claude Code, I code more. When I say code more, I mean I probably prompt more, or I work more, and I don't feel as drained at the end of the day.

(32:23):
And I think the other day I realized why, which is that when you're looking at Claude Code, you're looking at one thing: your terminal. You're not typing something on the right and then doing something right away on the left with your file, and blah, blah, blah. So you're not stressed out. With Claude Code, you give it a task and you have to sit there and wait, or you spin up another one.

(32:45):
It's not super duper frenetic, where you have so many things going on. Cursor, for example, is almost like a copilot UX, right? You have stuff on the side, and your real stuff is in the middle. So you're thinking really hard on the prompting stuff on the side, but then you switch right back into

(33:06):
looking at your file and keep grinding away. So for me personally, I enjoy using Claude Code a lot, because it brings me joy, and the little animation kind of keeps me going as well. So I don't know about you, but that's how I feel about it.
I would say I'm a pretty even split, but along the same lines
of of your justification, where if I want to fire and forget,

(33:29):
give it a task to create an artifact or like more recently,
in my opinion, I've been having to create dashboards for me to
combine with data view. Incredibly powerful.
So you can just have it start going through all your notes and
starting to find patterns and beable to elevate up the things
that you care about. Create a little dashboard.
It's nifty. I love it.
And I can have that going when meanwhile with my day job, I'll

(33:51):
be using cursor because I do like to read and audit and, and
make sure that the code is up toa certain standard.
And it's almost like the difference between me being the
code checker, like the proofreader versus just having
my little assistant buddy go offand do a task for me.
But I've heard other people who,who along the lines of you just
like completely migrate over oneway or the other.

(34:12):
And even the people who, who think cursor is too opinionated
with the way they do indexing stuff, they'll use VS Code with
a plug in like root code augmentcode.
There's, there's AMP, there's a bunch of different ones that you
can kind of plug into it. So what's awesome is just how
many different options there arefor people to kind of find their
own flavour, see what works bestwith them.
And I think that just goes back to people needing to just play

(34:33):
explorer, have fun with it, see what works.
You might fall in love with the first thing you use, you might
not. For example, you brought up
Warp. I tried Warp, I didn't like the
way it was outputting and probably just because I'm more
of a, a terminal purist. So I end up making my own tool
with the help of AI where in theterm, if I just type AI space
and then any natural language query I want, it'll generate the
terminal syntax for that. So I can be like, you know,

(34:55):
check on my Docker container or something, It'll be like Docker
PS And then it's just it, it allows me to not have to
memorize syntax, but still get the capability.
But for people who don't want tomake their own thing, we have
tools like Warp they can download and I think there's one
called Wave that's open source, doesn't quite have the same
Polish as Warp. I think Warp got a massive
underground lately too. So I think it just comes back to
people should experiment and figure out what works best with

(35:16):
them. But have the courage, just try
new things because like we said with the terminal, it can be
intimidating because you don't really know, but just get clawed
or chatch PT running and just ask it questions and they can
help you guide you through the process.
Are there any other tools along those lines where people might
find them intimidating, but you find them useful and you kind of
just have to experience it for yourself to get into it so?

(35:36):
Recently I've I've dived pretty deep into the the whole like CLI
terminal agent coding agents andthere's a whole bunch of them
there's like, you know, Gemini CLI, right?
There's like open the eye codecs, there's Quinn code,
which is a fork of Gemini. They look the same.
And there's like cursor CLI, which it's actually pretty cool,

(35:57):
but nothing is as as polished asClaw code.
But another thing that I'm really interested in for these
kinds of Asians that are in the terminal was that they were,
they are built whether or not onpurpose to fit inside of like
existing automation work flows. Because for example, I, we
talked earlier today about Claw code having this thing called a

(36:18):
Claw code SDK where it can run without the interactive mode and
just kind of like be a note inside of your, you know, get
help action or your bash script or your even your product as
some of the companies that I work with here actually does.
And cursor CLI also does the same thing where it has a

(36:39):
headless mode and just run in the background.
So I don't know about you, but Ipersonally was very curious
about this. So I went and built a couple
different projects. One of them is called CLI on the
cloud. Essentially what it does is it's
a Next JS app. You click on a button, you can
create a sandbox, You can install a clock on the sandbox

(36:59):
and then you can chat with it. And then you on the surface it
looks like ChatGPT, but it's actually clock running on a
sandbox talking back to you. So it's very expensive, way more
expensive than talking to like achat bot.
But the sandbox has Internet, Ithas everything.
It's gonna say food and water. It has like all the tools.
And you know, you can give it basically you can, you can tell

(37:22):
it even simple stuff, but also very complex stuff like deep
research. But you had to wait for a while
and it'll run a loop and it'll give you a final answer at the
end. But there's an entity that's
running on the sandbox, not in your computer, right?
So you can think of the possibilities of like, OK, not
just one clock code, but ten of them, 100 of them running
simultaneously in different sandboxes, doing different

(37:43):
things. So now all of a sudden you have
like an almost like an army of agents.
So I think my, my dream back then was to build something that
is super close to Devon, which is like the AI software engineer
that you talk to on Slack. I wanted to build something
that's like that. And now like the technologies
and the services are so available that we can build that

(38:04):
so easily with state-of-the-art kind of like agentic workflows
and agents and sandbox that takes like 3 lines of code to
spin up. So yeah, like Crystal CLI,
probably like another one that Iuse sometimes to use cheap T5,
because with clock code you haveto use, you know, entropic
models. If you're not into that, you
know you have to use a differenttool essentially.

(38:25):
And just in the notice of CL is,I still think back to the open
interpreter days and how that was initial like computer use
like that. That was one of the first ones
that did proper computer use. And it was all CLI and
incredibly expensive at times because it can go off the rails
and and steer you down the wrongpath.
But it's just really interestingthat we're giving these systems
powerful tools to be able to accomplish certain tasks, which

(38:47):
brings it back to whole agent conversation, but along the same
veins. I'm kind of curious your
thoughts on browser use agents because having something that
runs local on your machine can do tasks, write code, execute
code opens up a ton of possibilities.
But one of the massive benefits of computers is the Internet.
So being able to not just query the Internet, but browse,
interact with the Internet in the same way that humans do,

(39:09):
similar to the reason humanoid robotics are going to be
successful because the modern world was designed for the human
body. So human robots can kind of just
like plug and play in at least during this transition period
before we get a lot more robust API setups and, and and what
not, we're going to be having these browser use agents go.
So do you want to touch on that a little bit?
Have you used any that you like?Are there some promise there?

(39:31):
Do you think it is just going tobe like a transition tech or do
you think it's going to be something that sticks around for
a? While so agents running in like
an environment is kind of like my one of my most like curious
topic. So there are agents running in
Minecraft, right? Agents running in using computer
like, you know, like opening eyes, computer use.
Back in the day we had open interpreter and browser use

(39:54):
agents, which I think are phenomenal.
So I teach a boot camp on Maven and what I've been doing in the
in the past month or so. Is building an open source
project where I can use it to create my students homeworks.
So right now what it can do already is it can take in a list
of projects submissions from my students from Notion and then

(40:17):
deploy 1 sandbox and then just clone fifty of those into that
same sandbox. And then I would run like a
utility function through each repo to concatenate all the
files together into one big string.
And then I would just pipe that to an LLM to be like, hey,
here's the entire repo, Get thisperson feedbacks based on what I
would normally say, for example,if they're using Python to write

(40:39):
requirements on TXT, use something like, you know, UV,
stuff like that. So that would do like the first
pass of waiting for me and it works out great actually.
And I can also deploy an agent in that sandbox to kind of like
roam around and do certain things, which is pretty cool.
So the next thing I'm doing right now is actually take it to
the next level, which is sometimes students deploy

(41:01):
application on the web. So re cell, you know, whatever,
right? Their application is live on the
web, they can go and try it. Every time I try it, it takes me
like 5 minutes. But what I can do is I can
create a headless browser, go tothat website and then use a
browser use agent to based on the instructions that student
gave on how to use the app, actually try it and see if it

(41:23):
works. So we just merge.
I just merged this first PR today on stream where I added
this, this service called Browser Base and a browser
control framework called Stagehand, which is also built
by Paul Klein, the team at Browser Base.
So what we can do now is we can grade the students code and then

(41:44):
we can also grade the season's output, which is the their final
project deployed on whatever service that is online as if I'm
looking at the project. Obviously I'm still looking at
them, but I'm just really happy that there's an agent out there
that can go to the students homework, click around and tell
me if all this project is broken.
It promised me that if I click on this I can sign up, but I

(42:05):
couldn't. So now I know that piece of
information I can go in and investigate with the student
much faster. So I think browser use agent has
more obviously more capabilitiesand use cases than just somebody
like me trying to grade coding assignments.
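
To give a flavour of the pattern (not the actual Stagehand API), here is a bare-bones check using Playwright plus a hypothetical `call_llm` judge; Stagehand and Browserbase add natural-language control and hosted browsers on top of something like this:

```python
from playwright.sync_api import sync_playwright

def call_llm(prompt: str) -> str:
    """Stand-in for your chat-completion client (hypothetical)."""
    raise NotImplementedError

def check_signup(url: str) -> str:
    """Load a student's deployed app and ask an LLM if signup looks usable."""
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url)
        page.click("text=Sign up")   # follow the student's instructions
        html = page.content()        # what the agent "sees" after clicking
        browser.close()
    return call_llm(
        "Here is the page HTML after clicking 'Sign up':\n"
        f"{html[:5000]}\n\nDoes the signup flow appear to work? Explain briefly."
    )
```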
For example, web scraping. Web scraping traditionally

(42:28):
used to be an automated thing, and, you know, you have to worry about proxies, the IP, and stuff like that. But you also have to worry about where to click on the screen, how to move the cursor a certain way so you don't get detected as a bot. Obviously this kind of stuff falls into a gray

(42:48):
area of: are you trying too hard to look human, are you trying to skirt the rules and the regulations? But on the technology side of things, I think being able to give the agent the full context of the website, not just the DOM tree of the website, is so valuable, because it can make decisions about how to use it. Because

(43:12):
otherwise, as a person, you cannot just Google-search your way through a website, right? You have to look at the website and then figure out what buttons to click on, because websites are built for humans. So I think browser use agents really help with processes or tasks that require an agent to interact with this web world

(43:34):
that was traditionally built for humans, and is very visual, very GUI-heavy. Yeah. So if we can nail browser use agents, there could be a world where we don't need as many APIs anymore, because to a human, the website, the landing page, is the API for the human, right? If we can use it just like a human, then we don't need, you know, an API for every single

(43:55):
service out there. So that's just my two cents. What do you think would be the use of these browser agents for someone in their day-to-day life? Say they're working a desk job, and they don't feel comfortable going as deep as writing a program or something, but they're like: I do the same task every single day, I need a tool to help solve this.

(44:15):
Under what situations is the browser use agent the right tool for that job? I think the browser use agent is the right tool for every single job. Well, for most jobs out there, right? The reason why we can't just have ChatGPT do everything for us is because there are only so many tools that they gave the agent, and each of those tools

(44:36):
requires a partnership to get an API endpoint going. But with an interface: no human is barred from using the web, but AI is barred from using the web programmatically everywhere, right? If you want permission to use something, you need to ask for it. You've got to go through all these

(44:56):
things to build an integration and stuff like that. But humans aren't like that. If you want to look up a database from the government, you just go and look it up, and you scroll around and you'll get the data out. As an AI, unless there's a public API endpoint for it, you'll probably have a hard time getting all the data out in a

(45:18):
one day you're working on something, and then you realize that you have to book your tickets to go to Greece or something. Right now, today, you can go to ChatGPT and ask it to do it, but it'll ask you all kinds of questions, like: oh, can you give me permission to do this? And also: which website do I go to? I only have access to

(45:38):
TripAdvisor and some other service, because that's all the services I'm allowed to use. But a browser use agent would just maybe hit up Google first, see whatever is cheapest, and it just keeps going and keeps going, whatever website it can hit, just like a person. There's nothing stopping it. So it can complete a lot of tasks for you seamlessly.

(46:00):
I don't know if we're there yet, but I think that's where most people think it's heading, because we saw OpenAI launch, I forget what it's called, but it's like a computer use agent thing in ChatGPT, where you give it a task, it actually spins up a sandbox, a computer or a browser, and then you can see the agent doing stuff on the

(46:24):
browser or the screen, with thinking notes and all that stuff. And you can leave that and go do something else. And browser use just allows the agent to do way more than our regular kind of ReAct agent, where you're given tools and the tools had to be built a certain way. The world is on our Internet, which is

(46:47):
accessible through the browser, and if the AI can use the browser properly, it can access pretty much everything. And I learned something new this

(47:07):
website, clicking around, scrolling around. And it's not a video stream either. They're actually streaming the DOM tree of the website out from their sandbox to the front end. That's why everything looks so clean and 4K in real time. And I'm sitting there

(47:27):
like, is this using a lot of my bandwidth? Because this looks 4K. I didn't believe it, but it turns out that it reconstructs the entire browser in a little iframe in your browser, so you can see what the agent is looking at and doing, from all the HTML of the original website the agent is on right now.

(47:48):
And that's why it looks so clean in real time. And I think when you try it, you'll know, because the world just opens up. There's so many possibilities, right? Grading homework is one thing, but you can just spin one off and be like, OK, go: somebody gives you their website and you say, tell me about this

(48:08):
customer. You can tell your browser agent to go and click around and just find out. It's pretty great. So yeah, I think this technology is going to go really far. And I think very soon, a lot of the products that we use will have some sort of: oh, we're going to spin up a sandbox, we're going to drop an agent in there, because now the agent can

(48:29):
be let loose and use the computer. I think that's where we're heading. One thing I want to make sure we touch on before we run out of time: you actually brought this to my attention. Since we've been talking so much about agents, you brought up the idea of a deep agent, and I actually hadn't heard the difference between a shallow agent and a deep agent. I'd always just heard of the different, almost like verticals or specialities of different

(48:50):
agents. Could you just tell us quickly what makes an agent a deep agent, and why people should care about that? Harrison Chase from LangChain has a knack for coining things very nicely. So I think he coined the term deep agent. And essentially, deep agents are just regular agents, but run for like 10 minutes, 20 minutes, 30 minutes, an

(49:11):
hour, a full day, or even longer. And we've seen in the past that there are benchmarks where big AI labs would run their models for hours or days, and the score went up, because the model was able to fix itself, think for longer, use more reasoning tokens, etcetera.

(49:32):
So deep agents came from the world of coding agents and deep research agents, because those two things run very long. When deep research from OpenAI came out in ChatGPT the first time, people were running it for like 20 minutes, and the results were great. It's actually a good report, right? And stuff like Devin, the

(49:53):
AI software engineer that replaces the entire software engineer, which you can only talk to on Slack. Those things run in sandboxes and they run forever, and they cost a lot of money too, as you would imagine. But the trade-off there is money, and in exchange you get a lot better and more in-depth output, because it was able

(50:18):
to do a lot more and check itself a lot more.
So deep agent essentially just means it runs a long time, and it usually plans. It always has to plan, because if you don't plan and you're trying to run for an hour, you'll be lost, right? So, going back to the whole to-do list in Claude Code and all these patterns: an agent that plans, has access to a

(50:40):
bunch of things, and runs for a very long time equals a deep agent, equals very in-depth and high value output.
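
Putting the pieces together, a deep agent is roughly the earlier agent loop plus an explicit plan and a self-check pass. This sketch is schematic, with `call_llm` as the usual hypothetical stand-in:

```python
def call_llm(prompt: str) -> str:
    """Stand-in for your chat-completion client (hypothetical)."""
    raise NotImplementedError

def deep_agent(task: str) -> str:
    # 1. Plan first: a long run without a plan gets lost.
    plan = call_llm(f"Break this task into numbered steps: {task}")
    results: list[str] = []
    for step in plan.splitlines():
        if not step.strip():
            continue
        # 2. Execute each step with the plan and prior results in context.
        results.append(
            call_llm(f"Plan:\n{plan}\nDone so far:\n{results}\nNow do: {step}")
        )
        # 3. Self-check: critique and revise the step before moving on.
        results[-1] = call_llm(f"Critique and fix this result:\n{results[-1]}")
    # 4. Synthesize the long run into one high-value artifact.
    return call_llm(f"Task: {task}\nCombine into a final report:\n{results}")
```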
I think I have to stress this part: high value output. It's almost like, you work an hour, you produce something of value to the company. This is kind of like that. You run this for a long time, and the output is no

(51:01):
longer just, you know, a file of code changed, like when you do a one-off thing in Cursor. It's a whole PR, a whole set of features being built, or the whole codebase refactored, or, going back to the more business use cases, a deep research report. So these things, again, run long and have high value output.

(51:24):
So yeah, I think we're getting closer to replacing people, because now it's running longer. I feel like these models, these tools, running a one-off thing, where we send a message in and get something out, still makes them feel kind of fake, kind of like robots. But if you just run

(51:45):
something in the background and it's providing value in the economy, that's like a person. There are millions of people working right now that we don't know, but we get the benefit of their work, right? And we don't have to trigger them. I mean, obviously they had to be motivated to go to work, but we don't know that they're working right now,

(52:08):
and we don't have to keep pinging them, like, hey, start working on the next thing, because they're already on the next thing. So it's very close to replacing people, which is both exciting and terrifying at the same time. Hai, this was a ton of fun.
I am thrilled for your upcoming Maven course. Can you please tell the audience a little bit about it, and anything else they should be aware of, and how to keep up with

(52:30):
you? Absolutely. Thank you. So me and my friend Mary have been teaching over 150 people how to use different AI tools to ship AI products faster. So agentic workflows, agents, all that stuff. And we teach people these concepts: context engineering, deep research, MCP, Claude Code, browser use agents, deep agents, and many more things, for

(52:52):
example, evals. And our next cohort is going to be September 14th, so September 14th to the end of October, about six weeks. And it's very intensive. I'll be live streaming every day, and you get access to me, so I will talk your ears off about these things, because I can't help myself when I'm curious about a certain topic.

(53:14):
So yeah, it's going to be about 50 people, pretty limited seats. And we're on Maven for the first time, so shout out to the Maven team for chasing us for the longest time. We're finally on it and we love the platform. But yeah, if you're looking for a community of people using AI tools and building products fast, our goal is for you to go from, you

(53:36):
know, "oh, I don't know if I could build this" to "oh, I could build this and everything I can think of," by being in the bootcamp. So yeah, that's our
mission. Thank you for listening to this conversation with Hai Nghiem. I really hope that you were able to pick up some more of the fundamentals, and that approaching AI tooling is a little bit less intimidating. If there's anything else you want to learn about, please let me know. I want to make sure that everyone watching is able to leverage tooling to its full

(53:59):
capacity, so we can really start accelerating together. I really enjoyed having Hai on; ever since I met him at AI Tinkerers Ottawa, we always have a great conversation and he teaches me something new, so I want to try to bring that to you. I want to give a quick shout out to ToolHive for supporting the show so I can have conversations like this, and I'll see you next week.