
May 22, 2025 63 mins
In this episode, we sat down with full-stack developer and AI innovator Matthew Henage, creator of WAOS.ai (Web App Operating System) and the incredible storytelling platform SpeakMagic.ai. This conversation took us deep into the world of agentic AI, low-code app building, and the future of intelligent workflows.

We kicked things off with Matthew sharing how he’s been riding the AI wave since GPT-3.5 blew his mind. His platform WAOS is all about making it easy for developers to build powerful web apps with embedded AI workflows — think of it like Zapier meets ChatGPT, but with agents working together instead of API chains.

One of the most eye-opening parts of our chat was learning about agent swarms — essentially teams of specialized AI agents that collaborate to perform complex tasks. Instead of relying on one giant AI brain to do everything, you create smaller, purpose-built AIs that handle specific steps in a workflow. It’s scalable, smarter, and kind of like assembling your dream dev team… but all made of code.

Matthew’s Speak Magic project is a jaw-dropper. It uses a swarm of over 40 agents to turn a single story idea into a fully animated, two-minute video — complete with scenes, scripts, character animations, music, and more. It’s AI storytelling on steroids.

We also talked a lot about:
  • Best practices for building reliable AI workflows
  • The importance of keeping context windows small (under 4,000 tokens works best!)
  • How prompt engineering is becoming the new programming
  • Using AI for vibe coding (yes, that’s a thing) and rapid prototyping
  • The tradeoffs between using traditional programming vs. letting AI handle logic
  • Ethical considerations and how to handle memory and privacy in long-running user interactions
Check out Matthew’s work at WAOS.ai and speakmagic.ai — and as always, stay curious and keep building!

Become a supporter of this podcast: https://www.spreaker.com/podcast/javascript-jabber--6102064/support.

Episode Transcript

Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Speaker 1 (00:05):
Hey everybody, welcome back to another episode of JavaScript Jabber.
This week on our panel, we have Steve Edwards.

Speaker 2 (00:12):
Yo yo yo, to imitate AJ, coming at you from
a cloudy but warming-up Portland area.

Speaker 1 (00:20):
I'm Charles Max Wood from Top End Devs, and this week we're talking to Matthew Henage. Now, Matthew, you live near me as far as I remember. We were introduced by my neighbor and we had a good talk about AI, and I thought, hey, let's have you on the show and dive into this stuff. So you want

(00:40):
to let people know what else they ought to know
about you and then we can roll from there. Sure.

Speaker 3 (00:46):
Yeah, it's great to be on the show. So my name is Matthew Henage. I've been a full-stack developer professionally for the last twenty years, but I've been doing it since...

Speaker 1 (00:57):
I was a kid, so about thirty years ago.

Speaker 3 (01:01):
Always love jumping into new technologies, and so I started working on a project called WAOS.ai about three years ago, and then ChatGPT 3.5 came out about two years ago and it just blew my mind. Saw a lot of huge

(01:22):
utility from it and got super involved with programming using AI. So, yeah, I'm also from Lehi, Utah. So yeah, it's great to be on the show. Thanks for having me.

Speaker 1 (01:39):
Yeah, thanks for coming. So when he says Lehi, Utah, that's also where I'm at, same town. So anyway, yeah, you were showing me when we talked before about your system. What is it, WAOS? I don't even know what that stands for.

Speaker 3 (01:56):
Yeah, it stands for Web App Operating System. It's a way of using low-code tools to build web apps, and then you can create AI workflows that basically control the web app in real time.

Speaker 1 (02:12):
Yeah. It kind of reminded me a little bit of, what's it called, Zapier, except you put prompts in instead of connecting to APIs for different products. That's a good way to go. Yeah. So I guess just to dive in, because at all the coding meetups that I go

(02:32):
to anymore, everybody's talking about AI. They're excited about it, they're excited about what you can do with it. A
lot of people are, you know, diving into different aspects,
you know, whether it's generating text or images or videos
or anything like that. I'm a little curious to just
I guess kind of get the state of the art
as far as you see it of AI and how

(02:55):
people might or might not be using it.

Speaker 3 (02:59):
Sure, yeah. So it's kind of interesting, because AI as a concept has been around for a very long time, and kind of what a lot of people are now referring to with AI is more like generative AI, so using different types of architectures like autoregression, which

(03:19):
is kind of what LLMs use, like ChatGPT or Claude. And then you have other kinds of generative AI, like diffusion models, which you might see in something like Midjourney, and other kinds like GANs.

Speaker 1 (03:39):
Type so.

Speaker 3 (03:42):
Those are kind of some of the more popular ones. And so it's kind of interesting, because when I first started getting into AI with LLMs, with ChatGPT 3.5, one of the things I kind of thought about using it for is kind of like a universal API, where you can ask it

(04:04):
to do anything and then have it basically come up with the answer. I found out pretty quickly that that's not really a generalizable use case. It can't do everything. It has advantages and disadvantages. It's kind of interesting: there are a lot of businesses that are kind of just a wrapper over

(04:24):
the top of something like OpenAI, where a lot of times you can just use a chat-like app and really get a lot of the use out of it that way. Some of the advantages of those kinds of applications are that they put a lot of thought and effort into the prompt engineering, which is basically a

(04:47):
way to explain to an AI how it should behave and what rules it should follow, and so there's a lot of utility out of that. But I think one of the big things that we're seeing a lot of movement towards is AI agentic workflows. And there's

(05:09):
different kinds of names for that. You might hear AI swarms, like agent swarms or agent teams. The huge benefit from that is, instead of using one agent... like, if you have an AI agent, there's a lot of different definitions for that, but one way I like to look at it is an agent is

(05:32):
something that you can give a prompt to and get some kind of expected response from. So if you're using something like ChatGPT, you can give it a task or ask it a question and it will give you a response. That's like one agent, and it could be more general. An agent swarm or

(05:54):
team, or like a workflow, is a way to have multiple different agents work together to solve a problem or perform some kind of task. And there's a lot of advantage to that, just like having maybe one person in a company that has to wear all the different hats, versus having a team of

(06:17):
people working in a company to perform some kind of task. So you have different people that are specialized in different kinds of roles, and each one of them does really well at what they do, and it comes together as a collective to provide the most value. You can do the same thing with AI agent swarms or teams

(06:38):
or workflows, whatever you want to call that. An example of that would be a project created completely within WAOS, something called Speak Magic, and you can check out examples of that at speakmagic.ai. And the idea is you can give one prompt and

(06:58):
then have an agent swarm of like forty-two different agents that basically take a story input. You just give an idea, maybe you have ideas for a character or something like that, and it'll create up to a two-minute video from these different agents working together to

(07:19):
create a scene. So it'll take the story prompt, it turns that story prompt into scenes, it turns those scenes into scripts, it turns those scripts into shots, and then you have different characters that are basically acting out the scene, and it animates the characters speaking to each other, and it can add sound effects and

(07:41):
music and different things like that to create it. So there's a lot of advantages to that, and that's kind of where I see things going, kind of the state of things.
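
The swarm idea described here can be sketched in a few lines. This is a hypothetical illustration, not actual WAOS or Speak Magic code: each "agent" below is just a stub function with one narrow job, and an orchestrator runs them in sequence, feeding each agent's output to the next, the way the story-to-scenes-to-scripts pipeline is described.

```javascript
// Hypothetical sketch of a tiny agent pipeline in the Speak Magic style.
// In a real system each agent would call an LLM; here each one is a
// plain function so the control flow is easy to see.

// Each agent has a single responsibility and a uniform interface:
// it takes an input and returns an output for the next agent.
const storyAgent = (idea) => `Story about ${idea}`;
const sceneAgent = (story) => [`${story} - scene 1`, `${story} - scene 2`];
const scriptAgent = (scenes) => scenes.map((s) => `Script for ${s}`);

// The orchestrator wires the specialized agents into one workflow.
function runPipeline(idea) {
  const story = storyAgent(idea);
  const scenes = sceneAgent(story);
  const scripts = scriptAgent(scenes);
  return { story, scenes, scripts };
}

const result = runPipeline("a robot learning to paint");
console.log(result.scripts.length); // 2 scripts, one per scene
```

The point of the shape is that each stage can be tested, swapped, and prompted independently, rather than asking one giant prompt to do everything.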

Speaker 1 (07:53):
Yeah, yeah, it's interesting, because I talk to different people and it seems like everybody's doing something different with AI. So you've got, kind of like you were saying, where you've got people who are, you know, trying to make a story, and then maybe they're making it into a story with a video, right? Most of the people that I've been talking to and working with, they're

(08:16):
using it, kind of like you were talking about with agentic AI and, you know, kind of building out your team to do different things, and that's kind of where my interest has been, except it's more of a chat agent and less of a voice or a video agent. And so yeah, you give it a prompt, and then you also give it a set of functions that allow it to do things, and then it can go off and

(08:38):
it can do the things. But what I'm finding most
people do with that is, like you said, they have
a team of agents. And so you may have an
agent that is kind of the coordinator or support agent,
and then you know, it can go and talk to
the scheduler agent to get stuff on the calendar, and
go talk to the technical agent to you know, get

(09:00):
more specialized technical feedback, or it can go talk to, you know, another agent that has access to different other APIs to do other things, and so that's kind of the deal. And so then you get into using something like ModelFusion for JavaScript, or I've been doing a lot with Raix in Ruby, to do a lot of this stuff. And you just

(09:21):
you write a tool, and a tool is essentially that set of functions, and you make stuff run. And so yeah, one of the tools might be, here's the video generation API, and, you know, it also uses AI to do its work. And so anyway, it's really fascinating to kind of see where it's all going. At the last code meetup I went to, one of

(09:43):
the guys there had actually been using it to write fiction, and so, you know, he'd use it to flesh out parts of the story or actually write parts of the story, and, you know, he's like, yeah, sometimes it's really good and sometimes it's really not. It's been interesting to see how far it goes. What I

(10:04):
tend to find is that a lot of people want the one-off prompt, where you just write the prompt and you immediately get back the feedback that you want. What I found is that in a lot of cases you have to refine it with the AI. See, what you wind up doing is you wind up saying, oh, I forgot to tell you this, or, you know, I need

(10:26):
this scheduled, you know, every Thursday. And then it's, okay, well, when I said every Thursday, I actually meant, you know, except holidays and this and that and the other. Or if I'm trying to get it to write code. So one example of this: lately I've been playing with Grok, but this one was on ChatGPT. You know, I said, hey, I

(10:48):
need an audio player for my website. By the way, if you go to topendevs.com, or if you go to javascriptjabber.com, the player at the bottom of the page is the player that AI mostly wrote. And I said, hey, I need an audio player for the website. You know, I need this, I need that, right? And so it's like, I want volume, you know, I want to be

(11:09):
able to change the volume, and I want the progress bar to go, and there's a state-of-the-art thing for podcasts where you tell it to not load the audio until it's actually clicked. So until you hit play or download, right, it doesn't eager-load it. And the reason is because the metric du jour for

(11:33):
a long time with podcasts has been downloads, and so you can actually pad your numbers by forgetting to tell it not to download every time it loads the page, right? But anyway, so I told it, and then I actually asked it what other features I should put in, right? And so anyway, we're seeing this kind of thing with a lot of

(11:55):
different people. So some people are willing to go in and use the web interface on something like this, or, what is it, Open WebUI? There's a web interface that you can run on Ollama on your own machine to do a lot of these things. And so anyway, there are a lot of options for this stuff, and some people have refined it so that it will automatically

(12:15):
use tools to go and make web searches and stuff like that. Or if you use Grok, it'll tell you that it's thinking, and you can see that it's loading in different pages, and you can ask it for its sources. But yeah, it's been fascinating to just see where all of this goes. And then of course there are the specialized uses, like Cursor AI and things like that for programmers, or, you know, other AI

(12:39):
systems for other folks. And so yeah, the sky's kind of the limit for it, I think.
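
The "give it a prompt plus a set of functions" pattern Charles describes can be sketched roughly like this. The JSON-schema shape below mirrors common function-calling APIs, but every name here (the `schedule_event` tool, the dispatcher) is illustrative, not any specific vendor's API:

```javascript
// Hypothetical sketch of the "tool" pattern: you hand the model a prompt
// plus a list of functions it is allowed to call. The schema shape
// resembles common function-calling APIs, but the names are made up.

const tools = [
  {
    name: "schedule_event",
    description: "Put an event on the calendar",
    parameters: {
      type: "object",
      properties: {
        title: { type: "string" },
        day: { type: "string", description: "e.g. 'Thursday'" },
      },
      required: ["title", "day"],
    },
  },
];

// The model replies with the *name* of a tool and arguments; your own
// code actually executes it. A stubbed dispatcher:
const handlers = {
  schedule_event: ({ title, day }) => `Scheduled "${title}" on ${day}`,
};

function dispatch(toolCall) {
  const handler = handlers[toolCall.name];
  if (!handler) throw new Error(`Unknown tool: ${toolCall.name}`);
  return handler(toolCall.arguments);
}

// Pretend the model asked to schedule the weekly sync:
console.log(
  dispatch({ name: "schedule_event", arguments: { title: "Sync", day: "Thursday" } })
);
```

The key design point is that the model never runs anything itself; it only names a tool and supplies arguments, and the dispatcher keeps execution in ordinary, auditable code.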

Speaker 3 (12:45):
Yeah, it's such a broad range of different things that AI can do. We're talking about using things like Cursor. There are also other tools that are more like coding agents, like Claude Code. I don't know if you've seen Manus; it kind of hit the scene a week or two ago.

(13:06):
That one's been pretty amazing in what it can accomplish. But typically, for most of my coding, I just open up a chat and I kind of explain a specific feature that I want, and then kind of treat it like a junior dev in

(13:27):
a way. I think that's kind of one of the best ways to treat it. Kind of give it all the different use cases, what I expect going into it and what should be coming out of the code, and why. And then of course you need to check the code, make sure that it's going

(13:49):
to do what you want. It's kind of interesting, because I think we're going to see more and more where AI can do more and more coding. Some of these things like Cursor are getting better, but typically, the larger your code base and the more complicated things get, it kind of runs out of context window and

(14:13):
starts kind of not performing quite as well. So usually smaller projects are really great. And that's changing though, which has been, well, all kinds of fun. You

Speaker 2 (14:26):
Know, it's interesting. I was listening to a podcast this morning.
They were talking about the topic of vibe coding, which
is the idea that using stuff like yeah and so
it was syntax FM and so what they are The
general definition is someone who wants some simple app for
some business purpose or the examples were given like I'm

(14:46):
trying to build a little game with my kid, and doing it the regular way takes forever and a day. Or a way to do a quick demo of something, to see how something would look, or things like that. And it's not something you're going to be using, you know, long-term professionally, deploying it and reusing it; it's sort of a one-off type of thing. And so, you know, usually with this kind

(15:08):
of stuff, the code quality, from what I've understood, is, how do we say, less than desirable. But the idea is that you can spin up something really quick, whether it's just, you know, to see how something would look, or to give you an idea of, you know, how something could work, and then you can, you know, tweak it from there or do something from there. But yeah,

(15:29):
like I said, vibe coding is apparently one of the new terms all the cool kids are using.

Speaker 1 (15:36):
Yeah. I'm not sure where the line is, though, between vibe coding and actually just having the AI help you, right? Because, yeah, at the end of the day, like in my example with the player, probably seventy-five percent of the code that the AI wrote I used. But the rest of it, I mean, I had to use my own expertise to make it do what I needed and look how I

(15:57):
wanted it to, right? It didn't give me exactly what I wanted. So yeah, I don't know.

Speaker 2 (16:02):
So Matthew, let me ask you this question. You know, when it comes to this auto-generation coding, my question that I've always had, that I've never had answered because I've never looked into it, is what languages and tools are being used? I mean, how do you determine that? Like, you know, my focus of development, I tend to focus on Laravel and Vue and Tailwind and Inertia,

Speaker 1 (16:24):
And some other tools like that.

Speaker 2 (16:26):
And so if I'm going to tell AI to build me, you know, some little to-do app, to beat that one to death, or even something more complex, do I say, okay, I want you to do it with this framework and these tools in this language? Or do you just say, build me an app, and it builds one for you out of what it determines to be best? How does it determine that? I mean, what's

(16:47):
what is the structure that is used to generate these apps that AI is building for you?

Speaker 3 (16:53):
Sure. So I think there's a lot of different ways to look at this. One is, I mean, if you're coding yourself, you're going to want to stick with the things that you know to some degree, so that you can edit and kind of understand why things are going wrong when they do go wrong, so that you can fix things. And so

(17:15):
typically, a lot of things when it comes to programming with AI, like models and tools and things, there's a lot that's geared a little bit more towards Python. It's kind of the go-to with these things, and so you have a lot of tools and stuff that are kind of

(17:36):
more geared towards Python. But if you're having it generate code, typically, I mean, how LLMs work is you're pulling information from a whole bunch of data, from like the Internet, and so the more data there is for a particular technology, the more likely it can perform a

(17:57):
bit better. And so, things like JavaScript: one of the nice things a lot of times when I code, and have AI help out with the code using something like ChatGPT's artifacts or canvas, is being able to have it generate code, and it can actually render that code for you, which I don't think it can really

(18:19):
do for anything else besides really JavaScript right now. And so that's really nice when it builds you something in that kind of case. But when it comes to the different technologies, I mean, it really comes down to... I like sticking with the languages that I use, React and Node on the back end, and so

(18:45):
using the technologies where you can help steer it in the right way, or connect it to your existing app, I think is kind of more the way to go, as I see it right now. If you're going to go with more obscure kinds of languages or technologies, you're more likely to not get as much support from AI helping out. One way around that

(19:10):
is getting the context for those technologies. Like, say, if an LLM just doesn't really have much knowledge based on that, go into the documentation for that and give the relevant pieces of the documentation to the AI to help you with that.

(19:30):
So it's kind of a way to RAG as well.
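
The "give it the relevant docs" idea Matthew mentions is the core of retrieval-augmented generation, and its shape can be sketched in a few lines. Everything below is illustrative: a real system would use embedding similarity, but simple keyword overlap shows the structure.

```javascript
// Minimal sketch of doc retrieval for prompt stuffing. The doc chunks
// and scoring here are made up for illustration; production systems
// typically rank chunks with vector embeddings instead.

const docs = [
  { id: "install", text: "To install the package run the setup command" },
  { id: "routing", text: "Routes map a URL pattern to a handler function" },
  { id: "config", text: "Configuration lives in a single TOML file" },
];

// Score each doc chunk by how many query words it contains.
function retrieve(query, k = 1) {
  const words = query.toLowerCase().split(/\s+/);
  return docs
    .map((d) => ({
      ...d,
      score: words.filter((w) => d.text.toLowerCase().includes(w)).length,
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}

// Stuff only the most relevant chunk into the prompt, which also
// keeps the context small.
function buildPrompt(question) {
  const context = retrieve(question).map((d) => d.text).join("\n");
  return `Use only this documentation:\n${context}\n\nQuestion: ${question}`;
}

console.log(buildPrompt("how do routes work"));
```

The selection step is what makes this practical for obscure languages or libraries: instead of pasting the whole manual, you send only the pieces that match the question.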

Speaker 1 (19:35):
So one thing with that, because I haven't really run into that. You know, most languages I use are fairly mainstream. I think Steve's in the same boat with PHP and JavaScript. But if you were using an obscure language, so maybe Elixir, or, I'm just trying to think, like, how far obscure can you get?

(19:55):
I don't know, Elm? Elm might or might not. I guess it probably depends. That's one other thing that I found: some models do much better with certain languages than others. But do you run the risk of... well, just to give people a little bit of context, there are different sets of data that your LLM

(20:18):
works from. There's the latent space, and that's everything that it got trained on, right? And you see them building these big data centers with all the GPUs, and they're trying to get as much data in as possible. So they come out with a new model, and generally what it means is that, hey, we've jammed more data into it, it's had more training, and so you're going to get better answers from it. And then you've got the context,

(20:39):
which is, hey, for this particular query or set of queries that I make to the LLM, there are prompts. You know, a lot of times it's part of the prompt, but it doesn't always have to be: here's everything you need to know in order to give me a good answer, right? So you're counting on its ability to process, plus whatever it's got in the latent space that

(21:00):
it already knows, right? So it may already know stuff about programming in general, programming practices, in the latent space, because it's already been trained there. But then, here are the specifics of the language that I'm using, right? Or here are the functions that are available to me, and here's how they work. So maybe it's not the language; maybe it's an obscure library that it just doesn't know as much about. Do you run the risk, though, of

(21:23):
making that, or of going beyond what the context window will hold? Because the context window is essentially how much you can tell the LLM, right, when you ask the question. It's all the data it can hold. And that's for the layperson; I'm not explaining it to Matthew. Matthew knows this stuff. But do you run the risk then of

(21:43):
expanding beyond what the context will hold? Because, you know, different models will allow you to give them different amounts of data in your context.

Speaker 3 (21:53):
Yeah. So with a lot of models, when you go beyond the context window, they just fail, and so it'll just give you an error, and that can be very annoying with some different things. There are two other main issues with larger contexts as well. One

(22:15):
is the cost. As your input context grows, like what you're supplying to it, which could be your whole code base, the cost for each token generation goes up. And then the other problem is, as the context window starts filling up,

(22:37):
the less likely it's going to be able to recall and perform as well. And so that's where, when you're making AI workflows, one of the parts is, from the whole SOLID concept you have the single responsibility principle, and I kind of apply that to each agent within it. And the idea is

(22:58):
each agent that you create needs to be specialized, and I like to keep the context window around four thousand tokens, which is basically like four thousand words, but not quite. And it seems to perform best in recall, as well as being able to follow instructions,

(23:21):
with a smaller context. And so that's one of the reasons why I like using just a traditional chatbot and having it generate code: because I can limit the context to exactly what I want, and I kind of pull and say, okay, I need this new file that does this thing. These are the inputs coming in, these are the outputs going out. Maybe feed it

(23:42):
the documentation for something you'll be using, and try to keep that as small as possible, because it'll just perform better

Speaker 1 (23:48):
There that way.

Speaker 3 (23:50):
So that's kind of how I see things.
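
The small-context discipline Matthew describes can be sketched as a budget check when assembling a prompt. Real systems count tokens with the model's actual tokenizer; the "about 4 characters per token" rule used here is only a rough English-text approximation, and the part names are made up.

```javascript
// Sketch of capping an agent's prompt near a token budget. The 4-chars-
// per-token estimate is a crude heuristic, not a real tokenizer.

const TOKEN_BUDGET = 4000;

function estimateTokens(text) {
  return Math.ceil(text.length / 4); // rough approximation only
}

// Assemble a prompt from parts ordered highest priority first, stopping
// once the next part would blow the budget.
function buildBoundedPrompt(parts) {
  const kept = [];
  let used = 0;
  for (const part of parts) {
    const cost = estimateTokens(part.text);
    if (used + cost > TOKEN_BUDGET) break;
    kept.push(part.text);
    used += cost;
  }
  return { prompt: kept.join("\n\n"), tokensUsed: used };
}

const { tokensUsed } = buildBoundedPrompt([
  { text: "You turn one scene into a shot list." }, // instructions
  { text: "Scene: a robot paints a sunset." },      // the single input
  { text: "x".repeat(50000) },                      // oversized doc gets dropped
]);
console.log(tokensUsed <= TOKEN_BUDGET); // true
```

Keeping each specialized agent under a budget like this is what makes the single-responsibility split pay off: no one agent ever needs the whole code base or the whole story in its window.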

Speaker 1 (23:54):
Yeah, it's kind of an interesting dilemma. But yeah, I like the approach that you're advocating, where you break it up and specialize it, because then nothing has to hold too much context, right? All that your different things have to know about the other ones is how to tell the other one to do it, or how

(24:16):
to get the data back once it's done, and so that keeps your context smaller. Yeah, okay, go ahead. So one other thing that I'm curious about, because you're building a tool that allows people to build these workflows out: what's that like? As far as, you know, not necessarily doing the prompts or putting the prompts together, and

(24:39):
maybe you are, maybe you are doing that as part of your tool. But what's it like building an AI tool for other people to use that allows them to build these workflows?

Speaker 3 (24:47):
Yeah. Using WAOS is kind of interesting, in the sense that it's a new paradigm of programming. To kind of bring out the value of working with AI and AI workflows, I think we can go back a little bit to human evolution. So,

(25:08):
I mean, we started with language, which was a way to be able to pass on knowledge to other people, right? And then the concept beyond that, of having a written language where you're actually able to write down knowledge that can be passed on, really changed the way we learned and grew. Then you had the printing press,

(25:30):
which allowed it to be distributed really quickly, and that really changed humanity in that sense too. Then one of the next iterative steps I kind of see is just traditional programming. Where written language was a way to pass on and to distribute knowledge, programming allowed us to capture and

(25:55):
to pass on intelligence. And so I could take that knowledge, solve a problem, create an algorithm, and then distribute that out on something like a disc to other people. And now we have the Internet to be able to pass that on. And that was a way to be able to create and

(26:16):
distribute intelligence, in the sense that someone could actually just reuse that algorithm to get an answer or solve a problem pretty quickly. I kind of look at working with AI workflows as another iteration, where you're actually building and distributing wisdom, in a sense. You can create these

(26:39):
agents that can take a problem, have that context, and understand how to apply different kinds of intelligence to solve issues. And one of the great things in that sense is that one of the things AI is really good at is being able to understand

(27:02):
natural language and understand the intention of what someone is trying to accomplish, and then be able to create many different kinds of responses. One could be natural language coming back, specifying exactly what that solution could be in natural language, or it could be

(27:24):
calling a tool to perform some kind of task as well. So in working with AI workflows, the way I kind of look at it is almost like you're building a team of people that would accomplish a task, and so you divide out the responsibilities between different agents that

(27:46):
will accomplish the task. So a lot of times, the first agent that you might make might be something called an orchestration agent, or a conductor agent, which basically will take the prompt from the end user and say, okay, what team of agents should solve this problem? So if it's just a question

(28:09):
about the software that they're using, then you might just send it to this knowledge-base agent that can give an answer really quickly. Or it might be a task. So in the sense of Speak Magic AI, you give a story prompt, and then it says, okay, this is the very first prompt, so we need to create a story and we need to divide

(28:31):
that up, so it sends it to this list of agents over here that then sequentially go through and work together to build a video, basically. And that's one of the different paradigm shifts as well that I think we'll see a lot more of. Even though a UI is very helpful, and I think we'll

(28:52):
always have a UI for websites and things like that, there's a new kind of wave of input, of natural language, that we can start working with in AI. So it's almost like, instead of just using software, like maybe a SaaS product, it's almost like having an assistant that you can work with, that helps onboard

(29:13):
you with that, or can even perform tasks for you within the software. So, say, if we have a CRM: instead of having to figure out, okay, how do I add a new customer, and what's available here? With a traditional kind of SaaS product, you have screen real estate that is limited.

(29:39):
So, I mean, you don't want to stick a thousand different buttons on one page and expect someone to figure out how to use all these different buttons to do different things, or forms and things like that. That's too much information for the end user. But the cool thing about working with something like natural language is

(30:01):
you could have thousands of different tools and tasks that it can perform for you. It can understand the whole website, it can navigate you to where you need to be, it can explain how to use it, or it can accomplish those tasks for you and fill in those forms for you. And once you have those forms filled out, it can look

(30:22):
at those and say, hey, I see that the way you filled this out is this way, but here's some more context on tips and tricks to make this better. Or it can maybe just improve it for them and then ask them to confirm: hey, is this exactly what you're looking for? So working with it is definitely very different,

(30:43):
in the sense that with traditional programming, a lot of times we talk about pure functions, where this one input is always going to give you this output. That's one of the differences working with AI: it's a little bit more of a black box, where for this input, the answer coming out could

(31:03):
be multiple different kinds of outputs. So one of the things I like to do when making AI workflows is, anytime there's something that could be done with traditional programming, say, if it's parsing out information, or doing math, whatever it is, typically you

(31:26):
want to kind of steer away from AI. You want to parse the information out from the answers, and then have traditional programming actually figure out what the output should be, and use AI where it performs best, basically.

(31:47):
And because one of the things is you're dealing with AI hallucinations, that's one of the things when working with workflows as well: dealing with hallucinations and understanding when it's important and when it's not. Like you said, you had a friend that was using AI to

(32:09):
write stories, helping out that process. That's kind of an interesting use case, and something we kind of deal with in Speak Magic, because there's not necessarily a right answer or a wrong answer, but there could be a preferred response versus a non-preferred response,

(32:30):
and so it could be a little bit more difficult to judge what is good and what's not, except for having a human go in and basically say, hey, I liked this and I didn't like this, and refining the prompts that you use with the agent. Versus something where you might have a specific right or

(32:54):
wrong answer. Then it's a lot easier to to improve
the prompts and then just doing a whole bunch of
iterations and tests and say, okay, this is you know
exactly if this is a correct answer or correct response
or incorrect response.
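The principle described here, traditional code for parsing and math, the model only for generation, can be sketched roughly like this (the dollar-amount format is an assumption for illustration):

```javascript
// Sketch of "use code where code is reliable": the model drafts
// free text, but extracting and summing the numbers happens in
// plain JavaScript, so the arithmetic can't be hallucinated.
function totalFromAnswer(modelText) {
  // Pull out every dollar amount like $120 or $80.50.
  const amounts = modelText.match(/\$\d+(?:\.\d+)?/g) ?? [];
  // Strip the "$" and sum deterministically.
  return amounts
    .map((a) => Number(a.slice(1)))
    .reduce((sum, n) => sum + n, 0);
}

const draft = "Venue rental is $120, catering is $80.50, and flyers are $15.";
console.log(totalFromAnswer(draft)); // 215.5, every time
```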

Speaker 1 (33:11):
That's one of the ways working with it is
a little different too. So when you're writing these
tools for other people, are you doing some of
that, where you're actually trying to, beyond what they
give you, add other things to the
prompt to get them better answers? Or do you just let it ride?

Speaker 3 (33:36):
Yeah. And that's one of the things: trying
to steer it down certain paths that will give you
reliable outputs for what they're looking for. So
one of the ways, I mean, there are different

(33:59):
kinds of things to look for. One is, when you're
building a prompt: one example from SpeakMagic is, okay,
we have it create the story first, like a summary,
and then we actually have it parse that out.

(34:21):
So within the story, we might add
H1 tags that basically break out the
story and say, okay, we're on act one or something
like that. It actually chooses a story model first, and
then it has these different steps within the story, and
so we're guiding it down a
certain kind of framework, and then we split out that

(34:42):
story by having it add H1s for
the different sections, the different steps of the story.
And then we'll have it turn that portion of the
story into different scenes. And one
thing I found is, when I would have it do, say,

(35:05):
if we're turning it into a scene, I was having
it do too many different things at once. I
was saying, hey, I want you to write a scene
for this portion of the story, and I want it
to be in this very specific format that I
can then parse to make shots from that scene.
And one of the issues I was running into
is it was having a hard time doing both of

(35:26):
those steps. So what I ended up doing is saying, okay,
the LLM is actually fairly good at making
a scene, like a screenplay, like a script, basically, because
it has a lot of that information on how to do that.
So I broke that out and said, okay, do
that first, and then I pass its response,

(35:48):
that script, to the next AI agent that says, okay,
now let's add the formatting on here, and I gave it
all the rules of how to add that formatting to
that script for that scene. And that allows it
to then parse and say, okay, this is how we
define all the different shots that make up that scene.

(36:08):
And so then I have JavaScript basically go through and
split all that into different pieces, into different shots,
and then I can have the next AI agent parse
each shot out and figure out, okay, what do we
need to do to create this shot?
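The JavaScript splitting step described here might look roughly like this, assuming the formatting agent was instructed to mark each shot with a `## Shot` heading (the real markers SpeakMagic uses aren't specified):

```javascript
// Split a formatted scene script into individual shots so each
// one can be handed to the next AI agent in the workflow.
function splitIntoShots(formattedScene) {
  return formattedScene
    .split(/^## Shot.*$/m)        // break on each hypothetical shot heading
    .map((chunk) => chunk.trim())
    .filter((chunk) => chunk.length > 0); // drop the empty lead chunk
}

const scene = [
  "## Shot 1",
  "Wide shot of the castle at dawn.",
  "## Shot 2",
  "Close-up on the hero's face.",
].join("\n");

console.log(splitIntoShots(scene).length); // 2 shots
```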

Speaker 1 (36:28):
Awesome. So are you using JSON mode? Because I know some
of these allow you to send the data over as
JSON, as opposed to straight text or other formats. Most LLMs,
I will say, are pretty good about pulling text apart
and putting it back together and figuring out what you want.
JSON just gives you more accuracy, is what I've found.

(36:50):
No, I totally agree.

Speaker 3 (36:51):
So, like with SpeakMagic, I'd say about eighty
to ninety percent of all the agents I use
actually use JSON mode, and
there are a lot of advantages to that. If it's
writing a story, then I'll typically just have it
write the story using plain

(37:15):
text. It's usually Markdown-formatted coming out.

Speaker 1 (37:20):
And so.

Speaker 3 (37:22):
But with JSON, one of the cool
things, one of the strategies I use with
it, is that with JSON mode you can
say, okay, I basically have a JSON
object that has different properties it needs to fill in.
And one of the cool things about it is
that there's actually an order to how those things get filled in,

(37:45):
because LLMs are sequential: they generate token by token,
top down. Even though
an object in JavaScript doesn't necessarily have an order, when
the model is filling it out, there actually is an order. So
one of the things I've done with it is
have it actually think through, step by step,

(38:06):
how to approach a problem. And maybe several of these
properties it's filling out never really end up being used,
but it's a way to force it to think
linearly, in a certain way that prepares it to give
accurate answers, if that makes sense. So it's kind of
like a form of the thinking that a model

(38:26):
like o1 does, but you're able to specifically guide it
through a whole set of steps: okay, let's build our own
context, have it think through different steps
by filling in these different properties, and then
use that context and its thinking

(38:47):
to force it to then give the answer, or
the properties, that you're actually going to use for the
next steps or for the end user.
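A minimal sketch of that "scratch fields first, answer last" JSON-mode pattern; the property names are made up, and the model response here is faked for illustration:

```javascript
// Property order matters because LLMs generate token by token:
// putting scratch fields first forces the model to "think" before
// it commits to the field downstream steps actually use. This is
// the shape you would describe to the model in the prompt.
const sceneTemplate = {
  storyBeats: "", // scratch: list the beats in this section
  toneNotes: "",  // scratch: reason about mood and pacing
  screenplay: "", // the only property the next agent consumes
};

// After the model responds, discard the scratch work.
function extractScreenplay(rawJson) {
  return JSON.parse(rawJson).screenplay;
}

const fakeResponse = JSON.stringify({
  storyBeats: "hero arrives; storm builds",
  toneNotes: "tense, quiet",
  screenplay: "INT. CASTLE - NIGHT ...",
});
console.log(extractScreenplay(fakeResponse));
```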

Speaker 1 (38:56):
Basically, yeah, makes sense. So one other thing that I'm
looking at, so just give a little bit of context
for people here. I have two things going that I'm
wanting to put together. One of them is I would
like to have some kind of AI help agent kind
of thing on Top End Devs, right? So if people
are looking to go through some of our courses or

(39:18):
things like that, you know that there's essentially an AI
agent that can help you figure out what to learn
next or you know, maybe next steps for your career
or things like that, and so I could see that
as a kind of a coach if nothing else. And
I'm trying to figure out how to manage the context

(39:40):
right because people may ask a lot of questions or
give it a lot of information. The other thing I'm
running into is if somebody leaves and comes back, right,
then do I summarize the previous context and hand it
to a new query or can I just pick up
where I left off? So, do you have any recommendations
on something like that, where maybe it's not one continuous

(40:00):
session?

Speaker 3 (40:06):
So maybe, can you re-explain that question for me?

Speaker 1 (40:11):
Yeah, so let's say that it's an ongoing tool, right?
So the other idea I have that I
want to build out is, essentially, I've hired virtual assistants
in the past to help me do a lot of
kind of routine things with the podcasts. And so with
that one, I kind of see more of the team
model and I can just come in and you know,

(40:32):
I can just use a tool for it to go
look up the information about my podcast, and then you know,
I can tell it to do tasks one off at
a time. And so for that one, I'm more thinking Okay,
how do I just make sure that the workflow works.
But for the other one, you know, where I want it
to remember things about the people who are coming and

(40:54):
asking for help. And I want, if they show up
and say, okay, I just got a job interview, how
do I prepare for this? You know, it's smart enough
to say, okay, well, what can you tell me about
the company? You know, maybe get a little more information,
and then turn around and actually remember enough about them
to help them. Do I have to store
that myself to get the continuity? Or can I, you know,

(41:19):
is there some form of caching the context window that
the different LLMs do? Because most of the time you're
hitting these over APIs. So anyway, that's kind of
what I'm wondering: how to allow people to pick up
where they left off when they're using an agent.

Speaker 3 (41:35):
Yeah. So when using all these different LLM models, every
time you send a request to the API, you have to
actually give it the full context of the history. And
that was one of the things that kind of
hit me. For some reason, I thought, when
I first got involved a couple of years ago,

(41:58):
that it would just remember your past calls, like maybe
you'd get a conversation ID or something like that,
and it would handle all the state and the memory
of past conversations tied to that conversation ID. But
it turns out it's actually more stateless,
and so you have to send it all the information,

(42:20):
which has advantages and disadvantages. The disadvantage is that
it would be a lot simpler to just have it
manage that for you. But this way you have a whole bunch of
control over what kind of context you're giving to it,
and so you need to keep track of that information,
which typically would be in

(42:40):
a database or something like that, and then you can pull it in.
It's also important, though: we talked about context windows,
and different models have different sizes of context window.
I mean, you're looking at different ones; some
could have only eight thousand output tokens, some of
them have one hundred and twenty-eight thousand. Like

(43:04):
OpenAI's, a lot of them have a hundred and twenty-eight
or two hundred thousand token context window. And so sometimes
what you need to do, if you want the
fuller context of things once you
start using more and more of it up, is to summarize the
information, or to pull out key pieces of information that

(43:26):
are really important. And that's one of the things we do
in SpeakMagic, for instance: we have,
in a sense, a JSON object
that keeps all the important information, like what
style does the video need to be? Is it
more animated? Is it live action? That type of thing,

(43:48):
because you want that consistency from shot to shot, from
scene to scene in the video. Who are the
characters? We keep character profiles for each one, so we
are very descriptive about a lot of different aspects of
each character. And that way, when we're generating an
image of that character, it will have all that context:

(44:09):
okay, this character has curly black hair, or something
like that, and different kinds of things, so that
when it generates the image, before it turns into a video,

Speaker 1 (44:19):
It'll be correct.
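Since the APIs are stateless, the app replays its own history on every call. A rough sketch of that pattern; the threshold and the summarize step are placeholders (in practice another, cheaper model call would write the summary):

```javascript
// The app owns the conversation: store every turn, replay it on each
// request, and compact older turns into a summary once it grows.
const MAX_TURNS = 20; // illustrative threshold, not a real model limit

function addTurn(history, role, content) {
  return [...history, { role, content }];
}

function compactHistory(history, summarize) {
  if (history.length <= MAX_TURNS) return history;
  const older = history.slice(0, -10);
  const recent = history.slice(-10);
  // One summary turn stands in for the older turns; recent turns stay verbatim.
  return [
    { role: "system", content: `Summary so far: ${summarize(older)}` },
    ...recent,
  ];
}
```

Persisting `history` per user, say in a database, is what lets someone leave and pick up where they left off.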

Speaker 3 (44:20):
But we also don't want to overload the context
either. So basically, when
you're making a call, you want
need-to-know context, and
you want to dynamically decide
what context is important for that specific call. This is

(44:40):
really important also. And you kind of have different
types of memory as well. You're going to have
memory for the state that's important for this specific
call at the moment, and there's going to be
memory that has to do with the whole run
of the workflow, especially when you have cycles of humans
going through. So, like, you get a question, or you

(45:05):
give a prompt, you get a response, and then you're
continuing that conversation, and so you need more
of a global kind of memory that matters as well.
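The "need to know" idea, assembling per-call context from different memory scopes, might look roughly like this; the field names and task flags are illustrative:

```javascript
// Build the context for one call from two scopes: global run memory
// (facts that must stay consistent, like character profiles and
// video style) and step-local state.
function buildCallContext(globalMemory, stepMemory, needs) {
  const parts = [];
  if (needs.characters) parts.push(`Characters: ${globalMemory.characters}`);
  if (needs.style) parts.push(`Style: ${globalMemory.style}`);
  parts.push(`Current step: ${stepMemory.currentStep}`);
  return parts.join("\n");
}

const globalMemory = { characters: "Mira: curly black hair", style: "animated" };
const context = buildCallContext(globalMemory, { currentStep: "shot 3" }, { style: true });
console.log(context); // only style and step info; character profiles left out
```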

Speaker 1 (45:18):
Gotcha. Okay. Yeah, I guess my concern was, if I'm
gathering information, you know, if I have to store information
when people are asking more personal things, like, should I
quit my job? Or, you know,
they tell it, I'm not happy where I'm at, and
so I'm looking for another place, or things like that.

(45:39):
I mean, I think the expectation is that it's confidential, right?
I mean, if I'm giving it a creative prompt and
I'm having it create part of my story, maybe
I don't feel as compromised by somebody seeing, oh,
you know, he generated a video with a
girl with black curly hair, and
a villain creature thing that

(46:02):
you know, has certain powers or whatever, you know,
whatever my story's about, right? It's different than, oh, well,
we didn't know that Chuck was not happy here and
was going to start looking for another job, or, you know,
asked how to answer questions about things that he's not
proud of in his career past. And so, you know,

(46:23):
keeping that confidential seems pretty important. But that's a
common problem for a lot of other things. So you're
telling me that that's something I have to
figure out, because the LLM isn't going to keep track of
it for me. One last question, because we're kind of
getting toward the end of our scheduled time: I
know a lot of people are interested in getting

(46:43):
involved in AI and learning AI. And to be perfectly honest,
I feel like if you're not getting
a feel for how AI works and what it can
do, and how it can at least help you where
you're at, let alone where it can help the users
of your applications where they're at, then in a year

(47:04):
or so, you're going to be way behind. And
maybe you have a different opinion on that,
and you can say so in
a minute. But at the end of the day, if
somebody's going, I need to understand AI, what do I
need to do in order to get there? Where do
you recommend people start and what are the kinds of
things that they ought to be picking up in order

(47:24):
to be successful with this?

Speaker 3 (47:27):
Yeah, I think in some ways I would
pitch it the same as learning programming.
I mean, there are so many different directions you can go
with it, but it's usually best, if
you want to learn more about something like a
programming language, to pick out a simple, small

(47:48):
project that you're interested in, or that provides the kind
of value you want, and
then learn the parts needed to accomplish
that programming task.

Speaker 1 (48:05):
And so.

Speaker 3 (48:07):
I mean, one of the great resources for that is
using an AI chatbot to do the research
for you and answer your questions.
I mean, it's a great way. I've been learning Java recently, and it's a great

(48:28):
way to learn. You can have it quiz you on
what you know and what you don't know, or answer your questions.
It's a fantastic tool for learning
new technologies and things. But there are, I mean, there are
different ways of approaching it, and so you

(48:49):
can, using JavaScript, go just plain vanilla Node,
accessing these different APIs.
Work with maybe one API at first; a lot of
them are fairly similar to each other when it comes
to LLMs. And there are different tools

(49:11):
you can use if you want to go beyond
figuring that kind of thing out yourself,
and you can have AI help you write different parts
of these workflows and things. You can use tools,
though, that get you a little further ahead. If you're
using JavaScript, something like LangChain.js

(49:31):
will give you a bit of a framework for how
to work with things. There are other tools out there,
too. If you're more
interested in just AI workflows, you can use tools like
n8n, which gives you a visual
way of putting together these workflows. WAOS.ai

(49:55):
is a way to do the same kind of
thing, where you're using low-code tools, so you're
building things out visually, and then you can use JavaScript
expressions to parse information, to decide what context
to use in different places, and to help

(50:16):
control how the workflow is run. But, I mean,
you can use things like Cursor, which is more
of an assistant that helps you within your
IDE, to help you generate code quickly, which

(50:37):
could help you in the process of doing things.

Speaker 1 (50:40):
Or you could use like more of.

Speaker 3 (50:41):
a coding agent, like Claude Code or like Manus,
which I think is not necessarily
public for everyone right now, as far as I know.
Those will help generate larger projects from
single prompts; you do more of the vibe

(51:02):
coding that was brought up earlier. So those are
some different tools you could use. Another thing
to maybe look at, if you want to use a lot
of different models, would be something like MCP, the Model
Context Protocol, which has a JavaScript implementation that allows you

(51:25):
to standardize using different models and different
tools. I mean, we have something
kind of similar that we added a while ago
to WAOS: we have, I think,
thirty-three different models that you can access, and

(51:47):
they all have a very similar architecture, because you're just
bringing a prompt in and getting a response out, basically,
and then we handle all the API details
for you. So those are some of the different things

Speaker 1 (52:01):
I kind of I.

Speaker 3 (52:02):
would recommend for somebody who's learning: the tools and
things they can get involved with and use.

Speaker 1 (52:09):
Yeah. One thing I'll add to that, and
you've kind of alluded to it in the way
you've told people to approach stuff: before
you're even writing code, you can just go to
chatgpt.com or grok.com, or to Anthropic's
site to use Claude (I can't remember the URL),
and just get in

(52:30):
and just start asking it questions and get
used to how it works, because ultimately what you're going
to be sending over is prompts that look a whole lot
like your questions anyway. And so you can figure out
the different tricks that work, and go from there.
I also recommend that people go pick up a course
or a book or something that does some explanation on

(52:55):
prompt engineering. The reason is because one part is
knowing how to access the tools, access the AI, and
how to ask good questions. But the rest of it
is just going to come down to your prompts and
how you format them, so that you're getting the best
answers possible from your AI.

Speaker 3 (53:12):
And there's a lot of iteration. So, yeah, if something
doesn't work with your prompt engineering, try tweaking things. Pull
out pronouns, like "this" or "that"; be very
explicit about what you mean. Keep the context low.
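As a tiny illustration of those tips, a vague prompt versus an explicit one (the wording is made up):

```javascript
// Vague: leans on pronouns and leaves the output format open.
const vaguePrompt = "Make it better and fix this.";

// Explicit: names the object, constrains length, pins the format.
const explicitPrompt = [
  "Rewrite the scene description below in under 100 words.",
  "Keep all character names unchanged.",
  "Return only the rewritten description, with no commentary.",
].join("\n");

console.log(explicitPrompt);
```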

Speaker 1 (53:28):
Yeah. All right, well, let's go ahead and do some picks now.
I don't think you've been on the show before, so
let me just explain what they are real quick. It's
just us shouting out about stuff we like. So a
lot of times people do TV shows or movies or
technology tools or anything in between. We'll let Steve go first,

(53:50):
and then I'll go, and then you can go last,
and that way you can kind of get a feel
for how we do it.

Speaker 2 (53:56):
Yeah, Matthew, Just one thing he didn't mention is that
the high point of every episode of ours is my
dad jokes, dad jokes of the week, and so anyway,
make sure hopefully my sound effects are working properly. Okay.
So as an example, having a conversation with my friend

(54:18):
and I said I actually have a half brother and
he said different mothers. I said, nope, shark attack, thank you,
thank you. So last week was Saint Patrick's Day, I
think was a week ago today, and I bought a
diamond ring for my wife, but it turned out to

(54:38):
be a fake. They gave me a sham rock. And
then my dentist, who's actually a good friend of mine,
whom I've known for

Speaker 1 (54:48):
A long time.

Speaker 2 (54:49):
He got this local award where he was voted
the dentist of the year. He didn't get a trophy though,
he just got a little plaque. And then finally, a question:
in King Arthur's time, which of the Knights of the
Round Table collected taxes? Because, you know, they collected taxes: Sir

(55:11):
Charge. Those are the dad jokes

Speaker 1 (55:16):
Of the week. All right, well, how do you
follow that? Very humbly, very humbly, right. So, yeah,
I don't know if I've played any new board
games lately, so I'm just going to throw out something
I've picked in the past. This is something that we've

(55:36):
played before, me and the guys. I did find out, though,
that it has a different mode you can play,
so I'm going to pick it again even though I haven't
played that mode, the campaign mode. The board game is
called Heat: Pedal to the Metal. It's a racing game,
so everyone's in race cars. You play cards in order
(55:59):
to move forward. If you go around the turns too fast,
then you take heat from your engine and put it
into your deck. The heat cards don't do anything, and
you have to do specific things to get rid of them.
That will often make you less efficient moving forward, and
so you start figuring out how to get through as

(56:21):
many turns as possible as quickly as possible, so you
can get to the straightaways and take off. The campaign
mode is, you play multiple races, you collect money,
and then you can upgrade your car. Board Game Geek
weights it at two point one nine, which is
pretty casual-gamer-ish as far as that goes. It

(56:43):
says ages ten plus. I think somebody a little younger
than that could play. The strategy is not terrible, and
if you kind of help them with the mechanics,
I think you could get a six- or
seven-year-old to play, and that would be fine.
I've played it with four players; it plays up to
six, and it takes about an hour to play. So anyway,

(57:05):
a lot of fun. This is one of
the favorites lately. In fact, on Board Game Geek it
is actually number forty-one overall. So yeah,
really enjoying that, so I'm going to pick that. And
then lately I've been watching a couple of shows and

(57:28):
I think I've picked them over the last few weeks,
but I don't remember, so I'm going to pick them again.
The first one is 1923, and it's a prequel
to Yellowstone, and I'm really enjoying that.

Speaker 2 (57:44):
Is that.

Speaker 1 (57:44):
The one with Harrison Ford? Yes, yep. Yeah, I've heard that.

Speaker 2 (57:49):
Him trying to swagger down the street like a cowboy is
not the best picture, but.

Speaker 1 (57:55):
Yeah, yeah, he's definitely old, but it's funny.

Speaker 2 (58:00):
He does a new commercial
for, I think, Jeep or Land Rover, and the very
last thing he says is, yes, I'm doing an
ad for them, even though my last name is Ford.

Speaker 1 (58:15):
A sort of funny guy.

Speaker 2 (58:17):
I thought, that's just me.

Speaker 1 (58:20):
So anyway, watching that, I'm also watching Reacher. I'm enjoying that.
And then I'm about done with the book Rhythm of War,
which is a Brandon Sanderson book. It's the fourth book
in the Stormlight Archive. Every time he releases a
new book, it takes like forever to listen to on
Audible, because, you know, I mean, I've been listening

(58:43):
to Rhythm of War, I think, for almost a
month now, just because of when I listen. You know,
I listen when
I'm trying to go to sleep, or when I'm
out in the car, or something. But yeah,
he released Wind and Truth and I just haven't
gotten to it yet. So anyway, Rhythm of War I'm really enjoying

(59:03):
as well. And then I think that's pretty much all
I've got for picks this time. But Matthew, what are
your picks? Okay?

Speaker 3 (59:13):
So I love doing research on AI, so I'm always
trying to keep up on the newest models. It seems
like lately a lot of new text-to-speech
audio models have been coming out, which I've been
waiting for. I had some ideas of how to
produce some myself, but now with these coming out,

(59:34):
I'm pretty excited to add that to SpeakMagic.
So one of them that was amazing
is called Sesame, and it brings a lot of
emotion and context awareness
to real-time

(59:58):
conversations. It just sounds a lot like a human.
There have been a few others that have come out
recently that are great too. One of them is from OpenAI.
If you go to openai.fm, it has a
way to be able to control how

(01:00:21):
the speech should be generated, so you can put
emotion in it, and you can give it different
kinds of accents and different things too. It's
pretty incredible as well. So those are some
of the new AI models that have been pretty exciting

(01:00:41):
coming out recently. Let's see. And then, I brought up
before, looking more into MCP, which
seems like, using JavaScript, a way to be able
to connect to different models and different other kinds of
services as well, which is pretty exciting. Let's see, I've watched

(01:01:06):
some Reacher too. So I guess they have one more
episode left; that's next week.

Speaker 1 (01:01:11):
Yeah, it's definitely getting to where it's
going to wrap up.

Speaker 3 (01:01:15):
It sounds like the series is a lot closer to
the books, basically, than the movie
was, which is pretty cool. Yeah, that's
about it for me.

Speaker 1 (01:01:35):
Yeah. I meant to throw out one AI pick and
I forgot about it, and that is: if you want
to play with some of the large language models, especially
around text, I've been using OpenRouter. I don't
know if you've used them. You can also get some
models to run on your own machine if you
get them off of Hugging Face. So those are the
two resources I'm going to recommend. One is OpenRouter:

(01:01:58):
you can use their libraries to connect to them, and
then they connect to all the other models, so you
can try out the Llama 3 models, the OpenAI
GPT models, you can try out Claude, and you can
switch between them so you can see which ones work

(01:02:18):
best without having to do a whole lot of extra
work to program against each one. And then Hugging Face
has a whole bunch of other models, and
you can run those all locally. And I think
huggingface.co is where you get those. Nice.
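The model-switching flow described here can be sketched as one request shape with a swappable model ID; the endpoint follows OpenRouter's OpenAI-compatible chat API, and the model slugs are illustrative, so check their docs for current names:

```javascript
// One request shape, many models: only the model slug changes.
function buildChatRequest(model, messages) {
  return {
    url: "https://openrouter.ai/api/v1/chat/completions",
    body: { model, messages },
  };
}

const messages = [{ role: "user", content: "Summarize this scene in one line." }];
const llama = buildChatRequest("meta-llama/llama-3-70b-instruct", messages);
const claude = buildChatRequest("anthropic/claude-3.5-sonnet", messages);
// Send each with fetch(url, { method: "POST", headers: { Authorization: "Bearer <key>" },
// body: JSON.stringify(body) }) and compare answers to pick the model that works best.
```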
All right, Matthew, if people want to find your stuff,
where do they find you online?

Speaker 3 (01:02:41):
Yeah, check out SpeakMagic.ai. You can see,
well, we need to add some newer examples on there.
Our quality has gone up, and with some of these
new models that have come out, it'll be even better. And
you can check out WAOS at WAOS.ai, and

(01:03:04):
it's kind of interesting looking at the transition there.
I'm trying to decide how to take things
further with that, possibly going more of an open-source
route, or making it more community driven.
So if you're interested in that, you can send me

(01:03:25):
a message at matthew@waos.ai. Yeah, those are
kind of the two places for me.

Speaker 1 (01:03:35):
Awesome. All right, well, let's go ahead and wrap it
up here. Until next time, folks, Max out!