
October 12, 2024 48 mins

In this episode, Mark and Shashank sit down with Greg and Chris, co-founders of AI Makerspace, at a hackathon in Palo Alto at the offices of 500 Startups. They discuss the mission of AI Makerspace to build a community for developing and deploying large language model applications. The conversation covers topics such as the use of retrieval augmented generation (RAG) versus fine-tuning, the role of agents in AI applications, the importance of traditional software engineering skills in AI development, and the future of AI, including vision-language models and embodied AI. Greg and Chris also share advice for newcomers interested in entering the AI field.

https://aimakerspace.io/

https://www.youtube.com/channel/UCbDZFHUjTCCUKyXgcp3g50Q


Episode Transcript

Available transcripts are automatically generated. Complete accuracy is not guaranteed.
(00:00):
All right, and hello everybody and welcome to another episode of the podcast.

(00:05):
So today is actually a super special episode.
So today we're actually at a hackathon.
So we're sitting here in Palo Alto and we have some wonderful friends from AI Makerspace.
So AI Makerspace is like a really cool company.

(00:28):
They do a lot of things that will help you learn how to build with AI.
They have a YouTube channel and I heard that they're actually going to be starting a podcast
soon.
So soon, well, when they have it, we'll have to let you guys know.
It's not there yet, but it's coming.
So yeah, just really appreciate you guys for agreeing to do this.

(00:50):
So do you want to maybe just like tell us a little bit, but actually first, do you want
to introduce yourselves, and then we'll get into the more logistical stuff and you can tell us what
AI Makerspace is and does.
Yeah, yeah, I'm Greg.
I'm the co-founder and CEO, aka Dr. Greg on YouTube.
I'm Chris.
I am the co-founder and CTO aka the Wiz.

(01:12):
The LLM.
The LLM.
The LLM wizard.
Yes.
All right.
Well, I love it.
So do you want to maybe just tell us a little bit about like what AI Makerspace is,
what you guys do, what kind of makes you special?
Yeah.
Well, we are really on a mission to create the world's leading community for people that

(01:34):
want to build, ship, and share production large language model applications.
And this vision and our specific mission to create this community really came about from
the 10 years I spent as a university professor watching students and most of them didn't
succeed.

(01:55):
The ones that did were the ones that really became enamored with something.
They kept building, they kept shipping and iterating and working on the next thing and
the next thing and the next thing.
They found communities of others to share that stuff in.
They would always get the jobs they want.
They would always have amazing stories to tell.
They would always be the ones moving to cool cities and working with cool new startups or

(02:18):
going to get a job with Tesla.
And really I went all in on AI and on remote learning in 2020 after a lot of, you know, things not
working out in the university system and large bureaucracies.
And that's proven to be a really great decision, as ChatGPT came out just a couple years

(02:38):
later, right as I was in the right space at the right time.
I met the Wiz and the rest of our team while actually working at a company remotely, but one that was
based here locally in Palo Alto.
Okay.
Very cool.
So now that you have this company and people are building, what types of things have you
seen people actually build?

(02:59):
Yeah, I think for the most part we're still in some extension of "talk to my data."
Given the amount of modalities of data that exist, on top of the sheer volume of data that
exists, I find it unlikely that we'll move past that being a major use case quickly.

(03:21):
So it's a lot of RAG systems, or systems that start as RAG and then extend into more
complex applications.
And definitely in the last six months we've seen a lot more focus and interest in agentic
applications.
And as the LLMs that we have access to quote unquote grow more powerful, right, we're

(03:44):
seeing that actually becoming a feasible strategy even in production environments.
So it's a lot still on the chat paradigm.
We're less so seeing people focusing yet on like VLMs or extending past the text modality

(04:05):
but you can definitely feel it bubbling up or burbling up.
For the people not familiar with VLMs, could you maybe define that?
Just a vision language model.
So it's just a combo of our favorite modality, which is text, and then vision as well.
Yeah, I would say the people coming into our cohorts, they're like, agents are so hot.

(04:26):
Oh my God, I want to build multi agent systems.
And so we actually had to change our curriculum to get people into multi-agent systems like
within three weeks even though it's a 10-week program.
And that's really proven to be a great decision because people just cannot wait to get to,
I want to build multi agent systems.

(04:46):
And then you're like, well, but where are these used in practice and industry?
And it's like, well, I don't know actually.
And not many people do because they're really not used yet in industry.
And so people are building really cool things, prototyping really cool things.
It's that line from prototyping to production that's so interesting to us in terms of community.

(05:09):
Yeah, I think that's really interesting because typically you actually don't see a lot of
agents in industry yet.
It's just like, it's so new, right?
That I think people are just kind of trying to figure this whole thing out.
And I think that kind of the stochastic nature of the agents really makes it so
that companies don't quite trust them yet.

(05:29):
But I don't know, like, what do you guys think?
Are there any use cases that you think are maybe like low hanging fruit that companies
should be doing that they're not yet?
Maybe with like agents or RAG or anything?
I think agents are definitely becoming industry standard tools.
I think we're a very short distance away from that in terms of how much time until we see

(05:51):
more viable production ready agent systems.
You see companies like Salesforce with Agentforce finally starting to come out.
And I think when we're talking about like this unreliability of agent systems, we're seeing
a lot of tools that are helping to mitigate that: better guardrails, better conforming to
the desired output structure, et cetera.

(06:13):
So, the beauty of agents is also their worst flaw, which is that small errors
compound and balloon out into large errors.
But even a small reduction in that initial error can get us on track, let's say, and we're
seeing a ton of that error correction come out.

(06:37):
I think we've also got to put it into perspective in terms of time.
We talk about agents as if they have been out for two or three years.
Was it their thing?
Yeah, exactly.
Exactly.
You get people know about.
That's right.
But we're still very much in the infancy of LLM powered agents in terms of the grand scheme

(07:00):
of things.
But lots of, yeah, I think error correction, guardrails, things like this are really going
to help them actually exist.
Yeah, but taking it even a step back, down to RAG.
I think RAG is just so not well utilized by enterprise in general right now.
There's so much low hanging fruit, but honestly, the incentives have to be aligned, because

(07:24):
these large bureaucracies, do they really want to know how much useless work people are
doing?
If they really dug in, they'd really have to face some of these genuine
problems, and they know it's going to be a whole political thing.
And so they're not properly incentivized,
again, to go down these problem rabbit holes and really find the right solution.

(07:47):
So when we actually work with enterprise clients, it ends up being very, very simple applications
that we'll build, and they actually create a ton of value.
And even sometimes it doesn't require RAG, right?
It's just doing search and retrieval better for them in their archaic, outdated systems.
So I mean, I think there's just so much low hanging fruit.
It's just completely absurd as long as you know how to look and what to look for.

(08:11):
And that's what we try to train our folks in the community to be able to do.
Yeah, I think that resonates with both Mark and I because we both work in Big Tech, at Amazon
and Google, and the pace at which we move is a little slow, for good reason.
We have to deal with regulations.
We scale at like unprecedented levels, but that's one of the fun things about running this
meet up and meeting folks in like the startup community because they move fast.

(08:34):
They break things, they try like outrageous outlandish ideas.
And you know, what made me think of the agents conversation we were having is that this was
in people's minds ever since ChatGPT first came out, like BabyAGI, AutoGPT.
So the seeds were there, but you guys hit the nail on the head.
It's really hard to make it productionized and have it make an output that is consistent,

(09:00):
reproducible.
So I was listening to this podcast from the CEO of Braintrust.
It's another startup that helps companies deploy LLMs at scale.
And they were having this conversation where they've noticed the same thing.
The only way they've integrated some kind of agentic behavior is to have it be programmatic.
Have like software that is doing reproducible, deterministic things, but with LLMs sprinkled

(09:26):
throughout the process.
As opposed to like having the LLM just take control and run with the output in a recursive
loop, which just results in absolute chaos.
So on that note, I was curious if there are frameworks that kind of still work with like
programmatic methodologies, but like allow LLMs to be interjected at reasonable steps.

(09:52):
Yeah, I mean, so we're partial to it of course, but the framework LangGraph specifically is
very useful when it comes to this idea of very well defined behavior, which is kind of just
Python functions plus a little bit of LLM; it doesn't even have to be, right?

(10:15):
And I think you're seeing more of the frameworks move this way, LlamaIndex as well, now through
workflows, where people are realizing that these have to be very hybrid systems and a large
part of the hybrid system has to be traditional, well understood, well designed software with,
like you said, a smattering of agents.
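To make that idea concrete, here is a minimal sketch of the kind of hybrid graph being described: ordinary Python nodes with one LLM-backed node, wired together with LangGraph. This is an illustration under assumptions, not anyone's production setup; the call_llm helper is a hypothetical stand-in for whatever model client you actually use, and LangGraph's API details can shift between versions.

# A minimal hybrid LangGraph workflow sketch: deterministic Python nodes
# plus one LLM-backed node. `call_llm` is a hypothetical placeholder.
from typing import TypedDict
from langgraph.graph import StateGraph, END

class AppState(TypedDict):
    question: str
    context: str
    answer: str

def call_llm(prompt: str) -> str:
    # Stand-in for your real model client (OpenAI, a local model, etc.).
    return "stubbed answer"

def retrieve(state: AppState) -> dict:
    # Plain, well-understood Python: fetch context however you like.
    return {"context": f"docs relevant to: {state['question']}"}

def generate(state: AppState) -> dict:
    prompt = f"Answer using this context:\n{state['context']}\n\nQ: {state['question']}"
    return {"answer": call_llm(prompt)}

graph = StateGraph(AppState)
graph.add_node("retrieve", retrieve)
graph.add_node("generate", generate)
graph.set_entry_point("retrieve")
graph.add_edge("retrieve", "generate")
graph.add_edge("generate", END)

app = graph.compile()
print(app.invoke({"question": "What does the workshop cover?", "context": "", "answer": ""}))

The point of the design is exactly what's said above: most of the graph is traditional software, and the LLM is confined to one node whose inputs and outputs the rest of the system controls.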

(10:37):
And I think this is why we see things like LLM routers being very effective: it's
kind of a simple task that has very discrete outputs, and that has the ability to
fall back on a very, you know, heuristics-based system.
And the LLM just kind of helps make the edges be a little bit fuzzier, right?

(11:00):
So the input can be a little bit more loosely defined, which is valuable; it gives you a lot
of power without kind of overstepping into this "LLM controls everything" situation where, you know, the
output winds up garbage.
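As a rough illustration of that router idea, here is a plain-Python sketch: heuristics handle the clear cases, and an LLM is only consulted for the fuzzy leftovers, with its answer validated against a discrete set of routes. The classify_with_llm function is a hypothetical placeholder, not a specific library call.

# Sketch of an LLM router: heuristics first, discrete outputs, LLM as fallback.
ROUTES = {"billing", "technical_support", "general"}

def classify_with_llm(query: str) -> str:
    # Hypothetical placeholder for a single LLM call constrained to ROUTES,
    # e.g. via a structured-output or JSON-mode request.
    return "general"

def route(query: str) -> str:
    q = query.lower()
    # Deterministic, well-understood rules catch the obvious cases.
    if "invoice" in q or "refund" in q:
        return "billing"
    if "error" in q or "crash" in q:
        return "technical_support"
    # Only fuzzy inputs reach the LLM, and its answer is checked against the
    # discrete route set so a bad generation can't derail the system.
    guess = classify_with_llm(query)
    return guess if guess in ROUTES else "general"

print(route("My invoice is wrong"))         # billing, pure heuristic
print(route("Something feels off lately"))  # falls back to the LLM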
I mean there are some other frameworks, like CrewAI and these other ones, that are a little

(11:21):
more prototypey, more like "I haven't been a full-time software engineer ever
in my life" friendly, right?
So if you're a data scientist or come from another place, it's a little bit easier to get these
things going.
But yeah, when you go to the production grade stuff that you generally see people using
more, you have to learn a lot of these basic software engineering concepts

(11:44):
and states, you know, events, right?
You got to learn all these things that like as a data scientist using a notebook, you never
learned and you never had to.
And so there's this real shift towards production scale engineering. And the flavoring of
it, I think that's right, you know, I think that you're flavoring the classic paradigm

(12:07):
with a little, little smattering.
You know what, I think that makes a lot of sense because I think that with all of these like
new AI tools, there's a lot of hype on saying like, oh look, my AI can build like a Tetris
or my AI can build like a simple Flask app, right?
And like the thing is, there's already a lot of training data with like a Flask app

(12:28):
or like Tetris or something like that, right?
It's like these are small self-contained programs, right?
But then the moment that you go to something like production, or maybe you have, I
think in like the talk you were mentioning like 10,000 lines of code, but there might be like
100,000 lines of code or a million lines of code.
Like I think Android is like 400 gigabytes or something like that of like actual code, right?

(12:50):
So it's like, you know, the LLM, it isn't able to process that much.
So like, you know, LLMs can do a lot, but there's limits to what they can do, right?
And like knowing actually traditional software engineering is really valuable.
And like the LLMs still kind of get stuck in these local optima.

(13:11):
And yeah, I just completely agree with what you guys say, that, you know, the LLMs
are just like one little tiny piece of the entire engineering pie.
Yeah, it's funny when people are like, well, I'll choose a long context window.
Oh, right.
And then it's like, yeah, yeah, okay, well, put your, you know, 100 million lines of code in
there and then press go and see if it fixes the bug, you know, it's like try it out.

(13:34):
And then it's like, how much did you just spend on processing and shoving those tokens through,
you know? And this, this idea of like, you can out-LLM it, you
know, like you can spend as much money as it takes to get this thing to work that
way, but it's like if you just thought about it for a second, you probably would have just

(13:57):
built some software to do it.
And that's exactly the thing.
And what we're talking about, like this, this hyper, you know, not hyperscale problem,
but it's like these very convoluted problems where, you know, the human understanding
is the thing that lets you be able to, quote unquote, fix the bug, right? Like LLMs just

(14:19):
aren't there yet.
They can't see that far.
They don't attend to tokens that are across those kinds of distances in reasonable time
frames.
And the other thing that I'm struck by all the time is if you look at the path of innovation
that we're going down for LLM-style applications, or injecting LLMs into our applications, we're

(14:40):
really just relearning lesson after lesson that we already learned in software engineering
proper, right?
We're, even when we look at things like inference optimization, we're taking things that we
learned when we were creating operating systems, right, for the first time.
And I think this is the reason why we're so excited whenever we have people who are very

(15:03):
software engineering focused, who attend our courses or consume our content, because those
people can bring all of those skills.
And we're talking about, you know, frameworks talk about async as if it's like this value add, but
if you look at the rest of the web or app development world, it's just like mandatory.

(15:25):
So it's just part of the DNA of that kind of engineering.
So it's, we're just reinventing the wheel, but this time we've added LLMs.
Now let me give a quick shout out to everybody out there that doesn't have a software engineering
background like me, right?
So I've been learning this stuff as we go along over the years with the Wiz here.
And like I didn't know what async was until we started, you know, having to teach this

(15:50):
stuff and we started having to, I had to start having to learn this stuff for real.
And I think like when you, when you have to go and you have to get into engineering and
you're like, like, how do I, how do I do this exactly?
You know, and it's like that's where it's like, well, don't go run out and get a computer
engineering degree or something.

(16:11):
It's like, it's like, you got to learn it through building stuff.
And I think one of the things that I would be, you know, I'd be super curious if you guys
went and interviewed a bunch more people on this: the most interesting people that
come into this sort of rediscovering-the-patterns idea, where the layers of abstraction keep
increasing, are the sort of, you know, men and women that have had really a lot of experience

(16:37):
building software, like since the early days of the internet.
And they'll say, well, that's just like, you know, when we needed to make that protocol
and that's just like, and I'm like, really it is like, and it makes me interested in the
history of how the software came to be how it is even today.
Because yeah, there's this real, there's this real, you know, sense of rediscovery and this

(17:01):
this enthusiasm for this, this new wave of enthusiasm for this rediscovery phase is just,
it's so interesting and it's so hard to articulate and get a handle on.
And so yeah, I would really encourage you guys and maybe we can introduce you to some of
the marketing media.
Yeah, we would love to meet them.
And I think, you know, it's really true that throughout time, there's been a lot of

(17:24):
hard problems that have been solved.
I mean, the people who, like, you know, worked to create computers are freaking geniuses.
Like, I mean, like, it's amazing, right?
And like, it's also kind of interesting because like, back in the day, I mean, they were
running on these really, kind of not powerful machines, right?
Like, I mean, they had to eke out every last little bit of memory and compute that

(17:50):
they could.
And now it's like, we have, I don't know, something gigantic.
It feels like, you know, we're driving a hammer with a steamroller or something
like that sometimes, you know?
I would add to that though, because I totally agree with you, but there's also the interesting
parallel that like right now, think about the amount of compute resources that it takes

(18:10):
to, say, power a Llama 405B, right?
So a 405 billion parameter large language model.
And think of just physically how much space is needed to compute that and draw that parallel
to the early days of compute when we had super huge computers to do what we now think of

(18:32):
as extraordinarily trivial tasks, right?
And you can imagine, if we find a way, which of course I think the industry
is very hopeful for, to realize that we're in that stage now, right?
And how this technology will progress forward.
I think it is exactly that we need to reinvent the wheel for this specific technology.

(19:00):
And while I'm very hopeful that we can use a lot of lessons of yore to help do that, we've
got to get very clever about going forward to get us back to where we can have this conversation
in another 70 years and talk about how, you know, LLMs, remember when we used to run them
on data centers instead of like my pocket phone or whatever, you know?

(19:23):
So, yeah.
I have like mixed feelings about that because on the one hand, yes, compute is getting
more powerful and all of these data centers are becoming more and more massive and Nvidia
is releasing faster and faster chips and we're hosting another meetup at SambaNova, which
has like inference speed that is unheard of at this point.
And I feel like developers are getting a little lazy and just throwing everything into this

(19:46):
one large context window.
But on the other hand, yes, I think we are just relearning all the same problems that
we've learned in the past and it reminds me of this algorithm class where we're just
like thinking conceptually about reducing problem sets into the basic representation of a problem.
And ideally, we'd have a combination of both.

(20:07):
Yes, we would have the luxury to be able to just throw everything at these LLMs.
But on the other hand, the smart people need to be thinking about how to optimize the
hell out of this.
Yeah.
Well, and developers even at this very hackathon, they came up, you know, after the workshop
and they're asking how to do this or that.
And I'm always more product minded and I'm bringing them back and I'm saying, well, yeah,

(20:28):
okay, sure.
You could, okay.
But why do you need to run four concurrent LLM calls in the midst of a retrieval?
Like, what's the question that you're asking that the user needs a response to that really
requires four truly concurrent calls instead of just deciding which one to do next?
And there's this real elegance and simplicity that's often just cast aside because Big

(20:54):
bad GPU is pretty cool.
More GPU is more better.
On that note, I had a more practical question for some of our listeners and maybe the people
at the hackathon because we heard a couple demos at workshops where they were explaining
how to fine tune a model and deploy your own custom model.
Personally, I feel like the most sensible approach is just go to TPD4, give it some, you know,

(21:20):
a few shot examples and have it work with your use case.
So the question, RAG or few shot examples versus fine tuning, what do you think?
So we, I mean, you said it, we have a meme about this: RAG or fine tuning, the answer
is yes.
I think there is a, and you're absolutely correct.

(21:42):
So for a lot of problems, there's absolutely no need to fine tune at all whatsoever.
Using few shot examples, as many as whatever you can pay for, right?
Going to be totally fine.
At some point, we get to this space where we're paying a lot for every prompt, because every
prompt has N examples in it, right?
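For listeners who want to see what "the examples ride along in every prompt" looks like, here is a minimal few-shot sketch. It assumes the OpenAI Python SDK (the v1-style client) and an OPENAI_API_KEY in the environment; the examples, categories, and model name are placeholders, not a recommendation.

# Few-shot prompting sketch: the labeled examples are sent with every request,
# so you pay for those tokens on every call instead of fine-tuning them in.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

few_shot = [
    {"role": "user", "content": "Ticket: 'Card was charged twice.' Category?"},
    {"role": "assistant", "content": "billing"},
    {"role": "user", "content": "Ticket: 'App crashes on login.' Category?"},
    {"role": "assistant", "content": "technical_support"},
]

def classify(ticket: str) -> str:
    messages = (
        [{"role": "system", "content": "Classify support tickets into one word."}]
        + few_shot
        + [{"role": "user", "content": f"Ticket: '{ticket}' Category?"}]
    )
    resp = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
    return resp.choices[0].message.content.strip()

print(classify("I never received my refund"))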

(22:04):
And at some point, we just want to bake those examples into the model itself and stop,
you know, if we're paying by token for, say, a GPT-4o or something, or we're just thinking
about having that wasted context window, right?
Though with, you know, KV caching and everything, this is less of a problem.
But the idea is there is, there is an idea of a gradient, or a spectrum, I think, when it

(22:32):
comes to fine tuning and RAG, though for almost all use cases, we just say, just use RAG, right?
Like if you're not trying to do this heavy domain adaptation, if you're not trying to do these
very specific tasks, then with fine tuning you're just going to waste your time and you're
going to push your app to production and you're going to get it in the hands of your users

(22:55):
a week later than you could have, right?
So it's hard to disagree that in almost all cases, we want to, we want to start with RAG, but
there, there does exist this space where fine tuning can make sense, right?
Yeah, exactly.
So you start with prompt engineering, like generally before you even go to RAG, right?

(23:18):
Once you have, you know, chosen your foundation model family.
But then before you even go add knowledge, like if, if you're getting a decent result and
you want to, let's say baseline and benchmark it, you just use more prompting to evaluate
what you've done.
So that's like the first step, before you even leave prompting.
And then you can say like, and you can just prompt the evaluator, right?
It will just tell you how good it was, one out of ten, you know?

(23:39):
We do don't miss a lot how dope was it, right?
But really if you look inside the evaluation frameworks, they're just prompted as well,
right?
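That "just prompt the evaluator" step can literally be a single scoring prompt. Below is a rough LLM-as-judge sketch, reusing the same assumed OpenAI client as above; the rubric and model name are illustrative and not what any particular evaluation framework ships.

# LLM-as-judge sketch: score an answer 1-10 with a prompt, as a cheap baseline
# before reaching for RAG or fine-tuning. Assumes the OpenAI v1-style client.
from openai import OpenAI

client = OpenAI()

def judge(question: str, answer: str) -> str:
    prompt = (
        "Rate the answer to the question on a 1-10 scale for correctness "
        "and completeness. Reply with only the number.\n\n"
        f"Question: {question}\nAnswer: {answer}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content.strip()

print(judge("What does RAG stand for?", "Retrieval augmented generation."))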
And so then you go and you might try some RAG, right?
And then you're like, well, you're probably going to make some changes, try some chunk size
things, change your retrievers.
And then probably a next step is often fine tuning the embedding model for RAG.

(24:01):
You just fine tune on the data that you're doing RAG on.
That's sort of a nice next step that will always give you quantitatively nice accuracy and
result improvements.
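Here is a sketch of what fine-tuning the embedding model on your own RAG corpus might look like, assuming the sentence-transformers library (the classic model.fit training API) and a small set of (question, relevant passage) pairs mined or generated from your documents; the model name, pairs, and hyperparameters are all illustrative.

# Fine-tune a retrieval embedding model on (question, relevant passage) pairs
# drawn from the corpus you're doing RAG over. Assumes sentence-transformers;
# newer versions expose a different trainer API, so details may vary.
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

train_examples = [
    InputExample(texts=["What is our refund policy?", "Refunds are issued within 30 days..."]),
    InputExample(texts=["How do I rotate an API key?", "To rotate a key, open Settings > Keys..."]),
]  # in practice: thousands of pairs generated or mined from your docs

train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)
# In-batch negatives: each passage acts as a negative for the other questions.
train_loss = losses.MultipleNegativesRankingLoss(model)

model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=10)
model.save("finetuned-rag-embedder")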
And then maybe you're going to go back and you're going to do some more advanced RAG.
Maybe add an agent at that point, you can give it access to some public stuff that's cool
with your enterprise, and then maybe you fine tune a chat model in the end if you're
at this "I'm at 95, but I need to get to 98% accuracy before deploying" stage.

(24:24):
So I think there's a sequence and you start with prompting exactly as you said.
Go to RAG, consider fine tuning if it's important for your application, and then agents because
obviously agents.
Yeah, you know, why not?
And then multi agents, right?
Yeah, just because the board is going to love it.

(24:45):
So I think that we're kind of speaking a little bit abstract right now.
So I think that like, you know, it makes sense.
You're starting something new.
You want to use AI.
So and you want to be able to query your documents or something.
So you are going to use RAG and then maybe you're going to fine tune for your specific use
case.
But like, are there any specific use cases that you've seen people use, where like maybe they

(25:08):
follow this type of workflow, or maybe like, okay, maybe RAG isn't what you need.
Like maybe like you need to like fine tune.
Is there anything that you've like maybe seen somebody build or like any like hypothetical
like use cases that you could think of?
Well, there's a funny story that we taught this class where we didn't teach RAG, and it was
all about the transformer and how to do training and tuning of the transformer.

(25:30):
We said, okay, like, and it was last year.
So it was like in the year of RAG, right?
So everybody was RAG-minded and we were like, no, no, you're supposed to fine tune.
And so you can actually force RAG into a fine tuning paradigm by chopping up the data into
input-output question sets, or input-output whatever-you-want sets.
And it'll flavor that like a RAG application would, but it's a stupid idea, right?

(25:57):
And so like I think, and it's stupid because RAG is just so much simpler to just
try real quick and see. Again, it goes back to what is the question that you want answered
in this question-answer system, or what's the input you want with the output you want?
And this, this is so hard for especially developers to do because they don't want to be product

(26:19):
managers.
But in this age of AI, to decide on the underlying generative AI patterns you should leverage,
prompt engineering, RAG, fine tuning, agents, to decide on the right ones, you have to understand
what the problem is you're solving and why you're solving it; how you're going to leverage
those patterns to do so is the implementation problem.
And if you just leave the implementation problem separate, the implementation people

(26:42):
are going to be like, I want to do the cool implementation thing.
I want to do the most fun.
You know, I want to make the GPU go better.
I don't want to, you know, I don't really care about business value because my salary isn't
changing.
So I think it goes back again to incentives, and that's where I think startups are ahead, you know,
where you have the end-to-end unicorn AI engineer people that are product-minded and also
DevOps-minded, like, yeah, like these people are building cool things.

(27:04):
So to the specific question, you know, specific use cases.
It's got to be domain adaptation is the number one, I would say, where you--
Or fine tuning.
Or fine tuning.
Like, if you're going to be in a very specific domain, with very domain specific--
Jargony words.
Exactly.
Like, doctors, lawyers, this kind of like, you know, they use fancy, schmancy words and you're

(27:27):
like, what is that?
Or you're in a large bureaucracy and they have acronym or initialism soup or whatever it
is, right?
And the idea too is like, when there's a lot of conflicting words, right?
So if you're in a domain that has words that we would understand or use in common parlance
that have very specific or very narrow meanings in that domain, something like fine tuning

(27:50):
is very useful as well as in things like sovereign AI.
So AI systems built for specific countries or specific dialects, specific languages, fine
tuning is going to be very useful for those cases as well.
So, sorry to cut you off.
But just to make it kind of more, I guess, tangible.
Yeah.

(28:11):
So like, when you say like specific domains, you'd be like, okay, if I'm a lawyer and I have
like a bunch of legalese and I have a lot of things that like maybe wouldn't apply in
other cases, maybe I would fine tune for my law firm.
Or maybe if I'm a doctor, I'm like researching all these things.
Like novel diseases, maybe I would do some fine tuning.

(28:35):
But then maybe like for the RAG case, it's like maybe I'm like a regular startup and I'm
like just trying to access my company's internal documents, but I'm like, I don't know like
some marketer or something, right?
And like I have like maybe some of my like customer data, some of my internal docs.
For that, maybe RAG might be the right solution.

(28:57):
I don't need to do fine tuning.
Would you say that's maybe like a right summary?
100% true.
I would add that even when we're talking about like law, even more specifically contract law
versus IP law versus like this is the level of granularity that you want to start thinking
about fine tuning at.
So not just the general domain of law, but specific practices of law as well as specific

(29:20):
practices of medicine, right?
That's where fine tuning is really going to catapult you ahead, and for RAG, absolutely.
The meme we used to say, right, which is still I think very true: when you want to teach
the language model new behavior or to understand words better, you're going to use fine tuning.
But when you just need to add new knowledge, right, when we just need to be up to date,

(29:43):
let's say.
So you're working in a news organization, you're a journalist, you want to be able to have
the most up to date news from your whatever sources that you have as APIs.
That's perfect for RAG because you're going to be able to get information that came out
yesterday or five minutes ago, which you would not be able to have from that pre-trained frozen

(30:06):
LLM.
So basically the way that I would think about it, if you need new information or you need
to add knowledge, so like exactly what you said about like you have some knowledge base
and you need to be able to communicate with it, that's where RAG is going to shine way
more than fine tuning ever could. But when you get into that niche language, when you

(30:27):
get into those narrow channels of domains, fine tuning is going to be, it's just going
to allow your system to understand all of those words a lot better, to understand what they
mean in context, right?
And of course, you can always combine the two, all the time, but that's what I was just

(30:47):
going to say, just to put a really fine point on this, to your question, right? It's like you
got PDFs, you got docs. What words are in the docs? Are they words that the LLM was trained
on? If they are, just use RAG. If they're not, use RAG and fine tuning, in that sequence,
you know?
Okay, I think that's a fantastic summary. So, go ahead.

(31:09):
Oh, maybe to dive down just on one point. You mentioned maybe like trying to get this LLM
to understand the difference between homonyms, words that maybe sound or look the same
but have different meanings depending on context. And you mentioned maybe like fine tuning the
embedding model itself in addition to fine tuning like the LLM. So would that like cause problems
if you want to then go back and use the vanilla version of ChatGPT or use other models that

(31:33):
are trained on different embeddings? How would that work?
So we want to be very careful in this, this is the worst because we've used the same
word to describe five different systems. So the embedding model in RAG is completely distinct
and different from the embeddings of a transformer model, the tokenizer.
Exactly. So when you fine tune that embedding model, right, which is a separate entity to the

(31:59):
LLM, they only converse through the idea of tokens and natural language. So you're not
really going to suffer if you train that embedding model to be more specialized to your data set,
it's just going to retrieve better context, which is then going to be used by the LLM.
So it's all translated through natural language. So there's no like penalty, or if you say

(32:23):
you switch your generation model, there's no penalty if you're still using your old fine
tune embedding model. Since we are translating through natural language, we don't have to worry
about that, that gumming up the works. If you retrain your generator, absolutely, absolutely.
Well, that's what I was going to say is you have the retriever and you have the generator

(32:45):
and the retriever has the embedding model. The generator is the chat model, the chat,
instruction-tuned model. And so they're actually completely separate, right, because you have
the retrieval, then you take everything, just put it in context, you put it in the prompt,
and then you feed that prompt to the generator. So you can do it completely independently,

(33:06):
and it works pretty well. Very small note that we've got to say, I've got to bring it up.
It didn't used to be that way, we used to train them as one unit. When RAG first came
on the scene, you would fine tune both the retriever and the generator at the same
time. Real RAG is this thing. Well, the original RAG paper outlined not exactly the way we

(33:28):
think about RAG. That's right. So it actually was RAG's fault.
And of course, there are applications, like from Arcee AI, their domain adapted language
model and toolkit does the same thing. It does what's called end-to-end RAG, and it simultaneously
trains the retriever and the generator. But what's interesting is that like the majority of the

(33:49):
updates, I think this is right, are in the retriever. Oh, 100%. Yeah. Because retrieval done well
makes better generations. You put the reference material in, you get a better output. Right?
So separating them is really sane. That was really helpful. Before we kind of get
kicked out and you guys have to go to your next speaking engagements, I kind of wanted to touch on

(34:13):
the future. You mentioned VLMs, like vision. And with Meta's new AR vision, no pun intended, they
are opening up this developer platform to allow people to like try to build overlays on top of your
current reality. And you know, there's the Vision Pro, which also does similar things, but in a slightly

(34:35):
different way, where do you see the future of LLMs, RAG, agentic behavior in this new vision world?
And I think even Andrew Ng was like pushing for vision agents. That's like the new next big thing.
Well, it's just like humans, right? I mean, we have a lot of senses and we use them to do
better work than if we don't have them, right? Like without vision, we are impaired, right? And I think

(35:04):
the same is true of the systems we're building now. They're great. And they work in their modality,
like absolute legends, and we love that. But they are missing such an information rich part of the
world, right? And when we talk about like the future, we talk about AR, being able to ask, what is

(35:24):
this thing that sits before me that's plugged into your computer? It's a microphone, right? But if
you don't know that or you don't have the context, being able to look at it and then without needing to
build an extensive prompt that really goes deep, well, okay, it's black and it has this kind of
mesh metal on top, it's got some knobs that say gain and pattern, right? Just being able to show it.

(35:47):
It's a measurable time saver. It's also the way humans communicate, right? If I wanted to show you
someone, I wouldn't describe them in detail laboriously to you, but just show you a picture, right? So
it must be the future to incorporate the additional modalities, audio, vision. It just, it has to be where we

(36:10):
go. To the point Greg is making a lot: it's got to have a body at some point, right? We have to have
the ability for these systems to be in the world and to understand that they're in the world and
where in the world they are. Absolutely the future. Whether we're going to get there soon. Yeah, right.
Well, we can talk about that. And then this sort of brings up sort of the AI Makerspace thing,

(36:34):
you know, and I used to teach in real maker spaces that aren't that AI, and they're more about
manufacturing, 3D printing, that kind of thing. I believe there will be a convergence of the digital and the
physical world in the 21st century, just real long term. And I think the vision models are going to,
you know, pave the way into being able to do more simulation stuff. The simulation stuff is going to

(36:55):
be, you know, able to now we can combine the physics and the physics space and the empirical modeling.
And now the sudden we're really, really cooking with gasoline instead of like these, you know,
one-off training companies or one-off robotics companies. We want to ultimately be a space where
people can come build really awesome things that are glassed, you can already face, and that kind of thing.
But yeah, again, it's, it's not something we're investing in right now. It's not something we're

(37:18):
spending our time on a whole bunch because right now, the enterprises aren't generating
value with this anytime soon either. They can't even get RAG or search right in the
first place. But I agree with you. I think that having robots in the world is going to be really
the future. I mean, because the thing is, is like, the way I kind of think about LLMs right now,

(37:41):
it's sort of just like a brain floating in a jar. It's just like, it's like, it's like, it can't really,
like, it can do about as much as like a really, really smart like brain just like kind of floating there.
Like, I mean, it can like, you can chat with it like onto a computer screen, it can like, you know,
do some response back. But like, the moment you put that brain into a body, now it's like,

(38:06):
the, you can do anything, right? Like, we can do like asteroid mining in space or something like that.
Right. Let's send the robots there. Like, I don't want to go. But like, we can have the robots go.
All right. Like, this, like, we could start off, like having like the robot drive my car. We can have,
you know, the robot, like, clean my house. Like, do the dishes, right? Like, all that stuff is like,

(38:29):
is super exciting. So I'm on board. Yeah. Well, I mean, if we sort of connected back to agents and
multi agents in the grand vision of the whole thing, it's like, we want to think that, you know, the idea
of an agent is quite old. And the idea of an agent is just a subsystem of another system, right? People
and companies are agents in economies, neurons are agents in brains. Like, what kind of agents is the

(38:54):
brain on that robot going to be programmed with so that it can go mind those things on, you know,
planet X, Y, Z, right? It's like, it's really interesting. But again, I think it's time to go all
in on software today. These other things are curiosities and quite interesting. But if you
want to really create this value, you've got to stick with text as a modality. Yeah. I can see

(39:16):
there being a huge issue, especially with an agent that has embodied the physical space, given how
many issues we have of agents today running with like the errors compounding. Doesn't seem like a problem.
Terminating. I think too, like one of the things that's most exciting for me about a safe future,

(39:37):
when we've, when we've ironed out some of these things, when we've, when we've got to the place
where we have guardrails that work consistently and we can have deterministic behavior, you know,
is we're always thinking about human modalities. We're always thinking about, humans see, we should
give the robot the ability to see. Humans feel, we should give the ability for the robot to feel,
right? But there's, there's so much more information that humans can't consider or don't consider,

(40:05):
right? We don't have the ability to extremely precisely identify temperatures of things around us
without touching them. We don't have the ability to see wavelengths of light that extend beyond our
vision, right? And I think this kind of ability to equip our future agents with modalities. Yeah.

(40:28):
That's right. To move to the next step, it would be quite interesting, especially to see how we can
use that data to make a safer, better application or, in this case, a valuable agent, right? It's, we just have,
we live in such an information rich universe. It will be very interesting to see how we can take

(40:52):
advantage of all that information, moving beyond even our common senses as modalities. Yeah, a superhuman,
physical robot with multiple senses that is somehow also communicating with us through our
Neuralink implant. Yeah, yeah. Being the best leader we've ever wanted. Yeah. Yeah. It's like, yeah, it's,

(41:15):
but it's, it's like, we're, it is a joke right now. It's a joke. But there's also a path to that,
right? I mean, like it's, it's hard. It will be difficult, but there exists the foundations of all
of those technologies to build exactly what you're talking about, which is absolutely insane, right? Like,
this is the, if you said that five years ago, it would have been like, uh, funny sci-fi. But

(41:42):
when you say that now, it's like, I mean, with enough work, yeah, we'll get there. I mean, we're close.
I mean, did you guys see the Tesla announcement? I think it was, yeah, of course. Yeah, of course.
I mean, like, well, part of the story we tell, with OpenAI, I think it was 2019, they were like,
yeah, I don't know, this GPT-2 thing is too powerful. We shouldn't release it, you know, to the public. Right? Well,

(42:02):
so it goes. Yeah. So there was, there was a moment in time, but we've all agreed that we're now
past that. So what are we going to do? We're going to build, ship, and share things that hopefully
make the world a little bit better. That's right. I love that. So one thing before we like, you know,
get kicked out of here, if somebody's new, let's say they're maybe like a student or in like a

(42:27):
different career and they want to get into software, they want to get into AI, they want to, you know,
kind of get started this. How would you get started? Like, you know, there's a million one things
you can do. Like, what would be like the thing that you would recommend that would be like the very
first step to kind of get started to start getting to the industry? Yeah, I mean, I think we have a
very clear and obvious answer to this. And it's like, take what we call the AI Engineer Challenge.

(42:50):
This is, this is the thing that we put out there that's going to force you to build, ship, and share
your first application. So don't go learn Python first, don't go learn machine learning first.
It's like, just go build your first chat bot using an API key, using GitHub, using actual version
control. Like everybody says, yeah, yeah, I know Git. Like, okay, show me. I don't use Git every day.

(43:15):
And there's always this, you've got to get over the barriers to this traditional software engineering
from day one. And you're not going to get it by going and studying a Python course. So we've got
the AI Engineering Bootcamp challenge. That's our answer. And if you start there, you'll know what to do
next. I think that's good advice. Yeah, I mean, I think that just building something will teach you so

(43:40):
much because I think the problem with following a tutorial is it's kind of like a problem that's
already been solved, right? But, you know, sometimes, like, solving unsolved problems is
where the money's at. And you want to be able to pick a thing, solve it, and just kind of bang your

(44:03):
head against the wall until you get that thing working, right? And sometimes like that might mean that
you're able to follow a tutorial. But then sometimes that means you're like following like random
like blog posts and like random comments that somebody made on some Stack Overflow post with like
a link to Reddit, which linked to some like deprecated blog, to find your answer, right?

(44:25):
So I love this. But you have to get to that bang your head on the wall point because once you're
there and you're smashing your head against the wall, you have to make a decision. You have to say,
am I going to keep doing this? Or should I just leave and go do something else with my life? Because
those two things are equally valuable for somebody trying to figure out what the hell they should do next.
And that is where I think exactly if you just start doing it, you're going to find out, is there

(44:49):
something I want to do with my time on a regular basis? And a lot of people, the answer is no today.
And that's fine. But you know, you'll figure it out much, much quicker than you would going and self-studying
a whole bunch of things. Yeah. And for sure. And I think that just like to the listeners who are
not in software. So I think there's a lot of software development experience at this table. I will
tell you, for at least my experience, I spend probably 80% of my time just banging my head against the wall,

(45:13):
trying to understand why something is broken, not compiling, it won't work. Because like the easy
stuff you can do quickly, right? It's like the hard stuff that takes all your time. So if you are in
software, you will spend the bulk of your time not following some sort of tutorial or being in like
a Jupyter notebook or something, like you will spend your time banging your head against the

(45:36):
wall and trying to get it to work. So if that is exciting to you, you know, you see this in our classes,
don't you? Yeah. And I mean, one of the things that I want to make sure we're saying a lot,
so debugging is basically the whole job in every job that you work in software, because
if debugging wasn't necessary, AI would already have our jobs. But getting through that experience

(46:02):
is not difficult. It just takes persistence. And if you can persist, you'll get through that. And then
you'll look back at how far away you were when you started, right? You'll see how far you've come,
and the kind of thing where you're like, I can't do that. And then you just do, right? And again,

(46:23):
it doesn't take like, you don't have to be super smart. You don't have to be technically gifted.
You just have to be willing to put your head down, do the actual work, do the hard boring tedious part.
And then when you get through that, right? It's rewarding beyond measure. One season, a quarter at most.

(46:47):
Real effort, you know, you'll figure out how you get to the other side. I love it. I think that's
great advice for our listeners and maybe a good note to end on. Yeah, I think so. So before we end,
is there anything that you want to mention that we didn't ask you about? Like anything you want to
pitch anything, the floor is yours. Well, I mean, if you'd like to accelerate your journey into

(47:11):
a generative AI, then I would highly recommend, first of all, joining our Discord and the other folks that
are building, shipping, and sharing every day; there are some real legends, some real inspiration
in there that you can get. And then if you really are going to benefit from accountability,
from working with a cohort of peers, you know, go ahead and consider our bootcamp. Now you can't

(47:32):
just buy it. You literally must complete the challenge before we'll even consider you in the bootcamp.
So, you know, go ahead and start with the challenge. Go from there. If you're just looking to
keep up to date, find information, and make sure that you, you know, have access to
the latest and greatest tools that are coming out, as instructed by myself and Dr. Greg here.

(47:58):
Check out our YouTube. We go live every Wednesday. Like and subscribe. Like and subscribe. Hit the bell.
We're always talking about everything to do in this space, from down at the GPU all the way up to
multi-agent systems. And you'll hear stories of transformation from our learners every week. So, it'll
give you a little inspiration. That's wonderful. We'll try to put all of that in the description.

(48:22):
So, yeah, again, thank you everybody for listening and we'll see you in the next one.