Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
(00:00):
Enterprises do not work well with open source.
Enterprises need a very mature solution.
When you build an infrastructure company, you need real
engineering, right? You cannot vibe code.
There are many factors and requirements from enterprise
companies, like security, stability, and everything needs
to be top notch. We are back on Chain of Thought.
(00:25):
I am your host, Conor Bronsdon. Today we're joined by a guest
that many of you may know and who I've been following for
quite a while, Aurimas Griciūnas. Aurimas is someone you've probably
seen on LinkedIn, maybe on X if you're interested in AI.
You've absolutely seen some of the incredible charts that he
shares, these graphics, and the insights that he brings around
(00:47):
everything from the observability stack for AI to
how AI agents are being built at the enterprise.
He makes some really unique and interesting content and has a
deep background in the trenches of data and AI, having been
everything from a data analyst, machine learning engineer, and
MLOps engineer to chief product officer at Neptune AI.
(01:09):
Today he is the CEO and co-founder of SwirlAI Consulting,
building agentic systems for clients.
He's a prolific content creator, as we've already mentioned, and
he's launched his own course, the End-to-End AI
Engineering Boot Camp, to train the next wave of builders.
Aurimas, it's a pleasure to have you on the show.
Welcome to Chain of Thought. Hi, Conor, and thank you for
(01:30):
having me here, and super glad to have a conversation with you.
I know we'd planned to start on some more general topics, but
given that you just told me that your first cohort of your
End-to-End AI Engineering Boot Camp is wrapping up right now, I'd love
to hear from you. How has the cohort been?
How's the new course going? I do believe that it is going
(01:55):
really well. There are a lot of learnings
that I'm taking away from the first cohort and
definitely bringing into the second one.
So the first learning probably is that there is a lot to cover
when it comes to AI engineering, especially end-to-end AI
engineering. You're covering systems
from the very simplest ones, then moving to RAG and agentic RAG,
(02:18):
agents, multi-agent systems, communication protocols,
deployment, observability, evaluation, right.
So probably eight weeks is not enough if you are also working a
full-time job at the same time. So now the realization is
that yes, you can deliver the material in eight weeks, but a
(02:39):
commitment for a learner who is actually learning those
materials should probably be around, I don't
know, six months. So the materials should be
reviewed after the boot camp ends.
So I think that's one of the realizations.
But in general, it is a very hands-on boot camp and it seems
(03:03):
like people like it and I think that everyone is taking away a lot
of hands-on experience from it. Do you think there would be a
challenge with the pace of innovation and change that is
occurring though in AI if you were to do that type of six
month course where oh look, so much has changed around the
frameworks you're applying and there may be new tools you want
(03:24):
to bring in. How do you approach this given
that you have like your own day-to-day approach to learning
and then these cohorts that you're also working through?
So what I actually meant is that the course, the cohort
itself, would still be eight weeks, so it will not span eight months.
But if you want to get deep into topics and properly apply them
(03:49):
in practice, you should probably take around six months and spend
an additional four months on top of the eight weeks of the boot
camp to properly learn that. Now, since I am also kind of
providing lifetime access to the materials, when there are next
cohorts and I update the materials, they are also available for
the previous learners, so they can go back and get up to speed with
(04:11):
all of the changes in the industry.
And you've mentioned observability and evaluations as
key areas within the AI space. And increasingly, I think we're
seeing more and more conversation about this.
And you, in fact, mentioned to me that you've considered
starting a company in that area but ultimately decided against
it, calling the market too packed.
(04:33):
Could you walk us through that thought process and how you were
thinking about the AI infrastructure market today?
So when I decided to try and look into the space, I was
leaving Neptune AI, and it was kind of natural for me to try and
maybe build something very similar in a similar space,
more in the application layer. That's why the initial decision
(04:57):
was to actually research the space. But then, even after the
first few weeks, we found 20-plus companies
that are doing observability and evals, right?
And that's apart from the hyperscalers, who are also doing that as well,
right? So there are quite a lot, and all
(05:19):
of them are also trying to cover end to end, because
it's really hard to pinpoint what will be really important in
the next few months. So I think that's why all of
those companies are trying to do observability and evals and
experiments and prompt registries and maybe some of
(05:39):
them routers as well, right. So, connecting the end-to-end
traces and having end-to-end observability and evals of the
system. So there was probably
no unique space to tackle that hasn't already been
picked up. And then probably there are also
20 companies in stealth building those solutions, and half of them
(06:01):
are open source and available to host for free.
Speaking of open source, you are one of the folks who
correctly anticipated the need for agent interconnection.
I've seen you talking about it for months now, maybe even back
into 2024, and that space is obviously now being tackled by
open standards like A2A and AGNTCY, which have been donated
(06:22):
to the Linux Foundation, obviously MCP.
How do you see the open source movement within AI changing the
calculus for founders who are trying to build venture-backed
companies? Open source, a few thoughts here.
So, in the first place, currently it is not easy to found a company
and be successful,
(06:43):
right? So you either build something
that grabs attention really quickly and you kind of reach
some sort of escape velocity within the first months
after you start building, or you have really strong backing from
VCs, so you have a lot of money and
then you can actually build a big team and roll out enterprise
(07:04):
operations properly, or you have distribution from day one.
So I guess those are the only
ways you can easily start a successful startup today.
When it comes to open source, I'm a strong believer in open
source, but it is really also hard to make an open source
(07:29):
product a profitable business. Yeah.
And enterprises do not work well with open source, right?
Enterprises need a very mature solution.
And that's also the reason why it is hard to build
an infrastructure company, because when you build an infrastructure
(07:53):
company, you need real engineering, right?
You cannot vibe code, because there are many, many factors and
requirements from enterprise companies, like security,
stability, being able to deploy on-prem, let's say, the support
that comes with it, enterprise features, and everything needs to
(08:15):
be top notch. And then some of the
companies also need the ability to hyperscale on your
side, because they are big companies, they might be
ingesting a lot of data and if your infrastructure is not
specifically meant for that, then you will not be able to
succeed in the enterprise space. So do you see this
(08:36):
differentiation between the capital-rich folks, or the folks
who have at least raised a lot of capital to take on
infrastructure companies, kind of taking a very different approach
from people who are, as you put it, vibe coding their way to
success and maybe using their own built-in distribution to try
(08:56):
to quickly generate revenue. How do you see these dynamics
playing out in the market? And yeah, would love to explore
that with you. So when it comes to vibe coded
tools and products, I think this can mostly be successful in B2C
types of products, because you're quickly capturing attention with
(09:18):
some sort of a new idea from the broad public.
And when it comes to building enterprise products, I still
think that VC-backed companies with large amounts of
cash will be the winners, unless you really build something
really, really great really fast, and then you get a lot of money
and then you hire hundreds of engineers to refactor your vibe
(09:40):
coded product. I don't know, what's your take on this?
Good question. I think
you're spot on that it really depends on the space.
Every time I see someone trying to vibe code their way to a
business solution, I just assume, maybe unfairly, that it's not
going to scale. OK, sure,
this may work for a certain dev tools segment, or
(10:03):
maybe a single ICP where you have the ability to just kind
of put a credit card in. But once you start going up
against competitors in larger deals and there are actual
frameworks being applied about, OK, how are your
security and compliance protocols?
Are you meeting our needs in these specific areas?
Do we have the role-based access control we need? All the
(10:24):
things that enterprises or even just larger scaleups are
looking for. I would expect you to see some
major challenges. And I do think there's a
potentially viable path, and we're maybe seeing this play out
a bit, of vibe code your way to a cool demo, try to raise money
off that cool demo, and then actually hire engineers to
(10:46):
create the whole thing. And I wouldn't be surprised if
there's quite a few companies doing that today.
I'm not going to name names, but that also creates a lot of
hidden risks for founders who choose this kind of high-intensity
path, where it may help you get to that raise, and maybe
that's what you need, but there's a lot of pressure that
comes with that as well. But I guess if a really
(11:08):
strong engineer is doing vibe coding, and usually we are not
doing vibe coding, we are doing assisted coding in a very efficient
way, then maybe this kind of a tool, coded in this kind of way,
could actually succeed, right? You can build something good,
then you hire a team quickly once you get to VC money,
and then you scale out. I think it's a viable approach
(11:29):
if you have that engineering talent within the founding
group. If you are coming in and
you're a non-technical founder and you're expecting to be able
to just vibe code your way to initial
success, I would be hesitant, because I feel like you'll
introduce so many issues, because, to your
point, I think you need to treat AI
(11:50):
like a partner. You can't simply just say yes to
everything there. It's very easy to refactor
things in the wrong direction and introduce a ton of
long-term challenges. So, yeah, I do agree with you
though, if someone comes in and has technical expertise and
understands what they're doing, and they want to use AI as a partner
today, I think that's a fantastic use case for it.
(12:12):
And I think we'll see, and are already seeing, but we'll
continue to see, a lot of folks take on founding with AI as a
key partner in their initial build-out, in their demo.
And then I think the challenge will be, OK, how do you
translate that to we're a scaling company, and obviously
there are a million people who have written books on that.
(12:33):
I'm not going to try to pretend I'm Paul Graham and say, oh,
here's the approach you want to take on that.
But the inception point of going from idea to MVP feels like it
needs to just move so fast today.
And I think it's a big opportunity for founders, but of
course it brings a lot of pressure as you start bringing in those
(12:54):
VC dollars and other backing. Even if you have raised and, you
know, have a successful vertical app and maybe even get to a million
in ARR, there's also this copy risk that's being
introduced of, oh, this could be copied easily.
Now it's a lot easier to just say, oh great, let's
take it. What are our rivals doing?
(13:14):
And we're going to do the same thing.
How do you create that defensible moat?
That's where my mind's at now. It feels like having a clever
idea, I mean, or early traction,
obviously it's never been enough on its own.
You still have to execute. There are so many things that have
to go right. But I wonder if the moat of
technology is actually less defensible today.
(13:37):
It is. I guess it is.
And especially, you touched this point previously,
like when you're doing an enterprise sale, right?
So it's not like you're just coming in solo to a single company
and trying to sell. The enterprise sales process
has very known patterns.
(13:59):
You would be benchmarked against ten others and the best would be
chosen, right. And then there are a few risks:
either you're already entering a very hot market,
so then how do you become better than others?
Like, no one will pay just because you are a
known person, right? They will pay because the
(14:20):
product is good and better than others'.
It needs to have all of those features that we need,
implemented in a more efficient way than your competitors'.
And now when this copy risk exists, then a new product can
very quickly be kind of vibe coded, you could say.
(14:41):
But if you have a very strong engineering team with AI at
their side, so maybe five strong engineers, then, you know,
it's very easy to build even, for example, an observability infra
tool, right? If you're a really great
engineer and you're not the only engineer in the company,
and you use AI to some extent, you can very quickly
(15:02):
build an observability and eval tool that rivals tools like
LangSmith, for example, or Langfuse, right?
Or yours, I don't know. I never used Galileo, but
maybe? Try it out.
Then see what you think. Yeah.
So I guess let's make this practical then.
If you were advising another founder today and they have an
(15:26):
idea, and they're like, how should I get started?
How should I approach this? You know, we've talked a bit
about a couple of these flash points that are now occurring,
where it's easier to get to MVP and it's maybe easier to copy.
What would your advice be to that founder who is coming in with
a unique idea and is trying to think through how they
should approach it? So I think this is also how many
(15:50):
VCs think: it's not all about the product, right?
It's a lot about the founder herself or himself.
And what this really means is, can that person very
quickly pivot and adjust to the changes in the market?
So usually the first idea is not a great idea.
(16:12):
So I would really try and maybe figure out how good their
pivoting ability is. Then this is the first one.
The next one is how do you sell the product?
How do you market the product? Because that's probably even
more important than the product itself, at least at the very
beginning, right? The distribution and reach is
(16:32):
key, so I wouldn't even be too strict on the idea that the
person is trying to build and rather see which part of the
market those founders are targeting and maybe looking a
little bit back into their histories, what they have been
doing before and how they think about the industry in the first
(16:57):
place. I completely agree.
I'll say my most successful angel investments so far have
both been instances where the founders have pivoted and said,
we didn't quite have this right initially, but we saw the
potential of, you know, how smart and driven and thoughtful
these people were, and they found the path.
(17:18):
And I think that's true of most folks who are doing
investing: it's, you know, founder first.
Who are they? Will they actually take you down
this path? Do they have the grit and
determination and the, you know, mental ability to think outside
the box, but then also bring order to the chaotic ideas that
they are putting out into the world?
(17:40):
And no, the real kind of decision points probably come
once you actually put out your first idea into a market and
then you get the feedback and then you get maybe some paying
customers and then you kind of figure out what needs to be done
next. I think that's one of the really
exciting parts for a lot of entrepreneurs in the space right
now, which is that it's so much easier to get to the MVP and start
(18:01):
getting that feedback faster. So you can say, oh, I did this
completely wrong or oh, great, we've got something here.
Let's see where this goes. And it's creating this intense
market pressure for speed, both within larger companies and for
entrepreneurs. How have you integrated this
(18:22):
focus on teaching what's important, and this idea of
focus, which I would argue is increasingly important in a
world where there's just so much happening and so much
information, and the cost of generating code or generating
content is drastically decreasing.
How do you bring that into your course and your work as you
(18:45):
help advise and understand what folks should be focusing
on? So my course is really
focusing on fundamentals. It's not a tool-focused
course, even though we are using popular tools, of course.
But to take a simple example, I'm using LangGraph throughout
(19:10):
the entire build-out of, let's say, the capstone project.
But at the same time, I'm not using LangSmith; I'm using
Instructor for structured outputs, and I'm using Instructor
wrappers inside of LangGraph nodes.
It's kind of teaching people that the structured outputs are
important. This is how you can achieve
(19:31):
that. This is how it works.
You shouldn't rely on those abstractions that hide how the
structured outputs are actually achieved.
I'm also not using the tool bindings of the frameworks, right, but actually
prompting the LLM itself to produce tool suggestions by
(19:52):
giving full tool descriptions. So anyway, the main point is
that the course is specifically about teaching fundamentals and
real infrastructural patterns along the way.
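To illustrate the pattern being described (a minimal sketch, not the actual boot camp code; the state fields, model name, and tool list below are made up for the example), a LangGraph node can call the LLM through Instructor so the structured output schema stays explicit instead of being hidden behind a framework abstraction, and the available tools are described directly in the prompt:

```python
# Minimal sketch: explicit structured outputs via Instructor inside a LangGraph node.
# Hypothetical example; schema, model name, and tool descriptions are invented.
from typing import TypedDict

import instructor
from openai import OpenAI
from pydantic import BaseModel
from langgraph.graph import StateGraph, START, END


class RouteDecision(BaseModel):
    """The schema the LLM must fill in -- the structured output is visible here."""
    tool_name: str
    reasoning: str


class AgentState(TypedDict):
    question: str
    decision: dict


# Patch the OpenAI client so responses are parsed into the Pydantic model.
client = instructor.from_openai(OpenAI())

TOOL_DESCRIPTIONS = """
Available tools:
- search_docs: full-text search over internal documentation.
- run_sql: execute a read-only SQL query against the analytics warehouse.
"""  # described in the prompt itself instead of relying on framework tool bindings


def route_node(state: AgentState) -> dict:
    decision = client.chat.completions.create(
        model="gpt-4o-mini",
        response_model=RouteDecision,  # Instructor enforces this schema
        messages=[
            {"role": "system", "content": "Pick the best tool." + TOOL_DESCRIPTIONS},
            {"role": "user", "content": state["question"]},
        ],
    )
    return {"decision": decision.model_dump()}


graph = StateGraph(AgentState)
graph.add_node("route", route_node)
graph.add_edge(START, "route")
graph.add_edge("route", END)
app = graph.compile()

print(app.invoke({"question": "How many users signed up last week?"}))
```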
For example, observability is being taught from day one, and then
we move with observability and evals throughout the eight weeks, and
(20:13):
I teach how to evaluate each different system.
Yeah. So I think fundamentals are
really important. Now we are also seeing, with the
flop of GPT-5, I think it is a flop now already, right?
It's not as great as promised. So I think that this iterative
improvement of LLMs is starting to happen, right.
(20:35):
So AGI might not be so close. And I think the old things, which
are two years old, are still very important in
building agentic systems. So, properly understanding how to
context engineer, and by the way, context engineering I think is a
very, very important topic and very often overlooked.
(20:58):
Then learning how to build agentic systems, because it's not
as easy as it looks; when you're building demos, you're
usually not doing any context engineering.
And yeah, so these fundamental things I think are very
important. Are there particular gaps that
you're noticing in the experiences or fundamental
(21:19):
skills of folks who are either trying to grow their skills
working with you or people who are out in the market today?
You mentioned observability and evaluation.
Obviously, we share the viewpoint that
those are crucial and need to be day-one instrumentation
pieces that continue throughout the entire life cycle of your AI
application or agent. But I'm curious if there are
(21:41):
particular gaps that you're noticing out in the market today
where people aren't really paying attention to fundamental
skills. So it depends on which
part you're referring to. Is it the boot camp itself?
Because the boot camp is naturally not for the top
engineers. Yeah.
I think I'm looking more broadly here.
Like, what are you seeing in the market as far as
(22:03):
potential gaps? Definitely evals is still a gap
right? So this is probably not an
emerging topic, it's already an old topic, but not everyone is
adopting the practice of eval-driven development yet, even
though it is crucial in building these systems.
(22:24):
Then I think over-reliance on some orchestrators
is bringing some problems eventually, once the systems are
starting to mature, because then you need to
go back to base software engineering instead of using any
wrappers. So people are taking too long to
(22:46):
ship in some cases. I think the first versions are
not being rolled out soon enough.
The human feedback is not getting brought back into the
system soon enough. Yeah, definitely business
understanding: people building those
systems are not always very close to the business, and they want
(23:10):
to build something shiny and adopt some cool tech, right,
but not necessarily solve a business problem.
And then projects and products start being deprioritized
because we are not showing any business value.
Teams are building agentic systems in the basement for five
(23:33):
months, then they come out and the system doesn't solve a real problem.
Like we just said, you have to just get out there faster and
start getting that feedback, or else you're creating a risk
point for yourself where you could be building in a silo.
Most companies don't succeed that way.
One more thing. So I think that we are still early in the MCP days,
and sometimes I think MCP is being overused. It brings
(23:58):
most value when you have remote servers, right.
But I don't think we are yet at the point
where we can actually utilize remote MCP servers properly,
at least without significant engineering.
Yeah. So sometimes just using tools
within your code is also a good idea.
You don't need to have MCP for everything.
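As a concrete illustration of "just using tools within your code" (a hypothetical sketch with an invented tool, not any specific framework's API), a plain function plus a small registry and dispatcher is often enough when the tool runs locally and no remote MCP server is involved:

```python
# Minimal sketch of using a tool directly in your own code -- no MCP server.
# The tool, registry, and call format are hypothetical.
import json


def get_invoice_total(customer_id: str) -> float:
    """Pretend lookup against a local database."""
    return {"acme": 1250.0}.get(customer_id, 0.0)


TOOLS = {
    "get_invoice_total": {
        "fn": get_invoice_total,
        "description": "Return the open invoice total for a customer_id.",
    }
}


def call_tool(tool_call: dict) -> str:
    """Dispatch a model-produced tool call like
    {"name": "get_invoice_total", "arguments": {"customer_id": "acme"}}."""
    spec = TOOLS[tool_call["name"]]
    result = spec["fn"](**tool_call["arguments"])
    return json.dumps({"tool": tool_call["name"], "result": result})


print(call_tool({"name": "get_invoice_total", "arguments": {"customer_id": "acme"}}))
```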
(24:20):
Yeah, you don't always have to be using the new hotness.
It doesn't have to be shiny. Sometimes fundamentals are
fundamentals for a reason. You mentioned context
engineering as one point of emphasis in the market today.
And I think there's been quite a bit of conversation around, I
(24:40):
mean, everyone, the folks who are saying, oh, prompt
engineering is the way, and a lot of folks think it's a short-term
solve. And then vibe coding and context
engineering, there have been all these terms thrown
about and different approaches that have been discussed as: this
is the new approach that we should be taking here.
And I'm curious from your perspective when you talk about
(25:03):
context engineering as important and using a systems-focused
lens, what would be your advice to engineers who are
maybe underutilizing this or haven't explored it yet?
OK, so the first very important point is probably that when I
was thinking prompt engineering, I was always thinking context
(25:27):
engineering from day one, because if you are building
agentic systems, you cannot build agentic systems without
context engineering. So for agent builders,
context engineering equals prompt engineering, because you
need to store the actions somewhere.
You need to compress the actions because the context window is
just exploding. If you are building systems with
(25:50):
multi-turn conversations... OK, so I
do have suggestions on what needs to be implemented while
performing context engineering. But if you are building a multi-turn
agentic system, you have felt the pain, I believe, like
the agent running for too long, the context window exploding to
(26:13):
a few hundred thousand tokens per single
run. And then you need to, you know, have maybe five runs, each
200,000 tokens of input. Then it takes 50 seconds to
complete. And that's a challenge.
Yeah. So, where to focus:
(26:36):
the main ideas that I love in context engineering are the
ability to compress the conversation history,
being able to also discard unnecessary actions or store them
in a so-called scratchpad where you can later on pick
things up from, maybe writing all of
(27:01):
the state files to disk, but then picking only what you
really need for specific nodes in your system, as needed.
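A stripped-down sketch of those two moves, compressing older turns and parking intermediate results in an on-disk scratchpad so each node loads only what it needs, might look like this (the token estimate and the summarize() call are simplified placeholders, not a specific framework's API):

```python
# Sketch of two context-engineering moves: compressing old conversation turns
# and parking intermediate results in a scratchpad instead of the prompt.
import json
from pathlib import Path

MAX_CONTEXT_TOKENS = 8_000
SCRATCH_DIR = Path("scratchpad")
SCRATCH_DIR.mkdir(exist_ok=True)


def rough_tokens(messages: list[dict]) -> int:
    # Crude approximation: ~4 characters per token.
    return sum(len(m["content"]) for m in messages) // 4


def compress_history(messages: list[dict], summarize) -> list[dict]:
    """Replace older turns with a single summary message once the window grows."""
    if rough_tokens(messages) <= MAX_CONTEXT_TOKENS:
        return messages
    old, recent = messages[:-6], messages[-6:]   # keep the last few turns verbatim
    summary = summarize(old)                     # e.g. one cheap LLM call
    return [{"role": "system", "content": f"Summary of earlier turns: {summary}"}] + recent


def write_scratch(key: str, value: dict) -> None:
    """Store a tool result on disk instead of pushing it into every prompt."""
    (SCRATCH_DIR / f"{key}.json").write_text(json.dumps(value))


def read_scratch(key: str) -> dict:
    """Load only what a specific node actually needs, when it needs it."""
    return json.loads((SCRATCH_DIR / f"{key}.json").read_text())
```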
Then when it comes to tool usage, it's just a regular
pattern. So we cannot avoid adding those
additional tokens inside of your prompt, because if you don't do
that, then your tool calls will start erroring out, so you will
(27:24):
not be able to properly retrieve structured outputs correctly,
right? Yeah, tool optimization has been
a big area of focus for us, I'll say, when we've been looking
at agentic reliability and observability: understanding
how we can better suggest opportunities to improve tool
(27:45):
usage within apps. Because it's a super common
problem, as you're alluding to here: one of the key
places agents will fail is an agent will just try to use the
wrong tool over and over, or it will get stuck in this
loop of trying to solve the problem without going back to
first principles and thinking it through.
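For reference, a minimal check for that specific failure pattern, run over a hypothetical list of trace spans rather than any particular platform's trace format, could look like this:

```python
# Sketch of spotting one obvious failure pattern: the same tool failing
# repeatedly in a row within a single agent trace. Span format is hypothetical.
from itertools import groupby


def repeated_tool_failures(spans: list[dict], threshold: int = 3) -> list[str]:
    """Return tool names that failed `threshold` or more consecutive times."""
    flagged = []
    for tool, group in groupby(spans, key=lambda s: s["tool"]):
        runs = list(group)
        if len(runs) >= threshold and all(s["status"] == "error" for s in runs):
            flagged.append(tool)
    return flagged


trace = [
    {"tool": "book_hotel", "status": "error"},
    {"tool": "book_hotel", "status": "error"},
    {"tool": "book_hotel", "status": "error"},
    {"tool": "search_flights", "status": "ok"},
]
print(repeated_tool_failures(trace))  # ['book_hotel']
```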
And so yeah, there are some really obvious failure
patterns to address there. And what kind of low-hanging
(28:08):
fruit do you have to suggest to the audience?
Yeah, well, I'll say check out Galileo.ai. Or
maybe we'll contribute a lecture in your next boot camp.
Actually, I'm happy to chat more about that.
That'd be fun. Of course, let's do it.
Yeah, we've been doing stuff around tool layer
(28:29):
optimization in the platform.
So basically looking to see if we can identify, from an agent
graph or from, you know, a project with a bunch of traces,
like, OK, what's consistently happening here?
If we aggregate these different traces together, can we identify
and basically use our inference engine on the back end to
(28:51):
suggest fixes to agents, where we can say, oh, we're seeing
this tool issue. Maybe, for example, your application is
supposed to be booking you a trip and it is looking at, you
know, Trivago and Expedia, and it's always trying to use
Trivago first even if it fails, and it's not thinking about its
(29:11):
secondary option to book a hotel.
How do we change the weights of that?
So we've been doing some of that through automation within
the platform, through what we're calling our insights engine,
where we're basically feeding the kind of evals data set
that people have established in the platform,
so their metrics, the traces they've created, their logs,
annotations they may have, into our judge, and the judge is
(29:35):
then suggesting stuff. And then you can provide human
feedback on the suggestions. So it's looking pretty good so far.
I think there's a lot more potential there.
Honestly, we're just scratching the surface.
I think our long-term goal would be something where it's creating this
automated feedback loop, where it's just like, oh yes, let me
go vibe code my app through this.
A bit of vibe code with vibe improve,
(29:56):
I guess, where it's just like, OK, great.
Here's the eval, here's the improvement.
Let me take a read really quick.
Cool, great, check. But that's
the longer-term dream, I think.
So this is almost like an automated error analysis, right?
Yeah, we're trying to automate root cause analysis for errors.
And it's not 100% solved yet by any means, but
(30:20):
we're starting to make some real strides with that.
And you'll see us start to do it through an MCP lens
too, where you can access these different catalogs of
different error types and, you know, pull it in through MCP and
just do it through the IDE too, where it's like, oh,
great, we ran the eval. Here's a suggestion, you know,
(30:41):
awesome, let me approve it. But that's all very much on our
beta-testing side of things
right now, so. Because I would say that the
observability tooling in general is great, right?
But it's not the hardest problem to solve, right?
When you're building agentic systems, the hardest problem to
(31:01):
solve is to actually create those eval data sets.
Yeah. Really, really hard.
So from what I hear, you're not only targeting the
actual improvements to the system given the eval data set,
but also somehow clustering the traces themselves.
Yeah, we've got a couple of different ways of doing it.
(31:22):
Part of it's through just like new views.
So we have like an aggregated graph view, for example, where
you can look at like multiple traces at once and kind of see,
I wish I had a good example handy right now, but you can
just see like, OK, like what paths to the agent take
throughout this, How much are they overlaying?
Where, where are their problems?And then we're trying to do that
(31:45):
in a much more automated way, as you point out.
So I'm really interested to see where it goes.
It's the stuff that gets me excited about the platform.
It's like, OK, observability is a base layer,
get that right, OK. And then try to do evals really
well. And then ideally that should
feed an improvement mechanism. And right now that
improvement mechanism is AI engineers going in and kind of
(32:07):
tuning things themselves. But OK, the more we can do to
just make it really easy for them to go, oh yes, great,
we see this, it's identified very quickly,
we can go solve it very quickly. I think that's where this whole
evaluation and observability space is going to really expand,
to driving improvement. I agree completely.
Like this is really a piece of the puzzle which is currently
(32:31):
kind of missing, because just... Nebulous? Too much?
Yeah, too much.
Too many hours are going into figuring out the eval datasets,
like 70% of the entire project sometimes.
Yeah, we're getting faster at it.
And I think the fact that we have our Luna-2 small
language models fueling some of our eval metrics...
(32:54):
I mean, the challenge with those right now, maybe not by the time
the episode comes out, but for the moment, is that we
have to fine-tune those models to get them really accurate.
But they're much cheaper and faster than if we're using an
LLM call all across the board. So you can do this much more
cheaply and much more effectively.
And I'll tell you though, Adam may have to edit this out
(33:14):
depending on when this comes out. We are
going to go live with letting anyone just fine-tune their
metrics using SLMs in the platform, which will make things go way
faster, hopefully. But this is all, again,
the edge of the platform, like, oh,
we're not quite done with it yet.
We're hopefully figuring it out. So that's the exciting stuff.
That's the fun stuff. And what is your take on all of
(33:36):
these OpenAI open source models coming out now, so you can
actually pick one up and fine-tune it?
I'm a big fan, yeah. I'm very pro open source
models, in part because
I think if we don't open source most models, we're going to have
a situation where a couple of companies are just going to
monopolize in the long run. And I think that would be really
(33:58):
negative for the broader economic picture and broader
ecosystem of software. So I think the
opportunity with smaller models to do more specialized tasks, and
for fine-tuned open source models, is huge.
And I'll say our Luna-2 models were originally based off
of Llama models that we took and redid and fine-
(34:19):
tuned. So, yeah, I think the
future idea of having a model that is cheap to run and
can run on your own hardware, and really enabling
people to have cheap, excellent inference that is
fine-tuned to their tasks, is very exciting to me,
because, while I know it's not necessarily going
to, like, solve AGI, right?
(34:40):
Like, we need much, much more reasoning, much more inference,
way more GPUs thrown at that problem.
I think it can tactically solve a lot of problems as long as it
is fed initially by these broader frontier models.
So we're spending all this money to get them, right?
Yeah. But I don't know, what's your
take? So I believe that there is a
need for fine-tuning even in agentic systems.
(35:04):
We need to fine-tune for specific routes, no
buts. But I think that these open
source models by OpenAI will be a great kind of leap forward for
all of this research. I mean, I'm even thinking of getting my
hands on one of those NVIDIA Sparks, maybe putting one on my
(35:27):
table, and playing around with open source myself.
I'll also say we've been... I'm sorry, go ahead.
Yeah. No, I'd say, so we did this, I
don't know if you saw our research about our Agent
Leaderboard. So we did an original version of
this back in February and then we just did an update recently,
basically looking at tool selection quality for different
(35:47):
LLMs across a variety of agentic
towards enterprises. So it was like, OK, here's a
like a finance scenario, here's banking, healthcare, insurance,
investments, telecoms. The idea being, OK, let's try
to actually identify, are these LLMs effective within
customer support agents that have to be very specialized for
(36:09):
these different areas. So we looked at both tool
selection quality and then action completion, with the idea
being, did they actually complete what you wanted and solve
your problem? And honestly one of the most
impressive models we looked at in the most recent round was Kimi
K2, which came out from Moonshot a few weeks ago as we're
recording this. And yeah, Qwen 2.5, the
(36:31):
72B, and then Kimi K2 both did really well in our analysis.
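As a toy illustration of those two metrics, and not how the leaderboard actually computes them, you could score a labeled set of scenarios like this (the record fields are invented for the example):

```python
# Toy version of the two benchmark ideas mentioned above: tool selection
# quality and action completion. Record fields are hypothetical.
def tool_selection_quality(records: list[dict]) -> float:
    """Share of scenarios where the agent called the expected tool."""
    hits = sum(r["called_tool"] == r["expected_tool"] for r in records)
    return hits / len(records)


def action_completion(records: list[dict]) -> float:
    """Share of scenarios where the user's goal was actually achieved."""
    return sum(r["goal_achieved"] for r in records) / len(records)


records = [
    {"called_tool": "open_claim", "expected_tool": "open_claim", "goal_achieved": True},
    {"called_tool": "close_account", "expected_tool": "freeze_card", "goal_achieved": False},
]
print(tool_selection_quality(records), action_completion(records))
```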
So I think there is a big opportunity for open source models
to feed a lot of this. And I'll say, probably by
the time this episode comes out, we'll definitely have added
these new open source OpenAI models too, because it's very
exciting to see that open source has been catching up.
And I know Gemini and Claude and GPT are going to jump
(36:54):
ahead again, but it's like, OK, let's make sure
that we're not leaving the open models behind, so to speak.
Well, I've taken us completely off track here as we talk about
agents, but it's been a ton of fun here.
I do want to ask you a bit more about how you're thinking
through the future of the space. And obviously we've been talking
about open source a bit. We've been talking about the
(37:16):
fundamental skills an AI engineer needs.
But honestly, one of the things that really made me want to have
you on the show is how good the graphics are that you make on
LinkedIn, and how you make these complex AI concepts accessible.
And I'd love to understand as you peek into the future and
think, oh, what's ahead, what are the big misunderstood ideas
(37:38):
or emerging trends that you're excited to showcase to the community
next? What are you thinking about?
So there are a few areas that I think are underrepresented,
especially in content creation, and one of them, maybe it is a step
back, but it is data engineering for AI applications.
(38:00):
So, connecting the data layer with the actual application
layer, because now there are a lot of talks about data
engineering being kind of left
behind, even though data engineers are doing most of
the work to make these systems actually run, and then there is
not much supporting content on that.
(38:21):
So I was thinking maybe I should actually step in that
direction a bit, because talking just about agentic
system designs is what everyone is doing today, right?
It's becoming really, really boring.
Yeah. OK, well, let's dive in
(38:41):
there a bit. I mean, personally, I'll say one
of the things I've thought about with data engineering is that I
almost feel like we are just recreating names for subtasks of
what data engineering does, with so much of what we've been
saying about AI for the last year. What's your take on data
engineering and what needs to be done to make sure it's getting
(39:05):
the attention it deserves and also that it's being effective?
It has always, always been a problem.
I was a data engineer in my career for four or five years, I
think, also leading data engineering teams.
And even back then, machine learning was taking center
stage and data engineering never was.
So it's either system design or machine learning, or now AI and
(39:29):
AI engineering. But I don't agree with people who are
saying that AI engineering is just data engineering.
That's not true.
Data engineering is about piping the data to where it needs to
live. And AI engineering is about
building those agentic system designs on top of the data that
(39:49):
you have. So it's definitely not the same
discipline. How to keep data engineering in
the spotlight? I don't know, to be honest.
I don't know. Like, this is a long-standing problem that we are
facing, not talking about data
engineering enough. Maybe education, just general
(40:10):
education, maybe just running boot camps about it, because
people are interested in data engineering for some
reason. It's simply not taking the
spotlight because it will never be hot.
Unfortunately, data engineering is not the money
machine that VCs are looking for.
That does leave it a little out of the spotlight.
(40:31):
It's true. You've got to find that money
machine to really get the attention you deserve.
Yeah, because the data engineers are saving costs, we
are not really producing revenue in a sense.
OK, so what do you do? You more clearly show, I guess, more graphs
showing the money saved? That might help maybe, but yeah.
But money saved is also not hot.
(40:54):
Yeah, money made is hot, right? No, it's really not hot today at
all. Looking at the burn rate of some
of these companies, it's like, what?
Yeah. All right, what other predictions do
you have about, let's call it, the next six months of AI?
I don't want to make you think too far out, because it starts to get
really blurry at that point. But as you think through, you
know, this massive wave of agentic conversations people are
(41:18):
having. And I would argue some of the
overhype that's happening on agents, because I'll say
personally, I'm kind of with Karpathy in this idea of,
like, yeah, this is the year of agents, but also it's going to
be a decade of agents. We're not solving this
tomorrow. But what are the things you're thinking about
as we head towards this next stage of AI
development? So, six months. A few months ago I
(41:39):
would have said that six months is really a very, very short
amount of time, like a lot of things could happen.
But now we are seeing the slowdown of improvement of LLMs.
So I think less and less stuff will be happening as we move
forward in this amount of time. So in general, I think that what
(42:01):
we will not see is definitely any big leaps in
AGI. I think we will also not see distributed
multi-agent systems in production yet, even though
everyone is talking about A2A and how it will change the world,
because companies will start exposing agents as services. Not
(42:24):
in six months. I don't believe in that.
It's too hard to build these multi-agent systems for various
reasons, one of them being just regular observability.
It's really hard to instrument a distributed system,
especially when there are long-running agentic systems behind
(42:45):
those distributed services. So what I'm really looking
forward to is coding agents, coding CLI agents, improving,
because they are already quite good, and I mean the
agents that do not require writing any line of code.
So yeah, exactly, because they are already quite good.
(43:07):
So I'm really looking forward to how this develops.
And I had a chat with a very brilliant engineer a few days
ago about this idea of writing specifications and allowing your
agents to write your code. And then, you know, throwing,
really making micro services ideas come to life where code is
(43:30):
useless, right? You can throw it away and you
can just rebuild your entire service with the next iteration.
So I'm looking forward on the next iteration of these coding
agents because I think there's something here and it will
definitely change software engineering as it is.
Yeah. But the industry now, I think, will start moving slower.
(43:51):
So six months is not that fun of a time frame.
Do you think we need to basically take a new leap in
scaling, as far as, like, massively more amounts of energy, tons
more GPUs, to kind of take the next step?
Or do you think this is an inherent challenge with AI
hardware today and there's a need for a non-transistor-based
architecture, or a new architecture?
(44:12):
What's your kind of take on what's going to get us past
this? Maybe it's a little bit of a wall
we're hitting. So we need to take one of two
sides, right. Either you believe that LLMs
will allow us to kind of move forward through this barrier
that we are facing, or you take the side that you will
need a different kind of architecture on the model
(44:34):
side, which will not be LLM-based.
So I'm rather on the second one.
I think LLMs will not bring us to AGI.
So it might not even be a hardware problem, it might be the actual
model problem, the model architecture problem.
Does it mean that this new breakthrough model architecture
(44:54):
that we will find will require as much compute as we
are currently building for? I don't know.
But I think that we would need this kind of computing power for
inference anyway, even with these kinds of
models that we currently have, so I don't think...
That's a good question. I think I agree.
(45:18):
I think I would name three
challenges. So one, I mentioned energy.
I think we're going to need simply a lot more chips.
I think we're going to see a barrier in the next year or two
where we just realize, hey, we need more nuclear reactors, We
need more, you know, energy sources here.
(45:38):
Like we simply can't build the amount of data centers we want
to and have the type of power grid we want to with our current
setup. So I think there's massive
investment needed there. We're starting to see that with,
you know, Microsoft, for example, investing in reopening
the recently shuttered Three Mile Island reactor.
We're seeing folks talking about small nuclear.
(45:58):
We're seeing people talking about investing in natural gas
in different areas. We're seeing solar and green energy brought
up. But I think that's a limiting
factor we have to consider, and, outside of the hyperscalers,
I don't see it talked about as much as maybe it should
be. Maybe it's because it's not
really our problem if we're not at a hyperscaler or one of the folks
building the data centers. But we should be cognizant
(46:19):
of that. And that obviously aligns to the
secondary problem of, like, yeah, we are still going to
need more inference. We are still going to need more
chips. We're going to
need, you know, more data centers.
And I definitely think that's an inherent challenge today with
LLMs, but I've always been a skeptic about this idea of
getting to AGI just by brute forcing with LLMs.
(46:41):
And I look forward to being proven wrong.
We'll see. But I agree with you, I think
we're going to need to fundamentally make some change
to the architecture. The current LLM movement is
incredibly useful. SLMs are very useful.
We've made massive advances. But, I mean,
if we look at the basics of it, we're not truly creating
(47:03):
thinking machines the way that I think it has been billed at times.
We're creating machines that are doing fantastic things
for prediction and have incredible memories and data
sets and do unique things. But mostly they're predicting
what should go next, versus, I think, creating something new.
(47:24):
And so it just feels to me like to make a true breakthrough here
and have a fundamentally different paradigm will take
just that: a new way of exploring the problem.
We're going to create a lot of business value.
We're going to create really interesting systems out of this.
We can solve a lot of problems, but I don't know that we're
(47:46):
going to truly redefine how thinking works.
And that to me feels like it is a different step.
So I guess it's also a question of how do you define AGI.
Like, do I think an LLM, GPT-6 or Gemini 6 or whatever,
could become a strong enough knowledge worker that it can
(48:06):
just solve most business problems that are kind of
inherent today? Sure.
I think that's possible in the current architecture and
I think that's a huge amount of value and worth shooting for.
But I don't think it's going to create a new model of physics.
I guess that's how I put it. By the way, there's one
more area that I kind of think could work, but I forgot
(48:31):
to mention. So, self-improving agents, right?
So agents still kind of seem like they are the way to go, at
least short term, with the LLMs that we do have, right?
How do we make the agent rewrite its own code?
And I think this is where we also will need a lot of
inference and this is where a lot of the nuclear reactors
(48:54):
should be going. But how do we make an agent that
can write new agents, like evolutionary
algorithms do, right? Yeah, that's one
potential. But I think this will be an area
of research, active research in the next few years, at least two
years, maybe not six months, a little bit later, but definitely
(49:17):
looking forward to this one. Yeah, I think it's really easy,
because there are so many exciting things happening in the space,
particularly the last couple of years, to expect that, oh, we're going to
solve this immediately, it's going to be solved
tomorrow. And in some cases I've been
surprised by the problems that are being solved.
But agent swarms, and again, maybe I'll be proven wrong on
this by the time this episode comes out.
(49:37):
It feels like truly having self-improving, self-growing agent
swarms that are very successful at solving an actual
enterprise's problems, not doing a cool demo, is
going to take a bit of time still to get right, because there
are these inherent challenges that we talked about.
Like, yes, we can vibe code our way to a demo on this.
We can quickly, you know, pull something together, but actually
solving a business problem in a way that makes money is a
(49:59):
different challenge entirely. So Aurimas, I just want to say
thank you so much for joining me today.
It's been such a fun conversation and I really
appreciated you kind of bringing us through your thought process
and talking about so many of the things you're thinking about.
Where can listeners go to in order to follow you and learn
more about all the great stuff you're creating and thinking
(50:19):
about? So you can find me on LinkedIn,
you can find me on X and you canalso find me on my newsletter
which is newsletter.swirlai.com And very soon I will be start
also starting posting out YouTube videos.
So you can also start checking my YouTube channel, which is
(50:40):
still empty. It's already been there for more than
a year, and I think it's still empty.
So in the upcoming month, for sure, there will be some
fresh videos coming in. Fantastic.
Well, I'm excited to watch those, and I highly recommend to all of
our listeners: definitely follow Aurimas on
whatever platforms you're active on.
(51:01):
His work is deeply valuable and his thought process is, I
think, excellent. And while you're at it, make
sure you're subscribed to the Chain of Thought podcast on
whatever platform you're interested in, whether
that's LinkedIn, where you can follow Galileo or the Chain of Thought
podcast, or Spotify. If you're listening on
Apple Podcasts, we love a rating or review; it makes a huge
difference. If you're on YouTube right now,
(51:22):
you can see Aurimas's and my smiling faces as we discuss the
future of AGI and everything else.
And I just wanted to say thank you so much, everyone, for listening.
And Aurimas, thank you so much for joining us today.
It's been an absolute pleasure. Thank you for having me.