
May 28, 2025 44 mins

AI in 2025 promises intelligent action, not just smarter chat. But are enterprises prepared for the agentic shift and the complex reliability hurdles it brings?

Join Conor Bronsdon on Chain of Thought with fellow co-hosts and Galileo co-founders, Vikram Chatterji (CEO) and Atindriyo Sanyal (CTO), as they explore this pivotal transformation. They discuss how generative AI is evolving from a simple tool into a powerful engine for enterprise task automation, a significant advance driving the pursuit of substantial ROI. This shift is also fueling what Vikram observes as a "gold rush" for middleware and frameworks, alongside healthy skepticism about making widespread agentic task completion a practical reality.

As these AI systems grow into highly complex, compound structures—often incorporating multimodal inputs and multi-agent designs—Vikram and Atin address the critical challenges around debugging, achieving reliability, and solving the profound measurement problem. They share Galileo's vision for an AI reliability platform designed to tame these intricate systems through robust guardrailing, advanced metric engines like Luna, and actionable developer insights. Tune in to understand how the industry is moving beyond point-in-time evaluations to continuous AI reliability, crucial for building trustworthy, high-performing AI applications at scale.


Chapters

00:00 Welcome and Introductions

01:05 Generative AI and Task Completion

02:13 Middleware and Orchestration Systems

03:17 Enterprise Adoption and Challenges

05:55 Multimodal AI and Future Plans

08:37 AI Reliability and Evaluation

11:08 Complex AI Systems and Developer Challenges

13:45 Galileo's Vision and Product Roadmap

18:59 Modern AI Evaluation Agents

20:10 Galileo's Powerful SDK and Tools

21:24 The Importance of Observability and Robust Testing

22:27 The Rise of Vibe Coding

24:48 Balancing Creativity and Reliability in AI

31:26 Enterprise Adoption of AI Systems

36:59 Challenges and Opportunities in Regulated Industries

42:10 Future of AI Reliability and Industry Impact


Follow the hosts

Follow Atin

Follow Conor

Follow Vikram

Follow Yash


Follow Today's Guest(s)

Website: galileo.ai

Read: Galileo Optimizes Enterprise-Scale Agentic AI Stack with NVIDIA


Check out Galileo

Try Galileo

Agent Leaderboard


Episode Transcript

Available transcripts are automatically generated. Complete accuracy is not guaranteed.
(00:06):
I'm delighted to welcome you back to Chain of Thought, and to welcome two of my co-hosts back as well. We've got the co-founders of Galileo, well, two out of three today: Vikram Chatterji, CEO, and Atindriyo Sanyal, CTO. Gentlemen, thank you so much for being here with us today and for sharing your insights with our listeners.
Of course. Great to be here, Conor.
Yeah, likewise. And I think it's a particularly

(00:28):
poignant time to have you both here, because not only is there a lot going on for Galileo on the product transformation side, as evaluation and reliability have become key themes for AI in 2025, but we're now a few months into the year. We're recording this right at the end of April, and the pace of AI innovation hasn't slowed. In fact, maybe to no one's

(00:50):
surprise, it's sped up quite a bit. Every week brings new models, new techniques, new conversations about where this technology is heading. Vikram, let's start with you. What key themes are you paying attention to thus far in 2025, and what are you seeing for the next few months?
I think the big theme for 2025 has been the move from

(01:10):
generative AI being this tool that was used for just, you know, chat completion and generation, towards actual task completion and actions. And in my head that means a lot for the enterprise, because it elevates the conversation from being all about just building a bunch of chatbots to actually automating tasks in the enterprise,

(01:34):
or actually getting higher ROI overall from an OpEx and CapEx perspective. And basically I am talking about agents. It's an overblown term to some extent, and we can talk more about that, but the idea is that these language models are basically becoming like operating systems that are powering these actions, and they're being adopted for this use case rapidly. So number one, that's brought in a lot of excitement and

(01:55):
interest, but that also, in the beginning of the year, brought in a lot of skepticism around, like, how real is this? Can these tasks actually be completed by these AI agents? What does this even mean? You're seeing a lot of definitions flying around by, you know, all sorts of founders, like "this is my definition, this is my definition." But at the end of the day, it's basically task completion. What's really interesting as a secondary theme is, because of this huge emphasis on

(02:19):
moving towards task completion, just given the amount of value that can be uncovered for the enterprise using this over the course of the next decade, there's a huge gold rush from the middleware providers around what can be built: the right orchestration systems for this, the right kind of frameworks for this. You're seeing this huge rush where, you know, OpenAI came out with its Agents SDK.

(02:42):
You have Anthropic, which came out with MCP for working with tools and stuff, and you have, you know, A2A from Google recently. Some of them don't mean anything; they're actually just shallow libraries. But from a marketing perspective, everyone just wants their stamp out there, like, you know, "we have something" from the middleware perspective. But what this means, in my

(03:03):
view, is that it's all coming together. Everyone's working really hard, the industry is working very hard, to make something happen. It means that the entire ecosystem is coming together to make the top-level applications around task completion become more of a reality. That being said, our observation has been that it's not there yet, but we're seeing this in fits and starts.

(03:24):
Some startups have already managed to do this in a really effective way. And some enterprises, some of the largest banks I'm talking to, they've also managed to actually build out, you know, these AI applications powered by different kinds of tools. They build their own internal tools, they build their own MCP servers around that. It's becoming more of a reality.

(03:44):
And that to me is a big, big shift, which by the end of the year is going to become much more like how we talk about maybe RAG-based systems or prompt solutions now, which are very much the norm. This is the next level up in that journey. And I think this is the biggest

(04:05):
shift and biggest unlock from an AI value creation perspective. Atin, what's your perspective?
Yeah, I kind of echo what Vikram is saying, for sure. I think there's a lot of, you know, just overall high energy and excitement around emergent infrastructure around the LLM, which was kind of the center of the solar system. But we've moved far beyond

(04:27):
that to, to Vikram's point, around actually doing things, doing tasks, taking action. But what's most exciting, I think, for me is that we kind of have a standard of data exchange and communication being set up, with the advancements in MCP, MCP servers, etcetera. They are this attempt to sort of

(04:49):
standardize communication around LLMs. And we've already seen very early success with certain frameworks. There are already a lot of MCP servers out there which are open source, and they host a whole bunch of these tools which are augmenting the LLM to actually take action. But then the question becomes:

(05:09):
what is the accuracy of the action, and did the action actually achieve what we intended to do or not? So a lot of great advancements made just in the last four months, agentic frameworks, of course, being the biggest of them, but also, you know, a lot of multimodal launches with GPT-4V and DeepSeek-VL.

(05:32):
That was a paper they wrote in 2024, but they recently launched it a couple of months ago. So all this will kind of come together to just give a more enhanced sort of experience around language models in general, but also kind of set the standard for how to build good, robust, reliable generative AI applications in a more centralized manner.

(05:56):
And multimodal evals are increasingly a topic of conversation as well, which is obviously something we are working on internally here. I'd love to get thoughts from both of you around the direction that Galileo is planning to go in, adapting to and leveraging some of these new frameworks, whether it's A2A, AGNTCY, or ACP, which we're obviously partners with

(06:20):
Cisco and others on, or MCP. And I know you've been experimenting there; you're already talking to the team about it. And then obviously we have plans to launch more on the multimodal front coming soon. What are the key things that people should be looking out for, as far as Galileo's products and the vision that the two of you have for it?
Yeah, maybe I can take a first

(06:42):
stab at this. So some multimodal will happen. I think there's enough proof of just very high quality agentic frameworks already being released that support multimodal. These models have a significantly better understanding of multimodal data, and we are already seeing, even in the enterprise, we are seeing

(07:05):
image-based Q&A systems sort of pop up in the prototypical phase. There will be a lot of acceleration on this front in the remainder of the year. What it means for evals, I think it means a bunch of things. Firstly, how do you adapt to this new landscape of how people are building end-to-end applications? That includes

(07:29):
the IDE itself in which you are actually sitting and building the app, right? Already, in the zero-to-one phase, there's a lot of potential for errors and performance issues, and, you know, we'll probably talk about vibe coding in a bit, but there's a whole host of compounding issues that lead to the proclivity of errors in

(07:54):
GenAI apps right from when the developer is building. So from an eval standpoint, it's all about how do you adapt to this new pattern of development, going beyond, say, standardized logging and getting insights on a UI. That's why I personally talk a lot about evaluation tools and evaluation agents.

(08:16):
These are these composable sorts of functions, if you will, that sit right where you are. You can install, you know, an eval agent as a tool in your IDE, in Cursor, that will automatically fix the potential issues for you. So eval kind of expands itself into this broader idea of AI

(08:37):
reliability, because in the erstwhile world you talked about evaluations and observability, and in the era of agents, there are so many new components and you truly see end-to-end systems being built, so evals and observability sound like a point-in-time activity. What you really need is a sort of broader reliability story, where these metrics and

(08:59):
these little evals are kind of a means to an end; in the end, you want high quality end-to-end applications built. But the form factor it takes would be in the form of things like tools, things like agents, where you have a separate MCP server, for example, that only does evals, which you can point your app to, and it will automatically start fixing and

(09:21):
taking action on your application.
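To make the eval-server pattern Atin describes a bit more concrete, here is a minimal sketch of an MCP server that exposes a single evaluation tool. It assumes the FastMCP helper from the open-source MCP Python SDK, and the word-overlap "groundedness" heuristic is a toy placeholder for illustration, not a real Galileo or Luna metric.

```python
# Illustrative only: a tiny MCP server exposing one "eval" tool, in the spirit
# of the eval-server pattern described above. Assumes the FastMCP helper from
# the MCP Python SDK (`pip install mcp`); the metric is a toy word-overlap
# heuristic, not a production groundedness model.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("eval-agent")

@mcp.tool()
def evaluate_groundedness(response: str, context: str) -> dict:
    """Score how much of the response is supported by the retrieved context."""
    response_words = set(response.lower().split())
    context_words = set(context.lower().split())
    overlap = len(response_words & context_words) / max(len(response_words), 1)
    return {
        "groundedness": round(overlap, 3),
        "flagged": overlap < 0.5,  # crude stand-in for a hallucination flag
        "suggestion": "Add citations from context" if overlap < 0.5 else "Looks grounded",
    }

if __name__ == "__main__":
    # An agent or IDE configured with this server can call the tool on every
    # generation and act on the returned flag.
    mcp.run()
```

The point of the sketch is the shape, not the metric: any app or coding agent pointed at such a server could score its own outputs and react, which is the "fix and take action" loop described above.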
Absolutely agreed. And we're seeing this increasingly as these new roles begin to be established at different enterprises, whether it's partners we work with like HP and Databricks or elsewhere. We're seeing people who are focused on evals, who are focused on reliability; maybe we're even going to see AI reliability engineers. We're

(09:42):
seeing AI PMs come to the forefront. And while observability and evaluation are crucial to how they're getting their job done, really what we're delivering with that is helping them to create effective applications, create continuous learning and continuous iteration loops, ideally ones that can be fed through automated systems, and have reliable applications that users can trust.

(10:05):
So I think it's a great point, Atin. And Vikram, I know that's something you're hearing a lot from the enterprises that you're talking with as well.
Yeah, because if you think about it, at the end of the day, what people care about, what AI leaders care about, what AI engineers care about, is that they just want to be able to build out AI applications, whether

(10:25):
agentic or otherwise, at scale in a very reliable way. To your question before about multimodal: whether it's speech, video, image, or even just basic text. And what's happening right now is that the systems, these are, to use the word from Databricks, right, the compound AI systems that are being developed, these compound AI systems are getting more and more complex.

(10:47):
Matei from Databricks had coined that phrase of compound AI systems maybe about a year and a half ago. And that was when we were in the world of prompt engineering, and we just had a model and a prompt, and RAG came about, and he's like, this is a compound system now. But you know, if he looks back at this right now, it's compounded even more. So the systems are becoming more and more complex. And the reason I bring this up

(11:09):
is because from a developer's perspective, or from an AI leader's perspective in the enterprise, they're looking at this compound system and thinking of it as comprising a bunch of different things all in one, right? Where you have a multimodal model there, which is doing some kind of image-to-text generation, looking at a PDF. And the next one is going to be taking that text to summarize

(11:33):
it. And then there's a third and a fourth one. So there's a series of different kinds of steps that have to be taken, and there might be some multimodality in there, or there might not be; there might be some RAG in there, or there might not be; there'll be multi-turn applications from a chat perspective, or there might not be. So the systems are extremely complex. Or, to your point about multi-agentic systems and how do we look into those?

(11:55):
I see that as, you know, one aspect of this entire compound system. Which is why, from a developer's perspective and from a leader's perspective, at the end of the day, they care about how do you build these reliable, super complex compound AI systems now, which include all of these different pieces, which are very hard to debug, very hard to

(12:17):
observe, very hard to understand. And there are maybe 40 different failure modes here. One small aspect of that is evals. One aspect of that is real-time experimentation, real-time monitoring. One aspect is, can I have real-time protections? Can I have complete cost transparency? Am I going to have a governance engine in place here? So there's a bunch of different

(12:39):
stuff here which comes into place, which is why the overarching narrative from an enterprise perspective is always, what's the AI reliability platform I can use, which can be a partner for the long term for these super complex compound systems that my team is building, and make sure that this works at scale? That's where Galileo sits, and that's where we've been helping some of the largest enterprises in the world, dedicated to

(13:01):
that.
Absolutely. And it seems clear that as enterprises create multi-agent systems, compound systems across different multimodal use cases, there's this increasing need, as you've mentioned and as Atin's highlighted, for not just evals as a point-in-time solution, but as a mechanism that works throughout the AI development

(13:25):
life cycle, whereby you are causing improvement flywheels to occur and you're creating systems that work better than they did three months before, four months before. And I love that product vision for the company as this reliability platform for AI engineers, AI PMs, AI builders, whatever their names may be.

(13:45):
Are there particular parts of that product vision that you would want to highlight, that you expect to see come true for Galileo over the next six to nine months?
Yeah, I can take a quick stab at this first. There are some core beliefs we have that are just based on what we're seeing in the enterprise and across our customers, and those

(14:06):
beliefs typically translate into products eventually. One of them is this notion of, you know, just overall AI reliability in production. And that aspect pertains more to the enterprises that are already in production, that require real-time, scalable guardrailing. They require a way such that, in regulated industries

(14:28):
especially, every single query that's coming into the system, as well as every response that's coming out, and everything in between, needs to be guardrailed based on specific kinds of guardrails that are built for their specific use case, and to do that at millisecond latencies and at low cost. So there's definitely that aspect, which is one core belief that we have there.
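As an illustration of that guardrailing pattern, and only as a sketch, the snippet below wraps a model call with input and output checks and reports the added latency. The PII regex, the topic blocklist, and the call_llm parameter are all assumed placeholders; real deployments would use use-case-specific, low-latency guardrail models instead.

```python
# A minimal sketch of request/response guardrailing around an LLM call.
# The rules here (a toy PII regex and a topic blocklist) are placeholders;
# `call_llm` stands in for whatever model client the application uses.
import re
import time

SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")   # toy PII rule
BLOCKED_TOPICS = ("wire transfer override", "disable compliance")

def guard_input(prompt: str) -> str | None:
    if SSN_PATTERN.search(prompt):
        return "input_contains_pii"
    if any(topic in prompt.lower() for topic in BLOCKED_TOPICS):
        return "input_blocked_topic"
    return None

def guard_output(response: str) -> str | None:
    if SSN_PATTERN.search(response):
        return "output_leaks_pii"
    return None

def guarded_completion(prompt: str, call_llm) -> dict:
    start = time.perf_counter()
    if (reason := guard_input(prompt)):
        return {"blocked": True, "reason": reason,
                "latency_ms": (time.perf_counter() - start) * 1e3}
    response = call_llm(prompt)
    if (reason := guard_output(response)):
        return {"blocked": True, "reason": reason,
                "response": "[redacted by guardrail]",
                "latency_ms": (time.perf_counter() - start) * 1e3}
    return {"blocked": False, "response": response,
            "latency_ms": (time.perf_counter() - start) * 1e3}
```

The design point is that every query and every response passes through the same cheap checks, so the latency budget stays predictable even at millions of requests.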

(14:48):
You know, AI reliability in production is going to be a really, really big thing, and it's going to become harder and harder as a problem for enterprises to solve as you go along. So that's one big aspect of things. The other thing that we've been thinking a lot more about, from a belief perspective, is that there is a big measurement problem in the world of AI. We've always thought about that as our North Star since the very beginning of the company.

(15:12):
And I mention this every single time I talk to an enterprise: there is no F1 score like we had in the NLP world for classic classification tasks, and even the F1 score wasn't a very good metric. Now, with generative AI, all of that is gone. So how do you actually build a high quality system if you can't measure it properly? So a very large part of what

(15:33):
we'll be building out is going to revolve around tripling down on our Luna metrics engine that we've built out, which is basically the factory where you can create metrics, you can test metrics, you can make them low latency very easily, and make sure that they're high quality at scale. It auto-adapts. So there's a huge emphasis around just R&D and algorithms and infrastructure on the

(15:55):
metrics side. And all of that collectively is our Luna piece, just to solve this measurement problem of AI, right? And all of that is going to power, to your point, the SDLC of the AI application cycle, all the way from offline experimentation to CI/CD to online to real time. All of that is powered by this core platform. That's the core of what we're trying to solve: solving the

(16:16):
measurement problem at scale. That's the second belief that we have.
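This is not Luna's actual interface, just a rough sketch of the "metric factory" shape being described: metrics as plain, registerable functions that can be run over a batch of records while their latency is measured alongside their scores. The context_adherence heuristic and the record fields are assumptions for illustration.

```python
# Sketch of a pluggable metric registry: each metric is a plain function,
# and the runner reports both score quality and per-record latency.
import statistics
import time
from typing import Callable

Metric = Callable[[dict], float]      # a metric maps one trace record to a score
METRICS: dict[str, Metric] = {}

def register_metric(name: str):
    def wrap(fn: Metric) -> Metric:
        METRICS[name] = fn
        return fn
    return wrap

@register_metric("context_adherence")
def context_adherence(record: dict) -> float:
    """Toy proxy: fraction of answer tokens that appear in the retrieved context."""
    answer = set(record["answer"].lower().split())
    context = set(record["context"].lower().split())
    return len(answer & context) / max(len(answer), 1)

def run_metrics(records: list[dict]) -> dict[str, dict]:
    report = {}
    for name, fn in METRICS.items():
        start = time.perf_counter()
        scores = [fn(r) for r in records]
        report[name] = {
            "mean_score": round(statistics.mean(scores), 3),
            "avg_latency_us": round((time.perf_counter() - start) / max(len(records), 1) * 1e6, 1),
        }
    return report
```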
The third belief that we have is that these complex AI systems need to be tamed now. What does that mean? If you look at most systems today, the focus of the industry is becoming more and more on the output: is the output good? Is the output bad? Is the system fine? Is it hallucinating? What's happening now is that, because these systems are getting so

(16:38):
complex and compounded, it's also becoming very important to help the developer understand the shape of their system and understand where the failure modes are. If you think of a multi-agentic system where you have, like, hundreds of different cases and dozens of potential tools to be called, no one's going to go trace by trace by trace and try to look for where things are going wrong. It's just not going to happen. It's a fool's errand

(17:00):
to look for that. So the question becomes, can you start to build algorithms and build systems such that you can tame these very large complex systems? And we're working on a lot of very interesting solutions there, from an algorithmic perspective, to just automatically take in these millions of traces, millions of different input signals, all the way from the query to the output

(17:20):
generation, etcetera, to automatically give the developer an overarching, you know, bird's-eye signal of exactly where the failure modes are. And then take action on those signals and actually tell the developer what the actions should be, just to truly up-level the nature of their development life cycle. And I think that's very necessary with multi-agentic systems, because if you think of it that way, and I'll pause right after this, but if you

(17:42):
think of it this way: agents are like workers in a room, right? You have the managers, you have the workers, and now they're all going and doing things, and there are these tools that they can use. But you know, if you ask somebody, where are things going wrong, it's not going to be that you stop a worker and say, show me everything you've done over time, and now let me figure this out. You need an overall summary and

(18:04):
some kind of a metrics dashboard, an overall summarization dashboard of here's what's going wrong quantitatively, here's what's going wrong qualitatively. And that's the only way you can then start a root cause analysis. I think all of that is going to be necessary to tame the complex system.
You're saying we need a Grafana for agents? Because that's kind of what I'm hearing here.
Hopefully Grafana plus plus plus.
Yeah. Sure.
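As a back-of-the-envelope illustration of the bird's-eye rollup Vikram describes, the sketch below groups trace spans by the node where they failed and surfaces the worst offenders, keeping a few sample trace IDs for root-cause analysis. The trace-record fields (spans, node, error, score, trace_id) are assumed for illustration, not a real schema.

```python
# Instead of reading traces one by one, roll failures up by the agent, tool,
# or retriever node where they occurred and rank the worst offenders first.
from collections import Counter, defaultdict

def summarize_failures(traces: list[dict]) -> list[dict]:
    failures_by_node: Counter = Counter()
    totals_by_node: Counter = Counter()
    examples: dict[str, list[str]] = defaultdict(list)

    for trace in traces:
        for span in trace["spans"]:                  # one span per agent/tool step
            totals_by_node[span["node"]] += 1
            if span.get("error") or span.get("score", 1.0) < 0.5:
                failures_by_node[span["node"]] += 1
                if len(examples[span["node"]]) < 3:  # keep a few samples for RCA
                    examples[span["node"]].append(trace["trace_id"])

    return sorted(
        (
            {
                "node": node,
                "failures": failures_by_node[node],
                "failure_rate": round(failures_by_node[node] / totals_by_node[node], 3),
                "sample_traces": examples[node],
            }
            for node in failures_by_node
        ),
        key=lambda row: row["failures"],
        reverse=True,
    )
```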

(18:25):
I had a couple more points to add to what Vikram was saying. Certainly the measurement problem: I think we discovered that right at the onset of the company, which was years ago, which is, you know, in general AI has a measurement problem. The other thing that we realized was that there is no one-stop-shop metric that fixes all your pains. And this was a realization we made early on in our journey,

(18:48):
which is why we've gone down the route of baking things like auto-adaptation into platforms like Luna that we've built. And in the modern world, it's kind of taken the form of evaluation agents, really, because they're adapting, which means they're doing the task of,

(19:08):
you know, getting better, figuring out mistakes on their own, and adapting to your data. So that's one aspect of it, the adaptability, which makes them agentic in nature. The second aspect, to the point about taming a complex system: it is true that there are a million different parts to an end-to-end GenAI app, but it's also true that it

(19:30):
can be built in a million different ways. There's no limited set of 12 ways that you can build an end-to-end app, especially with the advent of A2A and whichever agentic framework you take. What they're allowing the user to do is giving the power back to the users. It's an inversion of control. So now the developer has a lot

(19:52):
of composability power to be able to construct a system the way they want. What does that mean for evals, though? For us, it's not only to adapt our metrics to, you know, be accurate over time and constantly be accurate. That's one aspect of it. But also, just to point to one feature of Galileo: we have a very powerful SDK

(20:16):
which not only takes the shape and form of Python libraries or TypeScript libraries, and of course APIs, but also tools and agents which can fit into the modern workflow of, say, if you're building in Cursor or any kind of IDE which gives you the Copilot-like experience, which is automatically fixing your code. How do we take all the

(20:39):
goodness of things like Luna, etcetera, and shape it in a form which is composable, which meets the user where they are? And it takes action at every stage of the life cycle: the zero-to-one stage, where they're building the app. We want them to not make the early-level mistakes, so you want to use our agents there. Then there's the CI/CD bit, where your application is moving from

(21:02):
dev to staging to prod. And then there's prod, where you want more real-time guardrailing and observability. But it's the same sort of ingredients behind the scenes, which are all part of this one powerful platform. And that's what Galileo gives the user.
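To show the "same ingredients across dev, CI/CD, and prod" idea in code, here is a generic sketch, explicitly not the actual Galileo SDK, of a decorator that records a trace and attaches placeholder eval scores to every model call; only the stage label changes between environments, while the logging and scoring hook stays identical.

```python
# Generic sketch (not a real SDK): one decorator that traces each LLM call and
# attaches eval scores, so the same hook can run in dev, CI, and prod.
import functools
import time
import uuid

TRACE_SINK: list[dict] = []   # stand-in for an exporter to an observability backend

def score_response(prompt: str, response: str) -> dict:
    # placeholder metric; a real system would call a metric engine here
    return {"non_empty": float(len(response) > 0)}

def traced(stage: str):
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(prompt: str, *args, **kwargs):
            start = time.perf_counter()
            response = fn(prompt, *args, **kwargs)
            TRACE_SINK.append({
                "trace_id": str(uuid.uuid4()),
                "stage": stage,            # "dev", "ci", or "prod"
                "prompt": prompt,
                "response": response,
                "latency_ms": (time.perf_counter() - start) * 1e3,
                "scores": score_response(prompt, response),
            })
            return response
        return wrapper
    return decorate

@traced(stage="dev")
def answer(prompt: str) -> str:
    return "stubbed model output for: " + prompt   # swap in a real model call
```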
In an era of vibe coding and increasing LLM generation of

(21:23):
software, it feels like this need for observability, this need for robust testing and evaluation, is more important than ever to ensure reliability and to ensure trust in applications, particularly as we're building these multi-agentic systems. I have to say, I'm really excited by the fact that people can also now try the platform

(21:45):
for free and can have enterprise-grade evals, can test it out, can give us feedback over at galileo.ai. I think there's a massive opportunity for us to help developers increase the reliability of their applications while also getting the feedback we need to continue to hone in on how we solve this measurement problem. I wonder if either of you is

(22:08):
seeing, and maybe Atin, if you want to start, any sort of increase in the need for these reliability systems, whether it's the observation piece, the evaluation piece, or otherwise, based off of the shift to LLM-driven, or at least assisted, code gen.
What vibe coding has done is

(22:29):
it's made developers 10x faster at getting to a particular outcome. You know, it would take you a few weeks to build something that you can now build in an hour. So props to, you know, all the innovation that has happened toward it. Even though vibe coding has a bit of a negative connotation, it is true that if you completely blindly vibe code,

(22:50):
you will end up with a pile of crap which is not deployable. It will be a suboptimal application.
Please, nobody here look at my GitHub.
Yeah, I mean, same here. So it's great for prototyping, but there are certain standard, thoughtful design patterns. I always say that we've kind of

(23:12):
gone back to the era of 2019, or even 2009, or even 50 years ago, where we discovered there's a limited number of ways you can build very high quality applications. And the same sort of design patterns are re-emerging in this new era with the LLM in the mix. So vibe coding allows you to be efficient at the lowest layer of coding, which is just, you know,

(23:34):
brass-tacks writing of boilerplate code. And any developer worth their salt will probably tell you that, over time, people have discovered ways to efficiently vibe code, or thoughtfully vibe code, by prompting the LLM back and making sure it doesn't make the same mistake, etcetera, etcetera. Even then, there's only a limit

(23:56):
to how much you can scale a very high quality application with vibe coding, which really accentuates the need for evaluation and reliability in general. And that's where we kind of come in. I think for us it becomes more important to, like I've been saying, adapt to the new sort of world we are in, which

(24:19):
is, developers are using copilots. We need to be, you know, in line with them and meet them where they are. But the fundamental problem, like Vikram said earlier, is still the same whether you ask a developer or a leader. Which is: I just want to build a high quality application that is deterministic, which is reliable, which gives me outputs that I can expect, and an

(24:41):
efficient way to root-cause it and take action whenever things go rogue.
How should developers or technical leaders who are building AI systems think about balancing seeking reliability while also leaving in the magic that is the hallucinations and iterations that the AI is

(25:04):
creating? Because it's a fine line to walk depending on the type of application you're developing and, you know, what agents you're letting loose in the world. But at the same time, if we completely try to eliminate this creativity, we are also getting rid of some of the secret sauce behind AI. So Vikram, I'm curious, from your perspective, how are you seeing enterprises that we're working

(25:27):
with adapt to this problem and solve for reliability while also keeping the upside of leveraging non-deterministic systems?
If you talk about vibe coding along with the AI reliability problem, I do think they're trying to solve two different things: one is the speed of building AI systems and the

(25:49):
other one is the quality of the AI systems. And I think both of those are going to be important, and both of those are going to be continuously accelerated from an industry perspective. So I think the idea of vibe coding comes from the idea that, you know, you have to build these AI apps faster. A lot of the context for building out these AI applications, it turns out, has

(26:11):
always been there, right? The business logic side of things has always been a very big part of software development, right, in different shapes and forms. It's interesting how now the business logic is coming up even closer to, you know, the actual coding layer by using vibe coding, where things aren't incredibly technical, but, you know, for you and me, you and I can actually build out

(26:32):
apps much faster now, just using our knowledge of the world and figuring it out quickly using, I don't know, Lovable and a bunch of other tools. So that's interesting from a speed perspective. But that's separate, to Atin's point, from thinking about this as a production AI application out in the world, where you have to

(26:53):
have the right kind of guardrails in place. You have to have the right kind of safeguards in place for making sure that it's actually reliable. As an example, with agentic systems, right, it's much more about just better software engineering principles all over again, where you just want to make the right kind of tool calls. You have to have the right kind of, you know, functions and different kinds of conditional statements to make sure that you're calling the right stuff.

(27:14):
And now, increasingly, making sure that you're giving the right kind of instructions to the model and choosing the right kind of prompts. And so, you know, from a reliability perspective, it gets back into: what are the best practices that you're using over there? If it is vibe coding, then, you know, are you missing out on some of the robustness that you might need in the enterprise to actually make sure that this can, again, see the light of day as

(27:38):
an actual AI product? So I do think they are two different problems. As much as vibe coding can help in accelerating going from zero to one, that one-to-production step is still going to be something bigger, to Atin's point. You need an expert in the loop who can actually help get to the other side. But that is the idea, that you actually need AI reliability. To

(27:59):
your second part of the question, Conor, which is about non-deterministic systems: that era is very much upon us at this point. Maybe a year and a half ago you could have asked, is this like Bitcoin or Web3, which never made any sense to me as a market, still doesn't, but is AI like that? It's absolutely not. It's actually shown a bunch of value in the enterprise, and

(28:22):
it has absolutely arrived in the enterprise, and everybody's seeing a bunch of ROI, a lot of OpEx reduction, not yet a lot of increase in revenue from a use case perspective. I think that's going to change, especially with agents now. Given that non-deterministic software is already a here-and-now thing, where the puck has moved to is this question of, as an enterprise leader, should I

(28:46):
or should I not adopt AI, right? And the ones saying they'll not adopt AI are going to go the way of the rotary phone. They all know that. So given that they're all going to be adopting AI at some point, now the puck moves to this question of, how do I make sure it's reliable? And that's kind of where platforms like Galileo come in, and it's going to get more and more important for us to be able

(29:07):
to think of solving their problem in a very holistic way.
I think I'd love to give my two cents on this one, because it's an interesting question. You know, creativity versus determinism was one of the first questions that was asked when ChatGPT happened and, you know, LLMs were upon us. I think in the agentic era, that question is a bit dated.

(29:30):
It makes less sense because, number one, with the innovation on the model side, on the LLM side, they're actually much better at language. If you, you know, rewind the clock maybe 12 to 15 months, someone could easily detect whether output had come out of OpenAI versus Anthropic.

(29:52):
They were distinctly different. And now these models are a lot better; you can ask them to write in any style and, you know, they will adapt to your style. So there's a very marked improvement in the ability of these models to spit out language. But in the agentic era it's less about the language itself; it's more about the actions and the outcomes of the

(30:13):
application and what people are really trying to do. What the modern developer is building is an end-to-end system, which is primarily software, and you want to inject LLMs at the right places to be able to do the right things for you, which were not possible erstwhile. And in the end, your system is

(30:34):
compound in the sense that it has traditional software components, it has LLMs in the mix doing just the right things where you don't want hallucinations, etc. And the outcome of the end-to-end system can be language if it's a Q&A system, but it can also be something else. You have many long-running agents where the output is not, you know, a piece of text; it's

(30:54):
something else, and it's more about actions. So in this new era, I think the creativity question kind of becomes a bit myopic, in the sense that, yes, if the application, say, is a marketing writing tool, you want that element of creativity, which is actually a lot more controllable and deterministic, and you can actually rely on the LLM itself to be creative in language.

(31:16):
But the problem is entirely different for the vast majority of the more end-to-end applications, which are sort of this mix of traditional software and LLMs.
What are the specific use cases or problems that you're hearing enterprises you talk to come up with and say, hey, we need

(31:38):
help solving X? So, like, we have this broader reliability challenge, we have, you know, evaluations that we need throughout this process, but are there other specific use cases, Vikram, that you're hearing more about, or other needs that you're hearing from these enterprise customers?
If you think of the shape of an enterprise, it's typically like a 20- to 30,000-person

(32:00):
organization, or oftentimes even larger than that, in the world of retail, manufacturing, aerospace, banking, telco, right? A massive number of people, a massive number of operations to actually keep that alive and thriving. So you have sales, marketing; all of these are the size of, like, a hundred different start-ups in the Bay Area, right, each of these organizations. So a large number of their use

(32:22):
cases have been internal facing, which we, sitting over here, might poo-poo, like, that's internal facing as opposed to external facing, who gives a shit? But that's not how they're thinking of it, right? A lot of these enterprises, they've built out dozens and dozens of these AI applications and agentic systems because, number one, the bar for the need for it to be super

(32:47):
high accuracy is a little bit lower because it's internal facing, so they can experiment really rapidly. But at the same time, the flip side of that, the ROI from an OpEx perspective, is massive. So the number of these enterprises that have been building out AI-based tools and applications for their entire sales fleet on the ground, the number of them that are

(33:09):
building these out is staggering. Telcos are building out these systems such that outage detection can happen faster so that their analysts can work better. Wealth management teams at large banks are building out ways that they can generate reports faster for their analysts. Customer support is obviously a massive use case across every enterprise globally. Don't even get me started on

(33:29):
accounting, and then the finance department; it's massive. So you have dozens and dozens of these. So the question mark in the enterprises becomes: I have 100 different use cases. I want to centralize this in one singular place so that the entire platform, the shape of my entire platform, is centralized, maybe with one team. But then, now, can I get a single-pane-of-glass view of exactly

(33:50):
what the risk vectors are across the entire enterprise, wherever the models are being used? Let's say, Conor, you're sitting in accounting at a large aerospace company and you start asking questions, one of which breaks the system. Some centralized AI platform team should know about that: hey, this is a risk vector and could break the system for others. Or somehow your question is, like, generate this massive report for me, and it starts to take up

(34:11):
a huge number of tokens in the output, and now there's a huge cost spike from where you're sitting. So now they need to know about that, or they might want to guardrail the response so that you don't get the response, right? So that's kind of what we're hearing a lot more about; that internal operations piece is a massive, massive use case. And some of these organizations, especially more on the retail side, they're starting to go more external facing as well, but the

(34:34):
bar for accuracy is even higher. And that's where, you know, real-time guardrailing and real-time systems become much, much more of a need at scale, right? Every single query, out of millions and millions of queries, has to go through a bunch of different customized guardrails that they've built out, which fit their use case.
This makes total sense, because

(34:55):
we've already seen teams not only go through major code transformations and change their customer support features, but also limit the intake of tickets on customer support, and all these other options that are happening at the enterprise level. So thank you, Vikram, for highlighting several of these ways that enterprises are already seeing value and already pursuing this.

(35:17):
I think it's really interesting to consider this, because so often when we talk about AI, we talk specifically about the exciting things coming out of fast-shipping startups, without considering that for companies that have thousands of employees, tens of thousands of employees, you can aggregate such significant value from something as simple as helping

(35:38):
people find docs internally. You can save hundreds of hours a week, if not more, across the organization from some of these simple fixes that can be enabled by agents or chatbots and a variety of other workflows. And in particular, I think it's interesting, Vikram, to hear about some of the industries that you're talking about here. There are multiple highly

(35:58):
regulated industries you've mentioned here, healthcare, finance, and others, that are already seeing value, and a lot of naysayers externally, I think, were initially looking at these highly regulated industries as areas where AI wasn't going to have this transformational impact, wasn't necessarily going to be a huge driver.

(36:19):
And it seems like that's not the case. With growing needs for AI reliability engineering and AI observability engineering to correlate with things like security teams and governance platform teams internally, it very much feels like there's an opportunity for every industry to realize these benefits. And I would almost say, and I'm curious if you agree with this,

(36:40):
that companies that already have a history of successfully dealing with regulations, dealing with, you know, higher customer trust bars, are more prepared to actually take on what you need to do in order to have highly reliable AI systems and leverage them internally.
Yes, there is also a direct correlation with the

(37:03):
pre-AI era and what those industries were doing. As an example, a lot of the healthcare companies have been very slow to adopt these newer models, which are over a billion parameters. A billion parameters is small, but, you know, a lot of organizations kind of drew the line between

(37:24):
small and large model sizes at that point. Right now we're seeing the same thing with financial services as well, right? Model risk management teams used to audit different kinds of models, like the different versions of BERT models, and they used to say, this is fine. But now they're living in an era where, if they give acceptance on a very specific model, the teams

(37:47):
are going to come back the next week and say, there are five new models that you need to look at. So MRM needs to move on almost a real-time basis. So they're adapting really quickly. And so what we're seeing with financial services, with healthcare, and some other regulated industries like that is, if there are a hundred of these organizations, it's maybe five of them that are at the

(38:09):
forefront right now, because they've been able to adapt, because they already have the compute layer and the data layer figured out. And so now they just have to get out of their own way by figuring out the ops layer internally. And their MRM teams are, you know, moving really fast, and we're working directly with them to figure out what the future shape of, you know, AI adoption in those companies

(38:30):
looks like. And they're very, very open to working with us on that side. And those companies, you know, just looking at that, you know, the general, what's it called, the diffusion of innovations graph, if you will.
The early adopters versus early majority, etcetera.
Exactly. Yeah. So you have the innovators and the early adopters, and these folks are definitely

(38:51):
on the innovator side. And what happens is, like, one person on the financial services side of things, they figure out how MRM should work, they figure out how all this other stuff should work, and they see this massive amount of impact, and then that just has this domino effect across the entire enterprise. And we're seeing the same thing happening now with some healthcare businesses, where all the others are having this massive amount of FOMO around,

(39:13):
like, oh crap, they figured this out. They're seeing massive amounts of optimization efficiencies. We're going to be left behind. We used to do data science, we can do AI, we have developers here. And the same thing is happening in retail and manufacturing and aerospace. And so that's what typically takes a while. And it's exactly the same thing that happened with the cloud, right, where everyone was like, there's no way, me as an aerospace business, we're not

(39:36):
going to put all of our data on Amazon servers.
Some folks are still doing it, slowly, surely.
Yeah, exactly. And so it takes a little bit of time, but then they'll look at the next aerospace company and they're like, oh, they did it and they're moving faster now, so we've got to do this too now. And then there'll be a new CIO and they'll just do it. So it's the same thing that's happening here. It's happening a little bit faster than the cloud, but I do think there's a little bit of

(39:59):
the whole question of which industry is seeing the innovators already come about, and how much are those innovators advertising themselves, and then that just leads to this domino effect.
I love it. Thanks for these great insights, Vikram. Atin, I am curious if you have anything to add on this front.
I mean, I'll spare you, Vikram covered pretty much the entire gamut of the use cases and the industries, and

(40:20):
nothing more to add on that particular front. But I guess I would say that, yeah, the cloud analogy is pretty spot on, except that access to this technology is so much easier. To port an application to the cloud, say, even if you rewind the clock to 2015, it's a lot: you have to figure out

(40:42):
fundamental infrastructure, and there are too many impediments, say. But now, you know, there's already enough appetite for the cloud, and a lot of the fundamental technology, which is powered by the LLMs, is on the cloud. As long as you have enough compliance for that, I think the rest of it is just at your fingertips. Pretty much anyone can, you know, write two lines of code and

(41:05):
get a lot of intelligence baked into their system. So I would say that this adoption will happen 100x faster than the cloud.
It's interesting you bring that up, because I do think one of the big differences we're seeing is that inherently consumer-facing LLM products have led this wave, where, I mean, ChatGPT went

(41:29):
extremely viral and has continued to do so, regularly gaining massive market share; 5% of the world, and Sam's claiming 10 at some points, are using it with some regularity. So we have this major, you know, moment where people are simply aware of this technology; they can try it out for themselves. They can see that it may give

(41:49):
them better information than a Google search in some instances, even at the most basic use case, and there are simple use cases that individuals can solve. And so the comfort with the technology is often quite high, because people are just able to use it day-to-day if they want to. So it's going to be really interesting to see how that, and the infrastructure that's already been built out, accelerates this trend.

(42:10):
Vikram, Atin, thank you so much for this conversation. It's been fantastic getting a chance to hear what you guys are hearing from the enterprise. I'm curious if there are any customers or particular folks you want to shout out here as we wrap up, because I know people always love to hear that they're being impactful in these conversations.
There's a whole bunch of customers that we'd love to thank and talk about, but it's

(42:32):
mostly these Fortune 50 telcos and banks and others, as well as a lot of others in the fast-moving AI space. Like, there are the amazing folks at Twilio, at HP, and many, many more that we'd love to thank. So we have dozens of these folks that we're closely working with, and many more that we can talk

(42:53):
more about maybe in the next one or two months. But it's amazing to build with them, to see how the market's moving really fast, and to actually make sure that we can add a lot of value, because their systems need to be much more reliable than they are right now. Still, things still break. And when things do, you know, Galileo has to just be there to make sure that they can sleep at night knowing that

(43:14):
their AI systems are non-deterministic but reliable at the same time. That's the only way this entire industry is actually going to see not just the light of day but actually start to shine.
Well said, and we'll certainly be talking to you both a lot more in the coming weeks and months as we continue to shape the future of reliability for AI systems. Atin, Vikram, thank you so much for helping us navigate the

(43:37):
complex and fast-moving currents of AI in 2025. For all of you builders and leaders who are listening and looking to stay ahead of the curve, I highly recommend subscribing to the Galileo YouTube channel. It's full of fantastic content, including demos of our reliability platform, our observability and evaluation features, how-tos, and so much more, along with discussions

(43:57):
like this episode, which will help keep you informed and inspired. Plus, check out Atin and Vikram on LinkedIn. They post a lot of great stuff there. Atin in particular I want to shout out for constantly sharing his insights and thoughts on what's moving forward in AI, and some great papers that I certainly enjoy reading. You can also always find more info at galileo.ai/

(44:17):
blog for everything we're writing. Gents, thanks again for coming on the show today. We'll have you back again soon for another check-in on all that's happening in the world of AI.
Thanks, Conor.
Thank you, and thanks, everyone, for listening. We'll see you next week.