
June 5, 2025 39 mins

Barzan Mozafari, CEO of Keebo and former computer science professor, joins us to explore how AI is changing the way data teams work. We talk about the hidden inefficiencies in cloud data platforms like Snowflake and Databricks, how reinforcement learning can automate performance tuning, and why the tradeoff between cost and speed isn’t always what it seems. Barzan also shares thoughts on LLMs, the future of conversational analytics, and what data teams should (and shouldn’t) try to build themselves.


What's New In Data is a data thought leadership series hosted by John Kutay, who leads data and products at Striim. What's New In Data hosts industry practitioners to discuss the latest trends, common patterns for real-world data architectures, and analytics success stories.

Episode Transcript

Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Speaker 1 (00:15):
Hello everybody.
Thank you for joining today's episode of What's New in Data.
I'm super excited about our guest.
We have some awesome topics today.
We have Barzan Mozafari, CEO of Keebo.ai.
Barzan, how are you doing today?

Speaker 2 (00:27):
Doing pretty well.
Thanks so much for having me.

Speaker 1 (00:32):
Yeah, absolutely, Barzan.
You're an expert in this field.
There's so much incredible stuff going on in terms of the implementations with Snowflake, Databricks, data lakes in general, and it's growing to such a massive scale, and we're going to talk about that today.
But first I wanted you to tell the listeners a bit about yourself.

Speaker 2 (00:52):
Sure happy to.
I'm a co-founder of Keebo.ai.
Prior to that, I was an academic.
I was a professor in computer science at the University of Michigan in Ann Arbor.
Before that I was at MIT, UC Berkeley, and UCLA.
Before that, I also worked for a number of companies in this space.
I pretty much spent the last two decades of my career doing

(01:14):
research at the intersection of AI/machine learning and database systems.

Speaker 1 (01:20):
Excellent.
Yeah, so you bring an awesome technical research background, and now it's applied in this incredible product that you're building, which really solves some critical problems for data teams that are operating at scale, at sort of the intersection of data engineering and FinOps.
But I wanted to first talk about this at a high level.

(01:43):
So your product, Keebo.ai, does focus on cost optimization.
What are the biggest inefficiencies you see generally in how companies use Snowflake today, and do you see any common mistakes that lead to runaway compute costs?

Speaker 2 (02:01):
I think there are a lot of interesting sources of inefficiency.
So, just for your viewers, what Keebo.ai does is that we're a data learning platform.
Our models learn from how your own users and applications interact with your own data in the cloud, whether it's Snowflake or Databricks, and then use that to

(02:23):
automate and accelerate the tedious aspects of the interactions between the data teams and the cloud data warehouse, and that allows them to actually get more out of the platform.
So some companies use that to reduce their spend on their Snowflake bill, on their Databricks bill, et cetera.
Some of them use it to actually enhance their productivity, to

(02:43):
get more done with the money that they're spending on these powerful platforms, and sometimes people use that to actually improve performance, for example.

(03:06):
You know, I wouldn't call it necessarily mistakes that teams make.
It's more about how these platforms have created new patterns, and how some of the traditional data engineering and data pipelines are no longer as effective, right?
So, to sort of look at the bigger picture, there's a reason

(03:29):
why the likes of Snowflake and Databricks have been very successful in this market, right?
Traditionally, people had to rely on these centralized on-prem data warehouses, go to DBAs, go to a centralized data team, and everything basically had to go through that bottleneck.
But what the likes of Snowflake and others in this space have

(03:50):
done is they've really lowered the adoption barrier.
Right now, it's significantly easier for organizations of any size.
Like, you know, a startup of two people all the way to companies of 50,000 employees can leverage data.
Any team can just tap into data, and that time to insight is

(04:10):
drastically cut down.
That's all great, right?
That's the positive side of the story.
But the flip side of this is that now you've got more users tapping into more data and doing more things with it, and, because it's so easy, they're also bringing in more data, combining more data sources.

(04:31):
So when you have a situation where more data is being queried, more data sources are being combined and joined together, and then you have more users querying that, that means your modern data pipelines are significantly more complex, and it's not just about the mistakes that people make.
It's just that there's a human limit to how much you can

(04:53):
optimize a very complex pipeline.
And now add to this complexity the fact that, now that you have a lot more users querying this data, it's just a natural fact of life that not every user is a database expert, right?
Back in the day you had a BI analyst who knew exactly where to find what kind of data, how to hand-tune and hand-optimize

(05:15):
every single query.
Now you have an analyst from the supply chain department, there's someone from the marketing team, there's someone from sales ops, and they're all writing queries.
And, you know, to put it in light terms, not everyone who's writing SQL queries has a degree in computer science, right?
So some of those SQL queries have some, you know, room for

(05:38):
improvement, right?
Let's just be honest about this.
But now you've got millions of these queries hitting your store.

Speaker 1 (05:48):
It's just a matter of skill, right?
Absolutely, yeah, and I like the way you put it.
But you're right, you know, to use a marketing buzzword, Snowflake and these data platforms have democratized data.

Speaker 2 (06:02):
Exactly, well put. It's an effective democracy.

Speaker 1 (06:07):
And sure, they made it easy for large companies to sort of build this customer data platform that everyone can use across different levels of expertise.
So, yeah, you might have people who are non-technical just doing like a SELECT * and dumping it to an Excel file and filtering there.

(06:30):
It is really great, because it's increasing the efficiency and how fast companies can make decisions with data, and, you know, what types of teams can make decisions with data.
Because, like you were saying, traditionally, yeah, you would have a very trained BI analyst who's working with a very finite set of resources in an on-premise data center, and they have to write really great, efficient queries, and everything

(06:51):
has to be planned, because that capacity was built out by an IT engineer who says this is running on a 48-core box in our janitor closet, in our data center.
But now everything's in the cloud, and it's just like a utility, and anyone can use it.
It's great that you're solving this problem, and you

(07:17):
know, your product, interestingly, leverages AI to automate some of this query performance tuning.
Can you walk us through how AI is being used under the hood?

Speaker 2 (07:27):
Yeah, absolutely.
So, you know, think about the complexity of what's happening.
The way people talk about AI these days is sometimes misleading, right?
Like, I sometimes joke that you see people who are basically selling cookies pitching AI.
It's like, you don't need AI for everything.

(07:47):
There are situations where you don't need AI and AI is not applicable.
But look at the complexity of the data pipelines, the fact that you've got millions of these queries hitting your cloud data warehouse at different times, with different teams trying to glean different types of insights from your data, and some of these queries are pretty complex, right?
That's where I'd say it's humanly impossible to make the

(08:12):
optimal decisions.
It's humanly impossible to make sure that every single query is hitting the perfect, optimal warehouse.
It's humanly impossible to make sure that every single warehouse has the optimal settings of resources.
And it's not just a matter of effort, because, let's

(08:34):
say today, for example, at 9 am a medium Snowflake warehouse was optimal for my reporting workload, but two and a half hours later a medium is not enough; I need to go to a large.
Maybe after 5 pm I can go down to an X-Small.
Maybe a sudden report comes in and I need to boost it up, right, I need to bump up the size.
So, making these changes continuously, by analyzing

(08:56):
millions of statistics and calculations, that's actually the sweet spot for AI.
I sometimes call what we do at Keebo building an infinitely patient, infinitely competent DBA, and a lot of people have seen what AI can do when they go

(09:17):
to ChatGPT and ask a very complex question.
They want ChatGPT to research something and distill it down.
We're leveraging AI in a very similar way when it comes to optimizing your cloud data warehouse, right?
So we actually analyze millions of statistics over the last whichever number of days of this warehouse: which queries have

(09:38):
run on it, which users have sent what kind of requests, what has been the performance, how long each of those queries spent in queuing.
And based on those, we make optimal decisions about, okay, right now this is the optimal size for this warehouse; right now, to hit the SLAs that the users care about, here's the kind of resources we need; right now it's idle;

(09:59):
right now this is unnecessary, we can get away with a smaller compute size; right now we decide that, hey, we need more clusters, smaller clusters, larger clusters.
And then we can also have recommendations for users in terms of what are some low-hanging fruits they can do to improve their own productivity, improve

(10:20):
performance, but also drastically reduce the computational footprint and the overall cost to their company.
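A continuous right-sizing loop like the one described above can be sketched in a few lines. This is a toy illustration only: the statistics, thresholds, and scaling rule below are hypothetical assumptions for this sketch, not Keebo's actual models.

```python
from dataclasses import dataclass

# Hypothetical snapshot of recent warehouse statistics
# (field names are illustrative, not a real schema).
@dataclass
class WarehouseStats:
    p95_queue_seconds: float   # time queries spent waiting for a slot
    avg_utilization: float     # fraction of compute actually busy
    queries_per_minute: float

SIZES = ["XSMALL", "SMALL", "MEDIUM", "LARGE", "XLARGE"]

def recommend_size(current: str, stats: WarehouseStats,
                   max_queue_seconds: float = 30.0) -> str:
    """Toy right-sizing rule: scale up when queuing threatens the SLA,
    scale down when the warehouse is mostly idle."""
    i = SIZES.index(current)
    if stats.p95_queue_seconds > max_queue_seconds and i < len(SIZES) - 1:
        return SIZES[i + 1]   # queries are waiting: go bigger
    if stats.avg_utilization < 0.2 and stats.p95_queue_seconds < 1 and i > 0:
        return SIZES[i - 1]   # paying for mostly idle compute: go smaller
    return current            # current size is fine

# 9 am reporting burst: heavy queuing, so bump MEDIUM up
print(recommend_size("MEDIUM", WarehouseStats(45.0, 0.9, 120)))  # LARGE
# After 5 pm: nearly idle, so step MEDIUM down
print(recommend_size("MEDIUM", WarehouseStats(0.0, 0.05, 2)))    # SMALL
```

The real decision, as described in the episode, is learned from millions of historical statistics rather than two fixed thresholds, but the shape of the loop, observe stats, compare against goals, resize, is the same.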

Speaker 1 (10:28):
That's excellent.
Are you using the mainstream foundation models, or are you training your own models, or a mix?

Speaker 2 (10:35):
So it depends on what the user is trying to do.
What we do at Keebo is we're not replacing the data engineering team, we're basically augmenting them.
We're empowering them to get more done, right?
Our mission at Keebo is to empower data teams to basically take control and drive growth.
So the data team basically defines their own performance

(10:56):
guardrails, their own goals.
So, for example, if a team's goal is to say, you know what, this is a mission-critical workload and it needs to finish by this particular time, then they set up those performance guardrails in the system, and then we basically train our models.
To answer your question, in that situation, for example, we leverage reinforcement learning.

(11:18):
That's roughly the third step in generative AI: reinforcement learning.
You train an agent, much like how you teach your kid to, for example, eat without making a mess.
You congratulate them, you clap for them, right, and every time

(11:38):
they make a mess, you tell them, hey, you shouldn't do that, right?
It's the same way with reinforcement learning agents.
Every time that agent makes a decision, for example, that agent decides to send this query to a different warehouse, or decides to reduce the size of this warehouse, or to optimize

(11:59):
or increase the memory on a particular instance, every time it makes the right decision, we reward it.
And what's the right decision?
It depends on what the user's goal was.
If the user's goal is to save money without causing a performance slowdown, then whenever the agent takes an action that leads to that outcome, we reward the agent, and whenever it takes an action that doesn't help with

(12:20):
that outcome, we penalize the agent.
So very quickly these agents become experts in your own workload.
And a lot of people think that, hey, it's going to take days or weeks or months for that agent to learn my workload.
But the thing is, we actually accelerate the learning.
The models actually start warm.

(12:40):
They not only have built-in general knowledge about how Snowflake works, but they also pull the metadata from the last three months.
So most of our customers actually see significant savings in less than 24 hours from the second that they onboard.
These models are actually very, very quick to learn and

(13:04):
start delivering value.
That's the beauty of reinforcement learning.
But we've also started investing in allowing users to leverage gen AI and LLMs to actually rewrite queries.
We actually have papers out there that people can read in terms of how you can build this.
The work, the product, is called GenRewrite,

(13:26):
where we actually leverage generative AI to rewrite these queries using an LLM, but in a way that preserves the semantics, makes sure that the rewritten query returns the exact same answer, but is also significantly faster.
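The reward-and-penalize loop Barzan describes can be illustrated with a minimal reinforcement-learning sketch. Everything here is a made-up toy environment: the sizes, costs, simulated SLA behavior, and the epsilon-greedy value update are assumptions for illustration, not Keebo's implementation.

```python
import random

# Toy RL loop: the agent picks a warehouse size, is rewarded when it
# saves money without a slowdown, and penalized when it causes one.
ACTIONS = ["SMALL", "MEDIUM", "LARGE"]
COST = {"SMALL": 1.0, "MEDIUM": 2.0, "LARGE": 4.0}

def run_workload(size: str) -> bool:
    """Simulated environment: for this hypothetical workload, MEDIUM or
    bigger meets the SLA; SMALL causes a slowdown."""
    return size != "SMALL"

def reward(size: str, met_sla: bool) -> float:
    if not met_sla:
        return -10.0                   # slowdown: heavy penalty
    return COST["LARGE"] - COST[size]  # cheaper while meeting SLA = reward

def train(episodes: int = 500, epsilon: float = 0.1, lr: float = 0.1):
    random.seed(0)                     # reproducible toy run
    q = {a: 0.0 for a in ACTIONS}      # estimated value of each action
    for _ in range(episodes):
        if random.random() < epsilon:  # explore occasionally
            a = random.choice(ACTIONS)
        else:                          # otherwise exploit best known action
            a = max(q, key=q.get)
        r = reward(a, run_workload(a))
        q[a] += lr * (r - q[a])        # incremental value update
    return q

q = train()
best = max(q, key=q.get)
print(best)  # MEDIUM: the cheapest size that never triggers the penalty
```

The agent quickly settles on the cheapest size that avoids the slowdown penalty, which mirrors the "save money without degrading performance" objective described above; warm-starting with historical metadata, as mentioned in the episode, just means the value estimates don't begin at zero.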

Speaker 1 (13:43):
Yeah, that's excellent.
And on top of the hard cost savings here that you're providing, just on the utility that we described, Snowflake and other data lakes and infrastructure, there's also an element of total cost of ownership.
Because when

(14:06):
I'm a data engineer and I get, you know, tons of requests from my business team, my first response isn't, oh, how do I make sure this is cost-optimal?
No, I'm spending 100% of my brain power making sure I'm solving the business problem, and then delivering that and getting acceptance and, you know, thumbs up from business leaders that

(14:26):
these new reports or these new data applications are working and providing value.
If I can just have something that will automatically cost-optimize everything for me, I want to be lazy, I want to just write bad SQL because that's fast.
I can optimize anything if you give me extra time,

(14:49):
right.
But, you know, having a tool kind of do that for me just accelerates the time to value there and the overall cost of ownership of building and maintaining these data pipelines.
So yeah, it's a great application of AI for that specifically.
I love your approach there.

Speaker 2 (15:11):
No, thanks for saying that.
I think one way to look at this is, to your point, given enough time and patience everything's possible, but there's a cost associated.
So one of the biggest problems that we see teams face: we were talking to this major insurance company, and their C-suite

(15:36):
had decided, we're going to use Snowflake, right, but then the adoption was really slow.
They purchased millions and millions of dollars worth of credits, but then they landed in this problem where they weren't able to onboard those use cases without the worry of, hey, what

(15:58):
if someone leaves a 3X-Large warehouse on over the weekend?
What if someone writes a really dumb query and then we burn through the millions of dollars very quickly?
Because we invested $10 million, now we have to protect that investment.
So, instead of being focused on how to drive ROI from the $10 million, now the data team becomes guardians of

(16:22):
that $10 million investment, and the way this works is that they have to basically build a lot of in-house tooling to make sure that they have alerting in place, that they have guardrails in place, that people don't do silly things with that cloud data warehouse.
And a significant portion of these data teams' time is just spent,

(16:46):
and maybe, allow me to say, wasted, on staring at these queries, figuring out how to optimize them, looking at the bill, trying to make sure they stay within budget, and all of that stuff.
And that's not value-add, and that's not fun work to do.
Right, like, for example, if you're an insurance company, you

(17:10):
grow your business by offering more competitive rates, right?
If your engineers are constantly worrying about how to improve the performance of every single query, making sure they don't run out of budget, they don't do anything bad, all of that stuff.
If you could automate that and have your data teams focus on growing your business instead of running and optimizing your

(17:31):
infrastructure, you can imagine how many thousands of engineering hours that would free up for actual productive work that drives your business forward.

Speaker 1 (17:46):
Absolutely, ed.
And now we're seeing so muchevolution and fast-paced
adoption of AI in the data stack.
So, seeing AI, like you know,for example, you're solving a
very targeted use case using AIto optimize the compute and
resources of these dataengineering workloads.

(18:06):
Beyond query optimization, like, where do you see the biggest
opportunities for AI to improvehow companies use and analyze
their data?

Speaker 2 (18:16):
Beyond optimization.

Speaker 1 (18:18):
Yes.

Speaker 2 (18:19):
So, I mean, you've probably seen a lot of recent investment in RAGs, right?
Like, how we can actually leverage LLMs for better retrieval from your own database, right?
So that's an area we're also actively looking into.
You can go to ChatGPT, you can go to Claude and other LLMs out there

(18:43):
and ask general information about publicly available data.
But you have these extremely valuable data sources where the value is kind of locked away, and so we've seen about three decades of BI technology trying to make the data accessible.

(19:03):
Like, this idea of democratizing data and insight is not a new one.
People have been pitching it for the last two decades, but the bitter truth is that it hasn't really happened.
You know, John Smith, who's sitting in the marketing department or ops department, is not empowered to just go and open Looker and just magically ask whatever questions they have.

(19:26):
There are these legacy internal search systems, like, hey, go and search Confluence to see if there's anything, but the search is really, really bad.
You cannot really get to the data.
What we're seeing is that we're transitioning from search to actual conversations, right?
So, like Google, even, you know, back five years ago, the

(19:50):
primary way of learning something was you would Google it, and Google would do a really good job of showing you the most relevant sources.
But then you'd have to go and read those sources, distill the information, decide if that's what you wanted or not, and then go and revise your keywords and rinse and repeat, right?
But now you just go to ChatGPT and ask exactly what you have in mind, and it gives you the answer.

(20:11):
So what we're seeing here is that on the enterprise side, and even mid-market, people are trying to have the same conversational interface to their own internal data.
Hey, curious, what happened yesterday?
Did we sell more in this department?
Or, I see that my customer acquisition has dropped off

(20:37):
today.
What are the biggest root causes of that?
You'd need a really competent data science team to go and perform those root-cause analyses, and it's going to take them a long time to come back with reliable answers.
But if you have a RAG built up internally, you can hook it up

(20:59):
with your own database, have an LLM next to it, in a way that addresses cost considerations, so you do not burn through thousands of dollars on LLM invocation costs and API calls; that addresses hallucination problems; that addresses compliance problems,

(21:21):
so you're not sharing your internal confidential data with an outside LLM; and that addresses security concerns, that, hey, I am authorized as a user in this company to look at this particular table, and I can get quick, meaningful, actionable insights.
I think that's where the future lies.
To be honest with you, there's an arms race right here.

(21:43):
There are a lot of people trying to provide solutions in this space.
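One small, concrete piece of the security concern above, only exposing schemas the user is authorized to see before anything reaches the LLM prompt, could be sketched like this. The table names, roles, and prompt format are all hypothetical, invented for this sketch.

```python
# Minimal sketch of the access-control layer of an internal
# "chat with your data" system: before any table schema reaches the
# LLM prompt, it is filtered by what this user may query.
TABLE_ACL = {
    "sales_orders": {"sales_ops", "finance"},
    "employee_salaries": {"hr"},
}
SCHEMAS = {
    "sales_orders": "sales_orders(order_id, region, amount, order_date)",
    "employee_salaries": "employee_salaries(employee_id, salary)",
}

def build_prompt(question: str, user_roles: set) -> str:
    """Return an LLM prompt containing only schemas the user may query."""
    allowed = [t for t, roles in TABLE_ACL.items() if roles & user_roles]
    if not allowed:
        raise PermissionError("no authorized tables for this user")
    schema_block = "\n".join(SCHEMAS[t] for t in allowed)
    return (f"Given these tables:\n{schema_block}\n"
            f"Write one SQL query answering: {question}")

prompt = build_prompt("Did we sell more in the West region yesterday?",
                      {"sales_ops"})
print("employee_salaries" in prompt)  # False: the HR-only schema never leaves
```

A production system would layer the same idea at query-execution time as well (row-level security, result redaction), but the principle is the one named in the episode: authorization is enforced before the LLM ever sees the data.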

Speaker 1 (21:50):
Yeah, it's a super interesting area, kind of combining the whole semantic layer with this natural language interface where, like you said, someone kind of just wants to ask arbitrary questions about the data, right, and get very useful

(22:10):
answers, kind of like what we're getting today, like you can chat with ChatGPT.
You know, you don't have to be particularly specific about what you want, and obviously the more specific you are, the better responses you get.
But you can still get great responses just by chatting, right, without being too precise.
So, you know, being able to extend that to data, where

(22:33):
queries are typically very deterministic, and you have to know the exact table, you have to know the data model, you have to know the right way to query it, et cetera.
Now you can kind of open it up by saying, okay, just let me chat with the data.
You figure out what SQL that's going to generate, or you figure out what Python you're going to write to get that done.

(22:54):
That's an exciting area.
Yeah, for sure, and hopefully we'll see some cool stuff continue to come out there.

Speaker 2 (23:02):
No, for sure.
I think those three traditional considerations are still going to be there, right: the cost, the effort, and the accuracy.
Those are the three things that I think are going to be at the core of all of this.
Like we saw with the adoption of

(23:25):
Snowflake and Databricks, people very quickly realized these things can get very expensive very quickly, right?
Then there's the effort needed.
I've actually never seen a company that's spending, for example, more than $10 million on their Snowflake, or even north of a million dollars, that doesn't also have a significant number of

(23:45):
engineering cycles being spent on monitoring and optimizing and worrying about it.
Right, and it just makes you wonder: is it actually worth it for a company that's in a different industry to go and build tooling to monitor their own tooling spend?
That's why you see all these FinOps solutions coming up, because cost is becoming prohibitive very quickly.

(24:09):
But then the way that people are trying to address cost is by putting in more effort, which is again expensive.
So people have figured, hey, we're spending, for example, $5 million on our Databricks or on our Snowflake, so maybe if we have three full-timers just constantly

(24:29):
optimizing these pipelines and keeping an eye on it and whatnot, that's three times $250,000.
That's not too bad, right?
But if you think about it, that's not a really good ROI.
You know, this is exactly when you can leverage AI to monitor your own uses of AI and make sure the wheels

(24:51):
don't fall off the bus.

Speaker 1 (24:53):
Yeah, and you know, when I look at what data teams want to do generally with AI, when it comes to business logic and their own internal applications, you'll see data teams kind of gravitate towards that: they understand it and they want to implement it.
But cost optimization and, you know, some other

(25:13):
things that are adjacent to solving the core business problem are such a great place for AI to just solve that for you, right?
And I think solutions like Keebo make that super simple, especially since you've, in your own way, generalized the problem, and you've done reinforcement

(25:33):
learning on your models constantly, and you're bringing that in with your product.
So one thing I want to ask you is this: users are constantly facing trade-offs between performance and cost.
How does Keebo help them strike the right balance between keeping queries fast and cost-optimal at the same time?

Speaker 2 (25:56):
That's a really good question.
I think the biggest learning we've had, and we work with hundreds of data teams, right, like literally hundreds of these data teams, if not thousands, the biggest learning we've had there is that there's no one size that fits all.

(26:18):
A particular data team at this particular company at this particular time has a completely different view of what's considered the right trade-off between the two than a different company.
Even within the same company, we see very different understandings of what's considered good performance on this particular workload versus this other one.
There are situations where the customer says, you know what, this

(26:40):
workload, as long as it's finished before 6 am so the dashboards are ready today, I couldn't care less how long it takes, just keep the cost low.
You know, be as aggressive as you want, as long as the jobs don't fail and they finish before the first employee shows up.
And then there are some, you know, other workloads where, if this thing slows down, people start complaining, and

(27:04):
then I'm going to rip out the solution.
So what we have done is we've followed two design principles in our product, and I think that's been incredibly helpful, both to us as well as to our customers, in aligning what we're doing with what our customers are looking for.
Number one is that we've decided that whenever there's a

(27:25):
fork, we're going to go with performance, and the reason for it is, when you talk about full automation, the analogy I usually give is autonomous cars.
When you're building an autonomous car, your first crash is going to be your last crash, because people will never trust you.

(27:48):
Let's say I could save this customer 30% today without any chance of a slowdown or negative impact on the pipelines, or I could push it and save them 35%, but there's a small chance that some of these queries are going

(28:09):
to actually slow down, and the users will complain, and then the customer will complain, and then the trust will be ruined.
So whenever there's a situation like this, we actually go with the 30% savings, not the 35%.
The reason for it is, no one gets angry.
No one yells at you, why did you only save me 30% instead of 35%?

(28:29):
But they will yell at you if you cause their pipeline to break.
If you make the user's experience degrade, they're going to complain, right?
It's the same way with an autonomous car.
If you get home at five, you're not going to complain, why did my car not get me home by 4:58?

(28:53):
It's just two minutes of delay.
So what if they took an extra mile on my route, maybe drove a little slower than they should have?
It's tolerable.
But if that autonomous car gets you into a crash, you're going to be really upset.
So that's the first thing we follow, all other things being equal.

(29:14):
Whenever we have a fork, we go with protecting performance rather than saving more money.
So the default is: how can we save the most amount of money without impacting performance?
That's what we call no-brainer savings, because if we tell customers, we can cut your Snowflake bill by, let's

(29:39):
say, 40%, but 10% of your queries are going to be considerably slower, then they have a trade-off.
They have to think about it.
Those decisions, believe it or not, are very difficult.
They're paralyzing decisions, because you don't know how people are going to react to their queries taking 10% longer.
But if you told that same customer, guess what, I can save you 30% with no impact on your workloads?
That's a no-brainer.

(29:59):
No one has to think about it, right?
So that's the first thing we've done, and the second thing is, back to what I was sharing earlier, every customer is unique; every workload, every warehouse is unique.
We have the most comprehensive suite of flexible controls, where people can go in there and pull their own levers.

(30:20):
We have a slider, for example, where they can say how aggressive or conservative they want to be with this particular workload.
You know, that's the first layer.
The second layer: they can go in there and actually define SLAs.
For example, I want to make sure the 99th-percentile latency stays less than two seconds, I want to make sure the number of queued queries is less than this.
And there's another layer on top of it where they can

(30:43):
actually even define rules that the system will protect.
So those are all guardrails, and then we train the AI agents to actually operate within those guardrails that the users have specified.
So the short answer to your question is, we never have to make that decision.
What we have done is we've built a very flexible interface where users can tell us what it is that they consider good

(31:05):
performance and what their goal is, and then the agent delivers exactly what they care about, whether it's protecting performance, maximizing savings, et cetera.
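The layered guardrails described here, a slider plus SLA targets plus rules, can be sketched as a simple config-and-check pair. The field names and the headroom formula are illustrative assumptions for this sketch, not Keebo's actual interface.

```python
from dataclasses import dataclass

# Hypothetical guardrail definition: an aggressiveness slider and SLA
# targets that any proposed optimization must respect.
@dataclass
class Guardrails:
    aggressiveness: float    # 0.0 = conservative ... 1.0 = aggressive
    p99_latency_ms: int      # 99th-percentile latency must stay under this
    max_queued_queries: int

def action_allowed(g: Guardrails, predicted_p99_ms: int,
                   predicted_queued: int) -> bool:
    """An optimization (e.g. downsizing a warehouse) is applied only if
    its predicted outcome stays inside every user-defined guardrail.
    Conservative settings demand extra headroom below the SLA."""
    headroom = 1.0 - 0.2 * (1.0 - g.aggressiveness)
    return (predicted_p99_ms <= g.p99_latency_ms * headroom
            and predicted_queued <= g.max_queued_queries)

g = Guardrails(aggressiveness=0.3, p99_latency_ms=2000, max_queued_queries=5)
print(action_allowed(g, predicted_p99_ms=1500, predicted_queued=2))  # True
print(action_allowed(g, predicted_p99_ms=2600, predicted_queued=2))  # False
```

In this shape, the agent proposes actions and the guardrail check vetoes anything whose predicted outcome would breach the user's stated SLAs, which matches the "operate within the guardrails" framing in the episode.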

Speaker 1 (31:17):
Yeah, it's so interesting, and it might be one of those things where, if I ran a data team and I really had to solve it, let's say the FinOps team or the finance team has been on me about my cloud costs, first I would use something like this to optimize my costs and then see, you know, is anyone

(31:37):
complaining about the queries being 10% slower?
If yes, then I'll pull the throttle back on cost optimization.
But, like you said, one size does not fit all, for sure, and it's definitely something that every

(31:57):
data team needs to evaluate for their own operations and their own business.
But it's great that you've kind of figured out how to add the right levers and ways for them to approach that, you know.
And I like the autonomous driving example, because it's like, yeah, I need to be home by 5 pm.
I can probably sacrifice some

(32:20):
time to get there safer and more comfortably, a smoother ride, fewer lane changes.
Maybe I'll get there at 5:02 pm, that's fine.
But if it's going to get me there at, you know, 5:45 pm, because it's going to drive at 10 miles per hour in the right lane the whole time, then I probably won't accept that.

Speaker 2 (32:35):
That's right, and I really liked your example too.
That's exactly the kind of iteration that you, as a data leader, are able to do: try something, wait to see if people complain or notice, and if not, then you know that's okay, and then you try again until you find that sweet spot, right?
That's exactly where humans can handle it, and we

(33:00):
rely on our champions and the data engineers leveraging Keebo to make those kinds of decisions and have those kinds of conversations.
We just give them the levers.
If you want to go 10% more today, here's a lever, and then if you tell us it was too aggressive, you pull it back.

Speaker 1 (33:18):
Yeah, exactly, Barzan.
You have both an academic background and an
entrepreneurial industry background, so I want to get
your take on this.
Where do you see AI-driven automation going in the next one
to three years?

Speaker 2 (33:37):
I usually say, you know, predictions are incredibly hard
but also incredibly easy.
At least most of the time,
when people make predictions about the future, no one goes
back and holds them accountable, even when they turn out
to be wrong.
So that's why I said it's hard and easy.

(33:59):
But, joking aside, I think a lot of what we're seeing, like LLMs,
have really gotten to a point where, the analogy I
give people is that it's not really an overnight thing, right?
It's like if you're staring at a wall
and someone has a ladder stood
up against that wall and they're climbing that ladder every
single day.
They're making progress, they're getting closer to the

(34:21):
top, but you're waiting on the other side of that wall.
You don't see any progress, right, until that person's
head makes it above the
wall line, and then that's when you see it.
So to you it feels like a sudden thing, where after, let's
say, 20 minutes, nothing was happening and suddenly you see a
person's head, and now you can see more and more of their body

(34:43):
as they climb that wall.
But for the person who was on the other side, it was not an
overnight thing.
They'd been climbing that ladder continuously.
So I think that's the difference in perspective between academia and
industry, right?
It wasn't like, hey, suddenly now we have ChatGPT. We've

(35:04):
been part of that progress, witnessing it
and contributing to it, for the past decade and a half, two
decades, right?
But I think finally the compute power and
the sheer volume of data that we could actually train
these models on got us to a point where now the public can

(35:25):
see the value.
So you asked me what I think is going to happen a year to
three years from now.
I think we're going to see a lot more real-world
adoption of these agents in all walks of life.
Right now, it's just a cute thing to go and ask ChatGPT, but

(35:51):
people are actually building interesting verticals and apps
on top of ChatGPT where you can, for example, go and redline
your document or review your lease with these agents.
Say you're hiring someone in a completely
different country; you can actually ask one of
these LLMs to check for

(36:11):
compliance, right, because they have trained on those local laws
and all of that stuff.
So I think we're going to see more adoption over the next year.
But if I were to look at the next 10-year horizon, I think
what's going to matter, honestly, is that, and I don't

(36:33):
want to say software engineering is going to go away,
because there are people who say, you know what?
No one is going to be writing code anymore.
Everyone turns into a prompt engineer.
I don't think that's going to happen, but I think what will
happen is that the average engineer,
those engineers who want to stay relevant, whether a data

(36:53):
engineer or a software engineer,
will be a lot more well-versed in machine learning,
statistics and AI.
I think what's happening right now, I guess
over the next 12 months, and this is something that's been
happening over the past year and a half

(37:18):
and will probably continue another year, is that we're seeing some irrational
resistance. You see some engineers or some teams feeling
threatened by AI: oh, AI is here to take my job away.
So what happens is that there are still a lot of teams who are
very locked in to trying to build every tool in-house.

(37:39):
I mean, we face this with Snowflake too. You'd be
surprised how many people want to actually build their own
optimization tool in-house, and this tendency to...

Speaker 1 (37:51):
I think that sounds awful.
By the way, I would never volunteer for that.

Speaker 2 (37:55):
No, you shouldn't.
Like you said, it's probably a century-long, you know,
buy versus build kind of question.
But I would say that's going to change quite a bit over the
coming years.

Speaker 1 (38:03):
Yeah, absolutely.
And, you know, when I hear that being thrown around,
you know, AI is going to replace engineers,
I always just point to Anthropic and OpenAI hiring all these
engineers that are supposedly going to get fired in the next
year.

Speaker 2 (38:17):
My joke is that if you're afraid that AI is going
to take your job away, it probably is.
So instead, you need to learn more about how you can best leverage
AI, rather than resisting it.

Speaker 1 (38:29):
Absolutely, Barzan.
This was really great.
Thank you so much for joining us today, and thank you to all
the listeners for tuning in.
Barzan, where can people continue to follow along with you?

Speaker 2 (38:41):
Just go to keebo.ai, K-E-E-B-O dot ai, or follow me
on LinkedIn.

Speaker 1 (38:49):
Excellent.
We'll have those links down in the show notes for the listeners.
Barzan, thank you again, and thank you to the listeners for
tuning in.

Speaker 2 (39:00):
It was a great pleasure, John.
Thank you so much for having me.