Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
SPEAKER_02 (00:11):
So first, Shireesh, I'd love for you to tell the listeners about yourself.
SPEAKER_01 (00:16):
Yeah, thank you. Sure. So I'm corporate vice president for Azure Databases. What that means is that I have the remit to manage all operational, OLTP databases for Microsoft. This includes on-premises instances like SQL Server, obviously. We have IaaS offerings for SQL Server and a couple of
(00:37):
other databases, but primarily the PaaS offerings in Azure SQL; Azure Cosmos DB, which is a very differentiated non-relational database; and then the OSS databases, both Azure Postgres as well as Azure MySQL. And we've recently shipped the database into Fabric as well. So SaaS, all the way from on-premises to IaaS to PaaS to SaaS. Anything related to operational databases is
(00:59):
under my remit.
And if you want me to talk a little bit about the career, I can go into just a few bits here. It's been a long journey through the evolution of databases, I would say. I started out working on SQL Server primarily. That's really where I learned the fundamentals of OLTP systems.
(01:20):
I worked on various aspects of SQL Server, shipped a bunch of releases back in the day. It also helped me learn the craft of building software. Those were sort of my formative years. And then I moved on to being one of the founding members of Cosmos DB. That was a complete shift in mindset, going from a single-node, high-reliability system to thinking
(01:43):
about globally distributed, elastic kinds of databases. And it's also non-relational, so there's no schema there. It taught me the distributed systems challenges: consistency, scalability, internet-scale, user-facing applications. It's a different muscle. Along the way, I also took a small detour to
(02:06):
join SingleStore. It's a great company, which was basically focused on real-time analytics, really challenging the transactional-analytical boundary with a very novel architecture. And that experience was valuable because it forced me to think about speed, agility, being closer to customers, and deeply thinking about performance at scale and
(02:28):
such things.
And now I have the privilege of effectively leading the database portfolio that I was mentioning a minute ago. In a way, I find myself privileged and grateful that I got a chance to experience these different kinds of systems across the board,
(02:48):
from SQL to Cosmos to SingleStore and then back to all of it. And now I'm focusing quite a bit on all of these databases, as well as open source, and Postgres in particular. It really puts all of these things into the context of Azure's broader mission. So yeah, the through line in my journey is that I've really been drawn
(03:10):
to the challenges of developers, enterprises, systems, et cetera. That's really who I am. And yeah, I enjoy what I do.
SPEAKER_02 (03:20):
Yeah. Incredible to hear both the depth of what you've gone into in database systems as an engineer and also the breadth of the different types of systems you've worked on, which is incredibly valuable, especially as these workloads are evolving so much and the requirements are now being completely transformed by AI.
(03:41):
We'll get into that. Alok, this is your second time on the show, and you also have a very deep database background. Would love to hear about that as well.
SPEAKER_00 (03:51):
Yeah, thanks, John. And Shireesh, I'm very impressed. I think the journey is remarkably parallel, I would say, right? The formative years, the foundation years: I got to be part of the Oracle database and kind of grew up there, got to work on a lot of interesting features. This is my second time, John, on your show.
(04:11):
So hopefully you'll invite me again and we'll make this one of those Saturday Night Live things. I'm hoping to get a robe from you as well, SNL 5 or maybe JK 5 or something like that. And hopefully Shireesh will be back as well. Yeah, so like I said, I started off in the Oracle database, and I've taken an interesting journey all the way
(04:33):
to Striim, where I am right now. I'm one of the co-founders at Striim. I run all of the product areas here, right from engineering to product management to product support to documentation; all of the product aspects come under me. So I really enjoy doing that. And it's a great place to
(04:54):
be. On one hand, we are working with some of the leading global, large enterprise customers. So one day I find myself in a conversation with some of the largest banks in the world, trying to look at some AI use cases. And the next day I'm arguing with one of my engineers that the amount of debugging output in the
(05:17):
Striim server log is excessive and I don't really like it, and we need to do something about that.
But just going back to the database: I spent a lot of time in recovery, John. At the time that I joined, I found that to be one of the hardest areas. Oftentimes in support, I would look at the most challenging things. People would take a
(05:38):
backup and they would restore it and the database wouldn't open. And oftentimes these support engineers would be patching file headers and fixing checksums and stuff like that. There were all kinds of strange utilities they had to use to do block edits, which in this day and age are highly insecure. But at that time they used to do that. So that's what fascinated me about
(05:59):
recovery.
I had the good fortune of getting involved in some of the redo generation layers. I used to own LogWriter for a long time. And it's been fascinating how that one piece of code has shaped my own thinking, because we went all the way from logging for backup and recovery purposes, to physical replication purposes, to high
(06:21):
availability purposes, to heterogeneous integration purposes, and now to streaming intelligence purposes. So it's been fantastic. And I got to spend some time at a company called GoldenGate, which was acquired by Oracle. So I got my hands dirty in not just the replication
(06:42):
technology, but also in data integration and data quality, those pieces. And then I've been busy with Striim for the last decade, where we're trying to build a unified platform that brings together real-time integration, real-time replication, and real-time streaming for analytics and AI reasons. So it's been fascinating, and I'm excited to be on
(07:02):
the show.
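For listeners who haven't poked at this layer: here is a minimal sketch of the log-based change data capture idea Alok describes, using Postgres logical decoding via Python. The connection string, table, and slot name are hypothetical; products like Striim and GoldenGate do this against each engine's own log format.

```python
# A minimal sketch, assuming Postgres: change data capture from the
# write-ahead log, using the built-in test_decoding output plugin.
# DSN, table, and slot name are hypothetical stand-ins.
import psycopg

with psycopg.connect("dbname=shop user=cdc_reader", autocommit=True) as conn:
    # A logical replication slot tails the write-ahead log and decodes
    # committed changes into a consumable stream.
    conn.execute(
        "SELECT pg_create_logical_replication_slot(%s, %s)",
        ("demo_slot", "test_decoding"),
    )
    # Some upstream transaction happens...
    conn.execute("INSERT INTO orders (id, total) VALUES (1, 42.00)")
    # ...and shows up as ordered, row-level events read from the slot.
    cur = conn.execute(
        "SELECT lsn, data FROM pg_logical_slot_get_changes(%s, NULL, NULL)",
        ("demo_slot",),
    )
    for lsn, data in cur.fetchall():
        print(lsn, data)  # e.g. table public.orders: INSERT: id[integer]:1 ...
```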
SPEAKER_02 (07:03):
Also, just great to hear how we've evolved from using logs for redo and recovery purposes on a single database. And if you take that further, you can use it for heterogeneous replication, where you take those database logs and use them to replicate data to multiple different types of databases. Obviously, lots of challenges there that you worked
(07:23):
through. And then it can also be applied to distributed compute use cases, and it addresses a lot of the requirements for modern cloud analytics and now AI use cases. We'll get into why replication is critical for model context protocol specifically.
(07:45):
So excited to get into that. But the great part about speaking with both of you is the amount of experience in databases, and in different types of databases. And I wanted to ask this first question. Shireesh, if you want to take the first one.
When you think about the right database for the
(08:05):
job in 2025, knowing all the challenges that are ahead of us now, with the known unknowns and the unknown unknowns creeping in the background, I think you see a lot of consensus toward using popular open source databases like Postgres, but there are also other great offerings. So I want to ask you: how do you advise teams to pick
(08:25):
the right database for the job?
SPEAKER_01 (08:27):
This is a great question. And you can think of it in two different ways. On one end, you definitely want unification. You don't want too much fragmentation; you want fewer choices and to just invest in getting them right. And on the other hand, you want to think about best of breed, and what exactly is the right tool for the right job.
(08:48):
We may have taken that a little too far in the DBMS market, where, if you think about Stack Overflow, there are around 400 databases or something like that. I don't know if the world needs 400 databases, to be frank.
Now, you mentioned Postgres. Postgres is definitely having a real moment. In 2025, honestly, it is the default
(09:09):
answer for a huge range of apps. Developers love it, the ecosystem is thriving. And cloud providers, Azure included, of course, have made it easier to spin up Postgres, easier to manage, easier to upgrade, and all that. So if you're building a SaaS app, prototyping quickly, and you need a reliable transactional store with a vibrant
(09:30):
ecosystem in Postgres, in the form of extensions, then "just use Postgres" does have some meaning to it. You get good stuff: SQL standards, different kinds of data types, all the latest and greatest indexing, including vector indexing these days. A great community that keeps it moving forward.
(09:50):
All the hyperscalers believe in it. So I buy that argument. Having said that, though, I have a lot of caveats. We obviously want to be very clear that Microsoft believes in it. We are investing deeply. In fact, amongst the hyperscalers, we are one of the top committers to the Postgres
(10:13):
source code. We've been committing quite a bit. We added a lot of features in Postgres 17, and in 18 we've done quite a bit, and we'll continue to do that.
But back to your point about the right database for the right job: there are limits to any engine design. No database engine can really be
(10:35):
designed in a way that captures all workloads. It's always been the case, and it will be the case for a long time. Database design is a non-trivial exercise, so you have to make some choices. And whenever you make those choices, you'll have some gaps. There are databases like SQL Server, for instance. SQL Server is designed to be a general-purpose, amazing
(10:57):
relational database. So you can do through-and-through relational transactional systems, but you can also do SMP kinds of workloads in SQL very well. On the other hand, there have been lots of other scenarios around mission-critical workloads: there's been in-memory for some time, columnstore for some time.
(11:18):
There are lots of advanced security integration challenges, analytics integrations, et cetera. And those are some other places where SQL really shines. It's optimized for compliance, high availability, advanced security features; we've had Always Encrypted for a while, just as an example. There are lots of other query-related improvements in SQL
(11:40):
which are really world-class. So it's not necessarily that Postgres is the only answer. Looking at the workload and deciding is important. And there are also limits to how far a relational database can go. For that matter, if you are looking for something that has world-class RPO/RTO characteristics, if you want something internet-scale and user-facing, then Cosmos DB is a
(12:04):
great choice. It handles multi-region writes for RTO-zero characteristics. It has single-digit-millisecond latency guarantees, with four-nines availability guarantees in a single region, five nines with multi-region, and it lets you tune your consistency. And there are many apps. Of course, one of the most prolific apps of the
(12:24):
day, ChatGPT, uses Cosmos DB for that very reason. Postgres wouldn't be able to do that kind of workload; it's just not designed for that, right? So there's that.
Then the final piece I would mention is a category that was quite the rage a few years ago: vector databases. My view, and my team's view, has always been that we
(12:45):
don't need separate specialized engines for vectors. You should be able to evolve that capability in conjunction with your operational systems. And Postgres is a great example. It has strong vector extensions, and it covers most of the RAG scenarios, GraphRAG scenarios, without having to move the data from a transactional system to another
(13:08):
kind of system. Dedicated vector databases will have their place for some extreme scale or extreme design point, some exotic scenario. I'm pretty sure there is always a scenario; I don't want to say that there isn't any. But for the vast majority of enterprises, the complexity outweighs the benefits. And so keeping it all together in one place and
(13:31):
having fewer choices is a good way to go about it. But I also don't want to subscribe to the dogma that it has to be only one, or none. Both extremes are wrong, but I don't think there's a place for 400 different databases either.
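To make that concrete, here is a minimal sketch of the "vector search inside your operational database" pattern Shireesh describes, using the pgvector extension in Postgres. The table, dimensions, and embed() function are hypothetical stand-ins for whatever schema and embedding model you actually use.

```python
# A minimal sketch, assuming Postgres with the pgvector extension: vector
# search living next to the transactional data instead of in a separate
# engine. Table name, dimensions, and embed() are hypothetical stand-ins.
import psycopg

def embed(text: str) -> list[float]:
    # Stand-in for a real embedding model call.
    return [0.0] * 1536

with psycopg.connect("dbname=app", autocommit=True) as conn:
    conn.execute("CREATE EXTENSION IF NOT EXISTS vector")
    conn.execute("""
        CREATE TABLE IF NOT EXISTS docs (
            id        bigserial PRIMARY KEY,
            body      text,
            embedding vector(1536)
        )
    """)
    # HNSW index for fast approximate nearest-neighbor search.
    conn.execute("""
        CREATE INDEX IF NOT EXISTS docs_embedding_idx
        ON docs USING hnsw (embedding vector_cosine_ops)
    """)
    # 'Search by meaning': order by cosine distance to the query vector.
    # str(list) matches pgvector's '[x, y, ...]' text input format.
    cur = conn.execute(
        "SELECT id, body FROM docs ORDER BY embedding <=> %s::vector LIMIT 5",
        (str(embed("overdue invoices for enterprise accounts")),),
    )
    print(cur.fetchall())
```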
SPEAKER_02 (13:50):
And fast-forwarding from that question: how is AI changing your business, in terms of looking at the entire database landscape and your offerings? Including examples. You mentioned ChatGPT running on Cosmos DB, but also other examples of how you're applying the roadmap of AI
(14:13):
innovation to databases.
SPEAKER_01 (14:14):
In a big way, you know, a huge, huge way. A few patterns have emerged, and there's a lot of frothiness in this space, clearly. So the dust has to settle down a little bit, but there is clarity. There's clarity on certain aspects, and there are a few things that are still evolving. The way I think about it, one of the simplest ways I explain it to my teams, is that
(14:36):
there are vitamins and there are painkillers in this space. You absolutely need the RAG pattern, because a lot of the enterprise system-of-record information will be stored in operational databases. That's never going to change, right? And LLMs are not going to be trained on those kinds of data points, because those are by design confidential and private
(15:00):
and secured. Which essentially comes back to the whole challenge of how you really get the entire value of LLMs and semantic search. Well, you have to marry the foundational knowledge of the training with the system-of-record engagements that you typically go through operational systems, and even analytic systems, or whatever databases, effectively.
(15:22):
So the RAG pattern is real, it's here to stay, and we have to do everything we can to support it.
The way people search data in databases is changing too. Natural language and vector indexing: if we were to design the LIKE clause in the modern era, it wouldn't be a regex-based LIKE clause, right? We would have done something like a semantic LIKE clause. Instead, we
(15:43):
ended up with cosine similarities and a very interesting-looking syntax. Nonetheless, search by meaning, not search by regex or exact predicates, is going to take off, and it's already happening. We are seeing that. And it has some really interesting implications for how people think about searching their data, attaching
(16:04):
it to their AI apps, and all that stuff.
And in particular, think about some of the major problems that databases have always had. This is really where the painkillers come in; so far, what I've described, I think of as vitamins. And it's very important to think about the painkillers as well. The core cohort of database users, the DBAs, the data
(16:27):
developers, the people who spend all their time day to day managing these really big enterprise applications: they need to understand the database schema, they need to understand the performance, they need to keep tuning it. You can't design a database and forget about it, right? So how can AI really help them in their day-to-day jobs?
(16:49):
A very classic example, something we are very focused on at Microsoft, is helping them chat with their query plans. Not just doing something simple on the periphery, but going deeper and helping them solve their pain. It's not just a vitamin, it's a painkiller, right? You may have query plans that are very deep, very
(17:10):
complex. How do you enable people to chat with them without having to walk through all these graphical plan trees? There are many queries where the tree itself is really hard to navigate. Could you make it easy to explore them in natural language, and go deeper depending on their expertise, depending on their problem?
We are not far away from a point where these
(17:32):
developers and DBAs can come in and ask the question: hey, why is my database slow today? And it can give a simple answer, like, hey, you're missing an index, or your queries have changed and this needs to happen. Or it can go all the way: hey, I see these wait stats, these deadlocks are happening here, your write patterns have changed. It can go really, really deep. Or it could say,
(17:54):
maybe it's time to look at your query plans for these queries; some things have changed here, let's look into it. You can go deeper and deeper, and it can be a conversation. It doesn't have to require really deep expertise.
So those are a few examples, and I haven't touched on many other things you could do. You could rewrite queries to be more efficient; you could do a lot of things. The simple example is
(18:17):
natural language to query language, where people want to just give you a prompt and then get back a full SQL query. Those things are real. And it's not just about the engine and queries, et cetera. It could be cost governance, it could be resiliency governance, it could be security management, et cetera. So yeah, it's truly pervasive.
(18:37):
It definitely captures all these scenarios.
And on the other hand, I also want to point out again, as you were referring to, some of the biggest workloads of the day. ChatGPT, of course: it's massive, it keeps growing significantly. They rely on our databases. In particular, ChatGPT uses Cosmos DB for all its user-facing
(18:59):
applications. Every time you interact, any message operationally goes through Cosmos DB, for all the user-facing apps. We also use PostgreSQL in that space. That's the user-facing side. I haven't even touched on how we do our own development and how we think about using AI. But suffice it to say that it has dramatically
(19:23):
changed in the past several months.
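As a rough illustration of the "chat with your query plan" idea: capture the plan in machine-readable form and hand it to a model with a focused prompt. The ask_llm() function, query, and connection string below are hypothetical stand-ins, not any particular product's implementation.

```python
# A minimal sketch of 'chat with your query plan': fetch the plan as JSON
# and pass it to an LLM. ask_llm() and all names are hypothetical.
import json
import psycopg

def ask_llm(prompt: str) -> str:
    # Stand-in for whatever LLM client you use.
    return "(model answer goes here)"

QUERY = """
    SELECT c.name, sum(o.total)
    FROM orders o JOIN customers c ON c.id = o.customer_id
    GROUP BY c.name
"""

with psycopg.connect("dbname=app") as conn:
    # EXPLAIN (FORMAT JSON) yields a plan tree that is far easier to put
    # in a prompt than the ASCII rendering.
    cur = conn.execute("EXPLAIN (FORMAT JSON) " + QUERY)
    plan = cur.fetchone()[0]

print(ask_llm(
    "You are a database performance assistant. Given this PostgreSQL "
    "query plan, explain in plain language why the query might be slow "
    "and suggest one concrete fix (index, rewrite, or statistics):\n"
    + json.dumps(plan)
))
```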
SPEAKER_02 (19:26):
Yeah, absolutely. It's changing very fast. That's usually the main takeaway from every episode where we have folks talk about AI: the rate of change is just remarkable, something we didn't see in the 2000s or 2010s.
Alok, I wanted to ask you a similar question.
(19:47):
Obviously, we talked about vector databases a few years ago. At Striim, you're focused on database replication and data streaming, and vector databases came up when there was a hype cycle for them. And you had a similar view, which is that this is more like an index type rather than a full class of databases that needed to be supported.
(20:09):
And now, a few years later, we've seen that play out, where vector extensions for the mainstream databases seem to be getting the lion's share of adoption. But I also want to ask you, because by working in change data capture, getting into the logs of the databases, which is when you truly know how the database works, you have a
(20:29):
unique perspective on how to choose the right database for the right job in 2025 and beyond, given all the AI transformation we're seeing.
SPEAKER_00 (20:38):
Yeah, I mean, it's a great question. And I think Shireesh covered it fairly in detail, right? I do think that one size doesn't fit all. That's something the database community has published multiple papers on and debated far and wide. I personally still believe that to be very true.
(21:00):
There are some cases, John, where it makes sense to invest the energy and the effort and the resources to address a very specific workload, right? But then the other aspect of it is treating it as more of a general workload for which there is an engine. So I think databases have come a long way.
(21:21):
And at this stage, taking a broad look, largely from the customers that we interact with: how do they think about it? I think that's an interesting question. On one hand, they have their legacy workloads, and they're invested in those. And what happens is, because of that
(21:44):
legacy investment, and some of the engine choices and feature choices that were made maybe 20, 30 years ago, they're not able to change and evolve very fast, right? So that's one thing I see. And that begs the question: what do I do if I really want something fast? I want some new type of MFA, or I want to introduce some new security-level feature that
(22:07):
otherwise wasn't thought through in the original design, and so forth. So I do see them saying: hey, how can I introduce these newer services and newer workloads and newer applications without being limited to the choice that I made a while ago? So that introduces the next aspect of my choices,
(22:29):
which says, oh, maybe I can now go in and take a look at the best of breed for the stuff that I'm doing. And I'm seeing that a lot, right? So just to add on to Shireesh's point: you may decide that this specific workload is best suited to Cosmos DB on Azure, right? So I'm going to go in with a lot of this data,
(22:52):
and we have seen this in some manufacturing areas and so forth, where they say: this is just sheer petabytes of data, and I want to push this in a scalable way. So we are seeing that as the second part of it.
And then there's: hey, I'm a new kind of startup, I'm a young company, so what are my choices?
(23:15):
And that's where cost becomes an interesting part. And I do see that many of them will naturally gravitate toward flexibility, open source, relying on the community for support. And that's where choices like Postgres make a lot of sense. And at Striim ourselves, when we had to make that choice,
(23:37):
very early on, we looked at JavaDB and we looked at Postgres, right? And we started off, and then as we evolved, customers came in and said: well, look, I'm already running one standard for my mission-critical workloads; could you guys also support that? Right. So then we started moving into supporting additional RDBMS engines, but we started off with
(23:59):
Postgres. That was the main point, right?
I don't think there's a one-size-fits-all at all. I am seeing a lot of shift, I think, in terms of the movement toward newer types of engines, especially on the cloud side. This attitude of, I'm going to manage my
(24:19):
own schemas and my own tables, and worry about the space management, worry about the backup-and-restore part of it: I think that is becoming somewhat outdated. I see even very, very strict customers in the financial arena and so forth now changing. Eight, nine years ago, they were like, no, we'll never go to the cloud.
(24:42):
And now that's almost a joke, right? I mean, you'd probably not be in your role very long if you have that kind of an attitude. So that is a strong shift. We are seeing a lot of workloads now either migrate, or split across the legacy and the newer types of systems.
(25:02):
So again, no magic bullet here, but I do think it really depends on what journey you are on at the time. Are you inheriting a workload, or are you creating a workload from scratch? What are the cost choices that come into play? And then finally, are the existing systems not adequate, where I need to go invent my
(25:24):
own, right? And those cases are in the minority at this point, I would say.
SPEAKER_00 (25:28):
So that was a long-winded answer, but hopefully, John, I addressed it.
SPEAKER_02 (25:28):
Absolutely. And having that broad, reactive view of what's happening right now, and what changes we're going to make to support all the dynamic shifts in the market, but also tying it back to the fact that,
(25:49):
you know, databases have been around for decades and they work the way they do for a reason. It's incredibly valuable to have that perspective on top of it. So I think that's always going to be valuable for executives and engineering leaders to apply in their thinking when they choose, when they architect, their next-generation applications.
(26:11):
So speaking of next-generation patterns, I'm going to ask about model context protocol, specifically for databases. Just to define it quickly: model context protocol is really a simple wrapper for LLMs to interact with APIs, either for interacting with applications or for retrieval from
(26:33):
warehouses or databases. Shireesh, I wanted to ask you: for MCP for databases specifically, what value and patterns are you seeing from early AI applications that are using it?
SPEAKER_01 (26:47):
Yeah, I do think that MCP has real potential to become the missing bridge between large language models and enterprise databases. There are certain ways to get large language models to interoperate with enterprise databases without MCP, but it requires a very intentional access pattern from the user.
(27:07):
And there's just a lot of repetition in the implementations. The tension has always been: how do you let an LLM understand your data structures and policies without actually giving away the keys to the kingdom, right? MCP solves that. Apart from all the work you otherwise need to do, with every server implemented in a different way, in a
(27:29):
unique way, in a bespoke fashion, MCP solves that by creating a standardized, policy-aware layer between the model and the databases. Instead of the model scraping schema or free-forming SQL, the server on the database side can expose what is safe: clear metadata information, schema
(27:52):
fragments, curated tools, whatever. They can all be put together. And most importantly, they can be filtered by permissions, governance, and cost awareness. That's important, because that filter can really help you in a big way. If you zone in on the security part, it's essential, because the principle of least privilege can be applied to the LLM-to-
(28:14):
database interaction. And that can safely unlock LLM-driven apps while keeping compliance and trust and those kinds of things.
So, some of the patterns that come up when I think about the AI apps using MCP with our databases. A lot of folks are thinking about policy-aware introspection. Models don't need to see the full catalog; they don't even see it.
(28:36):
They get a business-friendly, least-privilege kind of view, with sensitive fields masked and row-level rules baked in, so that the applications and models can take advantage of the data in databases safely. There's also constrained execution. Instead of running anything, the
(28:59):
model gets higher-level verbs: say, "get me some customer metrics," or "explain this query," or whatever. The MCP server can then translate, validate, and enforce. How do you give just the data that the model wants? Filtering really is important. And finally, I would also say there are
(29:22):
auditable sessions. Access is time-boxed, read-only by default, and every action is logged. That's also important.
But just generally, the power of MCP is immense, and you need to be very careful about how you're auditing it, what the tools are, et cetera. There have been lots of cases in recent times where some of these LLM models and apps can go and do quite a
(29:44):
lot of damage to your database. So you've got to be very careful about it. But once you've taken care of that, the design patterns, like building a semantic layer and all the easier paths MCP can give to LLMs, are very powerful. It definitely helps LLMs be deeply database-aware and
(30:04):
understand all the details, without compromising on safety or compliance. That's an important shift. Use the power carefully, but there's a lot of power here.
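As a concrete illustration of the "constrained execution" pattern: an MCP server that exposes one curated, read-only verb instead of raw SQL, with every call audited. This is a minimal sketch assuming the Python MCP SDK's FastMCP helper; the role, table, DSN, and audit path are all hypothetical.

```python
# A minimal sketch of constrained execution over MCP: one curated verb,
# a least-privilege role, session-level read-only, and an audit log.
# Assumes the Python MCP SDK (FastMCP); names are hypothetical.
import logging
import psycopg
from mcp.server.fastmcp import FastMCP

logging.basicConfig(filename="mcp_audit.log", level=logging.INFO)
mcp = FastMCP("customer-metrics")

# Read-only role plus a session default that refuses writes outright.
DSN = "dbname=app user=mcp_readonly options='-c default_transaction_read_only=on'"

@mcp.tool()
def get_customer_metrics(customer_id: int) -> dict:
    """Aggregate order metrics for one customer (read-only, audited)."""
    logging.info("tool=get_customer_metrics customer_id=%s", customer_id)
    with psycopg.connect(DSN) as conn:
        row = conn.execute(
            "SELECT count(*), coalesce(sum(total), 0) "
            "FROM orders WHERE customer_id = %s",
            (customer_id,),
        ).fetchone()
    return {"order_count": row[0], "lifetime_value": float(row[1])}

if __name__ == "__main__":
    mcp.run()  # the model calls the verb; it never sends arbitrary SQL
```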
SPEAKER_02 (30:15):
Absolutely. And Alok, on your side: what's really required to make model context protocol work for databases in a way that enterprises can deploy, not only with the right scale to handle their workloads, but also with the right governance?
SPEAKER_00 (30:33):
Yeah. I mean, great question. And again, look, I think, number one, it has significantly simplified how you're exposing all kinds of engines now. You don't have to redesign in a proprietary way every single time you're working with a newer type of application, API, or
(30:55):
engine, right? So MCP is super useful for that. In terms of databases, I think some of the things that Shireesh mentioned, right from privileges and roles to the access to the data itself, are significant constructs that you
(31:17):
must always address. Now, I view this as just yet one more workload, right? Agents coming in, and then, over a protocol, conversationally trying to simplify what all of us are really trying to do at the end of the day. You can think of it in queries and engines and query plans and optimizers, but fundamentally I'm trying to
(31:38):
answer a few questions that currently are interesting for me, for, let's say, operational analytic reasons. So the conversational style is key, right? I do think that's the power here.
Now, as you start having conversations, and we've seen this in human conversations also, some people are super interested, really good at probing and getting information out, and some of
(31:59):
them are not so good, right? And so with the wrong kind of agent, you could leak information; they could get into some of the not-so-protected parts of your database and cause havoc and damage. So, one of the ways to think about this is just like a traditional workload, right? Shireesh mentioned the word read-only,
(32:20):
and that's why I went down this path. We've seen in the past where there was uncontrolled taxing of resources on a production system. And being a database person, an easy way to address that is to say: why don't I have a server farm, where I have the writes being routed to one location, and then I have a
(32:42):
replica farm. As long as the latency is good enough for the questions, it's great, right? So I think that's one of the patterns that's emerging: to expose a database engine over MCP. Let me take that back: a database engine with some replicas, and then one of the
(33:03):
replicas being exposed over MCP to an agent. That's your first go-to. And then if something really warrants an actual transaction, where it's not conversation but I'm actually making a transaction, then there's a rerouting back to the production system. So I think that is going to be one of the interesting patterns here. Time will tell whether we'll all move in that direction.
(33:25):
But we've seen that implemented very widely, globally, for some of the most critical workloads. And agents spawning more agents, having that kind of uncontrolled growth, does put a lot of load onto the production system. So how do we mitigate against that? I think that's where some of these new patterns are going to emerge.
(33:46):
And specifically at Striim, obviously, we're right in there, where we are trying to say: hey, we want to make the actual intelligence that's exposed, not just by one engine but by a combination of engines, simpler. And that's where we think MCP can be used to take portions of a database, cleanse parts of it for
(34:08):
governance and so on, and then expose that via maybe another system. And that could be a low-cost system; it doesn't have to be the same cost as the original system. And now you can have an interaction over MCP with that system. This is one of the ways in which you can try to protect against some of the challenges that Shireesh
(34:29):
talked about, as well as try to broadly solve a distributed replication, distributed database problem over MCP. So those are some of the early thoughts in this area.
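Here is a minimal sketch of the routing pattern Alok describes: agent-driven, conversational reads land on a replica, while genuine transactions are re-routed to the primary. The DSNs and the crude write classifier are hypothetical stand-ins; a real gateway would parse statements or whitelist verbs.

```python
# A minimal sketch: route agent reads to a replica, writes to the primary.
# DSNs and the is_write() heuristic are hypothetical stand-ins.
import psycopg

PRIMARY_DSN = "host=db-primary dbname=app"   # takes real transactions
REPLICA_DSN = "host=db-replica dbname=app"   # hot standby, read-only

WRITE_VERBS = ("insert", "update", "delete", "merge", "call")

def is_write(sql: str) -> bool:
    # Crude classifier; a production gateway would do proper parsing.
    return sql.lstrip().lower().startswith(WRITE_VERBS)

def run(sql: str, params=()):
    # Agent chatter lands on the replica farm; only genuine transactions
    # touch the production primary.
    dsn = PRIMARY_DSN if is_write(sql) else REPLICA_DSN
    with psycopg.connect(dsn) as conn:
        cur = conn.execute(sql, params)
        return cur.fetchall() if cur.description else None

# Conversational question from an agent -> replica:
run("SELECT status, count(*) FROM orders GROUP BY status")
# An actual transaction -> primary:
run("UPDATE orders SET status = 'shipped' WHERE id = %s", (42,))
```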
SPEAKER_02 (34:38):
Absolutely. And you see these patterns in real-world agent deployments. It's clear that the key to success there is in the agent implementation: creating these deep agents that have autonomy to solve the business use case, with the right layer of indirection, where you have the infrastructure
(34:59):
set up to handle these deep agents and any subagents created in the process of solving these problems. So we talk about the popular use cases, like customer support agents that can go query the transactional data, and maybe, like you were saying, Alok, make actual transactions on the database itself. But doing that in a protected manner through read
(35:23):
replicas, which minimizes latency and minimizes risk. Because you don't want the agent implementation to be concerned about governance, right? You want governance to be solved outside of that, and to let the agents do what they're going to do, but with the right guardrails to effectively protect against any issues they could cause.
(35:44):
So I do think this is one of those situations where the innovation is certainly new with model context protocol, but the patterns are very similar to having analytical read replicas, right? The actual infrastructure that teams are going to roll out here is very similar to how database
(36:07):
architects have addressed this in the past. Like you said, Alok, it's yet another workload. And how would you handle another workload? Well, there are ways to do this. So, very exciting. I think this is a much more nuanced takeaway for the listeners. I know that when model context protocol first came out,
(36:28):
if you looked at the official docs for it, or at least one of the early docs on it, I think the suggestion was just to have a read-only role for your agents on a production Postgres database. And people obviously ran into the obvious issues that can stem from that. You have to go a little deeper into
(36:50):
that problem to solve it effectively. So I do think the listeners can take away some really actionable design patterns from what you both shared. So thank you.
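For reference, "going a little deeper" than a bare read-only role usually means limiting grants, capping runaway queries, and defaulting sessions to read-only. A minimal sketch, with hypothetical role, schema, and table names:

```python
# A sketch of a hardened least-privilege agent role in Postgres:
# limited grants, a statement timeout, and read-only session defaults.
# All names are hypothetical.
import psycopg

STATEMENTS = [
    "CREATE ROLE agent_ro LOGIN NOSUPERUSER NOCREATEDB",
    # Only what the agent needs, not the whole schema:
    "GRANT USAGE ON SCHEMA analytics TO agent_ro",
    "GRANT SELECT ON analytics.orders_summary TO agent_ro",
    # Runaway agent queries get cut off instead of taxing production:
    "ALTER ROLE agent_ro SET statement_timeout = '5s'",
    # Sessions under this role default to read-only transactions:
    "ALTER ROLE agent_ro SET default_transaction_read_only = on",
]

with psycopg.connect("dbname=app user=admin", autocommit=True) as conn:
    for stmt in STATEMENTS:
        conn.execute(stmt)
```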
I wanted to move on to Postgres and chat about that a bit more. It's always a very popular topic on this show. We've had several guests who've gone deep into
(37:12):
really unique, cloud-scale Postgres implementations. Shireesh, first I wanted to ask you: I think Postgres's superpower is its extensibility and the community behind it. What are some of the big architectural bets you would like to see the community rally around in Postgres going forward?
SPEAKER_01 (37:31):
Yeah, so I totally agree that the superpower for Postgres is definitely extensibility. You can add new data types, new indexing methods, even runtimes. And that's why it's as popular as it is today with many developers. The shadow side of that is complexity, as you said, right? It's definitely something you can end up with.
(37:53):
You can easily end up with a patchwork of extensions in your workloads. They may not all interoperate in the way you want, in the cleanest way. And that can make it harder for Postgres to scale in production. So, to your question about the architectural bets that I'd like to see the community rally around.
(38:13):
And by the way, we at Microsoft are invested deeply; we ship many extensions ourselves. But a couple of things that I'd really like to see.
Firstly, I think it would be good to see a unified story for scale-out. We have a few solutions. We invest in Citus, we really believe in it, et cetera.
(38:33):
And we're trying to push it, make it easier for developers to shard, not just at a row level or a table level, but even at a schema level. So we have different kinds of abilities to shard, but having a native, consistent way to scale a workload out would really advance the vision of Postgres as a default database. Again, I don't believe in there being a default database, but
(38:56):
to the extent that developers want to keep it as a default for most scenarios, that idea of unified scale-out matters, and I know different people have taken different paths here. So that's really important, I would say.
The second thing, back to extensions: I do think standardizing the packaging and lifecycle matters. I know there's work happening here, but installing
(39:18):
and upgrading extensions is a bit of a wild west. If the community could converge on a model where extensions are clearly versioned, certified, and all play well together, that will help the community. It will definitely unlock confidence for enterprises looking at adopting Postgres.
(39:39):
It's one of the challenges, right? So we could really go solve that.
Then the third one: there's a lot of work on AI primitives in Postgres, but I would love to see them done the Postgres way, natively. And this is where we are definitely leaning in, in terms of bringing in some of the cool things like
(40:00):
DiskANN. I know there's the pgvector extension, which is very popular. But for things around vector search, embeddings, retrieval, et cetera, I think the community needs to spend a little more time agreeing on standard operators, indexing strategies, governance, and how they really operate with the runtime capabilities outside. There are a lot of competing ways to do a
(40:21):
few things, and it needs to be deeply thought through. More love is needed there. I think those are a few things.
Maybe, finally, I would just add one more. I think operability as a first-class concern is an important thing for Postgres. Given the amount of engagement it is seeing, workload isolation and resource governance
(40:45):
are the kinds of things that are really important when you are trying to deploy this at scale, especially in the cloud. The core and the extension ecosystem need to embrace this natively. I think that'll really go a long way. It'll reduce the cognitive load for anybody running at scale.
So Postgres has already won on extensibility. Nobody is debating
(41:07):
that. But the next frontier, I would say, is coherence: really making sure that you scale it up to enterprise grade, think about operability deeply, make AI capabilities first class, and make the extensions more certifiable and interoperable, that kind of stuff. It's a brilliant community.
(41:27):
They care about all these pieces deeply. And we at Microsoft are certainly engaging with them and contributing. We have very deeply vetted, deeply embedded contributors and committers as part of our team. We are very engaged, and we look forward to partnering with the community toward these goals.
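For a sense of what today's scale-out story looks like in practice, here is a minimal sketch using the Citus extension that Shireesh mentions, which hash-distributes a table across worker nodes. It assumes a Citus coordinator is already set up; the table and column names are hypothetical.

```python
# A minimal sketch, assuming a configured Citus cluster: shard a table
# across workers by tenant. Table and column names are hypothetical.
import psycopg

with psycopg.connect("dbname=app host=citus-coordinator", autocommit=True) as conn:
    conn.execute("CREATE EXTENSION IF NOT EXISTS citus")
    conn.execute("""
        CREATE TABLE IF NOT EXISTS events (
            tenant_id bigint NOT NULL,
            event_id  bigserial,
            payload   jsonb,
            PRIMARY KEY (tenant_id, event_id)
        )
    """)
    # Hash-distribute rows across workers by tenant; queries filtered on
    # tenant_id route to a single shard.
    conn.execute("SELECT create_distributed_table('events', 'tenant_id')")
```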
SPEAKER_02 (41:49):
Absolutely. And that's one of the most powerful parts: through community, you get standards, and you have less duplicative problem solving between companies. And Alok, I wanted to ask you: you're also working deeply with Postgres, through Striim and through database modernization and innovation
(42:13):
projects. What have you seen in terms of scale and adoption of Postgres for particular enterprise use cases?
SPEAKER_00 (42:21):
From a horizontal perspective, I think we're seeing a lot of adoption of Postgres. And let me just clarify what I mean by horizontal, because with you guys specifically, it can mean one thing at the engine level versus at a general level. What I mean is that, broadly, from an adoption
(42:43):
perspective, as folks are thinking about adopting AI and analytics and newer workloads, they are thinking of splitting up a lot of their existing workloads and applications. So there's this whole journey to get to the cloud. And what we are seeing is, number one, the question I get asked most often: hey, look, right now I'm running this
(43:04):
on a massive Oracle RAC cluster. Is Postgres mature? Is it going to be able to actually scale to these workloads, this very, very mission-critical application that I'm running? It's an awkward question for me, because I don't want to take a stance. But we are seeing that question come up more and more. And I do think that some of the very, very large enterprises
(43:29):
are making that bet, especially with the hyperscalers, and especially when they have their own flavor of Postgres. Because then, I think, they realize that if something is not up to what they need in terms of SLAs, in terms of their roadmap for the future, they can
(43:50):
go in and hold somebody accountable for that, right? So I think we're seeing a shift there.
Now, in terms of the adoption of Postgres, I can tell you at Striim specifically, we've had massive implementations where we've done, literally within a few weeks, thousands of actual migrations into Postgres
(44:11):
from on-premises types of systems. And these are not lift-and-shift migrations, mind you. These are actual, what I call zero-downtime, live migrations, oftentimes with bi-directional links being set up, because you don't want to give up on an existing on-premises system. You want to have a way to move over and pull the plug
(44:32):
at your own leisure, so to speak, right? You don't want to say: hey, I'm going to hold my entire partner community and user community hostage on December 31st at midnight and hope for the best. Those days are a little bit tough to swallow nowadays. So we are seeing that adoption.
The other piece we are seeing is a flow from the
(44:56):
cloud back out, and this is largely for cost reasons as well as some SLA concerns, where we are seeing a reverse flow of limited portions of that data into what I call managed Postgres systems that customers are managing themselves. So they have a version that they're now
(45:19):
maintaining, but they just want it for, let's say, a geographically distributed retail system; it could be per online store. There's a Postgres running there. So they do want the reverse flow also coming in. And to that degree, this should sound familiar to your audience, going back to the operational-to-
(45:40):
data-warehouse-to-data-mart type of pattern. Now we're seeing this legacy-to-cloud-to-back flow, which is not quite a data mart; I would say it's a data product, right? Where a team wants to get portions of that data, but they don't necessarily want to hit just the core system on
(46:03):
the cloud. And that's where, from a scale perspective, I've literally seen single customers adopt 10,000 to 15,000 Postgres databases, and they're pushing data from the cloud-based systems onto these self-managed Postgres systems.
So Postgres has come a long way,
(46:24):
right? Shireesh would know better: from version 9 till now, 17, with 18 coming up and so forth. They've caught up on partitioning, on replication, on security, on developer productivity, supporting basic operations like upserts and merges, and all kinds of JSON and
(46:46):
JSONB data types, and the vector types. So I think they've made a lot of progress. But I do want to pick up on one point that Shireesh made, which is the mission-critical part of it. I think customers do have concerns there right now, around the automated part of, for example,
(47:08):
switching over from a primary to a standby and back, or having a cascading setup where I do move to a standby, but then all the workloads that were relying on the primary get adequate notification and it's seamless. I personally deal with, for example, Postgres replication slots, right?
(47:29):
So I was very happy to see failover slots, so that at least you don't have to reinstantiate if you fail over to a standby; you're able to take advantage of that. I was happy to see support in the decoding layers for more than one gigabyte of redo, if a transaction spans that. So I think we have come
(47:49):
a long way. The future seems very bright, and customers are super interested in it. That's sort of my take on Postgres.
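For readers curious about the failover slots Alok mentions: in Postgres 17, a logical slot can be created with failover enabled so it is synchronized to a standby and downstream consumers survive a switchover. A minimal sketch, assuming Postgres 17, with illustrative slot and configuration names:

```python
# A sketch of Postgres 17 failover slots: the logical slot is marked for
# failover and synchronized to the standby. Names are illustrative.
import psycopg

with psycopg.connect("dbname=app", autocommit=True) as conn:
    # On the primary: the fifth argument marks the slot for failover
    # (arguments: slot name, plugin, temporary, twophase, failover).
    conn.execute(
        "SELECT pg_create_logical_replication_slot("
        "'cdc_slot', 'pgoutput', false, false, true)"
    )
    # Tell the primary which standby's slot must confirm before logical
    # consumers advance:
    conn.execute("ALTER SYSTEM SET synchronized_standby_slots = 'standby1_slot'")
    conn.execute("SELECT pg_reload_conf()")

# On the standby, postgresql.conf needs (illustrative settings):
#   sync_replication_slots = on
#   hot_standby_feedback = on
#   primary_slot_name = 'standby1_slot'
```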
SPEAKER_02 (47:53):
I always think that's a very important perspective for engineers and leaders to hear, because there are two sides of the coin. You can try Postgres either in a lab, or as an early-stage startup scaling incrementally over the span of a year. That's completely different from an enterprise migrating an
(48:14):
existing production workload, one that touches thousands of employees and millions of customers, with lots of revenue on the line and lots of business risk, to a completely new database. The risk calculus is completely different. But it's also very promising to hear that there's certainly adoption
(48:36):
and examples of wins and maturity there.
So I want to ask about modernization broadly, and maybe, Shireesh, we can start with you here. For teams that are modernizing data applications on Azure specifically, what's your practical playbook for success?
SPEAKER_01 (48:55):
Yeah, it's a great question. At Microsoft, we definitely have a lot of evidence, and we're starting to see a developer story emerge around Azure AI. It comes down to unifying three levels that I think used to be separate for a long time. There is the developer experience, which we go after with
(49:15):
Copilot and VS Code. There is the data foundation, which is where Microsoft Fabric and Azure databases play. And then there's an application runtime in between the developer experience and the data foundation, which is basically what we have with Foundry, for agents and AI workflows. So that's the three-piece stack that Microsoft
(49:37):
goes with: a developer experience on top, an application runtime, and then a data foundation. That combination is powerful. Developers don't have to juggle multiple tools anymore. They can just open VS Code, use Copilot to design prompts and queries, or really go deeper if they're a pro developer, or
(49:57):
use something like GitHub Spark for just getting started with vibe apps. And then you get powerful extensions where you can see the data governed right there. You can push to production easily, you can monitor, and you have all these key pieces from Foundry coming in to make that stack really, really helpful.
Now, if you're a team modernizing data apps on Azure
(50:19):
specifically, since that's your question, the playbook I would generally use has a few steps. The first step, I would say, is to think about the developer UX. Obviously: tools, tools, tools. You've got to start there and really make it easy. And in a world where there are so many tools to do a lot of these things, it's really important to standardize on a few.
I would suggest VS Code, GitHub Copilot, the Azure AI extensions,
(50:42):
and such. These are very well proven. Developers should test prompts, build workflows, and debug database interactions in their IDEs, and build confidence, because it really helps them. They have to get familiarized, because it's a new era for everyone. Honestly, 99% of developers haven't ever built workflows in this way.
(51:03):
So you have to lower the barriers and build momentum quickly. Betting on some of these tools, VS Code, GitHub Copilot, the AI extensions in Azure, et cetera, is a really good starting point.
Step two, I would say, is to bake in your operational guardrails. And this is data-specific. This is where Fabric and Azure databases, with built-in
(51:25):
governance, row-level security, workload isolation, all those things, wrapped with Azure Policy, et cetera, come in really handy. For every AI-generated query, you've got to have a very clear path: what is the MCP strategy, what is the governance strategy, what is the observability, et cetera. Those guardrails need to be very clearly defined, with rate
(51:46):
limiting from day one, for instance. So that is step two.
Then step three, I would say, is that you've got to choose workload wins that prove the point before you really go and scale. This is perhaps the most common mistake customers make. You want to really get something going, a clear pain point that you can go build against, and then establish your
(52:09):
playbook. We do see patterns like copilots for analytics teams that accelerate SQL and BI in Fabric; agentic-style apps in Foundry that automate things like customer support and IT ops with clear SLAs; high-volume, cost-sensitive queries for databases. Those are a few workloads you could get going
(52:31):
with, right? You win them, prove the point, and then scale from there.
So the net effect I'm describing is that developers stay in their flow, ops gets the guardrails they want, and business leaders see a quick proof of performance, cost savings, and reliability. You can totally achieve those three things I touched on,
(52:54):
for the developers, the ops teams, and the business leaders. So this is the playbook that I generally discuss with many of our customers. Then you can go from experimentation to true modernization on Azure at a large scale. We've got all the tools; we have everything you need at every step of the way. But being very clear about the stack, and going in
(53:17):
this order, can really simplify things. That's a good playbook, in my mind.
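To ground the row-level-security guardrail from step two: with an RLS policy in place, an agent role only ever sees rows for its own tenant, no matter what SQL the model generates. A minimal sketch, with hypothetical table, column, and role names:

```python
# A sketch of row-level security as an AI-query guardrail: the policy
# filters rows by a tenant setting the gateway pins per session.
# Table, column, and role names are hypothetical.
import psycopg

STATEMENTS = [
    "ALTER TABLE orders ENABLE ROW LEVEL SECURITY",
    # The agent role sees only rows whose tenant matches a session
    # setting established by the application layer:
    """
    CREATE POLICY tenant_isolation ON orders
        FOR SELECT TO agent_ro
        USING (tenant_id = current_setting('app.tenant_id')::bigint)
    """,
]

with psycopg.connect("dbname=app user=admin", autocommit=True) as conn:
    for stmt in STATEMENTS:
        conn.execute(stmt)

# At query time, the gateway pins the tenant before handing the
# connection to the agent:
with psycopg.connect("dbname=app user=agent_ro") as conn:
    conn.execute("SET app.tenant_id = '42'")
    rows = conn.execute("SELECT id, total FROM orders").fetchall()  # tenant 42 only
```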
SPEAKER_02 (53:22):
Alok, I wanted to ask you specifically: we've worked with so many companies that look at the data foundations that Shireesh mentioned in the modernization process. What are your recommended strategies for leaders who are going through a major modernization?
SPEAKER_00 (53:41):
I mean, I think from
my perspective, I think the last
point, right, about businessleaders trying to do it
piecemeal is very important.
Because often I think that um,you know, they could take sort
of like a like a, hey, let'sjust do this thing.
And then if something doesn'twork, then they start
re-questioning and going backand saying, hey, was this the
(54:02):
right choice or not?
So, and I've seen multiple folksbe successful, large uh
companies that have literally 50to 100,000 databases be
successful in this is byidentifying workloads that could
take advantage uh of some of thecapabilities, uh, let's say, um,
on the cloud, either services orinfrastructure, uh, storage,
(54:22):
compute, uh, elasticity, scale.
And so then the question is, what do we do with the data, right? So I think it's key that if you're trying to deliver a new contained workload in the cloud, you identify, number one, what is that workload, and number two, what is the data you're going to need?
(54:45):
And that's where at least my conversations start, right? We're giving a lot of choice to these businesses to say, look, you can actually take subsets of this data and safely go try out the newer
(55:05):
service, platform, infrastructure, storage system, compute, whatever choices you've made, while still allowing your workloads to run there so you can see whether they meet your requirements and your SLAs or not. And I think that is what the modernization journey looks like to me, right?
That's what I would recommend. Because oftentimes, if you do a lift and shift, I think
(55:27):
it works for test and dev types of workloads, it may be fine there, but for production it's a little risky. And shutting down a business in a 24-by-7 global economy is no longer an option. So what that looks like is having a proper testing methodology for getting there.
And to be more specific, you can loosely
(55:51):
think of this as either database or platform modernization or application modernization, and then have a path to it by saying, I'm going to keep concurrent systems, and that gives me the choice. So I think that should be a key part of this.
Along the way, what we do see in modernization is that a tool set is needed, because rarely do we see people
(56:14):
say, okay, I'm running a workload, I'm just going to lift and shift it and run it somewhere else. They want to take advantage of the move at the same time: hey, for all these choices we made 15, 20 years ago, maybe instant alerting based on some trigger wasn't available back then, and I do want to address that in my newer system.
(56:34):
So along the way, do I want to revisit some of the design choices for my schemas? Do I want to go in and denormalize certain things or normalize certain things? So now you're introducing the requirement for a tool set that lets you address this in a more
(56:56):
comprehensive manner, rather than leaving it up to individual developers to go in and do that.
So, number one, introducing those changes; number two, tracking the state of those changes and the metadata around them, so you're aware that, hey, there are multiple systems involved running with different configurations, but somebody knows about it and that's
(57:18):
queryable. That's one part of the modernization that comes in here as well. So I think those are two key pieces.
And at least from the Striim side, I'll tell you that as we've revisited the modernization journey, we've also seen customers say, well, look, I'm actually doing a lot of what I call instant
(57:40):
dynamic alerting and dynamic decisioning through polls, right?
And so there's this ability to introduce that along the path, especially if this is going to be a multi-year journey, right? If you put that in the pipeline, that's where you add modernization intelligence to the whole system, and you get into some of the interesting concepts like
(58:03):
the difference between historical deep analytics and streaming real-time analytics, or doing alerting based on push-based events as opposed to poll-based events. That's also part of modernization, because look at consumers today, right? If you talk to anybody who's under 20, they don't quite understand this concept of getting an email the next day,
(58:25):
and they almost laugh at it, right? And we laugh at it too.
And that's what it is, because you're seeing devices where people are dynamically saying, oh, let's change this route, order this, get my food delivered by the time I'm there, my Uber is there. That is the mindset. So I think there's a great opportunity to modernize if you're also trying to say, let's take advantage of these newer
(58:48):
types of consumer-oriented experiences within my modernization journey.
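A minimal sketch of that push-based pattern, with an in-process queue standing in for a real change feed or CDC stream; the event shape and alert function are illustrative:

```python
import queue
import threading
import time

# A toy event stream: in production this would be a change feed or
# CDC pipeline, not an in-process queue.
events: queue.Queue = queue.Queue()

def alert(event: dict) -> None:
    print(f"ALERT: {event['msg']}")

def push_consumer() -> None:
    # Push-based: blocks until an event arrives, then reacts immediately.
    while True:
        event = events.get()
        if event["severity"] == "high":
            alert(event)

threading.Thread(target=push_consumer, daemon=True).start()

# The producer pushes the event the moment it happens...
events.put({"severity": "high", "msg": "inventory below threshold"})
time.sleep(0.1)  # let the consumer thread fire in this toy example

# ...whereas a poll-based design would re-query on a timer and only
# notice the change on the next tick, adding up to a full polling
# interval of latency before the alert goes out.
```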
So, long-winded answer, but I would say there are four phases here. First, there's, hey, I just want to adopt the cloud, I want to migrate to the cloud. Then there's, hey, along the way I want to re-platform certain things, redesign certain things. Then there's the third piece, which is, hey, can I revisit
(59:10):
some of the choices I'm making in terms of, say, where I do my analytics? And the fourth one, obviously, is readiness for the AI workload in the cloud, and what's needed for that.
I didn't touch on that earlier, but an example would be: if I have a lot of unstructured documents I've collected over the last 30 years as a healthcare organization, maybe it's a perfect time to actually create vector
(59:32):
embeddings and put them into the newer system, so that AI services can take advantage of them through semantic search and so forth, through RAG patterns. And I think that is what the modernization journey should look like, those four pieces, in my view.
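A minimal sketch of that embed-and-search step; the embed function here is a random stand-in for a real embedding model, and a production system would call an embedding API and store the vectors in a database vector index rather than a Python list:

```python
import numpy as np

# Stand-in for a real embedding model: deterministic per text within a
# run, unit-normalized so cosine similarity is just a dot product.
def embed(text: str) -> np.ndarray:
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(8)  # toy 8-dimensional embedding
    return v / np.linalg.norm(v)

documents = [
    "Patient discharge summary, cardiology, 2003",
    "Radiology report, chest X-ray, 2011",
    "Billing dispute correspondence, 1998",
]
index = [(doc, embed(doc)) for doc in documents]

def semantic_search(query: str, k: int = 2) -> list:
    """Return the k documents whose embeddings best match the query."""
    q = embed(query)
    scored = sorted(index, key=lambda pair: float(q @ pair[1]), reverse=True)
    return [doc for doc, _ in scored[:k]]

print(semantic_search("heart condition records"))
```

With a real model the nearest neighbors reflect meaning rather than keywords, which is what makes the retrieved documents useful grounding for a RAG-style answer.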
SPEAKER_02 (59:45):
I know we mentioned vector indexes a couple of times. They're a lot more powerful than people give them credit for, almost magical. In my anecdotal experience, it's always been this magical thing that lets you interface with data in ways you didn't imagine possible. I know it's comparable to search and,
(01:00:06):
you know, regex-based matching, but it can be really powerful when you lean into its capabilities for classification, summarization, sentiment analysis, things along those lines.
So I think it is good advice: whether
(01:00:26):
you're an enterprise taking a 20-year-old, locked-down on-prem database that can't connect to anything and just runs its workload, and now you're moving it to the cloud, as you go through that journey, think about what the art of the possible is. Because now this database can scale horizontally, and I have VS Code and Copilot and all these amazing developer tools to do so much more with the rich
(01:00:50):
operational data this database is handling in real time, whether through embeddings or streaming analytics or AI agents with MCP. And it's good to get ahead of that, both in the migration process and in the modernization design, and in what the future looks like.
Because the main thing you get there is
(01:01:13):
speed of execution, and reaching the ROI of the modernization so much faster, rather than just saying, hey, we were doing this on-prem, now we're doing it in the cloud, almost an apples-to-apples comparison. It's really a lot more powerful than a lot of companies can imagine, in addition
(01:01:33):
to getting a much better developer experience and internal velocity. So definitely great advice there.
And then, for companies that are already in the cloud, I still see this segregation between the operational database workloads and the analytical and AI workloads.
(01:01:55):
And it was really great talking to both of you about this, because they really have to be converged. I think the magic is really when you combine the operational workloads with the AI, and we're just scratching the surface of agents. And Sharish, like you said as well, MCP is the bridge between these AI agents, these deep agents that have
(01:02:18):
autonomy and can solve problems with tools, and the rich operational data, whether it's the customer data or the user-facing application, so you get the full value there. Because that's how companies get to, and Alok elaborated on this too, the
(01:02:38):
modern experiences that people expect, whether it's a chat interface, because chat is taking over the user experience in so many applications. Taking your legacy application, which might have been built in 2018, just five, six years ago, and turning it into an AI-native application.
(01:03:00):
I think that's where engineering leaders and executives really have to look at combining the database with the AI.
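As a rough sketch of that bridge, here is what exposing an operational-data lookup to agents can look like, assuming the official Python mcp SDK's FastMCP interface; the server name, tool, and returned fields are illustrative:

```python
from mcp.server.fastmcp import FastMCP

# Illustrative MCP server exposing one operational-data tool. A real
# server would run a parameterized query against the database instead
# of returning a canned record.
mcp = FastMCP("orders")

@mcp.tool()
def lookup_order(order_id: str) -> dict:
    """Return the current status of an order from the operational store."""
    return {"order_id": order_id, "status": "shipped", "eta_hours": 6}

if __name__ == "__main__":
    mcp.run()  # serve the tool so an MCP-capable agent can call it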
SPEAKER_00 (01:03:05):
Yeah, and maybe, John, I'll add one quick thing: we've been working on some interesting stuff that has to do with governance and validation and so forth. And this is my personal request to Sharish as well, something we can actually think about. So imagine that these systems are SQL and Postgres and Cosmos DB, right? And there are subsets of data that are
(01:03:27):
supposed to be identical. One of the interesting things is that we have this validation capability now that we're adding, to go in and compare these things. But I don't want to stop there, right? I want, Sharish, for you to expose some engine or MCP interface where I can also say, hey, these things don't look the same.
(01:03:47):
Were you responsible for replication between these systems? And if so, can you tell me about these keys? When was the last time you had visibility into this stuff, right? So this is truly taking the isolated transactional view within the context of one system and conversationally going across these multiple agents to
(01:04:08):
truly answer these kinds of questions, which are super interesting for forensics and troubleshooting and debugging, all the things our engineers spend so much time on today.
Absolutely.
Yeah, so I think that's what I'm really excited about.
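A minimal sketch of the comparison step behind that validation idea, hashing rows per key and flagging drift; the row shapes are illustrative, and real reads would come from the two systems being compared:

```python
import hashlib

def row_digest(row: dict) -> str:
    """Hash a row with a stable field order so digests are comparable."""
    canonical = repr(sorted(row.items()))
    return hashlib.sha256(canonical.encode()).hexdigest()

def diff_systems(rows_a: dict, rows_b: dict) -> list:
    """Return keys whose rows differ or exist on only one side."""
    mismatched = []
    for key in rows_a.keys() | rows_b.keys():
        a, b = rows_a.get(key), rows_b.get(key)
        if a is None or b is None or row_digest(a) != row_digest(b):
            mismatched.append(key)
    return mismatched

# Example: the same key has drifted between the two stores, so "42"
# is flagged for the kind of follow-up questions described above.
print(diff_systems(
    {"42": {"status": "shipped"}},
    {"42": {"status": "pending"}},
))
```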
SPEAKER_01 (01:04:20):
Yeah, especially on the cyber tech side, for instance. These kinds of questions really matter a lot.
SPEAKER_00 (01:04:25):
Yeah, yeah,
absolutely.
SPEAKER_01 (01:04:27):
Cybersecurity, I
meant to say.
Yeah.
SPEAKER_02 (01:04:29):
Shireesh Thota, Alok Pareek, thank you so much for joining this episode of What's New in Data. I think it's extremely valuable for all the listeners today. So thank you for generously sharing your insights and experience, and what you're seeing through the work you're both doing right now with all these very innovative, scaled-out companies
(01:04:50):
launching their next-generation applications. So thank you so much for joining.
SPEAKER_01 (01:04:54):
Oh, thank you. Thank you, John. Thank you, Alok, for giving me the opportunity to join you guys today. I really enjoyed going deeper and talking about many of these technical aspects. It's an exciting time, and once again, I'm thankful for the opportunity to talk to you all.
SPEAKER_00 (01:05:10):
Yeah, and thank you, John, again for having me. And Sharish, it was a pleasure. Thank you. I look forward to multiple future conversations with both of you. Thanks.