
May 1, 2025 41 mins

Spencer Cook, Lead Solutions Architect at Databricks, joins to unpack how enterprises are moving beyond hype and building practical AI systems using vector search, RAG, and real-time data pipelines. He and John Kutay get into what it really takes to serve production LLMs safely, avoid hallucinations, and tie AI back to business outcomes—without losing sight of governance, latency, or customer experience.

What's New In Data is a data thought leadership series hosted by John Kutay, who leads data and products at Striim. What's New In Data hosts industry practitioners to discuss the latest trends, common patterns for real-world data architectures, and analytics success stories.


Episode Transcript

Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Speaker 1 (00:05):
Welcome to What's New in Data.
I'm your host, John Kutay.
In this episode, I'm joined by Spencer Cook, Senior Solutions Architect at Databricks.
We dive into how enterprises are deploying generative AI and agentic applications at scale, the critical role of real-time, high-quality data, and what's next for streaming and RAG architectures.

(00:26):
If you're a data engineer or just curious about building real business value with AI, this one's for you.
Let's dive right in.

Speaker 2 (00:42):
Spencer, how are you doing today?

Speaker 3 (00:45):
Doing awesome.
I've been looking forward to this, so excited to be able to do this, you know, squeeze it in right before the holidays.

Speaker 2 (00:53):
Yeah, perfect timing going right into the holidays, wrapping up record-breaking years for both Databricks and Striim, and What's New in Data.
So, you know, just a lot to be thankful for this holiday season, Spencer.
First, just tell the listeners about yourself.

Speaker 3 (01:11):
Yeah, absolutely. Thanks, John.
So I have been at Databricks coming up on four years now in June. Before that I was pretty deeply involved in the Azure data space generically, and at the time there was a lot of web apps, but also data management and obviously Azure

(01:31):
Databricks, and so that's been my background and that's a lot of what I do at Databricks.
I mostly just help our financial services customers in particular leverage our platform in the cloud to solve their problems.

Speaker 2 (01:47):
Absolutely, and you know this is going to be a fun episode, because I love having guests who are always really in the weeds of things and can speak to real-world experiences, either building these tools for the product or building these tools in the field for customers, where it's really deployed in the real world.
Both of them have their own unique challenges.

(02:11):
So you've done some awesome work, specifically with LLMs and actually getting business value at large enterprise scale.
How are you working with customers to actually innovate with AI and LLMs in a way that's showing visible business value?

Speaker 3 (02:31):
Yeah, so there's a lot of talk about the types of use cases that we're seeing business value from, I think, with LLMs, and a lot of it's around information retrieval, but then also coding assistance generically. And, you know, that's all fine, but getting there is really challenging still, and it tends to be the same process that we saw with traditional ML

(02:56):
or even doing analytics, where a lot of it is around getting the correct business processes in place, cleaning and organizing your data, and basically getting reliable data into the right systems at the right time to be able to do these more sophisticated techniques.
It's almost like LLMs are the dessert: you first eat

(03:19):
your vegetables and your protein, and that's the nice reward at the end.

Speaker 2 (03:25):
Yeah, that's a great analogy.
Making the data accessible to the LLM, you know, sounds easier said than done.
Really, everything around large-scale data management, indexing the data, chunking it for LLMs, and vector

(03:48):
storage, and all these challenging things that go on in the background to really make it so the data can bubble up to the user as a chat-driven experience.
We all know ChatGPT, but a lot of enterprises are trying to do some form of chat with enterprise data in one way or

(04:08):
another, either internally for knowledge bases or externally for support or customer experiences.
What are you seeing in terms of real-world adoption there?

Speaker 3 (04:22):
Yeah, I think that there is real adoption.
You know, I personally have been involved with a lot of use cases that are in production in various stages throughout this year. I think that a lot of what companies are trying to do is figure out basically what they

(04:43):
can do, take value while still prioritizing safety, and not put themselves in a position where they're over their skis and they're on the front page of the Wall Street Journal, kind of a thing.
Even in a lot of our presentations internally, when we talk about data intelligence platforms, we show the use case

(05:06):
where the guy was talking to, like, a Facebook chat for a Ford dealership, and he's like, you know, you're going to give me a car for $1, you know, doing basic prompt engineering. And so I do think that caused some initial skepticism and maybe even fear.
But because we have kind of established primitives for

(05:27):
things like guardrails and putting security and governance on top of LLMs, I do think we're in this second wave of, you know, enterprise adoption, and we're doing it at the correct, reliable scale.

Speaker 2 (05:44):
That's excellent. What are some of the things that are required?
Like you mentioned, you know, AI can hallucinate and give users the wrong answer and make them think they can buy a car for one dollar, as an example, which is definitely in the realm of possible outcomes, especially when you're dealing with these kinds of probabilistic models.

(06:05):
So now, to make this real, how do customers deploy LLMs and GenAI in a way where it doesn't hallucinate, especially with customers as the end users?

Speaker 3 (06:18):
Yeah.
So a great use case that a lot of us have, I think, been accountable for is where, say, the CEO is your customer, or some internal leader, and an AI engineer, inside of this Genie space, basically asks prompts, gets answers back, etc.

(06:51):
And as it's generating SQL code to answer these questions, when you say, hey, that's spot on, that's how you calculate fiscal year over year, you can certify that as an answer.
And so when someone gets back a response later on, a non-technical user, that shows up as certified and kind of

(07:12):
stamped with the seal of approval that came from that person.
But the LLM is still contextually applying it to questions that come up.
And so if you think about applying that to external users, customers can widely vary. A CEO tends to be at least, like,

(07:35):
an authority on the domain of the company or whatever else. With your customers, that can vary a lot.
So basically, you use standard MLOps techniques that have been augmented for LLMs to collect that type of data and basically pursue those kinds of correct answers for all the different domains. And so we can do a lot of things to help with that,

(07:56):
like help you generate synthetic data, help you track all these different responses that are going back and forth.
But that's generally the idea.
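The certified-answer flow Spencer describes could be sketched roughly like this. Everything here, the store class, the field names, the exact-match normalization, is an illustrative assumption, not a Databricks API; a real system would match questions semantically rather than by string:

```python
# Minimal sketch of a certified-answer store: when a domain expert approves
# a generated SQL answer, it is saved with a "certified" stamp, and later
# lookups of the same normalized question return the approved SQL.

def normalize(question: str) -> str:
    # Crude normalization for matching; a real system would use embeddings.
    return " ".join(question.lower().split())

class CertifiedAnswerStore:
    def __init__(self):
        self._store = {}

    def certify(self, question: str, sql: str, approved_by: str) -> None:
        self._store[normalize(question)] = {
            "sql": sql,
            "certified": True,
            "approved_by": approved_by,
        }

    def lookup(self, question: str):
        # Returns the certified answer if one exists, else None, so the
        # LLM falls back to generating a fresh (uncertified) response.
        return self._store.get(normalize(question))

store = CertifiedAnswerStore()
store.certify(
    "How do we calculate fiscal year over year?",
    "SELECT fy, revenue / LAG(revenue) OVER (ORDER BY fy) - 1 AS yoy FROM sales",
    approved_by="ceo",
)
hit = store.lookup("how do we calculate fiscal year over year?")
```

The point of the sketch is the shape of the loop: expert feedback becomes a stamp attached to the response, and the stamp travels with the answer to non-technical users.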

Speaker 2 (08:07):
Yeah, this is one of the areas where people are trying to align on best practices for LLMOps, and some of this already exists, borrowing concepts from MLOps.
I had two great guests who got into the weeds of this on previous episodes: Abi Aryan, who published a book on LLMOps,

(08:30):
and then Andy McMahon, who published a book on machine learning engineering with Python.
And there are so many ways that you can always sort of fine-tune and come up with deterministic filtering for LLM results.
Some of the vector databases also have capabilities to

(08:52):
pre-filter results in a deterministic way, before doing the LLM-based, whatever it is, nearest-neighbor search on the vector format, which can give you the directionally correct answer.
So enterprises really have to architect their solutions with some of those best practices in mind, especially before rolling

(09:15):
it out.
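The pre-filtering idea mentioned here can be shown with a toy in-memory example: apply an exact, deterministic predicate first, then rank only the survivors by vector similarity. The document list, the tenant field, and the brute-force cosine ranking are all illustrative; production systems use a vector database's native filter support:

```python
# Sketch of deterministic metadata pre-filtering before vector similarity
# search: exact predicates narrow the candidates, then cosine similarity
# ranks what's left.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

docs = [
    {"id": 1, "tenant": "acme", "vec": [0.9, 0.1]},
    {"id": 2, "tenant": "acme", "vec": [0.1, 0.9]},
    {"id": 3, "tenant": "other", "vec": [0.95, 0.05]},  # excluded by the filter
]

def search(query_vec, tenant, k=1):
    # Deterministic pre-filter: only this tenant's documents are candidates.
    candidates = [d for d in docs if d["tenant"] == tenant]
    # Probabilistic part: nearest-neighbor ranking on the survivors only.
    ranked = sorted(candidates, key=lambda d: cosine(query_vec, d["vec"]), reverse=True)
    return ranked[:k]

top = search([1.0, 0.0], tenant="acme")
```

Note that document 3 is the closest vector overall but can never be returned: the deterministic filter guarantees tenant isolation regardless of what the similarity math says.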
So when you're working with customers on this, because you have all this extremely innovative technology all under the Databricks umbrella with AI and data management: these LLM-based applications are, you know, only as good as the data they're trained and run on.

Speaker 3 (09:32):
So how can companies use their data to power real customer-facing AI operations? Yeah, I mean, I think we'd need probably a couple of sessions; maybe we'll do a series on that one. At a high level, I think that by creating, whether you call them data products,

(09:56):
whether you call it, you know, a gold layer at the end of the day, feature store tables, vector store databases, they're very similar to facts and dimensions from the old days with BI reporting. And so you want reliable, scalable processes that can basically hydrate whatever that layer is, with

(10:19):
reliability and also ideally really low latency, because once you get the data quality problem solved, then all of a sudden people start being worried about freshness.
So they're like, this data is great, can you get me more of it sooner?
And so what we see with LLMs that is kind of interesting in particular is you take those same processes, maybe they're

(10:43):
well-established for relational data, like it's easy to write a merge condition, but now all of a sudden we're managing documents, right? We're managing images and video and all these different things.
So one of the aspects that has been interesting about Spark and Delta Lake, which a lot of our platform is still based on, that

(11:04):
drew me to it early, is the abstraction that we can provide over those things. When it's, like, a document, let's extract that into bytes, or let's extract it into tokens and a column, and then we're going to apply the same kind of primitives in Databricks that we would do for other types of AI

(11:25):
processing in the past.
You're kind of standing on the shoulders of giants, not developing a whole brand-new process for this new type of AI.
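The abstraction Spencer describes, extracting a document into bytes or a tokens column so the same tabular primitives apply, might look like this toy sketch. Pure Python stands in for Spark and Delta Lake, and the whitespace tokenizer is a simplifying assumption for a real tokenizer:

```python
# Toy sketch: turn raw documents into rows with bytes and tokens columns,
# the same tabular shape as any other feature, so downstream primitives
# (merge, filter, chunk) work unchanged.

def to_row(doc_id: str, text: str):
    return {
        "doc_id": doc_id,
        "raw_bytes": text.encode("utf-8"),   # document as bytes
        "tokens": text.lower().split(),      # stand-in for a real tokenizer
    }

def chunk(tokens, size):
    # Fixed-size chunks ready for embedding; real pipelines often overlap
    # chunks so context isn't cut at chunk boundaries.
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]

row = to_row("contract-1", "The party of the first part shall deliver goods")
chunks = chunk(row["tokens"], size=4)
```

Once a document is a row with columns, merges, quality checks, and incremental updates look exactly like they do for relational data, which is the point of the abstraction.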

Speaker 2 (11:36):
Absolutely.
So what you're really alluding to is that a lot of these fundamentals in data management and data processing, things that data warehouse architects have applied for years, are also applicable to high-quality AI on accurate data, because the data can be pre-aggregated, pre-filtered,

(11:59):
cleansed, all the things that you would do, if we were to talk about it in the simplest form.
You mentioned facts and dimensions, which is a great way of organizing your data warehouse.
You can do things like star schema.
You can do things like medallion architecture, which is very popular.
You know, basically, you have your raw tables,

(12:19):
which might be your normalized database tables, your raw data coming in from APIs, those documents, things that are just completely not really queryable from an analytical perspective.
They're more representing the underlying application they were sourced from.
You're pre-aggregating, running some compute

(12:44):
to filter that data, to join it, to make the columns make sense, add the right metadata, and ultimately you turn it into some data model that you can throw an LLM at, and the LLM will be able to make sense of it because the column names will be human-readable.
There'll be metadata there.
All that is really positive for those who have strong data

(13:09):
engineering fundamentals, both individually and as a company.
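The gold-layer idea above, cryptic raw columns promoted into human-readable names with attached descriptions so an LLM can reason about them, can be sketched like this. The column mappings and descriptions are invented for illustration:

```python
# Sketch: promote a raw row to a "gold" data model an LLM can make sense
# of: human-readable column names plus per-column descriptions that can be
# fed to the model as metadata.

RAW_TO_GOLD = {
    "cust_nm": ("customer_name", "Full legal name of the customer"),
    "ord_amt": ("order_amount_usd", "Order total in US dollars"),
}

def to_gold(raw_row: dict):
    gold, metadata = {}, {}
    for raw_col, value in raw_row.items():
        # Unknown columns pass through unchanged with empty descriptions.
        name, description = RAW_TO_GOLD.get(raw_col, (raw_col, ""))
        gold[name] = value
        metadata[name] = description
    return gold, metadata

gold, meta = to_gold({"cust_nm": "Ada Lovelace", "ord_amt": 120.5})
```

The renaming is trivial on purpose: the value is in the discipline of maintaining the mapping and descriptions, which is exactly the data-engineering fundamentals the conversation is pointing at.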

Speaker 3 (13:16):
Oh yeah. I mean, I think one of the cool things about our conversations over the years: I think we did one of our first webinars a couple months after ChatGPT was released, and so we were talking about, you know, the things that Striim can provide, the things that Databricks can provide, towards things like BI or analytical

(13:37):
reporting, ML, and now it's kind of all the same stuff that's pulling us towards LLMs.
I think what really drives this home is agents, which is this new trend that's trying to replace RAG as the hot architecture.
You need to be able to pull answers from not just your

(13:59):
documents, but also from your old analytical reporting. Like, you might need a SQL query to help an agent answer a question, and so all that stuff that was old-school now is a part of your LLM system as well.
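The routing idea behind an agent that draws on both documents and SQL could be sketched minimally like this. The keyword router and both stub tools are illustrative assumptions, not any real agent framework; real agents let the LLM itself choose the tool:

```python
# Minimal sketch of agent tool routing: analytical questions go to a SQL
# tool, everything else to a document retriever. Both tools are stubs.

def sql_tool(question: str) -> str:
    # Stub standing in for text-to-SQL plus warehouse execution.
    return "SELECT region, SUM(revenue) FROM sales GROUP BY region"

def doc_tool(question: str) -> str:
    # Stub standing in for retrieval from a document index.
    return "Top passage from the knowledge base"

ANALYTICAL_WORDS = {"sum", "total", "average", "revenue", "count"}

def route(question: str):
    words = set(question.lower().split())
    if ANALYTICAL_WORDS & words:
        return "sql", sql_tool(question)
    return "docs", doc_tool(question)

kind, answer = route("What is total revenue by region?")
```

The takeaway matches the conversation: the "old-school" SQL path is not replaced by the agent, it becomes one of the agent's tools.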

Speaker 2 (14:15):
Yeah, absolutely. And I came across a really interesting post on LinkedIn, actually, from Eric Elson, and he's done some incredible work with AI and LLMs.
And what he said in his LinkedIn post is: the returns

(14:35):
from increasing compute and model size tend to be logarithmic.
That means that the next 1,000x will give us less than the last 1,000x, and much less than the last hundred million x.
But all is not lost.
Models will continue to get better at being useful to humans.
In fact, we've only started down this path.
We're entering an exciting era where the most interesting problems are at the interface between the models and the real

(14:57):
world.
And he wraps it up by saying he's excited to work on these problems at Databricks.
So it is really all coming together, where inference time is where a lot of the innovation and IP and deployment will be focused at this point.
I think this does tie back to what you're mentioning around

(15:22):
making sure the data is properly modeled in a way that LLMs, and even small, more precise language models, can work with, and essentially come up with the right way to summarize or action or generate text from that data.

Speaker 3 (16:07):
Absolutely.
I think one of the challenges, like maybe a next kind of frontier, is getting an Excel report to agree with the entire BI report; anyone who's tried knows how complicated that is.
But at Databricks we have a tool called AI/BI that essentially is trying to thread those two things together to deliver that kind of an experience.

Speaker 2 (16:25):
Yeah, tell me more about AI/BI.

Speaker 3 (16:43):
Yeah, so it starts with building visuals.
What's interesting is, when you click on any of the visuals, it's a text-based prompt, so you say, this is the data I want back, and it decides whether to do a bar chart, line chart, whatever. It writes the SQL aggregation for you.
There's no, you know, DAX or MDX here.
And then what's really interesting is, once that's done

(17:07):
, you get a button that allows you to just ask questions of the same data model in, like, natural language, and that's what I was alluding to earlier, where those responses you can certify, you can build them into reports of your own. With the same

(17:32):
data model that's backed by, you know, the catalog, you get this ability to ask questions in either a natural-language frame or a traditional BI frame, and they're both going to return the same answers against the same data.

Speaker 2 (17:40):
Yeah, this is mind-blowing in terms of the amount of innovation here, where now, with applying AI to BI, you're able to automate more insights.
And, you know, I think there have been a few people commenting that point-and-click BI experiences might get replaced

(18:02):
by more low-code: have GenAI go, you know, take a natural-language query and just run it against my data, and then give me the results in a way that I can best interpret, either via charts or just summarizing the data for me in plain English.

(18:23):
And it's sort of one of the interesting areas I will continue to see evolve.
I don't think the standard report is going away, but when people actually look at reports, the follow-up question is always, you know, what does this data actually mean, right?

(18:44):
So I think having the AI/BI experience where it can just summarize the data for you, summarize the insights, and add perspective to it, that's actually something that LLMs are good at, assuming that the underlying data has all the context that it needs, which is the tricky part, which, you know, of course, Databricks and Striim both work on this problem

(19:06):
respectively in different ways.
So that goes to my next question: there's batch-driven AI models, and then you have continuously updated real-time AI systems that use incremental inference on data that's constantly changing.

(19:27):
I'm curious to see what you're seeing out in the field in terms of what's being deployed and the best way to work with real-time data versus batch data for AI.

Speaker 3 (19:41):
Yeah, I always love the batch versus real-time question.
I hope that we always kind of get to talk about this within the data space.
Batch for AI is something that I don't think is ever going to go away, because there are a lot of use cases, almost like in the customer 360 or experience enrichment space, where, like,

(20:06):
they don't change that frequently, but you need a lot of them.
So maybe you have 300 million customers.
You want to give them all a custom, you know, background when they log in. That doesn't need to change hourly, right? Maybe it doesn't even need to change weekly.
So one of the things at Databricks that we just launched was a much faster batch inference system behind

(20:31):
our AI query tool for Databricks SQL, and so basically what it allows you to do is take any model against a giant batch of data and just do inference in a highly accelerated, concurrent way. Classic Spark, right? And so we're still excited about that.
We're still innovating in that.
We also were one of the first commercial platforms to offer

(20:56):
continuous fine-tuning, which basically means you have a model, like Llama, whatever else.
It kind of knows all about the internet, all this public vocabulary.
Hopefully it doesn't know about your IP, and so your subject-matter-expert terms.
Yeah, right.
So basically, in, say, an internal chat, you can be

(21:18):
feeding it tokens and vocabulary and project names and sprint names and all this different stuff from your internal system, and it just is picking that up as vocabulary, almost like how we pick up language in the real world.
So I think you'll see both continue to be relevant.

(21:39):
But Databricks has some pretty cool innovation happening in both areas too.
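The concurrent batch-inference shape described here, a model fanned out over a large batch of rows, can be shown in miniature with a thread pool. The model function is a placeholder, not a Databricks API; in practice each call would hit a model-serving endpoint:

```python
# Sketch of concurrent batch inference: run a (stubbed) model over a batch
# of rows with a thread pool, preserving input order in the results.
from concurrent.futures import ThreadPoolExecutor

def model(text: str) -> str:
    # Placeholder for a real model endpoint call (network-bound in practice,
    # which is why a thread pool helps).
    return text.upper()

def batch_infer(rows, workers=8):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in the same order as the inputs.
        return list(pool.map(model, rows))

results = batch_infer(["hello", "world"])
```

A thread pool suits the endpoint-call case because the work is I/O-bound; a CPU-bound model would want processes or a framework like Spark, as the conversation suggests, to get real parallelism.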

Speaker 2 (21:45):
Yeah, absolutely, and continuous fine-tuning is certainly an area that data teams are continuing to explore.
The interesting part here is, with RAG, you can have these larger and larger context windows for the model to work

(22:06):
with, and, on the other hand, what you can do is fine-tune and update the model directly. So you see cases like that, and then you can update the data that's accessible to the model.
So, you know, for instance, when we work with UPS, they have some AI-driven policies to

(22:31):
protect packages that are delivered, which ultimately comes down to better customer experience, because the package is more likely to arrive at their doorstep.
They branded the solution as battling porch pirates, and that's one of the examples where they're taking the real-time shipment data, claims data, you

(22:53):
know, other types of data, pulling it into their delivery defense system, which is powered by AI, and that has to rely on the real-time operational data coming in.
So that's one of the other things: from their perspective, they work with Striim to just pull in the new data and make it accessible to the model, rather

(23:14):
than fine-tuning the model itself.
It's a form of RAG, right?
Retrieval-augmented generation.
So lots of interesting approaches.
What do you see as being most adopted, from your perspective?

Speaker 3 (23:29):
You know, RAG still continues to dominate.
I do know of a lot of customers that are starting to adopt agentic systems.
We have some public use cases, I think it was at Data and AI Summit, where we're talking about query language, where it's basically this, you know, model that does text

(23:52):
to SQL, but for their proprietary query language within the FactSet tool, and they're basically combining commercial frontier models with more foundation models that are fine-tuned on their tokens, like I alluded to earlier, where it knows their different types of example queries, and they're

(24:15):
combining that in an agentic way to basically leverage the best of each type of system and ecosystem.
So I think that will continue to rise. To your point, though, RAG never really went away, but it's having a new emergence in the way that we can add new types of augmented data.

(24:38):
So, instead of, like you said, not everything has to go through a vector store; that can actually be kind of bad for latency.
With the ability to query directly off of a stream, for instance, you don't necessarily need to know the vector embedding relationship between those terms.
They're in the same topic on a stream, and there is a natural

(25:02):
kind of temporal relationship in the streaming events, and so you can pull something like the most recent customer in the last five minutes directly into a RAG response without it needing to go through a vector database and all these different things.
So I think that, to your point, RAG and streaming are a really

(25:22):
cool match, and I think we'll continue to see innovation in that area.
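The stream-backed RAG pattern Spencer describes, pulling the last few minutes of a customer's events straight into the prompt with no embedding lookup, can be sketched like this. The event shape, the five-minute window, and the prompt format are illustrative assumptions:

```python
# Sketch of stream-backed RAG context: select a customer's events from a
# recent time window (no vector search) and splice them into the prompt.
import time

def recent_events(events, customer_id, window_seconds=300, now=None):
    now = time.time() if now is None else now
    return [
        e for e in events
        if e["customer_id"] == customer_id and now - e["ts"] <= window_seconds
    ]

def build_prompt(question, context_events):
    lines = [f"- {e['event']}" for e in context_events]
    return "Context:\n" + "\n".join(lines) + f"\n\nQuestion: {question}"

now = 1_000_000.0
events = [
    {"customer_id": "c1", "event": "opened support ticket", "ts": now - 60},
    {"customer_id": "c1", "event": "logged in", "ts": now - 4000},  # too old
    {"customer_id": "c2", "event": "placed order", "ts": now - 30},
]
ctx = recent_events(events, "c1", now=now)
prompt = build_prompt("Why did the customer contact us?", ctx)
```

The selection key is temporal (events in the same window, same topic), not semantic, which is exactly why no vector embedding is needed for this slice of the context.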

Speaker 2 (25:27):
Yeah, of course, of course.
And, you know, we're always excited when we have a joint customer that's using both Striim and Databricks, supporting hundreds of streaming connectors into Delta Live Tables, which is a great way to ingest data in a way that tracks incremental changes and bring it into the data lake, and then

(25:49):
from there, you can run all your processing within Databricks, like we said, prepare that data for analytics and AI use cases, do all the joins, the data modeling, setting up your facts and dimensions, setting up your gold tables, your platinum tables, data quality rules in there, absolutely.
A lot of powerful stuff.

(26:12):
So the other thing I want to ask about is, have you seen examples where stale data can impact AI accuracy and decision-making?

Speaker 3 (26:24):
Yeah, say that one more time. Stale data, meaning... oh, stale data.
Sorry, I heard sale.
Yeah, stale data. No, you're great.
Stale data. I'm sure that there is, you know, a quote, like, from Mark Twain or something about this, but it's like, misinformation can

(26:45):
be more dangerous than no information, right?
Where it's like, if anything, if you're going to return a stale result, it would be better to just say, I don't know, I need to refresh, right?
Oftentimes they're giving the wrong answer, and so I think in a normal data engineering system, a normal analytical process, you

(27:07):
might even design it that way, like you might have time-to-live, things like that. Often in LLMs, you're presenting just a blunt paragraph of information to the user.
That's context-free, right? And so I think stale data can be even more dangerous than ever when we think about these types

(27:27):
of AI systems.
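The time-to-live idea raised here, preferring "I don't know, I need to refresh" over serving a stale paragraph, can be sketched in a few lines. The cache-entry shape and the ten-minute TTL are illustrative assumptions:

```python
# Sketch of TTL-gated answers: if cached context is past its time-to-live,
# decline rather than present stale information as fact.
import time

def answer_with_ttl(cache_entry, ttl_seconds=600, now=None):
    now = time.time() if now is None else now
    if now - cache_entry["fetched_at"] > ttl_seconds:
        # Better to admit staleness than hand the user a wrong answer.
        return "I don't know, I need to refresh."
    return cache_entry["value"]

now = 1_000_000.0
fresh = {"value": "Gate B12, boarding 18:40", "fetched_at": now - 120}   # 2 min old
stale = {"value": "Gate A3, boarding 09:10", "fetched_at": now - 7200}  # 2 h old
```

In an analytical report the timestamp usually travels with the data; in an LLM answer the context is stripped away, which is why an explicit TTL gate like this matters more, not less.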

Speaker 2 (27:31):
Yeah, definitely.
And especially if you're looking at customer-facing experiences. Let's just take airlines, for example: I made a change in my reservation, and I want to go into a chat experience and say, hey, give me my latest boarding pass, right?
And if whatever the chat experience is lags behind the actual

(27:55):
core reservations database, it's going to give them the wrong boarding pass or give them some sort of error, right?
So this ultimately, you know, stale data ultimately materializes in poor customer experiences, especially when it's more operational than the analytical use cases where you're summarizing your annual sales performance.

(28:16):
Sure, of course, you want to make the right trade-offs and say, okay, there we can rely on more batch processing and less on incremental data.
So it's always about looking at the use case.

Speaker 3 (28:33):
Absolutely.
And, like, people get excited about things like lineage and audit trails and this different stuff.
I think a lot of that is good for compliance. Hopefully you're only getting audited, you know, once a year, like, in a predictable manner, unless things go south.
I think day-to-day, kind of operationally, lineage is really

(28:53):
powerful, because you can better understand the relationship between tables.
But if something is stale, you can kind of understand the production line upstream and hopefully optimize it, work with that team to get your data faster, and things like that.

Speaker 2 (29:11):
Yeah, absolutely.
And the other part of bringing AI into enterprises is, of course, another foundational element, which is data governance.
So how is data governance impacted by AI?

Speaker 3 (29:27):
Yeah, I think you can see it as, like, a blocker or an accelerator.
In my opinion, it can really be an accelerator, because, like an engine, when you have lineage, when you have audit trails of how people are using all the different objects in your data platform... In the Databricks case, like, this isn't easy, but in

(29:51):
Databricks we make it easy.
You can basically create what we term data intelligence, where you're combining that metadata as basically another rail of information that goes into an LLM.
So when you ask the assistant, like, hey, find me a table about X, it knows that you're part of this team, it knows the other

(30:16):
tables you've accessed in the past, etc.
So we try and bring all those things together.
But I think it also has made it so that we can innovate in areas where a lot of AI tends to struggle, which is, like, regulated industry.
I was just at Money 20/20, a couple months ago I think now,

(30:38):
and Mastercard did a great press release where they have a GenAI assistant platform that helps customers with onboarding.
But one of the things they highlighted was, basically, by deploying it on Databricks, they had more comfort with things like governance, and that allowed them to basically innovate

(30:59):
faster, because they were able to be comfortable with governance being kind of a first-class citizen in that data intelligence model.

Speaker 2 (31:11):
Yeah, that's a super powerful case study.
And, of course, Money 20/20 is one of the great events in the finance industry.
I believe it's in Vegas this year, right?

Speaker 3 (31:20):
It's always in Vegas, like the big Venetian hall, and it's so interesting.
It's, like, the kind of way to see what the sort of fintech meme of the year almost is, where you can kind of see the trend that everyone's going to be talking about in 2025.
That was sort of the hot topic of, you know, Money 20/20 in

(31:42):
2024.

Speaker 2 (31:45):
Yeah, it's definitely going to be an area where we're going to see continued innovation and adoption in the enterprise, and, especially, bringing business value back that wasn't possible before AI. Just the speed at which you can iterate and launch transformational products with data and AI.
And then, yeah, making it easier than ever for data

(32:11):
engineers to really have leverage and bring value to the business, because before, data engineers were mainly focused on bringing reports online.
Now it's these AI-driven applications.

Speaker 3 (32:22):
It's so funny you bring that up.
I was a data scientist by training, but I kind of, you know, got into the world for the first time where you have, you know, dirty data. It's not, you know, being prepared by your TAs, you know, and it turns out there are way more data engineering problems than there were, like, ready-to-be-actioned

(32:46):
data science problems, if you look, say, 10 years ago.
And so that's kind of what got me into data engineering and a lot of these areas like governance in the first place: to try and create that, you know, era where we can finally do data science at scale.

Speaker 2 (33:01):
you know, like we've kind of all been dreaming up for
a while yeah, and definitelyaugmentinging all the human work
required for data science andI've worked so many data
scientists who just end up beingdata engineers because that's
where 99% of the work isactually required to make the

(33:22):
data usable, you know so thiscan only be accelerated by AI.

Speaker 3 (33:29):
Yeah, I think we had a pretty interesting article, like, a year ago, and then we updated it recently.
It's basically, like, what do these assistant tools, and kind of what does a Databricks Assistant, mean for data engineers?
It is pretty interesting.
I really think that it's not disrupting people's careers.

(33:51):
I think that, if anything, this lets you focus on the stuff that you care about as a data engineer, instead of the stuff that you would consider, like, outside your role, or, like, boilerplate.
So I think, you know, my advice to any sort of practitioner working with these tools is, like, embrace it. Use it as a way to

(34:12):
accelerate, you know, what you're doing already. They can really be great tools.

Speaker 2 (34:17):
Absolutely, and it's going to be really cool to see, especially, how data engineers are able to move fast and adopt these AI-driven workflows.
It does sound like a lot of the innovation ahead of us is going to be mainly centered on the actual insights and getting the

(34:41):
actual value, either through applications or data science, and finding these insights on vast amounts of unstructured and structured data that's hard to make sense of.
So, yeah, it's a super exciting time right now.
I think we'll definitely, if we do this podcast again in

(35:02):
another four months, we'll probably have tons of other exciting stuff to talk about.

Speaker 3 (35:08):
Last time we yeah, last time we talked it was on
the TWI webinar, and I thinkthat might have been six months
ago, and even since then thingshave changed so rapidly oh, it's
kind of ridiculous if youactually take yourself back, um,
you know I I try and not stopand think about it and just
enjoy the ride, right, but Iknow, especially at Databricks

(35:32):
here in the next four to sixmonths, you guys wouldn't
believe what we are cooking up.
So it'll continue to beexciting coming from our side,
for sure.

Speaker 2 (35:43):
Yeah, absolutely.
And now, when we work with joint customers, it used to be all about getting data into the lake and making it accessible for analytics and reports.
And there are amazing use cases there as well.

(36:05):
And they actually showed the reports, the executive-facing reports: every airport and the average maintenance time for aircraft there.
It's really their operational dashboard for tech ops and flight operations, and that's all powered by, you know, streaming the data into the lakehouse and running those reports on top of

(36:25):
the Databricks warehouse, and that's powerful.

Speaker 3 (36:31):
It's a total yes.
And the way I try to say it to people that have budget power is, you know, think about it: you can't create or destroy matter, right?
So your IT budget, let's say it's fixed, right?
Well, the only way that you're going to come up with money to
spend on all these awesome new GPU hours you need to, you know,

(36:54):
host all these amazing models, is by freeing up budget kind of elsewhere.
And so I think we do that through the stuff you just alluded to: continuing to pursue lakehouse architecture, continuing to replace, you know, stale, outdated ETL systems with modern streaming pipelines on Stream plus

(37:15):
Databricks.
I think there is a lot of potential there.
And the point of all of it, it's not just taking that money and putting it in the safe, it's taking it and putting it right back in the coffers for your LLM projects and using that as an accelerant.

Speaker 2 (37:33):
Yeah, absolutely, because there really is a lot of ROI for these AI-driven initiatives, like what you were alluding to.
Especially, the time to value for these data engineering and data science projects is significantly accelerated.
And then, of course, the art of the possible, thinking of what

(37:56):
you can actually do with customer-facing experiences to innovate there.
I mean, I know it's very early, but you look at, like, Apple Intelligence, and people have opinions on how that works at this point.
But when Apple is rethinking their whole UX to be around intelligence and AI, and partnering with OpenAI and

(38:17):
ChatGPT to make that a native part of their product, that's the future of UX.
Right, it's AI-driven experiences.
Now, every company that has a mobile app, every company that has a website, you know, in every industry, you have to think about: well, what's my AI, intelligence-driven user experience going to look like?
(38:38):
Because that's what everyone's going to rely on in the next few years.

Speaker 3 (38:43):
Absolutely, I think Apple is kind of validating our approach, because in their case it's, you know, kind of your local device.
It's not a data lake or a VPC, but it's this idea of taking foundation models and frontier models, combining them with your private data in a, you know, local way, and using that to provide

(39:04):
enriched intelligence.
That's a data intelligence platform.
So, you know, I'm super excited to kind of have that in my pocket as well.
It will be validated even better, perhaps.

Speaker 2 (39:17):
Yeah, yeah.
So now, similar to how enterprises had to build their iPhone and Android apps and their web experiences in the time of the internet, they'll all have to build their intelligence-driven user experience, and that's where these AI data platforms like Databricks, and then ingesting

(39:40):
the data with Stream, are going to be table stakes in every enterprise architecture.
So it's an exciting time.

Speaker 3 (39:49):
I think we are very close to, you know, in the Databricks Marketplace you also have apps, which sounds similar to app stores again, right?
And so I think we're very close to this reality where you, you know, click a button in the marketplace, it spins up an application that's pointed to something like Stream on the

(40:09):
backend, and the customer, all they see is this awesome AI experience.
On the back end it's connected to this rich, fast data.
They don't need to care about those details and they're just off to the races.

Speaker 2 (40:23):
Yeah, 100%.
It's going to be an exciting future for sure.
Spencer, where can people follow along with your work?

Speaker 3 (40:32):
Yeah, absolutely.
So I'm most commonly posting on LinkedIn; you should just be able to find me, Spencer Cook, Databricks.
And then I also contribute pretty heavily to the Chicago Databricks user group, so anyone in the Chicago region, definitely come check out our events.
And then I'm always at,

(40:54):
you know, Data and AI Summit, Money 20/20, the big shows.
So please, you know, reach out.
It would be great to chat more about these topics.

Speaker 2 (41:03):
Spencer Cook, great having you on this episode of What's New in Data.
We'll have the links out to your LinkedIn for people to follow along with your story going forward, and thank you to the audience for tuning in.

Speaker 3 (41:17):
Yeah, thank you everyone.
Thanks, John.