Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
(00:06):
Hello and welcome to MongoDB Podcast Live.
I'm Shane McAllister. I'm one of the leads on the Developer Relations team here at MongoDB.
Imagine dealing with 1.5 million land documents, historical records, contracts, leases, all scattered across different formats and different systems. The challenge is making sense of all that unstructured data efficiently without sinking
(00:30):
millions of dollars into manual processing.
That's where AI and large language models and MongoDB come
into play. So let's get started. To discuss all of this, I'm really pleased to welcome Alex and Andrew from oxy.com onto the show, to unpack how they did this, why they chose MongoDB, and what it means for
(00:51):
the future of AI-powered document management.
Alex, Andrew, you're both very welcome to the show.
It's great to have you. Thanks, Shane, for having us on.
Thanks to you and MongoDB for having us in.
No, it's great. It's great to have both of you join. As I said, when we were preparing to have you both in the same room, it's always much easier to deal with two guests that way. And before we get started, I
(01:12):
love that we have both of you here.
You know, could you pick Andrew or Alex to go first to talk a
little bit about your career path to date, you know, what you
studied, how you got here and how you ended up at Oxy?
Yeah, I actually started at Oxy probably 10 years ago, just about 11 years ago. Undergrad in petroleum engineering at the University of Texas at Austin.
(01:34):
Pretty early on in my career, I went more towards data and less towards the kind of traditional petroleum reservoir production engineer side of oil and gas.
And so I kind of managed data for most of my career.
About six or seven years ago, I did a master's in computer
science because I felt like that was really where my interest
(01:55):
was. And since then it's been kind of
a journey. I've done a little bit more of, like, embedded systems, microcontroller stuff.
And then a couple years ago I joined Alex's team and Alex runs
the AI team here at Oxy. I'll let him take over.
Sure. So much like Andrew, I'm kind of the confused engineer slash data scientist archetype where, yeah, I
(02:16):
majored in petroleum engineering at the University of Tulsa and then ended up working for Oxy in 2017, and increasingly got more of these data-focused, AI-focused roles. So, yeah, now the team that Andrew and I work on, it's an ops-focused AI development team at Oxy, and we get to have a lot of fun with applications, mainly on the oil and gas production side of things, but occasionally dabbling with other departments
(02:38):
like legal or land, in this case that we're sharing today.
So yeah, excited to share the work.
I think, you know, in the AI space people disagree about whether it's over-hyped or under-hyped, but I'll just say this: it's a lot of fun to work in. So we've been having a good time the last year or two.
Excellent. Thank you, Alex.
And I suppose look, before we dive into that and the problem
that you're solving and the reason why you're you're on the
(02:59):
show here today. I suppose people consider the oil and gas industry, you know, heavy industry, large machinery, large refineries, distribution, all of that sort of stuff. They wouldn't particularly think of it as using data, etcetera, bar where to drill or what to do, or minerals or geological surveys, etcetera.
(03:19):
Tell us a little bit about Oxy as a company and what it does, and I suppose the industry as a whole, and how it's using digital these days before we dive into
the specific solution that you've built.
Maybe I'll take a quick pass and then you can fill in what I miss. But I'll just say, I mean, in oil and gas, technology's always been really important.
In oil and gas especially, you look at, you know, the history of
(03:40):
seismic data processing, and some of the original large-scale data processing projects were usually seismic focused.
So Oxy in particular has taken a really aggressive stance towards innovation and AI in general. So there's a couple fronts. One is Oxy has, I think, the most aggressive direct air capture effort of any oil and gas company right now.
So obviously that's out of scope for our talk today.
(04:02):
But I mean, if you check on the website oxy.com, you'll see some really phenomenal work happening in the technology space there.
And then second, with AI, Oxy has been really, really aggressive in terms of applying AI. I think our team, you know, some of the work we're going to share today is part of that. And obviously there's a lot of other work besides that, that, you know, we'd love to talk about another time. But I think Oxy, and
(04:22):
certainly I agree, that AI is going to be one of the key
differentiators. When you look at which oil companies are going to be the most successful three to five years from now, I think it's going to be the ones that were most successful in applying this to streamlining the work we do and improving the work we do, increasing safety, reducing our cost per barrel.
So I think Oxy's got the right attitude towards this, and I think that kind of bears out in
(04:42):
the work that's going on here. Yeah, I think Alex pretty well
covered it. Just more on the data front,
there's a lot of data we gather. Aside from the seismic data, tons of SCADA data. So every single well is equipped with, I don't know, probably on average 5 to 10 sensors that are all gathering data in real time. Facilities have orders of magnitude more sensors than that, gathering data in real time.
(05:05):
So yeah, I, I think the, the amount of information being
captured is often underestimated.
And then if you get into drilling, it's like just massive
amounts of data for any given well that you drill.
So yeah, oil and gas, I don't think they're a stranger to large-scale use of data.
I think they've traditionally been viewed as maybe not on, you know, the frontier of tech, but there's a lot
(05:27):
they do that might surprise you.
Excellent. Then look, I know we're going to
delve into that as well too. And thanks to everybody who has joined us on the stream as well. We see folks from Minneapolis and Pakistan, etcetera. And Jeff gives you a shout out there, Alex, Jeff Schmidt, who obviously you know. So look, it's great to have everybody join us as well too. Size and scale of Oxy as a
(05:48):
company, how big is the company and how big is kind of the AI
team that you guys are working on?
Sure. So I mean, first off, Oxy's been around for over 100 years now. It's got quite a history of international operations. I couldn't even count how many countries; I think today we operate in maybe five or so. Don't quote me on that, roughly five or so countries.
But in terms of scale, what, 15,000 employees or
(06:11):
thereabouts? It's a large
Fortune 100 company, significant oil and gas presence, especially these days in the Permian Basin, Gulf of Mexico, and the DJ Basin in the Rockies.
So yeah, Andrew, anything? Yeah.
As far as production, if you're familiar with it, we're about 1.4 to 1.5 million BOE per day. Are we the biggest Permian
(06:31):
producer? Yeah, we're either close to or the biggest, first or second Permian operator.
OK, OK. Yeah, we do a lot
onshore US. I'll say one thing about Oxy, I think for those
that are joining that are outside of oil and gas, I think
the oil companies we know about are the ones that we buy our
gasoline from. And a lot of the time, I mean, Oxy doesn't do that, right? There's no downstream business selling to customers.
(06:53):
So I think that kind of reduces our public presence for
those outside of the industry unless they happen to attend an
Astros game, in which case you can't miss the Oxy logo on the
field there. But yeah, that's a little bit about Oxy.
So Oxy's big enough to be a team
sponsor then. So that's a good way, a good
bracket to put things into as well, too.
And I appreciate you, Alex, not courting controversy there and
(07:13):
calling it the Gulf of America. You called it the Gulf.
Oh goodness, it's all good. That aside, I didn't even think about it.
Perfect. And to the others that are joining us from Spain and from India as well too.
It's great to have you all on board.
So look, at the beginning, I said, you know, in the introduction, I talked about 1.5 million documents, et cetera. Walk us through kind of the challenge that you faced.
(07:36):
When did this start? You know, everybody kind of considers, well, the public per se considers, you know, AI and large language models having been born in November '22 when ChatGPT came out to the public.
But obviously it's been around for a long, long time. So tell me a little bit about the pre-solution situation you had.
(07:57):
What did that look like for the folks at Oxy, for all of these documents they have, before we get into kind of your approach to remedying that and making it easier for them?
Yeah, this might be a good chance to share kind of a visual
because, you know, I come on podcasts and I talk to you about our document problems, and you might be thinking of something different than what I'm actually talking about.
(08:17):
So you might be thinking of this kind of document.
Right. Yeah, yes, yeah.
We know what documents are here, but they're not the documents here. Right.
So, you know, the documents we're here to talk about, it's actually this kind of document. So it's a scan, a paper
document. So like I said, Oxy's been
around for over 100 years. So we've accumulated quite a lot
of documents, paper documents. And some of them look like this.
(08:40):
This one's pretty easy on the eyes. It's just a classic lease agreement.
Then I might show you this one, a bit harder to read, a bit crusty looking maybe. And then I could show you some other examples like this or this, getting a little bit harder, right? Or this or this. Or this one's one of my favorites. I like how the hand actually
made it into the scan. An important part of the scan?
(09:02):
Yes, definitely. Oh wow.
OK. And then you get the sense we actually have a lot of these. So this is one hall, one kind of row; I think there's maybe 40 of them in this particular file room. And there are a dozen or so file rooms. And then a fun one, you know, that's actually me at the end of the hall there. And I'm short, but I'm not that short. That's a lot of documents, suffice to say. So kind of laying the
groundwork, obviously Oxy isn't unique in this.
(09:25):
Every oil and gas company that's been around for decades or 100
years has vast rows of documents.
The ones that we're talking about today pertain to land,
which is the process of, of leasing minerals and
understanding the ownership of minerals and, and rights
throughout time. So land is not the only part of
the oil and gas industry that's very document heavy, but it's,
it's one of the very important parts.
(09:47):
And throughout the years, Oxy's undertaken a number of
digitization efforts. So that's the scanning coming in.
It's very rare that somebody's going and actually handling a
paper document these days. As you would hope these have
been digitized. But even once they're digitized,
there's a lot of additional work that needs to follow after that.
And so what Andrew and I found out about last year was an effort to digitize, or rather, they'd already been digitized,
(10:10):
but there was some additional work post digitization happening
on one and a half million land documents.
So the idea here is, you know, we have these scanned in and we
know, OK, for this lease agreement, we have maybe 100 or
150 documents that pertain to it.
But if I'm working in the land department and I'm, I'm asked a
question about a lease, it's not enough to go say, hey, here's
150 documents that govern this lease.
(10:32):
I would like to know which document is which type. I would like to know some of the metadata inside those documents without having to open up each of those 150 documents.
So you can imagine it's really useful to be able to categorize documents and extract information out of them.
And I would imagine that that process, originally even the scanning, obviously is labour and time intensive and resource
(10:54):
intensive. But as you say, then having to
look these up etcetera, obviously again time and
resource intensive. You, you can imagine for, for
one and a half million documentsfor someone to go in and
categorize it and then to extract maybe 5 or 6 pieces of
information out of that document.
That's going to take a long time.
And that's exactly the project that we walked into last
(11:15):
year, Andrew and me and another gentleman on the team, Eric. We learned of this effort to manually categorize and extract information out of one and a half million documents that was going to be done with a small army of contract workers.
So yeah, I think 40 people were scoped onto this project.
It was to take a year and a half.
And then so the, you know, you can, you can imagine the cost
(11:36):
associated with that. And there are also a lot of tough decisions you have to make when you have an army of non-expert contract workers doing a task like this.
The scope of the project has to be very limited.
For example, I mentioned categorization, you want to
categorize what type of documents these are.
When we're doing this manually and we're paying by the hour for the work to do this, the land
(11:57):
department had massively limited the scope of this effort to just six different categories. So very broad-stroke, broad
categories. OK.
And then when we talk about extracting some of the
information from inside these documents, similarly, they had
limited it to just the most basic facts and figures, dates, things that were easy for a non-expert to pick up and pull out of these documents, but ones that would nonetheless
(12:18):
help the land team as they're working with these documents
into the future. So not only were we dealing with a very expensive project, a very long-duration project, 18 months, it was also going to be a very tight, narrow, limited scope for the purpose of cost savings.
So that was kind of the scene where Andrew and I last year,
early last year were looking at each other and we said, man,
isn't this a perfect, perfect project for a large language
(12:41):
model? So timing was on your side
there. I suppose you're getting in at the ground level of a project and seeing whether, I suppose, and we'll talk a little bit about this obviously, your approach would work. How did that compare to the approach the pathway was already indicating: lots of people, a long time span, and the constraint of only
(13:02):
getting some of that information out of those documents, right?
Yeah, I think first, Andrew and I were no strangers to language models. We'd done some work with them at Oxy already. I think all the work we had done before was at a much, much smaller scale.
So that was a bit of what scared us about this. Not so much could we do it? We were pretty confident that this was possible, but to do it at a larger scale, and the
(13:23):
data management practices that would require and just the cloud resources that would require, those were some of the
things that we were a little less confident in.
So we wanted to very quickly de-risk this and, first of all, prove it out. Can we do this? Because, you know, this other team is going to take on this project, do this manually, contract this out. It's already been budgeted, you know, the year's underway. The project's about to start
(13:45):
just a month and a half away. So the question is, first of all, can we prove that we can do it?
Can we convince people, wait, stop what you're going to do,
let's do this instead? So that was our challenge.
Was that a proof of concept for you then?
Did you have that short window? You had four weeks, six weeks to be able to show that this would work?
Six weeks. That's exactly right.
(14:06):
We wanted to do a proof of concept, so we took a subset of the documents, 1,000 of the documents, and we set out to do, again, the two tasks here, classifying and extracting. And just to back up, you know, we looked at a lot of those documents. You get a sense that, you know, some of these are easier to read, some of these are harder to read. But one of the most important things we had to do
(14:26):
before we could even do the classification and extraction
was figure out OCR, optical character recognition.
You know, people don't think about OCR as AI, but it actually somewhat is; depending on how you define AI, that is a form of AI, and it's a really important prerequisite step here. Traditionally, Oxy had used a
vendor, Kofax, for OCR, and you can see with this particular
(14:49):
document, it really struggled.
You can see Atlantic operating turned into Atlantic gyrating.
While it's kind of funny, we don't want that to make it into
our land database down the road. Sure.
And you can see generally it struggles a lot with handwriting here. So the first task we had to do
was benchmark a number of the different OCR providers.
And I think if anybody's experimented in this space
(15:09):
recently, it's actually changing really rapidly.
I think multimodal models are increasingly being used for OCR, and cloud vendors are improving their OCR solutions quite rapidly. The open source space is also improving rapidly, though we found a pretty big gap when we went about benchmarking all the different OCR options, open source and through cloud providers.
We found the best results on our documents with Azure Document
(15:33):
Intelligence. OK, that's what we ended up using for this project. OK.
And I'll say real quick why that's so important.
If you imagine you're the large language model: if we feed in that text in the top right here, it doesn't matter how smart the language model is, it's not going to be able to fix what's broken on the input.
So it's very important to get this very boring OCR step.
(15:53):
It's important to get this right for that reason.
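For anyone who wants to poke at that OCR step themselves, here is a minimal sketch of what it might look like with the azure-ai-formrecognizer Python SDK; the environment variable names and file name are placeholders, not Oxy's actual setup.

```python
# Hedged sketch: run one scanned document through Azure Document Intelligence.
import os
from azure.ai.formrecognizer import DocumentAnalysisClient
from azure.core.credentials import AzureKeyCredential

client = DocumentAnalysisClient(
    endpoint=os.environ["DOC_INTEL_ENDPOINT"],   # placeholder env vars
    credential=AzureKeyCredential(os.environ["DOC_INTEL_KEY"]),
)

with open("lease.pdf", "rb") as f:
    # "prebuilt-read" is the general text-extraction (OCR) model; it also
    # handles the handwriting that tripped up the older OCR vendor.
    poller = client.begin_analyze_document("prebuilt-read", document=f)

ocr_text = poller.result().content   # plain text for the downstream models
```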
Yeah, and it would compound, I suppose. You know, we accuse LLMs of being hallucinatory sometimes as well too, but if it got fed the wrong information in the first place, we don't really know where that came from. And we're kind of, I suppose, blaming the LLM as opposed to the original OCR. So that was a task in hand
(16:16):
anyway, to get, you know, very accurate, proper OCR. So that would have taken a good chunk of your time in this kind of tiny six-week window, right?
Yeah, fortunately, as you said, in this short window we were lucky. Again, we're dealing with
a small subset, just 1000 documents instead of one and a
half million. We're just trying to prove the
point here. So we, we were able to get this
(16:37):
benchmarking done pretty quick and move on to the next task,
which was classification. We talked about classification.
I teased you earlier with the scope being very narrow, six categories. Well, not so lucky in our case.
I think once they realized, hey, AI can maybe do this, they said, well, instead of using the six categories that, you know, the 40 contractors would be asked to
(16:58):
do, let's try expanding that scope a little bit. Let's do 140 categories. This would be much more useful. Let's 20X it. Yeah, yeah.
So I don't know how we got hoodwinked into that, but
somehow that happened. You were up for a challenge, yeah.
I will say we were really lucky on this project: for most of these categories, we had reliable hand-labeled data from our existing legacy records.
(17:20):
So there's one and a half million new documents that we're going after. But for the older documents, we have a lot of reliably hand-labeled data.
So for those in the audience that do any kind of machine learning, you might kind of see where this is going, that we can
use a supervised method for the classification here.
OK, OK. And so the approach we took is we used an off-the-shelf AI model.
(17:42):
Azure OpenAI, or rather OpenAI, releases this model, text-embedding-3-large; you can use it through the Azure OpenAI service. And those embeddings are pretty high-dimensional. They're 3072 dimensions.
But once you have that high-dimensionality space (here in this visual, I've reduced it so you can just kind of see the difference between those documents), you can also train classifiers against them, because those embeddings
(18:04):
are just numbers, right? Train a classifier. So we just trained a classifier, supervised learning, on these embeddings plus some hand-curated features.
And with that, we were able to achieve really, really high
accuracy. Depending on the exact number of categories and how difficult the documents were, we were on the order of 95 to 97% accurate. Contrasting that, the
(18:26):
non-expert contract labeling was on the order of 88 to 92% accurate, but again, that's with a much more limited subset of categories, so a much easier problem.
So we were really, really pleased with the
classification results. And in the same way that you, sorry Alex, in the same way that you had to, you know, check out a few different OCRs to find a good one, did you have to do the
(18:48):
same with regard to choosing the embedding models as well too? Because there are so many to choose from, definitely these days. And even a year ago, when you
were getting started with this, there was still quite a lot
knocking around. Yeah, I think when we were first experimenting with this, we tried some. You know, I don't remember when text-embedding-3-large came out, but.
Yeah, I think at the time text-embedding-
(19:08):
3-large had just come out. And so we knew this was
state-of-the-art and it made it really easy on us.
We might have been in the middle of, I don't remember, we might have been in the middle of the POC when it came out, and we were using it. It's one of those amazing things about working in AI right now.
It's like you're doing something, you're getting this level of accuracy, then next week a new model comes out and
instantly with no effort bumps your accuracy up.
So it was like one of those moments where we're like, wow,
(19:30):
this is too easy. But not to say there wasn't any iteration; to your point about trying different models,
yeah, we tried different embedding models, we tried
different classifiers. We did some experimentation.
But I think by and large, certainly in the POC phase, most
of this just worked quite well out-of-the-box.
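To make the pattern concrete, here is a minimal sketch of the supervised setup Alex describes, assuming an Azure OpenAI deployment of text-embedding-3-large and scikit-learn; the deployment name, the logistic regression choice, and the labeled_texts/labels variables are illustrative assumptions, not Oxy's actual code.

```python
# Hedged sketch: embed document text (3072-dim vectors), then fit a
# supervised classifier on the hand-labeled legacy records.
import os
from openai import AzureOpenAI
from sklearn.linear_model import LogisticRegression

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_KEY"],
    api_version="2024-02-01",
)

def embed(texts):
    # "model" here is the Azure deployment name for text-embedding-3-large
    resp = client.embeddings.create(model="text-embedding-3-large", input=texts)
    return [d.embedding for d in resp.data]

# labeled_texts / labels: OCR text and hand-assigned category for the
# legacy documents (assumed to be loaded elsewhere)
clf = LogisticRegression(max_iter=1000)
clf.fit(embed(labeled_texts), labels)

# assign a new document to one of the ~140 categories
category = clf.predict(embed([new_doc_text]))[0]
```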
Brilliant, brilliant. So with the OCR being, you know, on top of its game and doing well, and with the
(19:52):
classifications now working, I mean, was that enough with those
1000 documents in your POC to get the thumbs up for this
project as opposed to the 40 humans in a room?
That's right. So just from the classification
results, there was growing confidence among the land team
that this was going to be the way to go, not just because it's
more accurate, but because getting so many more categories
(20:13):
would add a lot more value to them down the road as they make sense of these documents.
So the last thing for us then was to just prove that we could
do the extractions. And with that, again, we took a very, very simple approach.
We used an out-of-the-box model to do these extractions. Some of these were very simple extractions, like things I showed earlier, dates, counties, just very simple
(20:33):
fields that an untrained person could catch.
Some of these were more difficult things, requiring some minor, simple interpretation of legal clauses, like a Pugh clause or a cessation of production clause in a document. And actually that was where we were a little unsure, and we were quite pleasantly surprised at what the language models could do in terms of interpreting paragraphs and boiling them down to just a database-friendly
(20:56):
scalar answer that we could insert, which I guess we'll get
to in a minute. But yeah, again, off the shelf
model, we used GPT-4o for this. And again, this is more accurate than the contractor-level classifications.
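As a hedged illustration only (not Eric's actual prompt), the extraction call might look roughly like this; the field names and wording are invented, and `client` is the Azure OpenAI client from the earlier sketch.

```python
# Hedged sketch: ask GPT-4o for database-friendly scalars from one document.
import json

EXTRACTION_PROMPT = """You are an experienced landman reviewing oil and gas
documents. From the document text, extract:
- lease_date (YYYY-MM-DD, or null if absent)
- county (string or null)
- cessation_of_production_days (integer or null)
Respond with JSON only."""

resp = client.chat.completions.create(
    model="gpt-4o",                            # Azure deployment name
    response_format={"type": "json_object"},   # force parseable output
    messages=[
        {"role": "system", "content": EXTRACTION_PROMPT},
        {"role": "user", "content": ocr_text},
    ],
)
fields = json.loads(resp.choices[0].message.content)
```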
Excellent. So just off the shelf, no fine-tuning, nothing else with regard to the model? An off-the-shelf model.
Off-the-shelf model. No fine-tuning, just a very, very elaborate prompt.
(21:19):
This was one of the early versions of our prompt for doing some extractions, and full credit to Eric Smith on our team, who was the person that developed these prompts.
He had worked as a landman prior to his role on this AI
team. So he was actually quite
familiar with a lot of this and he was able to kind of translate
from land to AI and back and forth to get these models
(21:39):
working the way we needed. OK, OK. So, and the prompt text is quite small, I'm sure most people can't read what's up there, but this is where the
role of the clever prompt engineer comes in.
You know, being able to preface it, set the scene, ask the right
type of question and explain how you would like the result,
(21:59):
right? Absolutely.
And I think, you know, we could debate back and forth about if
the role of a prompt engineer is a real job or not. But I'll say for Eric, it sure was.
It's another show, Alex. That's another show. A whole episode on that, I imagine.
I'll say, at least for the state of the models as they are today, you can always eke out better performance with better prompting.
So there's no doubt that it's a useful skill, and I think
(22:22):
what's interesting is it increasingly drives more of the work and the power into the hands of the
subject matter expert. I think there's a lot of fear
about, you know, AI is going to take away what we do for jobs. But in this case, it's the land SMEs that are most important, because they're the ones that you need to craft this kind of prompt, or, you know, some of the later versions that were far more
(22:44):
elaborate. Yeah, no, it's a very fair point, I think. And I deal a lot with this at MongoDB, in helping make the code assistants that developers use day-to-day better at MongoDB tasks.
And I can totally see that. You know, we do a lot of working with partners to help fine-tune and train and
to give ground truths and do evaluation sets, etcetera as
(23:04):
well too. So, yeah, I fully agree
with you there. I don't think we're removing any
jobs. I just think we're getting rid
of the kind of mundane tasks, right?
And bringing everything up to a higher level.
And maybe I'm getting ahead of myself.
Where does MongoDB fit into all of this then, Alex and Andrew?
Well, wasn't that awkward. We're on a MongoDB podcast, and we probably haven't mentioned MongoDB yet. I think that's where Andrew
(23:27):
is going to come in and maybe speak to it.
Perfect. So, just kind of closing out the story here for the POC phase, we obviously had really good results on this subset, this thousand documents of the one and a half million. And so we were able to make the decision, between us, IT, and the land department.
We were able to make the decision that, hey, let's ditch this 18-month, manual, expensive project.
(23:50):
Instead, let's try and do this with AI. And now, you know, Eric and I came to Andrew and said, Andrew, can you help us scale this?
And he was like, what have you guys done?
But he'll talk a bit about how that scaling works.
I love how gently you approached that topic.
Can you take our 1,000 document sample and ramp it up to 1.5 million? Off you go, I mean.
(24:11):
How hard can it be? How hard can it be? Yeah, Andrew, how hard was it?
Yeah, it turned out it was not too bad, very modest.
So yeah, the slide Alex is showing, I guess some things I would note here. Originally, for the 1,000 document POC, we did go ahead and build out some
architecture of how we would do this at scale.
(24:33):
And that was part of the POC. It was like, hey, you don't just want to manually be running, you know, a Python script locally and say that you could, you know, process millions of documents. Obviously we were going to need
probably some compute and services to do this.
But yeah, to mention Mongo, I guess we actually started down
the Mongo path mainly because at the time there was a very
(24:55):
limited number of databases that could handle embeddings, at least properly. So I think the options were like Postgres, Mongo, and then, what's that one?
The dedicated ones. Yeah, like the dedicated ones.
Yeah, yeah, Pinecone, there are some, but we wanted to go with
something more mainstream. And then between Postgres and
Mongo, we were trying to move through the POC so
(25:19):
quickly that we really didn't want to be locked down to some
specific schema or design that we had worked out because we
didn't know where it was going.
You know, one day Land's asking for 140 classifications.
The next day, who knows what they're going to be asking for.
We might need to be changing things really quickly. So we landed on Mongo, just
(25:39):
wanted to be in the NoSQL world.
And then we had talked to Peter and Robert over at MongoDB, and we were kind of trying to, I guess, warn them, like, hey, you know, this is a really intensive process.
We're not sure if Mongo can handle it.
And it was like, listen, this is small potatoes, don't
worry about it. No problem.
And so sure enough, you know, Mongo at the thousand of course
(26:02):
it was fine. And early on we really were just
doing inserts to Mongo. So what we would do is we would
take the data from FileNet, which is our ECM, and just dump that directly into S3. And this was pretty simple, just a Python script to get it out of one place and into another place. And then from there we would run
things more in parallel, just through Lambda functions.
(26:24):
So we'd have the functions handling pretty much everything Alex just talked about, just kind of in steps.
So, you know, go out, re-OCR the PDF, and we have good OCR data. Now go out, classify it. And now I have classifications. All right, using the classifications and the PDF text data, let's go extract what we need from that.
And all the while we're kind of building up this document and
(26:46):
then we just dump it into Mongo.
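Here is a rough sketch of the shape of that flow, with the bucket name, Mongo URI, and the run_ocr/classify/extract helpers (the steps sketched earlier) standing in as assumptions rather than Oxy's real code.

```python
# Hedged sketch: one Lambda invocation carries one document through the steps.
import os
import boto3
from pymongo import MongoClient

s3 = boto3.client("s3")
docs = MongoClient(os.environ["MONGODB_URI"])["land"]["documents"]

def handler(event, context):
    key = event["s3_key"]   # PDF previously dumped from FileNet into S3
    pdf = s3.get_object(Bucket=os.environ["BUCKET"], Key=key)["Body"].read()

    text = run_ocr(pdf)                # Azure Document Intelligence step
    category = classify(text)          # embeddings + classifier step
    fields = extract(text, category)   # GPT-4o extraction step

    # one Mongo document per land document; the schema is free to evolve
    docs.insert_one({"_id": key, "text": text,
                     "category": category, "fields": fields})
```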
And that worked pretty well for a while.
For the 1000, that was no problem.
So, you know, if we're running, let's say, like 100 Lambdas in parallel and each document maybe takes a minute to classify, we're talking very few transactions.
Let me see if we go. Yeah.
So one issue we started running into though was we didn't really
(27:08):
account for a lot of the edge cases that come when you go from 1,000 documents to 1,000,000-plus documents; you're introduced to a whole lot more types of documents than you would have encountered. And so we were running into things like, OK, the biggest document we saw early on was
maybe, I don't know, 50 megabytes.
And now we're running into a document that's like 3 gigabytes
(27:30):
that's going to break some stuff if you're not ready to handle
it. And so again, like, I don't know if this is the best architecture looking back on it, it's probably not. What we did was, every time we were kind of timing out these Lambdas, from whatever it might be, this document's really large, it's going to take 20 minutes to process,
(27:50):
Lambda's only going to give us 15 minutes.
Yeah. What can we do?
We basically just started storing every step of the process in Mongo. So lots of reads, lots of
updates. OK, so we move from, you know,
one to five transactions a second to thousands of
transactions a second because we're just constantly hitting
Mongo for either getting a current state, writing a new
(28:11):
state, you know, finishing the doc, updating the whole doc.
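The checkpointing pattern Andrew is describing, as a hedged sketch (field names are assumptions): persist each step's output so a timed-out Lambda can resume instead of redoing work.

```python
# Hedged sketch: the Mongo document doubles as the state machine.
def run_step(doc_id, step, fn):
    state = docs.find_one({"_id": doc_id}) or {}
    if step in state.get("completed", []):
        return state[step]   # finished in a previous invocation; skip

    result = fn(state)       # do the real work (OCR, classify, extract, ...)
    docs.update_one(
        {"_id": doc_id},
        {"$set": {step: result}, "$push": {"completed": step}},
        upsert=True,         # first step creates the state document
    )
    return result
```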
And I mean, it sounds bad, but I actually didn't think about Mongo the whole time we were doing it; nothing went wrong. You know, it's just like this piece of our tech stack just kind of magically worked, and we didn't have to change the configuration of it.
We didn't have to, you know, give Peter a call and ask him to
(28:32):
come over and take a look at what we're doing.
It was really like the least of our problems.
And so OK, OK, Yeah, it wasn't aproblem at all, honestly.
So yeah. We'll quote you on that one.
Andrew, we need a quote from you to say MongoDB is not a problem
at all, it just works. Or something to that effect.
Definitely, definitely. But I'm sure our marketing team will reach out to you at that point.
(28:52):
So, and look, obviously you touched on the fact that you didn't know the schema, and the flexible schema of the document model helped a lot. You touched on scale there as well, too, et cetera, and you touched on the various types of documents, small, large, and humongous documents as well too. Any other specific challenges in
(29:13):
this architecture here that you might not have seen at the POC with the thousand documents, but that then surfaced, other than the different size and scale of the documents that you already mentioned?
I think we're talking about, like, token service limitations.
So like, again, you have to rewind by your two on this and
think about when the language models were really hitting cloud
(29:34):
vendors in earnest. Everyone was trying to use no
one had enough GP us and so everyone was pretty locked down
on token allocations. And so at Oxy, what we did was
with Azure, we basically just had a shared resource stood up
where SM ES at the company who kind of knew what they were
doing with AI want to try some stuff.
They could get an API token to this shared resource and start
(29:57):
using AI models, which was fine. You know, we want people across Oxy and across domains here to test stuff.
But what that caused for us is, yeah, we're trying to process
documents full throttle. We're trying to use every token
we got as they're coming up. And people would go run their
own experiments. Maybe they're doing something
else with documents that's going to be very demanding, or just
(30:19):
like they're doing their own POC.
And so we started running into all sorts of rate limiting
issues very randomly throughout the day, maybe three times a
day we'd, you know, get 429s for 20 minutes and then they'd
magically go away. And we realized what was
happening was other people are trying stuff.
And so we couldn't really just hard-code, you know, some nice, perfect number of Lambdas to be running to
(30:42):
where we never ran into token issues.
So the other thing we would do is, if the Lambda failed on calling the shared resource, that was totally fine.
We weren't going to lose any progress, because we were storing everything in Mongo as our state, our state machine. So yeah, that's kind of what we
did. And that worked really well.
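One hedged way to code that tolerance for a shared, rate-limited deployment: back off and retry on 429s, and let the invocation die if the limit persists, since the state in Mongo makes a later retry cheap.

```python
# Hedged sketch: exponential backoff on 429s from the shared Azure OpenAI resource.
import random
import time
from openai import RateLimitError

def call_with_backoff(fn, max_tries=6):
    for attempt in range(max_tries):
        try:
            return fn()
        except RateLimitError:   # HTTP 429: someone else is burning the tokens
            time.sleep(min(60, 2 ** attempt) + random.random())
    # give up; the step stays incomplete in Mongo and gets retried later
    raise RuntimeError("still rate limited after retries")
```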
And what I'll call out is, like, we're an AI slash
(31:03):
engineering kind of team. We're not a software team, we're
not a database team. So I, I think where Mongo really
worked well for us here was it made it very easy to take this transition from, OK, we have a working POC.
How do we scale that up? And each step along the way,
you're finding new problems and you're having to bolt on new
solutions and new workarounds or handle edge cases.
(31:24):
And just because of the flexibility and kind of
effortlessness of Mongo, like what Andrew was saying before, it was very, very simple for us to go through this
scaling process without having to do a total rewrite or start
from scratch or really do some major, major work.
It just made it very, very effortless.
So I think for us, you know, kind of from that perspective of
an AI team that's trying to move fast and change on the fly, it was
(31:47):
the perfect, perfect thing for us to use in this case.
Excellent. Well, yeah, you're saying all
the good things about MongoDB. I haven't prompted you at all.
It's all good. Yeah.
I'm not going to get a marketing phone call after?
Yeah. Yeah.
No, no, it's all good. And, you know, obviously MongoDB is cross-cloud, but you're combining Azure
and AWS technologies here as well.
(32:07):
Were there, in that kind of transform and loops and everything else involved here, any issues in that space at all? Or, as with everything else, was it just working
fine scaling up from that POC? Honestly the only challenges
here as far as cross cloud and stuff like that was probably
more internal to Oxy. Like, basically, Mongo is a
(32:30):
marketplace app instead of, you know, just a native AWS service.
And so we just had to go through a couple things in supply chain to kind of iron that out and be able to use Mongo instances, but that was not
a big deal. For most of the AI work that we do, if you ever see cross-cloud you kind of scratch your head and think maybe that's going to cause some trouble.
But generally if we're working with text like a lot of heavy
(32:53):
text data in most of these applications, really the amount of data you're sending back and forth, since it's mostly text, ends up being pretty small and not causing a huge issue. There are certainly cases that are, like, video or image driven, where you might want to stay within the constraints of one cloud ecosystem, but for this
it ended up not being an issue. OK, great.
And just for our audience joining us, if you've any
(33:13):
questions for Andrew or Alex or anything on the solution that
they've built, drop them in the chat and we'll try and take care
of them as we talk through a bit more.
So what stage are we at here now, or what state? Like, have you scanned and extracted and classified everything that Oxy has now? Is that the case, and is it the case that this architecture you have up here is just
(33:35):
invoked as you sign a new agreement or lease somewhere?
So. Speaking to this one point. 5
billion that was like the original project scope that that
finished. And do you want to talk to kind
of how this architecture kind ofhelps us moving forward to to
other document projects? Yeah.
So I'd say, for one thing, I think Alex mentioned early on,
(33:58):
land's not the only spot in Oxy that has lots of things like PDFs. Every company, think of like HR, legal, supply chain, you've got all these agreements, PDFs, contracts. So we actually did have a
project with legal where they had some needs for OCR and
extracting, classifying many, many PDFs.
(34:20):
And they'd gotten wind of what we did here with land, and we were approached by them; they were about to do the same thing, you know, let's bring on a team
to help us process all these documents.
And we basically just dragged and dropped this thing onto the
legal documents and, you know, updated the prompts.
Couple things there, but it was a very low-friction, quick
(34:42):
solution. So where this might have taken,
I don't know, six months from start to finish, the legal one took about six weeks. Sorry, technically five weeks, but one week was waiting for feedback from the legal team.
So we've seen these cases there. And then, to answer your question more on, like, you know, what are we doing for all documents across Oxy: I think this did churn up a lot
(35:03):
of interest in saying, OK, there is a way we can take all these documents we have in FileNet, which is a massive amount.
And maybe we can start the process of getting good OCR,
extracting useful metadata, just on a very general scale, of how we can classify those documents and sort of rework this whole
(35:24):
document foundation into something more up to date.
So I think that's being explored.
That's a much larger scope. And I'll say we're being a little bit vague here on purpose with what we're allowed to talk to on some of these upcoming projects, or our ongoing projects. But yeah, suffice to say, I think
the core capability here of classifying and extracting
(35:46):
information from documents is super, super scalable to many different applications. And I think, I'm not trying to start a fight, but I think with everybody's AI project, the first thing everybody tries is, like, a chatbot or a RAG-based application. And everybody goes straight to
like the harder projects. And if I'm going to offer one
(36:09):
suggestion to anybody dabbling with AI and looking for internal
applications, it is start simple, start with just
classification, just extraction, like really, really easy.
And then go after those RAG or more complex applications
next. And that's what we've done.
And I think doing it in that order worked out a lot better
for us than some of my peers who I talked to that have tried to
(36:30):
do the reverse. I think a lot of people get
stuck on that RAG project and waste a lot of time. OK, OK, that makes sense.
So this is the extraction and classification, but tell me a little bit about the retrieval process.
So did you or didn't you do a RAG approach to getting this information back out again for whoever needs it? What
(36:50):
does it look like to the user now to be able to access all of
these land documents that you would have put through the
system? So with this original project,
the goal was to get that extracted data back into our
land information systems. QLS, the Quorum Land System, is the main database, or system, we use behind that.
So once it's back in there, it's quite effortless for the
(37:11):
folks in the land department to access that data, do automated
analysis with that data. To your original question, which
is what are we doing with RAG?
Like what's the future for land in terms of are we doing
anything with RAG? And Andrew could speak briefly
to some stuff with Atlas, I guess.
Yeah, I don't know if you want. Yeah.
I mean, so I think like I was saying, we had very simple
objectives early on and now we're sort of exploring maybe
(37:35):
what you might call the more interesting use cases, like
talking to a document and getting some information out of
it. And so we do have a couple
applications for that. We've played around with Mongo's Atlas Search index. I think Mongo just posted some GitHub examples of, basically, AI and Mongo.
So some examples for people to follow along with.
(37:56):
So we'll probably test some of those out and see how they do on
some of our documents. But yeah, we're, I guess we're
not sure what the future holds here.
We're going to try some of this out.
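For anyone following along with those examples, a query against an Atlas Vector Search index has roughly this shape; the index name, embedding field, and query text are assumptions for illustration, and `docs` and `embed` are the collection and embedding helper from the earlier sketches.

```python
# Hedged sketch: $vectorSearch over stored embeddings in Atlas.
query_vector = embed(["cessation of production clause"])[0]

hits = docs.aggregate([
    {"$vectorSearch": {
        "index": "vector_index",    # assumed Atlas Vector Search index name
        "path": "embedding",        # field holding the 3072-dim vectors
        "queryVector": query_vector,
        "numCandidates": 200,
        "limit": 5,
    }},
    {"$project": {"category": 1, "fields": 1,
                  "score": {"$meta": "vectorSearchScore"}}},
])
for hit in hits:
    print(hit["category"], hit["score"])
```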
Good, good. And look, thank you for that
call out. Yes, we have a repo, a public repo, called the GenAI Showcase, in which a lot of the AI DevRel folks, and indeed a lot of our engineers and product team,
(38:16):
have been creating examples and demos of how to build.
Obviously you're very familiar with vector search, but you
know, people would know MongoDB as a database company back in
the day. And as you said, you know, two years ago there were dedicated vector databases, and, you know, we brought vector search capability to MongoDB back in June of '23. And since then we've been
(38:38):
having great success being able to kind of participate in this
GenAI revolution that we see going
on at the moment. But that was a really good answer, because I had assumed that the retrieval portion of this was a RAG-type thing.
But exactly as you said, Alex and Andrew, it was the
extracting and the classifications that you then
(38:59):
used to power the existing system that you already had in Oxy to make everything work. How hard was that to integrate with that old, I'm assuming an older, existing system, right? It was just getting the data in the shape that that system needed, automatically, yeah.
It was extremely unpleasant.
A lot of these, I think, yeah, anybody, again, who's working with a lot of AI products, a lot of times the worst part is just
(39:21):
getting what you need out and back into these legacy systems, which oftentimes don't have, like, nice API layers for communicating in and out of them. It's more often than not, yeah, just really clunky ways of getting data in
and out. Just to continue my rant, I
guess, against RAG: I think the natural instinct when looking at something like this would be to say, like, yeah, let's do a RAG app where somebody can come and ask
(39:43):
like, oh, how many days do I have before this lease suffers from a cessation of production event and the lease is cancelled? That would be a really, you know, easy, you know, right example.
You'd kind of ingest it all. You'd have some filtering that's more straightforward search and some that's more vector
search based. But that whole way of doing it
still requires a human in the loop to go to the chat bot and
(40:07):
ask the question. Much, much better if we have an existing system, like in this case, where we can populate all
the data properly and have some automated system working against
that, which is effectively just a query or a scheduled task or
scheduled process where you don't have to have a human come
in and type that and remember to do that.
I think, yeah, increasingly where we're trying, like if we
(40:28):
need a human in the loop, we'll keep the human in the loop.
But if we don't, let's just automate this and make it as easy as possible, rather than having a chatbot where people
have to go ask questions. OK, excellent.
Which is a nice segue, and you touched on this at the beginning: you know, this project, after the POC and then after, you know, kind of
(40:49):
taking it up to a million and a half and beyond.
What was the business impact for Oxy and what was the return on
investment? You had said that was going to
potentially be 40 people doing a much smaller classification
project. Tell us a little bit about how
those numbers resonated within Oxy then, having done this
project yourselves. Yeah.
(41:09):
So like you said, yeah, a couple million in savings by averting the manual classification effort, and we got our answers 12 months sooner, because instead of 18 months, this was a six-month effort.
Wow, OK, wow. That's yeah.
Cost savings, time savings was big on this.
But yeah, the larger thing is, now, you know, this year we're looking at a lot of more sophisticated AI
(41:30):
projects in land. And I think what this first project did is it really gave the wider company, and the land department especially, a glimpse of what this tech can do. And it gave us, to be honest, an idea of what this tech can
do. And I think it's, it's really
given us the license to go in and look at any process that's
very repetitive, very manual, very unpleasant to do.
(41:51):
Today we're looking at automating large parts of those processes, and maybe in eight months or so we'll have something else we can share on the next iteration of the
podcast about some of that. But yeah, I think we started
simple and I think now land knows and we know what's
possible, and there's a lot more getting done.
Yeah. And as you touched on earlier,
HR got wind of this and legal got wind of this.
(42:13):
And as you said, this, this architecture with little or no
changes was applicable to their use cases as well too.
So obviously this has, you know, given the cost and time savings, spearheaded potentially lots of other projects and more work for you guys, right? Yeah, for better or for worse, right?
Yeah, I had an old boss of mine
who would say the reward for good work is more work,
(42:35):
basically. And I think you 2 essentially
exemplify that, right? You, you did it, it was a third
of the time with only a few people, much, much less in cost.
And now everybody else wants to use it, right?
That's right. Excellent.
So, what advice, given this project, and I'm assuming given
our audience is pretty varied, what advice would you have?
(42:56):
You know, you went into this from a POC saying, look, we've got to test the OCR, we've got to test the embeddings, we've got to see, does this work? What advice would you have for companies or people in a similar situation looking to modernize, in this instance, their document management systems, but, you know, modernize their applications and their database, using AI?
I can start.
(43:17):
I'd probably mention two things, one of which Alex already talked on, which is that it's much better to just do something fully than to have a kind of hybrid solution where humans still have to interact with it while it's doing it. But, I guess, when you introduce tooling that basically forces a human to learn something else, that's something that usually faces some opposition.
(43:39):
Whereas if you introduce some tooling that instead says, hey,
you used to do this thing that you didn't enjoy, and now you
just don't have to do that anymore.
People much prefer that.
Yeah, that's music to my ears, definitely.
With that kind of solution, you're not like, hey, check out
this training for this new front end we've got, and it'll make your
life easier, because, like, in that person's head, they're just
(44:01):
thinking, oh, this is one more training I have to do, one more thing I have to learn, something I still have to keep doing.
And so one philosophy I think our team has taken is, how can we just cut processes altogether?
You know, let's not think about necessarily enhancing a process unless that enhancement means cutting work out completely.
So that's one thing. And then the other thing I've
(44:22):
seen on this team is, I think we have a lot of people who are willing to wear a lot of different hats.
And so we don't have, like, you know, a DBA who does all the Mongo work, and once the Mongo stuff feels good, we have a cloud engineer who does only cloud stuff, and then once that seems good, we have a DevOps guy.
And so, you know, it's everyone is doing everything.
(44:44):
And yeah, different people have different strengths, and it's great to lean on that. But at the end of the day, I think most of the team could probably do this whole project start to finish. And that gives us a lot of flexibility on how quickly we can move, the stuff we can get done. And so, yeah, that's been
something that's really helped us out.
It's just people are willing to figure out how to do stuff in a
(45:07):
fairly constrained environment, and with maybe skills they didn't have before, but they're willing to go learn.
Yeah, we were able to move really quickly on all this stuff. Excellent.
Yeah, I think that's really good advice, yeah.
Yeah. The only other, third piece of advice that I would add, going off what Andrew said, is just try things. Just experiment.
I think the only way you learn what these models can and can't
do is by experimenting with it. I think it's so new and it's
(45:29):
changing so fast. You have to try things.
I think we've had cases where some team comes to us with a problem and says, hey, can you do this?
And we say, yeah, let's give it a try.
We try, and we say, yeah, it's kind of working, but to get it to the accuracy you need, this might take six weeks. And then, you know, we kind of leave it, and then two weeks later a new model comes out.
We try it with the new model and it just one-shots it, 100% accuracy,
(45:52):
no extra work required on our side.
So the only way you'll learn what these things can and can't
do is just by experimenting, by trying it out and, and it's a
lot of fun. I think you just have to, like you kind of picked up on earlier, you have to de-risk, you have to do short POCs, you have to move fast.
And like Andrew said, the team design needs to be such that you're not having to have committees and multiple team
meetings and scrum cycles and stuff.
(46:14):
None of that's going to work. Fast-paced, fast-paced.
You just have to be able to move quick, a small team.
OK. And on that I suppose the new
and the fast and how things are changing so quickly, either of
you, what's the most exciting thing you're seeing now,
even outside of Oxy in terms of what AI is capable of doing?
Is there anything that has you really excited and potentially
(46:36):
you want to play around with even in your off hours, right,
not during the daytime perhaps as well too.
What's exciting you most? I'll say for me, first, it's the reasoning models. I think we saw this, you know, with the big multimodal models, the big language models: there was a scaling paradigm that got us to GPT-3, where we just had more and more data. We scaled up pre-training.
(46:57):
We got these really huge and awesome models, but they weren't
super applicable or useful yet. And then we saw basically supervised fine-tuning and RLHF, which got us to GPT-3.5, ChatGPT. And that was really, really
exciting. And now we're kind of exiting
those two regimes. Like, you know, not to say they're fully exhausted, but we're kind of seeing signs of exhaustion of those scaling regimes.
(47:17):
And now we're moving to this reasoning-based scaling, which, for any kind of verifiable domain, like coding or mathematics or some fundamental sciences, I think it's really
quite promising what's going to be possible with these reasoning
models. I think for us, where we had seen some limitations on certain types of tasks. I'll give an example pertaining to land, which is, a lot of the
(47:37):
work we presented today was on single documents as a unit, right? We're having the language model classify and extract on a single document at a time.
And where we saw struggling in the past with previous models is
in cross-document reasoning. And I'll say, with the new reasoning models, we're seeing a lot better results on cross-document work. And I think that's going to be a huge thing for this specific subject, but also for many, many
(48:01):
other topics. I think these reasoning models are going to be very, very impactful.
Yeah, it's a good point. We've been playing around with
say like the reasoning engine from Google etcetera as well
too. And it's, yeah, I think that
moves us into a new sphere, because, as you say, a lot of the applications are built, you know, looking at a single source of truth, essentially.
And in fact, as they grow and get bigger, is there anything? We
(48:24):
hear agentic all the time now, anything in that space that
excites you? Definitely trying to think of
what I can say specifically. No worries.
No. But yeah, what I'll say is, for an agentic system to be successful, the standard for accuracy goes up a lot, because generally with agentic systems you're seeing multiple steps, right? So, say, instead of just an input-output, you might instead have
(48:46):
like a five-step process, where, if you're at 98% accuracy with each of those steps, you need to multiply 0.98 × 0.98 × 0.98 and so on, and then you're going to get a much worse accuracy on the other side of that. So I'd say it's a much harder
application to build an agentic system, but I think there's a
ton of value there and we're seeing a ton of value there.
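The arithmetic Alex is doing, spelled out:

```python
# Per-step accuracy compounds across an agent's chained steps.
per_step = 0.98
end_to_end = per_step ** 5     # five steps at 98% each
print(round(end_to_end, 3))    # 0.904: roughly 90% end to end
```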
(49:06):
I just can't say more, I guess, is all.
Yeah, yeah. Look, there's a lot of ways to go. And look, as you said, it's
moving so fast and everything's so new.
I think you're going to see frameworks for agentic come out to help you do that, because you need safeguards in place, right, to make sure it doesn't go off a cliff at the other end if you've left it unmanned, as it were, to do its own reasoning
(49:28):
and make its way through a number of stages, I suppose.
Yeah. And I'll say kind of to that
point, you know, in oil and gas we also need to be conscious; we have to be picky about our applications.
I think there are projects that we've looked at on our team, or we've been approached internally for, where we really need to have a human in the loop. It doesn't make sense to have an AI system run something start to finish. You know, with land
(49:50):
document classification. I think the risks are a lot more
limited. You know, maybe you're limited
to financial risk, worst case scenario if you mess something
up. But but yeah, certainly with
things affecting equipment in the field or or safety
operations or personnel decisions in the field, I think
we rightfully are acting a lot more cautiously.
And I'd encourage others to actually a lot more cautiously
in those applications. Sure, sure.
(50:11):
And in the same vein of everything moving so fast and being so new, Andrew, how do you keep up with what's going on in the AI space? For example, is it blogs? Is it news feeds? Is it podcasts? How do you keep on top of things?
Yeah, for me, I mainly just listen to what people on my team are saying. I mean, we've got some guys,
(50:31):
we actually just had a guy who is basically scanning arXiv, and Columbia University as well, I think, I forget exactly, but in any case, yes, scanning for papers and sending out any interesting papers every week. OK, so he's passing it on, and Alex stays on top.
I'd say it's reading papers and, you know, ML
(50:53):
Twitter, and you have to be kind of ruthless about curating your Twitter feed to keep it tech focused. But those are the two big sources for me: papers and Twitter. Yeah.
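As a rough illustration of the kind of weekly arXiv scan Andrew describes, here's a minimal sketch against arXiv's public Atom API at export.arxiv.org; the search terms and result handling are assumptions, not the team's actual setup.

```python
# Minimal sketch of a weekly arXiv scan: pull the most recent papers
# matching a query from arXiv's public Atom API and print title + link.
# The query itself is an assumed example, not the team's real filter.
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

params = urllib.parse.urlencode({
    "search_query": 'cat:cs.CL AND all:"reasoning model"',
    "sortBy": "submittedDate",
    "sortOrder": "descending",
    "max_results": 20,
})
url = f"http://export.arxiv.org/api/query?{params}"

with urllib.request.urlopen(url) as resp:
    feed = ET.fromstring(resp.read())

ns = {"atom": "http://www.w3.org/2005/Atom"}
for entry in feed.findall("atom:entry", ns):
    title = " ".join(entry.find("atom:title", ns).text.split())
    link = entry.find("atom:id", ns).text
    print(f"{title}\n  {link}")
```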
Because there's an awful lot going on. There's, as you said earlier, new things coming out all the time. There's a lot of leapfrogging, you know: you think, oh, this is gonna be the solution we need, and then you see something else that's easier.
(51:14):
Just today Grok 3 came out, so now it's like, yeah, there goes my afternoon, I'm gonna have to be testing the latest. Yeah, the new model that looks to be a bit better than GPT-4o, for a non-reasoning model. It's probably the strongest now. So yeah, it keeps it exciting. Just, yeah, like you said, it's a lot of nights and weekends to stay on top of what's changing in this fast-moving space. Excellent.
(51:35):
Well, look, this has been fascinating, and it's amazing to see. I do this live stream an awful lot, and a lot of the time we've got demos and we've got some POCs, etcetera. But to see it in action, to see it at the scale that you've managed at Oxy, to see the proof of concept borne out into fruition, as in it was
(51:55):
quicker, it didn't need 40 people, we did it, and everybody else wants to leverage our architecture going forward, is brilliant to see. And you know, you've explained your own processes internally really, really well. As we come to a close on the podcast, any last parting comments for our audience, or words of wisdom, or things you've learned? Or even, looking back on this project, is
(52:15):
there anything that you would have done differently, or anything that you really stumbled on? Or did everything go super smoothly, no problems, and you'd do it all again the same way?
I think the only thing I would say, one thing I've really felt, especially with the advent of language models, is that having more of the mindset of a solution
(52:38):
architect has been really helpful. It's become so simple to dive deep into something you want to know how to do, but you do have to know what questions to ask. I've found a lot of value there. It's kind of made me change my philosophy a little bit on learning going forward: do you spend the
(53:00):
time to become truly an expert at one thing, or can you get a lot more value from knowing a whole lot of different things and how they work together? More of a system-design view instead of, you know, being a deep SME in whatever it might be. And so I think there's a balance there. But yeah, I've really felt lately that just knowing how systems work and seeing how
(53:20):
they should be built, it's become a lot easier to actually go build the system. But you really do have to have a good understanding of the edge cases, what to watch out for, where things will go very wrong. But yeah, building has become much simpler than it used to be, so I don't know, I'm still trying to figure that one out myself, but exploring. And Alex, anything to add to
(53:41):
that? Yeah, I'll just say there's one thing on my wish list for 2025, and I intend to do a better job at this: I want to see more open source work in oil and gas. I think there's a few companies doing a really good job at this. But that's my call to action for everybody listening, and also for me personally. I think we need to share more. I think it's exciting to see
(54:02):
what other people are building, so I hope to see more of that this year. Yeah, look, I'm a big fan.
It's always, you know, great to learn from other people and to see what they're building, and to have that shared experience and that sense of community. In large companies, I suppose, sometimes that's hard to foster, right? But I think everybody is trying to solve, not exactly the same problem, but, you know, architecturally much the same
(54:23):
problem. And we could all learn from the collective. Yeah.
Perfect, perfect. Well, look, this has been super informative. I've learned a lot, as I always do; it's the reason why I do these podcasts, for my own education as well. But Andrew and Alex, you've made the pathway incredibly clear to follow. And, as I
(54:44):
said earlier, it's a really good example of taking that inclination, that understanding that we can do this in a different way, into the POC and then into actual production. And of taking, I suppose, that drudgery out of the classification and extraction, and helping the collective team there, and going, look,
(55:04):
it's all in your system, work away as you always did, but it's now just easier. So I think that's a brilliant, brilliant example. And very much, as you alluded to, Alex, there might be something in the future. So as soon as you build something new and different, hopefully with a bit of MongoDB in the background there as well, we'll certainly get you back on the podcast to show it off again.
But Alex, Andrew, thank you so much for your time.
(55:25):
We do appreciate it. Thank you to everybody who joined us as well; this has been great. And as I said at the intro, do keep up to date with what we're doing on these shows. By liking and subscribing on our YouTube channel and following us on LinkedIn, you'll be able to keep up to date with future episodes like this. And if you want to keep up to date with the examples that we mentioned earlier and other case studies, etcetera, as
(55:47):
well, just go to developer.mongodb.com. You'll find everything that our DevRel team and others inside MongoDB create and build. And if you're not already using MongoDB but you'd like to get started, we've got a load of help. We've got a great forum, we've got a great community where all our users hang out, and obviously our engineers, our DevRel
(56:08):
team, and our product managers as well. So, community at mongodb.com. That's the end of my plugs. Any parting words, Andrew or
Alex? Thanks for having us. Yeah, thanks a lot, Shane. No, listen, you made my job
super easy. It's been a pleasure to host you both. And yeah, I look forward to seeing what you build in the future. But for now, thank you so much. It's been a pleasure. Take care, everybody.
(56:28):
Andrew, Alex, thank you. Thanks.