Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Andreas Welsch (00:00):
Today we'll talk about automating your document processing with LLMs. And who better to talk about it than somebody who's actually working on that? Petr Baudis, hey Petr, thank you so much for joining.
Petr Baudis (00:10):
Thank you for
having me, Andreas.
Andreas Welsch (00:12):
Great.
Hey, why don't you tell our audience a little bit about yourself, who you are and what you do?
Petr Baudis (00:17):
Absolutely.
So yeah, my name is Petr Baudis.
I'm a co-founder and CTO of Rossum. I'm a programmer at heart, I would say. So, yeah, I started programming when I was like 10. I contributed to many, many open source projects. I worked on many AI projects as well, as early as 2005-ish.
(00:39):
I, for example, collaborated in the area of Computer Go on some programs which DeepMind, when building AlphaGo, used as their main reference. But for the past eight years, I've been building Rossum. So the R&D department and the product strategy belong on my side. And at Rossum, what we do is we have this vision that when you
(01:04):
as a company are processing incoming documents, it shouldn't be open spaces full of people sitting there eight hours a day, five days a week, just typing over pieces of information from pieces of paper into computer forms. This should be automated. Our vision is that one person processes 1 million incoming
(01:25):
document-based transactions per year in such a company. Right now it's tens of thousands, maybe low tens of thousands, typically. So it's a pretty ambitious vision. And we use advanced AI. We are a deep tech company and we are focusing on managing this process end to end, so not just capturing data from documents,
(01:46):
but everything from receiving the document into the company, through things like approval flows, GL coding, etc., all the way to getting the data in great quality into your SAP, Oracle, or whatever other system there is. Even automatically rejecting the document and sending an email back to the sender if something is wrong with it.
Andreas Welsch (02:04):
Awesome.
That sounds really exciting.
And I remember, I saw you already a couple of years ago when you were just starting out, and it was super impressive to see what you guys are building. So also, thank you for sponsoring today's episode and bringing this information to many more people. I'm excited to have you on the show, like you said. Should we play a little game to kick things off?
Petr Baudis (02:27):
All right.
Andreas Welsch (02:28):
Okay, so let's
see.
This one is called In Your Own Words. And when I hit the buzzer, the wheels will start spinning. When they stop, you'll see a sentence, and I'd like you to answer with the first thing that comes to mind and why. In your own words. To make it a little more interesting, you'll only have 60 seconds for your answer. Are you ready for What's the BUZZ?
Petr Baudis (02:49):
Let's go.
Andreas Welsch (02:50):
Okay, so here we
go.
And if you follow us live, please put the answer in the chat, too. If AI were a movie, what would it be? 60 seconds on the clock.
Petr Baudis (03:01):
Alright, a movie.
Honestly, the first movie that came to my mind is Terminator, but that's so cliché and I don't really believe in it. But I think Shrek is a great analogy, actually. Because the first scene that comes to mind from Shrek is the scene where one of the characters keeps asking, Are we there yet? Are we there yet?
And the other one says (03:22): No, stop asking, please.
But on the other hand, they still keep making progress toward their destination. So I think AI is a lot like this. It's been overhyped a lot. And now I think many people in the audience have to be asking, are we there yet, finally? And I'm going to say no, not quite yet. There is still a lot of hype in the field, but there's also a
(03:43):
lot of amazing progress, and it just needs a bit of a discerning person to pick out what the right pieces are.
Andreas Welsch (03:50):
Wonderful.
Thank you.
See, I was going to ask, are we there yet? But I'm excited for our conversation and seeing where that leads us. Especially in the realm of processing your documents with LLMs. So thank you for the awesome answer. Now, I've been seeing so much about digital transformation and digitization for pretty much the last 10 years.
(04:12):
And still it baffles me that even the largest organizations rely on paper-based processes. I was working with a number of Fortune 500s a couple of years ago, and it just blew my mind. I thought everything was automated, everything was digital, that things wouldn't even come in as a PDF or some kind of scanned document. But purchase orders, invoices, bills of lading, you name it.
(04:32):
And now I feel that the good thing is that there's been even more automation over the years, with machine learning, for example, and other techniques and OCR. But now it comes down to the old question, right? How much does this actually cost me if I want to extract information from my documents? And how much can I save? So I remember doing some calculations many years ago.
(04:52):
Time spent per document multiplied by the number of documents per year. This is what your cost is. This is what your spend is. If you do this with and without a software solution. But it's rarely that simple, right? So I'm curious, what are you seeing? If you're a leader and you're thinking about how you can automate your document processing, where should you start?
(05:13):
What's the calculation you should be doing?
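For readers who want to make that back-of-the-envelope calculation concrete, here is a minimal sketch in Python. The volumes, handling times, hourly rate, automation share, and software cost are illustrative assumptions, not figures from the episode.

```python
# A back-of-the-envelope version of the ROI calculation described above.
# All numbers below are illustrative assumptions, not figures from the episode.

docs_per_year = 120_000          # incoming documents per year
minutes_per_doc_manual = 5       # average manual handling time per document
hourly_cost = 30.0               # fully loaded cost per processing hour

automation_rate = 0.70           # share of documents handled with no human touch
minutes_per_doc_reviewed = 1.5   # residual review time for the remaining share
software_cost_per_year = 50_000  # assumed annual cost of the IDP platform

manual_cost = docs_per_year * minutes_per_doc_manual / 60 * hourly_cost

residual_docs = docs_per_year * (1 - automation_rate)
automated_cost = (
    residual_docs * minutes_per_doc_reviewed / 60 * hourly_cost
    + software_cost_per_year
)

print(f"Manual processing cost:   ${manual_cost:,.0f} / year")
print(f"With automation:          ${automated_cost:,.0f} / year")
print(f"Estimated annual savings: ${manual_cost - automated_cost:,.0f}")
```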
Petr Baudis (05:16):
I think it's most important to think about what you actually want to achieve and try to phrase that in a bit more concrete terms. So from my perspective, my advice would be to start with your use case. And machine learning practitioners always think about the input and the output, right?
(05:36):
So yeah, I want to put some input in, and this is the kind of output I expect. And pretty often I actually hear from people that don't have a clear idea of what they want to get out of the system, right? We just want to get some extra insights, et cetera. And that might be fine for some cases, but if you are really optimizing some business process, you need to think,
(05:57):
okay, what's my input? What's the type of the data that I'm processing? Is it going to be invoices or is it going to be, for example, construction charts? You will want a very different system, with pretty different cost and ROI equations, also depending on what the use case is. Another thing is, what's the volume of my data?
(06:18):
You might have hundreds of documents per month, super complex, long documents with complex tables, maybe logistics or materials analysis, or use cases like this. Or you might be talking about millions of purchase orders or invoices a month. And again, then the cost equation will be different.
(06:38):
In the first case, maybe you have just one or two people processing this. In the other case, you probably have a big team of people processing it. But maybe they are all in the same place, or maybe each of them is sitting in a different office, working in a different region, maybe entering the information into a different downstream system. That's one part of it, this use case: what's the type of
(07:01):
data, the volume of data, what do you want to get out? Do you want to get an SAP business record out, or something else maybe? Then, I think it's good to think a little wider. And think, okay, what's your actual business objective? And by that, I mean that, okay, you want to capture some data from some documents maybe, or intelligently process
(07:23):
documents in general, but probably there is an overall business process that you want to automate. And then it's worth starting from the big picture, from this whole business process. Is it order to cash or something like that? Then you want to think about, okay, what's the biggest piece of it I actually can automate?
(07:43):
Because maybe you can automate way more than just the document data capture. The more you automate, the longer the part of the process that you automate, usually that means that you are gonna use some pretty integrated platform to run this process. The more integration there is, the better. The tighter the product you use that covers a bigger piece of the
(08:04):
process, the more benefits you usually get. Because you don't need to switch between systems, you need to think less about integrations, and also, if those products advertise automation, the more signals, the more feedback from user usage they get, the higher automation they can achieve. And then the last piece is, do you want everything at once, or
(08:29):
are you willing to think about some sort of roadmap? Maybe at the beginning, I don't need full automation. I just want to save 70 percent of my time, or the time of my team, but I'm still okay with people checking those documents at the beginning. Maybe at the beginning, I don't need a fully fine-tuned integration. I am okay with some extra manual steps during the integration and
(08:52):
until I work out all the details, right? If you are willing to think in a bit of an agile way like this, you probably can get a better ROI and also de-risk your project, because you can already start getting some benefits early. And meanwhile, of course, you still should aim high, from my perspective.
Andreas Welsch (09:11):
No, that's a great way to frame it, right? Thinking, first of all, in business terms, thinking in manageable chunks of what you want to go after. And maybe I can pick up the question from Crispin here in the chat. And I think it's great with your technical background, right? I remember when I worked with large organizations, again, six, seven years ago, we would ask them for tens of thousands of
(09:34):
documents, and they had to be annotated: where's the bounding box, and what is this field? And lots of additional data or ground truth to test how our model actually works. And it was a big leap to go from text to tables, and to PDFs and tables in PDFs. So, Crispin here is asking: what happens when pre-processing
(09:54):
fails on a PDF or on a PowerPoint, or when it's complex, with tables, images, or potentially videos? How do you handle that when it comes to document processing and working with AI on that unstructured data?
Petr Baudis (10:08):
Sure.
First, you want to design your process and use appropriate tooling so that you don't lose those documents altogether, right? In the worst case, you want to do the same thing as you would do if you were just doing this manually. And maybe, yeah, sometimes you just get a PDF that you just cannot open on your computer at all, and then what do you do?
(10:28):
You just write back to the sender: this isn't working for me, you need to resend it, I cannot process it, something like that, right? And then you just figure it out. Actually, if you are thinking about an IDP solution like Rossum that has this end-to-end integration, this platform can do things like this automatically on your behalf if it cannot open those documents. And then you need to think about the
(10:52):
quantitative aspect. Is it just one document in 10,000? Is it just gonna happen to me once a week that something like this happens? In that case, just an escalation to a human can be totally fine, right? If it's just a small fraction. What you want is a system or a platform that has this built in
(11:14):
and that has a user-friendly process for it, so that it generates minimum extra hassle to have a human take a look at it and process it manually if need be. If it's just a small fraction of those documents, then that can be a totally fine answer, and probably much cheaper than trying to figure out, okay, how do I tweak the process so that I get 100 percent literal automation.
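As a rough illustration of the escalation pattern Petr describes (automate the bulk, route the rare failures to a person), here is a minimal sketch. The extract_fields function, the confidence threshold, and the review queue are hypothetical placeholders, not any vendor's actual API.

```python
# Minimal sketch of "automate the bulk, escalate the exceptions".
# extract_fields() and the review queue are hypothetical placeholders.

from dataclasses import dataclass, field

@dataclass
class Result:
    fields: dict = field(default_factory=dict)
    confidence: float = 0.0
    error: str | None = None

def extract_fields(document_bytes: bytes) -> Result:
    """Placeholder for whatever IDP / LLM extraction step you use."""
    raise NotImplementedError

CONFIDENCE_THRESHOLD = 0.9  # tune per use case and risk tolerance
human_review_queue: list[tuple[bytes, Result]] = []

def process(document_bytes: bytes) -> Result | None:
    try:
        result = extract_fields(document_bytes)
    except Exception as exc:  # unreadable PDF, unsupported format, ...
        result = Result(error=str(exc))

    if result.error or result.confidence < CONFIDENCE_THRESHOLD:
        # The rare failure: hand it to a person instead of forcing automation.
        human_review_queue.append((document_bytes, result))
        return None

    return result  # high-confidence results flow straight downstream
```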
Andreas Welsch (11:35):
I love that
answer.
Very practical.
Now, we've been talking about this for quite a bit before we set up today's episode and topic. And obviously, for anybody following the tech field for the last two years, there's no way around LLMs. There's no way around Generative AI. And there's an abundance of these models in the market.
(11:57):
And it feels like every week there's some kind of enhancement or some new one coming out. Vendors are outbidding themselves in the capabilities that they ship, that they look into, that they deliver. How would you say AI leaders should use LLMs and bring them to the business? How should they make them tangible?
Petr Baudis (12:18):
So first, I think
each AI leader should have a lot
of personal experience.
Just using LLMs day-to-day, to get a good feel for what their strong points and weak points are, because there is plenty of both, right? If you are an AI leader or innovation leader, or even just upper management in an enterprise company, and you don't
(12:43):
use LLMs, at least sometimes, even for just personal purposes, to help you write personal documents or whatever. It doesn't need to be in a compliant enterprise context, right? But you should try using those LLMs and just see what comes out.
(13:03):
You should be willing to experiment with them a little. And I think that's super important to develop this intuition about what the current state of AI is, because in some cases it will really amaze you and blow your mind. In other cases it will make your head ache, because it will mislead you with some hallucinations.
(13:23):
That's all part of the story, right? But in order to get value from those LLMs, you need to understand their weaknesses as well as their strengths. So that's one part of this. Then, when it comes to actually using LLMs in real-world business processes within your company, then one thing that's
(13:44):
most important is to distinguish between a fancy demo and some tool that you can use reliably, and that can reliably process 98 percent or 99.9 percent of your documents, right? Of course, if you copy-paste a purchase order into ChatGPT and
(14:06):
you start talking about it, it will already pick up a lot of the information, etc. But then, if it's a purchase order with three pages of line items, and you start asking about specific columns, it might not do so well anymore, right? It's this reliability aspect that needs to be really thought out, and you want to find a platform that has a
(14:29):
really good story around how they can achieve this reliability. Then another thing there is regarding adaptability, because the thing is, unless you are just a small company using a totally standardized process around some existing off-the-shelf accounting software, etc., you probably have some pretty
(14:51):
specific process. And I think this is a super underrated aspect of LLMs: how do you make them adapt to your concrete process? Of course, you can prompt them in a custom way, etc. But that's actually super laborious. If you have a bit of an elaborate process, you are maybe a little bigger company, and you actually want a system that just watches over your shoulder as you execute the
(15:14):
process yourself and learns from it on the fly. Again, an example of what we do at Rossum is we have this instant learning system where you just put in a document, you mark something on the document, whatever you need. Of course, if it's just an invoice, it will already pick up a lot of things on its own. But then, the second document you send in, even if you put it
(15:34):
in an email, something totally specific, with a very high chance it's already picking up this information from that document, just based on what you showed it. You don't want a system that you manually set up, where you do some AI engine management and run some trainings manually, or that you need to prompt explicitly. You want something that just adapts on the fly.
(15:55):
And those solutions are on the market. You really want that. And ultimately, though, there are risks and there are weaknesses. So you want a system that handles those well. You need a system that has a good answer for what you do in case of hallucinations. How do you defend against prompt injection, etc., etc.?
(16:15):
It's pretty easy to find out about those risks. Just Google around and, again, pick a vendor that has a solid solution. ChatGPT with some connector probably will not be the answer.
Andreas Welsch (16:28):
Several great
points that you made, and they
resonate deeply with me.
One, I remember, again a couple of years ago when I was working with large organizations on things like document processing, we asked for sample data, and we got hundreds of invoices, or hundreds of PDFs, and it wasn't always one invoice in one document.
(16:48):
Sometimes it was two or three or four in one document that a customer's vendor had sent them. And I think that splitting the document, finding out what belongs to one document, what's the next one, what's on page one, what's on page three, was a very complex task for a software system, even a machine learning or now LLM-based system,
(17:11):
to figure out. So yes, there are standard off-the-shelf things that you can use for free or for $20 per user per month. But I think you make an excellent point that, as with standard offerings in general, if there's somebody that has already thought this through for you, there's a big benefit and a big value in doing that. Which kind of leads me to the next point: enterprise
(17:34):
organizations' IT landscapes are heterogeneous, and vendors would love it to be wall-to-wall one company, especially the bigger their footprint is, but the reality is that there's an HR system, there's a customer experience system, there's a finance system, a supply chain system, and so on. And usually they're from many different vendors.
(17:57):
Makes sense, right? With the cloud, you had a lot of the individual buying centers going and buying the best-of-breed solutions there. They are pretty much left with two choices, right? Either you use each individual vendor's built-in OCR or document processing features, and they will probably deliver different results. Or you quickly need to integrate your own solution with all of
(18:18):
these individual systems. And that can very well mean a lot more cost, a lot more time that you need to spend than you initially thought. And I'm curious, what are you seeing when it comes to integration? What do some of your customers do, for example?
Petr Baudis (18:32):
Yeah, the story can
be even wilder.
Because a lot of those biggest enterprises, how they grow is essentially mainly through acquisition. And when they do an acquisition of another company, that company's SAP instance can survive for many years within the company. So each business unit can have their own Oracle, and of
(18:55):
course there are plenty of IT departments working on integrating it and ideally normalizing everything. But those projects usually take so long that, if you have a shared service center within a large enterprise, it usually just has to deal with an insane number of systems.
(19:17):
And from our perspective at Rossum, we call this the multinational enterprise problem, or even the many-national enterprise problem, because if you look at an average Fortune 500 enterprise, they operate in 70-plus countries, and they have their supply chain in different countries than where they have their consumers, right? So you need a system that can deal with a wide variety of
(19:41):
regions and the file formats that come with those regions, different standards. If you, for example, throw invoicing into the mix, that's really totally wild, because each country does it totally differently, even within areas like the EU, right? And then, of course, plenty of different downstream ERP systems.
(20:02):
You want to standardize on a kind of shared piece of infrastructure that can handle all those situations and connect to all those different systems. And the big advantage of standardizing like this is you can manage your IT costs much better. And, for me, that's kind of part of this recent story, which is
(20:26):
really important for me: pushing back against the status quo of working around legacy IT. Because it feels to me that the world of ERP systems has shifted into a very different, much slower gear in terms of change and adaptation and adopting best practices than the world of innovative pieces of
(20:47):
enterprise software like AI-based IDP solutions. So at Rossum we actually gave this a lot of thought, and very recently we had a big launch around formula fields for admins, which is essentially a way to customize your business transformations without knowing programming and using LLMs
(21:09):
with just text-based input. It is a pretty impressive capability, at least our customers say so when we show it to them and when they use it to onboard their cases. So from this perspective, eventually you also want to think about gradual rollout.
(21:29):
So when you think about a shared piece of infrastructure for your many-national enterprise, it's always best to just pilot it in a single country, then add an extra three, and then go global, so that you really gain this trust, because it's not worth the risk to just jump directly into a global rollout.
Andreas Welsch (21:48):
I'm curious,
we've talked quite a bit about
extracting information from documents, doing that with machine learning and OCR, now LLMs. Certainly, LLMs have been at the forefront of the news and everything for quite some time. And a lot of times I see this being about text. And obviously with invoices, sales documents, bills of
(22:11):
lading, pretty much any document where we do have a lot of text. What are some of the key benefits, would you say, that adding LLMs to a document processing workflow or solution has brought in your case? What hasn't been possible before that's possible now?
Petr Baudis (22:27):
From my perspective, when we have our POCs with our clients, etc., what didn't change is that we usually were the top solution in terms of accuracy and automation potential before. We still are, but it just enables us to keep increasing the accuracy without increasing the machine learning model
(22:47):
complexity. So one big thing is that large language models are a bit of a unifying factor that lets you just put a lot of the complexity into this large language model. So our tLLM, the Transactional Large Language Model that we developed here at Rossum, substitutes for about 10 different machine learning models that we have used before.
(23:08):
So the platform gets much simpler, and that lets us focus on just increasing the raw accuracy, the automation, the speed of learning of the system, the adaptation process, without getting distracted by all the complexity and making sure that we support OCR in this language and that language, etc. We can just solve all of this in a single place. So I think that's a huge advantage.
(23:31):
And that also gives those LLM-based systems a lot more versatility than has been traditional for IDP systems before. But I think the other aspect is really that LLMs, in the next five years, will be a huge game changer in how enterprise IT infrastructure gets integrated.
(23:52):
Because you will empower non-programmers a lot more. So the process domain experts, etc., will be able to do a lot more without relying on developers. And that will speed up the change in the enterprise IT ecosystem tremendously.
Andreas Welsch (24:07):
I love that.
And I'm fully with you that there's so much more change coming, in many good ways, more than we can even comprehend right now. And just seeing what is already possible today and where things are going, to me, it's super exciting. Because then you know how you should plan your vision. How should you think about these things before they become
(24:28):
reality, so you're ready when they do? But also, if I look at my Twitter feed, for example: AGI, Artificial General Intelligence, is near. It's almost here, especially if you believe the folks at OpenAI and several other AI labs. What are you seeing? What's your perspective on this? Are we going to have AGI next year?
Petr Baudis (24:51):
Probably not next
year, even though it's always
difficult to predict, right?
The most difficult thing there is defining what AGI is and agreeing with each other on that, because we just keep shifting the goalposts. And if we showed someone from a decade ago what current ChatGPT can do, they would totally say that's totally AGI. We see all the weaknesses still, so we think, okay, we need to go
(25:12):
a little further, right? So the goalposts keep shifting a bit. I think it's going to take a little more time, but I am more in the camp of people who are very optimistic about the speed of AI development overall, and I think AGI is more of a this-decade thing and not a next-decade thing. But I also think that AGI shouldn't be overrated too much, because at the very beginning it's going to
(25:34):
be very resource-intensive, and it's not that once we get AGI, with a snap of the fingers the world will totally change. I think the world is a lot slower to change than people, especially the deep-tech people in AI, think. And still, even if there were no new big inventions beyond the current GPT-4 and Claude Sonnet, etc., in the next 10 years, it
(26:00):
would have plenty of world-changing potential. So I think it's gonna still take some time, but I'm pretty optimistic about AGI overall. And from my perspective, the angle then is thinking again about your kind of enterprise IT ecosystem story: how do you pick solutions that can leverage this, where
(26:24):
AGI can be plugged in, also solutions that can be used by AGI engines from the future, etc. Because it's not that AGI would obsolete all the internet infrastructure and all the enterprise infrastructure, right? AGI will not obsolete it. Even if you have a whole company built around AGI agents, it will still need SAP to keep their records.
(26:46):
And it might still want to use a solution like Rossum rather than just doing everything itself, because it's going to be a lot more robust and it's going to be a lot cheaper. The intelligence will still cost a lot more; the generalized one will cost a lot more than a specialized one.
Andreas Welsch (27:02):
Awesome.
Look, I see we're getting close to the end of the show, and I was wondering if you can summarize the three key takeaways for our audience today. Again, thanks for sponsoring today's episode, and I'm super excited to see what you're already doing today with large language models, and what that path is as we increase autonomy and automation. So maybe just the three key takeaways before we wrap up.
Petr Baudis (27:22):
Okay, three key
takeaways.
We ended the last question on a little sci-fi note, but let's go back to reality. Let's go back to today, right? I think when you are on the market looking at LLM tools that can help you, I would say look for a tool that has a great reliability story, so that it cannot be thwarted by low or
(27:48):
inconsistent performance, or by prompt injections and hallucinations, etc. Look for a tool that has a great adaptability story, so that it really adapts on the fly to your specific requirements without you spending very long onboarding times or a huge amount of IT resources on it. And look for a tool that can be
(28:09):
integrated easily. It has a good story around how you can spend the minimum possible IT resources to connect it to all the systems that you need. Because you want something that's end to end; you don't actually want a tool, you want a platform where the documents live, which connects to all the systems and solves your business process end to end.
Andreas Welsch (28:31):
Wonderful.
Thank you so much, Petr.
Thank you for joining us today and for sharing your experience with us. I really appreciate it.
Petr Baudis (28:38):
Thank you for
having me.
Andreas Welsch (28:39):
And thanks to those of you in the audience for joining us.