Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Jerod (00:04):
Welcome to the Practical AI Podcast, where we break down the real-world applications of artificial intelligence and how it's shaping the way we live, work, and create. Our goal is to help make AI technology practical, productive, and accessible to everyone. Whether you're a developer, business leader, or just curious about the tech behind the buzz, you're
(00:24):
in the right place. Be sure to connect with us on LinkedIn, X, or Bluesky to stay up to date with episode drops, behind-the-scenes content, and AI insights. You can learn more at practicalai.fm.
Now onto the show.
Sponsor (00:39):
Well, friends, when you're building and shipping AI products at scale, there's one constant: complexity. Yes, you're bringing the models, data pipelines, deployment infrastructure, and then someone says, let's turn this into a business. Cue the chaos. That's where Shopify steps in, whether you're spinning up a storefront for your AI-powered app or
(01:00):
launching a brand around the tools you built.
Shopify is the commerce platform trusted by millions of businesses and 10% of all US ecommerce, from names like Mattel and Gymshark to founders just like you. With literally hundreds of ready-to-use templates, powerful built-in marketing tools, and AI that writes product descriptions for
(01:21):
you, headlines, even polishes your product photography, Shopify doesn't just get you selling, it makes you look good doing it. And we love it. We use it here at Changelog.
Check us out at merch.changelog.com. That's our storefront, and it handles the heavy lifting too. Payments, inventory, returns, shipping, even global logistics. It's like
(01:42):
having an ops team built into your stack to help you sell. So if you're ready to sell, you are ready for Shopify.
Sign up now for your one-dollar-per-month trial and start selling today at shopify.com/practicalai. Again, that is shopify.com/practicalai.
Daniel (02:17):
Welcome to another episode of the Practical AI Podcast. This is Daniel Whitenack. I am CEO at Prediction Guard, and I'm joined as always by my cohost, Chris Benson, who is a principal AI research engineer at Lockheed Martin. How are you doing, Chris?
Chris (02:32):
Doing great today, Daniel. As always, lots of AI and autonomy to talk about. And you know what? We have way more to talk about as well.
Daniel (02:42):
We have way more to talk about. Yeah. Speaking of way more, we're very excited to welcome back Drago Anguelov, who is the vice president and head of the AI foundations team at Waymo. Welcome, Drago.
Drago (02:56):
Thank you, guys. It's
great to be back after five
years or so, right?
Daniel (03:00):
After five years, yeah. We were commenting before we started the recording that the last episode with Drago was on 09/01/2020. So that was episode 103. So a few things have changed in the world generally, but certainly in relation to AI.
(03:22):
I'm wondering if you could maybe just catch us up at a high level, Drago, in terms of driverless cars, autonomous vehicles.
How do you see the world differently now than you did in 2020?
Drago (03:37):
One thing I would say is, in October 2020, we opened our Waymo One service in Phoenix East Valley to everybody, so just one month after we talked. But since then, we have launched and scaled quite dramatically, in now five major metros: San
(04:01):
Francisco, Los Angeles, Phoenix, Atlanta, and Austin. We are also serving hundreds of thousands of rides a week to paying customers. We are expanding.
We announced expansion to at least half a dozen or more cities, which will be going on through next year, and we may
(04:24):
announce yet more. In the cities we are in, we continue reporting the safety performance of our autonomous driver, and we are over 100,000,000 autonomous miles driven on the road at this point, so it's fairly statistically significant. In
(04:46):
those miles, our safety study at close to 100,000,000 miles showed that we are five times less likely to get into accidents with critical injuries, and over 10 times, I think 12 potentially, less likely to get into collisions that injure pedestrians, so that has been happening. And we are on to
(05:10):
doing more and more right now. I think we work on improving the driver further.
We have a sixth generation vehicle coming up. We have started partnering with different companies. For example, we're partnering with Uber in Austin and in Atlanta, so our vehicles show up on their app in those cities. We have
(05:35):
partnered actually with Lyft in Nashville, I believe, and we partnered with DoorDash to explore delivery, so we're exploring and expanding the scope and the partnerships that we are doing as well. But I think in 'twenty five, I would say a lot more people have had and continue having the
(05:57):
opportunity to try Waymo. I'm quite a convert myself.
To me, probably the bigger moment was in 'twenty two, when I got to ride in San Francisco by myself, fully autonomously, and since then it took some time for more people to get exposed, but now I think the phenomenon is out
(06:18):
there. I think also the autonomous vehicle industry went through cycles. There was certainly, around 2022-2023, a time of pessimism in autonomous vehicles, but I think through our success, through generative AI, and, I think, other companies now, it's again a very lively space. There are others
(06:41):
that are also trying to push what's possible with autonomous driving and robotics, so it's again a very, very happening place. Yeah, we are contributing, probably, I would like to think, the most advanced version of an embodied, physical AI today that you can go out and try.
Chris (06:58):
That's fantastic. I got to say, as a native Atlantan, I'm so happy that you guys are in my city. And we're a very, very car-centric city as well. You know, you really have to have a vehicle to get around. And I noticed, you know, as you were naming the cities that you guys are in, that tended to be the case there, too.
Does that play into any of the way that you guys think about
(07:21):
testing, in terms of... you know, like, Atlanta traffic for its size is notoriously bad? And I would love to see ever more Waymos and other autonomous vehicles here, because I am terrified of all the drivers around me, with our daily collage of traffic accidents and stuff like that. So I keep
(07:41):
telling everyone, just wait. Autonomous vehicles are coming.
I'm kinda curious how you pick these different testing cities that you guys engage in, and what are some of the things that you're testing for that maybe those locations are particularly apt for helping out on.
Drago (07:58):
It's a bit of a combination of both technical and business reasons. I think we're trying to do large metros where autonomous vehicles can be a big market and help a lot of people, so that's one. Also, we've intentionally been growing our ODD, so to speak, operational design domain. Our first service, Waymo One, in Phoenix East Valley, Chandler,
(08:21):
that's maybe a bit suburban, with up to forty-five-mile-an-hour arterial roads. We learned to master it and then went to San Francisco, which is dense urban with fog and some rain and hills and windy roads and narrow roads and tons of pedestrians downtown, so we
(08:42):
dealt with that.
Then we started expanding. I think some of this is Atlanta is a big city, also a different state. There are some differences across the various states, how people instrument the roadway and how people drive, right? So we're spreading geographically more and more. I think also we're spreading to
(09:02):
other domains.
A few that are really top of mind: one is highways. We have been working on highways for a long time. We've gotten to a certain point with highways. Generally, to have a good taxi service, you need highways, right? It turns out that's a very fascinating, interesting problem.
They're difficult, because whenever you move at high enough
(09:24):
speeds, like 65 miles an hour or so, the consequences of any mistake are really high, and many things can happen. So it pushes your robustness and safety capability there. We've been doing highways, and one thing I did do is ride one. Now we can give highway rides to employees, and I rode one to
(09:48):
Millbrae station to get to the airport, and it's fantastic. I hope to be able to bring it in the future to more and more people.
I think that will make the service a lot more useful. Also, we announced that we will drive in other cities that have snow, so potentially even in '26, right? Our sixth generation
(10:10):
platform is designed after the Jaguar. It's a Zeekr vehicle, Geely Zeekr, and that Zeekr is designed with a hardware suite to be able to handle snow.
We are also heading out to other countries, so we announced that
(10:31):
we intend to launch driverless capabilities in London next year, and London is a left-side-driving city, and so is Tokyo, where we currently have vehicles and we're testing, right? You can see we're trying to cover, little by little, the operational design domain of most large metros with all of their
(10:52):
properties. We're, of course, also in Texas. That's its own unique state, but we started with more southern states, large metros, so you don't have to worry about snow at least. You want to tackle these challenges in some order, not just try to do everything at the same time.
It's very difficult to validate your ability to do well in everything all at the same time, right? We're just
(11:16):
kind of mixing what makes business sense with actually expanding the capabilities to become a truly global driver.
Daniel (11:23):
And you mentioned the driver, the car. I'm wondering, for those out there listening, and this is maybe hard to do just from an audio standpoint, but if you imagine the driverless car as a system in 2025, how would you describe that architecture or
(11:45):
that system? What are the main components? I imagine, you know, sensors, the actual car, the compute. What does that system look like in 2025, just at a high level?
Then of course, I'm sure we'll get into some of the modeling things and foundation models and all of those things, but
Drago (12:04):
I mean, the car is... Ultimately, it's a robot on wheels, right? The main distinguishing capabilities are that it has a set of sensors, in our case camera, LIDAR, radar, and microphones. Microphones are quite helpful for many things, including listening to sirens and occasional instructions.
(12:24):
Then you have compute on the car. It's a nontrivial amount of compute.
It's more than you can put on a phone, and all our vehicles are electric. That was an explicit choice of the company. I'm personally quite proud of this choice. I think that's good for the environment, to actually have such cars, and I think it can
(12:45):
accelerate, I think, the transition to more electric vehicles, which I think is good personally. So you have this robot on wheels with compute and sensors, and then you have actuators, right?
Then there is a lot of system design engineering to make sure steering and brakes and all these things... They need redundancy and robustness, to make sure that if any system
(13:06):
goes wrong, or if compute parts of it can go down, you have contingencies, so it needs to be designed with redundancy. You need to think of, what if there are issues with the steering wheel column? There can also be issues with steering. What is the redundancy? For autonomous vehicles, you need
(13:26):
to think additionally and build these things into the hardware.
It's a robot designed for safe transportation from the ground up, even though we build it by extending existing platforms, and we work with the various automakers to do this.
Chris (13:42):
As you're doing this, and you guys have progressed over these five years since we last talked, one of the challenges is probably that not every person out there is a Chris or a Daniel who's very invested in this kind of technology going forward. You have a lot of people out there here in the South. We joke that, you know, every other driver
(14:05):
thinks they're a NASCAR driver and stuff. And that notion of control and safety... the general population may not have as much confidence in some of these technologies, because they're not following it closely and living it the way you do all the time. How do you approach that?
And how has that changed over these last five years since we
(14:27):
talked to you last, in terms of getting buy-in from the public and getting them feeling, you know... The safety statistics you talk about are amazing, but getting them to really feel that, deep down, you know, inside, that they know they can trust and believe in this mode, and that it is in fact much, much safer than what they are typically doing on
(14:49):
a day-to-day basis.
Drago (14:50):
So there is... I know people do not feel statistics. It's hard, right? Because they're a product of many, many rides. You doing 10 or even a 100 safely is not enough. I think what people feel is when they get into the vehicles, and this worked for me, my moment, even though I went before, and also my wife and friends of mine. People get comfortable really,
(15:12):
really fast.
You need to pass a certain bar where they feel, okay, this thing actually is a really, really good driver. My mother-in-law sat in it just a few weeks ago for the first time and she rolled around. She's like, this car drives much better than me, right? Once she thinks this way, she's immediately at ease, I think. I think people relax after the
(15:33):
first several minutes, which are very exciting.
Then they relax and enjoy the experience and mind whatever they like to mind, either the environment or their phone or other things. People get really used to it if you cross this threshold of, can I trust you? I think your driving immediately shows this. Now, us in the industry also understand that,
(15:57):
coming back to statistics, you need to back it up. With regards to backing it up, Waymo, right, we believe in transparency and we're quite open with the incidents that happen.
We file the details, and we also track the statistics and do our best estimate. We have a great safety team. They publish these
(16:19):
reports. In them, we evaluate and try to estimate how we are doing compared to a fleet of human taxi drivers or human drivers driving in the area that we are handling, and this is both by us, but also there are studies done by insurance companies, who, of course, want to quantify this very well, and so there's a
(16:41):
Swiss Re study also proving our numbers. They also believe we can significantly decrease claims of different kinds, for injuries, for accidents, and so on as well.
That's another external validation for the kind of thing we provide. That's what I would say to people. Now, it's a process. You need to work with the local communities. You need
(17:02):
to work with police.
You need to work with the various city officials. We train a lot of people. We engage with them. We work with them over time. I think you can see that in the cities we have been in over time, I believe generally the trust in us increases, and I think the satisfaction with Waymos by the users, if you look
(17:24):
at the apps, like in the stores... I think on the App Store we have a five-star rating, right?
There is a bit of... almost like people that would just use Waymos now if they could, and that's a testament to the value that people see in the rides, but it comes down, of course, to safety, and
(17:48):
ultimately engaging these people, getting them comfortable. Often when people experience this, many of them become converts. I encourage people: try it. You may be the next convert if you have not yet.
I personally love it. I take it as much as I can. It's always a pleasure working on a product you enjoy yourself, so I feel
(18:10):
blessed that way.
Sponsor (18:28):
Well, friends, it is time to let go of the old way of exploring your data. It's holding you back. But what exactly is the old way? Well, I'm here with Mark DePuy, co-founder and CEO of Fabi, a collaborative analytics platform designed to help data explorers like yourself. So, Mark, tell me about this old way.
So the old way, Adam, if you're a product manager or a
(18:50):
founder and you're trying to get insights from your data, you're wrestling with your Postgres instance or Snowflake or your spreadsheets, and you maybe don't even have the support of a data analyst or data scientist to help you with that work. Or if you are, for example, a data scientist or engineer or analyst, you're wrestling with a bunch of different tools, local Jupyter notebooks, Google Colab, or even
(19:12):
your legacy BI, to try to build these dashboards that someone may or may not go and look at. And in this new way that we're building at Fabi, we are creating this all-in-one environment where product managers and founders can very quickly go and explore data regardless of where it is. So it can be in a spreadsheet, it can be in Airtable, it can be in Postgres, stuff like that.
It's really easy to do everything from an ad hoc analysis to much
(19:35):
more advanced analysis if, again, you're more experienced. With Python built in right there, and our AI assistant, you can move very quickly through advanced analysis. The really cool part is that you can go from ad hoc analysis and data science to publishing these as interactive data apps, dashboards, or better yet, delivering insights as automated
(19:58):
workflows to meet your stakeholders where they are in, say, Slack or email or spreadsheets. If this is something that you're experiencing, you're a founder or product manager trying to get more from your data, or you're on a data team today and you're just underwater and feel like you're wrestling with your legacy BI tools and notebooks, come check out the new way and come try out Fabi.
There you go. Well, friends, if you're trying to get
(20:20):
more insights from your data, stop wrestling with it. Start exploring it the new way with Fabi. Learn more and get started for free at fabi.ai. That's fabi.ai.
Again, fabi.ai.
Daniel (20:39):
Well, Drago, I understand that every driverless car company is gonna have a different approach to modeling and all of those sorts of things. You've talked a little bit about the hardware and the car, but I think it would be good for people to understand... We talk about this driver, or you mentioned the driver. People might have in their mind, because
(21:02):
we do talk a lot about models now after the generative AI boom, that there's this model that can reason and blah, blah, blah. And so people might have this view of, like, there is a model that drives the car. Could you help us really break down, like, in 2025, is this a system of models, models that do different things, a kind of combination of different types
(21:24):
of models and even non-AI pieces? Could you just help us kind of generally understand how that works?
Drago (21:34):
So when you think of the stack, right, let's talk first about what it needs to do. It needs to perceive the environment using the sensors. It needs to build some representation of this environment. It needs to use this representation of the environment to make a set of decisions. Traditionally... autonomous vehicles have been around a
(21:56):
long time.
Waymo has been around over fifteen years already, right? So it's a rapidly developing technology space, but traditionally, historically, people thought, okay, there are these models. There's a perception model that builds a representation of the world that can be useful for certain things, and then there is some kind of behavior prediction and planning module that reasons about what we could do,
(22:21):
and potentially some people like to also reason about what others could do, to cross-reference our behavior with the other folks, and then, based on all this information, eventually select promising decisions. That's what the stack normally does. Now, there's different ways to implement it.
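For listeners who want a concrete picture of the modular decomposition described above, here is a minimal, purely illustrative Python sketch of a perception / prediction / planning pipeline. All names and interfaces here are hypothetical; real stacks, Waymo's included, are far richer and are not public.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical, simplified types -- real AV stacks use far richer representations.

@dataclass
class TrackedObject:
    object_id: int
    position: tuple          # (x, y) in meters, ego frame
    velocity: tuple          # (vx, vy) in m/s

@dataclass
class PredictedTrajectory:
    object_id: int
    waypoints: List[tuple]   # future (x, y) positions over the horizon

@dataclass
class PlannedAction:
    steering: float          # radians
    acceleration: float      # m/s^2

class PerceptionModule:
    def run(self, camera, lidar, radar) -> List[TrackedObject]:
        """Build a representation of the environment from raw sensors."""
        raise NotImplementedError  # stands in for a large learned model

class PredictionModule:
    def run(self, objects: List[TrackedObject]) -> List[PredictedTrajectory]:
        """Reason about what other agents could do next."""
        raise NotImplementedError

class PlanningModule:
    def run(self, objects, predictions) -> PlannedAction:
        """Cross-reference our options against others' likely behavior
        and select a promising decision."""
        raise NotImplementedError

def drive_one_tick(perception, prediction, planning, camera, lidar, radar):
    # The decomposition into modules and the choice to train them separately
    # or end to end are independent decisions, as discussed in the interview.
    objects = perception.run(camera, lidar, radar)
    futures = prediction.run(objects)
    return planning.run(objects, futures)
```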
Generally, the trend has been to have few, and in some cases
(22:45):
people claim they have one, large models, AI models on the car. And you could say ML or AI; for a while it was called ML. When the models became big enough, people called it AI, right? So you have these large AI models on the car, a few or one depending on the various companies, and they're connected
(23:06):
in certain ways.
You can train them end to end or not. That's also an option. Different companies can choose. The two are orthogonal concepts. Whether you have modules and whether you can train them end to end are different concepts, so it can be structured and end to end, or essentially one model trained end to end. These are two axes, and
(23:30):
so different companies, on this very coarse taxonomy, fall somewhere in this bucket, right? I think Waymo has always used AI or ML since I've been there, and it's been the backbone of our tech. I think over time, our models have streamlined and become fewer and fewer. I can say that.
I think off board, what my team does is build these large
(23:51):
foundation models for Waymo that are not limited by how much compute or what latency constraints you have, and they can be quite helpful to essentially curate data or teach the models that actually run on the car and in the simulator. We can get to simulators later. We have experience with most aspects of
(24:12):
these options, whether it's end to end and whether it's structured or not. I think off board, I can definitely tell you we've explored a lot with large vision language models. That's one of the latest technologies that's relevant to us.
I think in the field of robotics, people talk also about vision language action models, because you can tie into one model
(24:33):
both understanding vision and language inputs and potentially asking for certain actions as outputs, right, which is ultimately what the robot needs to generate. That's an exciting area that has developed in 'twenty five. I think in our model, the Waymo foundation model, we combine benefits of these
(24:56):
vision language models, but we also combine it with some bespoke Waymo architecture innovations. Think in areas such as fusing these new modalities that vision language models typically are not trained on, like LiDAR and radar, that's one. Another one is modeling the evolution, future potential evolutions, of the
(25:17):
world.
There is some interesting Waymo technology on how to do this well that we also use, but we fuse all of this and VLM technology, world knowledge from also other bases, whether it's a world model or a visual language model, into something that then is able to do well on autonomous driving tasks. That's off board.
(25:40):
On board, we don't typically talk about exactly what is there, but I think we're trying to get the state of the art, the best architectures that we believe solve the problem, and put them together on the car. I think it's a really, really high bar to have a model perform in all the conditions and all the
(26:03):
situations we need it to, right? We also have some notion of... as you know, VLMs also have this weakness of hallucination, so we have a safety harness around them to prevent hallucination, to double-check what they are predicting, right?
We also have that aspect in our stack as well, which we have
(26:23):
worked on historically. That's what I can say on a high level. I hope that's not too scattered. Maybe you guys, if you want anything specific, we can discuss that in a little bit more detail.
Chris (26:35):
I do have a follow-up to that, recognizing that you're not able to get into the specifics of the architectural decisions and model decisions that Waymo is engaged in. If you could abstract it a little bit and maybe just talk about the space a little bit, you know. I'm curious, as you talk about world models, you know, and having a
(26:57):
representation of the environment, that brings in not only AI but the notion of simulation as, you know, one of the tools in the tool chest, if you will. I suspect, like, we have a lot of listeners that are hearing lots of different AI use cases in general, but may not have as much expertise in autonomy.
(27:18):
And so as you talk about that notion of representation of that environment, could you talk a little bit about, like, what that problem looks like and what are different things that you might think of to solve it, without having to get into how you guys have done it? Just kind of, like, what does that juxtaposition of simulation, AI, and
(27:39):
representation of the world and the environment around you look like?
Drago (27:43):
So maybe... I mean, simulation, if we're gonna go there, maybe I can just juxtapose two things there. I like saying this; historically, I've been doing this for a while. There are two main problems in autonomy. One is to build this onboard driver, and another one is to test and validate this onboard driver, and both are really, really hard
(28:05):
problems, and people usually talk about the first one.
But imagine there is some collection of models and you need to prove that it's safe enough to put them out in the real world. That's in itself a really challenging problem, arguably no simpler than putting the first model together, and
(28:26):
that one, ultimately, because you need to be a bit more exhaustive, potentially takes even longer to build the full recipe to validate things properly, right? So these are the two problems. Now, in autonomy, what is maybe different from the standard AI models is a few things.
(28:48):
One is we ultimately output actions, commands to a robot, which are a different type of data than traditionally, say, text and images.
I think that's one. Another one is we operate under strict latency constraints. You need to react quickly. For us, what is
(29:09):
also interesting in AV is this is probably the first serious domain where we had to really learn how to interact with humans in the same environment, so it's a highly interactive multi-agent setup, right? Then additionally, if you choose to add additional sensors and cameras, you have a lot more modalities coming in, and we have a ton of data.
Essentially, the way to think of it is imagine you get maybe
(29:34):
billions of sensor readings per second or even tens of billions, a lot, and to make a decision you need to have a context of many seconds of these sensor inputs, maybe a dozen cameras, half a dozen LIDAR and radar, and so you need to collect maybe five to ten seconds, some can argue
(29:56):
twenty or thirty, of context to make a decision, right? The decision is fairly low dimensional. It's like, okay, steering or acceleration, but the inputs are incredibly bulky, so you need to somehow learn the mapping from this extremely high dimensional space, representational space, to
(30:17):
decisions. That's very hard, right, under latency constraints, under safety-critical constraints. That's what makes our domain interesting. Now, a lot of the things that work in machine learning in one domain transfer to the other, right?
So yes, there are, for example, very similar scaling law
(30:42):
findings: if you have cutting-edge architectures and you do proper studies and scaling, and you have a lot more data and compute and you feed it to these architectures... now, for every class of algorithms there are a bit different scaling laws, but even the simpler, imitative algorithms that people also did in language, predict next token, we can predict next action, right? There are these direct parallels. You can do
(31:02):
reinforcement learning in language. We can do reinforcement learning in our simulator, right? These are the parallels, but how exactly things translate is interesting.
The ideas translate. The implementation is a little more creative than the usual just training on the Internet, because there is a bit of a domain jump to the real world, right? So that's interesting. The other part is, compared to, say,
(31:25):
language LLMs... we actually have a paper, MotionLM, from two or three years ago, where the idea was, hey, why don't we tokenize motions to make them like language? Turns out it's a very effective idea.
Now, that architecture, which is very LLM inspired, models future interactions of agents in the
(31:47):
environment very well. You can think of agents talking to each other with these motions they execute simultaneously in an environment, and now you can leverage the machinery. We have this paper. It's quite effective, right? That's an example of this.
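As a rough, hedged illustration of the motion-tokenization idea behind MotionLM, the sketch below quantizes continuous motion deltas into a small discrete vocabulary so that an autoregressive, LLM-style model could predict the next motion token the way a language model predicts the next word. The bin layout and vocabulary size are invented for illustration and are not the paper's actual parameters.

```python
import numpy as np

# Hypothetical quantization grid -- MotionLM's real tokenization differs.
DELTA_BINS = np.linspace(-2.0, 2.0, 33)   # meters moved per step, 33 edges -> 32 bins

def tokenize_motion(trajectory):
    """Convert a (T, 2) array of positions into discrete motion tokens.

    Each step's (dx, dy) is quantized independently and the two bin indices
    are combined into a single token id, giving a 32 * 32 = 1024-token vocabulary.
    """
    deltas = np.diff(np.asarray(trajectory, dtype=float), axis=0)
    dx_bins = np.clip(np.digitize(deltas[:, 0], DELTA_BINS) - 1, 0, 31)
    dy_bins = np.clip(np.digitize(deltas[:, 1], DELTA_BINS) - 1, 0, 31)
    return dx_bins * 32 + dy_bins           # one token per timestep

def detokenize_motion(tokens, start):
    """Approximately invert tokenization using bin centers."""
    centers = (DELTA_BINS[:-1] + DELTA_BINS[1:]) / 2
    pos = np.array(start, dtype=float)
    out = [pos.copy()]
    for t in tokens:
        pos = pos + np.array([centers[t // 32], centers[t % 32]])
        out.append(pos.copy())
    return np.stack(out)

# With motions expressed as tokens, an autoregressive transformer can model
# several agents "talking" to each other by emitting their tokens jointly,
# step by step -- the parallel to language modeling drawn in the interview.
trajectory = [(0.0, 0.0), (0.5, 0.1), (1.1, 0.1), (1.8, 0.0)]
tokens = tokenize_motion(trajectory)
reconstructed = detokenize_motion(tokens, start=trajectory[0])
```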
Now, one other interesting point, though, is that text is its own simulator. Essentially, you
(32:11):
speak text to each other, that's the full environment. You spit out text tokens, text tokens, and text tokens. In our case, well, we predict actions, we execute actions.
Imagine now, though, that you need the simulator, because based on these actions, you need to envision what the whole
(32:33):
environment looks like and how your, whatever, hundreds of millions to billions of sensor points look like, so now you need something that generates them as you act, so you can test yourself, how you behave over time. As you make decisions at a fairly high frequency, there is a known problem, which is called covariate shift. Essentially, it can take you to places you may not have seen before in the data, and there you
(32:53):
may have particular failure modes that you may not observe unless you push yourself and drive on policy to those places in the data. But to drive there, now you need the simulator. The simulator needs to be realistic enough that you don't go somewhere else entirely, as opposed to the actual place you will end up with your decision making. That's another very interesting point.
(33:14):
Simulation is hard. If you want robust testing, simply having drivers on the road is not a particularly scalable solution if you want to keep iterating on your stack, because some of the events happen once in a million miles or more, and you would much rather test them in the simulator. But for the simulator, now you have to solve this problem, which is interesting and challenging. That's unique in our domain.
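To make the covariate-shift point concrete, here is a toy, one-dimensional Python sketch (not Waymo's simulator): evaluated only on logged states, a slightly biased policy looks nearly perfect, but rolled out on-policy in a simulator, its own drift eventually takes it into states missing from the data, where the real failure shows up.

```python
import random

random.seed(0)

def logged_states(n=1000):
    """States the human drivers actually visited (e.g. lane offset in meters)."""
    return [random.gauss(0.0, 0.2) for _ in range(n)]

def policy(state):
    """Toy imitation policy: fine on the states it was trained on,
    wrong once it drifts outside that region (extrapolation error)."""
    if abs(state) < 0.5:                   # region covered by the logs
        return -0.5 * state + 0.02         # small systematic bias
    return 0.3 * state                     # outside the logs: pushes further out

def open_loop_eval(states):
    """Score actions on logged states only -- mistakes never compound,
    so the policy looks nearly perfect."""
    return sum(abs(s + policy(s)) < 0.5 for s in states) / len(states)

def closed_loop_rollout(steps=300):
    """Roll the policy out on-policy inside a (very) toy simulator.
    Its own bias plus noise eventually drifts it into states absent
    from the data, where the real failure mode appears."""
    state, worst = 0.0, 0.0
    for _ in range(steps):
        state = state + policy(state) + random.gauss(0.0, 0.2)
        worst = max(worst, abs(state))
    return worst

print("open-loop pass rate:", open_loop_eval(logged_states()))
print("closed-loop worst lane offset:", closed_loop_rollout())
```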
Sponsor (33:50):
What if AI agents could work together just like developers do? That's exactly what Agency is making possible. Spelled A-G-N-T-C-Y, Agency is now an open source collective under the Linux Foundation, building the Internet of Agents. This is a global collaboration layer where AI agents can discover each other, connect, and execute multi-agent workflows across any
(34:15):
framework. Everything engineers need to build and deploy multi-agent software is now available to anyone building on Agency, including trusted identity and access management, open standards for agent discovery, agent-to-agent communication protocols, and modular pieces you can remix for scalable systems.
This is a true collaboration from Cisco, Dell, Google Cloud,
(34:40):
Red Hat, Oracle, and more than 75 other companies all contributing to the next-gen AI stack. The code, the specs, the services, they're dropping, no strings attached. Visit agency.org, that's agntcy.org, to learn more and get involved. Again, that's agntcy.org.
Daniel (35:07):
Well, Drago, I'm really intrigued by how you kind of helped me form a mental model for the types of problems that are part of the research in this area. I would definitely encourage our listeners to go check out waymo.com/research. There's a bunch of papers there that people can find and read,
(35:28):
but also there's even a Waymo Open Dataset, which supports research in autonomous driving. So that's really cool to see. It's amazing.
I'm wondering, Drago, as you look at this kind of... I see all sorts of things from scene editing to forecasting and
(35:48):
planning to...
Drago (35:50):
Did I mention you need to embody the agents in the simulator too? They're not deterministic.
Daniel (35:55):
Oh, yeah, yeah.
Drago (35:56):
Because they start doing different things, you need to, well, guide the agents to react to you in reasonable ways as well. Otherwise, they'll be reacting to an empty spot where you're no longer... Even if you collected the situation with your sensors, as you start deviating from it in the simulator, you still need the agents to do reasonable things, right? Yeah, yeah, yeah.
Daniel (36:16):
Yeah, yeah, that makes sense. And I guess that really kind of gets to my question a little bit, which is, I assume over the last five years we haven't chatted, there's been a lot of progress in certain areas, and maybe certain challenges that are kind of holdouts that remain very, very challenging, where maybe not as much progress has been made. So in this kind of
(36:38):
autonomous driving research world, can you paint in broad strokes kind of where there has been very rapid progress as things have advanced, and maybe some of the hardest problems to solve that still remain kind of at arm's length, if you will?
Drago (36:59):
I would say one thing: for folks that are especially closer to robotics, they will see that, just like the field of AI is going through some crazy inflection point of both the methods people develop and popularity, I think the same is true in robotics, and the same is true in AV. I've been in the space over ten years
(37:20):
now just doing AVs, and I would say every couple of years, our capabilities with AI and machine learning dramatically expand due to innovations, and this innovation train has not stopped. So where we are five years later, compared to five years before, in terms of modeling, I think there are still huge
(37:44):
improvements possible. I think we're moving more and more to machine-learning-powered stacks, and I think ultimately understanding how to leverage data-driven solutions to elegantly, scalably handle this problem.
(38:05):
So that's been generally an evolution, and I think we understand how models behave better. I think these latest architectures and the scaling that we mentioned are a really interesting domain. We started studying it, for example, a while back. So there's this paper we have, for example, on scaling laws of the MotionLM architecture. So it's an LLM-
(38:26):
like architecture.
So you say, oh, what are its scaling laws? How does it compare to LLMs? We have a tech report on this, for example. Still, similar kinds of learnings transfer as with LLMs, but there are some bespoke, really interesting things. For example, for that architecture, improving what's called open-loop prediction performance seems to correlate with improving closed-loop
(38:46):
performance.
That's not always true, right? We see different scaling factors compared to language. Like, our motion space is nowhere near as diverse as language tokens, so for the same size of model, we actually need a lot more data, more examples of how the world behaves, to scale. These are interesting findings
(39:08):
generally, right? So that's one.
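For listeners unfamiliar with scaling-law studies, the typical procedure is to train a family of models across data or compute budgets, fit a power law to the measured losses, and then extrapolate. The snippet below is a generic illustration of that fitting step with made-up numbers; it is not Waymo's data or methodology.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical measurements: dataset size (millions of driving segments) vs. eval loss.
data_sizes = np.array([1, 2, 4, 8, 16, 32], dtype=float)
eval_losses = np.array([1.90, 1.62, 1.41, 1.26, 1.15, 1.07])

def power_law(n, a, b, c):
    # loss(n) = a * n^(-b) + c, where 'c' is the irreducible loss floor.
    return a * np.power(n, -b) + c

params, _ = curve_fit(power_law, data_sizes, eval_losses, p0=[1.0, 0.5, 0.5])
a, b, c = params

# Extrapolate: what loss would 10x more data buy us?
predicted = power_law(320.0, a, b, c)
print(f"fit: loss ~ {a:.2f} * N^(-{b:.2f}) + {c:.2f}; predicted loss at 320M: {predicted:.2f}")
```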
I think now, as the architectures keep evolving, there's diffusion and autoregressive models, and now how does each compare, and how do they compare in open loop and closed loop?
These are all very interesting areas people are studying. I think generally there's this question lately as well of how
(39:31):
do you build the best simulator with machine learning, and what kind of models are there? And, you know, most recently there's some groundbreaking work, like the Genie model by Google. I don't know if you guys saw it. It's controllable video, essentially.
You can, like, give motion controls, and it dreams the video close to real time of what it should look like. So,
(39:54):
essentially, you're controlling the world you're imagining a bit. Right? And you can do it in real time, or you can do it, of course, off board too, or offline, with even larger models potentially. Now, these models are pretrained on a large amount of video and text, and so they capture a lot of
(40:16):
knowledge of how the real world behaves, and it somewhat complements the knowledge that vision language models capture from the internet corpuses.
How do they relate? How do you mix them? Which one is beneficial for which type of tasks? These are all interesting capabilities that people are working on. Maybe one other
(40:39):
interesting topic is there's a lot of talk about architectures for robots that are some combination of a system-two and system-one architecture.
You guys may have heard of it, right? Now, we know that large models are more capable when trained on more data and more compute, but in latency-sensitive situations, if they're too big, you can't run them in real time. So now the question
(41:03):
is, okay, well, what if you have a real-time model that handles most cases, but then you have a slower model that does better high-level reasoning, that runs at some slower hertz, that helps guide and understand additionally and provide this to the fast model when needed, while still keeping this reflexive
(41:25):
capability? Someone jumps in front of you, you still respond, right? These are interesting questions in our domain as well. There's many, actually.
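Here is a heavily simplified sketch of the fast/slow split described above: a small reflexive controller runs every tick and always keeps the ability to brake, while a larger, slower reasoner refreshes high-level guidance only every Nth tick. In a real system the slow model would run asynchronously; everything below, including the rates and thresholds, is illustrative only.

```python
import time

class FastController:
    """Small model (or rules): runs every tick, keeps reflexes like braking."""
    def act(self, observation, guidance):
        if observation["obstacle_distance_m"] < 5.0:
            return {"brake": 1.0, "steer": 0.0}   # reflex, never waits for the slow model
        return {"brake": 0.0, "steer": guidance.get("steer_hint", 0.0)}

class SlowReasoner:
    """Large model: better high-level reasoning, too slow to run every tick."""
    def plan(self, observation):
        time.sleep(0.05)                          # stand-in for heavy inference
        return {"steer_hint": 0.1 if observation["route_curves_right"] else 0.0}

def control_loop(observations, fast_hz=50, slow_hz=5):
    fast, slow = FastController(), SlowReasoner()
    guidance = {}
    ratio = fast_hz // slow_hz                    # run the slow model every Nth tick
    actions = []
    for tick, obs in enumerate(observations):
        if tick % ratio == 0:
            guidance = slow.plan(obs)             # refresh guidance at a low rate
        actions.append(fast.act(obs, guidance))   # reflexive action at full rate
    return actions

obs_stream = [{"obstacle_distance_m": 50.0, "route_curves_right": True}] * 20
obs_stream.append({"obstacle_distance_m": 3.0, "route_curves_right": True})
print(control_loop(obs_stream)[-1])               # -> full brake from the fast path
```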
It's a really, really fascinating time, and I think we're studying a lot of these questions, just as the whole field is, and we have some very interesting findings, some of them not published yet. Generally, I would encourage people, come join us. You can, well, contribute to the premier
(41:52):
embodiment of physical AI currently out there, and you can do interesting research, right? Sounds like fun.
Yes, these are all fascinating topics. And of course, how to control hallucinations in all these models, how to determine when these models are out of domain and potentially making
(42:15):
clear mistakes, right? This can happen. We have research experience with VLMs like many of the current ones, and we have a paper called EMMA where we tried to fine-tune a VLM for driving tasks and got a bunch of learnings.
It can be quite good, but it has limitations too, right? So how
(42:37):
do you overcome these limitations with additional system design is very interesting.
Chris (42:41):
I'm curious, as we're talking about this, and I'm really enjoying the conversation... I work for another company in autonomy but in a slightly different context, and one of the things that is popular in the industry I'm in right now is solving for swarming behaviors,
(43:02):
as you're talking about many autonomous vehicles that are having to collaborate in certain ways. I'm curious for your take; that may or may not be an interesting problem for Waymo.
I don't know what your thinking is on that, but I would love to know, when you look at that space, what are some of the things that you think about and are interesting to you about the
(43:24):
notion of many autonomous vehicles collaborating together?
Drago (43:28):
That's been a very interesting area. Actually, there was earlier research that I was impressed with, where people proved that if you can control groups of vehicles, you can improve traffic flow. So to me, we're not exactly swarming autonomous vehicles yet. They're still a subset, a relatively small subset, of the whole traffic. Mostly, when I
(43:51):
think of swarming, I imagine, say, a crowd of 200 people on Halloween all around the car and stuff like this, that's swarming. Or you go downtown after a Giants game and they're exiting, and that is swarming.
Right? The human agents, so to speak, are more prone these days to swarming than AVs still. Maybe we'll get more prominent.
(44:14):
I think when you think of coordinating multiple AVs, in our domain already, they do send each other valuable information. For example, if one of our vehicles encounters some very complex construction, it can help pass information about it to the others.
If we encounter potential slowdowns or vehicles getting
(44:36):
stuck, that kind of information can be passed. I think jointly controlling vehicles started becoming interesting now that we're getting to some kind of scale. I think one of the domains where this is interesting is when you want to charge them. Imagine you need to charge now hundreds of vehicles in a location. How do you control all these vehicles
(44:57):
so that they all get to the right place and don't block each other, and it's all very efficient?
That's one example of where you're fairly swarmed. It's your own warehouse, right, or a garage, where this comes up, and then down the line, potentially, there are opportunities to improve traffic flow for everyone, but that's still maybe
(45:21):
in the future.
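As one toy flavor of the depot-charging coordination problem, the sketch below assigns vehicles to charging stalls by minimizing total travel time with the classic Hungarian algorithm (scipy's linear_sum_assignment). The matrix is made up, and a real system would also have to handle queueing, stall blocking, state of charge, and arrival timing.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Hypothetical travel-time matrix (minutes): rows = vehicles, columns = charging stalls.
travel_minutes = np.array([
    [4.0, 9.0, 6.5],
    [3.5, 2.0, 7.0],
    [8.0, 5.5, 1.5],
])

# Hungarian algorithm: one stall per vehicle, minimizing total travel time.
vehicle_idx, stall_idx = linear_sum_assignment(travel_minutes)

for v, s in zip(vehicle_idx, stall_idx):
    print(f"vehicle {v} -> stall {s} ({travel_minutes[v, s]:.1f} min)")
print("total:", travel_minutes[vehicle_idx, stall_idx].sum(), "minutes")
```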
Daniel (45:22):
Well, you took us right there, Drago. As we're kind of getting close to an end here, I'd love to talk about that future. We were talking beforehand and I was saying I'd love for you to share just what you're excited about. And that could of course be in general related to driverless research. It could be kind of in the AI ecosystem generally, something
(45:46):
that you're excited about as you look forward, or are thinking about a lot. Does anything stand out so that we can ask you about it?
Hopefully not five years from now, but maybe the next time you're on, in less than five years, we can ask you about it.
Drago (46:01):
Sounds good. Well, I'm
around, so I could come probably
faster than in five years' time.
Daniel (46:07):
Yeah. In a Waymo. Yeah.
Drago (46:10):
Potentially, yes. I think maybe let's go in a couple of areas. First, maybe to parallel this chat we had earlier, maybe first about the product and then a bit about the AI. I think in terms of the product, in a way, with the safety studies we've shown, these are significant improvements over the baseline, and I think we've shown it
(46:34):
already at scale, with what starts to become fairly good confidence, or some statistical significance, at this point. Maybe your listeners, I'm not sure they understand, but even just on the US roads alone, I'm not talking world roads, US
(46:54):
roads, forty thousand people die every year from accidents. That's a lot. I think these gains are starting to become somewhat meaningful, so you start thinking, hey, maybe we have a mandate to expand. We should be expanding. It will save people's lives. And you think about it, and then the
(47:15):
question is, how can I contribute to expanding?
I mean, ignoring all the... of course, I believe it's a great service. A lot of people love it for a lot of good reasons. We could potentially go into some reasons people have found why they love it, right? But I think even just from the mandate, okay, it's helping in a meaningful way, and I think being out there can
(47:40):
make quite a dent against some of these numbers. So yes, I would love it to expand more.
Now, we're doing that. I think to me, then, the question is, what can I do to contribute to it, right? I think one of the most scalable solutions to tackling dozens of new cities and conditions and countries is machine learning and AI, right?
(48:03):
So now, for me, what I'm excited about is harnessing all the positive latest trends. I think for me, that goes more directly first into the Waymo foundation model work we're doing, where we can directly experiment and deploy them, and then try to push more and more of them to contribute similar benefits to the main
(48:27):
production systems, which are the onboard driver and the simulator, right?
That's what I think about. Now, more specifically, if you want to go into AI techniques, I think this question of, okay, how do I endow vision language models with more modalities?
(48:47):
It's a fascinating one. We actually have some good results already. How do you expand to new modalities, say, LIDAR and radar?
How do you connect it to actions, the model? What's an effective way to do this while preserving all the world knowledge that's present in the model that you're trying to
(49:08):
build on top of? It's an interesting model and system design challenge.
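To give a feel for the "new modalities on top of a VLM" question, here is a hedged, generic PyTorch-style sketch of one common pattern in the literature: keep a pretrained language backbone frozen to preserve its world knowledge, and train only a small encoder plus projection that maps LiDAR features into the backbone's token space. This is not a description of Waymo's architecture; every module and dimension here is an assumption for illustration.

```python
import torch
import torch.nn as nn

class LidarAdapter(nn.Module):
    """Maps a LiDAR bird's-eye-view feature grid into 'soft tokens' that a
    frozen language backbone can attend to alongside image and text tokens."""
    def __init__(self, in_channels=64, token_dim=1024):
        super().__init__()
        self.encoder = nn.Sequential(                       # small trainable encoder
            nn.Conv2d(in_channels, 128, 3, stride=2, padding=1), nn.GELU(),
            nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.GELU(),
            nn.AdaptiveAvgPool2d((4, 8)),                   # -> 32 spatial cells
        )
        self.project = nn.Linear(256, token_dim)            # into the backbone's embedding space

    def forward(self, bev_features):                         # (B, C, H, W)
        x = self.encoder(bev_features)                       # (B, 256, 4, 8)
        x = x.flatten(2).transpose(1, 2)                     # (B, 32, 256)
        return self.project(x)                               # (B, 32, token_dim)

# Usage sketch: a pretrained VLM backbone would be kept frozen so its world
# knowledge is preserved; only the adapter's parameters are trained.
adapter = LidarAdapter()
lidar_tokens = adapter(torch.randn(2, 64, 128, 256))
assert lidar_tokens.shape == (2, 32, 1024)
# Text/image tokens from the frozen backbone would be concatenated with
# lidar_tokens before the transformer layers; that part is omitted here.
```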
Then, what I'm also excited about is building the simulator, I think, as realistic and as scalable as possible. I think the modern technologies like the Genie model that I mentioned, these world models, are still relatively few and far between, but I think there are a ton of labs that are
(49:31):
working on them today.
I think taking that kind of technology and building the most generalizable possible simulator with it is fascinating. Now, the interesting thing is you could do that, but they can still be very expensive to run, so it's not just enough to show that it can handle very realistic,
(49:56):
interesting cases. You still need to show how you can run it without breaking the bank. The amount of simulation Waymo does today to ensure that we're safe... we run millions of virtual miles every day. That's a lot of things to simulate, potentially with so many sensors on board and so on.
(50:16):
There are some very interesting questions in that space. How do I get the maximum possible simulator realism, and how do I get the maximum possible scalable simulator? There's a very interesting mix of technologies getting involved to do that.
Daniel (50:30):
That's awesome. Well, I'm certainly excited about that. Like I say, I encourage our listeners to check out Waymo's research page. Lots of amazing stuff to explore there.
Drago (50:41):
And folks can see our history, right? Like, I think you can see the kind of work and papers people did from, I think, 2019 to now. And there's almost a 100 papers there now. And maybe it's not 100 only because we may not have uploaded the most recent ones.
I'll try to make sure we do soon if we're missing any, so if the
(51:02):
readers go there, they can see the full set.
Daniel (51:05):
That sounds great. Well, thank you for joining us again, Drago. It's a real pleasure to have you on the show again, and let's not make it five years next time. We'll try to get you on and hear the update sooner than that, for sure.
Chris (51:19):
Don't be a stranger.
Drago (51:20):
Thank you, guys. Pleasure
to be on the show.
Jerod (51:30):
Alright. That's our show for this week. If you haven't checked out our website, head to practicalai.fm, and be sure to connect with us on LinkedIn, X, or Bluesky. You'll see us posting insights related to the latest AI developments, and we would love for you to join the conversation. Thanks to our partner Prediction Guard for providing operational support for the show.
Check them out at predictionguard.com. Also,
(51:53):
thanks to Breakmaster Cylinder for the beats, and to you for listening. That's all for now, but you'll hear from us again next week.