
September 4, 2025 44 mins

Aaron Parness is a director of applied science at Amazon Robotics.

His problem is this: How do you build a robot that can put stuff on shelves?

Today on the show, Aaron explains why this is a surprisingly hard problem – and why the solution Aaron’s team came up with may ultimately have uses beyond the warehouse.



Episode Transcript

Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Speaker 1 (00:15):
Pushkin.

Speaker 2 (00:20):
One of the things I look at: robotics is
a big field. There are so many amazing demonstrations of mobility,
robots doing backflips, robots running down hills, and that's really
impressive to me because I can't do a backflip, or
I might trip if I run down the hill. But
where the really valuable parts of robotics are going to

(00:41):
be are in manipulation. So my kid can take a
blueberry out of her cereal bowl because she doesn't want
to eat it, and that is an incredibly hard task
for a robot. And you don't see any of those demos.
And I think we're like kind of inherently programmed as
people to like be biased towards the backflip being more
impressive and in reality like the business value and the

(01:04):
harder thing for the robot is to like take the
blueberry out of the cereal bowl.

Speaker 1 (01:15):
I'm Jacob Goldstein and this is What's Your Problem, the
show where I talk to people who are trying to
make technological progress. My guest today is Aaron Parness. Aaron
spent the earlier part of his career building space robots
at NASA's Jet Propulsion Laboratory, JPL. Six years ago, he
went to work at Amazon. Now, Aaron is a director

(01:35):
of Applied Science at Amazon Robotics. I wanted to talk
to Aaron about a robot arm called Vulcan. He and
his team developed Vulcan to do a job that is
surprisingly hard for robots to do, taking stuff that gets
delivered to Amazon warehouses and putting it onto shelves. In
order to solve this problem, Aaron and his team had

(01:58):
to build a robot that had a sense of touch,
that could deal with complicated, unpredictable situations, and that could
look at a shelf and plan out a course of action.
As you'll hear in the interview, all of those traits
may someday be helpful, not just in stocking shelves in
a warehouse, but in doing lots of boring sounding but
complicated real world tasks like, for example, taking a blueberry

(02:23):
out of a bowl of cereal. To start, I asked
Aaron to tell me the problem that Vulcan was designed
to solve at Amazon's warehouses.

Speaker 2 (02:31):
So new inventory comes into the building. You know, trucks
pull up and they unload new stuff. We need to
store that stuff while it's waiting for someone to click
the buy button. We store it in these large fabric bookcases.
It's about eight feet tall. It has about forty different
shelves on it. It's four sided, so you can store

(02:53):
stuff from any of the different faces of the case.
What's really interesting is this stuff is randomly stowed, so
it's not like all the iPhones are in one shelf.
It'll be all different stuff, all mixed together.

Speaker 1 (03:08):
When you say random, do you mean random or do
you mean it would look random to the untrained eye.

Speaker 2 (03:14):
I mean literally random. Really wherever there is space you
can put the item.

Speaker 1 (03:20):
Because that's what's optimal. It turns out the optimal way to
store stuff is random.

Speaker 2 (03:24):
That's right. Why? It stems actually from Jeff Bezos's
original vision, I think, and it's incredible. So
you want to have the most selection, and you want
to have speed of delivery, and you want to have
low cost, and that's what the customer wants, right. The
customer is using Amazon dot Com because we have selection,

(03:46):
we have speed, and we have low cost. In order
to achieve that, you have to have these massive warehouses
located really close to your customers, and you have a
lot of customers in Tokyo, in New York City and
San Francisco where real estate's really expensive, so you have
to figure out a way to put all of this
different stuff in like the densest packing area, you can

(04:10):
and have access to it immediately so that you can
you can deliver in hours instead of days. And what
that means is that random is better than structured. So
anywhere there's a space, you can add that item into
the inventory, and that means it comes up for sale
immediately on the website, and then when someone places an order,

(04:32):
you don't have to wait for that iPhone bookcase to
make its way all the way across the warehouse. You
probably have a thousand iPhones in the warehouse, and whichever
one is closest can go to whichever pickstation is eligible,
and it ends up being actually substantially faster.

Speaker 1 (04:49):
So that last sentence seems to be the key. The idea is like, yes,
given you have whatever a thousand iPhones in the warehouse
in the universe where a human had to know where
they all were, you'd put them all on one shelf.
But you're saying at any given time that means that
shelf is probably going to be pretty far away, whereas
if you randomly distribute them throughout the shelves and warehouse

(05:10):
at any given time, one of those thousand iPhones is
probably going to be pretty close to where it needs
to be. And because you have a whatever, a computerized
system that can keep track of everything all the time,
it makes sense to randomly distribute all the things.

Speaker 2 (05:22):
Yeah, that's exactly right. And it works on the flip
side as well. So when you have a new item
that's come in, rather than waiting for the shelf that
has the right size thing to put the new dog
toy in, you just put the dog toy anywhere you
can find space for it.
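The random-stow and nearest-copy logic Aaron describes can be sketched in a few lines. This is a toy illustration: the function names, data structures, and distance model are invented here, not Amazon's actual system.

```python
import math

def stow(inventory, bins, item, size):
    """Random-stow: put the item in any bin with enough free space."""
    for bin_id, free in bins.items():
        if free >= size:
            bins[bin_id] = free - size
            inventory.setdefault(item, []).append(bin_id)
            return bin_id
    return None  # no space anywhere: hand off to a person

def pick(inventory, bin_positions, item, station):
    """Serve an order from whichever copy of the item sits closest."""
    candidates = inventory.get(item, [])
    if not candidates:
        return None
    best = min(candidates, key=lambda b: math.dist(bin_positions[b], station))
    candidates.remove(best)
    return best
```

The payoff is on the pick side: with copies of an item scattered across many bins, some copy tends to be near whichever pick station needs it, which is the speed argument in the interview.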

Speaker 1 (05:37):
Huh. It's like my house. We have a lot of
dog toys in my house. Also, Yeah, that's really interesting.
It's great for the customer, and that's optimal.

Speaker 2 (05:48):
It's optimal, and it creates an incredibly difficult environment for robotics. Huh,
because now you have to deal with all this clutter.
We can have more than a million unique items in
one warehouse. Yeah, so it's not like you have a
model of each of those items. And we sell more
third party items than you know, Amazon owns themselves. Right.

(06:10):
We are a platform for third party fulfillment, and so
you don't have all the data about all those items,
and so you have to handle all this uncertainty, all
this clutter, and everything's tightly packed.

Speaker 1 (06:22):
And so still in most places, as a result, when
stuff comes into the warehouse every day off a truck,
do people take the things out of the truck and
stick them randomly on shelves wherever they can find space?
Is that the system?

Speaker 2 (06:36):
That is exactly the system, and it's in you know,
hundreds of buildings around the world.

Speaker 1 (06:42):
And just to be clear, I mean it's pretty clear,
but just to really put a point on it, why
is this a hard environment for robots?

Speaker 2 (06:51):
Traditional industrial robots do not handle contact well, so like
touching their environments, and they don't handle clutter or you know, uncertainty,
and so it's hard because to put that last book
onto the bookshelf, squeeze that teddy bear into the just

(07:12):
small enough space that it'll fit, you have to push
the other stuff around that's already on that bookshelf. And
a traditional robot doesn't have sensors, it doesn't even know
how to do that. So if you think of like
a car manufacturing line, you're like nineteen nineties two thousands,
you know, welding robot or loading sheet metal into a press.

(07:32):
It's doing all of that only knowing its position in space.
So it has no force sensing. If it runs into something,
it either is like an emergency stop because it's like broken,
or it just smashes that thing and keeps going and
it doesn't even know it's smashed anything. It literally has
no sensing.

Speaker 1 (07:48):
That is an incredibly homogeneous environment. Right, It's doing like
the exact same thing at a very high level of
precision forever one thing.

Speaker 2 (07:57):
That's exactly right. And so the extension, the fundamental breakthrough
for science, for robotics manipulation, that my team is trying
to make is, one, giving the robot a sense of
touch and using that along with sight and along with
like knowing where your robot is to do meaningful tasks
in like very high contact, high clutter environments. And then

(08:20):
there's a brain part. It's also much more difficult to
kind of predict how this random assortment of items is
going to move or change as you push on it.
And so there's an AI piece, there's a brain piece
that's saying this item will fit in that bin. This
is actually one of the most frustrating things when you
try and do the job yourself. I'm like an optimist.

(08:40):
I'm always oh, yeah, this will fit. And I go
up there and I try and play Tetris and I
try and rearrange the shelf and like, it clearly isn't
going to fit. And then I've wasted thirty seconds or
forty seconds and I have to try something else.

Speaker 1 (08:53):
That's a good statement of the problem. Well, like when
did you come onto the scene?

Speaker 2 (08:58):
So I was working on some other stuff and there
was a recent PhD that had joined our team. He was,
you know, one year out of school something like this,
and he says, I'm going to go try and solve
stowing items into these bookshelves. And my thought was, Oh,
how naive, Like the real world is going to teach

(09:18):
this new grad. That's just way too hard a problem
for robotics to solve. But I was helping him because
it's fun, right, Like you like to work on hard
problems when you're a researcher. And he was a very
nice guy, and so I was, you know, helping him,
but never thought it was going to work. And there
were a couple of kind of moments where we made
these simplifications that turn the problem from I have to

(09:41):
try and do every possible game of Tetris that a
person can do into a problem where you're like, oh,
it's not that this is never going to work, it's
that this is the future, Like this is robotics two
point zero, Like this is I have to work on this.
I can't do anything else anymore. I'm like, I'm all
in on this problem.

Speaker 1 (10:01):
Tell me about one of those simplifications one of those moments.

Speaker 2 (10:05):
The gripper is one. The design, the mechanical
design of the robotic hand was actually a big breakthrough.
And when we started, we were trying to push items
with the item we were gripping. So imagine you're pinching
a book and you're trying to use that book to
like push this dog toy over to the side.

Speaker 1 (10:25):
I see, So you want to put the book in
a bin. Yeah, dog toys in the way. So you're like, okay,
pick up the book and use the book kind of
like a brush to sweep the dog toy out of
the way.

Speaker 2 (10:35):
Okay, And I say, okay, like I understand, but it's
never going to work. What if you don't have a book.
What if you have a T shirt? Yeah, what if
you have an iPhone and it's very expensive? Are you
going to actually want to start pushing on stuff with
the phone? And so we came up with this strategy
to have like a spatula that would extend into the
bin and you'd push everything with this spatula that was

(10:56):
part of your hand. So imagine like you're like Wolverine
and you can shoot out, you know, but instead of
like the Adamantium claws, you're shooting out a spatula.

Speaker 1 (11:07):
So it's like a pincher grip. A little spatula shoots
forward out of the pincher grip.

Speaker 2 (11:12):
Is the thing that's right.

Speaker 1 (11:13):
It's so simple when you put it that way. I mean,
I'm sure making it was not low tech, but it
sounds very like. It's not like some crazy AI thing.
It's like just what if there was another little thing
that came out and push stuff out of the way.

Speaker 2 (11:26):
But those ideas are like the really powerful ones when
you have a simple, elegant solution and you're like, okay,
that could work. That's different than like a five fingered
hand that has twenty five motors embedded in it. Yeah,
like, Oh, it's just the spatula.

Speaker 1 (11:44):
Fingers are famously difficult. Why didn't anybody think of it before?

Speaker 2 (11:49):
So we had been working on it as a company
back to the Amazon Picking challenge, which was, you know,
twenty fifteen. But I think a lot of robotics researchers
like myself, were scared that this problem was just too hard.
There was easier things to go try and work on,
and there were a couple of simplifications, so using this
spatula as one, and then you watch people do the

(12:11):
task and you realize they're kind of doing the same
strategies over and over again. It's like insert dispatchela on
the D and sweep to one side. For this kind
of page turn mechanism. Something's fallen over and you need
to sort of flip it back up to make space.

Speaker 1 (12:30):
So you put this spatula underneath it and flip the
thing up ninety degrees basically, Yeah.

Speaker 2 (12:35):
And you realize that accounts for like ninety percent of
the actions you do when you try and stow into
these bins.

Speaker 1 (12:42):
And did you figure that out by watching people stow.

Speaker 2 (12:45):
We did, and by doing it ourselves.

Speaker 1 (12:48):
How much stowing did you do? A couple of days? Okay,
it's a hard job. Thousands of items probably, I imagine.

Speaker 2 (12:55):
Yeah, exactly. And we tried to wear GoPro cameras
on our heads so we could look at the videos later,
which turns out is a recipe for motion sickness. It's
very difficult to watch those videos, but you go and
you do it, and you build up this intuition. And
I think the other piece of the problem that
made it tractable and made me this like huge believer
was recognizing we didn't have to get to one hundred percent.

(13:17):
So in some automation scenarios, you have to solve the
whole problem, and if you don't, you have nothing, so
like landing on the moon. And what we realized was
there was a way to like make the business logic
work that the robot could handle seventy five percent of
the stows and it just had to not make a
mess and, like, work alongside people to do the

(13:41):
other twenty five percent, and the sum of the parts
is actually much better than either all robots or all
employees would be on their own. And making that realization
all of a sudden meant that it could be a
two or three year project instead of a twenty year
project because chasing this long tail. You know, we have

(14:01):
a million unique items in the building, but we also
process a million items per day. So I have a
phrase like, if something goes wrong one in a million,
it happens every day in every Amazon building. And to
try and solve all of those is a
twenty year problem.
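Aaron's "one in a million happens every day" line is plain expected-value arithmetic, sketched here with illustrative round numbers:

```python
# A failure mode that hits one stow in a million, in a building that
# processes a million items per day, shows up about once a day.
p_defect = 1e-6          # chance a single stow hits the rare failure
items_per_day = 1_000_000

expected_failures = p_defect * items_per_day   # about 1 per building per day

# Probability that at least one such failure occurs in a given day:
p_at_least_one = 1 - (1 - p_defect) ** items_per_day   # roughly 0.63
```

That arithmetic is why chasing the long tail of rare items turns a three-year project into a twenty-year one.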

Speaker 1 (14:17):
I feel like that part of the solution generalizes in
a really nice way, right, Like I mean, I guess
the eighty twenty problem is a sort of cliche. But
the idea that like, oh, if you think of the
problem the right way, it's like, no, we don't have
to build a robot that does it every time. Before
we build a robot that does it seventy five percent
of the time. That is a huge efficiency gain and

(14:38):
maybe the optimal point on the curve. Right yep, if
the robot is doing everything, you're working too hard to
make the robot work.

Speaker 2 (14:46):
Probably exactly that.

Speaker 1 (14:49):
So, Okay, so you have these two big ideas. Do
you want to tell me the sort of story of
making it work? You want to tell me how it works.

Speaker 2 (14:57):
We've been running six of these robots at a warehouse
in Spokane, Washington, Okay, since November of last year and
so we've done over half a million stows at this point.
We also have another product that's picking those items out
of the bins, and so that's my team in Germany,
and so we have a warehouse in Hamburg where we've

(15:19):
been picking items. And picking is a slightly harder problem
in some ways because you have to identify the item.
So for stow, you have to identify free space. It's
either occupied or you can make space to put the
next item in. For pick, you want to make sure
I get you the red T shirt, not the red sweatpants,
or I get you the Harry Potter volume two and

(15:39):
not Sapiens or some other book.

Speaker 1 (15:41):
Tell me how it works. Let's do the stowing first,
since that's what we've been talking about. So there's this
warehouse in Spokane where this robot that you built is
in use. Like what happens there? A truck pulls in
and then what happens.

Speaker 2 (15:54):
The way the system works is one of these pods,
one of these bookcases pulls up to the station, so
it pulls in front of the robot. We have stereo
camera towers and so we're looking with the eyes first,
and we are creating a three D representation of the scene.
So we're modeling, you know, all the items that are

(16:16):
in the in the pod already. But the really interesting
part is we're actually predicting on top of that, how
we can move those items around to make more empty space,
how can we squeeze more stuff in. So it's not
just identifying vacant space. You have to predict where you
can make that vacant space by pushing stuff with this spatula. Okay,

(16:40):
then we do this matching algorithm. So we have about
forty or fifty items waiting for us to stow, and
so we have a variety of stuff, and we're matching
those forty or fifty items to the thirty ish shelves
that are in front of the robot. Which items should
go where, and then how do we make that space?
And so that's where a lot of the AI in

(17:01):
the system is active and operating. It's predicting success, it's
minimizing risk, it's trying to optimize for a bunch of
different parameters. Once we've made that selection, we grasp the item,
so that item we've selected for putting into the given
shelf passes into our hand and our hand is two

(17:21):
conveyor belt paddles, so you can think of it kind
of like a Panini press, like a George Foreman grill.
It is a George Foreman grill where each side has
a conveyor built into it.

Speaker 1 (17:32):
Like a little belt, Like just a little belt going around.

Speaker 2 (17:35):
That's right. So each face of the grill, the top
face and the bottom face have a conveyor belt. And
that's important because you can control the pose of the
item and you can feed it into the bin rather
than like throwing it into the bin. One of the
early versions we had kind of dropped it and tried
to punch it to put the item into the bin,
and that predictably failed in a lot of cases.

Speaker 1 (17:57):
Well, you say predictably now, but until you tried it, it wasn't predictable.
to it wasn't predictable.

Speaker 2 (18:02):
Yeah. Yeah. I'm a huge believer in iterative design, and
so we try and build early, build often build and
learn from those builds. So it's actually really important to
keep six degree of freedom pose control of the item. So you
want to make sure the item isn't rotating as you
shoot it out. You want to make sure that you

(18:22):
keep the orientation of the item because it's fitting tightly,
so you don't want it to run into the bookshelf
above it or below it, or the items that are in there.
Yeah yeah, yeah. We started by trying to shoot it out,
and then we had all kinds of problems when it
would like collide with stuff and fall on the floor.
The worst case is, you know, you would shoot it out,
it would bounce off the back of

(18:44):
the bookcase and then come back and hit you in the face.

Speaker 1 (18:47):
Did that happen?

Speaker 2 (18:48):
Yeah? Oh yeah, good.

Speaker 1 (18:49):
That's robot comedy, yes, robot physical comedy.

Speaker 2 (18:52):
Yeah.

Speaker 1 (18:53):
So, and that's it.

Speaker 2 (18:54):
That's the stow process. And we want to do that
a few hundred times an hour, and we want to
do it on the top shelves of those bookcases. Yeah,
that's one of the ways we are really complementary to
the employees is if the robots can do the top shelves,
it saves a lot of ergonomic tasks. It allows the
employees to work in their power zone, like you know,

(19:17):
shoulder level. That makes them faster too. So if you
put robots in, people get faster at the job.

Speaker 1 (19:22):
I mean, presumably as the robot gets better, it'll also
be better at putting things on the middle shelf.

Speaker 2 (19:28):
Right, Well, there's this like sweet spot.

Speaker 1 (19:29):
The robot's going to get better faster than people will
get better.

Speaker 2 (19:33):
Yeah, we want the robots to be as good as
they can and not chase one hundred percent. We don't
really believe in one hundred percent automation. We want to
find that sweet spot where we're maximizing productivity.

Speaker 1 (19:43):
I mean, the sweet spot's going to keep moving, right? The
robot's going to get better and better and be able
to do more and more faster and faster.

Speaker 2 (19:49):
Presumably, and my science team's actually really excited about that.
As you get more and more data. So we have
five hundred thousand stows that we've done so far, but
when we get to five hundred million stows, you can
leverage some of these techniques to start learning the motions
and learning some of these strategies and refining them to
be specific to the item that you are holding in
your hand. There's a lot of opportunity as you get

(20:12):
more and more data.

Speaker 1 (20:14):
Well, right, so we haven't really I mean, you mentioned
the software side, the AI side, but we haven't really
talked about it, and it is I mean, in talking
to other people working on robotics. It's plainly a data
game because there's no Internet of the physical world, right,
because large language models work so well, because there's this
huge data set and everybody is trying to get data

(20:36):
from the physical world, and you seem very well positioned
to get a lot of data from the physical world.

Speaker 2 (20:43):
I think that's true. So one of the joys of
being a roboticist at Amazon is all the data that
we have access to. But I will push back a
little bit that it's just a data problem. It's a
highly debated topic. Some people in the world believe that
you can apply the same sort of transformer architectures that
work so well for search and so well for natural

(21:05):
language processing and apply those to robotics. If we only
had the data. I would not put myself in that camp.
I am not a believer that all we need is
more torque data from robotic grippers and we'll solve it.
Natural language is already tokenized in a way that's very
amenable to those methods, and language and search are also

(21:28):
very tolerant of sloppiness. So you and I can have
a conversation. I don't have to get every single word correct,
but if you mess up a torque on a gripper,
you can crush your iPhone, or you can sort of
smash something else that's there, or drop something, or just
fail the task. And that's because you have physics and
this nonlinear, very sort of difficult to model real world

(21:54):
that these robots have to interact with. And so I
think those techniques certainly accelerate us in a lot of places,
but they don't just solve the problem. I think we
need all of the rest of robotics, like hardware design
and classical control theory to solve those problems.

Speaker 1 (22:12):
Compelling. Although you did start this part of the conversation,
you brought it up by saying, the science team is
really excited for what the model's going to learn once
you have hundreds of millions of stows.

Speaker 2 (22:24):
That's right, and that both things are true.

Speaker 1 (22:26):
I know, yes, plainly, we're just talking about sort of
the margins, right, what is true? At what margins? I mean?
I did wonder as I was reading about this, you know,
I thought of AWS of Amazon Web Services, which of course,
like was an internal Amazon thing that at some point
Amazon was like, oh, maybe other people would find this
service useful, And now it's a giant part of Amazon's business,
and so I wondered, like, are you building Amazon robotics services yet?

Speaker 2 (22:51):
Not today. There's so much value that we can
provide to our fulfillment business that we are one hundred
percent focused on that. Certainly as a roboticist, though, I
take great joy that the work we're doing is advancing
the field of robots, and so it definitely, like,
makes my job better that we're advancing the state

(23:14):
of the art. But from a business perspective, it's all
hands on making the fulfillment process better for Amazon dot Com.

Speaker 1 (23:25):
We'll be back in just a minute. I think I
read you say that you're building a foundation model of items.

(23:45):
Is that right? And I sort of know what that means,
But tell me what that means when you say that.

Speaker 2 (23:50):
So, when a robot handles an item, it would do
better if it takes into account the properties of that item.
So if you're trying to hand a bowling ball to someone,
you should do that in a different way than if
you're handing them a bouncy ball or a light bulb.
At its core, a foundation model for items is simply

(24:11):
a model that encodes all of those attributes of an item
and makes them available to the robotic systems that are
going to use it. And one of the things that
makes it a foundation model instead of just you know,
some custom bespoke thing is that you can transfer it
across lots of different applications. So if it's you know, stowing,
you can use it. If you're packing it into a

(24:32):
delivery box, you can use it. If you're putting it
onto a shelf in a physical store like for grocery
or whole foods or something, you can use it. And
so that like commonality across applications is one of the
things that's important.
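A toy sketch of the "foundation model for items" idea: one shared item encoding, read by several downstream applications. The attribute names and rules below are hand-coded purely for illustration; the point of the real model is that this representation is learned and transfers across tasks.

```python
def encode_item(attributes):
    """Shared item representation every application reads.
    (Invented attributes: the real encoding is learned, not rule-based.)"""
    return {
        "rigid": attributes.get("material") in {"glass", "metal", "hardcover"},
        "fragile": attributes.get("material") == "glass",
        "mass_kg": attributes.get("mass_kg", 0.5),
    }

def stow_strategy(item_code):
    # Squishy items can be squeezed into tight gaps; rigid ones cannot.
    return "squeeze" if not item_code["rigid"] else "exact-fit"

def handoff_speed(item_code):
    # The bowling-ball-versus-bouncy-ball example: heavy or fragile means slow.
    return "slow" if item_code["fragile"] or item_code["mass_kg"] > 3 else "normal"
```

The "foundation" part is that the same `encode_item` output feeds stowing, packing, grocery shelving, and so on, rather than each application keeping its own bespoke model.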

Speaker 1 (24:45):
Is part of the notion there that like the model
would allow a robot to sort of look at some
novel item and make a reasonable inference about the properties
of that item.

Speaker 2 (24:57):
Yeah, absolutely that. And the other thing that's a little
non intuitive is that by understanding how to handle that
item in all those different applications, grocery, you know,
stowing, picking, you get better at the individual application. So
by training on all of this data across these different domains,

(25:18):
you actually get better at the individual task that your
specific robot is trying to do. It takes
a while to, like, understand that. It is not intuitive.

Speaker 1 (25:29):
Say more. What do you mean? Like, I don't know that
I fully get it.

Speaker 2 (25:32):
Understanding how an item behaves when you gift wrap it,
Uh huh shouldn't really inform how it's going to behave
when you're picking it off of a bookshelf.

Speaker 1 (25:42):
Oh, I mean yes, it should, right, Like if you
think of like a whatever, a stuffed animal versus a book. Yeah,
maybe that's too easy of a case, but like if
a thing is squishy or rigid, that seems like as
a human being, I feel like we sort of port
that knowledge from one use case to another, right.

Speaker 2 (26:02):
Yeah, it's a good point. And maybe that's because we
are inherently sort of we think and manipulate items in
the world more similarly to how these foundation models do.
But ten years ago it was totally not the case.
You would train your model in a very narrow domain,
and if you gave it data from some other domain,
it would kind of corrupt the results that you had,

(26:23):
and so you were very careful to curate all the
data that you were using to be very specific to
the task that you wanted it to do, and that
made the performance better. But it also meant that the
model you had was only good at that one very
narrow thing.

Speaker 1 (26:38):
It was why we were always so far from the
general purpose robot. Yeah, because, as you're describing it, trying
to make a robot do more than one thing just
meant it couldn't even do one.

Speaker 2 (26:47):
We couldn't even do one thing, and so you're
putting all your effort into making it do that one
thing just a little better. I think there's another really
interesting piece here, which is our team, the Vulcan team
at Amazon, is trying to use touch and vision together,
and that is how people interact with the world. That's

(27:09):
how people manipulate the world. And so the example I
like to give is picking a coin up off a
table ten years ago, when a robot would try and
do that, I mean it's impossible, Like, robot can't pick
a coin up off table, it's too hard a task.
My five year old can pick a coin up off
the table in half a second without you noticing. Well,

(27:31):
the reason is your strategy. So when you pick a
coin up off the table, you actually don't grasp the coin.
You go and you touch the table and then you
slide your fingers along the surface of the table until
you feel the coin, and when you feel the coin,
that's your trigger to like rotate it up into a grasp.

(27:51):
You're not going to some millimeter precision the way your
grandfather's robot on the welding line would do. And you're
not just watching with your eyes. You're using your eyes
and your fingertips both.

Speaker 1 (28:04):
Your sense of touch.

Speaker 2 (28:07):
Yes, sense of touch is central to pick, and we are trying to do those same kind
of behaviors that are not only reacting to touch, but
planning for touch. So the same way you plan to
touch the table first, we plan to put our spatula
against the side of the bookcase before we try to
extend it in between this you know, small gap between

(28:30):
the T shirt and the bag and the side of
the bookcase. So we are building our plans and our
controllers around having sight and touch.

Speaker 1 (28:40):
I mean when you say touch in the context of
the robot, does that mean that it is getting feedback
from the stuff it is coming into contact with? And
is that novel? And how does that work?

Speaker 2 (28:51):
So the sensor is a force torque sensor. It looks
like a hockey puck and a thousand times a second,
it's telling you what it feels in the six degrees
of freedom. So up and down is one, left
and right is two, in and out is three, and
then you've got roll, pitch, yaw as the three torques.

(29:12):
So a thousand times per second, you're sensing, you're feeling
what the world is pushing on you with, and we
use that to control the motion but also to plan
the motion.
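The sensor Aaron describes reports a six-degree-of-freedom "wrench," three forces and three torques, roughly a thousand times a second. A minimal sketch, with invented field names and an invented noise threshold:

```python
from dataclasses import dataclass

@dataclass
class Wrench:
    """One ~1 kHz force-torque sample (illustrative layout)."""
    fx: float; fy: float; fz: float   # forces: left/right, in/out, up/down (N)
    tx: float; ty: float; tz: float   # torques: roll, pitch, yaw (N*m)

def in_contact(w, force_threshold=2.0):
    """Crude contact check: any force component above the noise floor."""
    return max(abs(w.fx), abs(w.fy), abs(w.fz)) > force_threshold
```

In practice this stream feeds both the planner (where to touch first) and the low-level controller (how hard it is touching right now).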

Speaker 1 (29:24):
When you say plan the motion, it's like, given the
sense of touch that is happening right now, what should
I do next? Yep.

Speaker 2 (29:31):
So in a like high level view, it's like touch
the table first, slide along the table while keeping you know,
sort of one pound of force pushing into the table
until you touch the coin, and then you know, rotate.
That's at a high level, but then even at a
low level, the thousand times per second is so that

(29:51):
as you slide your fingers along the table, you're sort
of maintaining that accurate force.
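The coin example has two layers: a low-level loop correcting the pressing force every millisecond, and a high-level guarded move that slides until a force spike signals the coin. A sketch with illustrative gains and thresholds:

```python
TARGET_N = 4.45   # about one pound of force, in newtons
GAIN = 0.002      # meters of position correction per newton of force error

def force_step(measured_force_n, z_position_m):
    """One 1 kHz control tick: press harder or back off to hold TARGET_N."""
    error = TARGET_N - measured_force_n
    return z_position_m - GAIN * error  # move down (negative z) when too light

def slide_until_bump(force_trace, bump_threshold_n=8.0):
    """High-level guarded move: slide until the force spike says 'coin'."""
    for step, f in enumerate(force_trace):
        if f > bump_threshold_n:
            return step  # trigger the rotate-into-grasp behavior here
    return None
```

The same pattern applies to the spatula: plan to touch the side of the bookcase first, then regulate force while sliding into the gap.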

Speaker 1 (29:57):
Yeah. Or like if you're putting a thing on the shelf,
you can sort of tell if you've pushed it too
far because the shelf is pushing back.

Speaker 2 (30:04):
At you exactly, or you can tell it slipping and
you're about to like push over the top of it,
so you can like, oh, it's about to fall over,
so I can react. And those dynamics are happening at
tens or hundreds of hertz, so you need to sense
them at a thousand hertz.

Speaker 1 (30:21):
What's the frontier right now for stowing? What are you
trying to figure out?

Speaker 2 (30:26):
One of the things is getting the fullness of those
bins all the way up to where people get them today.
As a person, you can pack those bins really, really
densely, and the robot is close but not quite as
good as a person at getting as much stuff into
the bookcase as it can. That's one frontier,

(30:49):
and that is because, one, we're conservative, like the robot's
brain is telling it there's no space when really there
is space. And two, it's because those motions are not
sophisticated enough yet. So we're trying to improve our vision
system, to get the eyes better to help, as well as
those low-level touch sensors, to make those behaviors better.

(31:12):
So that's one of the major frontiers. The other one
is the negative: the robot makes too many mistakes. So
defects and exception handling are so important in robotic systems,
and this is another thing I think the world on
the Internet doesn't appreciate enough. Like, you can do a
demo on a happy path. Hey, it worked once. I

(31:33):
can submit a paper to a conference, or I can
put a cool video on YouTube. That's great. You have
a demo. To have a product, you have to make
sure it's working, you know, ninety nine percent of the time,
or ninety nine and a half percent, or you know
in some cases four nines or five nines. And a
lot of the work you have to do is to
recover and handle those rare exceptions or prevent or recover

(31:57):
from those defects. And so the robot still drops too
much stuff on the floor. One of our frontiers is
not dropping crap on the floor, like, we need to
get about three times better at that.

Speaker 1 (32:08):
Presumably, the robot is already skipping some universe of items
that it can't handle.

Speaker 2 (32:14):
Yeah, and so we need to get smarter about which
items we skip and which items we take. We also
need to get better at inserting those items in such
a way that they're not going to fall back out?

Speaker 1 (32:25):
What items are particularly hard for the robot?

Speaker 2 (32:28):
So tight fitting items are the hardest.

Speaker 1 (32:31):
Uh huh. And so that's not the nature of the item,
but the nature of the particular relationship between the item and.

Speaker 2 (32:37):
The shelf exactly.

Speaker 1 (32:38):
Yeah, Like, is there a kind of thing that the
robot just can't do because of its shape or something.

Speaker 2 (32:46):
There is a particular rubber fish that we really hate.
It's a dog toy.

Speaker 1 (32:53):
It's floppy. Is that what?

Speaker 2 (32:55):
Sticky?

Speaker 1 (32:56):
Oh, sticky? Interesting. Yeah. And they don't put it in
a box? Nope, they just send you the sticky fish.

Speaker 2 (33:04):
Yeah, and it sort of gets hung up whenever
it makes contact. It doesn't slide; it wants to rotate
about whatever it's made contact with. And so there's this
particular dog toy, and so we use it. We've bought
like fifty of them, and now we have them in
the lab, and this is like our diabolical item set.

Speaker 1 (33:22):
Is that a term of art that diabolical?

Speaker 2 (33:24):
I don't know, Yeah, it's our term of art. Yeah.
Also bagged items where the bag is really loose. So
imagine having like a T shirt in a bag, but
the bag is like twice as big as the T shirt.

Speaker 1 (33:37):
Floppy? Is that the floppy problem?

Speaker 2 (33:40):
Floppy but also transparent, so sometimes you can see through
the bag.

Speaker 1 (33:44):
Or so the robot gets confused about is the bag
the item.

Speaker 2 (33:49):
Yeah, or not. Sometimes you want one and sometimes you
want the other. So like if it's just a floppy plastic
bag, it probably will fit. Like if I just push it
into the bin, the bag is going to conform and
slide in, but you can't be sure about that. You know,
you get into a bunch of those edge cases that
are in that long tail of being robust.

Speaker 1 (34:07):
I mean, it's interesting, right, because the robot is dealing
with this sort of human-optimized world. Like, it reminds
me of the way, I think it's Ikea, that designs
its furniture to fit optimally on a pallet, so you
can fit the most of them, not just the
flat pack, but in more subtle ways. And can
you imagine that there is some shift in the world

(34:28):
where I mean, obviously you're trying to make the robot better,
but also people are trying to make things work better
for the robot.

Speaker 2 (34:34):
Yes. Absolutely, And there is a different team within Amazon
that's imagining a future world and future bookcases that are
friendly for robots.

Speaker 1 (34:45):
Uh huh.

Speaker 2 (34:46):
However, there are currently five million of those bookshelves in
warehouses holding inventory that's for sale on Amazon dot Com.
And so it's a really really big lift to go
replace all of those bookshelves.

Speaker 1 (35:04):
Interesting. So it's a whole other team that's just like,
let's imagine, you know, a much more robot-centric warehouse. Yeah,
those guys, like, you don't even talk to them? They're
just off on their own?

Speaker 2 (35:16):
I mean, they're friends, but yeah, we are solving very
different problems. And so we took a tenet very early on:
the world exists, the robot needs to perform in the
world as it exists. And this team, they get a
greenfield, so they get to design something new. We are
a brownfield, meaning we

(35:38):
have to retrofit into these existing buildings. You know, we
have like ten-year leases on some of these buildings.
They're going to be there for a long, long time.

Speaker 1 (35:45):
And then somebody else is out there. So they're building
a whole other kind of robot. Your robot is optimized
for the world today, and somebody else is building a
robot for the robot world.

Speaker 2 (35:53):
That's right. I love that. They have a building that
they've built in Shreveport, Louisiana. It has
ten times the number of robots that a traditional building has.
It's a completely reimagined way of fulfilling your order. It
also has a lot of people still working in those buildings,
but they're working in maintenance and robotics quarterback jobs, and

(36:17):
so they're higher skilled. And so we have a bunch
of programs that are trying to transition our very talented
workforce into the jobs of the future. One of the
things I really like to say is, you don't need
a college degree to work in robotics at Amazon. It's
about twenty, twenty-five percent of my team that doesn't have
a college degree but is enormously valuable. Like, some of
our top ten people on the team are those people.

Speaker 1 (36:41):
That facility in Shreveport, is it live? Like, is real
stuff going in and real orders going out?

Speaker 2 (36:47):
Yeah, it's live. We could follow up with exactly the date,
but it's been up for about a year, I think.

Speaker 1 (36:52):
Interesting. Well, I would be interested in
talking to your counterpart there as well. That show would
pair interestingly with this show. So okay, let's talk about
the rest of the process. You know, the rest of
what's going on in the warehouse and where else you're
working on robots. So the piece we've been talking about
this whole time is getting stuff as it comes in

(37:13):
from the truck onto the shelf, which naively I wouldn't
even think of that part, but it turns out to
be this great big problem. What are the other pieces?

Speaker 2 (37:23):
What's interesting is the science we're building, giving robots a
sense of touch, has applicability in lots and lots of
places across that whole chain. Anytime the robots need to
be physically interacting, like contacting, touching items, is a good
place for our core technology. So if we're packing four

(37:44):
items into a box because we want to send you
the four things you bought in one shipment, not in
four separate packages, you need to touch the box. You
need to touch the other items that are already in
the box. You need to play that game of tetris.

Speaker 1 (37:56):
Yes, I mean it's a stowing problem again, right, I
know it's called packing, but it's a version of that same.

Speaker 2 (38:01):
Problem, that's right, And those problems recur over and over again.
So getting all of the packages, all of the cardboard
boxes and paper mailers into a cart that can go
onto the back of the truck, that is a stowing
problem in the cart.

Speaker 1 (38:15):
Putting things in a thing, yeah, is a great, big
problem in many ways.

Speaker 2 (38:19):
But you can also expand to think about grocery. So
if you order produce, you don't want your grandfather's welding
robot handling your peaches. It's gonna smash them, like, you
need a robot with a sense of touch. If you
think about household tasks, if you want a robot, you know,
picking up your kid's toys or dealing with laundry, like,

(38:41):
those robots need to have a sense of touch. They're
physically interacting in a dexterous way with the world. And
so one of the things that we're so excited about
not only these big applications for stowing and picking off
of you know, these bookcases, but everything that gets unlocked
once the robot has that sense of touch.

Speaker 1 (39:00):
When you talk that way, it feels like a beyond-what-is-typically-considered-Amazon
kind of thing. It seems like either Amazon's going to
get into lots of other sort of non-retail businesses, or
license the technology, or sell, you know, robotic touch as
a service or whatever.

Speaker 2 (39:20):
Yeah, I think there are probably five or ten applications
in how we process orders today that are all within
the warehouses and delivery stations, and those are my first
hills to climb. Then we do have a consumer robotics team.
So there was a cool robot we released called Astro.

(39:41):
It didn't have any manipulation capabilities, right, It would drive
around your house. It had a camera on a mast
that would extend up and down. You could talk to
it the way you can talk to an Alexa device.
The future versions of those robots are going to want
to do more useful things, and so they're going to
need this kind of underlying technology, and so that's a
business opportunity in the long term. You know, that's not

(40:02):
a thing my team is focused on now, but I
get excited about it when I think about what we unlock.

Speaker 1 (40:12):
We'll be back in a minute with the lightning round.
Let's do a lightning round. If you listen to the show,
you have a sense of what this is. Tell me
about the last time you were in zero gravity.

Speaker 2 (40:33):
I flew an experiment to try and drill into rocks,
which was going to be applied to asteroids. And of course,
if you're drilling into an asteroid, any amount you're pushing
into the rock is pushing you back off into space
because asteroids have almost zero gravity.

Speaker 1 (40:51):
Right, so you gotta have somebody push it on the
other side. How do you solve that? What do you do?
You grab it? How do you even do that?

Speaker 2 (40:58):
Hence my passion for robot hands. We built a robot
hand that would grab the rock with a bunch of claws.
I think it had a thousand claws, and the claws
were actually fish hooks. So imagine a bunch of fish
hooks grabbing onto a rock to react the force of
pushing a drill bit down the center.

Speaker 1 (41:15):
Did it work?

Speaker 2 (41:17):
It did work, but it only worked on rocks that
were pretty rough, that had a lot of spots for
the fish hooks to grab. But it turns out asteroids
are really rough. Most of the smooth rocks you find
on Earth have been processed by liquid water or ice,
and that's not happening on asteroids. No liquid water.

Speaker 1 (41:35):
And so this was on the plane, on
that NASA plane that flies, what is it? Yeah, the
vomit comet, parabolic flight curves basically. Yeah, what was it like?

Speaker 2 (41:45):
The vomit comet is actually very zen. So when you're
in zero gravity, when you're floating, it's very peaceful. It's
when you're in double gravity, when you're at the bottom
of the parabola and you're being glued and pushed against
the floor, that if you turn your head very quickly
you get into serious trouble. And so the trick is
just to go into your zone

(42:06):
for the bottom of the parabola, and then you become
very free and zen-like in the zero-G portion.

Speaker 1 (42:13):
You think you'll ever go to space?

Speaker 2 (42:15):
No, I think now that I have three kids, I
think I'm landlocked.

Speaker 1 (42:20):
You seem a little bit sad about that. Does everybody
who works at JPL kind of want to go to space?

Speaker 2 (42:26):
Yes, everybody that works at JPL, I think does think
about going to space. I think what makes me sad
is we could be doing so much more at building
civilization out into space, at the scientific exploration of all
of the interesting places in space, and I think we're
kind of tripping ourselves up in a couple of places

(42:48):
as a species. I wish we would get unblocked and
get some of that eagerness you see in some of
the private investment. Like, we're doing well in rockets, but
we're not yet doing well in the spacecraft and the
scientific instruments and the pieces that have to fly on
top of the rockets.

Speaker 1 (43:04):
When you say we're tripping ourselves up in a couple places,
in what places? Like, what do you mean?

Speaker 2 (43:10):
I think we became very conservative in our risk posture
about going to space. We stopped treating it as this
like very dangerous activity and tried to make it extremely safe,
and that slowed us down.

Speaker 1 (43:22):
To bring back to cowboys a little bit.

Speaker 2 (43:24):
Yeah. And then there's a lot of bureaucracy, of course,
that built up over fifty years. I'm still very optimistic.
There's a lot of smart people working in that area
and a lot of exciting things happening, so we're going
to get through it.

Speaker 1 (43:46):
Aaron Parness is a director of Applied Science at Amazon Robotics.
Please email us at problem at Pushkin dot fm. We
are always looking for new guests for the show. Today's
show was produced by Trinomnino and Gabriel Hunter Chang. It
was edited by Alexander Garretton and engineered by Sarah Bruguerrett.

(44:07):
I'm Jacob Goldstein, and we'll be back next week with
another episode of What's Your Problem.