Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Speaker 1 (00:15):
Pushkin. In a metaphorical sense, AI is everywhere. It can
write essays, it can do your taxes, it can design drugs,
it can make movies. But in a literal sense, AI
is not everywhere. You know, a large language model can
(00:35):
tell you whatever twenty seven ways to fold your shirts
and put them in the drawer, but there's no robot
that you can buy that can actually fold your shirts
and put them in the drawer. At some point, though
maybe at some point in the not that distant future,
there will be a robot that can use AI to
learn how to fold your shirts and put them in
the drawer, or you know, cook lasagna, pack boxes, plug
(00:58):
in cables. In other words, there will be a robot
that can use AI to learn how to do basically anything.
I'm Jacob Goldstein and this is What's Your Problem, the
show where I talk to people who are trying to
make technological progress. My guest today is Chelsea Finn. She's
(01:20):
a professor at Stanford and the co founder of a
company called Physical Intelligence aka PI. Chelsea's problem is this,
can you build an AI model that will bring AI
to robots, or, as she puts it, we're.
Speaker 2 (01:35):
Trying to develop a model that can control any robot
to do any task anywhere.
Speaker 1 (01:41):
Physical Intelligence was founded just last year, but the company
has already raised over four hundred million dollars. Investors include
Jeff Bezos and OpenAI. The company has raised so much
money in part because what they're trying to do is
so hard. Motor skills, the ability to move and find
ways to fold the shirt to plug in a cable,
(02:02):
they feel simple to us, easy, basic, But Chelsea told
me basic motor skills are in fact wildly complex.
Speaker 2 (02:11):
All of the motor control that we do with our body,
with our hands, with our legs, our feet, a lot
of it we don't think about when we do it.
It actually is incredibly complicated what we do. This is
actually like a really, really hard problem to develop in AI systems and robots, despite it seeming so simple. And the reasons for that are, first, that it actually is inherently very complex,
(02:34):
and second that we don't have tons and tons of
data of doing this, in part because it's so basic
to humans as well.
Speaker 1 (02:42):
Right, let's talk about the data side, because that seems
like really the story, right, the big challenge, and it's
particularly interesting in the context of large language models and
computer vision which really seem to have emerged in a
weird way as a consequence of the Internet. Right, just
because we happen to have this crazy amount of data
(03:06):
of words and pictures on the Internet, we were able
to train language models and computer vision models. But we
don't have that for robots, right. There is no data
set of training data for robots, which is like the
big challenge for you and for robotics in general.
Speaker 2 (03:22):
It seems, Yeah, so we don't have an open internet
of how to control motors to do like even really
basic things. Maybe the closest thing we have is we
have videos of people doing things, and perhaps that could
be useful. But at the same time, if I watch
like videos of like Roger Federer playing tennis, you
can't just become an amazing tennis player as a result
(03:42):
of that. And likewise, just with videos of people doing things,
it's very hard to actually extract the motor control behind that.
And so that lack of data, that scarcity of data,
makes it in some ways a very different problem than
in language and computer vision. And I think that we
should still learn a lot of things from language computer
vision and collect large data sets like that. It opens
(04:04):
up new challenges, new possibilities on that front, and I think that in the long run we should be able to get large amounts of data, just like how in
autonomous driving we have lots of data of cars driving
around very effectively. Robots too, could be in the world
collecting data learning about how to pick up mustard and
put it on a hot dog bun, or learning how
to open a cabinet to put some objects away. We
(04:26):
can get that sort of data, but it's not given
to us for free.
Speaker 1 (04:33):
You still have this core problem, which is there is
no giant trove of physical reality data that you can
train your model on. Right, That's the great big challenge,
it seems, what do you do about that? How do
you start to approach that?
Speaker 2 (04:49):
Yeah, so we're starting off by collecting data through teleoperation, where people are controlling the robot to do tasks, and then you don't just get video data.
You get the videos alongside what are the actions or
the motor commands needed to actually accomplish those tasks. We've
collected data in our own office. We've also collected data
(05:10):
in homes across San Francisco, and we also have a
very modest warehouse. In some ways, our current operation is actually rather small, given that we're a little
over a year old at this point.
Speaker 1 (05:24):
Like what's actually happening? Like if I went into your
warehouse and somebody was doing teleoperation, what would I see?
What would it look like?
Speaker 2 (05:30):
Yeah, so it's a little bit like controlling a puppet. So the person who's operating the robot, they are
holding in some ways a set of robot arms, but
they're very lightweight robot arms, and we use those to
measure the positions of joints.
Speaker 1 (05:47):
It's almost like an elaborate control for a video game
or something. It's like that, it's not actually a robot arm, right,
It's a thing you control to sort of make the robot move.
Speaker 2 (05:57):
Yeah, exactly exactly, and then we record that and then
directly translate those controls over to the robot. We have
some robots that are just robot arms, where you're only
just controlling the robot arm. It's mounted to a table
or something like that. But we also have what we
call mobile manipulators that have wheels and robot arms, and
you can control both how the robot drives around as
(06:18):
well as how the arms move and we're doing tasks
like wiping down counters, folding laundry, putting dishes into dishwashers,
plugging cables into data center racks, assembling cardboard boxes, lots
and lots of different tasks that might be useful for
robots to do, and recording all the data. So we
(06:38):
have cameras on the robots. There are sensors on the
joints on the motors of the robots as well, and
we record that in like a synchronized way across time.
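To make that data-collection setup a bit more concrete, here is a minimal sketch of a synchronized teleoperation recording loop. The interfaces (leader_arms, follower_robot, cameras) and the field names are hypothetical stand-ins, not Physical Intelligence's actual stack; the point is just that each timestep pairs camera images and joint readings with the operator's commanded action.

```python
import time

def record_teleop_episode(leader_arms, follower_robot, cameras, hz=30, max_steps=3000):
    """Record one teleoperated episode as a list of synchronized timesteps.

    Hypothetical interfaces:
      leader_arms.read_joint_positions()        -> joint angles of the lightweight
                                                    "puppet" arms the operator holds
      follower_robot.command_joint_positions(q) -> mirror those angles on the robot
      follower_robot.read_joint_positions()     -> the robot's own joint sensors
      cam.capture()                              -> one image frame
    """
    episode = []
    dt = 1.0 / hz
    for _ in range(max_steps):
        t0 = time.time()

        # The operator's motion becomes the action: mirror the leader arms onto the robot.
        action = leader_arms.read_joint_positions()
        follower_robot.command_joint_positions(action)

        # Log observations and the action together, stamped with the same time.
        episode.append({
            "timestamp": t0,
            "images": {name: cam.capture() for name, cam in cameras.items()},
            "robot_joints": follower_robot.read_joint_positions(),
            "action": action,
        })

        # Keep a fixed control rate so the data is evenly sampled in time.
        time.sleep(max(0.0, dt - (time.time() - t0)))
    return episode
```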
Speaker 1 (06:47):
So when you do it, it's like kind of like
a real world video game, like you're moving your arms
in these things, and in basically real time, the robot
arm is moving and picking up the thing you wanted
to pick up, And like, what's it like? Is there
like a curve where like at the beginning it's really bad?
Sort of talk me through an instance.
Speaker 2 (07:06):
And it depends on the person. So some people can pick it up really, really quickly. Some people are a bit slower to pick it up. I pride myself on being
a pretty good operator, and so I have done tasks
as complex as peeling a hard boiled egg with the robot,
which is...
Speaker 1 (07:22):
How are you at peeling a hard boiled egg with your hands?
Speaker 2 (07:27):
It's pretty hard with my own hands too, yeah, and
with the robot is even harder.
Speaker 1 (07:31):
Tell me about the robot peeling a hard boiled egg
because that sounds like a hard one. Yeah.
Speaker 2 (07:35):
So the robots, basically all the robots that we're using, have like kind of pincher grippers. They're called parallel jaw grippers, where there's just one degree of freedom, like open/close, two pincers.
Speaker 1 (07:44):
It's basically two pincers, like two.
Speaker 2 (07:46):
Pincers, two arms. Yeah, exactly, and I've used that exact setup. There are six different joints on the arm, so it has kind of basically full range of motion in three-D space and three-D rotation, and you
can use that to peel a hard boiled egg. You
don't have any tactile feedback, so you can't actually feel
the egg, and that's actually one of the things that
makes it more difficult. But you can actually you can
(08:08):
use visual feedback to compensate for that. And so just
by looking at the egg myself, I'm able to figure
out if you're like in contact with something, and you just.
Speaker 1 (08:18):
Use one prong of the claw like what I could say,
you squeeze it a little to crack it, and then
use like one prong of the claw to get the
shell off.
Speaker 2 (08:26):
Yeah, exactly, so you can. You want to crack it
initially and then hold it with one gripper and then
use basically one of the two fingers in the gripper
to get pieces of shell off. When we did this,
we hard boiled only two eggs. This was actually at Stanford. The first egg a graduate student ended up breaking, and so I did the second egg, and I was able to successfully not break
(08:49):
it and fully peel it. It took some patience, certainly,
and I wasn't able to do it as quickly as
with my own hands. But I guess it goes to show
the extent to which we're able to control robots to
do pretty complicated things.
Speaker 1 (09:02):
Yeah, and so obviously, I mean that is a stunt
or a game or something fun to do with the robot.
But presumably in that instance, as in the other instances
of folding clothes and vacuuming and the like, there is learning, right.
The idea is that you do it some number of
times and then the robot can do it, and then
presumably there's also generalization. But just to start with learning,
(09:24):
like you know, reductively, how many times do you got
to do it for the robot to learn it?
Speaker 2 (09:31):
Yeah, so it really depends on the extent to which
you want the robot to handle different conditions. So in
some of our research, we've been able to show the
robot how to do something like thirty times or fifty times,
and just with that, maybe that sounds like a lot, but you can do that in like typically less than an hour if it's a simple task, and from that the
(09:52):
robot can do it under the circumstances you demonstrated it in. In a narrow set of circumstances, like a single environment, a single particular object, the robot can learn just from like less than an hour of data.
Speaker 1 (10:05):
What is an example of a thing that the robot
learned in less than an hour of data?
Speaker 2 (10:09):
Oh yeah, we put a shoe on a foot, we tore off a piece of tape and put it on a box. We've also hung up a shirt on a hanger.
Speaker 1 (10:19):
So that's not that much I mean, especially because you
say the robot, but what you really mean is the model.
So every robot, right, presumably or every robot that's built
more or less like that one, right, Like that's one
of the key things. It's like you're not teaching one robot,
you're teaching every robot ever, because it's software fundamentally, it's an AI model. It's not hardware.
Speaker 2 (10:39):
Yeah, yes, with the caveat that, if you want to
be this data efficient, it works best if it's like
in the same like the same color of the table,
the same kind of rough initial conditions of where the
objects are starting, right, and the same shirt for example.
So this is just with like a single shirt and
not like any shirt.
Speaker 1 (10:55):
So there's there's like concentric circles of generalizability, right, like
exact same shirt, exact same spot, exact same table versus
like fold a shirt versus fold clothes, right? And so is that just infinitely harder? Like how does
that work? That's your big that's your big challenge at
some level, right, Yeah.
Speaker 2 (11:16):
So generalization is one of the big one of the
big challenges, not the only one, but it's one of
the big challenges. And in some ways, I mean the
first unlock there is just to make sure that you're
collecting data not just for one shirt, but collecting it
for lots of shirts, or collecting it for lots of
clothing items, and ideally also collecting data with lots of
tables with different textures, and also like not just visual
(11:37):
like appearances, but also like if you're folding on a
surface that has very low friction, like it's very smooth,
versus a surface that like maybe on top of carpet
or something that's going to behave differently when you're trying
to move the shirt across the table. So having variability
in the scenarios in which the robot is experiencing in
the data set is important, and we've seen evidence that
(12:02):
if you set things up correctly and collect data under lots
of scenarios, you can actually generalize to completely new scenarios.
And in like the pi zero point five release, for example, we found
that if we collected data in roughly like one hundred
different rooms, then the robot is able to do some
tasks in rooms that it's never been in before.
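As a small illustration of the kind of generalization test described here, the sketch below splits recorded episodes by location so that evaluation happens only in rooms the model never saw during training. The episode structure and the room_id field are assumptions for illustration, not a description of the actual evaluation protocol.

```python
import random

def split_by_room(episodes, num_eval_rooms=10, seed=0):
    """Hold out entire rooms, not just individual episodes, so that evaluation
    measures generalization to places the robot has never been in."""
    rooms = sorted({ep["room_id"] for ep in episodes})
    random.Random(seed).shuffle(rooms)
    eval_rooms = set(rooms[:num_eval_rooms])

    train = [ep for ep in episodes if ep["room_id"] not in eval_rooms]
    held_out = [ep for ep in episodes if ep["room_id"] in eval_rooms]
    return train, held_out
```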
Speaker 1 (12:23):
So you mentioned pi zero point five. So PI zero point five,
that's your latest model that you've released, right, tell me
about that, Like, what what does that model allow robots
to do? Like what robots and what settings and what tasks.
Speaker 2 (12:39):
Yeah, yeah, definitely. So we were focusing on generalization. So
the previous model, we were focusing on capability, and we
did a really complicated task of laundry folding. From there,
we wanted to answer, like, Okay, that model worked in
one environment. It's fairly brittle. If you put it in
a new environment, it wouldn't work. And we wanted to
see if we put robots in new environments with new objects,
(12:59):
new lighting conditions, new furniture, can the robot be successful.
And to do that, we collected data on these mobile manipulators, which feels like a terrible name, but robots with two
arms and wheels that can drive around kind of like
a humanoid, but we're using wheels instead of legs, a
bit more practical in that regard, and we train the
(13:22):
robot to do things like tidying a bed, or wiping
spills off of a surface, or putting dishes into a sink,
or putting away items into drawers, taking items of clothing,
dirty clothing off the floor and putting them into a
laundry basket, things like that, And then we tested whether
or not, after collecting data like that in lots of environments, aggregated with other data, including data on the internet,
(13:46):
can the robot then do those things in a home that it has never been in before. And in some ways that sounds kind of basic, like people have no problem with it: if you can do something in like one home, you probably could do the same thing in another home. It doesn't really seem like a complicated thing for humans, but for robots that are trained on data, if they're
(14:08):
only trained in one place, their whole universe is that one place; they haven't ever seen any other place.
This is actually kind of a big challenge for existing methods.
And yeah, it was a step forward. We were able
to see that it definitely isn't perfect by any means,
and that kind of comes to another challenge, which is reliability.
But we're able to see the robot do things in
(14:29):
homes it's never been in before, where we set it up,
ask it to do things, and it does some things
that are useful.
Speaker 1 (14:33):
So like in the classical setting where a robot is
trained in one room, like it doesn't even know that
room is a room. That's just like the whole world
to the robot, is that world right? And if you
put it in another room, it's in a completely unfamiliar
world exactly.
Speaker 2 (14:48):
And so for example, what we were talking about, like
hanging up a shirt, its whole world was like that one,
like a black tabletop that's smooth, that one blue shirt,
that one coat hanger. And it doesn't know about this
entire universe of other shirts and other.
Speaker 1 (15:01):
It doesn't know that there is a category called shirt.
It only knows.
Speaker 2 (15:04):
Yeah, it doesn't even know what shirts are.
Speaker 1 (15:06):
Yeah, it doesn't even know what shirts are. For pi zero point five, like, what did you ask the robot
to do? And how well did it work?
Speaker 2 (15:13):
Yeah, So we trained the model. We took actually a
pre-trained language model with also like a vision component,
and we fine tuned it on a lot of data,
including data from different homes across San Francisco, but actually
a lot of other data too. So actually only two
percent of the data was on these like mobile robots
with arms. So we can store how the motors were
(15:35):
all moving in all of our previous data and then
train the model to mimic that data that we've stored.
Speaker 1 (15:40):
It's like it's like predicting the next word, but instead
of predicting the next word, it's like predicting the next movement.
Or something like that? Yes, exactly.
Speaker 2 (15:48):
We've kind of trained it to predict next actions or
next motor commands instead of next words. We do an
additional training process to have it focus on and be
good at the mobile robot data in homes. Then we
set up the robot in a new home and we
give it language commands, so we can give it low
level language commands, or we can actually also give
(16:09):
it higher level commands. So the highest level of command
might be clean the bedroom. And one of the things
that we've also been thinking about more recently is can
you give it a more detailed description of how you
want it to clean the bedroom? But we're not quite
there yet, So we could say clean the bedroom. We'd
also tell it put the dirty clothes in the laundry basket,
so that would be kind of a subtask. Or we
can tell it like commands like pick up the shirt,
(16:32):
put the shirt in the laundry basket. Then after we
tell it that command, then it will go off and
follow that command and actually in most cases realize that
command successfully in the real world.
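As a rough illustration of training a model to predict the next action instead of the next word, here is a toy imitation-learning (behavior cloning) step. The policy stands in for a pre-trained vision-language model with an action head, and the batch fields, loss, and single-step action output are simplifying assumptions for illustration; the actual recipe described here (the data mixture, the extra fine-tuning stage on the mobile-robot data) is more involved and is not fully specified in the conversation.

```python
import torch.nn.functional as F

def behavior_cloning_step(policy, batch, optimizer):
    """One supervised step: imitate the teleoperator's recorded motor commands.

    Assumed batch contents:
      batch["images"]  - camera observations at time t
      batch["command"] - a language instruction, e.g. "put the shirt in the laundry basket"
      batch["actions"] - the recorded motor commands that followed, as a tensor
    policy(images, command) is a hypothetical model returning predicted actions.
    """
    pred_actions = policy(batch["images"], batch["command"])

    # Regress predicted motor commands onto what the human operator actually did,
    # analogous to next-word prediction but over joint targets instead of tokens.
    loss = F.mse_loss(pred_actions, batch["actions"])

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In practice the batches would be drawn from a weighted mixture of many data sources, with only a small fraction (Finn mentions about two percent) coming from the mobile manipulators, followed by the additional training process she describes that emphasizes that data.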
Speaker 1 (16:47):
How did it do.
Speaker 2 (16:48):
So it depends on the task. The average success rate
was around eighty percent, so definitely room for improvement, and
in many scenarios it was able to be quite successful.
We also saw some some failure modes where for example,
if you're trying to put dishes into a sink, sometimes
one of the dishes was a cutting board, and picking
up a cutting board is actually pretty tricky for the
(17:09):
robot because you either need to slide it to the
edge of the counter and then grasp it or somehow
get the kind of get the finger underneath the cutting board.
And so sometimes it was able to do that successfully.
Sometimes it struggled and got stuck. The exciting thing, though, was that we were able to kind of drop it in a place it had never been before, and it was doing things that are quite reasonable.
Speaker 1 (17:32):
So what are you doing now, Like, what's the next
thing you're trying to get to? Yeah?
Speaker 2 (17:35):
Absolutely, So the next thing we're focusing on is reliability
and speed. So I mentioned like around eighty percent for
these tasks. How do we get that to ninety nine percent?
And I think that if we can get the reliability up,
that's kind of, in my mind, the main missing ingredient
before we can like really have these being like useful
(17:58):
in real world scenarios.
Speaker 1 (18:00):
So getting to ninety nine percent is interesting. I mean,
I think of self driving cars right where it seemed
some time ago, I don't know, ten years ago, fifteen years ago,
like they were almost there, and I know they're more
almost there now. I know in San Francisco there really
are self driving cars, but they're still very much at
the margin of cars in the world, right, And it
(18:22):
does seem like almost there means different things in different settings,
But I don't know. Is it super hard to get
from eighty percent to ninety nine percent? Does the self
driving car example teach us anything for your work?
Speaker 2 (18:39):
The self driving car analogy is pretty good. I do
think that, fortunately, there are scenarios where we may not need it to be quite as reliable as cars. With cars there is a much, much higher
safety risk. It's much easier to hurt people, and in
robots there are safety risks because you are in the
physical world. But it's easier to put software precautions
(19:03):
in place and even hardware precautions in place to prevent
that as well, So that makes it a little bit easier.
Speaker 1 (19:08):
I mean, ninety nine percent probably isn't good enough for cars, right,
They probably need more nines than that, whereas it may
well be good enough for a house.
Speaker 2 (19:16):
Cleaning robots, yeah, in certain circumstances. And yeah, like we're
also thinking about scenarios where maybe even less than that
is fine. And if we view humans and robots working together,
it's more about kind of helping the person complete the
task faster or complete the task more effectively. So I
think there might be scenarios like that, but still we
need the performance and reliability to be higher, and for the
(19:39):
robots to be faster in order to accomplish that.
Speaker 1 (19:44):
We'll be back in just a minute. What do you
imagine as the initial real world use cases?
Speaker 2 (20:05):
I don't know. There's a lot of examples of robotics
companies that have attempted to kind of start with
an application and hone in on that, and I think
the lesson from watching those companies is that you end
up then spending a lot of time on the problems
of that specific application and less on developing the sort
(20:26):
of generalist systems that we think in the long run
will be more effective. And so we're very focused on
understanding what are the core bottlenecks and the core missing
pieces for developing these generalist models, and we think that
if we had picked an application now, we would kind
of lose sight of that bigger problem because we need
to solve things that are specific to that application. So
we're very focused on what we think are like the
(20:48):
core technological challenges. We have certain tasks that we're working on.
Some of them have been home cleaning tasks. We also have some more kind of industrial-like tasks as well, just to instantiate things and actually be iterating on robots. And applications could range from things in homes to things in
(21:09):
workplaces to industrial settings. There's lots and lots of use
cases for intelligent robots and intelligent kind of physical machines.
Speaker 1 (21:19):
What are some of the industrial tasks you've been working on.
Speaker 2 (21:24):
One example that I mentioned before is inserting cables. There's
lots of use cases in data centers, for example, where
that's a challenging task. Another example is constructing cardboard boxes
and filling them with items. We've also done some packaging
tasks highly relevant to lots of different kinds of shipping operations.
(21:44):
And then even folding clothes. It seems like a very
home task, but it turns out that there are companies
that need to fold like very large lots of clothing,
and so that's also something that in the long term
could be used in larger scale settings.
Speaker 1 (22:01):
So I've read that you have open sourced your model
weights and given designs of robots to hardware companies, and
I'm interested in that and that set of decisions, right,
that set of sort of strategic decisions. Tell me about
that sort of giving away IP basically.
Speaker 2 (22:20):
Right, yeah, yeah, definitely. So this is a really hard problem,
especially this longer term problem of developing a general system.
We think that the field is very young, and there's
like a couple of reasons. One is that we think
that the field needs to mature, and we think that
having more people being kind of competent with using robots
(22:41):
and using this kind of technology will be beneficial in
the long term for the company, and by open sourcing things,
we make it easier for people to do that. And
then the second thing is, like the models that we
develop right now, they're very early, and the models that
we'll be developing one to three years from now are
going to be far far more capable than the ones
(23:02):
that we have now. And so it's kind of, like, equivalent to, like, OpenAI open sourcing GPT two or GPT three. They actually didn't open source GPT three, but like,
I think that they would still be in an excellent
spot today if they had.
Speaker 1 (23:19):
Like what could go wrong that would either prevent you
as a company from succeeding or even hold back the field in general? I don't think we.
Speaker 2 (23:28):
Entirely know the scale of data that we need for
getting really capable models. And there's a little bit of
a chicken and egg problem where it's a lot easier
to collect data once you have a really good model.
But it takes like large amounts of data to get a really good model.
Speaker 1 (23:43):
Right, Or if there were thousands of robots out of
the world running your model, they would just make an
incredible amount of data coming into you every day, right.
Speaker 2 (23:50):
Yeah, yeah, exactly. So that's one thing, though I'm actually maybe a little bit less concerned about that myself.
And then I think the other thing is just that
there are technological challenges to getting these things to work
really well. I think that I think we've had incredible
progress over the last year and two months, the last
like fourteen months. I think since we've started, probably more
(24:12):
progress than I was expecting, honestly compared to when we
started the company. I think it's like wild that we
were able to get a robot to like unload and
fold laundry like a ten minute long task.
Speaker 1 (24:25):
And folding laundry is like a famously hard robot problem, right,
Like it's the one that people in robotics talk about
when they talk about how things people think are easy are actually hard for robots, right.
Speaker 2 (24:37):
Yeah, absolutely absolutely. I mean you have to deal with
all sorts of variability in how clothes can be crumpled
on each other. And also it's like there's even like
really small, minor things you need to do in order
to like actually get it to be flat on the
table and folded nicely and even stacked. And as the
task gets longer as well, there are more opportunities to
make mistakes, more opportunities to get stuck. And so if
(24:58):
you're doing a task it takes ten minutes, in those
ten minutes, there's many many times where the robot can
make a mistake that it can't recover from or just
get stuck or something like that. And so being able
to do such a task starts to kind of point
at the resilience that these models can have by recovering
from those mistakes. Uh huh, so when we were first
trying to fold laundry, like, one of the common failure
(25:20):
modes is that it would fold the laundry like very
well by my standards at the time, I would be
very very happy with the robot, and then it would
push the entire stack of laundry onto the ground.
Speaker 1 (25:32):
Sort of like teaching a toddler to fold clothes.
Speaker 2 (25:36):
Yeah, yeah, exactly.
Speaker 1 (25:37):
Was there a particular moment when you saw a robot
using your model fold clothes for ten minutes and it worked?
Speaker 2 (25:46):
Yeah. First off, we started with just folding a shirt
starting flat on the table. We got that to work
pretty quickly. That turns out to be pretty easy,
and I wasn't too surprised by that. And then we
moved from that to starting it in like just a
random ball, like some sort of crumpled position on the table,
and then you have to flatten and then fold it,
and that makes a problem dramatically harder because of all
(26:07):
the variability having to figure out how to flatten it.
We were kind of stuck on that problem for at
least a couple of months, where with everything we were trying, the
success rate of the robot was zero percent. It wasn't
able to really make progress on it, and we started
to see signs of life I think in August or
(26:28):
September of last year, where we tried a new recipe
where we continued to train the model on a
curated part of the data that was following a consistent strategy,
and that sort of high quality post training is what
really seemed to make the model work better. And then
the moment that I was most excited about was the
first time that I saw the model flatten and fold
(26:52):
and stack five items in a row.
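A crude sketch of the curated post-training idea described above: keep training the same policy, but only on the subset of demonstrations that follow one consistent strategy. The filtering criterion, field names, and train_step function are hypothetical placeholders; how the laundry data was actually curated is not spelled out in the conversation.

```python
def curated_post_training(policy, episodes, train_step, num_epochs=3):
    """Continue training an already-trained policy on a curated, higher-quality
    subset of the data, e.g. only successful episodes that follow the single
    flatten-then-fold strategy we want the model to commit to."""
    curated = [
        ep for ep in episodes
        if ep.get("strategy") == "flatten_then_fold" and ep.get("success", False)
    ]
    for _ in range(num_epochs):
        for ep in curated:
            train_step(policy, ep)  # same supervised imitation update as before
    return policy
```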
Speaker 1 (26:54):
Yeah.
Speaker 2 (26:54):
I just remember going home that night and being like
so excited. It seemed like we had just like figured
out this big missing puzzle piece.
Speaker 1 (27:02):
So I was asking you why might it not work
or what might slow the field down? And then we
talked about the happy short story. But if in five
years things didn't progress as quickly as you thought, what
might have happened.
Speaker 2 (27:16):
I mentioned that I think that incorporating practice, like allowing
the robot to practice the task, should be really
helpful for allowing robots to get better. We don't know
what exactly that recipe will look like, and so it's
like a research problem, and with any sort of research problem,
you don't know exactly how hard the solution is going
(27:36):
to be, and I think that there are some other
more nuanced unknowns as well that are somewhat similar to that.
And we have a large number of very talented researchers
on our team because we think that there are some
of these unsolved breakthroughs that are going to be needed
to really truly solve this problem.
Speaker 1 (27:54):
So, if it does work well and things progress in
that universe, what would you be worried about?
Speaker 2 (28:06):
Good question. I mean, if things work well, I shouldn't be too worried in general. I do think that it's very easy in general to underestimate the challenges around actually deploying and disseminating technology. That takes time, and when the
technology doesn't exist yet, that means that like the world
is not in a place that is like ready for
(28:26):
that technology. I think that there's a lot of unknowns there.
Speaker 1 (28:29):
I mean, one of the striking things to me about, say,
language models, is the people who know the most about
them seem to be the most worried about them, which
is generally not the case, I think, historically with technology, right, with the possible exception of the atomic bomb, and so
I'm curious. I mean those kinds of worries, like do
(28:51):
you share them? Are there worries you have about developing
a foundation model for robots about bad actors using it?
Speaker 2 (28:57):
Even I do think that, like, yeah, there's plenty of
technology that has dual uses, and I think there are
applications of technologies that are harmful. I think that a
lot of the concerns in the language model community stem
(29:17):
from imbuing these systems with greater autonomy. And I think
that I work like hands on with the robots quite
a bit, and I don't see a world in which
they will be taking over in any way. It's very
easy to just, like, well, with our current iteration of robots,
(29:38):
to just, like, if we threw some water on it, the robot would be in trouble.
Speaker 1 (29:42):
So that might be a problem for you, but I'm
sure you could solve that.
Speaker 2 (29:48):
We're working on it. So we actually do have a new iteration that is actually a lot more waterproof. But it's just not a concern that I share.
Speaker 1 (29:54):
Okay, interesting basically just because you think we can whatever
turn it off if we need to.
Speaker 2 (30:01):
Yeah, and yeah, and I think, yeah, there's always going
to be dual use concerns, but I think that the
pros of the technology outweigh some of the cons.
Speaker 1 (30:09):
Well, give me the happy story, then, like in what
what number of years should we choose for a happy story?
Ten? Is ten too soon?
Speaker 2 (30:16):
I don't want to put a number to it. I
think that with research, you don't know exactly how long things will take. And I envision a world where, when you're developing hardware, it's not too
hard to actually teach it to do something, and teach
it to do something useful, rather than just having machines
(30:38):
that are not particularly intelligent, like dishwashers and laundry machines
and so forth.
Speaker 1 (30:45):
Go bigger if you would. Like, what will we be teaching robots to do in that world? I.
Speaker 2 (30:53):
Guess if we were to go bigger, I think that
there's a lot of challenges around helping people as they age, allowing them to be more independent. That's like a huge one. I think that, I don't know, manufacturing,
there's all sorts of places where like there's abuse of
labor practices and we can maybe like be able to
eliminate those if it's a robot instead of a human. Yeah, many, many,
(31:15):
many examples. And I think that there's also even things
that are even hard to imagine because the technology doesn't exist.
So a lot of the things that I'm thinking about
are robots helping humans in different circumstances to allow them
to be more productive. But once something exists, like you often,
like people are creative and come up with new ways
of how that's used.
Speaker 1 (31:37):
We'll be back in a minute with the lightning round. Great,
let's finish with the lightning round. What's one thing that
working with robots has caused you to appreciate about the
(31:58):
human body?
Speaker 2 (32:00):
Our skin is pretty amazing.
Speaker 1 (32:02):
Huh. Well, so we didn't talk about I mean a
sense of touch, or of heat or of cold, right,
I mean presumably the models you're building, the robots you're
using don't have that, but they could, right, they could
have a sense of touch. Is anyone working on that?
Is that of interest to you?
Speaker 2 (32:22):
Lots of people working on it. I think it's pretty interesting.
I think that the hardware technology is not super mature
compared to where I'd like for it to be in
terms of how robust it is, and the cheapness and the resolution. That said, like, we actually put cameras on the wrists of our robot to help it get some sort of tactile sense. And for example, if you can, if you like visually look at your finger as you make
(32:45):
contact with an object, you can see it deform around that object, and you can actually, just by looking at your finger, get some notion of tactile feedback similar to what our skin gets. Yeah, and cameras are cheap, really easy, robust, way more robust and cheap than existing technology for tactile sensing.
Speaker 1 (33:04):
I've heard you say that humanoid robots are overrated, and
I'm curious, why do you think that.
Speaker 2 (33:11):
I think that simplicity is really helpful and important when
trying to develop technology. When you introduce more complexity than is needed, it slows you down a lot. And I think that the complexity that humanoids introduce... Yeah, I think that if
all of the robots we were working with were humanoids,
I think that we wouldn't have made anywhere near the
(33:31):
progress that we've made because we'd be dealing with additional challenges.
I also think that optimizing for ease of data collection
is really important in a world where we need data,
and it's a lot harder to collect and operate all
of the different joints and motors of a humanoid than
it is to control a simpler robot.
Speaker 1 (33:52):
Do you anthropomorphize robots?
Speaker 2 (33:55):
I hate it when people anthropomorphize robots. I
think that it is misleading because the failure modes that
robots have are very different from the failure modes that
people have, and it misleads people into thinking that it's
going to behave in the way that people behave.
Speaker 1 (34:12):
Like like in what way?
Speaker 2 (34:14):
Oh, like, if you see a robot doing something like doing a backflip, or even folding laundry, you kind of assume that, like, if you saw a person do that, then they probably could do a lot of other things too. And if you anthropomorphize the robot, then you assume that, like, the capabilities that you see are representative, as if it were like a human, and that it could do a backflip anywhere,
(34:35):
or that it could fold laundry anywhere with any item
of clothing.
Speaker 1 (34:39):
Or surely you would think a robot that could do
a backflip could fold a shirt, but no.
Speaker 2 (34:45):
Exactly exactly, so sometimes it's fun to like assign emotions
to some of the things, or say the robot's having a bad day, because certainly it feels like that sometimes. But when it kind of moves beyond fun and jokes, it might have consequences that I don't think make sense.
Speaker 1 (35:02):
I read that there was a researcher who said they
would retire if a robot tied a shoelace. Yes, and then one of your robots tied a shoelace, and I guess they didn't retire. But I'm curious, what would you need to see a robot do to retire?
Speaker 2 (35:23):
Hmm, I don't know. I guess one example that I've
given before that I would love to see a robot do.
I don't think this is quite retirement level, but being
able to go into a kitchen that it has never been in before and make a bowl of cereal. Pretty basic, especially compared to doing a backflip. I cannot do a
backflip myself, but I could make a bowl of cereal.
(35:44):
But it requires being able to find objects in the environment,
being able to interact with delicate objects like a cereal box,
maybe even use tools in order to open the cereal box.
Pouring liquids. Yeah, so that's a task that I love,
and I could actually even see us being able to
show a demo of that without too much difficulty actually
if we put our mind to it and collected
(36:06):
data for it. So it actually is, I think, more within reach than maybe I imagined a few years ago.
Speaker 1 (36:12):
Just as you're thinking about it, it's getting closer. You're like, oh, wait,
we could do that.
Speaker 2 (36:18):
Yeah. I mean we've actually collected data of pouring cereal,
like opening a cereal box and pouring it into a bowl.
We haven't yet done liquid handling and pouring, but I
think we're actually going to do it this week on the robot. I asked the hardware team to make a
waterproof robot. So we're not too far. A lot of
the pieces are coming together. I also, I love working
(36:38):
with robots and so, and I'm also fairly young, I
think not too old, and so I don't imagine myself
retiring anytime soon.
Speaker 1 (36:53):
Chelsea Finn is a Stanford professor and the co founder
of Physical Intelligence. You can email us at problem at
pushkin dot fm, and please do email us. I read
all the emails. Today's show was produced by Gabriel Hunter Chang,
edited by Alexander Garreton and engineered by Sarah Bruguerrett. I'm
Jacob Goldstein and we'll be back next week with another
(37:15):
episode of What's Your Problem.