Episode Transcript
Speaker 1 (00:00):
Can you imagine a world where a robot isn't just, you know,
(00:03):
a tool following orders, but a real independent agent, thinking
on its feet, even when it's totally cut off. Picture this,
maybe a sophisticated robot deep inside a collapsed mine, dangerous
unstable ground all around it. Or maybe it's way out
there on Mars, dusty empty, analyzing rocks. There's no Wi
Fi out there, no cloud connection, nothing, just dust, silence,
(00:27):
the mission and its own brain, a really powerful one
right inside it. And this robot it moves with purpose,
makes critical calls in milliseconds, reacts to stuff it didn't expect.
No human, no server telling it exactly what to do.
It's free, no invisible digital leash. But you're probably thinking,
like I was, hang on, isn't that still kind of
sci fi? Aren't robots now still mostly you know, tied
(00:48):
to networks.
Speaker 2 (00:48):
That's exactly right.
Speaker 1 (00:49):
Always those frustrating delays or needing a person watching over everything.
Speaker 2 (00:53):
Tricky, you're hitting the nail on the head. Yeah, that reliance,
that network dependency. Yeah, it's been the chain holding back
true autonomy for ages.
Speaker 1 (01:01):
Yeah, I mean, think about our own lives, right, my
smart speaker sometimes just gives up if the Internet flickers,
it's maddening.
Speaker 2 (01:09):
And imagine that dependency in a critical situation exactly.
Speaker 1 (01:12):
A robot doing surgery or maybe handling something dangerous and
it just freezes because the connection dropped. That's been the terrifying.
Speaker 2 (01:20):
Limit and that reliance. It's been the biggest practical hurdle.
Forget the cool stuff in labs for a second, getting
robots out into the real world, the messy, unpredictable world.
Latency and connectivity have always been the blockers, always querying
a server miles away for every little adjustment. Yeah, it's
like imagine trying to drive a car with a two
(01:41):
second lag on the steering wheel.
Speaker 1 (01:43):
Yeah, okay, impractical and dangerous precisely.
Speaker 2 (01:46):
So what we're diving into today, it's not just a
bit faster, a bit better. It's a fundamental shift. It's
about putting the brain right there on the robot itself.
Speaker 1 (01:55):
And that is our mission for this deep dive everyone,
because something really huge just dropped, seriously transformative. Just yesterday,
June twenty fourth, Google DeepMind unveiled something, well, revolutionary.
They call it Gemini Robotics On-Device. So we're gonna
unpack this dig into what it can do and trust me,
some of it sounds almost unbelievable, and figure out what
(02:17):
it means for robots and honestly, for all of us.
This feels like a massive leap, cutting that leash for real,
giving robots their own brains. Get ready for a serious
nugget of knowledge here.
Speaker 2 (02:27):
It really is about that on device part. That's the
core of it. Not just a better cloud model, not
just faster processing somewhere else. It's a whole new architecture
for how robot AI works in the physical world, moving from,
like you said, remote controlled puppets, yeah, to something with
its own sophisticated mind right there. It's about self sufficiency,
instant response, true embodied intelligence.
Speaker 1 (02:48):
Okay, wow, So let's get into the nitty gritty. This
announcement, Gemini Robotics On-Device, tell us more, right.
Speaker 2 (02:54):
So, June twenty fourth, Carolina Parada and her team at
DeepMind, they announced this Gemini Robotics On-Device.
Think of it as a kind of stripped down but
seriously supercharged version of Gemini two point zero.
Speaker 1 (03:07):
Which is already a beast, right? Handles text, images, video,
audio exactly.
Speaker 2 (03:12):
A multi modal powerhouse. Now imagine taking that complexity, that
power and optimizing it like crazy. So it doesn't run
on a giant server.
Speaker 1 (03:20):
Farm, but runs where on the robot.
Speaker 2 (03:23):
Directly on the robot itself.
Speaker 1 (03:25):
And that is the bit that just floors me, the
absolute jaw dropper.
Speaker 2 (03:28):
It is because the key thing, that the really transformative
part is this system does not need the cloud at all,
No cloud, no Wi Fi, doesn't need even a sniff
of an Internet connection to do its job. It runs
entirely one hundred percent on the robot.
Speaker 1 (03:41):
Wow. Okay, just think about that. We live tethered to
the net, right, phones, smart homes, cars, they all seem to.
Speaker 2 (03:47):
Need that connection constantly pulling data.
Speaker 1 (03:50):
My kids have a meltdown if the WiFi drops for
ten seconds during a game. So the idea of a
full blown AI like Gemini running completely offline on the
machine itself, it almost feels wrong somehow, like cutting the
cord entirely.
Speaker 2 (04:06):
It is cutting the cord and operationally that's well, it's revolutionary.
It tackles that Achilles heel.
Speaker 1 (04:12):
You mentioned, right, the connectivity issue. If you're underground or
in a giant warehouse with patchy Wi Fi, or a.
Speaker 2 (04:18):
Secure facility where data just can't leave.
Speaker 1 (04:20):
Yeah, that dependence is a killer. So on device isn't
just neat tech. It's about making robots work reliably everywhere.
Speaker 2 (04:28):
And the immediate impact? Yeah, massive. First off, like we said,
delays gone, completely eliminated, since the brain's right there exactly,
no network lag, no server queues, no sending data across
the planet and back, which means real time thinking and crucially,
real time action, whether it's on a chaotic factory floor
or doing something delicate in your home, or even exploring
(04:48):
Mars where the signal delay is minutes long, right.
Speaker 1 (04:51):
Minutes, You can't wait minutes for the robot to decide
don't drive off that cliff.
Speaker 2 (04:55):
Precisely, That immediate response is critical, yeah, for safety, for
getting the job done. It just changes the game for
what these machines can actually do in the real world.
Speaker 1 (05:06):
It really is like the difference between needing to phone
a friend for every single decision versus just knowing what
to do instantly. That kind of split second action. It's
not just faster, it's what allows for precision for safety,
Like catching that falling glass you mentioned a tenth of
a second delay and it's shattered.
Speaker 2 (05:24):
Yeah, exactly.
Speaker 1 (05:24):
So this really is cutting the leash, giving robots their
own brain, fast, adaptable, always ready. It feels like a
huge step up in self sufficiency.
Speaker 2 (05:32):
And getting there from an engineering standpoint, yeah wow. It
took an almost obsessive focus on efficiency because it has
to run offline on the robot's own.
Speaker 1 (05:41):
Hardware, power, limited space.
Speaker 2 (05:43):
Exactly, every calculation, every bit of memory, every watt of
power matters. Yeah, DeepMind's engineers, they talk about shaving
the architecture.
Speaker 1 (05:52):
Shaving it like trimming it down.
Speaker 2 (05:54):
Yeah, basically meticulously pruning the neural network, compressing its knowledge
but keeping the intelligence, making it small enough to fit.
They targeted specific hardware for training dual arm Aloha rigs
they call them. These aren't cloud servers, they're low power
chips you'd find in a mobile robot embedded stuff. Okay,
but what's truly amazing, the real engineering feat is that
(06:16):
even after all that shrinking and optimizing, it still has
the core intelligence. It still handles the vision stuff seeing
the world, the language understanding complex commands, and the action
part translating thought into movements.
Speaker 1 (06:30):
All of it, vision, language, action, all.
Speaker 2 (06:32):
In one neat package, as they put it. It's not a
dumbed down version. It's a fully integrated, compact brain designed
for local execution, no strings attached. Think big AI
in a small box.
Speaker 1 (06:43):
That is just seriously impressive. It really is like shrinking
a supercomputer down. I remember those old room size computers,
and now we've got more power in our phones. This
is that same kind of leap, but for a robot's
entire mind, packing all that vision, language action to something
that fits on the robot. Wow, what were the hurdles there?
How did they pull that off?
Speaker 2 (07:04):
Oh, huge hurdles. Hardware, software, algorithms, everything. First, just getting
a model like Gemini with all its parameters onto an
embedded GPU board that takes clever tricks like quantization and pruning,
essentially compressing the model without losing the smarts.
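To make the quantization and pruning idea a bit more concrete, here is a minimal PyTorch sketch of the general technique, not DeepMind's actual recipe: the tiny stand-in network, the 30 percent pruning amount, and int8 dynamic quantization are all illustrative assumptions.

```python
# Sketch of the two compression tricks mentioned above: prune away
# low-magnitude weights, then quantize the rest to 8-bit integers so the
# model takes less memory and runs faster on embedded hardware.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Stand-in policy network; the real VLA model is far larger.
model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 64))

# 1. Prune 30% of the smallest-magnitude weights in each Linear layer.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")   # make the pruning permanent

# 2. Dynamically quantize the remaining Linear weights to int8.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

print(quantized)  # the compressed model, ready to export to the robot
```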
Speaker 1 (07:21):
Okay, so compressing the brain basically right, then making sure
it thinks fast enough, inference they call it, within tens
of milliseconds.
Speaker 2 (07:28):
That means optimizing every single layer, making the software talk
perfectly to the hardware, co-designing them, really squeezing every
drop of performance out of the chip within tight power limits.
It's not just shrinking, it's rethinking computation at the edge
and the payoff? You see it directly in the latency.
That control loop, sense, process, decide, act, stays way
(07:49):
below old thresholds. We're talking tens of milliseconds, not hundreds.
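As a rough illustration of that sense, process, decide, act loop and its latency budget, here is a toy sketch; the 50 millisecond budget and the read_camera, run_policy, and send_command callables are placeholders, not a real robot API.

```python
# Toy version of the on-robot control loop with a latency budget check.
import time

BUDGET_S = 0.05  # target: stay in the tens of milliseconds, not hundreds

def control_loop(read_camera, run_policy, send_command):
    while True:
        start = time.monotonic()
        frame = read_camera()          # sense
        action = run_policy(frame)     # process + decide (local inference)
        send_command(action)           # act
        elapsed = time.monotonic() - start
        if elapsed > BUDGET_S:
            print(f"warning: loop took {elapsed * 1000:.1f} ms, over budget")
        # sleep off any remaining time so the loop runs at a fixed rate
        time.sleep(max(0.0, BUDGET_S - elapsed))
```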
Speaker 1 (07:53):
Tens versus hundreds. That sounds significant.
Speaker 2 (07:55):
It's night and day. Take that factory example. A sensor
sees something shift slightly on a fast belt. With older systems,
by the time the robot thinks, asks the cloud, gets
an answer, decides to move, the part's gone past. Error,
Speaker 1 (08:09):
Collision maybe, right? Too slow. But with tens of.
Speaker 2 (08:11):
Milliseconds, the planner reacts before the momentum carries it away.
It grabs it perfectly, prevents errors, makes the whole operation safer,
more reliable, especially when things are constantly changing. You buy
real measurable performance.
Speaker 1 (08:26):
That example hits home. One hundred milliseconds. Yeah, if you're
assembling something tiny, that's an eternity, you crush it. But
ten milliseconds that's basically instant for us. It really does
make the difference between gently placing something delicate and, well,
botching the job. It unlocks tasks that needed that human
touch before. And the performance numbers DeepMind shared, they
(08:47):
really seem to prove this isn't just theory. They put
it head to head, didn't they?
Speaker 2 (08:51):
They did side by side charts this new on device
model versus their own top tier hybrid model, the one
that does use the cloud, connected to those massive, hulking
TPU pods, Google's super powerful AI chips.
Speaker 1 (09:04):
Okay, so the best cloud version versus the new local one, and.
Speaker 2 (09:07):
Also versus the previous best purely local VLA model visual
language action.
Speaker 1 (09:12):
Model and the results.
Speaker 2 (09:13):
This is where it gets, as they say, wild, the
new on device model it practically glues itself to the
hybrid curve, almost identical performance on visual tests understanding meaning
generalizing behavior.
Speaker 1 (09:26):
Wait, almost identical to the cloud connected one.
Speaker 2 (09:29):
Almost and it significantly beats the previous best on device model,
especially with tricky stuff objects it's never seen weird lighting wow,
and even on following complex multi step instructions using natural language,
it closes most of the gap with a big hybrid model.
Speaker 1 (09:44):
That is wild because the hybrid has that direct line
to basically unlimited computing power. Right those TPU.
Speaker 2 (09:50):
Pods exactly massive purpose built AI accelerators. Yet this little
sibling running entirely locally on an edge box, just a
small computer on the robot is nearly keeping pace.
Speaker 1 (10:01):
That just turns things on their head. The idea that
you needed the cloud for top tier AI performance maybe
not anymore. For a lot of things.
Speaker 2 (10:09):
It seriously challenges that assumption. It's a huge leap for
what's possible locally, suggests the cloud might be overkill for
many critical robot tasks.
Speaker 1 (10:17):
Now it really is like that efficient marathon runner keeping
up with the engine strapped competitor. You just wouldn't expect it.
Incredible design and the actual tasks it can do. The
demos show some really well fiddly stuff, things that are
notoriously hard to program the old way. Oh absolutely, like
unzipping a soft lunch box. Think about how tricky that is.
(10:38):
The fabric, the force needed, compliance. Or folding shirts with
proper creases, not just flat shapes, but actual fabric that bunches.
Speaker 2 (10:46):
And slides, Understanding material properties.
Speaker 1 (10:48):
Pouring dressing into a narrow bottle without splashing that needs
serious control.
Speaker 2 (10:53):
Spatial awareness, fluid dynamics, fine motor.
Speaker 1 (10:56):
Skills, and sliding one card out of a deck that's
incredibly precise. Tiny fiddly stuff, as they called it. These
Speaker 2 (11:03):
Aren't just tricks. They show a deep understanding of the
physical world. Object manipulation, fine motor control, all without constant
handholding or cloud connection.
Speaker 1 (11:13):
They really do seem complex when you break them down
like that, folding a shirt isn't just moving arms, it's
sensing the fabric, adjusting.
Speaker 2 (11:20):
Force exactly and doing these fiddly things well without needing
tons of retraining. For every new object that shows real
generalization adaptability, it moves robots beyond just repetitive factory work
towards well more versatile, almost human like dexterity. Huge for
robots in homes, maybe restaurants, personal.
Speaker 1 (11:40):
Care, game changers, stuff, And maybe.
Speaker 2 (11:42):
Even more crucial than the tasks themselves is how they
learn them. The training efficiency. Yeah, the robot does these
complex things after seeing what maybe fifty to one hundred examples.
Speaker 1 (11:51):
Fifty to one hundred That sounds really low. Isn't AI
training usually about massive data sets, thousands, millions of examples?
Speaker 2 (11:58):
Historically, Yes, especially for physical tasks like robotics, it's required
enormous amounts of data. Okay, And this session count matters,
as they say, because data collection is expensive, hugely expensive,
whether you do it in simulation, like Google's MuJoCo physics simulator,
or with real robots being guided by humans. Teleoperation footage right,
Getting all that data it takes time, specialized gear, skilled people,
(12:22):
loads of patience. So cutting that down to double digits,
fifty to one hundred examples, that just slashes the barrier for researchers,
for smaller labs, universities, startups, people who don't have thousands
of hours of teleop footage lying around.
Speaker 1 (12:37):
Wow, that's huge. It tackles that massive bottleneck directly. Collecting
data isn't just costly, it's painstaking work. Teaching a robot
by showing it something a million times versus fifty, that
speeds everything up. Development, trying new ideas, lowering the.
Speaker 2 (12:53):
Cost. Democratizes it, lets more people in, accelerates innovation everywhere.
It removes a massive roadblock.
Speaker 1 (12:59):
Okay, it's efficient to train, but then you mentioned something
about adaptability porting the brain.
Speaker 2 (13:04):
Ah, yes, this is another really exciting part. Google initially
only trained this on device model, on their own internal robot.
Speaker 1 (13:11):
The Aloha platform, their own system?
Speaker 2 (13:12):
But then and this is the mind blowing bit, they
ported the exact same weights, the exact same brain, the learned.
Speaker 1 (13:18):
Knowledge, the same neural network.
Speaker 2 (13:20):
Onto totally different robots. A Franka FR3 arm, common
in industry, and even Apptronik's Apollo humanoid robot, completely
different body, size, everything.
Speaker 1 (13:29):
Ooh okay, so they move the software brain to a
different hardware body exactly.
Speaker 2 (13:34):
And here's the kicker. No retraining from scratch, none, just
a short adaptation pass, a quick tweak, and suddenly the
model knew the different joint kinematics. It understood how to
use these totally different limbs.
Speaker 1 (13:47):
That's incredible. It's like learning piano then picking up a
guitar after five minutes and playing the same complex song perfectly.
It understands the concept of movement, not just one specific body.
Speaker 2 (13:57):
Precisely, an abstract understanding of motor control. It transcends the
specific hardware. That's amazing. And they called this idea embodiment
agnostic skill libraries basically train once, transplant almost anywhere. This
is well, it's a kind of holy grail in robotics.
I can see why it changes everything about scaling robot skills.
(14:17):
We saw amazing demos on the Franka arm. It did
complex assembly, lining up pulleys, tensioning a belt, real industrial stuff,
and it folded a long dress without stretching it, showing
that delicate touch again. Okay, then on the Apollo humanoid, taller,
different limbs, different balance. It still followed spoken commands, handled
random objects they had never seen. Put a Rubik's cube
(14:39):
in a pouch.
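For a flavor of what a "short adaptation pass" could look like in practice, here is a hedged sketch under common transfer-learning assumptions: a frozen shared backbone and a small per-robot action head tuned on a handful of demos. The dimensions, the dummy backbone, and the demo data are made up for illustration; this is not DeepMind's code.

```python
# Sketch: reuse the same learned "brain" on a new robot body by freezing
# the shared backbone and tuning only a small action head for the new
# joint layout.
import torch
import torch.nn as nn

# Stand-in backbone; in reality this would be the pretrained VLA backbone.
backbone = nn.Sequential(nn.Linear(128, 768), nn.ReLU())

class VLAPolicy(nn.Module):
    def __init__(self, backbone, num_joints):
        super().__init__()
        self.backbone = backbone                       # transferred knowledge
        self.action_head = nn.Linear(768, num_joints)  # per-body output layer

    def forward(self, obs):
        return self.action_head(self.backbone(obs))

def adapt_to_new_body(policy, demos, epochs=5, lr=1e-4):
    # Freeze the shared knowledge; only the small action head is tuned.
    for p in policy.backbone.parameters():
        p.requires_grad = False
    opt = torch.optim.Adam(policy.action_head.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for obs, target_joints in demos:
            opt.zero_grad()
            loss = loss_fn(policy(obs), target_joints)
            loss.backward()
            opt.step()
    return policy

# e.g. adapting to a hypothetical 7-joint arm with a handful of demos
demos = [(torch.randn(128), torch.randn(7)) for _ in range(50)]
policy = adapt_to_new_body(VLAPolicy(backbone, num_joints=7), demos)
```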
Speaker 1 (14:39):
On a completely different robot body without retraining.
Speaker 2 (14:43):
It's a massive step towards universal robot skills. Intelligence that
isn't locked to one machine, but can be deployed rapidly across
diverse hardware. Faster development, lower costs, whole new applications become possible,
like robot apps that work on almost any robot.
Speaker 1 (14:57):
That is a seriously futuristic vision universal robot skills, like
software compatibility across computers, breakthroughs on one robot benefit everyone everywhere,
instantly speeds up the whole field.
Speaker 2 (15:07):
Wow.
Speaker 1 (15:08):
Now developers must be absolutely chomping at the bit for
this. Is Google keeping
Speaker 2 (15:11):
It locked up? No. And this is another big
positive signal. They're empowering developers with the Gemini Robotics SDK
Software Development.
Speaker 1 (15:19):
Kit. An SDK, okay, so people can actually build with
this? Exactly.
Speaker 2 (15:23):
Developers can sign up for early access, which is a
big deal. Shows they want broader adoption collaboration.
Speaker 1 (15:29):
What's in the SDK.
Speaker 2 (15:31):
It comes with code to interface with live robots which
is key, but maybe even more importantly, it includes full
MuJoCo simulation scenes.
Speaker 1 (15:40):
The physics simulator you.
Speaker 2 (15:41):
Mentioned right, and Mojoco's physics are super accurate but also
fast enough to generate contact rich demonstrations on a laptop.
Speaker 1 (15:49):
On a laptop, so you don't need a supercomputer to
train it exactly.
Speaker 2 (15:53):
You can gather those crucial fifty to one hundred training examples,
teaching it a new task, in realistic virtual kitchens or
factories safely, cheaply on your laptop before you even touch
a real expensive robot.
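Here is a minimal sketch of what gathering demonstrations in MuJoCo on a laptop can look like; the one-joint toy scene and the random stand-in for teleoperation are assumptions, and the SDK's real scenes are far richer.

```python
# Collect a small batch of (observation, action) trajectories in MuJoCo.
import numpy as np
import mujoco

XML = """
<mujoco>
  <worldbody>
    <body>
      <joint name="hinge" type="hinge" axis="0 0 1"/>
      <geom type="capsule" size="0.02" fromto="0 0 0 0.3 0 0"/>
    </body>
  </worldbody>
  <actuator>
    <motor joint="hinge" gear="1"/>
  </actuator>
</mujoco>
"""

model = mujoco.MjModel.from_xml_string(XML)
data = mujoco.MjData(model)

demos = []
for _ in range(100):                      # roughly the 50 to 100 examples mentioned
    mujoco.mj_resetData(model, data)
    trajectory = []
    for _ in range(200):                  # one short episode
        action = np.random.uniform(-1, 1, size=model.nu)  # stand-in for teleop input
        data.ctrl[:] = action
        mujoco.mj_step(model, data)
        trajectory.append((data.qpos.copy(), action.copy()))
    demos.append(trajectory)

print(f"collected {len(demos)} demonstrations")
```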
Speaker 1 (16:05):
That lowers the barrier. Massively, test, iterate, debug, all in
simulation first, much faster, much safer.
Speaker 2 (16:12):
And here's another critical piece, especially for companies with specific
needs or sensitive data. This is Google's first VLA where
fine tuning is officially supported locally.
Speaker 1 (16:21):
Locally meaning not sending data back to Google.
Speaker 2 (16:24):
Precisely, you don't send your proprietary data or even the
robot's learned weights back to their servers. You collect your
specific data for your unique task, your unique objects, run
the optimization yourself on your own hardware entirely on prem
and then you push the updated customized brain straight onto
your bot's flash storage. Done.
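A rough sketch of that on-prem workflow, collect your data, fine-tune locally, then copy the weights onto the robot; the dummy policy, the dataset, and the /mnt/robot_flash mount point are hypothetical stand-ins, not the actual SDK interface.

```python
# On-prem fine-tuning sketch: the data and the updated weights never leave
# your own machines.
import shutil
import torch
import torch.nn as nn

# Stand-ins so the sketch runs end to end; in practice these would be the
# downloaded base checkpoint and your own recorded demonstrations.
policy = nn.Linear(64, 14)                                    # dummy "brain"
my_task_demos = [(torch.randn(64), torch.randn(14)) for _ in range(50)]

def finetune_locally(policy, dataset, epochs=3, lr=1e-4):
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for obs, action in dataset:        # proprietary data stays on-prem
            opt.zero_grad()
            loss = loss_fn(policy(obs), action)
            loss.backward()
            opt.step()
    return policy

policy = finetune_locally(policy, my_task_demos)
torch.save(policy.state_dict(), "custom_model.pt")

# Push the customized weights onto the robot's local storage, no cloud round
# trip. The mount point below is a hypothetical example path.
shutil.copy("custom_model.pt", "/mnt/robot_flash/model.pt")
```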
Speaker 1 (16:45):
That's huge for customization and privacy, and just iterating quickly,
in the real world, you control the learning, you tailor
it exactly to your needs, no waiting, no external dependencies.
Speaker 2 (16:56):
It's a massive shift towards user control. But and this
is really important. Access isn't wide open just yet. It's
via a trusted tester program.
Speaker 1 (17:05):
Okay. Why the cautious approach.
Speaker 2 (17:07):
Safety, simple as that. DeepMind keeps emphasizing their AI principles,
be helpful, don't be reckless. And this isn't just talk.
They've built serious safety layers into this system.
Speaker 1 (17:16):
That's good to hear because yeah, powerful autonomous robots, you
need robust safety. My Roomba trying to eat shoelaces
is one thing, right, but a big industrial arm near people,
that's a whole different level of risk. What are these
safety layers?
Speaker 2 (17:29):
Well, multiple layers. First, semantic filters. The system watches the
commands it's given, looks for awkward or unsafe instructions. The
example they give is someone telling the robot, maybe casually,
hand me that knife, blade first. The system should recognize
that's dangerous and refuse or ask for clarification.
Speaker 1 (17:48):
Okay, an understanding of dangerous requests makes sense.
Speaker 2 (17:51):
Underneath that there's a low level safety controller. This constantly
checks and enforces hard limits, torque limits, collision cones, velocity caps,
basically physical boundaries the robot cannot exceed no matter what
the AI tells it. It's a hard fail safe, prevents
too much force, entering forbidden areas, moving too fast.
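As a toy illustration of those two layers, a semantic check on the command plus hard clamps on whatever the policy outputs, here is a sketch; the keyword list and the torque and velocity limits are invented for the example and are not the real system's rules.

```python
# Two-layer safety sketch: a crude semantic filter before planning, and a
# hard low-level clamp on commanded torques and speed.
UNSAFE_PATTERNS = ["blade first", "toward the person", "at full speed"]

MAX_TORQUE_NM = 30.0      # example per-joint torque limit
MAX_VELOCITY = 0.5        # example cap on end-effector speed, m/s

def semantic_filter(command: str) -> bool:
    """Return False for instructions that look unsafe, so the planner refuses
    or asks for clarification instead of executing them."""
    lowered = command.lower()
    return not any(pattern in lowered for pattern in UNSAFE_PATTERNS)

def enforce_limits(torques, velocity):
    """Clamp whatever the policy asks for to hard physical bounds."""
    safe_torques = [max(-MAX_TORQUE_NM, min(MAX_TORQUE_NM, t)) for t in torques]
    safe_velocity = max(-MAX_VELOCITY, min(MAX_VELOCITY, velocity))
    return safe_torques, safe_velocity

if not semantic_filter("hand me that knife, blade first"):
    print("refusing: please clarify how you want the knife handed over")
```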
Speaker 1 (18:09):
Like physical guardrails built into the code exactly.
Speaker 2 (18:11):
And they've also developed a new semantic safety benchmark. This
is designed to deliberately test the system with weird edge cases,
ambiguous instructions, conflicting goals, to really stress test its safety responses.
Speaker 1 (18:23):
Trying to find the cracks before they cause problems.
Speaker 2 (18:26):
Right and beyond the tech, there's human oversight. A responsible
development and innovation team analyzes impact. A Responsibility and Safety
Council has to sign off before code goes out. The
core message robots are heavy, potentially dangerous. You need multiple
layers watching your back, especially near humans.
Speaker 1 (18:43):
Yeah, absolutely so.
Speaker 2 (18:44):
The staged access the trusted Tester program makes sense. They
want eyes on real deployments, learn how it behaves in
complex situations before, well before someone less cautious straps a
chainsaw to a mobile base and live streams the result.
Speaker 1 (18:58):
Okay, yeah, good point, avoid the viral disaster videos.
Speaker 2 (19:01):
It allows for continuous safety checks, responsible scaling, prioritizing safety.
Speaker 1 (19:06):
That sounds incredibly thorough and responsible. It builds trust, which
is crucial for a technology like this. It's not just
about can we do it? But should we do it?
And how do we do it safely? Mature approach. So,
with all this incredible on device power, does this mean
Cloud AI for robots is like obsolete now or is
it more complicated?
Speaker 2 (19:26):
Definitely more complicated. DeepMind is clear it's complementary, not
a replacement. Their top tier hybrid model, the cloud boosted one.
It's still more capable overall for really novel, super complex problems,
things requiring abstract reasoning, pulling from vast knowledge. Their
example was something like pick up the clear cup with
(19:47):
blue marbles, then rotate it until the marbles form a
diagonal line. That level of abstract, geometric, multimodal reasoning. It
still benefits from the cloud's sheer power solving something truly new.
Speaker 1 (20:00):
So the cloud is still king for the really hard
novel puzzles.
Speaker 2 (20:03):
Right. But Carolina Parada herself said everyone was surprised by
how strong on device is, even calling it a starter
model initially, but adding it's edging toward production grade. So
the key is strategic deployment. On device becomes essential where
the cloud is a problem, like those warehouses with bad
Wi Fi, or defense contractors needing total on prem security.
Speaker 1 (20:23):
Or maybe remote locations, mines, space. Exactly, places where connectivity
is shaky or data sovereignty rules forbid cloud calls
in those spots.
Speaker 2 (20:34):
On device isn't just nice to have, it's a must have.
It enables reliable, secure operation where the cloud just can't
go or isn't allowed.
Speaker 1 (20:42):
That makes total sense. It's about the right tool for
the right job, and for some critical jobs, offline capability
is the only tool that works. Opens up whole new
areas for robotics.
Speaker 2 (20:52):
Absolutely and looking bigger picture, this is a real nudge
toward robots acting in real time in those crucial, often offline environments, yeah, factories,
fulfillment centers, offshore oil rigs, places where cables are a luxury.
Think about the operational benefits that pick and place robot
in the warehouse where stuff changes daily, Late palettes, new packaging,
(21:13):
an adaptive low latency controller that learns new packaging SKUs
in one afternoon could shave serious downtime. No waiting for
reprogramming or cloud updates, it just adapts on the fly.
Speaker 1 (21:23):
Less downtime means more productivity, more money saved directly, and.
Speaker 2 (21:27):
It lowers the overall cost of ownership too well. First,
no recurring cloud compute bill for big operations. That's potentially
huge savings ongoing.
Speaker 1 (21:36):
Right, no subscription fees for the robot's brain.
Speaker 2 (21:39):
Second, fewer security audits for data egress, simpler compliance, less
overhead making sure data doesn't leave.
Speaker 1 (21:46):
When it shouldn't, easier security, and.
Speaker 2 (21:48):
Third, being able to reflash the same model across different
mechanical platforms you already own. That leverages your existing hardware
investments better, more flexible, more scalable, better ROI.
Speaker 1 (21:58):
Okay, those are solid business reasons. Lower running costs, simpler security,
better use of existing gear. That makes advanced robots much
more appealing, especially for businesses that were maybe hesitant before
because of cost or complexity.
Speaker 2 (22:13):
Definitely makes the economics look better and thinking about the future. Yes,
it fits into specific rigs right now, but it's designed
to scale down further as embedded chips get better. Nvidia's
Orin, Qualcomm's AI chips, custom ASICs. This tech
could get even smaller, more efficient, maybe even more powerful.
A clear path to making these brains ubiquitous, smaller.
Speaker 1 (22:33):
Faster, cheaper robot brains. The future looks interesting.
Speaker 2 (22:36):
And that fine tune knob in the SDK is key
for users. You don't have to wait for DeepMind
to push a global update to teach your robot
something new and specific.
Speaker 1 (22:46):
Like the cupcake bot exactly.
Speaker 2 (22:48):
The cupcake frosting bot example is perfect. Local bakery wants
a unique frosting swirl, record it a few dozen times. Run
the optimizer locally, Boom, the model learns your specific pattern,
all done on site.
Speaker 1 (23:00):
That's just brilliant and empowers small businesses, niche applications. Anyone
can customize their robot for their exact needs quickly and locally.
It's not just for giant factories anymore. True bespoke automation precisely.
Speaker 2 (23:12):
And there's one more crucial angle here beyond just movement,
the low latency. It helps perception too, how the robot
sees and understands.
Speaker 1 (23:20):
Okay, how does that work?
Speaker 2 (23:22):
Well? Think about really precise tasks like threading a cable
through a tight harness. Every tiny delay in processing what
the camera sees translates directly into physical error. An
extra millisecond of vision processing delay that might mean one
millimeter of overshoot by the robot arm.
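The arithmetic behind that claim is simply overshoot equals arm speed times extra delay; here is the back-of-the-envelope version, with the 1 meter per second tool speed assumed purely for illustration.

```python
# Back-of-the-envelope latency-to-overshoot calculation.
arm_speed_mm_per_s = 1000.0   # 1 m/s tool speed, assumed example value
extra_delay_s = 0.001         # one extra millisecond of vision latency

overshoot_mm = arm_speed_mm_per_s * extra_delay_s
print(f"overshoot: {overshoot_mm:.1f} mm")   # -> 1.0 mm

# At the same speed, a 100 ms cloud round trip would mean roughly 100 mm of drift.
```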
Speaker 1 (23:38):
Wow, literally measurable overshoot from lag.
Speaker 2 (23:41):
Yes, So cutting that round trip to the cloud, doing
the vision processing right there locally, it buys you precision
you can literally measure with calipers.
Speaker 1 (23:49):
Okay, that makes the abstract idea of latency incredibly concrete.
Millimeters matter. In manufacturing, it's.
Speaker 2 (23:55):
The difference between a production grade harness assembly cell that
runs reliably day in and day out and a flashy research
demo that still needs a technician nudging the cable now
and then. It takes it from a cool demo to
a robust industrial solution, unlocks complex assembly tasks that were
just too finicky before.
Speaker 1 (24:13):
That's a fantastic distinction, the difference between a prototype and
a real product that works reliably on the factory floor.
That precision is key, and.
Speaker 2 (24:21):
This capability opens doors way beyond manufacturing. I think broader
impacts. Those huge logistics hubs juggling SKU changes every week.
They need robots that adapt fast. Hospitals needing a sterile,
self contained aide that cannot rely on hospital Wi Fi,
critical for safety and hygiene apps, even those planetary rovers
on the Moon or Mars where the signal delay is enormous.
(24:43):
Real time decisions are vital when you're millions of miles away.
For all of these on device could become the embedded brain,
providing that essential autonomous capability makes sense and remember it
inherits Gemini's multimodal reasoning. It understands vision and language, So
these robots won't just follow dots on a map. They
(25:04):
can read labels, they can interpret gestures. They can understand
a person saying no, grab the smaller valve.
Speaker 1 (25:09):
Wow, real understanding, not just pre program responses.
Speaker 2 (25:13):
Exactly. This conversational agility, this ability to understand nuance and
adapt in real time to human interaction. That's what bridges
the gap between rigid industrial arms and something that feels
like a coworker with motors.
Speaker 1 (25:25):
Well, coworker with motors. I like that.
Speaker 2 (25:27):
It's a huge step towards natural human robot collaboration. Robots
as partners, not just tools. Imagine working alongside a robot
that actually gets what you mean, anticipates needs. That's a
much more seamless future.
Speaker 1 (25:39):
That really is the dream, isn't it. Robots that feel
less like machines and more like wow partners, intuitive, responsive, understanding,
a gesture, a subtle correction, It changes everything, makes them
feel approachable, truly helpful, incredibly exciting for so many fields. Okay,
so let's quickly wrap up this deep dive Gemini robotics
on device. We're talking incredible offline brains giving robots real
(26:03):
independence and speed, no cloud needed. We've seen how they
adapt fast with surprisingly little data, making advanced robotics way
more accessible, accelerating innovation. We looked at the serious multi
layered safety scaffolding DeepMind built in, showing real responsibility,
and the path for developers through the SDK, empowering everyone
to build specialized solutions. This really feels like a foundational
(26:26):
shift from dependent tools to autonomous intelligent agents. So think
about it. How could this cutting of the leash, giving
robots their own brains, impact your world, your industry? How
might truly adaptive robots change how we live, work, produce, explore?
Maybe sooner than we think.
Speaker 2 (26:41):
And just as a final thought to chew on. As
these robots gain more independence, more understanding of the physical
world and human intent, what new ethical questions pop up
beyond just the safety switches when they act more autonomously
make judgment calls in gray areas. How do we navigate
their roles, their decision making, the societal shifts that come
with having coworkers with motors that truly understand and act
(27:03):
on their own. That's definitely something we'll all need to
consider as this technology rolls out. Something to mull over.