
January 26, 2024 • 47 mins

Enjoying the show? Hating the show? Want to let us know either way? Text us!

In our first two-parter, Rob and PJ explore VR, AR, and XR, discussing some of the challenges that these related pieces of technology have faced, how companies have tried to solve these problems, and where limitations still exist.

Unafraid to dive into the deep details, Rob and PJ discuss optics, latency, cameras, and displays, going down the rabbit hole of technologies that form the backbone of all things VR, AR, and XR today...and help explain why we don't quite have super vision, yet.


Episode Transcript

Available transcripts are automatically generated. Complete accuracy is not guaranteed.
PJ (00:01):
Hi folks, a little preamble to this episode.
One of the goals that Rob and I have for this show is to do technical deep dives. And with this particular episode, we cover a really large amount of stuff. So rather than try to water it down,

(00:21):
what we decided to do was to break this up into our very first two-part episode. So this will be our first episode on augmented reality. We hope you really enjoy it, and we hope you stick around and decide to download the second

(00:41):
episode as well.

pj_3_01-22-2024_100459 (00:55):
All right folks.
Welcome back to another episode of Tricky Bits with Rob and PJ. This go around, we're gonna dive into a really fun topic, and we're gonna go kind of deep into it: augmented reality. We're gonna meander a bit through how we've gone from VR

(01:15):
to AR to XR, and talk a little bit about some of the devices that have been on the market, that are on the market, that are coming on the market, and get into why this stuff is hard. What are the challenges, and how has it evolved over time? And maybe a little bit of where's it going? Is it gonna take off?

(01:35):
Hmm.
Now, Rob, I'll admit that I've had a tiny bit of experience in AR. I played around with the Qualcomm library on iOS years ago, and I've dabbled a bit in ARKit and ARCore. But I believe you might have slightly more experience than me

(01:56):
in this area.
Uh, can you talk a little bit about some of the places you've done AR and XR stuff at previously?

Track 1 (02:05):
Yeah, I've worked on pretty much all of the commercial AR solutions out there, and a lot of VR platforms, other than HoloLens. That's the only one I've never actually even seen. I've never put it on my head, never done anything with it, never looked at the SDK. But I have worked on the Oculus hardware, and I've worked at Magic

(02:28):
Leap. Uh, I did the whole graphics architecture stack there, from basically the core of the motion-to-photon problem, as we call it. And we'll get into what that means later. And then more recently I worked at Apple on the Vision Pro, and obviously it's out next month, so we can dive into some details

(02:49):
on that as well.
I also did a small stint at Daqri. It wasn't really AR related, but kind of was. It was just kind of future-tech holographic-type displays, which have similar rendering problems to AR, but not really the same display and visualization. So I've been in the trenches on this for going on 10 years now.

pj_3_01-22-2024_100459 (03:12):
Now, to kind of baseline for folks across the board here: we've got virtual reality, we've got augmented reality, we've got mixed reality, and they're all kind of related to each other. And I think it's worthwhile that we maybe take a little trip down memory lane to help lay some groundwork for a lot of the technical problems

(03:35):
we're gonna talk about. So let's talk VR for a moment. Uh, in one sense, it seems like: oh, VR basically is, I'm rendering two images and I'm good to go. I've got my stereo vision, and what other problems could there be?

Track 1 (03:51):
So VR has been around since it was first conceived, and, like I said, it started as stereo imagery. We figured out early on that if we render two images for our eyes that are slightly different, your brain will perceive it as a 3D image. And this was done in, like, the 1890s. It was done with real photographs all the way back

(04:13):
then.
So the optical trick has always been known, at least in modern history. And then when we get to 3D, in the late seventies and the eighties, we started to think: oh, we can render two separate images, and we can basically offset the projection matrices of the two views from what you'd normally use for a single 3D view. We can basically offset them a little bit for the gap between

(04:37):
your eyes, render two otherwise identical images from slightly different viewpoints, and your brain will perceive them as a 3D image. But it doesn't quite perceive it as a 3D image. There's still the issue that you've put a screen in front of your eyes. So the 3D imagery is telling your brain that it's 3D and you

(04:57):
should perceive it as 3D, but your eyes know that they're focused on a plane just an inch away from your face. So this gives some people incredible headaches. Um, myself included, believe it or not; I've worked on all these platforms and they all give me headaches. And so there's a lot of problems which have not yet been solved, with accommodation and focus and things like that.
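
To make the eye-offset idea concrete, here's a minimal sketch (ours, not from the episode) of turning one head pose into two per-eye view matrices by shifting half an interpupillary distance along the head's local right axis. The 65 mm IPD and the look-at helper are illustrative assumptions.

```python
import numpy as np

def look_at(eye, forward, up=np.array([0.0, 1.0, 0.0])):
    """Standard right-handed view matrix (world -> eye space)."""
    f = forward / np.linalg.norm(forward)
    s = np.cross(f, up); s /= np.linalg.norm(s)   # camera right
    u = np.cross(s, f)                            # camera up
    m = np.eye(4)
    m[0, :3], m[1, :3], m[2, :3] = s, u, -f
    m[:3, 3] = -m[:3, :3] @ eye
    return m

def stereo_views(head_pos, head_forward, head_right, ipd_m=0.065):
    """Offset one head pose into left/right eye view matrices."""
    half = 0.5 * ipd_m * head_right / np.linalg.norm(head_right)
    return (look_at(head_pos - half, head_forward),   # left eye
            look_at(head_pos + half, head_forward))   # right eye

# Example: head at 1.7 m, looking down -z. Each eye gets a slightly
# different viewpoint, which is all the stereo trick needs.
view_l, view_r = stereo_views(np.array([0.0, 1.7, 0.0]),
                              np.array([0.0, 0.0, -1.0]),
                              np.array([1.0, 0.0, 0.0]))
```
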

(05:18):
So that brings us to modern VR, where we literally put a high-resolution screen in front of your face. We render two images. That gives us a stereo view. But part of VR is also the tracking of your head. Like, which way am I looking? I can trick myself into seeing a 3D image, but how do I trick myself into seeing a moving 3D image when the movement comes from my own head?

(05:40):
So I move my head to the left; I want the world to move as if I really moved my head. And that was solved more recently with modern IMUs: accelerometers, magnetometers, and gyroscopes. The original Oculus, back when it was still a Kickstarter project, did an amazing job at this.

(06:01):
It was kind of janky looking back, but they kind of put it all together with modern tech. I think Palmer Luckey and those guys did a great job putting that first Oculus together. And as technology progressed, it got better. But a key part of that is the head tracking and head prediction and minimizing latency, in order to make it even

(06:24):
possible to do VR. 'Cause if you think about input in a video game, for example: you read the controller, which might take two to three milliseconds. Then in frame one, the CPU does all its update work and does all of the updating of the objects. In frame two, the GPU renders what frame one generated, and

(06:47):
then finally it gets scanned out over HDMI and displayed on the TV, which might also take time based on how much processing the TV does. So even though a game is running at 60 frames per second, the perceived latency from hitting a button to seeing the result on the screen could be 70 or 80 milliseconds, which is way too long for something like AR

(07:12):
or VR to work. We need that to be ten, eight, five milliseconds. And that's what we call motion to photon: from when you move your head to when you see the result, you need that to be below five to 10 milliseconds, uh, for an ideal experience. If it starts to go outside that, you start to perceive lag.

(07:34):
You start to get things which also induce headaches in some people.
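
As a back-of-the-envelope illustration of how that latency stacks up, here's the arithmetic of pipelined frames; the per-stage numbers below are assumptions for the sake of the sum, not measurements from any particular device.

```python
# Illustrative latency budget for a classic pipelined game at 60 fps.
frame_ms = 1000.0 / 60.0          # ~16.7 ms per frame

stages_ms = {
    "input sampling / controller read": 3.0,
    "frame N:   CPU update (sim, animation)": frame_ms,
    "frame N+1: GPU renders frame N's data": frame_ms,
    "scanout + TV/display processing": frame_ms * 2,  # e.g. a buffering TV
}

total = sum(stages_ms.values())
for name, ms in stages_ms.items():
    print(f"{name:45s} {ms:5.1f} ms")
print(f"{'perceived motion-to-photon latency':45s} {total:5.1f} ms")
# ~70 ms even though the game 'runs at 60 fps', far above the
# ~5-10 ms motion-to-photon target mentioned for comfortable VR/AR.
```
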

pj_3_01-22-2024_100459 (07:38):
Basically it's a seasickness problem at that point in time. The whole world is kind of just lagging or shifting away from you, and it just feels uncomfortable.

Track 1 (07:50):
Yeah.
And what I just talked about with all the latency and the lag, we'll get into how we fix that. It applies to AR and VR equally, but AR has more stringent requirements. VR is kind of a more relaxed version when it comes to some of these things, because you're not dealing with the real world.

pj_3_01-22-2024_100459 (08:07):
With VR, we can guarantee that the frame is coming in. Or we have the opportunity, if we have enough hardware or we get the pipelines correct, of ensuring that that frame is gonna land at the right time. We control the entire frame because it's synthetic.

Track 1 (08:24):
Yeah, we can control that it's rendered at the correct position, but your head might no longer be in that position. So even with the most powerful hardware, you still have to minimize latency if you want to utilize the hardware to its full extent. You can stick a 4090 in there and it won't make the latency any better; you'll still get head lag. So it's not just about render performance.

(08:46):
So we do tricks. Okay, your head's here right now, but that's no use to us, because in the 30, 50, 70 milliseconds it's gonna take us to render the data and actually display it, your head's gonna have moved. So we use head prediction. We go: okay, you're here now, you've been moving like this, so we're gonna predict where your head's going to be.

(09:07):
We might do a couple of these predictions. We might do a prediction of where your head is going to be for the update frame. And we can make that frustum bigger, so we're not culling objects which you might potentially see. 'Cause if you say, okay, your head's here and I'm gonna tightly cull to the view frustum, and then it's a few degrees off when you do the final prediction and you warp the frame, or you

(09:29):
regenerate some of these objects, you might not have had them in the render buffer to start with. So you've gotta kind of over-cull, so you have some extra objects that are slightly outside your view. That's one thing we can do: we cull with one set of objects and render with a different camera, which is why we need the over-culling.

(09:49):
So then we have to predict further into the future, and the further you predict, the less certain you are and the more error you will have. We could also predict: okay, when this frame hits your eyes, it's gonna be here. But obviously you can't ever predict right to the point when it hits your eyes, because the frame's already generated, and it took time to generate that frame. If it took 10 milliseconds, then you had to predict 10

(10:11):
milliseconds earlier, and now you've got an inherent 10 milliseconds of latency.
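
A minimal sketch of the prediction step, assuming constant angular velocity over the prediction horizon; real systems use richer head models and filtering, and the quaternion helper here is hand-rolled for illustration.

```python
import numpy as np

def quat_mul(a, b):
    """Hamilton product of quaternions stored as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return np.array([
        aw*bw - ax*bx - ay*by - az*bz,
        aw*bx + ax*bw + ay*bz - az*by,
        aw*by - ax*bz + ay*bw + az*bx,
        aw*bz + ax*by - ay*bx + az*bw,
    ])

def predict_orientation(q_now, angular_vel_rad_s, latency_s):
    """Extrapolate head orientation forward by the expected
    motion-to-photon latency, assuming constant angular velocity."""
    angle = np.linalg.norm(angular_vel_rad_s) * latency_s
    if angle < 1e-9:
        return q_now
    axis = angular_vel_rad_s / np.linalg.norm(angular_vel_rad_s)
    dq = np.concatenate(([np.cos(angle / 2)], np.sin(angle / 2) * axis))
    q = quat_mul(dq, q_now)
    return q / np.linalg.norm(q)

q_now = np.array([1.0, 0.0, 0.0, 0.0])           # current head orientation
gyro = np.array([0.0, np.radians(200.0), 0.0])   # yawing at ~200 deg/s
q_render = predict_orientation(q_now, gyro, latency_s=0.030)  # 30 ms ahead
# Render from q_render; the further out you predict, the larger the error.
```
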
But what you can do is what we call a late frame warp. Okay, we rendered the image 10 milliseconds ago, when we predicted what that frame was. And then, as we're scanning it out, we know exactly, right now, where your head is facing.

(10:33):
So we can do a more trivial warp, just a 2D affine warp on the final image, to kind of warp it into the correct place. But that can only really correct rotational things, because if you start to move the position, then you get parallax errors: you've now got a 3D image that's not quite right. So then you get into more complicated warps, of filling in

(10:55):
the background objects, and potentially using AI these days to fill in what it thinks was behind the object, which is now not visible, or is visible, and lots of problems arise.
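
A toy sketch of that rotation-only late warp: reproject the already-rendered image through the small rotation between the pose used at render time and the pose known at scanout. The pinhole intrinsics are made-up numbers, and, as noted above, this corrects rotation only, not positional (parallax) error.

```python
import numpy as np

def yaw_view_rotation(deg):
    """3x3 world-to-camera (view) rotation about the yaw axis."""
    c, s = np.cos(np.radians(deg)), np.sin(np.radians(deg))
    return np.array([[ c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])

# Made-up pinhole intrinsics for a 2000x2000 eye buffer.
K = np.array([[1000.0,    0.0, 1000.0],
              [   0.0, 1000.0, 1000.0],
              [   0.0,    0.0,    1.0]])

R_render = yaw_view_rotation(0.0)    # view rotation predicted at render time
R_scanout = yaw_view_rotation(1.5)   # view rotation known at scanout

# Pure-rotation reprojection: depth cancels out, which is exactly why this
# fixes head rotation but cannot fix positional (parallax) error.
R_delta = R_scanout @ R_render.T
H = K @ R_delta @ np.linalg.inv(K)

def warp_pixel(H, u, v):
    p = H @ np.array([u, v, 1.0])
    return p[:2] / p[2]

print(warp_pixel(H, 1000.0, 1000.0))  # centre shifts by roughly f * tan(1.5 deg)
```
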
So minimizing latency is a very hard problem. Oculus originally took a very good stab at this, and over time it's

(11:17):
been refined very, very well.
And there are head models of how your head moves, and having two IMUs on the headset instead of one. Because ideally we'd put an IMU at the center of the motion, which is in the center of your head, which we can't do. So we can put two, one on each side, in a known location relative to where the center is.

(11:38):
And then you can take the movement from the two IMUs and predict what the center rotation is.

pj_3_01-22-2024_100459 (11:45):
So basically interpolate between those two fixed positions to say: this is what we believe the center would actually look like.

Track 1 (11:52):
It's not even a belief; it's known. I mean, we know they're on the outside of a shape, and if this one moves forward and this one moves backwards, then the center probably didn't move. And if you move left or right, you can very easily imagine two sensors on the outside of a circle and predicting how the middle one, at the center of the circle, moves. It's not that difficult.
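
A simplified sketch of that two-IMU idea: with sensors rigidly mounted at equal and opposite offsets from the head's centre of rotation, each accelerometer sees the centre's acceleration plus a rotation-dependent term, and those extra terms cancel when you average the two. Real pipelines fuse gyro and accelerometer data properly rather than this toy average.

```python
import numpy as np

def center_motion(acc_left, acc_right, gyro_left, gyro_right):
    """Estimate motion at the centre of rotation from two IMUs mounted
    symmetrically (offsets +r and -r) about that centre.

    For a rigid body, each IMU measures
        a_i = a_centre + alpha x r_i + omega x (omega x r_i)
    With r_left = -r_right the rotation-dependent terms are equal and
    opposite, so the simple average recovers a_centre. Angular velocity
    is the same everywhere on a rigid body, so the gyros are averaged
    only to reduce noise.
    """
    acc_centre = 0.5 * (np.asarray(acc_left) + np.asarray(acc_right))
    omega = 0.5 * (np.asarray(gyro_left) + np.asarray(gyro_right))
    return acc_centre, omega

# Toy check: head yawing at 2 rad/s, IMUs 8 cm either side of the centre.
omega = np.array([0.0, 2.0, 0.0])
r = np.array([0.08, 0.0, 0.0])
acc_l = np.cross(omega, np.cross(omega, -r))   # centripetal term at -r
acc_r = np.cross(omega, np.cross(omega,  r))   # centripetal term at +r
print(center_motion(acc_l, acc_r, omega, omega))  # recovers (0,0,0) and omega
```
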

(12:12):
So there's been a lot of math, there's been a lot of geometry processing, there's been a lot of prediction improvements. Faster IMUs. IMUs, you can read them at, when I was doing this, maybe a thousand samples a second. So you get a thousand samples per second. How do you even get that into a processor? If you're getting it late, it's no use to you.

(12:33):
You need it right now. So within a given game frame, you get potentially 16 samples from the IMU to use for prediction. You can't be sampling this once per frame and using that for prediction; you've got to be taking the high-speed input, predicting where it's going, and then using that for rendering.
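
A small illustration of that sampling arithmetic: a 1 kHz IMU hands you roughly 16 gyro samples inside each 60 Hz frame, which you integrate for the within-frame rotation and use to keep the velocity estimate fresh for prediction. The yaw-only motion and noise level below are made up.

```python
import numpy as np

imu_rate_hz, frame_rate_hz = 1000, 60
samples_per_frame = imu_rate_hz // frame_rate_hz      # ~16 samples per frame
dt = 1.0 / imu_rate_hz

# Assume a simple yaw-only motion for illustration: ~180 deg/s plus noise.
rng = np.random.default_rng(0)
yaw_rate = np.radians(180.0) + rng.normal(0.0, 0.01, samples_per_frame)

yaw_this_frame = np.sum(yaw_rate) * dt        # rotation integrated this frame
latest_rate = np.mean(yaw_rate[-4:])          # fresh velocity estimate
predicted_extra = latest_rate * 0.020         # extrapolate ~20 ms ahead
print(np.degrees(yaw_this_frame), np.degrees(predicted_extra))
```
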
So once you've done all this for VR, you get a pretty stable
(12:55):
image.
You can get that perceivedlatency down to 10 milliseconds,
which is acceptable for vr.
Even 16 is acceptable.
If you are a frame off on your,on your head when you're moving
it, it's not too bad if youstart getting into a hundred
milliseconds and you move yourhead and the image lags behind
you a hundred milliseconds.

(13:16):
that's when you get the seasickness problem. If you keep your head still, it's perfectly fine, because the delay is not visible. But if you rapidly turn your head left or right, the image stays still for a hundred milliseconds, then rapidly moves, and then stays still a hundred milliseconds later. It's very nauseating, and it's not acceptable for a consumer

(13:37):
product.
Then we get into AR.

pj_3_01-22-2024_100459 (13:44):
With AR, we're gonna specifically be talking about the approach HoloLens or Magic Leap took, where we're painting the augmentation onto some sort of glass pane, but the rest of the world is able to fly through that glass pane easily.

(14:04):
There's no screen like in VR; you're just getting photons from the real world at this point in time.

Track 1 (14:11):
Yes.
That's what augmented reality is. In virtual reality, the entire scene is virtual and it's all 3D rendered. How good that looks could be janky 1970s-style graphics, or it could be Unreal 5 doing its best rendering possible. That's where the performance of the hardware comes in.

(14:32):
You still have to fix the latency problem no matter how good the GPU is. But yes, going back to your point about augmented reality: the real world, in this case, is the real world. You are wearing some sort of glasses. It could be a headset, it could be a pair of glasses, it could be some cyberpunk-looking things, but ultimately it's clear glass with

(14:56):
some sort of waveguide optics in it. So you can see the real world, and you can draw anywhere in the glass, at least where the waveguide is, which for Magic Leap was about, I don't know, 90 degrees field of view. It wasn't a very big viewport; it was very much at the center of the glass. So this all sounds like a heads-up display, and that's all it is.
(15:18):
Heads-up displays have been around for years. Fighter pilots have had them since the seventies, and they've mostly been used in things like that. It's great for a fighter pilot, but it's head-static information: if you move your head, the information in your display stays with your head. So the pilot can look around anywhere he wants and still see all of his flight instruments directly in front of

(15:40):
his face.
Augmented reality takes the head tracking information that we can get from modern IMUs, the same thing we did for VR, and starts to go: okay, if I want to pin a pixel to the world, not to your head but to the world, so this pixel is on the door, then if I move my head, I have to re-render that pixel in a

(16:02):
different place based on where my head is. And the real world's the real world; it's going to move instantly. How quickly we can recompute where that pixel needs to be in the display, and put it back there, is how good the AR is. And the problem with AR is you don't just get the seasickness

(16:23):
lag, because the real world's gonna move instantly. You get this weird, uh, shimmy where the virtual object just kind of slides around, and how much it slides is basically a measurement of the lag.
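
To make the "pin a pixel to the world" point concrete, here's a toy projection of a world-anchored point through two head poses: the pose you should have used, and a pose that lags a little behind. The gap between the two screen positions is exactly that slide. Focal length, poses, and timings are illustrative.

```python
import numpy as np

def view_from_yaw(yaw_deg, head_pos):
    """World-to-camera rotation/translation for a head yawed about +y."""
    c, s = np.cos(np.radians(yaw_deg)), np.sin(np.radians(yaw_deg))
    R_cam_to_world = np.array([[ c, 0.0, s],
                               [0.0, 1.0, 0.0],
                               [-s, 0.0, c]])
    R = R_cam_to_world.T
    t = -R @ head_pos
    return R, t

def project(point_world, R, t, f=1000.0, cx=1000.0, cy=1000.0):
    x, y, z = R @ point_world + t        # camera space, camera looks down -z
    return np.array([cx + f * x / -z, cy + f * y / -z])  # y-up image, toy only

anchor = np.array([0.0, 1.5, -2.0])   # virtual pixel pinned to a door 2 m away
head = np.array([0.0, 1.6, 0.0])

true_px = project(anchor, *view_from_yaw(10.0, head))   # where it should be now
lagged_px = project(anchor, *view_from_yaw(8.0, head))  # pose from ~10 ms ago at 200 deg/s
print(np.linalg.norm(true_px - lagged_px))              # the visible slide, in pixels
```
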

pj_3_01-22-2024_100459 (16:39):
So you're racing, you're racing against the speed of light, really, is the problem here.

Track 1 (16:44):
You're racing against the speed of light and against how fast your brain can recognize that an object's moved. And there hasn't been a lot of brain research on this, as to how quick that is and which movements we're more sensitive to than others. A lot of that is because we didn't have the technology to do the tests. We had some tests we've done, and a lot of that was used in

(17:05):
the early AR. But until you get the device and you can start to do research on the results, it's hard to predict what your brain's going to do. So a lot of this is happening now, and as AR moves on, it will get better, and it'll get more involved on the psychology side of how we perceive these things.

(17:27):
But for now, it's very cut and dried. It's like: you moved your head, so I've gotta move these pixels over here so you perceive them to be in the same place. And that was kind of Magic Leap's thing. They took the head tracking tech that, uh, was around at the

(17:47):
time, improved upon it, and added it to a modern version of a head-static type display. It's all waveguided. It doesn't prohibit your forward view of the world too much. But it comes with a huge set of problems, and with heads-up displays, the information is always right in your face.

(18:09):
It's very much notification-style. It's good for... I know people make glasses that have displays in them for biking or skiing, and they tell you your speed and things like that, and it's always in your face. It's very heads-up-display style. When it comes to true augmented reality, you've now gotta deal

(18:31):
with the real world, and it's really difficult. Occlusion doesn't happen in VR: you rendered the 3D scene, so the depth buffer takes care of all of the occlusion. But now you have an object in augmented reality which is just in space. If that object moves backwards and goes through a wall, it

(18:53):
remains visible. Your brain does not like that one bit. It's like, oh, that's horrible. What just happened? Likewise, if an object goes through an open door or goes round a corner, it needs to be occluded by the real world. Otherwise you get the same problem: you can see through to the objects behind

(19:13):
walls. And these are the problems which make AR difficult. The technology of AR is literally the same as VR, just with tighter latency requirements. It's the technology required to get it to work in an acceptable manner.

(19:33):
And that comes down to scene understanding, where I understand what I'm looking at. So if I render this 3D object, I need to clip it here so it fits the geometry of the real world. But then how do you detect the geometry of the real world? Are you using stereo cameras? Are you using depth cameras? Are you using some lidar-type device?

(19:54):
All of this is on your head. So how do you make the headset light, so it's wearable, and ideally wearable for long periods of time?

pj_3_01-22-2024_100459 (20:05):
Each of these different sensors is also gonna solve potentially different problems, right? I mean, I could use lidar, I could use stereo cameras, I could use depth cameras. But you know, you'll run into degenerate scenarios where it's like, oh yes, that person went past a wall, but that wall happens to be made of glass.

(20:27):
So not only do I need to understand the physical geometry of the real world, I need to also understand the material properties, so that you should be able to see a character walk behind a glass wall, correct? And maybe it should be distorted with refraction, uh, or reflective properties there. But a standard depth camera might not give you that.

(20:50):
It might just say, oh, that's a wall, and I'm gonna clip you now.

Track 1 (20:54):
A standard depth camera doesn't even give you that. The data you get back is

pj_3_01-22-2024_100459 (21:00):
point clouds of depth.
Right?

Track 1 (21:02):
Yeah. So Magic Leap had some games, and they were kind of things like Minecraft-style things on a tabletop. Or they had a game where it would open a portal on a wall, and aliens would fly out of the portal and you'd shoot them, uh, by looking at them and clicking a button, type thing. And even finding a flat surface is difficult, because when you

(21:25):
get close, a surface isn't flat. You see some of this in the AR videos you've seen: they'll build a point cloud of your environment, using SLAM-type technology to map out where you are within your household, the room you're in. And that data is very noisy. Flat surfaces are not flat. And then you get into problems of:

(21:48):
if I use the point cloud as the reference that I'm gonna render against, and I render on top of that, then the depth buffer from the point cloud will clip the 3D objects prematurely, before they reach the surface. So you get these weird gaps and these weird, uh, chopped-off bottoms to objects where they're supposed to be sitting on the 3D

(22:08):
surface. And it's these problems which need to get solved to make AR usable. And it's why AR isn't usable. It's fraught with issues.
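
One common band-aid for that premature-clipping problem (a generic technique, not something the episode attributes to any particular headset) is to bias the reconstructed-world depth slightly away from the viewer before the occlusion test, trading a little late occlusion at contact edges for fewer noise-eaten pixels.

```python
import numpy as np

def composite_mask(virtual_depth, world_depth, bias_m=0.02):
    """Decide per pixel whether a virtual fragment survives occlusion by
    the reconstructed real-world depth. Pushing the noisy world depth back
    by a small bias avoids the chewed-off bottoms where scan noise in a
    tabletop pokes in front of an object resting on it."""
    return virtual_depth < world_depth + bias_m   # True = draw virtual pixel

# Toy 1D scanline: an object sits exactly on a table 1.5 m away, but the
# reconstructed table depth has about +/-1 cm of noise.
rng = np.random.default_rng(1)
world = 1.5 + rng.normal(0.0, 0.01, 8)
virtual = np.full(8, 1.5)                    # object face coplanar with table
print(composite_mask(virtual, world, bias_m=0.0))   # noise eats random pixels
print(composite_mask(virtual, world, bias_m=0.02))  # stable with a small bias
```
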
You mentioned some of them: how does glass work, how do mirrors work? And my house isn't your house. Whereas once you put VR on, it doesn't matter what room I'm in; it's entirely virtual.

(22:31):
The light from the outside world is blocked out and you're now in this other world. It doesn't matter whether you're in daylight or dark, indoors or outdoors, it'll always be the same. Whereas AR is very dependent on the environment you're in. And if the sun's out, for example, or in my house, which has lots of

(22:57):
glass, it just wouldn't work.
There's so much bright light coming in. We're limited in how bright we can make the displays. We don't have the dynamic range the sun has, and cameras don't have enough dynamic range to compete with real-world objects either. If the sun is in the frame, you've got lots of, uh, exposure problems; you see it all the time on your phone.

(23:20):
It gets better with HDR, but we didn't have HDR back then, and even then it doesn't solve the problem. And cameras have a much wider dynamic range than displays have. To compound the problem, with Magic Leap and HoloLens-style AR, the light you add in the glass, the virtual light, is additive.

(23:41):
You can't subtract light from the real world. You can only add to it. Which means you can't easily do shadows. I can't subtract light; I can't make it look like the virtual object has a shadow. And we came up with some tricks to handle this. You could do things like render a gray polygon over the whole

(24:02):
frame, and then you can subtract from the gray to simulate the shadow. But that reduces the dynamic range of the display, and now you're kind of washing out the real world, because you're adding gray light on top of everything coming in, which is the real world.
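
A toy numeric version of that gray-veil trick: on an additive display you can only add light, so you add a uniform veil everywhere and add less of it where the shadow should be. The shadowed region ends up darker than its surroundings, but the whole image is washed out. The veil and shadow strengths below are arbitrary.

```python
import numpy as np

def additive_shadow(real_luminance, shadow_mask, veil=0.15, shadow_strength=0.6):
    """Fake a shadow on an additive (waveguide) display.

    The display cannot remove real-world light, only add to it:
        seen = real + added
    Adding a uniform gray veil everywhere, then adding less of it inside
    the shadow region, makes the shadow darker relative to its
    surroundings while brightening (washing out) the whole scene.
    """
    real = np.asarray(real_luminance, dtype=float)
    added = np.where(shadow_mask, veil * (1.0 - shadow_strength), veil)
    return real + added

scene = np.array([0.40, 0.40, 0.40, 0.40])       # uniform real surface
mask = np.array([False, True, True, False])      # where the virtual shadow falls
print(additive_shadow(scene, mask))
# -> [0.55, 0.46, 0.46, 0.55]: shadowed pixels are darker than their
#    neighbours, but everything is brighter than the unaided real world.
```
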
And it's things like, okay, I have a shiny object, and I

(24:24):
opened the door and the light came in. How quickly can that shiny object respond to an environmental light? Imagine taking an AR headset to, like, a nightclub where there's lights flashing all over the place. How fast could these virtual objects actually respond to it? And how do they respond is the other question.

(24:44):
So now you've got more cameras on your head, because now you've got to see the incoming light that the real world is seeing. You've gotta process that into some sort of environment map that you can project back onto a 3D object, to simulate it being a shiny object in the real world. And then you get into problems of, okay, I looked left, but

(25:06):
I never looked right, so it doesn't actually know what's over there. So does it make it up? Does it make you do a full 360 with your head as part of the user experience? That's not very natural. It takes time to build up this model of the entire environment, and an object can reflect something that's behind you

(25:27):
that you've never looked at, or never will look at. So it can never be real. It's so difficult to do. You could do 360-degree cameras, but now you've got more sensors on your head, and more weird circular lenses on your head, so it's not very wearable.

pj_3_01-22-2024_100459 (25:46):
It sounds like amazing neck exercise, you know, when you're wearing all of these sensors. And to be fair, this is only for a static scene, right? We're not even talking yet about dynamic scenes. You have to look all the way around just to get a static scene correct.

Track 1 (26:04):
Oh yeah. It's so difficult. All of these problems are individually solvable in very contrived examples. But when you put them all together in the real world, and just let a consumer do what they want in their own house, none of these problems are solved to an acceptable level where you'd be like, I could tolerate this. Like, AR right now isn't even tolerable, because it's so, like,

(26:26):
that's broke, that's broke, that don't work, that don't work. And then you go outside with it. None of them are really made to go outside, but let's take them outside, and now you've got the sun to deal with, and adding light to where the sun is in the glass makes zero difference, 'cause we can't add enough light. Uh, and we can't subtract light, so we can't make it darker.

(26:48):
Now we get the problem of scale, and some houses get this too. We do all the tricks we can do with optics. We've got stereo cameras, we've got depth cameras. But if we do stereo disparity between two cameras, they're only six inches apart,

(27:08):
'cause the best you can do is put one on each side of your head. We could put them wider than your eyes, but we don't have the processing that your brain would need to undo that information. So in reality, cameras are always as wide as they can go. Maybe they're an inch wider than your eyes, 'cause they're on the corners of a pair of glasses frames.

(27:30):
We can only really derive depth from that stereo camera pair to about 20 or 30 feet, which is enough for indoors, but outdoors it's irrelevant. I can look down my road and see all the way to Denver airport, which is 70 miles away, and I can easily tell that the

(27:50):
plane is behind the mountain, and the distance, even though my eye resolution can't do that via stereo imaging. There's a lot more going on in your head than just stereo imaging, and with pure stereo disparity from a pair of cameras, after about 20 feet a given point doesn't move at all

(28:13):
between the two images of the two cameras, so you'd have no idea whether it's close or far. You've got no depth information about that at all.
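
Here's the arithmetic behind that 20-to-30-foot limit, using an assumed focal length and a roughly glasses-width baseline: disparity shrinks as 1/depth, and the depth error per pixel of matching error grows as depth squared, so outdoor distances are hopeless for a small stereo rig.

```python
# Depth from stereo: disparity_px ~= f_px * baseline_m / depth_m.
# The numbers below are illustrative, not from any real headset.
f_px = 1000.0          # focal length in pixels
baseline_m = 0.15      # ~6 inches between the cameras

for depth_m in [1, 3, 6, 10, 30, 100]:
    disparity = f_px * baseline_m / depth_m
    err_per_px = depth_m ** 2 / (f_px * baseline_m)  # depth change per 1 px error
    print(f"{depth_m:5.0f} m -> {disparity:7.2f} px disparity, "
          f"{err_per_px:7.2f} m depth error per pixel")
# At 30 m a single pixel of matching error is already ~6 m of depth error;
# a mountain and a plane 70 miles away produce essentially the same disparity.
```
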
So our brains are using: I know that's a mountain, I know that's a tree. And every now and then your brain will get it wrong; you'll look at something and you're like, oh, is that in front of or behind that? And you can't tell.

(28:34):
And that's the AI side, where I think this is a problem that's solvable by AI, and we start to solve these problems in the same way our brain solves them. Your brain isn't purely taking two images from two eyes and figuring out what the scene looks like. You can prove that by covering one eye up: you can still see depth perfectly fine. You lose things like true depth

(28:58):
perception; if you're trying to land a plane or land a parachute, covering one eye would be a bad idea. But in general, looking around in the world, if you only have one eye, you can still perceive the world in three dimensions. So there's a lot more going on.

pj_3_01-22-2024_100459 (29:12):
Yeah, you can still have, you know, parallax understanding with one eye, when you look around and you're like, oh, I see how this is moving relative to that.

Track 1 (29:21):
Mathematically, you shouldn't have that. From a purely... if our eyes are just doing this stereo processing, if...

pj_3_01-22-2024_100459 (29:27):
Well, for a single frame, yes. Yeah.

Track 1 (29:30):
Right, it shouldn't be possible. So you add all of this together, and it's just not usable. Unfortunately, most of the killer apps for AR are outdoors. That's why there are no killer apps for AR, and that's why it was never adopted by consumers. Now, it was used a lot by enterprise, and the military use it for training, and these are all controlled environments.

(29:55):
If you are using an AR headset, for example, on a conveyor belt, and you're looking at things coming down the conveyor belt, the installation of that system can dictate that there can't be any windows, there can't be any mirrors, and the lighting needs to be this bright all the time. It can't be darker, it can't be lighter. There are all sorts of things you can specify for an enterprise factory-

(30:18):
type environment that you absolutely cannot specify for a house.

pj_3_01-22-2024_100459 (30:24):
So to get specific, one of the applications, if I recall: Daqri was very much in this niche industrial enterprise space. And if I recall correctly, they were positioning their headset for things like, oh, I need to repair this particular area of a naval vessel.

(30:44):
I think that was, as you said, a very controlled environment, very locked down, so that, hey, I need to understand the schematics in this area to properly understand how to fix it.

Track 1 (30:56):
And Daqri were entirely enterprise based. They had a few R&D teams which probed possible consumer spaces. HoloLens kind of went for the consumer space and realized it's much easier to do the enterprise side. Magic Leap was all in on consumer to start with. Now they're all in on enterprise, and they've missed the boat

(31:16):
there too, 'cause everyone else is already there. They didn't understand the interactions of the technical problems. Magic Leap was a very badly run company: very smart people solving problems, and management just putting up bullshit videos that were like, look, this is how it all fits together in an ideal world.

(31:37):
And that was literally rendered footage. And it's the little details that we just talked about which make it impractical.

pj_3_01-22-2024_100459 (31:47):
At a

Track 1 (31:47):
AI can help with a lot of these problems.
Like, for example, I always use this example: I have a game where I have a bad guy that runs around the kitchen. I could walk into your house, find the kitchen, and go, oh, let's have a bad guy run around here, and I could figure out in my head how the game would play in your space. I can go to my kitchen and figure out the same thing. Very different type of play, very different layout in the

(32:10):
kitchen. But I can still visualize how such a bad guy would interact with my kitchen. That interaction in my kitchen is very different to the interaction in your kitchen. You have an island, or you have two islands; I have no island, for example. But it's all in a kitchen. First of all, how do you find the kitchen? And then how do you have emergent gameplay for this type of game that can scale

(32:34):
across the unknown number of kitchens that are out there? And how do you take that emergent behavior and tell the same story, for the same game, for the same person, in an entirely different environment? All unsolved problems. So it kind of means it can't be done, and people who attempt it do it in a very contrived environment.

(32:57):
You come back to: I could make a game in my kitchen, and it'll play great, but it would only play in my kitchen, so it's useless to anybody else, really. That's the same enterprise problem of, it only works in this ship, in this factory, and we have specialists who come and rework it for somewhere else. Possible in the enterprise space, not possible in the

(33:18):
consumer

pj_3_01-22-2024_100459 (33:18):
One of the, uh, and maybe this gets to the whole notion of the contrived environment. I recall reading a lot, you know, eight years ago, about Magic Leap. They had Steven Spielberg in, they had a lot of luminaries come in and get to try it out, like test runs, maybe in a warehouse or in the

(33:39):
building, and it was all still hooked up to, you know, massive computers. To what extent is there a big problem, not just with the contrived environment, but like, hey, how do we shrink this to be power-friendly and weight-friendly for your head?

Track 1 (33:58):
So, Magic Leap never solved either of those, unfortunately. We've already talked about all the sensors you need. Most of these sensors need to be head-relative, so they need to be on your head. The cameras need to move with your head. The IMUs need to move with your head. The depth sensor needs to move with your head. Which is why so

(34:18):
far we haven't seen a pair of glasses, and Apple are no closer than anybody else to making a pair of glasses, because where would they put all these sensors? They're not magic. They can't magically make a depth camera be a tenth of a millimeter square and just put it in the middle of your forehead. Apple can't do that. So they're playing with the same technology that exists in the real world as everybody else.

(34:39):
And then you've got to power this thing. Where are you gonna put the processors? Where are you gonna put the, uh, battery? How are you gonna charge this thing? Which is why they always come with a puck. Like, okay, the sensors go on your head to keep the weight down; the battery's heavy, the processor board's heavy, so we'll just put a puck in your pocket.

(35:00):
Then we'll have a cable. But now you've got a lot of data in that cable. You have all the camera feeds, and MIPI won't go that far. So now you've gotta do things like, okay, we've gotta take it off the camera bus, process it into something else, and then send it over USB, Ethernet, or whatever you want to use for the interlink from the headset to the base. That's more electronics. Now we need all these conversion chips also on your head, because

(35:23):
we can't get the camera signals off the head, 'cause they'll only go six inches. In a phone, that's fine, but it's more than six inches from my head to my waist, and the cable needs to be longer than that too, because it needs to be comfortable to wear. So you get into all these form factor issues, and this increases power. All these conversion chips and all this extra processing you

(35:45):
have to do, which you don't think you have to do, add to the power budget. And then you've got: how do you keep it cool, how long can the battery last, blah, blah, blah.
So I think Apple are in a better position to fix some of this, because they already have a phone, which could be a puck. It can certainly add to the experience, and it's a very powerful device, where Magic Leap didn't have such

(36:09):
a thing.
We kicked out Qualcomm and put in the Nvidia X2, and it was a much better move. The X2 was very powerful at the time and could do much more, but compared to the amount of processing required, it's pretty much irrelevant.
And again, if you run it flat out all the time, and a lot of it does have to run flat out because it has to deal with all

(36:32):
the display and all the movements and all of that, then the battery doesn't last very long. So coming back to your question: yeah, Magic Leap used to connect the headset to a PC, so it's totally static. And they should have done this from day one, at least allowed it to be sold this way. Just sell it as a device for your PC. It's always powered.

(36:53):
It doesn't necessarily have to slow down. It does for power reasons, to conserve power, but it doesn't have to; it could run flat out nonstop. So they'd do this, and they'd also do all the tricks too. If you look at the old photos of Magic Leap, they had very creative placement of furniture, very weirdly designed rooms.

(37:14):
And these rooms would have, like, checkerboard couches, which were used as part of the recognition, so it knew you were in that room. They'd have photos on the wall which, if you look closely, were actually QR locators and things like that. Look very closely at the pictures from Magic Leap, and they are of all these crazy

(37:39):
rooms.
And the irony of all this is: Magic Leap bought the old Motorola building, 'cause it was owned by Google when Google bought Motorola. It's this building in Plantation, Florida, and it's the old Motorola building. And they employed a bunch of old Motorola employees, so some employees got their old office back 10 years later. And so Magic Leap acquired this building: Google acquired

(38:03):
Motorola, had this building, didn't want it, so sold it to Magic Leap. Magic Leap got it and spent a fortune refitting it. And how did they refit it? It's an AR company, where we know AR doesn't work with glass, and they built a goddamn glass office. It was very Apple: everything glass, a very modern office.

(38:25):
So the headset didn't even work anywhere in Magic Leap's offices. So they had these special rooms off to the side, which were the old, fully doored offices, and they'd set them up as living rooms. These are the rooms that had all of the crazy-ass furniture in them, and it would recognize which room it was in based on some of that crazy-ass furniture.

(38:46):
Like I said, it's the checkerboard couches and QR codes and pictures and things like that. Which didn't help, because that made the head tracking in some of these rooms super stable. The head tracking in these rooms was really good, because the cameras would track the position of the QR codes on the wall, and from doing that for multiple ones you could figure out exactly which way you were looking.

(39:07):
It didn't help the latency problem, but it did help the stability, uh, problem.

pj_3_01-22-2024_100459 (39:12):
Well, from a development standpoint, was that actually a hindrance? Because did that give a false sense of security?

Track 1 (39:20):
It was a massive hindrance. It was a massive hindrance. All the demos relied on all of this bullshit stuff that the real world didn't have. Um, so anybody outside of the offices just got garbage data, where inside the office you've got great data, and it was never gonna work. They knew it wasn't gonna work, 'cause the technical people, and they had lots of good technical

(39:41):
people, were saying, we can't do this, that's totally cheating. Like, if we're gonna use a QR code for stability, we have to tell people we're using a QR code for stability. So, I mean, it wasn't that bad, it wasn't that obvious, but if you look closely, it's pretty obvious. And if you were there, it's like, this is not good. And it's like, it's assisting, but it's not really,

(40:03):
it's doing a lot of assistance, let's put it that way. That's why the demos in the office were good, and that's why they could sell it to all these people: 'cause it was fairly good in the office. But it was also very low resolution. Magic Leap had this great lab where they had the fully variable-focus displays and things like that. And it was a whole room; there was no way.

(40:23):
It was an optical lab, a whole room. And you'd look through a pair of, like, binoculars, and you'd get this great experience, 'cause it was all variable focus at every pixel. Like, really good stuff that would never be possible in a pair of glasses. And Magic Leap also was the only one at the time that did any sort of variable focus. And they didn't really do variable focus:

(40:43):
They had two focal planes.

pj_3_01-22-2024_100459 (40:45):
I was gonna ask, does it require basically multiple lenses, effectively?

Track 1 (40:49):
Yes.
And then each of those is in full RGB, 'cause there's a waveguide for each of R, G, and B. So a typical waveguide pair of glasses will have three waveguides. It'll have three lenses stacked back to back, 'cause the waveguides for red, green, and blue light are completely different.

pj_3_01-22-2024_100459 (41:02):
Oh, okay.

Track 1 (41:03):
And they're not just waveguides; they're kind of tricky. The waveguides are designed that way because the light's injected at the side of the glass, which is the only place in a pair of glasses that you can inject the light. So they inject the light at the side of the glass. There are tiny little OLED or LCoS projectors that sit at the side of the lens, which is why there's always a big bulge

(41:25):
on the side of these glasses for any sort of AR display. That's where the projector is, and it projects into the side of the glass, into the waveguide. The waveguide transparently carries it across the glass, perpendicular to the view direction, and then projects it forward into your eyes in a grid shape. That's kind of how it works. But there's one for red, one for green, one for blue.

(41:46):
And these are optimized for, like I said, wavelength and focus. So Magic Leap had two of these stacked up, so there were six lenses in

pj_3_01-22-2024_100459 (41:54):
Hmm.

Track 1 (41:54):
and it was an RGB image at two different focal planes. There are also issues with getting those focal planes right, because they're physically apart. You've got red in front of green in front of blue, and the focal length's not the same for each piece of glass at this point, because they're all a millimeter apart. So this is what Magic Leap's optical tech was.

(42:15):
They fixed a lot of those optical problems to get red, green, and blue to focus. And then they did two of those, and they ended up with two focal planes. So you could render close, at something like 18 inches, which would be interacting space right here, in the realm of where the depth camera can see. And then they had a second focal plane, which was at about two or

(42:37):
three meters in the distance, and that was basically there to infinity. And it was very hard to render to these things. How did you switch focal planes? If an object moved forward and back, would it flip? Would it flip focal planes? Would it blend between focal planes? How does blending work when it's all additive light and you can see both? So again, not an easy problem to solve.
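
One obvious way to drive two fixed focal planes, sketched here as a point in the design space rather than Magic Leap's actual algorithm, is to weight each object between the planes by its depth; on an additive display the cross-fade literally means drawing it dimmer on both planes at once, which is part of why it's not an easy problem.

```python
def focal_plane_weights(object_depth_m, near_plane_m=0.45, far_plane_m=2.5):
    """Toy blend between two fixed focal planes by object depth.

    Snap to the near plane up close, to the far plane beyond it, and
    cross-fade in between. The plane distances here are assumptions.
    """
    if object_depth_m <= near_plane_m:
        return 1.0, 0.0                    # all on the near plane
    if object_depth_m >= far_plane_m:
        return 0.0, 1.0                    # all on the far plane
    t = (object_depth_m - near_plane_m) / (far_plane_m - near_plane_m)
    return 1.0 - t, t                      # weights for (near, far) plane

for d in (0.3, 0.45, 1.0, 2.0, 3.0):
    print(d, focal_plane_weights(d))
```
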

(42:58):
Having two focal planes may seem like a technical bullet point, but it causes all sorts of other problems, which aren't unsolvable. And again, this is just fixing what you see; it's not even fixing how you interact with the world, as we've already talked about. Those problems are insurmountable even today, and this was seven, eight years ago.

pj_3_01-22-2024_100459 (43:17):
I find the problem we're talking about here with the focal lengths really interesting. It reminds me a lot of doing level of detail in games, or even in movies: how do you figure out, when you're in between those two spaces, how best to render the object? And in this case, we're not talking about two, we're talking about six, right? 'Cause...

Track 1 (43:38):
But it's not even that. A bad LOD in a movie doesn't make you throw up, where this does. It can make you so nauseous, with the lag and all the other effects that are going on at the same time: bad clipping, things that are visually at a distance that puts them behind the wall, 'cause your brain can tell where the wall is, and it can tell how far away the object is, and it's like, that's behind the wall and I can still see it.

(44:00):
And your brain has no real ability to process that; it's never seen it before. I've never seen an object behind a solid wall. And glass is not handled at all. It's very difficult, 'cause the visible cameras can see through the glass, but the infrared cameras in the depth cameras, and all the other infrared cameras that are also part of the sensor set, can't, because glass is opaque to infrared. Now

(44:23):
they're getting different information. The visible cameras can do stereo disparity all the way out the window, and the depth camera is not agreeing with it. Which one do you use? And then you get the almighty mirror: what do you do with a mirror? Do you render it twice? Can you even detect the mirror?

(44:45):
How about a mirror tilted at an angle? How about a mirror that's just a reflection on a piece of glass? And if any of that's missing, your brain just rejects it, like, that's not real. My dream of AR is that it's movie-quality special effects in real time. Movie special effects really are pre-rendered AR.

(45:05):
They're literally augmented reality. They filmed it, they added stuff to it, and they make it blend perfectly. Now, I have worked in movies. You see the amount of effort that goes into doing this. They'll rebuild things at a different scale. They will render or model an entire real scene in incredible

(45:26):
detail just so they can put the special effects in it. They'll match ray-tracing cameras to physical cameras, and all things like that, all to get the perfect shot, so that the augmented parts of the effect and the real parts of the effect are seamlessly blended. Reflections are perfect. Shadows are perfect, whether it be a real shadow on a fake object or a fake shadow on a

(45:49):
real object, it's all perfect. It gets better all the time. So my goal with AR was, like, this will be cool if we get to do all these things. And then you start working on the technical bits, and you can solve A, you can solve B, but they don't work together. Then you solve C, which totally replaces A. It's just this progression of technology. Like I said, every little bit can be solved in very specific

(46:12):
circumstances, and none of them play nice together. So AR to this day is still horribly broken. Factor in that there are no killer apps, there's no ecosystem. Like Magic Leap, what do they have? Why would I buy this? Why would I use your chat program when I could use text, or

(46:32):
I could use iMessage? Why would I invest in your ecosystem when it's an entirely new platform and we already have existing platforms? It's the platform play problem. And I think, going back to VR, Oculus have messed this up too. Why is there not a 3D avatar version of WhatsApp?

(46:55):
Why can't I have a VR version of Facebook? Without that ecosystem, again, it's a hard sell. It's just a tech demo. And Oculus have done a lot better than just a tech demo, to be fair to them. But Magic Leap was always just a tech demo. The ecosystem part brings us nicely up to the 800-pound gorilla

(47:17):
in the room, which is the Vision Pro.

PJ (47:21):
And that concludes our first episode on augmented reality, to be continued in episode two. See you all soon.