Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
(00:03):
Hello, I'm Karen Quatromoni,
the Director of Public Relations for Object Management Group, OMG.
Welcome to our OMG podcast series. At OMG,
we're known for driving industry standards and building tech communities.
Today we're focusing on the Augmented Reality for Enterprise
(00:25):
Alliance, the AREA, which is an OMG program.
The AREA accelerates AR adoption by creating a comprehensive
ecosystem for enterprises, providers, and research institutions.
This Q and A session will be led by Christine Perey,
from Perey Research and Consulting.
(00:46):
Welcome,
I'm very happy to host this fireside chat with the
CEO of AR Genie. Tarun,
would you please introduce yourself?
Hi. Thanks, Christine. Thanks for inviting me for this talk. And hi everyone,
this is Tarun Mehta, and I'm CEO of AR Genie.
And what we are doing at AR Genie is combining
(01:09):
AR and AI to really improve productivity of the industrial or the enterprise
worker. And specifically we're interested in how we can use AR glasses,
something that's always on as someone is wearing it,
and they can use an AI system in conjunction with AR to help them improve
their daily work for the industrial or the factory worker.
(01:30):
So really focusing on those industrial use cases, the workplace.
And I want to also say we're using the term generative
AI, or gen AI, to mean that the
information had not previously been connected in an
experience by some programmatic means,
(01:52):
but in fact it's being connected somewhat automatically using
the AI algorithm. So tell me a little bit more about that.
How does it start for you or for a customer?
Sure. I think one thing that we have to realize is that every company has
so much data out there, and it's just really hard to get
(02:14):
access. I mean, to get the right information from that data. So for example,
if you have a factory out there, there's thousands of IoT sensors,
there's so many machines, every machine has a manual. If something goes wrong,
what do you do? When was this machine maintained?
So there's just so much data out there. Just imagine if there is a worker or
a manager in the field, and they need information from this data, and they have
(02:36):
always-on access to this information using AR glasses.
I think that's what we are targeting.
And really I think the crux of the problem is this huge amount of data that
companies have. Machines are getting smarter,
and systems are getting more complex.
And programmatically attaching every piece of information is
(02:58):
very time consuming and expensive.
Plus there's new information being generated all the time.
Right. I mean, that's the beauty of connected things: they're always connected.
Yeah, exactly. And Christine, like you mentioned in your blog post,
I don't know if people have seen that blog post,
if you can mention it here in this interview,
(03:19):
you really nailed the point on how AR and AI are coming together,
especially generating this information using generative AI
for things that people really haven't planned for or things that you haven't
programmed for. For example,
you have a machine with a new part coming in, and you want to ask how to replace
or fix that machine part. How do you do that?
(03:42):
Or if something goes wrong, the system is able to generate a response on how to
fix that part in the machine.
So where are we in the journey? The concept is clear,
but where are we in terms of, has this been done before?
Can you provide some examples, either that were done
(04:02):
years ago or just a few months ago or so,
Where are we in this journey? Can you explain what you see?
Yeah, sure. So let me just explain to you. We as a company, AR Genie,
we are still pretty new to this.
We released our firstprototype at CES in January.
So that's just last month.
And I'll tell you one of the challenges that we are facing and that we are
trying to overcome is this: gen AI is just great,
but how do you know it is accurate? I mean, if you're in an industrial environment,
you have to be sure that your information output is not...
Hallucinations.
Exactly.
And that's one of the challenges that companies will face, and that's what we are
trying to address, especially if you're combining AR and AI together.
(04:45):
So for example,
if you get an output, and you actually get step-by-step instructions
on how to fix a machine part, and not just text information, so it actually points
to the part of the machine with an AR kind of annotation: hey,
this is the part you need to replace.
And that's the tricky part, and that's what we are trying to solve.
(05:05):
And trust me,
I believe a lot of companies are working on something similar, and let's see who
comes out with a good, clever solution for this.
So you have to train the system,
you have to give it some data.
So do you use an off-the-shelf LLM?
Is that what you're doing? I mean, it's not using words, right,
(05:27):
because it's looking at images? Or is it looking at text?
What is it looking at?
Yeah, so we are specifically using multimodal LLMs. So multimodal means an LLM,
a large language model, that's trained on images, voice, video.
And the idea is that especially in our world with AR,
(05:48):
you have AR glasses and you are looking at the world, and you have this rich
information in terms of images that you can use and feed to the LLM. And the output
you get from the LLM, you want images as well,
as well as some AR instructions that you can feed to the end user.
I think the landscape is much different from a
normal generative AI query
where you get a textual output.
(06:09):
And here we are specifically looking at multimodal LLMs, which
work with images and text and other kinds of data.
So do you need to train it using a company's own
data? Is that how you're going to go about it, or is that how you're doing it
now? You have to get permission and you have to get access
(06:31):
to all of the data repositories and all this learning
information and training modules, and train on that. Yeah.
Okay. Correct.
And that's part of the challenge as well, because a lot of these companies have
their own different systems in the backend. And that's part of the challenge:
how do you get access to this data?
And so it is much more complex than what people think.
(06:56):
And so if you...
Sounds easy, hard to do.
Exactly.
Well how big a dataset do we need? I mean.
Yeah, so it depends on the company. For example, if you have a machine,
for example, one machine, then you might have 10 manuals for it,
and then the dataset is small. But if you have a big factory, for example,
(07:18):
and you have thousands of sensors with IoT data coming in, and hundreds or
thousands of machines with different manuals, and then your backend systems
with a lot of data, that can get big really fast.
I don't have a specific answer at the moment.
I understand what you're saying.
(07:39):
You're saying it depends on the use case and it depends on your target,
what your goals are. Okay, so millions,
potentially millions
of lines of code.
Do you have any idea how long it takes to train a multimodal model
(07:59):
or how much it costs? What are some figures you can give us?
Yeah, so I can't give you the exact cost right now,
but it can take anywhere from probably half an hour,
an hour, to a few days, and it depends on the company and the amount of information
they have.
Typically what we suggest companies do is first experiment,
see what works, and they can use some simpler techniques
than just training their
(08:23):
own LLMs. For example,
use RAG, retrieval-augmented generation, to get an idea of how
the system will perform, and experiment with which LLMs work better for them.
And also there's a question of cost.
So some of the open source models can
become much cheaper than using
ChatGPT,
where it can get really expensive if your prompts are too big, for example,
(08:47):
or if you use too many tokens.
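For readers curious what the RAG experimentation Tarun mentions might look like, here is a minimal sketch: retrieve the most relevant manual snippets for a question, then ground the prompt in them before sending it to an LLM. The snippets, the bag-of-words scoring, and the prompt format are all illustrative assumptions, not AR Genie's implementation; a real system would use learned embeddings and an actual model call.

```python
import math
import re
from collections import Counter

# Hypothetical manual snippets standing in for a company's documentation.
DOCS = [
    "To replace the toner, open the front panel and slide out the cartridge.",
    "To clear a paper jam, open the rear tray and remove the stuck sheet.",
    "Schedule maintenance every 500 operating hours.",
]

def _vec(text):
    # Bag-of-words term counts; real systems would use learned embeddings.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def _cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, k=1):
    """Return the k snippets most similar to the query."""
    q = _vec(query)
    return sorted(DOCS, key=lambda d: _cosine(q, _vec(d)), reverse=True)[:k]

def build_prompt(query):
    """Assemble a grounded prompt; the LLM call itself is out of scope here."""
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer using only the context."
```

The appeal of this pattern, as discussed above, is that nothing has to be retrained: the company's documents stay in a searchable store, and only the retrieved context travels with each prompt.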
So let's walk through a use case.
Actually, I'd like to walk through two.
One is more controlled.
You've already spoken about the manufacturing setting and the machine.
We'll walk through that one in a second.
But there's another one that you shared with me, about maybe an
(09:09):
accident has happened, either to a building or to a
person or an automobile or a piece of equipment, and an insurance company needs
to come in and compare the old with
the new.
And so let's walk through that just a little bit.
(09:30):
When you go to a customer site,
they say, train on this. Then what happens?
Okay, so for example, for training purposes.
So you go to the customer site and train on this. For example,
an example is,
let's say you go to a printer, a big industrial printer, and
(09:52):
you want to ask, how do I change the printer toner?
So you can just ask the generative AI system, how do I change the printer toner?
And then it'll create the instructions for you and give you step-by-step
instructions. Step one,
take out this part, and then replace this part with something else.
This is one example of a system like this for training.
(10:12):
So the first use:
it went from speech to text,
and then that text was entered into the
model, like a prompt,
and then the model replies with a series of images and
instructions. Okay.
Correct. Good.
One more thing, Christine, I want to add, because you have the AR glasses on.
(10:35):
So AR glasses have an image of what you're looking at, so it can actually
understand what kind of printer you have and generate the response that's
specific to that printer or specific to that equipment you're looking at.
We are also working on technology so you can actually select a particular part.
So you can say, Hey,
can you tell me the part number or when this part was replaced, so it can look in
(10:55):
the database and find that information for you.
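The flow just described, speech to text, text plus camera frame into a multimodal model, and annotated steps back out, can be sketched as a simple pipeline. Every function and name below is a hypothetical stub; a real system would call an actual speech recognition service and a multimodal LLM rather than return canned values.

```python
from dataclasses import dataclass

@dataclass
class ARStep:
    text: str      # instruction text shown to the worker
    anchor: tuple  # normalized (x, y) image position for the AR annotation

def speech_to_text(audio) -> str:
    # Stand-in for a real speech recognition service.
    return "How do I change the printer toner?"

def query_multimodal_model(prompt: str, frame) -> list:
    # Stand-in for a multimodal LLM that sees the camera frame, identifies
    # the equipment, and returns step-by-step instructions with coordinates.
    return [
        ARStep("Open the front panel.", (0.40, 0.60)),
        ARStep("Slide out the toner cartridge.", (0.50, 0.55)),
    ]

def handle_request(audio, frame) -> list:
    # speech -> text prompt -> multimodal model -> annotated steps
    return query_multimodal_model(speech_to_text(audio), frame)
```

The key structural point from the conversation is the return type: not plain text, but instructions paired with image coordinates that the AR display can anchor to the equipment.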
Interesting. Now let's do a little less controlled one.
So something either indoors or
outdoors or anywhere, something less controlled than a factory.
Sure.
(11:16):
You mentioned the insurance one. Insurance.
Yeah. This is one of the requests from one of the biggest
insurance companies in Japan that contacted us.
And what they are interested in is something like this:
let's say you have a car accident, and then
the issue they face is they then send an adjuster into the field to assess the
(11:37):
damage.
And sometimes these adjusters may not be perfect, or they might have a vested
interest in assessing more damage than is required to pay, and they want to have
some sort of system that can verify what the adjuster has said.
Not to replace the adjuster, but to validate what the adjuster says.
Yeah, exactly. Exactly. So for example, there's damage on the car,
(12:00):
you can capture it in images,
make a 3D model of it, and compare it to a 3D model of an undamaged car and say, hey,
these are the damages and this is how much it'll cost.
And if there's a discrepancy, then it can be flagged, for example.
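As a rough illustration of that comparison step, one could flag points in the captured scan that deviate from the reference geometry beyond a threshold. The points, the threshold, and the nearest-point test below are toy stand-ins for a real 3D reconstruction and registration pipeline.

```python
import math

# Reference geometry of an undamaged panel vs. a captured scan (toy 3D points).
REFERENCE = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
CAPTURED = [(0.0, 0.0, 0.01), (1.0, 0.0, 0.02), (2.0, 0.0, 0.30)]  # dent near x=2

def flag_damage(captured, reference, threshold=0.1):
    """Return indices of captured points farther than threshold from any reference point."""
    return [
        i for i, p in enumerate(captured)
        if min(math.dist(p, r) for r in reference) > threshold
    ]
```

Only the deviating region is flagged, which matches the use case: the system highlights discrepancies for a human to review rather than replacing the adjuster's judgment.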
Excellent, excellent. A good example of gen AI plus AR, or plus any image.
It could be a photograph, it could be a 3D model. Yeah. Okay.
(12:22):
Correct. Yeah.
Alright. So one of the things that I suggested is that perhaps
AI would be used to erase information.
So I think, of course, the
video has to be segmented,
whatever the AR glasses or device is capturing;
(12:45):
it has to segment the world, and it's quite easy to segment a
face or a certain kind of machine.
You could imagine that you want to identify the machine to provide
instructions, but you also may want to identify the machine to hide it,
right? Yeah. Maybe it's out in,
(13:05):
it's behind a cabinet in public, and you don't want everybody to know what's
inside the cabinet.
So what about this idea of the AI
automatically erasing or obfuscating?
What do you think of that?Is that a possibility? Yeah.
I think, Christine,
you raised a really important point that we never thought about
(13:27):
before. And I'll tell you the reason why.
We have been talking to some companies that are working in the field of defense.
So for example,
if you are out in the field and you are getting support from someone else, and
you really don't want to show where you are in the field, for your security.
And there are wars going on, as we all know, around the world,
(13:50):
different places around the world, and the US is providing a lot of equipment in
these wars, and they need support. And for security reasons,
they might even want to hide the person who is operating over
there in the remote area.
And I think this kind oftechnology will become important.
We personally haven't seen the requirement for it,
but I believe we will start seeing these requirements, if we haven't been seeing them
(14:13):
already.
So I think that's a very good point to raise on privacy issues in AI
and in the videos that are captured.
It could be privacy for the other users, or it could be
protecting a person's location or some intellectual
property. It could be anything; anything could be blurred.
(14:35):
So I think this is an example of,
instead of adding AI to what the AR user sees,
adding AI to what the AR camera captures.
So it's kind of in the introduction.
Correct. Yeah.
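That redaction idea could be sketched as a filter on outgoing frames: detect sensitive regions, then overwrite them before the frame leaves the device. The detector below is a hard-coded stand-in for a real segmentation model, and the frame is a toy 2D grid rather than actual camera pixels.

```python
def detect_sensitive_regions(frame):
    # Stand-in for a segmentation model returning (row0, col0, row1, col1)
    # boxes around faces, locations, or restricted equipment.
    return [(1, 1, 3, 3)]

def redact(frame, fill=0):
    """Return a copy of the frame with detected regions overwritten."""
    out = [row[:] for row in frame]
    for r0, c0, r1, c1 in detect_sensitive_regions(frame):
        for r in range(r0, r1):
            for c in range(c0, c1):
                out[r][c] = fill
    return out

frame = [[9] * 4 for _ in range(4)]
safe = redact(frame)  # pixels inside the detected box are now zeroed
```

Running redaction on-device, before anything is transmitted to a remote expert or a cloud model, is what makes this useful for the defense and privacy scenarios discussed here.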
Are there other, let's just say
(14:57):
out-of-the-box kinds of uses? I'm putting you on the spot a little bit,
but are there some things that you want to share that you've never seen done but
you're thinking about?
Yeah,
I think there is probably a requirement, or people have problems: if something
goes wrong, they want things to be fixed, and if there's a system that can help,
(15:20):
then... I just can't be very specific right now, because we are talking to certain
companies about how to improve manuals.
And I just can't go into too much detail because the plans are still in the pipeline.
But what about compliance? In many industries you must
provide evidence that something was done.
(15:42):
This doesn't really, in my mind, qualify as generative
AI, but is that a use case that you see sometimes?
So what we have talked about with some companies is in factories, for example,
for safety.
That's one thing that we have seen, and I think that is probably relevant to
(16:03):
compliance as well.
Let's say a machine is too hot, and if your AR glasses have an IR sensor,
then you can just issue a warning: hey, don't touch it.
So that's for safety, so you don't get burned, for example. So again,
that's safety.
I'm pretty sure there are requirements for safety compliance in industries, and
AR glasses or AR technology can definitely help.
(16:23):
Good example.
Yeah,
also remember AR glasses are also recording, so you can actually see if your
workers are following your safety procedures or not
and take corrective actions.
And it may be automatically detected with AI.
Exactly.
The connection that I have with AI isthat if you're recording hours and hours
of activity from any user,
(16:47):
no one's going to want to go through it manually or have the time.
It's much too expensive to go through it individually. Exactly.
So having a model that can
evaluate every frame or every segment for
alignment with the protocol would be a way of using AI
(17:08):
in conjunction
with the recording after the fact. Yeah.
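That after-the-fact review might look like this: a classifier labels each recorded segment with the step it shows, and the labels are checked against the required order. The protocol and step names below are invented for illustration; a real system would label segments with a video model rather than take them as strings.

```python
# Required order of steps for a hypothetical rooftop antenna install.
PROTOCOL = ["attach_harness", "clip_anchor", "climb", "install_antenna"]

def check_compliance(observed_steps):
    """Return protocol steps that were skipped before a later step occurred."""
    violations = []
    next_required = 0
    for step in observed_steps:
        if step not in PROTOCOL:
            continue  # ignore segments that don't map to a protocol step
        idx = PROTOCOL.index(step)
        if idx > next_required:
            # A later step appeared before earlier ones were observed.
            violations.extend(PROTOCOL[next_required:idx])
        next_required = max(next_required, idx + 1)
    return violations
```

For example, a recording in which the worker climbs without clipping the anchor would surface `clip_anchor` as the skipped step, which could trigger the kind of warning Tarun describes next.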
An example of this, from what we have talked about with a few companies, is in
construction. For example,
you are putting an antenna on top of a roof, and there are certain protocols you
need to follow, and we have heard of cases where people actually die if they don't
(17:29):
follow the protocols.
And if people are making mistakes and you can detect that with the AI,
then you can actually warn them: hey, you need to be careful,
this is dangerous stuff.
And I think definitely AI and AR can be really helpful in that regard:
AR with capturing the data and AI with analyzing these hours and hours
of data.
In real time, and telling them. Yeah,
(17:51):
I think this tells me that the main value of AR is about risk
reduction. It's risk to things, risk to people.
It's a huge field.
Well I'm glad we agree on that.
So thank you very much, Tarun.
It was wonderful to meet you and discuss this topic, and I look forward
(18:16):
to hearing much more about your progress and work in this area.
Thank you for your time.
Thank you, Christine. Yeah, and thanks for inviting me as well.