Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Speaker 1 (00:00):
To build these amazing models that we all are using every day, they fundamentally need two things. One is compute, so they need lots of GPUs to train these models on. The second big ingredient that goes into training these models is data, and there are generally two kinds of processes in training these models. One is pre-training, where you may have heard that these
(00:22):
models are trained on internet data. That's true, and a lot of these companies download all of the data; they crawl all the websites, all the sources. In fact, they will even buy data from private places, like libraries and private websites like Reddit and Stack Overflow and so forth. So that initial phase is to essentially train a base model.
Speaker 2 (00:47):
Thanks for listening to Seed to Exit. I'm very grateful to have you tune into another episode, and this is one I'm very excited about. Today I'm thrilled to welcome Manu Sharma. Manu is the founder and CEO of Labelbox, which is a data factory platform that's becoming essential for developing AI models. Under his leadership, Labelbox has secured over $188 million in
(01:09):
funding from investors like A16Z and SoftBank, so we're going to cover a lot of topics on entrepreneurship, AI development and what the future of AI looks like. I hope you enjoy the show.
Speaker 3 (01:22):
You're listening to the Seed to Exit podcast with your host, Rhys Keck. Here you'll learn from startup executives, founders, investors and industry experts. You'll learn from the best about building amazing products, scaling companies, raising capital, hiring the right people and more. Subscribe and listen in for new episodes, and enjoy the show.
Speaker 2 (01:45):
Okay, Manu, welcome on. Excited to have you. Excited to be here, thank you for having me. I've found a lot of great entrepreneurs have had unique upbringings or something that really shaped them into what
(02:06):
they are. So, just for context, what was your upbringing like? Anything formative that gave you the drive to get to where you are today?
Speaker 1 (02:14):
Yeah, so I grew up in India, in the north part of India, a place called Roorkee, and I grew up in a family of artists and engineers. As far as I can remember, my memories are filled with building or tinkering with things with my father or my grandfather, and that's a lot of how I spent my time during
(02:38):
middle school and so forth. My mom was pursuing fashion design, so I got to see the art world. I used to often spend time in textile factories, where the cotton comes in from the fields and, on the other side, high-quality fabric is exported outside of India and across the world. Those are some of
(03:01):
my most cherished memories growing up. But at the same time, life was fairly simple growing up, in the sense that I didn't have access to a lot of things like a computer or a phone until later, in my teens. A lot of
(03:29):
the ways that I would learn about things that were interesting to me was in libraries, so I would go to libraries and learn from encyclopedias and explore subjects like physics and chemistry, and come back home and try to tinker with things that could get close to some of those experiments I would learn about in those books. Anyway, those formative times led me to dream
(03:55):
of pursuing technology, particularly aircraft design and aerospace engineering. I would sometimes watch astronauts on TV, and that felt like, okay, this is super cool that humans can actually do these things. And, to fast forward, with a lot of hard
(04:21):
work and luck, I was able to come to America for my studies. I was very singularly focused on a few subjects. I thought at the time that aerospace engineering was really, really interesting, and I loved airplanes.
(04:42):
I wanted to explore even potentially being a pilot. Some other subjects like physics and technology were also very interesting, and I was also very good with software and coding growing up. Anyway, so I got to America as part of an education program,
(05:07):
and I think that was the big leap for me: to come to America and learn all these amazing things about the system here, about entrepreneurship, about building businesses, following your passions and so forth. That's kind of how it all began. One of the
(05:29):
big challenges I had to overcome was to come to America and have the right conditions to be able to start companies and stay in the country and so forth.
Speaker 2 (05:46):
Well, I think that's super cool for a couple of reasons. One, I mean, I think we all wanted to be astronauts when we were kids, so the fact that you actually built an aerospace company later in life is really fascinating. One question: you mentioned that you were really good with technology and coding, but you also mentioned that you didn't get a phone or computer until really your teenage years.
(06:09):
So how did you even get started in software engineering? Was it entirely self-taught?
Speaker 1 (06:13):
Yeah, when I first got my computer at home, I got hooked on it. I would learn all kinds of software programs. I learned tools like AutoCAD, Photoshop, and of course programming, with
(06:34):
BASIC at the time. At some point I learned Flash, and a lot of these things sort of opened a new world. There were all these new things I could learn. And of course, when I would go to school, the school had a computer lab, and that's where I learned. So in a span of maybe five or six years
(06:54):
I got really good at coding, in fact so much so that I would hack around in our school and have all these tricks with my buddies. So yeah, I learned very fast.
(07:16):
This was really cool. I spent a lot of time on it right away, so much so that I even attempted to start a business with my Flash programming skills. I saw something from the US, where somebody made the Million Dollar Homepage and was selling pixels for a dollar, and
(07:39):
I thought that was super cool. At the time I was like, oh my God, maybe I could go do something like this, an idea like that. So I built a website in Flash. Being out there, getting your idea out and doing things, that was the most important takeaway from that story.
(08:00):
So that's how it all got started. And what's really interesting is a lot of people ask me how I came to be doing a software company right now, given all of this background. It turns out that nearly every engineering discipline that one
(08:21):
goes into school for requires software engineering, requires you to code. All of these equations in physics or math and so forth have to ultimately be coded up to run simulations, to run programs and so forth. So coding and software is a very core part of every
(08:41):
engineering discipline at this point. However, there is a kind of professional software engineering skill set that you don't really get to learn in school as much, which is about how you make production software at scale, and that really comes from
(09:02):
experience.
Speaker 2 (09:03):
Absolutely. And you mentioned the software company that you're doing now, and Labelbox occupies a fairly unique space within the realm of AI, in the sense that the layman is probably not familiar with what you do. Right, you generally will think of a foundational model like ChatGPT or Claude, or you'll have your point solutions, which is
(09:24):
like AI for sales, but Labelbox is neither of those things. So, for the non-technical or the semi-technical person, could you explain what Labelbox is and how it's used to train models?
Speaker 1 (09:34):
Yeah, so to build these amazing models that we all are using every day, they fundamentally need two things. One is compute, so they need lots of GPUs to train these models on. The second big ingredient that goes into training these models is data, and there are generally two kinds of processes in
(09:59):
training these models. One is pre-training, where you may have heard that, okay, these models are trained on internet data. Well, that's true, and a lot of these companies download all of the data; they crawl all the websites, all the sources. In fact they will even buy data from private places like
(10:23):
libraries and private websites like Reddit and Stack Overflow and so forth. So that initial phase is to essentially train a base model, to get a basic intelligence from all of these trillions of words that are on the internet.
(10:45):
However, that base model is not as useful in everyday life. It has base intelligence, but it's not useful; it cannot interact with humans. And so there is another process called post-training, and it's an umbrella term for a variety of methods and tools and techniques to align that model, or tune that model, to work with humans,
(11:10):
and this also requires a ton of data, but in a very specialized way. And so what Labelbox does is it produces data with humans, with expert humans, that essentially drives the improvement of base models into the more aligned models that we all use every day. And the kind of
(11:36):
data is actually very specialized. Right now, when I'm using any of these foundation models or chat assistants, the assistants know so much about everything that I feel like I have very little knowledge to add to these models. But it turns out that the creators of these models are able to figure
(11:59):
out and see, well, the model needs to be improved in mathematics or physics or, you know, maybe aircraft design. And the only way to actually improve the performance of these models is by finding these experts, who are great mathematicians, great physicists or great aircraft designers,
(12:20):
and producing the data in a manner that will teach the model the new knowledge, or the new way to understand and reason about these things. And so Labelbox essentially does that. So we have a very big network where anyone can sign up and join and pass through the exams, and if they are qualified, if
(12:43):
they pass with good scores, they will be matched to a job that will be about labeling data and getting paid, and the data is used for these foundation models. And so we have this network called Alignerr.com, with a double R, and then we also have our
(13:07):
software product and software platform called Labelbox, and in conjunction we operate this as a data factory. Ultimately, the data produced by our factory is used to improve these state-of-the-art models every day.
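For technical listeners: the expert-produced post-training data described here is commonly represented as simple prompt/response records. Below is a minimal sketch in Python, assuming a JSON-lines format; the field names and IDs are illustrative, not Labelbox's actual schema.

```python
import json

# One supervised fine-tuning record: a domain expert writes the ideal
# response to a prompt in their field. All field names are illustrative.
record = {
    "domain": "physics",
    "prompt": "Why does a spinning gyroscope resist changes to its axis?",
    "expert_response": (
        "Because its angular momentum must be conserved; an applied torque "
        "changes the momentum vector's direction, producing precession "
        "rather than a simple tip-over."
    ),
    "annotator_id": "aligner-0042",  # hypothetical annotator ID
}

line = json.dumps(record)    # serialize as one JSONL line
restored = json.loads(line)  # the record round-trips losslessly
```

Collections of such records, one JSON object per line, are a common interchange format for fine-tuning pipelines.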
Speaker 2 (13:24):
That's super interesting. So let's say I'm an underpaid physics professor who wants to have something of a side hustle. I can then go look at all of the data that you're bringing in and annotate it using my PhD-level experience, in order to improve the data that's going to, call it, OpenAI or Anthropic.
Speaker 1 (13:45):
That's right, and in fact we have a ton of professors and postdoc students in America, and actually many other places in Europe and India, who are part of our network, and they're getting paid very attractive amounts of money
(14:07):
to do this job.
Speaker 2 (14:09):
So you mentioned earlier, on the pre-training side, how these companies are ingesting huge amounts of data, practically all of the data in the world, or as much as they can get their hands on, and I know that there's been concern that eventually we're going to, quote unquote, run out of data, and that the answer to that is synthetic data. So I'm curious: what is your take on how to solve that
(14:32):
problem? And then, how does the data annotation that your company does work with synthetic data?

Speaker 1 (14:46):
Yeah, I think in many ways we have reached those limits of the data when it comes to pre-training, because everyone has access to these web data sets and everyone has basically done all these partnerships to get private data from different companies, different places or different networks. And arguably, in the next phase of technology development, there's multimodal data, like videos and images and audio.
(15:07):
Maybe we'll tap into those sources more effectively very soon. But generally speaking, yes, there is a new kind of data required, and I think there are really promising approaches to produce that data. Some of these techniques are with synthetic generation.
(15:29):
A very simple intuition about synthetic generation would be: let's say we found a wide range of books that were never published on the internet. We can use an LLM to go through those books and that knowledge and come up with interesting derivative data sets, like questions and answers and so forth, and that's a great
(15:53):
way to amplify the data by using an LLM. That's a very simple example, but there are so many more varieties of these ways of producing synthetic data. And then I think, on top of that, there is going to be a need for data produced with humans and AI in conjunction. Think about today: right now it's almost becoming
(16:19):
unimaginable to write software without AI assistance. And if you're refactoring a very large code base as part of a task to teach the models how to refactor code, the humans, our aligners, are more likely to be successful by using a copilot assistant to navigate the code
(16:42):
base and re-architect and refactor. And that's a very simple, canonical example of this hybrid approach, where humans are going to use AI to produce new forms of data that are ultimately going to help improve these models. I think we're going to see both of these techniques be very
(17:03):
effective. It already is happening, and it will, I think, drive or fuel the next phase of AI development.
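For technical listeners: the book-to-Q&A amplification described here can be pictured as a small pipeline. Below is a toy sketch in Python, with the LLM call stubbed out by a placeholder function; a real implementation would prompt an actual model, so the stub and its behavior are purely an assumption for illustration.

```python
# Toy sketch of synthetic data amplification: pass source passages through
# an LLM to derive question/answer training pairs. `generate_qa` is a stub
# standing in for a real model call, not a real API.
def generate_qa(passage: str) -> list[dict]:
    # A real implementation would prompt an LLM, e.g.:
    #   "Write a question answerable from this passage, then answer it."
    # Here we derive a trivial pair so the pipeline runs end to end.
    first_sentence = passage.split(".")[0].strip()
    return [{
        "question": f"What does the text say about: {first_sentence[:40]}...?",
        "answer": first_sentence + ".",
    }]

def amplify(passages: list[str]) -> list[dict]:
    """Turn raw passages into derivative Q&A records."""
    dataset = []
    for p in passages:
        dataset.extend(generate_qa(p))
    return dataset

corpus = ["Angular momentum is conserved in a closed system. It explains precession."]
qa_pairs = amplify(corpus)
```

The point of the sketch is the shape of the pipeline: raw text in, derivative training records out, with the generator swappable for a real model.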
Speaker 2 (17:14):
And so then, how do you see that interplay between the human labeling and the AI-assisted labeling evolving over the course of the next few years? To go back to the example of the physics professor: do you anticipate that a physics professor will then be using an AI to help with the human labeling, or is it something that should just be kept completely separate?
Speaker 1 (17:33):
I think so. I think it's going to be this very nice confluence, a synergistic kind of relationship, in producing new kinds of data. Not all kinds of data may benefit from such an approach, but my guess is that a vast amount of data will
(17:54):
be produced with AI and humans working together. AI is much more like an ether, a technology that can be infused into every little aspect of what you do every day. And so it provides acceleration in little things like, hey, I want to search for new knowledge because
(18:14):
I want to understand what this new concept is before I write my response to the question. That is an act that can be further accelerated by AI, and so you can have a much more purpose-built search and knowledge system that gives you that knowledge. And perhaps the AI can help you figure it out: hey, in the response you have provided here, or the task you
(18:37):
have done, these are potentially some errors that the AI has already figured out that need to be fixed. And so that's an example, I think, where AI and humans are going to work together to have that superhuman capability, right? So, you know, I think AI is a tool, and the tool can be used in
(19:02):
a manner that produces very big leverage, and I think we are seeing that already in the early phases of data production today.
And it's likely going to become more pervasive very soon.

Speaker 2 (19:21):
How do you handle data privacy and security concerns, particularly for industries that have very sensitive data? For example, Labelbox sells into healthcare, where of course you have HIPAA and all the other data privacy concerns. How do you handle that from an annotation or data perspective?
Speaker 1 (19:39):
Yeah, so as a company we have, from day one, been a very privacy-first organization. So we not only have all the basic practices of HIPAA compliance and SOC 2 compliance and GDPR and things like that,
(19:59):
which let us even operate our business in those markets; on top of that, we take a lot of steps, such as anonymizing customer data when it goes to the humans for labeling. Every aligner, every
(20:24):
human AI tutor that we have, has very strong confidentiality agreements with us. They are subject to the same kind of scrutiny we have around security and privacy compliance as our full-time employees. So we are able to guarantee our customers a very wide,
(20:48):
comprehensive coverage of security, compliance and privacy practices, including with the humans who are working part-time on labeling the data.
Speaker 2 (20:59):
Well, we've talked a lot about Labelbox the product, but I want to talk a little bit more about Labelbox the company. So, just to go back to the beginning: I was listening to a podcast that you did a few years ago, and you said you reached out to Brian, your co-founder, and you agreed that the next phase of your careers was that you were going to build a company together. So where did that come from, and what gave you the inspiration
(21:22):
to start Labelbox?
Speaker 1 (21:24):
So I think, for both of us, we've had many, many shared experiences since college times, and one of the most important, an essential aspect of our shared experience, was coming up with ideas and finding
(21:45):
opportunities in the world where we could contribute directly, in the form of producing objects or solutions, and pursuing that. And I think, particularly in America, it's something that is just so incredible that there are
(22:07):
opportunities like that. You could have an idea, I could work really hard on it, and there are all these environmental factors that are conducive for you to just go out there and do it. And so since college times we were into this thing. However, we did not know a lot about building companies, far
(22:28):
from venture-backed companies and so forth, and so that was the backdrop. We built small businesses and we pursued interesting ideas in renewable energy. We even had a space company where we built hardware that went to the International Space Station. We were doing this while I was in
(22:49):
college at Stanford and Brian was at Boeing, and so we were always pursuing these kinds of things. However, in 2017, 2018, AI was taking off, particularly deep learning and computer vision, and I was at a company called Planet Labs, and we saw firsthand that, okay, to
(23:13):
develop these AI systems, you need human supervision. And we had a very simple insight: for a very, very long time, potentially forever, humans are always going to want AI to be more aligned to themselves. In other words, whatever
(23:34):
level of AI you build, whether it's for your company or whether it's an AGI, the creators of those AI systems want to make sure the AI is behaving the way they want it to. And if that is the case, then how are these companies and teams going to ensure AI is aligned? The answer actually comes down to data, because that
(23:57):
is the primary form of communicating with an AI, and so we saw an opportunity to build products and services in that intersection of human and AI. And we made a bet that supervision, human supervision, is going to be required, albeit that it would
(24:20):
change dramatically as technology gets better and better, which it already has. Back in the day, in 2018, data labeling, or annotation, was very meticulous. You were going to have to teach AI basically every little detail: hey, this is what a car looks like. Now the labeling is actually at a much
(24:44):
higher level of sophistication. A professor of, let's say, mathematics is teaching AI how to reason about a problem at a very high level, in natural language. So just in the span of six years, the interface and the way these models are taught are
(25:06):
becoming more like humans, like the way we would teach students and teach each other, through communication and preferences. Sometimes preferences are really the way to go about making a judgment on quality, because we sometimes are not able to describe why such a thing is so great, but we are able to say that it certainly is better than everything else
(25:26):
we've seen before. And so that is how it's being done today, and I think it's probably going to take even simpler forms in the future.
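For technical listeners: the preference judgments described here are commonly captured as pairwise records, which preference-tuning methods such as RLHF or DPO consume as (chosen, rejected) pairs. Below is a minimal sketch; the schema is illustrative, not any vendor's actual format.

```python
# Pairwise preference record: the labeler doesn't explain *why* one answer
# is better, only *which* one is. All field names are illustrative.
preference = {
    "prompt": "Explain entropy to a high-school student.",
    "response_a": "Entropy is a measure of how spread out energy is ...",
    "response_b": "Entropy is disorder.",
    "preferred": "a",  # the human judgment: A is better than B
}

def chosen_and_rejected(rec: dict) -> tuple[str, str]:
    """Split a preference record into the (chosen, rejected) pair that
    preference-tuning methods train on."""
    if rec["preferred"] == "a":
        return rec["response_a"], rec["response_b"]
    return rec["response_b"], rec["response_a"]

chosen, rejected = chosen_and_rejected(preference)
```

The design point is the one made above: the record stores only a relative judgment, which is often easier for a human to give reliably than an absolute quality score.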
Speaker 2:
How'd you get your first customers?

Speaker 1:
We hacked together a version of Labelbox on nights and
(25:47):
weekends and we launched it on Reddit, and it struck a chord with the community of computer vision researchers, and they started signing up. They started asking us: hey, this is kind of cool, maybe what do you think about these extra two or three features?
(26:08):
And every day we would wake up, we would see more signups, and we would ship more features based on what we had learned the day before. And that's how the snowball effect started. And, yeah, in just a matter of two or three months, we were
(26:29):
charging for the software and our tools. And when we started charging, then over the course of the next few months we started adding zeros to our price. We started selling it at $10, $100, $1,000. And when we saw that price elasticity, like, okay, we can
(26:51):
keep charging far more money, that's when we realized this was actually becoming very valuable, and perhaps we should really just go at it full time.
Speaker 2 (26:59):
Super interesting. So when you say you launched on Reddit and you were having people sign up: were you trying to sell to people on Reddit, or were you trying to get the annotators who could then come in and add their knowledge, or both?
Speaker 1 (27:14):
So in the early days, our product was basically a software tool that allowed them to label data by themselves. A simple example would be: let's say you are a healthcare company and you have access to
(27:35):
five or ten radiologists and you want to just label that data. At the time, all of the tools were desktop tools, and we were probably the first tool that was cloud-based, where you simply loaded up the data and you could collaboratively label data among five people. Again,
(27:55):
this was against a backdrop where everything was desktop, and you can imagine how well you could coordinate five different people in different places on a desktop tool to label all kinds of data. You would have to divide all the data among the five people, and all that stuff. We just handled all that, and so that was really our core first product. And we were going out to Reddit and all these different places online to essentially find and appeal to
(28:19):
the researchers, the engineers who were building these AI systems, and so that's kind of how we began. Now, I should also say that before building this product, we spent many months researching the space, and we talked to a number of people in Silicon Valley who were building AI
(28:40):
companies, and we honed in on: okay, this is actually a problem. Companies need much better ways to label the data, manage the process and all that. And so that led to conviction: okay, let's go build a prototype. That led to launching this
(29:03):
online, and that's how we started.
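For technical listeners: the coordination problem described here, splitting a labeling queue across a handful of annotators, can be sketched with a simple round-robin policy. This is purely an illustration of the idea, not Labelbox's actual scheduler, and the file and annotator names are made up.

```python
# Toy sketch of distributing a labeling queue across annotators.
# Round-robin assignment is one simple policy for keeping workloads even.
def assign_round_robin(items: list[str], annotators: list[str]) -> dict[str, list[str]]:
    """Distribute items across annotators one at a time, in order."""
    queues: dict[str, list[str]] = {a: [] for a in annotators}
    for i, item in enumerate(items):
        queues[annotators[i % len(annotators)]].append(item)
    return queues

# Hypothetical example: ten scans split among five radiologists.
scans = [f"scan_{n:03d}.png" for n in range(10)]
radiologists = ["dr_a", "dr_b", "dr_c", "dr_d", "dr_e"]
workload = assign_round_robin(scans, radiologists)
```

A real system layers more on top (overlap for quality checks, reassignment on timeout), but the core is exactly this kind of automatic partitioning that desktop tools forced users to do by hand.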
Speaker 2 (29:07):
So when did you switch from that B2C product, selling to the researchers, to more of a B2B product, and how did you make that transition?
Speaker 1 (29:15):
We were B2B from day one. Some of these researchers were certainly hobbyists, but many of them were part of the companies they were working in. A lot of this AI development could really only happen in companies that had a lot of data. So, for example,
(29:39):
you could only build, let's say, a vision system for vehicles, for cars, if you have cars on the road, and you can only build medical imaging AI if you have medical data. So these researchers were employed by these companies, and
(30:00):
that's how we had that insight. We really needed to talk to them, the engineers who had the problem, and they were more likely than not building their own tools themselves, using open source tools perhaps, or they may have tried some legacy providers, and that was not really an ideal experience for them.
Speaker 2 (30:20):
Got it, okay. And then, on the hiring side, how did you hire your first 10 to 15 employees, and how did you convince people to make that early jump?
Speaker 1 (30:33):
Well, I think the first 10 people, most of them, came from our prior networks or prior experience. We knew these people from our prior jobs, and so we essentially went into that mode
(30:54):
where we were hiring our first few employees and colleagues by going through the people we had admired the most in our previous jobs. Not everybody, of course, wanted to jump into a very wild ride of an experience with a super-early-stage startup, but many people did. And so Brian, myself, Dan, we basically
(31:15):
went into our respective companies and thought about who were the great people that we had worked with, and that helped. It also ran out very quickly; we were early in our careers, we only knew so many people. And I think a lot of the hiring after that, after a handful of
(31:38):
people, was really about spreading the name and going around in our communities in San Francisco to find people who were interested in a very early-stage startup experience and, of course, then convincing them to join.
Speaker 2 (31:57):
So that was quite a few years ago, and Labelbox has obviously grown tremendously since then. Talk me through what the last few years have been like and how you've gotten the company to where it is today.
Speaker 1 (32:10):
Yeah, I mean, the last few years have been extremely dynamic, to say the least. Nobody expected the breakthroughs that we are now using every day. I think no one really expected how amazing ChatGPT and the transformer would be. I think we had a very early insight, or sort of a vibe
(32:35):
coming in, like, okay, people are pretty excited about transformers, this and that, but it wasn't really until ChatGPT launched that the interface and everything became like, aha, okay, this is how it's going to be. And, you know, when you're building a company, one of the most important aspects of scaling the organization is that
(32:59):
you're operating in a market that is stable, and when the markets are stable you can operate a team in a much more conventional way, with all the things you learn from other entrepreneurs: how do you scale, how do you solve problems and grow the team, things like that.
(33:19):
But our market actually is very dynamic. It just changed so fast, so quickly, and because of that we
(33:49):
had to shift towards a much more early-stage startup mode, where we have less management, fewer layers of hierarchy, and the information flow from the market and insights is disseminated to everyone much more quickly. And so that's what we did, and that has helped us to understand and build new products and services better and faster than we otherwise would have. And so in the last couple of
(34:13):
years, Labelbox has evolved from providing software tools and a software platform to a data factory. And what I mean by data factory is: we are a company that provides all of the tools, the system, the infrastructure for a company to operate a data factory themselves, to produce this labeled data for their AI models. But we also operate the data factory ourselves, and we
(34:35):
essentially sell the data, provide the data, to the customers who want a much more fully managed experience, and we do that together in the same platform. And so this is really, really remarkable. It's enabling AI teams to produce higher-quality data faster.
(34:55):
They're able to have this iterative feedback, so an AI lab that is really trying to innovate on physics or mathematics problems is able to talk to these human experts in those fields very rapidly and figure out how best to produce data in whatever their domains are. So that's
(35:17):
the evolution of Labelbox, I would say, in the last two years, and we have one of the fastest-growing networks right now. So if any of the listeners here are interested in a side hustle and think they can contribute in whichever domains they are a part of, please check out Alignerr.com.
(35:41):
And, yeah, awesome.
Speaker 2 (35:44):
So I was going to ask
you about that.
How transformative was that forLabelbox in the chat GPT moment
when it was released, did youhave like a holy shit, this is
going to change everything forus moment, or did it kick in a
few days later?
What was that?
Speaker 1 (35:59):
like slower, maybe
over the course of months and
you know, maybe like six monthsor so, and it was more in a form
of like how the businesses werereacting, and so we knew that
(36:19):
it was something reallyinteresting and very profound
potentially.
However, we didn't know how thebusinesses would react to it.
We didn't know how thebusinesses would react to it and
like how would AI teams, likeall our customers that we have
in different industries, wouldreact to it?
And in many ways, a lot of the interest for most of the
(36:42):
enterprises just went into generative AI, because it
was sort of instant ROI for them, to do things that they
otherwise couldn't have done before, and so that was
a learning experience for us over a period of time, and
it also meant their priorities were shifting.
(37:04):
They were re-architecting their teams and their focus.
Some companies were no longer going to build AI systems
because they were going to just rent AGI, or the AI
from foundation model companies, and some companies had much
more clarity, where, okay, this is their moment and they're
(37:26):
going to use these foundation models as a new backbone and
then build their custom AI on top using that architecture, and
so that played out over the course of months and
quarters, and it helped us understand that there
are going to be some customers in the enterprise segment that
(37:49):
are going to be building AI systems because they have the
data, they have the talent to go develop those things.
And then there are going to be a new class of customers, the
frontier AI labs or foundation model developers, and generative
AI startups, that are actually going to need a completely new
kind of data that has not been produced before, and so we need
(38:10):
to also serve them.
And so that was all the learnings of, I would say, 2023.
Speaker 2 (38:18):
And where is Labelbox
at now, just so people don't
have to Google?
Speaker 1 (38:21):
From a funding and a
headcount perspective, yeah, we
are a company of over 150 people, primarily based in San Francisco,
but we have a few satellite offices around the country,
in New York City and in Poland, and we are hiring for many
(38:43):
various roles, and our company is venture-backed.
So we've raised about $190 million from a variety of
investors like SoftBank, A16Z, Kleiner Perkins and Google.
And, yeah, we've intentionally kept our company very focused
(39:04):
on this product and the things that we're building.
And one of the learnings of this generative AI era is that
you don't need to have as many people as before generative AI.
A lot of our teams are actually more productive by being
(39:25):
smaller, by leveraging a lot of the AI tools, and
that's been kind of a revelation for us, a huge unlock
for us to stay nimble, move fast, have fun and also
have that sort of vibe of smaller teams and so forth.
Speaker 2 (39:44):
So, in terms of the
guiding principles that have
gotten you to where you're at today, staying nimble, staying
lean.
What else have been those guiding principles, either from
a product or a people perspective?
Speaker 1 (39:57):
Always kind of
figuring out where the puck is
going, and having the company and the teams aligned towards
where things are going.
(40:18):
Irrespective of all the past decisions and places we may have
been, it's always about the future, and I
think that's been sort of a guiding principle in all facets
of company building.
It comes to people, the teammates, it comes to strategy,
(40:41):
it comes to what we're going to build,
how we're going to communicate about ourselves.
So it's always about where things are
going, where the future is, and I think that's probably
my most profound guiding principle at the moment that I
can think of.
But yeah, it's so true, you know.
Speaker 2 (41:03):
Love that. Always be
learning.
So when you think about the
future, what do you see in the field of general AI?
Speaker 1 (41:12):
I think it's been
conveyed by so many people that
(41:37):
this looks like the most profound one,
perhaps because there's no ceiling to it.
With AI and AGI, there's no end to it; it could just keep
getting smarter and smarter, and what does that mean?
And so, with that as a backdrop,
I think we are seeing these amazing capabilities of
(41:59):
AI systems and how they're changing everyday jobs.
For our teams, it's unimaginable to write
software without copilot AI systems.
It is unimaginable for us to do documents or any writing
(42:19):
without AI today, and
so there's so much of these things that are already
intrinsically changing, and, by the way, six months from now,
nine months from now, I think we might see shifts as
significant as the ones we saw in the last year.
So this is really an exponential trend.
(42:40):
It's very hard to fathom the progress.
You know, I speak to my phone all the time now; I never used
to do that even a year ago, and these are the subtle things
that are changing about everyday life.
I think that is the most exciting part of it.
(43:01):
Honestly, I still get mesmerized: how can an AI
just produce sounds that are uncanny at this point?
It's pretty remarkable.
I'm particularly excited about
having a personal teacher.
For me, I'm learning; I love asking questions, I love
(43:25):
just going deep into a topic and so forth.
So I use tools like Perplexity, Claude and Gemini, all these
AI tools, and it's really cool for me to just always discover
new knowledge.
I'm very excited about my kids doing the same,
(43:45):
discovering new knowledge like that.
I think it's going to be very profound.
So those are the things that I think are particularly exciting.
When it comes to Labelbox, we are always asking a question:
what is the next AI capability, in terms of reasoning, in terms
of the things that it cannot do yet, and how will we help the
world by producing the right kind of data that will enable
(44:11):
them to achieve that milestone, achieve that breakthrough, right
?
So there are a few examples of it today.
AI is really good in the English language, but when it
comes to different languages around the world, it's
not so engaging yet, and so a lot of
(44:31):
the work that is going on right now is to make these AI models
have stronger performance in other languages like Spanish,
Portuguese, Hindi, basically following the world's
population.
Then a big focus for everyone is around agents.
Okay, how are we going to make AI go beyond the
assistant?
Assistants are cool, but in most businesses you need
(44:56):
something more, like replacing a full
function, whatever the job is:
how do we augment that entirely?
That means that the AI needs to be able to interact with
different pieces of software, understand how to operate that
software stack and so forth, and that needs to be taught to the
(45:17):
AI systems.
It's not yet as robust as people would like.
That means they need new data, and so we are focused on that.
And then the world is going to be very much multimodal.
And one of the great things I like about Google Gemini is that
you can upload video and audio and documents and ask a
(45:39):
comprehensive question that does analysis across all those
modalities.
And that is a pretty convincing experience to say
that AI systems are going to be
inherently multimodal.
They're going to be able to sense audio, video,
documents, all text, and process it all together.
(46:01):
And so what does it mean for the rest of the companies that are
trying to build multimodal models, and how do we go produce
that data at scale?
And so, you know, reasoning is another one.
It turns out the recent OpenAI
models that have been released exhibit really
(46:23):
remarkable reasoning skills.
A lot of the data that was produced was with, again,
people with PhD backgrounds in those domains, basically
taking a very abstract problem and decomposing it into steps:
how would a person break it into smaller problems,
and what questions would one ask, in a more logical, rational way?
(46:45):
And that reasoning trace was then used to improve the
system, as an example.
And so those are the things we think about at Labelbox in that
context of this exponential trend and new capabilities and
all that.
Speaker 2 (47:00):
Super exciting.
On a slightly more personal note, I know you mentioned your
kids.
I have a two-year-old myself, and it's funny, because
when I think about developments in technology, I
often think in the window of the next two to three years.
But when I think about it in the context of my son, I'm
(47:21):
thinking about the next 20 and what the world is going to look like
when he's a grown-up.
And frankly, I have no idea.
I don't think anybody does.
But for yourself, you're at your Series D.
The average startup life cycle to exit is seven to
10 years, and you're touching on seven.
Obviously, I have no idea what's going on
in the internal workings of the company, but
(47:42):
we're probably at a point where, in the not-so-distant
future, the time may or may not come for an exit.
So for yourself, in terms of life potentially
post-Labelbox, and in terms of building a legacy, what do you
want that to be?
Speaker 1 (47:57):
I haven't really thought about
anything like it, but I think one of the things most likely
to be true is that I'll be building.
I'll be building products and solutions.
That's just a very intrinsic part of who I am, and I
(48:18):
think that's the core essence of it.
It's so much fun, and it's just my way of expressing myself
in the world, where I love building things
in all different facets,
whether it's products in software land,
whether it's objects and furniture at home and so forth,
and so I think that's likely going to be true.
(48:41):
You know, we have a long way to go with
even realizing the many aspects
of our vision at Labelbox.
This AI alignment is becoming more and more important
now than it has ever been, and we play a very important role in
(49:05):
helping companies align their models,
and so that's kind of what I think about it.
Speaker 2 (49:13):
Cool, I love that.
Well, manu, it's been such apleasure chatting with you.
Thank you so much for coming onthe show.
I really appreciate it.
Speaker 3 (49:21):
Thank you for having
me.
Thanks for listening to Seed
to Exit.
If you enjoyed the episode,don't forget to subscribe and
we'll see you next time.