
April 23, 2024 • 46 mins

Together with our community, we engineer sparse LLM, CV, and NLP models that are more efficient and performant in production. Why does this matter? Sparse models are more flexible and can achieve unrivaled latency and throughput performance on your private CPU and GPU infrastructure. Check us out on GitHub and join the Neural Magic Slack Community to get started with software-delivered AI.

http://neuralmagic.com/


Episode Transcript

Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Intro / Outro (00:02):
Welcome to Building the Future, hosted by Kevin Horek. With millions of listeners a month, Building the Future has quickly become one of the fastest rising programs with a focus on interviewing startups, entrepreneurs, investors, CEOs, and more. The radio and TV show airs in 15 markets across the globe, including Silicon Valley.

(00:22):
For full showtimes, past episodes, or to sponsor the show, please visit buildingthefutureshow.com.

Kevin Horek (00:31):
Welcome back to the show. Today, we have Brian Stevens. He's the CEO at Neural Magic. Brian, welcome to the show.

Brian Stevens (00:38):
Thanks, Kevin. I'm happy to be here.

Kevin Horek (00:40):
Yeah. I'm really excited to have you on the show. I think what we're gonna talk about today is actually really innovative, very cool. Selfishly, I'm really excited because I'm doing some stuff in the space, and I really want your opinion on a bunch of things. But before we dive into all that, let's get to know you a little bit better and start off with where you grew up.

Brian Stevens (01:01):
Yeah. I grew up in, in a small town in upstate New York, close to Chicago.

Kevin Horek (01:05):
Okay. Very cool. So walk us through: you went to university. What did you take, and why?

Brian Stevens (01:16):
Yeah. Like, the, I'm probably gonna date myself,
because, you know, like in highschool, you know, it was really
when, like, programming juststarted to become, you know,
early parts of like high schoolcurriculum. And so I was able
to, I was able to, dabble is theright word, but, you know, get a

(01:37):
little bit of experience, youknow, on that front. But I but I
thought it was just a fun class to take, like, okay, create or or create. I didn't think it could ever be a curriculum. And then, move
from, we moved the family grewup, in New England. So even
though we were kind of spendinga lot of time in New York, you
know, born there and wentthrough high school there, they

(01:59):
wanted to get back to, myparents' roots, Maine, New
Hampshire. So we, we moved backmy senior year. And, and, but I,
I, you know, the UNH thing kindof came out, came about
backwards because I actuallywanted to be a creator, but I
didn't know that you could do that with code. So I was trying to do it with wood, believe it or not.

(02:20):
Interesting. And, yeah. And soso I had, like, this guidance
counselor that I just, like,have to pay down how much I owe
him because he just, like,steered me in this other field,
which was computer science,which was just getting rocking.
And and the University of NewHampshire actually had a really
great emerging computer sciencecurriculum and in part because

(02:43):
they were, Digital EquipmentCorporation.
You know, that grew up inMassachusetts. And so they UNH
was kind of a feeder school ofall their technology. And so
just kind of fell into fell intoit there, to be honest. Like, I
was thinking about it for yearsand just fell in love.

Kevin Horek (03:02):
Very cool. So walk us through your career, maybe just some highlights along the way, because you've done a ton of stuff, and I really wanna kinda dive into Neural Magic and what you guys are doing there.

Brian Stevens (03:14):
Yep. Yep. Yep. Probably had, like, 3 main
sections with the first was, youknow, like I said, it was
digital. They did all theiroperating system development in
Southern New Hampshire, and so Ijoined that team, you know,
right out of college.
And that was great because, youknow, you really only got access
to technology if you work for abig company. Like, the Sure. The

(03:35):
whole of open source wasn't prevalent back then. And so if
you wanted to, like, explore,then you had to work for
somebody that had thatcapability. And then but it was
great for a for a young engineerbecause I just, you know, just
brilliant people that I, wasable to spend my day with and
learn from.
And so that that set me up, butI was a I was an oddball
computer science, developer, CSdeveloper in that. I really

(04:00):
cared about the outcomes thattechnology was driving and the
user experience that youraudience had around those
technologies. So I I kind offell into that direction, and
that, you know, fast forwarding,it led me to the future for
enterprise wasn't gonna be, youknow, these proprietary
companies that were selling bigexpensive servers and

(04:22):
proprietary software stacks. It was gonna be more along the
lines of, like, commodity intel.And so that was the pathway that
led me to to Red Hat, and then,just first principles looked at,
one, what's a business model onopen source?
But then I think moreimportantly in that is, you
know, what do enterprises reallyneed, from techno from, you

(04:45):
know, technology stacks. And Idon't mean just the features. I
mean, everything else aroundthat. And we focused on we
focused on that and not justspeeds and feeds. And then, and
and and part of that was, youknow, I was lucky to see, you
know, the, you know, the publiccloud's born.
Amazon was our biggest customerat the time, so I saw that, you

(05:08):
know, front row seat. Did thatmovie and then, you know, went
and joined Google for 5 yearswhen they were just getting
started and and building cloudand, went there, moved out to
California, and, again, like,refocus them, you know, away
from just serving tech companieswith with cloud capabilities and

(05:31):
and, you know, pointed that atenterprise. I just felt like the
opportunity for whatenterprises, you know, need from
as a service, like, wasunfulfilled. Did that for 5
years. Dad was diagnosedAlzheimer's back in New
Hampshire and

Kevin Horek (05:47):
Oh, sorry to hear that.

Brian Stevens (05:48):
Get back in yeah. Thank you. So so that led the the a really quick transition back. Still stayed working for Google, but, you know, being remote pre-COVID, you are the odd person out. So did that for a couple years, and then and then, yeah, kinda wrapped that up and had intended on just

(06:10):
doing some boards and actually writing some code. And that was kinda where I was at, up until, Color began.

Kevin Horek (06:20):
Okay. So walk us through, coming to Neural Magic
and what exactly is it?

Brian Stevens (06:27):
Yeah. So Neural Magic was born out of MIT, and, the the 2 founders. And one of the founders, Nir Shavit, MIT professor, he focused, he what he teaches on, what he's expert at is is is really, he'll kill me the way I say it, but, like,

(06:47):
really under really, you know, systems-level infrastructure and performance. So really understanding, you know, how computer systems and specialized processors work, and then how you build that marriage between hardware and software efficiency. And so, you know, really extraordinary person.

(07:10):
And then and then along comes, you know, the emerging field of AI and behavioral. And he started, you know, looking at how would AI meet computer systems. And the idea he had was, was that you could that existing commodity CPUs, you

(07:31):
know, Intel-class CPUs, the things you, you know, running in your MacBook, could become, like, really great processors for running machine learning models. And that was really contrarian back then. But that was really, you know, his and and his other co-founder Alex's early ideas on, like, building a

(07:52):
tech stack that made AI work really amazingly well just on an ordinary CPU.

Kevin Horek (07:58):
Okay. So walk us through how you came to be there
and and become CEO.

Brian Stevens (08:05):
Sure. So my my plan was, don't ever take a full
time job again and just get toexplore and spend your time on
the, you know, piece of it thatyeah. Like and so I'd reached
that point, and I was veryhappy. And then, unfortunately,
I met, Nir, and and I just fellin love with him. I fell in love

(08:28):
with, and I met him through aVC, you know, a Okay.
A VC friend, you know, of mineand, just had dinner with him,
but with no intended you know,no intentions. But the but he's
just, he's the kind of personthat that you would love to
spend time on with in anycapacity, to be honest. That's

(08:49):
awesome. And then, you know, Iwas always, like, the areas I
spent my time on are areas that,like, that really trying to
understand something that that Idon't understand. Right?
I mean, it sounds obvious, but,like, but that really intrigues
me. And what I realized in thespace of AI machine learning
specifically, it's everythingyou've learned made you realize

(09:13):
there's so much more you have tolearn. So you didn't feel like
you're an expert, but you had Ithink the journey is you're
never expert. And that's that tome was, you know, at this point,
my career was really highlyinteresting. And, you know, it
and with Nir taking that focus around a contrarian view around, you know, running on CPUs, it was really

(09:33):
interesting.
And so it was, you know, the,committed a day a week, to help
Neural Magic as they build abusiness. Did that for about a
year year and a half. COVIDhits. All of a sudden, I
realized even though I onlywanna spend a day a week with
them, I've realized I'm spending7 days a week. That's awesome.

(09:54):
And then, yeah. And he's a verypersuasive guy and I was very
passionate about the space. So,you know, after a year and a
half of advising them, I Ijoined full time just about 3
years ago as CEO.

Kevin Horek (10:06):
Very cool. So what exactly does neural magic do?
And then can you maybe explainthe concepts behind it without
getting too technical? Andbecause AI is everywhere, people
hear that, and machine learning,do you maybe want to give us
kind of, like, a high level ofwhat you guys do, how it ties to
that, and kind of where we're atin the space? Because I think

(10:27):
there's a ton of myths around what's happening right now too.

Brian Stevens (10:31):
Sure. I would say the if I kind of put the what AI can do aside for a second. Sure. That's perfect. And, and I go and look and what I love now is, like, everybody has a really well I'd say pretty well calibrated on the capabilities of AI because, you know, even people that aren't in our

(10:51):
fields, right, is experience experiencing it through, like, ChatGPT and other models. Sure. Right? And so so that's really interesting. So I think people can see these capabilities that didn't exist before. What we're trying to do at the highest level, like, from a mission perspective is bring those capabilities to enterprises,

(11:16):
but bringing the capability in. So in many ways, like, the way to explain Neural Magic is to first talk about just the capabilities of AI that have really exploded, you know, in the last, well, since November of 2022 when OpenAI made the

(11:37):
ChatGPT model. So it's really easy for people to see these capabilities, especially around large language models that didn't exist before. And what we're aiming, there were major breakthroughs on that. So, like, on on how they got there, which is really, really compelling. But what Neural Magic is aiming to do is to enable

(12:01):
enterprises to use that capability, but to do it in a way where they completely control their own destiny. And so it's really back to, you know, the open source roots, you know, that I, you know, grew up on with with Red Hat, around the the the control and flexibility that open source brings to end

(12:26):
users and enterprises. And so one would argue, like, well, yeah, but there's no open source capabilities in the in the AI space. Those are only, like, the big tech companies. That's actually not true. So, like, the an amazing group of of AI models for large

(12:46):
languages has been developed and, in open source with really permissive licensing. And the innovation rate of these new models, you know, they get they get more accurate, they get faster on a month-over-month basis. So there's there's a great set of choices that people have to own

(13:06):
their own AI model. And what we're aiming to do is to help them use those open AI models and, be able to optimize them in such a way that they can run anywhere that an enterprise would want them to, whether that's, you know, in their in

(13:28):
their cloud zone or whether that's inside of their an existing data center, whether it's to run inside of a brick and mortar location. You know, once the model state of the art is opened up, it really puts, you know, customers and enterprises in control of their destiny. And the value of that is, you know, they get pure privacy. They can one, they own

(13:51):
the model. Two, they can customize the model as it makes sense, on datasets that make sense to their use case. And then they control, like, the terms of deployment locations and their choice of infrastructure. So so it can be this really liberating future where where you'll get the best capability, but you'll get it on your terms. And where Neural Magic comes out of that is,

(14:15):
there's these models are massive. Now Sure. Okay. LLMs, large language models. So and that was the breakthrough. The prior world was, you know, large language models wouldn't actually behave any better than smaller models from a from a capability perspective. And that actually turned out to not be true. So they're amazingly capable, but they're really hard to run

(14:39):
just because they're so big. And so that's why the world, let's say, without Neural Magic has been heading down this pathway that you gotta, like, buy really expensive infrastructure to run the models on. And that infrastructure prevents a lot of optionality, you know, that customers would have otherwise. And so we get in and we optimize the models, and we build

(15:03):
deployment capability that lets the models run, like, super efficiently across all infrastructure choices. And then and then so you can you get to pick, you know, where and on what you wanna run your AI models on. And that's really important, because if you really believe

(15:27):
this world where these AI models are gonna be parts of every application, and that's kind of what I subscribe to in the future, they're just gonna be libraries and then you're gonna use AI capability, you know, across the stack. Then, you know, you need the flexibility. You want that to feel like just any other application and not a piece of the app that has to have, like, really expensive serving infrastructure in order to run it.

Kevin Horek (15:49):
No, I agree with you. I I think it's gonna be in everything, and it just I think it makes a lot of sense for a lot of things. But but I'm curious. Okay. So if I'm a large retailer or a bank or or, like, an enterprise company, how do I actually start using Neural Magic? Like, do you guys consult and kinda come in and look? Or, like, how do you

(16:11):
figure out where I can actually implement the technology and where I should be using AI? Sure.

Brian Stevens (16:16):
And so the the where I should be using AI, most enterprises are already down that road.

Kevin Horek (16:24):
Okay. Oh, interesting.

Brian Stevens (16:25):
So they're already down the path of, what you just said is, use case selection. So where are, in some cases, the high-value ways I wanna integrate? Typically, it's vision or language models.

Kevin Horek (16:39):
Okay.

Brian Stevens (16:40):
My existing enterprise. And so, and so they're already so we we really meet them where they are, albeit, you know, the vision ones have been going on for years. The large language models have really only been going on for, like, the last 6 to 9 months. But so we work with them once they've defined the set of

(17:05):
use cases that they want to use AI for. And then we work with them, you know, really shoulder to shoulder, to how to optimize their model simply, in a way that gives them the flexibility to run it on, you know, their preferred piece of infrastructure, you know.

(17:25):
And so and then often part of that too, we'll come help with in most cases, they wanna fine-tune their AI models because, like, if you, like, a good example is look at the OpenAI models. They're really amazing generally, but they they won't have a level of specificity that might be important to an an end

(17:49):
user. And so to get that level of specificity, like one example, it might know what a cucumber is, but it might not know the, you know, the the condition, shape, size, etcetera, like, of a cucumber. And Right. So what enterprises do if you're in the cucumber business is they can actually

(18:10):
fine-tune and train their model, you know, that existing model, like, to know that level of specificity. Does that make sense for their use case? So we can help them with that fine-tuning. But most importantly, we take these large language models, and we we shrink them down significantly in a way that keeps their accuracy. But the the shrinking

(18:33):
of the model means, as we were talking about before, these heavyweight models that are hard to run all of a sudden become much smaller models that are easier to run. So that's a big part of the kind of magic that is Neural Magic: how do you actually make these models smaller but not lose any of the capability.
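For a concrete sense of what shrinking a model while keeping its accuracy can look like, here is a minimal sketch of post-training int8 weight quantization on a single made-up layer. It uses plain NumPy and illustrative sizes, not Neural Magic's tooling, so treat it as a picture of the idea rather than their method.

```python
# Illustrative sketch only: per-tensor symmetric int8 quantization of one
# weight matrix, showing how "shrinking" trades bytes for a small,
# measurable reconstruction error. Not Neural Magic's actual pipeline.
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(scale=0.02, size=(4096, 4096)).astype(np.float32)

# Pick a scale so the largest weight maps to the int8 limit (127).
scale = np.abs(weights).max() / 127.0
quantized = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
dequantized = quantized.astype(np.float32) * scale

fp32_bytes = weights.nbytes            # 4 bytes per weight
int8_bytes = quantized.nbytes          # 1 byte per weight
rel_error = np.abs(weights - dequantized).mean() / np.abs(weights).mean()

print(f"fp32 size: {fp32_bytes/1e6:.1f} MB, int8 size: {int8_bytes/1e6:.1f} MB")
print(f"compression: {fp32_bytes/int8_bytes:.0f}x, mean relative error: {rel_error:.3%}")
```

Sparsity, which comes up later in the conversation, attacks the same size problem from a different angle by zeroing weights out instead of storing them in fewer bits.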

Kevin Horek (18:49):
Right. Okay. And then how do you work with them kind of on the hardware optimization space? Because we all know that is part of the big challenge with AI right now too.

Brian Stevens (18:59):
Yep. And it and it's the 2 parts. So the the first part is apply a set of model optimization techniques to their existing model.

Kevin Horek (19:09):
Okay.

Brian Stevens (19:09):
And that makes the model smaller. And it it's obviously way more complicated than that. But, like, but it's it's not on the end user; like, how that works is kinda rocket science. And then and then the second part is then what we are able to do is we're able to help them size. So, really, it means, like, choosing the right

(19:30):
infrastructure, in this case, it's CPUs. Do you wanna run on Intel, AMD, ARM? What size of processors do you need? How much memory do you need? Aspects like that. What generation of hardware do you wanna use? Do you wanna use something that is legacy hardware that you've, you know, already got in your data center? Or do you want to, like, take advantage of some of the newer capabilities of CPUs that are coming out that have, like,

(19:53):
AI-friendly instruction sets coming into them. So there's definitely, help them with sizing and then help them with optimization at runtime. Because we also have a software stack that runs on the CPU, that runs these models in a really performant way. And so there is a there is a a path where we

(20:14):
work with them generally to do further optimizations at deployment size, at at deployment time.
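Since sizing questions like these often come down to which instruction sets a target CPU actually supports, one rough way to check on Linux is to read the flags the kernel reports. The sketch below is a generic /proc/cpuinfo check with an illustrative list of extensions relevant to inference; it is not a Neural Magic utility, and the flag list is an assumption about what matters for a given workload.

```python
# Rough, Linux-only sketch: inspect /proc/cpuinfo for instruction-set
# extensions commonly used to accelerate CPU inference. Generic check,
# not a Neural Magic tool; the "why" notes are illustrative.
def cpu_inference_features(cpuinfo_path: str = "/proc/cpuinfo") -> dict:
    with open(cpuinfo_path) as f:
        text = f.read().lower()
    flags_of_interest = {
        "avx2": "256-bit vector math (widely available)",
        "avx512f": "512-bit vector math",
        "avx512_vnni": "int8 dot-product instructions",
        "amx_int8": "Advanced Matrix Extensions, int8 tiles (newer Xeons)",
        "amx_bf16": "Advanced Matrix Extensions, bfloat16 tiles",
    }
    return {flag: (flag in text, why) for flag, why in flags_of_interest.items()}

if __name__ == "__main__":
    for flag, (present, why) in cpu_inference_features().items():
        print(f"{flag:12s} {'yes' if present else 'no':3s}  {why}")
```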

Kevin Horek (20:21):
Okay. So how does it work then from your side? Like, do you just bring in, like, different members of the team based on what the user is trying to do? Do I just basically go to your GitHub and and start implementing? Or, like, walk us through that.

Brian Stevens (20:40):
Yep. So, like, the the it's definitely a a product-led experience. Meaning, what we didn't want is we didn't wanna have to have the call sales button on the website.

Kevin Horek (20:54):
Right.

Brian Stevens (20:54):
Yeah. Right? Like and so what and which is fine. It's a great choice for many, but, like, we wanted to build an open community that enterprise developers or otherwise can just join our Slack, right, and actually get started. And so we try to make the product capabilities, the documentation,

(21:17):
the how-to's all available. And then and then as as people join the Slack, then and they have they can just go there for help on how to get started. They can go there when they have challenges. And so we meet them in Slack first, the community. And so there's thousands of, you know, machine learning engineers that live in the Slack community

(21:38):
and we help them there. And then there's also enterprise pathways because, you know, sometimes that's not necessarily the best pathway for for certain enterprises. So then the enterprise pathways are such that, you know, we really look like an extension of their machine learning engineering team. So it's definitely the capability we

(21:59):
have are codified through software products. You know, one, you know, set of software tools for the optimization piece, and then one set of software tools for, you know, running it on CPUs in production. But still though, with that, like, the best thing you can do is to, you know,

(22:19):
help, you know, machine learning engineer engineers that are getting started or this is new to, just help them shoulder to shoulder and act like an extension of their R&D team. And that and that's what we do.
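To make the split between tools for optimization and an engine for running on CPUs in production concrete, here is a generic sketch of that shape of workflow, using PyTorch's ONNX export and ONNX Runtime's CPU provider as stand-ins. Neural Magic's stack also consumes exported ONNX models, but the calls below are not their API, and the model and file names are illustrative.

```python
# Generic optimize-then-deploy sketch: PyTorch export plus ONNX Runtime's
# CPU provider stand in for a "tools for optimization / engine for serving"
# split. Not Neural Magic's API.
import numpy as np
import torch
import onnxruntime as ort

# Stage 1 ("pre-deployment"): build/optimize a model, then export an artifact.
model = torch.nn.Sequential(
    torch.nn.Linear(128, 256), torch.nn.ReLU(), torch.nn.Linear(256, 10)
).eval()
example = torch.randn(1, 128)
torch.onnx.export(model, example, "model.onnx",
                  input_names=["input"], output_names=["logits"])

# Stage 2 ("deployment"): serve the exported artifact on CPU only.
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
logits = session.run(["logits"], {"input": np.random.randn(1, 128).astype(np.float32)})[0]
print("served on CPU, logits shape:", logits.shape)
```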

Kevin Horek (22:31):
Got it. Okay. You have some demos and examples on your GitHub page. Can you maybe give us some examples of how people have leveraged the technology, just so, you know, people can say, like, hey, I could actually use that.

Brian Stevens (22:45):
Yep. The vision is definitely further along, you know, in terms of because it's a more mature space. Sure. And so like the you know, for years, people have been using vision and Neural Magic for vision use cases. And then the large language models are really just, you know, that's where all the

(23:06):
fervor is right now. Right? Okay. This is because it's going to have a larger impact on the types of things that large language models can do for enterprises, from operational efficiency, etcetera, etcetera. So but they're definitely earlier on that journey, with or without Neural Magic. But on the vision side, like, oh my gosh,

(23:28):
customers, like, doing, you know, self-driving car lane detect well, not necessarily self-driving car, but lane detection. So pretty sure your vehicle's gonna have lane detection. Right? So imagine sticking, like, a big heavyweight power-hungry GPU into a car. Not just for the cost, but just the power consumption and the failure rates, you know, etcetera, these

(23:49):
things have. They're not designed for that. So the ability just to run your AI model on your existing CPUs, you know, that might that are already in cars, for doing lane detection is, like, one really powerful use case. Interesting. It also includes, you know, airplane use cases that are in there around for vision. Another set of eyes is always a really great thing, as we know. Sure.

(24:11):
When you're taxiing or otherwise. Retail stores, a lot of, using, like, RSTACK and just generally retail stores that are are, in some cases, helping with shrinkage. You know, it's just, you know, positioned as if we're gonna give you a better

(24:32):
experience as you do your own self checkout, but also, you know, helping, you know, I'm not sure that's a better user experience you have, but it's definitely deployed to help reduce the amount of, you know, shrinkage or theft, right, that retailers have. So that's happening generally across retail, including at, like, the, you know, as you walk out the door. So a number a number of things like that, just anywhere

(24:56):
you can imagine, like, machine intelligence for vision. There are usually cases at the edge where GPUs are are are difficult.

Kevin Horek (25:05):
Okay. And that that's that's really interesting. I'm also curious to what are your thoughts on kind of the large action model stuff that's kind of coming out too? Are are you guys gonna ever go into that space, or or what are your thoughts on that space?

Brian Stevens (25:18):
We we haven't yet. Like, everything's really been large language right now. So we Okay. We had the the last year, and we're just about to bring out support for that. And the reason being is, the way the way large language models process, they're even more taxing on infrastructure. So as much as, you know, so the state of the art for for

(25:41):
natural language processing, there were smaller models before. And they didn't have the capabilities that these new large language models have, like these open source Llama, open source Mistral. These are really exciting new state-of-the-art models that have the capability of what you'd see come out of, you know, the big tech serving APIs. But they process completely different. They're really, they actually

(26:05):
need more infrastructure to run these, because not only the amount of compute, but they really press on the memory requirements and memory bandwidth requirements that, you know, that that you need to run these. And so, like, a lot of our tech stack was rebuilt to support

(26:25):
large language models specifically, so that you don't need to so you can run them on CPUs, so that you can run them on even, you know, GPUs that have less memory, or smaller GPUs. In many cases, you know, you can use our optimization techniques and you won't need a big GPU. You can use a small GPU. So all that's really gives enterprises optionality and it's

(26:46):
and it's really liberating. It lets them use AI in use cases that, that are more pervasive because the cost is reduced.
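A quick back-of-the-envelope calculation shows why large language models press so hard on memory: just holding the weights dominates the footprint, and lowering the precision of those weights is what brings them within reach of CPUs and smaller GPUs. The numbers below are rough estimates for a hypothetical 7-billion-parameter model and ignore activations and the KV cache.

```python
# Back-of-the-envelope weight-memory estimate for a hypothetical
# 7B-parameter model at different precisions. Ignores activations and
# KV cache, so real deployments need headroom beyond these numbers.
PARAMS = 7_000_000_000

precisions = {
    "fp32": 4.0,   # bytes per weight
    "fp16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}

for name, bytes_per_weight in precisions.items():
    gib = PARAMS * bytes_per_weight / 2**30
    print(f"{name}: ~{gib:.1f} GiB of weights")
# fp16 already needs roughly 13 GiB just for weights, which is why aggressive
# quantization and sparsity are what make commodity CPUs and small GPUs viable.
```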

Kevin Horek (26:56):
No, that that's really cool. So I'm curious to get your thoughts on you you mentioned something earlier about, like, a lot of companies that you guys have been working with actually have what they wanna do with AI. I've seen, like, where yes, that's true for some companies, but I've seen where companies maybe wanna implement it, but they don't really know where to

(27:17):
start or how to go about doing it. What are your thoughts around that, and how can Neural Magic, like, help them actually maybe go through that discovery a little bit more? Because let's be honest, not all of us are technical. And I think not a lot of people have still kind of played with AI, or maybe they have and they don't know they have. Right?

Brian Stevens (27:36):
Yeah. Yeah. No. That's true. And then, like so we, because we are, even though we focus on the optimization aspects of that, so existing models, that that we optimize and make them run anywhere. The reality is we have the experience of, I'd call it, like, the the art of the possible.

Kevin Horek (27:58):
Sure.

Brian Stevens (27:58):
So so if you looked at large language models,
what what are a reasonable setof expectations, that that a end
user could have around thecapabilities of these new AI
models that are coming out? Andso we can very much help them
with that. So Okay. Obviously,the the part that they need to
do is to really understand the,the, you know, the grounds up

(28:24):
set of use cases they wannaconsider. And then we can help
them down select that into someOkay.
Choices based on the AIcapabilities. And, you know, one
of the, public companies thatI'm on the board of did
something similar, you know,where they they across their
space and the way they work with customers, they they looked at over a 1000 use cases. You know? So they Oh, wow. Sourced a

(28:46):
1000 use cases of the types of things that AI could do, and then they down-selected that into the 10 based on the capabilities of large language models today.
And that I think is a reallyvery reasonable choice because
you said it best. I'd say, like,this is it's a world where
everybody's doing it, but noteverybody's, like, delivered to
production yet. Yeah. Right?Because I think it's, like, you

(29:10):
know, there's skills that areneeded as well, right, around
this and and assembling yourmachine learning engineering
team, you know, that has abilityto, not just assess the use
cases, but then also get theright model into production and
maintain it.

Kevin Horek (29:25):
No. That that makes a lot of sense. So I really
wanna talk about, maybe this iskind of high level, but it seems
like if you read the news, it's like, you know, AI is taking over the world and gonna destroy it, which is no worse. Like, I don't think that's ever gonna happen, but we are so far from that ever being an issue. So, like, where are we actually

(29:49):
at with, you know, kind of AI and machine learning? Because I I think there's so many myths out there and, like, there's so much paranoia around it.
So can you maybe, like, talkthrough that a little bit?

Brian Stevens (30:01):
Yeah. I think, like, the, I think a lot's
changed in in, you know, a year.Sure. We weren't talking about
these things a year ago. So the the the capabilities I worry less about, you know, the sentient AI model, right, that's learning and smarter than people.
Yeah. Part that keeps me up atnight is because, just like

(30:26):
these, even the open source AImodels can help enterprises,
become more efficient. Right?What used to take a knowledge
worker to analyze data, is likenow all of a sudden the
knowledge worker is just makingthe decision the data is
analyzed for them, like, in areally meaningful way across,
like, large volumes of data.Like, those, like, knowledge

(30:49):
workers have just been, like,completely empowered A 100%.
Through the use of of AI. Andthat's amazing because it's all
around, like, you know, youknow, having more output per
person. Right? So I don't lookat this as the world where you
don't need people. I look at theworld where the, the business
output per person is just gonnago up much higher.

(31:10):
And that's an amazing thing. But I do worry I do worry less on the sentience aspect and more around the fact, like, you know, if if we've, you know, also armed, you know, the bad people.
Right? So, like, in my eyes,just because I think, like, you

(31:31):
know, like, if you're trained intechnologies, you can look at an
email, And you can look at itand know that, okay, that's a
phishing email. That's, youknow, not there's no way.
Look at the URL. There's justdifferent aspects of it, right,
that you can look at and know,like, it's spam, it's fake, and,
you know, don't click that link.That's gonna become next to

(31:51):
impossible. Yeah. Down the road.
And that's I think it's area soyou need so it's gonna be a
security arms race as wellbecause the quality of the bad
people are just gonna getthey're gonna get a, more robust
set of tools. Right? It'llactually really look like you
know, today, like, the fakingPayPal stuff looks like fake

(32:12):
PayPal. Yeah. Yeah.
100%. This world down the roadwhere, man, that looks really
incredible. And so that's thestuff that I think worries me
the most in the next, you know,number of years.

Kevin Horek (32:26):
Yeah. That's that's an interesting point. But to
your point before that about,like, it's changed my workflow
immensely and just saved me aton of time. Like, I moved my
default search engine away fromGoogle just to one of these,
like, chatbots now because it'squicker and it gives me better
things. And then I'm generatingdesign ideas with, you know, AI

(32:47):
sometimes.
And, sure, it maybe only getsyou 20 to 80% of the way to a
screen. But if I can just youknow, on the days you're not
feeling creative, if it can justspark some creativity quickly
and you can iterate from there,like, and then you can summarize
things and make me a betterwriter. Like, it's it's not

(33:07):
really taking away. It's takingaway maybe some of the boring
parts or the parts I hate or theparts I'm not good at, but it's
not wrecking my job, at leastnot today.

Brian Stevens (33:17):
Yeah. Like, you're more you're more
effective. Right? I mean, maybeit's better quality, but you're
certainly more effective, youknow, and your output, you know,
that you get done, You know,it's amazing, like, even in your
design background. Yep.
Like, you starting from, youknow, some you know, you can you
can probably have like asoftware tool that you now can,

(33:40):
you know, develop you the someof the starting designs that you
want. Not so it's not just it'snot just, you know, text. And,
and I think that's reallypowerful. And so, so I agree. So
like, like you've, you'refurther along than I am.
I've, I've used all of the themodels. It hasn't changed what
I've done in search. It's justbecome a, a daily tool that I

(34:01):
used, for different tasks,together with search, at least
the types of things that I usesearch for. So it's just like
yeah. So now it's not just onething goes to the next.
It's just like we've just we'vebeen armed with this amazing
tool that's gonna make us, muchmore much more productive.

Kevin Horek (34:17):
Well yeah. And, like, even just something as
simple as, okay. Like, I I havea design, and then I can test it
through AI, like, a heat mapthing. And then out of Figma,
you can get it to write, youknow, react or whatever code you
want. Sure.
It's not a 100%, but it's betterto give the developer 80%. Maybe
they're gonna have to rewritesome stuff than them starting

(34:38):
from scratch. Right? And then Ican go in and tweak like, it's
changed my workflow immensely.Right?
And I think if it can it it'sobviously only gonna get better.
And if people I think the bigthing is, and I want your
opinion on this, is it's comingwhether people want it or not.
It's already here arguably. Youbasically need to adopt these

(34:58):
tools to make yourself betterthan trying to say, like, no.
Don't do that because it's gonnawipe me out.
Right? You're gonna have to justembrace it.

Brian Stevens (35:07):
Yeah. And I think I think that's true. Like, you
were saying, like, if you had toback up the clock, what degree
would you pick? Right? Yes.
Is your is your is your, thething you pick going to be
automated away or is it going to be empowered? And I'm, like, all these degrees ago, I'm just, maybe I'm

(35:28):
contrarian, but I'm in the camp that these, these, the, the jobs are still going to exist, but man, you're going to be just supercharged, you know. Not just, like, out there with a, you know, back at, back at your HOMV, the comp, just the you'd write a 1,000 line program. Right. And back then, like, just the taxing that you had on the infrastructure, it could take

(35:51):
an hour to compile that program. So the reality of it was, and so look at today's computers, like, you can compile in seconds.
So today people use the, youknow, they don't have to write
quality code because thecompiler like catches everything
right away. We had to likereally the tax of having a bug
in your code was really high. Yep. Because it cost you an hour a turn. So we got we got really good at, like, developing

(36:14):
software first, you know,developing a program first out
of the gate. So I look at it as,like, that you just said, like,
the the design tools, the writerauthoring tools, the the code
the the open source code models,and there's, you know, the the
pay-per-token ones like GitHub's and stuff like that.
But the the the open sourcecoding tools are really popular.

(36:35):
And and if all of a sudden youcan accelerate and generate half
the code, you know what I mean,that a developer would have, you
know, that a developer uses.That's really powerful. It's
only gonna get better. But,yeah, they're still gonna be
there as a job.
They're just gonna, like, getway more done, and you're gonna
end up being you're gonna havethe company that you work for or
the start up you work for isgonna have a even more robust

(36:58):
set of tools, but you'll stillexist.

Kevin Horek (37:00):
Sure. A 100%. Well, even some of the no code code
tools now or, like, the barrierto entry into the space now to
even build your own startup.Like, Flutterflow is really
good, and, you know, Bubble'snot bad for certain things.
Like, there's a bunch of reallygood no code tools.
And then the crazy thing aboutit and I'm not a developer,

(37:22):
like, we kinda cover throughoutthe show. Like, I built my own
chatbot in Flutterflowconnecting to chat g p t. Like,
sure, it was a tutorial online,but 2 years ago, I would like,
it would not have been possiblefor me to do that. Like or I
would have struggled so hard andneeded, like, so much time, and
what I built in an afternoonjust wasn't possible for me to

(37:43):
do, like, 2 years ago. And whyI'm bringing this back is
because I think what you guysare doing at Neural Magic like,
anybody can leverage thetechnology that you're building,
whether you're technical or not,because I think there's so many
tools that you could justleverage to to use, you know,

(38:03):
your basically large languagemodel back end, basically.

Brian Stevens (38:07):
Yeah. Yeah. Yeah. Like, I mean, there's there's
there's it's gotten so mucheasier, and we're only a few
years in. I I love the I mean,the in the low code, like,
that's a great example because,like, the the low code was
trying to solve the sameproblem, make development more
efficient.
Right? You just shouldn't haveto be, like, understand how to
build a full stack program.Right? Like, it just doesn't
make sense. And so the missionof of of, you know, AI helping

(38:31):
that process, you know, they'rethey're really kindred kindred
spirits and, like, yeah, like, II just don't wanna see this
world where AI is used for themost highly valued use cases
because they're it's reallyexpensive.
And so it has to have an ROI. SoI don't want and you just have
to do an ROI analysis of, can Iafford, you know, whether it's

(38:54):
the infrastructure or paying pertoken? Right? Like, I want to
eliminate that and just let AIbe used everywhere that's
helpful. And you're not making acost analysis of whether it
delivers the value.
You know what I mean? For thecost that, you know, it takes
to, to deploy it. And I justthink like, and so again, it's

(39:15):
much it's very aligned with opensource and, you know, let's just
go commoditize this whole spacearound the these deep models,
you know, not just theframeworks Yeah. But deep models
and make them easier for peopleto use and adapt.

Kevin Horek (39:29):
That's that's actually really fascinating. So how does Neural Magic monetize then?

Brian Stevens (39:35):
We, so we so everything around the, ways you
optimize models

Kevin Horek (39:41):
Okay.

Brian Stevens (39:42):
We just do open source. So, like, so pre-deployment. So there's a lot of techniques around quantization that are out there. We've led the, all the research on these new, you know, the the best quantization algorithms that are out there, as well as, what we're known for is also sparsity, which is how do you, like, zero out a lot of what's called the model

(40:04):
weights that these things have. And if you can zero out 80% of the model weights and keep, you know, 99% of the accuracy, then you got something. But that side, we've all open sourced that. So all these tools and research techniques on how to optimize a model, we've open sourced. And then what we monetize is when somebody gets to the point where they wanna put these AI

(40:28):
models to work in production use cases.
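As a toy illustration of the sparsity idea described here, zeroing out most of the weights, the sketch below does one-shot magnitude pruning on a single random NumPy matrix and measures how much the layer's output moves. Real sparsification is applied gradually with retraining to recover accuracy; this shows only the mechanics and is not Neural Magic's open-source tooling.

```python
# Toy magnitude pruning: zero out the 80% of weights with the smallest
# absolute value and compare the layer's output before and after.
# Real sparsification is gradual and retrains to recover accuracy;
# this one-shot version only illustrates the mechanics.
import numpy as np

rng = np.random.default_rng(1)
weights = rng.normal(scale=0.02, size=(1024, 1024))
x = rng.normal(size=(16, 1024))                 # a batch of activations

threshold = np.quantile(np.abs(weights), 0.80)  # magnitude cutoff for 80% sparsity
mask = np.abs(weights) >= threshold
pruned = weights * mask

dense_out = x @ weights
sparse_out = x @ pruned
drift = np.linalg.norm(dense_out - sparse_out) / np.linalg.norm(dense_out)

print(f"sparsity: {1 - mask.mean():.0%}")
print(f"relative output drift from one-shot pruning: {drift:.1%}")
# Sparse weights can then be stored compressed and skipped at runtime,
# which is where CPU speedups come from.
```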

Kevin Horek (40:30):
Right.

Brian Stevens (40:30):
Then we'll monetize the deployment-side engine.

Kevin Horek (40:34):
Okay.

Brian Stevens (40:34):
That runs on the CPUs, you know, so it comes by way of a subscription with support and, you know, you know, everything around helping people with, you know, their deployment stack and the models they're using, etcetera, etcetera.

Kevin Horek (40:48):
Got you. That that makes sense. And is it just like a pay-per-usage kind of thing, or how does that work?

Brian Stevens (40:52):
It's paid per package, so it can be sized based on the type of problem you're solving. So the easiest way, because it accelerates model optimization, so, you know, I mean, it can make a a set of CPUs 4 to 12x faster, is you just take a tiny piece of that, you know, the efficiency

(41:14):
that you give back by way of subscription. But then the value you bring to them is a development partner, you know, and a support partner, that helps them on their journey.

Kevin Horek (41:23):
No. Makes makes a lot of sense. So you've been in
this tech space a long time,some very big roles at some very
big companies. What advice doyou give to people that are
maybe just starting out, peoplethat are been in the industry a
long time, and and just anyother advice that you've kinda
learned along the way that you'dlike to pass on?

Brian Stevens (41:42):
Yeah. I think I think what, you know, what
helped me a lot was, especiallyon the engineering side, is
don't get buried into thetechnology. Really, really
understand, you know, theexperience that you're creating,
you know, for an end user asconsumer enterprise, etcetera.

(42:04):
Right? Doesn't matter.
But just really understand theproblem you're solving, the
experience that your end userwould have in using the
technology and focus and bemanic about that. It doesn't
matter what you build. And Ithink it's I think as as
software developers, it's oftentoo easy to get caught up into
the the software part itself.That's not an ultimate problem

(42:28):
that you're trying to solve for.And so that ends up what it
means is you end up wanting tobecome super user centric.
Right? Right. Conversation, youknow, that's what I always love,
like, all the user research.Like, you wanna be the best way
is to be, I think, be adeveloper, but then innovate and
increment vastly, quickly, youknow, through a direct

(42:53):
relationship with your users andadvance that along a lines that,
make sense for them.

Kevin Horek (43:00):
No. I I think that's actually really good
advice. And I think the nicething about what you just said
is, like, whether you love Appleor hate Apple, they basically
made that their business model.Right? Like, they it's not
perfect, but, like, they werethe one of the first companies
to say, like, we really careabout user experience.
And that doesn't necessarilyalways mean just, like, the

(43:22):
interface. It's like from fromeverything they do, it's all
about the user and the customerand trying to make them happy.
It's not perfect, but, like, Ithink that makes a lot of sense.
Because in my experience, andyou could tell me if you're I'm
wrong here, is it doesn't matterif you have a 1,000 features. If
nobody gets past the sign upbox, it doesn't matter.

Brian Stevens (43:42):
Yeah. Yeah. And people in it that's that's why I
love like, I've always said,like, the user experience starts
with discovery. Yeah. Right.
It's not just like once Iinstalled it and how it works.
And like you said, people often like think, and it must, it must kill you, like, the user experience is just Yep. So mine was, like, how did they discover us? Like, you know, what did

(44:03):
they first see?
How did they try? How did theyprocure? Like, what was the what
was the contracting? I mean, bigpart of work to Google is making
contracting so simple. Like,what is the experience?
Like, the whole user journey,you know, like really matters.
And so I even started, like,you've mentioned apple, like
when I buy a consumer productand I open it up, I like spend

(44:25):
my time on the packaging and, like, how easy is it to open it. Did they get that part right? Like, I just think it's the whole end-to-end experience, and I think too few people think that way, unfortunately.

Kevin Horek (44:39):
Yeah. I I always tell people that I think
everybody at a company should bedoing user experience because
Yeah. If if you don't havecustomers, everybody goes home
or has to get a new job. Like,that's the reality.

Brian Stevens (44:51):
Yeah. We're all I always said we all work for
sales. You know? Yeah. And thenand that was like like, why,
like, you've talked about, like,Apple, like, their early d
school stuff at Stanford wassome of the things that I tried
to emulate, you know, back atRed Hat Interesting.
Long time ago. You know, justlike getting back in touch. You
know what I mean? With with whoyour users are and and and

(45:12):
meeting their needs.

Kevin Horek (45:14):
Sure. But, Brian, we're kinda coming to the end of
the show. So how about we closewith mentioning where people can
get more information aboutyourself, neural magic, and any
other links you wanna mention?

Brian Stevens (45:23):
Yeah. Like like, I know we went through a lot. Like, I think the the the the beginning of the user voyage is is to come to, you know, neuralmagic.com. And, and there you can jump off. Like, you can jump off into the community if you wanna go that direction, or you can just, you know, set up a set up a session, and, you know, one of our engineers or sales

(45:45):
team will kinda take you through what we do and how we can help. And so we're we're very much around meeting people where they wanna be right now and where they wanna be in their journey, and then helping them with that.

Kevin Horek (45:59):
Perfect, Brian. Well, I really appreciate you taking the time out of your day to be on the show, and I look forward to keeping in touch with you. Have a good rest of your day.

Brian Stevens (46:06):
Thanks, Kevin. Enjoy that.

Kevin Horek (46:07):
Thank you. K. Bye.

Intro / Outro (46:09):
Thanks for listening. Please visit our website at buildingthefutureshow.com to join the free community, sign up for our newsletter, or to sponsor the show. The music is done by Electric Mantra. You can check him out at electricmantra.com and keep building the future.