Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Speaker 1 (00:07):
Voices of Video.
Hello Dylan, thank you for joining me today. Can you please start by introducing yourself, a little bit about yourself and also your role at SemiAnalysis?
Speaker 2 (00:28):
I am the chief analyst of SemiAnalysis. SemiAnalysis is a market and semiconductor supply chain research and consulting firm. We focus on everything from the manufacturing process technology all the way through to design, IP and ASICs, and strategy as well, all
(00:50):
surrounding the semiconductor industry.
Speaker 1 (00:52):
So today we are going to discuss the development of ASIC projects in large technology companies, and also the driving factors behind that massive ASIC investment. In fact, Dylan, you came onto our radar screen when you wrote the very famous article about Google's Argos.
(01:14):
Why did that technology intrigue you? Can you give us a brief overview of your thoughts about it, or your findings?
Speaker 2 (01:27):
So Google has created a few pieces of custom silicon, ranging from the tensor processing unit for AI, which is very famous and everyone knows, to one of the lesser-known pieces of silicon they've created for their own in-house needs: Argos. It's an application-specific integrated circuit
(01:50):
called a video coding unit or video processing unit, VCU or VPU. The main idea behind it is that Google has very unique needs, or at least a unique scale, with regards to how much video they ingest and then serve, or the photos they
(02:11):
ingest and serve. They'll get video in all sorts of formats and resolutions uploaded by their consumers to a number of their properties, like YouTube, Google Photos and Google Drive, and they have to stream it back out to these
(02:32):
users. But you don't just store the raw file and then send the raw file, because what if someone wants the highest resolution and someone else has limited bandwidth or a limited data plan? So they have to store it in many different formats, and at the same time, storing all this data across the billions of hours of YouTube that are out
(02:56):
there would be incredibly expensive on a cost basis. Data streaming is expensive, networking is expensive, storage is expensive. So they created a new ASIC called the VCU, Argos, and its whole purpose is to encode video. They encoded video before as well, but they did it
(03:17):
with x86 Intel CPUs. They did it with Intel Skylake, and before that with prior generations of Intel CPUs. The problem with this is that CPUs are much less efficient, especially when you start moving to more advanced video codecs, for example VP9 and AV1,
(03:41):
that save you on storage and save you on bandwidth but involve a lot more processing to encode the video. So these CPUs start to hang up in terms of performance; you start needing so many more CPUs. It's actually a delta of millions of CPUs that Google
(04:01):
needs if they were to encode everything with just CPUs, which is incredibly costly.
Speaker 1 (04:08):
There's also one element about innovation in products, right? So it's not only about cost, it's also about how they can serve customers with new experiences, new features.
Of course, Stadia has already started shutting down, but without the VCU, I believe they couldn't even have started it.
(04:31):
That is, again, yes. So I think you also gathered a lot of very interesting data in your article, talking about different solutions and also the cost per solution, right? Could you give us a little more insight about those
(04:54):
findings? There's a very interesting saying that software is eating the world, but in fact, I think for the video industry it's not only eating the world, it's eating the CPU too, right? So maybe you can talk a little more about that part.
Speaker 2 (05:11):
There's some interesting data we could point to with regards to why the CPU is being eaten. Google's Argos VCU versus a Skylake CPU: the CPU is five
(05:33):
times slower and uses way more power to encode VP9, which has been Google's video codec for the entirety of YouTube for many years.
So, using their performance figures, even if you assume Google
(05:54):
servers are utilized 100% of the time, which is very, very hard to do (no one gets 100% utilization), take all the YouTube, Google Photos and Google Drive video. If you just assume it's all 1080p 30fps and you encode with H.264, which is a decade-old or even older encoding technology,
(06:15):
that's 900,000 Skylake CPUs. That's incredibly costly. Now, if you switch to VP9, which saves you a lot of capacity and bandwidth when you're streaming video, then all of a sudden you're at 4.2 million Skylake CPUs.
(06:37):
Each Skylake server was, you know, $15,000, $20,000, $30,000, $40,000. It starts to add up to billions of dollars. And that's just 1080p 30fps; most people's phones can shoot 4K 60fps,
(06:59):
and a lot of people record at higher resolutions. So if you use 4K 60fps as the assumption, then H.264 is 7.2 million CPUs and VP9 is 33.4 million CPUs. So this is getting to the point where it's just literally impossible to get that many. If you think about it, in 2022
(07:26):
there were about 30 million CPUs shipped in total. So we're talking about the entire capacity of the whole world just for YouTube encoding; not even serving the video, not even any of the search or algorithms or comments or likes or any of that stuff. No, just encoding the video would require the entire world's capacity of CPUs. So the situation is very dire, and that's why Google made their
(07:47):
VCU.
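To make the arithmetic above concrete, here is a minimal Python sketch that only rearranges the CPU counts quoted in the conversation; the ratios are derived here and are not additional data from the episode.

# Back-of-envelope sketch using the quoted CPU counts
# (900k and 4.2M at 1080p30, 7.2M and 33.4M at 4K60).
estimates = {
    ("1080p30", "H.264"): 900_000,
    ("1080p30", "VP9"): 4_200_000,
    ("4K60", "H.264"): 7_200_000,
    ("4K60", "VP9"): 33_400_000,
}

# Codec penalty: how many more CPUs VP9 needs than H.264 at the same resolution.
for res in ("1080p30", "4K60"):
    ratio = estimates[(res, "VP9")] / estimates[(res, "H.264")]
    print(f"VP9 vs H.264 at {res}: about {ratio:.1f}x more CPUs")

# Resolution penalty: 4K60 carries 8x the pixel rate of 1080p30
# (4x the pixels, 2x the frame rate), matching the jump in the quoted counts.
pixel_rate_ratio = (3840 * 2160 * 60) / (1920 * 1080 * 30)
print(f"4K60 vs 1080p30 pixel rate: {pixel_rate_ratio:.0f}x")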
And as we look forward, the volume of video just continues to grow. In fact, more people are uploading video than ever before with the advent of short-form video content, in the form of TikTok and Instagram Reels and YouTube Shorts and so on. Or Twitch: more people are streaming.
(08:11):
With this stuff rising in popularity, being able to store it becomes even more costly. You need to get that file size down, and so the industry has rallied around AV1, which is a new video encoding standard, or codec, and it dramatically reduces the size
(08:32):
of files while maintaining the video quality. But the issue is that it's so costly to compute. I mentioned these numbers of 7 million and 33 million; those numbers would more than double if I think about YouTube with AV1.
(08:54):
It's hard to estimate because newer CPUs are quite efficient, but even with the most current generation of CPUs, not Skylake, which is really old, it would still be something on the order of 70 million servers. That's an incredible amount of compute. That's not even possible for the world to make.
(09:14):
So YouTube wants to move from VP9 to AV1, which they've already started to do with their second-generation version of that chip. They have to use their in-house expertise to design custom silicon for this purpose.
Speaker 1 (09:31):
That's a truly amazing number, and not feasible at all. So they have to find new solutions. In fact, that's also what we hear every day from our customers. They tell us that without the new solution, they couldn't even run their business the way they want to, or need to. So it's literally not possible.
(09:55):
In fact, when we talk about Google and YouTube, when we talk about AV1 or any new codec, we know there's a famous figure that 500 hours of video are uploaded to YouTube every minute. That's from a few years ago, and it only covers the newly ingested videos.
(10:16):
It doesn't cover how many videos they already have. Once a new codec is involved, they need to have every video re-encoded, right? Think about that. They themselves could hardly even count that number.
Speaker 2 (10:32):
That's an important point, right? The numbers I was citing are just what's uploaded today, what's uploaded each year. But there are 15 years of history, right? You're going to want to re-encode that video and crunch down on that storage so you don't have to buy more storage, because storage isn't really improving much in cost. So that's a great point.
Speaker 1 (10:53):
Yeah, exactly. And since you also mentioned Twitch before: last September, you were highly critical of Twitch cutting their revenue splits to creators compared to YouTube, right? Why was that? Could you provide more insight?
Speaker 2 (11:14):
Last September, Twitch made a very controversial move, if you will, by changing the revenue splits with their partners. This was, I think, in late September or October: they cut the revenue split from 70-30 to 50-50, which is
(11:38):
significantly less than YouTube, which has a 70-30 split. These cuts targeted their larger content creators, because Twitch was in a bad place. They have all this video uploaded to them and they have to distribute it, but they couldn't make enough
(12:02):
money to support it. Why? Because their infrastructure was behind where Google's is. Google has superior infrastructure due to their use of the Argos chip, which enables them to give content creators 70% of the revenue they generate rather than 50%. At the same time, YouTube also provides higher quality
(12:25):
video: higher resolutions, higher bit rates, HDR, those sorts of things, even on live video, which Twitch cannot offer because they can't with their CPU architecture. So Twitch needs to move to an ASIC, but they don't have those in-house design capabilities, whereas YouTube has been, you
(12:47):
know, and Google have been, designing their own ASICs for a handful of years for this problem. This has had some really big impacts on Twitch. Sure, they were able to cut the revenue split from 70-30 to 50-50, but some of their biggest content creators moved to YouTube. They switched to streaming on YouTube and they brought, you
(13:09):
know, not everyone, not all of their viewers, but many of their viewers did switch over to YouTube Live. So Amazon and Twitch faced a big financial problem. They either had to go down this route of cutting the revenue splits, or lose streamers; and they lost some streamers anyway. Either way, they were in a lose-lose situation because of
(13:32):
their lack of ASICs and their inferior hardware and server infrastructure.
Speaker 1 (13:39):
Yeah, and also think about if they want to build on that base to create more unique or new experiences for the customers. That would be very challenging with their current infrastructure, right? More interactive content, higher quality videos or new formats of service; that's really challenging.
(13:59):
So we've mainly talked about x86 and the ASIC so far, but in the industry there are still some other solutions, like the GPU or the FPGA. Do you have any other
(14:20):
insights about those hardware approaches that we should discuss here?
Speaker 2 (14:24):
There are some other approaches in hardware out there in the industry. For example, there are some Xilinx FPGAs that target this market a little bit, there are some Intel FPGAs as well, and then there are GPUs from NVIDIA, and to some extent from AMD and now Intel as well, that they
(14:45):
sell into this market. They all claim they can do video encoding, and yes, they do it a bit more efficiently than CPUs, but there are some major limitations. To integrate these into your infrastructure, there are some difficulties with regards
(15:06):
to the software. You can't just put these in and expect them to work right away, because your users send you all sorts of video: all sorts of formats, whether vertical or horizontal, different resolutions, different bit rates, different frame rates. These solutions are typically a little more stringent in what they can take,
(15:27):
or they take a lot of software work to get them to work for these more complex, varied workloads and use cases. So when you look at a Xilinx FPGA or an NVIDIA GPU, you might get better throughput than a CPU, but
(15:48):
you still have a lot of software work. And then, furthermore, when you look at an NVIDIA GPU, how much area is dedicated to video encoding? Less than 10%, actually. Most of the area is dedicated to other forms of compute: the general-purpose graphics processing or render pipeline, or AI and ML.
(16:12):
Something similar occurs with the FPGA, which is not dedicated to video encoding; it doesn't have any area dedicated specifically for video encoding, and it is a less flexible architecture. So what that ends up resulting in is, yes, you get some improvements, but you have to give some away in software,
(16:35):
and you end up with probably a more efficient infrastructure in some capacities, but you're still not bending the cost curve by an order of magnitude. You're still spending a ton of money on encoding video. And furthermore, the availability and cost of each GPU and FPGA is
(16:55):
significantly higher than a CPU. Intel's average sales price for a data center CPU is somewhere in the $700 range, and AMD's is around $1,000; that's last year's data. Whereas NVIDIA's GPUs are significantly more expensive, and Xilinx FPGAs, oh gosh, they're expensive. Something like $10,000 is a more reasonable number for a
(17:19):
high-end NVIDIA GPU or Xilinx FPGA, not a thousand. So you get much better throughput per chip, but then you end up paying more per chip and you have this inflexibility.
Speaker 1 (17:36):
So there are some problems on that front. I think eventually, when you talk about the best solution for the video industry, you have to consider different factors, and the cost per stream or cost per customer is extremely important. To my knowledge, I think the
(17:58):
ASIC is the best way to drive the cost down; that's one thing. Another is that many people talk about video as so interesting, so attractive, right? But industry people sometimes also say that video is the ugly animal. There are so many things that could go wrong, especially when
(18:21):
you talk about live content. It really needs to be a very focused area: you try to improve the quality, try to improve all the features, try to serve the customer better. That has to be your first priority; otherwise it's just mediocre, or not suitable for the high end of the
(18:43):
video industry. That's why I think there needs to be a focus, and I didn't see that in the GPU or FPGA companies. Video for them is still a small piece, and there's no focus.
Speaker 2 (19:00):
Yeah, the lack of focus is important, because the main market for data center GPUs is not video, it's AI and machine learning. The main market for FPGAs is, well, there's not really a single main market for FPGAs, but it's certainly not video; video isn't anywhere near the top of that list. Yes, they can
(19:21):
use it there, but when you look at what they're adapting their next-generation FPGAs for, it's more for 5G signal processing or AI or networking. It's not for video encoding. So these products are going to make
(19:42):
compromises. They're better than a general-purpose CPU, but they're still not changing the cost curve, as Mina mentioned earlier, in a significant way.
Speaker 1 (19:54):
We use FPGAs a lot in our company; we use them for our design work. The FPGA is perfect for small volumes and very unique solutions. You can adapt quickly, well, not quickly enough in fact, but you still have the flexibility to adapt the hardware structure and
(20:14):
to study new features and do simulations; that is good. But once you want to run at scale, or try to make it economically sensible to serve real customers, it's not possible. It's not suited for that purpose at all.
(20:39):
So, Dylan, you're also deep into the changes in the silicon industry, right? What insights do you have about the different strategies that companies are employing for their custom, purpose-built silicon?
Speaker 2 (20:55):
In this industry there are projects, right, in video encoding. Meta is known to have a project working on this; they're not anywhere near as far along as Google is with their Argos chip. ByteDance, the owner of TikTok, also has a project in this space. It's unknown, but it's believed they're
(21:16):
not functional with it yet. We're not quite sure, it's kind of a black box, but they're certainly working on it. And you look around the industry at many other major companies that aren't necessarily semiconductor companies; everyone's making their own chips, right? Apple, Google
(21:37):
, Amazon. Microsoft is working on some. All of these companies are working on it. But in the video encoding market, only Google has really brought it to bear successfully. And you would think, hey, Amazon, they have some of the best custom silicon in the world, right? They have Graviton, they have
(21:59):
the Nitro DPU. Their server infrastructure is really efficient because of these products. But in the video encoding world they haven't deployed anything that enables them, which is why Twitch still has such stringent limits that make it unattractive to some content creators and had some switch to YouTube, right?
(22:20):
For instance, they delete your videos after a certain amount of time, because Amazon can't afford to store them; they're not encoded to a high quality at a small file size. You can't stream at a very high resolution because, again, they can't afford to encode it in real time at a high
(22:40):
quality and low cost, whereas YouTube can. And so Google has been very successful in the market, which has kept them as the leader in most video content, even today. They're gaining some share versus TikTok. TikTok, actually, if you look at the growth over the last six months, has been effectively flat in terms
(23:03):
of watch time, whereas YouTube Shorts, which is still smaller, is growing significantly. And why is that? Because, yes, YouTube has a lot of users, but YouTube is also paying their content creators on Shorts, and they're paying them significantly more than TikTok is.
Speaker 1 (23:19):
Why? Because Google has a more efficient infrastructure, right?
Speaker 2 (23:22):
And it all comes down to these custom-built chips that Google is developing. Their infrastructure is just more efficient, enabling them to do more with less. It's a strategy that has worked really well for them, and one that maybe others want to emulate,
(23:46):
but they haven't been able to emulate it yet.
Speaker 1 (23:50):
It's very interesting. So what you're saying is that the efficiency of their infrastructure also enables them to have a better business model at the upper layer.
Speaker 2 (24:01):
It feeds through. What a lot of people don't realize is that it always feeds through: the business will always depend on the infrastructure below it. You might not realize it, but this is why TikTok has almost no monetization for their creators, because they have to capture it all, and it's believed that TikTok is not even
(24:23):
making much money at all, despite YouTube being a very profitable enterprise. And even YouTube Shorts, and Meta's short-form video, their Instagram Reels and Facebook Reels, don't make money yet, as they've said on earnings calls, which is a
(24:45):
significant deal. Because Facebook, you know, they're working on an ASIC, a specific video encoding ASIC, but they don't have it yet today. So between being able to monetize there, and also just the cost of each video that's uploaded and of serving it,
(25:05):
they're hit by inefficiencies on both fronts. Meta specifically said that they hope to be able to be profitable on Reels next year, but they're not today. And that, coincidentally, lines up with rumors about their ASIC being ready later
(25:26):
this year. And that's if the ASIC works properly on their first shot; there's a good chance an ASIC doesn't work on the first shot, because there are always problems in the semiconductor industry. So one could correlate the fact that their ASIC should be ready later this year with them saying they'll be profitable
(25:47):
next year on Reels, as direct evidence that their platform and their profitability are gated by the lack of their own in-house silicon solution.
Speaker 1 (26:03):
I can't share the customer's name yet, but I can say things are going to change this year. One of the biggest short-video or social media companies is also adopting our solutions, and they are already seeing that they have cut 80% of their operating costs because of this.
(26:23):
So things will change. And then, talking about the big companies and the potential for them to have an ASIC to change their infrastructure, is it the same for Twitter? I'm really interested. They don't have anything for that yet, but Elon Musk is
(26:44):
talking about wanting to build everything into one app, and there is talk about 4K live streaming or HDR on Twitter as well. Do you think it makes sense for them to have that kind of ASIC solution too? It seems obvious, right?
Speaker 2 (27:04):
This is an interesting one. If you look back at the history of short-form video, it wasn't TikTok that made it popular, it was Vine. And then Twitter bought Vine all those years ago, and then they shut it down. Why did they shut it down? Because, infrastructure-wise, it's just not profitable to serve video. And now Elon Musk has bought Twitter and he has, you
(27:25):
know, floated the idea that they're going to bring back Vine, and in fact they've been testing this tab with short-form video sometimes. So the question is, what are they going to do for hardware infrastructure? They use a lot of on-premises infrastructure currently for
(27:48):
Twitter, but the problem is that's adapted for serving text and doing that as efficiently as possible. How are they going to move to video, which is orders of magnitude more volume of data? How are they going to solve that?
(28:09):
Well, they just bought the company, and silicon development timelines take years, multiple years, to come to fruition. It ends up that even if they wanted to launch short-form video content, they may not be able to do it at any reasonable cost until
(28:29):
three or four years from now, if they develop their own in-house solution. So there needs to be a solution in the public market for them. And furthermore, if you think about it, just Meta and TikTok's ByteDance, and every other company serving a lot of short-form video,
(28:50):
or Amazon with long-form video; those three companies, plus a few more in China as well, like Tencent and so on. These companies are all serving tons of video. And then you add Google as well, and that's five companies already serving tons of video today. If all five of them develop their own ASIC solution, that is
(29:13):
hundreds of millions of dollars; at least $100 million of non-recurring engineering expense at each company. So that's $500 million, poof. This is why the semiconductor industry is important. People always talk about the hype:
(29:35):
everybody's going to make their own silicon. Yes, they're going to make their own silicon where they can, where they have the volume to support it. But what if you don't have the volume on day one? Look across the industry: there are five players, that's five hundred million dollars. OK, let's divide that across how many units you need. Maybe it's one hundred million dollars per company, and maybe you only need a million units. That's
(29:58):
OK, that's one hundred dollars per unit that you're spending on non-recurring engineering. And that's not even talking about the cost of the chip, the cost of the memory, the cost of implementation, the cost of software. It becomes too much for each individual company to build. So this is why you need a merchant silicon solution that can say, hey, we'll do that development once,
(30:24):
and we'll actually develop it better, because we know your needs, and your needs, and your needs; we're uniquely suited to each company's needs because we communicate with all of them. So we're more flexible. We have more robust software that's more flexible; it can take more forms of video. Maybe today Meta's goal or Twitter's goal or Tencent's goal is long-form video that's shot only horizontally at
(30:44):
goal is long form video that'sshot only horizontally at
certain resolutions.
But what if, all of a sudden,they want to do vertical video
that's at a different resolution?
Or maybe they want to add afeature that they didn't have
before.
Well, now they need to go backto the beginning of the silicon
timeline and implement it andwait three years and then have
the silicon come out, and nowthey can do that efficiently.
Otherwise they'll do it in avery expensive way.
(31:06):
So this is where merchant silicon that's more flexible comes in. They can take this $500 million and bring it down: hey, we're the only ones spending it, and now we can sell a million units to you, to you, to you and to you.
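A minimal Python sketch of the amortization argument above: the company count, the NRE figure and the unit volumes mirror the round numbers quoted in the conversation and are illustrative, not additional data.

# Five companies each paying their own non-recurring engineering (NRE)
# versus one merchant vendor paying it once and selling to all five.
nre_per_program = 100_000_000   # ~$100M NRE for one in-house ASIC program (round figure from above)
companies = 5                   # Meta, ByteDance, Amazon, Tencent, Google
units_per_company = 1_000_000   # ~1M units of demand each (illustrative)

in_house_total = nre_per_program * companies             # $500M spent across the industry
in_house_per_unit = nre_per_program / units_per_company  # $100 of NRE per unit

merchant_total = nre_per_program                          # spent once by the merchant vendor
merchant_per_unit = merchant_total / (units_per_company * companies)

print(f"In-house: ${in_house_total/1e6:.0f}M total NRE, ${in_house_per_unit:.0f} per unit")
print(f"Merchant: ${merchant_total/1e6:.0f}M total NRE, ${merchant_per_unit:.0f} per unit")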
Speaker 1 (31:22):
And that all of a sudden makes more economic sense. That's a very good point. Every company we've discussed has the need, but they always have to make the decision: make or buy, right? Right now, this seems like a good solution, a good candidate for that, but we'll see how it plays out.
(31:42):
So we've talked a lot about what already exists. Dylan, you have seen so many companies and so many technology solutions, right? So, from your perspective, what's the next custom silicon
(32:03):
for video processing?
Speaker 2 (32:04):
Of course, this is a very loaded question, because there's only one company making merchant silicon for video ASICs, and that's NetInt, of course. But as far as the custom silicon market goes: OK, fine, I'm uploading video and encoding it into a new format, but that's just one use case. It turns out that if I'm on YouTube
(32:27):
, it's great because I can look at the captions. So how are they generating those captions? It's also great because they don't just let people upload videos of war or other bad things, things you wouldn't want to show kids. You don't just let people upload that stuff; they
(32:48):
prevent that. So how do they do that? A lot of this is AI algorithms. How do you do captioning? Well, you run a model that can convert voice to text, and then maybe another model that takes that text and adds the correct punctuation and capitalization,
(33:09):
and so on and so forth. And then, to make sure it's safe for everyone, and to make sure people aren't uploading illegal content and sharing it on your website, because then you're legally at fault, you have to scan every single video. Of course, there's so much video that no person could look at it all. So again you're utilizing AI, and you're doing
(33:32):
detection: hey, are there guns being fired in this video? Are people dying? Is there a lot of blood? Are there illegal acts happening? Are there drugs? All these sorts of things. If those things are happening, we'll review the video further; maybe we do a quick pass now and a closer review later. But every single video needs to be scanned, it
(34:03):
needs to be captioned, and it needs to be analyzed: what content is in the video? Sure, the uploader puts up a title, but there's a lot of other content in there, right? What if someone is searching for a video of something, maybe a machine, maybe a tutorial on
(34:27):
math, but the title doesn't have that word, yet it still shows up in your search? Why? Because when the video is uploaded, they're actually running an algorithm that pulls out metadata. It sees: oh, what are the main topics of this video? Oh, this is about a car, and it's not just about a
(34:47):
car, it's about a Toyota, and it's about the fuel economy and the reliability of the car, and so on. So now, when I search for reliable cars, the video titled "Toyota Camry review" shows up. That's the beauty of YouTube and some of these other video platforms: making content discoverable. Pulling out all this metadata also drives recommendations:
(35:07):
you were watching this video, so why don't we suggest this other video to you? Well, how am I generating that metadata about every single video? These are operations I'm doing on every video, on top of the video encoding itself. Do I encode the video and then run those models on a CPU, or encode the video and then run them on a GPU?
(35:28):
Do I have to do multiple passes? It's very costly, especially when I think about memory and networking costs, reading and writing the data multiple times. The innovations that Google is adding in their next-generation Argos, and I'm sure some of these other custom silicon projects from Meta and
(35:48):
ByteDance, TikTok's owner, are about exactly this: they're adding some AI processing on the chip, and some general-purpose CPUs, just a small amount, so you can do some of these operations at the same time as you're encoding the video. So you're not reading and writing data multiple times, you're not wasting money on networking. All of these costs are saved because you're
(36:11):
putting them on the video encoding ASIC. So now it's not just a video encoding ASIC, really, it's a video processing ASIC. And video processing involves a lot more than just encoding: it involves that detection of illegal content, it involves what content can I advertise alongside this video, what are some related topics
(36:33):
that might work with this video, what's in the video that isn't mentioned in the title. All of these things, captions included, also have to be processed, and that's what the next generation of video processing and encoding ASICs will do.
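A hypothetical sketch, in Python, of the single-pass pipeline described above: encode once and, in the same pass, run captioning, moderation and metadata extraction so the frames are not read and written multiple times. Every helper below is a trivial placeholder standing in for a real model or encoder; none of this is a real Google, Meta or NETINT API.

from dataclasses import dataclass, field

@dataclass
class Analysis:
    captions: str = ""
    topics: list = field(default_factory=list)
    flagged: bool = False

def encode_av1(video):        # placeholder for the hardware encode step
    return {"1080p": b"...", "720p": b"..."}

def speech_to_text(video):    # placeholder speech recognition model
    return "review of the toyota camry fuel economy"

def punctuate(text):          # placeholder cleanup model
    return text.capitalize() + "."

def moderate(video):          # placeholder safety model (weapons, gore, illegal acts, ...)
    return False

def extract_topics(text):     # placeholder topic/metadata model for search and recommendations
    return [word for word in text.split() if len(word) > 4]

def process_upload(video):
    renditions = encode_av1(video)          # ABR renditions for delivery
    result = Analysis()
    result.captions = punctuate(speech_to_text(video))
    result.flagged = moderate(video)        # queue for human review if True
    result.topics = extract_topics(result.captions)
    return renditions, result

print(process_upload(b"raw upload bytes"))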
Speaker 1 (36:48):
Yeah, that's an interesting point as well. When we had the first-generation product, we called it a video transcoder. Then we had the second-generation Quadra products, which we rebranded as a VPU, a Video Processing Unit. That's really to answer a lot of the questions or interesting points you raised in this conversation.
(37:09):
We have many of those features already checked off, but it's up to the customer to discover how to use them. There are so many things we can do for the video part, like how to identify the content, use the content, or even work more generally with the AI part.
(37:32):
There's such a broad range of things we can do now.
Speaker 2 (37:36):
What was the feedback from customers on your first generation, as you were designing the second generation? What was the feedback that made you decide, hey, we need to add all these features? It's obvious today, but this was at least a couple of years ago when you had this input and made the decision. So what was the feedback you got from customers that drove you towards that decision?
Speaker 1 (37:58):
With the first generation, in fact, it was a partial solution; it had its limits. Compared to a CPU it was quite efficient, but compared to the ideal VPU it still had some room to improve, right? The customer feedback was normally in certain areas. One is
(38:19):
that they wanted even higher density and higher performance, and they also wanted new codecs. So we added AV1, and to answer the performance request, we increased the peak performance from 4K60 to 8K60. They also wanted more scaling features, because they have, for example,
(38:43):
the ABR ladder: they need one resolution coming in, which is split and scaled down to different resolutions, so we added very powerful scalers.
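An illustrative Python sketch of an ABR (adaptive bitrate) ladder: one input resolution is split and scaled down to several output renditions. The rungs and bitrates below are generic examples, not NETINT's published configuration.

# One 4K input scaled down to a typical ladder of delivery renditions.
source_width, source_height = 3840, 2160

ladder = [
    # (width, height, video bitrate in kbps), example rungs only
    (1920, 1080, 6000),
    (1280, 720, 3000),
    (854, 480, 1200),
    (640, 360, 700),
]

for width, height, kbps in ladder:
    scale = height / source_height
    print(f"{source_width}x{source_height} -> {width}x{height} @ {kbps} kbps "
          f"(scale factor {scale:.2f})")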
They also want to understand the content in the video. Just like you said, they want to know what happens in the video and
(39:03):
to fully utilize the value of the video, and also to prevent some of the problematic content; that's part of why the Quadra has AI capabilities. There are also some other parts that are not fully utilized yet, like the audio processing, and we also
(39:25):
have a relatively powerful 2D engine, and the DSP can be programmed as well, giving more flexibility to add new features on the fly and serve new requirements. That's the feedback we got from the first generation and put into the
(39:45):
second generation. In fact, we are also in the feasibility phase for the third generation, and we are open to ideas and suggestions. That's why I'm also asking you: is there anything your customers are expecting to see in the future, so we can continue to improve?
(40:07):
Video is a really focused area for us.
Speaker 2 (40:11):
That's super interesting. So your first-generation solution, yes, it encoded video, but it was very stringent in what it could encode, only certain target resolutions, which was fine at that stage. But as you went forward to the second generation, customers demanded: yes, this is an ASIC, but give us
flexibility.
(40:31):
And you know, furthermore, youneeded to add some of these
functions.
Like, you know a little bit ofAI processing so you could
caption a video or you coulddetect if it's illegal content,
or you know a little bit of AIprocessing so you could caption
a video or you could detect ifit's illegal content, or you
know these sorts of things.
And so you added that with yoursecond generation and you're
improving that in your third,you're making the chip you know
the chip much more flexible,right, you know you can just one
(40:52):
video.
You can put it in multipleformats, you can ingest it in
almost any format and put it outin almost any format.
You know you added support forAV1, which is, you know, still
not deployed heavily yet, butit's going to be deployed
heavily, right, I mean,everyone's adopting it.
You know Netflix has saidthey're going to adopt it.
You know, I believe YouTube hassaid they're going to adopt it.
(41:16):
You know there's a lot of firmsthat have said they're going to
adopt AV1.
There's AV1 support in a lot ofdevices, right.
Every Intel, every new IntelCPU, every new AMD CPU, my new.
I just got a new Qualcomm phone.
You know they all have supportfor AV1 decode right, but
encoding is very much.
(41:37):
You know, even where there isencoding support on the newest,
you know, nvidia GPUs or IntelGPUs, it is very limited in
terms of what level it cansupport and how much throughput
it is, because this is such anintensive operation and you know
the main purpose of that chipis not encoding.
So you know you've addedsupport for that as well and
(41:59):
you've raised the throughput andresolutions and flexibility.
Can you talk about, like, thesoftware implementation of it,
of this, right?
You know what are some painpoints, because I've heard a lot
of pain points in the industryabout, you know implementing
ASICs into a process.
You know implementing ASICsinto your workflows, into a
(42:19):
distributed system where I'm getting video from everywhere and exporting video everywhere. Can
Speaker 1 (42:25):
you talk a bit about that? In fact, there are different scales or different layers to the issue when talking about a software solution, a total solution for video or for ASIC solutions in general. The first one, which most people don't notice, is how it can work with the host system, with different operating systems and
(42:47):
different kernels. It's a really painful process for an ASIC, or not only an ASIC, for any hardware, to work with different operating systems. They keep upgrading, right? The kernel always has different versions. When you spend hundreds of millions to develop a driver and
(43:10):
there's an upgrade to a different version, you need to redo it again. That's why, from the beginning, we designed a totally new approach. We call it the computational storage architecture: our VPU sits on top of the existing NVMe driver.
(43:34):
So whatever the operating system or kernel, as long as it supports NVMe SSDs, it can support us. And adapting from one operating system or kernel version to a new one takes us only a few days of work, and that's mainly for the testing
(43:57):
. We want to make sure everything is right, and we have several farms of a few hundred servers running tests 24/7 to make sure it's mature for that system. But for us it's really easy to do, and we can even plug and play
(44:20):
into the system: it recognizes the cards and they can start using them right away. That's the fundamental layer, which is different compared to all the other solutions and much, much more advanced.
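A small illustrative sketch, in Python, of what presenting the card as a standard NVMe device means in practice on Linux: it can be discovered through the kernel's existing NVMe sysfs entries, with no vendor-specific kernel driver. This is generic Linux enumeration, not NETINT tooling.

# List NVMe controllers and their namespaces via the standard sysfs interface.
from pathlib import Path

nvme_root = Path("/sys/class/nvme")
for ctrl in sorted(nvme_root.glob("nvme*")):
    model_file = ctrl / "model"
    model = model_file.read_text().strip() if model_file.exists() else "unknown"
    namespaces = [ns.name for ns in sorted(ctrl.glob(ctrl.name + "n*"))]
    print(f"{ctrl.name}: {model} -> namespaces: {namespaces or 'none'}")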
Speaker 2 (44:33):
Everyone uses NVMe server SSDs, right? NVMe being the protocol for solid-state drives; it's pretty much the dominant one. For at least the last five to ten years, it's been the dominant protocol for
Speaker 1 (44:49):
SSDs.
Speaker 2 (44:49):
And so you're sort of piggybacking off of that infrastructure, saying: hey, our ASIC is just an NVMe device; you write to us and then you read from us, and it just so happens that when you write to us you send us unencoded video, or video that's poorly encoded, and we output the properly encoded video. And
(45:10):
so almost every x86 and even ARM CPU supports NVMe. My laptop has NVMe. Actually, iPhones, people don't know this, but the iPhone's NAND actually communicates with the SoC over NVMe. Of course you're not going to get into an iPhone, but NVMe is
(45:32):
everywhere. So it's an industry standard that you support, and it's in every host system. That's really cool, and it makes the ease of use a lot better.
Speaker 1 (45:47):
Yeah, so we support x86, and we support ARM architecture servers.
We also support the IBM Power series of CPUs as well.
We support Linux, Windows, Mac and also Android systems. So whatever you have, as long as it uses NVMe SSDs, you
(46:08):
can use us. So that's easy. And beyond that fundamental layer, at the software layer, we are working with open-source frameworks like FFmpeg and GStreamer, to integrate fully and seamlessly with them, and
(46:31):
in fact most customers are using these two frameworks for their video workflows. So whatever they have built in software using FFmpeg or GStreamer or similar solutions can easily work with us. You just recompile FFmpeg with our library, that's it
(46:51):
. When you run the code, you just change the pointer from the software encoder to ours, and you're done; you can keep your current workflow as it is. It's that easy. Of course, there are some challenges that we do see with FFmpeg.
(47:11):
FFmpeg was designed around a single thread; it's not built for very intensive, massive parallel processing. So the bigger players, like the top cloud companies, some of whom you already mentioned, will skip the FFmpeg layer and use our API directly. Then they can truly utilize the full potential of the hardware
(47:35):
. But that's a more advanced way of using it.
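A hedged Python sketch of the "change the pointer from software to us" idea in an FFmpeg-based workflow: the command stays the same and only the encoder name changes. "libx264" is FFmpeg's stock software H.264 encoder; the hardware encoder name below is a placeholder, since the real name depends on the vendor library FFmpeg is recompiled against.

import subprocess

def encode(encoder: str, output: str) -> None:
    # Same FFmpeg command either way; only the -c:v encoder selection changes.
    subprocess.run(
        ["ffmpeg", "-y", "-i", "input.mp4", "-c:v", encoder, "-b:v", "4M", output],
        check=True,
    )

encode("libx264", "out_software.mp4")                   # CPU software path
encode("hw_h264_encoder_placeholder", "out_asic.mp4")   # placeholder name for the ASIC-backed encoder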
Speaker 2 (47:39):
FFmpeg is sort of the industry standard, right? Like, if I wanted to encode some video that I had, I'd use FFmpeg on my Windows machine. So this is not only for an application like the cloud; this could also proliferate down to some security applications, where I have dozens or hundreds of cameras in a building.
(48:02):
Instead of having to encode them on so many CPUs, I could use maybe one or just two NetInt ASICs and encode all of those videos at once, and it would work even on a Windows system. So you don't have to upgrade a lot of your infrastructure: you plug in an ASIC, and again, almost every CPU,
(48:22):
every CPU.
Even, you know, going back, youknow even desktop CPUs.
For a decade supported NVMe.
So it's not a big difficulty toget this up and running.
Speaker 1 (48:33):
Yeah, exactly. That's also one of the use cases our customers are starting to adopt. Whatever cameras they have, old ones or new ones, they can aggregate the video streams to a server with our cards. They can transcode them and also analyze them at the same
(48:56):
time, and then they can compress it and either store it locally or stream it out to the central data center for further analysis, or for first responders to watch the video live. That's a very typical use case.
Speaker 2 (49:15):
So your second generation has some AI processing, and the third generation has even more. So, as you mentioned, you could take the video from the cameras and encode it, or you could also run inference and say, hey, there's somebody on the screen, or hey, the screen is changing, okay, we'll store this video but we won't store the rest of it. Or hey, there's somebody that looks suspicious on the screen,
(49:38):
we'll alert the authorities automatically, we'll alert our reviewer automatically, rather than having to wait and say, oh no, what was stolen? Oh well, we can look back at the video. No, we can preempt this.
Speaker 1 (49:55):
Yeah, and there are in fact some features that people haven't had a chance to use, or haven't realized the value of yet. Like we talked about the NVMe protocol, right? By designing it that way, we can also have a box of our cards just
(50:17):
connect to the host through NVMe over Fabrics. That way, a pool of these resources can be shared by not only one host but the whole data center; they can access the VPUs like local resources. By doing that, they can further improve the efficiency of their resource utilization. So that's a pretty big thing, I'd say, for the hyperscalers,
(50:43):
and I think that will be very valuable to the customer, especially since, alongside the video encoding and decoding part, you also have the AI part shared at whole-data-center scale. That is just, I think, a hidden jewel that nobody
(51:03):
has really had a chance to use yet.
Speaker 2 (51:06):
So the flexibility is there. There are some unique use cases, especially in retail and manufacturing, where you're going to record a lot of video and you want to take actions based on what's in the video. And you look at smart cities. We talked a lot about content as a consumer earlier in the show,
(51:26):
what happens with YouTube, what happens with Instagram Reels and TikTok, and forms of Twitch-style video streaming, all these sorts of things. But we didn't talk about the use cases beyond that, in the smart city, in manufacturing and in industrial.
(51:47):
Every traffic light could potentially, in the future, have cameras on it, and you can stream it all and have maybe a single encoding chip serve a whole store or a whole block.
Speaker 1 (52:04):
Yeah, and there are, in fact, some other applications that normal people may not notice, like virtual desktop infrastructure. People aren't using their own local machine; they're using servers a few miles away or hundreds of miles away. It's all
(52:25):
video streams, streamed to your device. You can work from a hotel, from home or wherever, with the full performance of a server in the data center. All you need is a mouse, a keyboard and a bigger display. Okay, I think our time is up.
(52:46):
So thank you, Dylan. It was great to have you here and to hear your valuable insights about the industry. Thank you very much.
Speaker 2 (52:57):
Thank you as well, Alex. I look forward to chatting with you again.
This episode of Voices of Video is brought to you by NetInt Technologies. If you are looking for cutting-edge video encoding solutions,
Speaker 1 (53:11):
check out NetInt's products at netint.com.