Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Speaker 1 (00:00):
Happy Thursday, September eleventh, twenty twenty five. You're listening to
the Blue Lightning AI Daily podcast, and yes, this episode
was made with Microsoft VibeVoice 7B. I'm Zan,
and today we're digging into Nvidia's new Rubin CPX chip,
which is basically hardware for AI that can remember your whole movie.
Speaker 2 (00:18):
I'm Pepa, and oh my gosh, this is spicy. Nvidia is
like, what if your model doesn't forget the
middle of your sentence? Except the sentence is a forty
minute video with a musical number and three plot twists.
Speaker 1 (00:29):
The headline: Rubin CPX is a purpose-built inference GPU
tuned for massive context across video, audio, and even code.
Think models running minutes to hours without losing the thread.
Nvidia's positioning is million-token, hour-scale workflows. That's
straight from their launch materials and press briefings. GlobeNewswire
(00:51):
had the press release on September ninth.
Speaker 2 (00:53):
The million-tokens bit is wild. If you've ever chunked
a long podcast or series of clips and then tried
to stitch it all back, the vibes are never the same.
This chip is like, no seams, we remember everything.
Speaker 1 (01:05):
What's actually new? Two big things. One, it fuses high-throughput
media engines with transformer inference on the same device,
so decode, encode, and reasoning are neighbors on silicon. Two,
it's tuned for long sequences with updated CUDA and TensorRT
schedulers to keep latency predictable as context grows.
Speaker 2 (01:24):
Translation: it doesn't just go big on TOPS, it goes
smooth. Less "oops, we dropped the character arc in minute
twenty seven." Specs?
Speaker 1 (01:31):
They shared up to thirty petaflops in Nvidia's
NVFP4 precision, one hundred and twenty eight gigabytes of
GDDR7 per device, and dedicated concurrent video decode and encode blocks.
The goal is to keep timelines in memory from ingest
to render. The press release confirms that; Tom's Hardware also breaks
down the architecture shift.
Speaker 2 (01:50):
And here's the architecture bit: Nvidia's going split brain.
CPX handles the compute-heavy context phase in a disaggregated inference setup,
compute-optimized versus bandwidth-optimized silicon. Tom's Hardware called that out: scale
flops and memory independently, so you're not overbuying one to
get the other, exactly.
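For the curious, here's a minimal conceptual sketch of what that split-brain routing means in practice. The function and pool names are hypothetical, not anything from Nvidia's stack; the point is just that the context (prefill) phase is compute-bound while token-by-token decode is bandwidth-bound, so a disaggregated setup sends each phase to the silicon built for it.

```python
# Conceptual sketch of disaggregated inference routing -- hypothetical names,
# not an NVIDIA API. Context (prefill) work is compute-bound, while
# token-by-token decode is memory-bandwidth-bound, so each phase goes to
# the pool built for it.

from dataclasses import dataclass

@dataclass
class Request:
    prompt_tokens: int      # e.g. a full video timeline or repo, tokenized
    max_new_tokens: int     # tokens to generate after the context is processed

def route(request: Request) -> dict:
    """Assign each phase of one request to the silicon suited for it."""
    return {
        # Prefill crunches the whole (possibly million-token) context once:
        # heavy matrix math, so it goes to compute-optimized devices (CPX-style).
        "prefill": {"pool": "compute_optimized", "tokens": request.prompt_tokens},
        # Decode streams one token at a time and mostly reads the KV cache:
        # bandwidth-bound, so it goes to HBM-heavy, bandwidth-optimized devices.
        "decode": {"pool": "bandwidth_optimized", "tokens": request.max_new_tokens},
    }

if __name__ == "__main__":
    plan = route(Request(prompt_tokens=1_000_000, max_new_tokens=4_096))
    for phase, placement in plan.items():
        print(phase, placement)
```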
Speaker 1 (02:06):
Then they wrap it into a rack: the Vera Rubin
NVL144 CPX platform. Data Center Dynamics reported up
to eight exaflops of NVFP4 per rack, roughly one hundred
terabytes of fast memory, and about one point seven petabytes
per second of intra-rack fabric bandwidth, liquid cooled.
Speaker 2 (02:23):
Obviously. Petabytes per second sounds like the Fast and Furious
of fabrics: family, but for frames.
Speaker 1 (02:30):
Hah. Now, why creators should care: long-form coherence. If
you're doing generative video, AI dubbing, or translation that preserves
performance and pacing, CPX is the hardware that lets the
model keep tone and timing across the entire edit.
Speaker 3 (02:44):
So who's this really for today?
Speaker 2 (02:45):
It's data centers and the big platforms that power your
favorite tools, but the downstream is you: YouTubers, streamers, podcasters.
This is what makes an hour-long AI assist actually usable,
not stitched chaos.
Speaker 1 (02:57):
It also targets code copilots that understand entire repos.
Million-token inference means read the whole project, not just
a file. That's in the launch messaging.
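To make "read the whole project" concrete, here's a rough back-of-envelope sketch for sizing a repo against a million-token window. The four-characters-per-token ratio is an assumed heuristic, not a real tokenizer, and the file extensions are just examples.

```python
# Back-of-envelope check on whether a whole project fits in a
# million-token window. ~4 characters per token is a rough assumption
# for code; real tokenizers vary.

from pathlib import Path

CHARS_PER_TOKEN = 4  # rough heuristic, not a real tokenizer

def estimate_repo_tokens(root: str, exts=(".py", ".js", ".ts", ".md")) -> int:
    """Sum characters across source files and convert to an approximate token count."""
    total_chars = 0
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in exts:
            total_chars += len(path.read_text(errors="ignore"))
    return total_chars // CHARS_PER_TOKEN

if __name__ == "__main__":
    tokens = estimate_repo_tokens(".")
    print(f"~{tokens:,} tokens; fits in a 1M-token context: {tokens <= 1_000_000}")
```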
Speaker 2 (03:08):
Use cases time. For a TikToker, you could feed a
week's worth of clips and have the model keep your
brand voice across the whole montage. For a podcaster, end-to-end
dubbing into Spanish that keeps your cadence, plus
SFX timing. For a filmmaker on set, continuity checks from
a model that remembers wardrobe, props, and blocking across scenes.
That last one, dreamy.
Speaker 1 (03:28):
Does it speed up editing or just improve output quality?
Both. The integrated media engines keep frames flowing while the
transformer reasons, so less shuffling between chips, less I/O thrash. That's
tangible latency you feel as an editor.
Speaker 3 (03:42):
Hmmm, could this replace something?
Speaker 2 (03:44):
If you're a platform, maybe you ditch separate video transcode
boxes for some workflows. For a solo creator, it replaces headaches: fewer
weird seams, fewer "why did the AI forget my intro gag?"
Speaker 1 (03:54):
Must have or nice to have? For short clips, nice
to have. For long-form video, dubbing, and multi-episode arcs,
it's edging into must-have territory because coherence sells. If
the AI ruins continuity, the audience bounces.
Speaker 3 (04:09):
And the money angle? What did Reuters say?
Speaker 2 (04:11):
Nvidia pitched a wild ROI: a hundred million dollar
Rubin-class deployment underpinning up to five billion dollars in
token-metered revenue over its life cycle. That's them saying
long sessions are where the money is.
Speaker 1 (04:22):
It's an aggressive projection, but the logic is sound. As
context windows stretch, session value goes up: more minutes, more tokens,
more revenue. Again, that's Reuters' reporting.
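For scale, here's the arithmetic behind that claim. The deployment cost and revenue figures are the ones reported; the per-token price is purely an assumed number to show what token volume that revenue would imply.

```python
# The claim as reported: a $100M Rubin-class deployment could underpin up to
# $5B in token-metered revenue over its lifetime. The implied multiple is
# simple division; the per-token price below is an assumed figure used only
# to illustrate the token volume that revenue would correspond to.

deployment_cost = 100e6          # $100 million (reported figure)
lifetime_revenue = 5e9           # $5 billion (reported upper bound)

multiple = lifetime_revenue / deployment_cost
print(f"Implied revenue multiple: {multiple:.0f}x")   # 50x

assumed_price_per_million_tokens = 10.0   # assumption: dollars per 1M tokens
tokens_metered = lifetime_revenue / assumed_price_per_million_tokens * 1e6
print(f"Tokens metered at that price: {tokens_metered:.2e}")  # ~5e14 tokens
```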
Speaker 3 (04:31):
Availability check. Can we use it right now?
Speaker 2 (04:35):
No. Early access with select partners; broader availability near the
end of twenty twenty six. Pricing not disclosed. That's per
the press release and Data Center Dynamics. So...
Speaker 1 (04:45):
Expect hyperscalers first, then your favorite creative tools get CUDA
and TensorRT updates to expose the long-sequence superpowers. Rollout
probably mirrors how Blackwell features trickled into mainstream
tools: quietly, then suddenly.
Speaker 2 (04:57):
Everywhere. Desktop or mobile? This is data center land.
You'll feel it through cloud tools and your editor's
"keep style across film" toggle.
Speaker 1 (05:05):
Lining up the competitive field: AMD's MI series, custom TPUs from the
hyperscalers, and specialized inference hardware like Groq all smell opportunity
in long context. But Nvidia's advantage is the full
stack: CUDA, TensorRT, and now silicon that treats media
I/O as a first-class citizen. Tom's Hardware called out
that media integration as core to CPX's positioning.
Speaker 2 (05:26):
If I had to meme this update: "when your AI
finally remembers season one," captioned over a corkboard with red string,
except CPX just turns it into a neat Notion doc.
Speaker 1 (05:36):
Risks and limitations. One, it's inference only; this isn't your
training monster. Two, software support is the gating factor. If
tools don't expose million-token sessions or inline video pipelines,
creators won't feel the magic. Nvidia even hinted the software
story is the open question.
Speaker 2 (05:51):
Also, guardrails. Long context can be awesome, but platforms will
still cap tokens, watermark, or limit export quality based on plan.
The chip doesn't decide the watermark.
Speaker 1 (06:00):
You know the drill: token caps. Yes, Nvidia's marketing
says optimized for million-token context, but whether you get that
depends on your provider's limits and cost model. Longer sessions
burn more tokens.
Speaker 2 (06:10):
Workflow impact. For a solo creator, how much time
could it save? If your project is forty five minutes
and you currently chunk it into nine pieces, wrangle prompts per chunk,
then reconcile style drift, I'd say hours back and less
mental load.
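Here's a rough sketch of why a forty five minute timeline pushes into million-token territory and what the chunked workflow looks like by comparison. The tokens-per-second figure is an assumption for illustration; real video tokenizers vary widely.

```python
# Rough sketch of why a 45-minute timeline lands in million-token territory,
# and what chunking it into nine pieces costs by comparison. The
# tokens-per-second figure is an assumed illustration, not a measured value.

TOKENS_PER_SECOND = 400          # assumed context cost of one second of video
project_minutes = 45
num_chunks = 9

total_tokens = project_minutes * 60 * TOKENS_PER_SECOND
tokens_per_chunk = total_tokens // num_chunks

print(f"Whole project in one context: ~{total_tokens:,} tokens")      # ~1,080,000
print(f"Chunked workflow: {num_chunks} passes of ~{tokens_per_chunk:,} tokens each,")
print("plus a reconciliation pass to smooth style drift at every seam.")
```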
Speaker 1 (06:25):
For studios doing localization, the benefit is bigger: one-pass
dubbing that preserves timing, emotion, and lip rhythm across the
full timeline. The on-die encoders and decoders reduce the ingest-render
ping pong, which is a real tax today.
Speaker 2 (06:38):
Could we imagine using it in our own projects? One
hundred percent. I want the show bible in context while
we generate B-roll, lower thirds, titles, and a coherent
cold open that calls back to the ending. Let the
model keep the joke alive for the whole episode.
Speaker 1 (06:52):
Trend-wise, this fits the larger shift from pure compute
races to memory- and I/O-centric design. Data Center Dynamics and
Tom's Hardware both highlighted that disaggregated approach: independent scaling of
compute and bandwidth, and racks that treat memory as a
first-class resource.
Speaker 2 (07:07):
And it lines up with what we've been saying on
the blog for months: long context is the new battleground.
Not just speed but story, not just frames but flow.
Speaker 1 (07:16):
Benchmarks? None published yet beyond Nvidia's own positioning. We'll watch
for third-party latency and cost-per-hour-of-context numbers
once partners go public.
Speaker 2 (07:24):
Price and value? No numbers, but we can read between
the lines. This is premium data center gear. For creators,
the value shows up as the tool you already use
got a lot smarter, not a new line item on
your credit card, unless you're paying per minute for AI features.
Speaker 1 (07:39):
Ecosystem check: closed or open? It's Nvidia's stack, CUDA
and TensorRT, so very Nvidia. That said, ISVs will
build on top, and you'll access it through your NLE,
DAW, or cloud pipeline.
Speaker 2 (07:51):
Okay, rapid-fire scenarios. A TikToker: use it for a
weekly narrative recap where the AI keeps your editing cadence
consistent across thirty minutes of cuts. A podcaster: instant multilingual
dubs that preserve your comic timing, not just words. A
graphic designer: story-consistent captions and motion graphics that match
episode themes. A developer: a repo-wide assistant that understands your
(08:13):
whole codebase, not just the file you opened.
Speaker 1 (08:16):
Filmmaker on set: feed dailies into a continuity checker that
flags mismatched props and pacing shifts, or do live tone
matching for ADR so it fits the prior scenes.
Speaker 2 (08:25):
Will this threaten anyone? Honestly, any startup whose secret sauce
is stitching long context with clever chunking. If CPX plus
the new schedulers make long sequences cheap and clean, that
moat gets shallow.
Speaker 1 (08:37):
Competitively, it puts pressure on AMD and the TPU crowd to
match media-integrated inference at scale. If Nvidia's media-plus-transformer
pairing reduces latency and costs, that's a compelling path
for creative AI infrastructure.
Speaker 3 (08:49):
What's the catch for creators? Waiting?
Speaker 2 (08:51):
We're in early-access land till late twenty twenty six,
but we've seen this pattern: dev kits roll out, CUDA updates drop,
and suddenly your editor has a beta toggle called "content
lock" or something.
Speaker 1 (09:01):
Another catch: providers might meter long context hard. Expect features
like "keep style across film" to be on higher tiers.
The economics are too juicy to just give away; Reuters
basically spelled out that longer sessions are the revenue engine.
Speaker 2 (09:13):
If the vibe had a slogan: hold the whole story.
That's the promise. And meme number two: no more amnesia edits.
Speaker 1 (09:22):
What we'll be watching: one, which partners get early access,
streaming platforms, big NLEs; two, whether the CUDA and TensorRT long-sequence
schedulers ship with sane defaults so devs don't need
PhDs in attention maps; three, real-world latency at hour scale.
Speaker 2 (09:37):
Also curious who ships the first showpiece demo: a feature-length
AI doc edited with context continuity on. The moment we
see that, game on.
Speaker 1 (09:45):
Quick source roll call for the curious. Product reveal and specs
were in Nvidia's GlobeNewswire press release on September ninth. Disaggregated
architecture and rack details were covered by Tom's Hardware. Data Center
Dynamics reported the NVL144 CPX rack numbers:
eight exaflops NVFP4, one hundred terabytes of memory, one
point seven petabytes per second of fabric. And Reuters had the economics claim:
(10:07):
a one hundred million dollar deployment could drive up to
five billion dollars in token-metered revenue.
Speaker 2 (10:13):
Yep, and we'll keep tracking how fast the software side
lights up. The second your favorite editor exposes an hour-scale
style lock, you'll hear us screaming about it.
Speaker 1 (10:21):
Final take: small tweak or game changer? For long form,
it's a game changer if the software catches up. For
short clips, it's a quiet quality bump.
Speaker 2 (10:29):
Speed, quality, creative control? This round leans coherence. Keep
the story intact, then make it fast. I'm here for
it.
Speaker 1 (10:37):
All right. That's our breakdown of Nvidia Rubin CPX and what it means
for creators. Thanks for hanging with us on the Blue
Lightning AI Daily podcast.
Speaker 2 (10:46):
Go hit blue Lightning tv dot com for news updates
and video tutorials on your favorite AI tools.
Speaker 3 (10:51):
We've got you covered.
Speaker 1 (10:53):
Appreciate you listening. Catch you later.