Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Speaker 1 (00:00):
Happy Saturday, and welcome to Blue Lightning AI Daily, September 6, 2025. I'm Zan, and today's episode was made with Microsoft VibeVoice 7B. We're digging into Alibaba's new Qwen3 Max Preview (Instruct), a trillion-plus parameter LLM that just dropped into preview. Pippa, big model energy.
Speaker 2 (00:19):
Today, big big. Hey, y'all, it's Pippa. A trillion with a T, that's like Black Mirror lab budget numbers. But for real, does scaling up actually help creators?
Speaker 1 (00:28):
Or is this just a flex? Good question. Alibaba's pitch is that scaling works. They're saying this preview is stronger at instruction following, long multi-turn chats, and agentic stuff like planning and tool use. And it's live now in Qwen Chat and via the Alibaba Cloud API. That's straight from Alibaba's docs and echoed in VentureBeat's write-up.
Speaker 2 (00:48):
So what's actually new new? Because Qwen3 already had that 235B MoE model, right? Right.
Speaker 1 (00:54):
The earlier Qwen3 235B had around 22B active parameters per token because it's a mixture-of-experts design. This new Qwen3 Max Preview sits above it in the lineup and claims better instruction adherence, more stable long-context conversations, and more reliable tool calling. Alibaba says it beats their previous best in internal benchmarks. Caveat: those are Alibaba's numbers.
(01:15):
No third-party testing yet. VentureBeat and the Alibaba docs both stress it's a preview.
Speaker 2 (01:20):
Okay, so vibe check. Is this a tiny tweak or a game changer?
Speaker 1 (01:24):
For pure scale, it's a big swing. In practice, for creators, the wins are about reliability. If the model follows instructions closely and doesn't derail in a sixty-message thread, you do fewer retries. That's a workflow game changer if you're chaining steps like research to script to shot list to social edits without babysitting the facts.
Speaker 2 (01:42):
Fewer "argh, read the prompt again" moments. And the agentic angle: if it can plan steps and call tools, that's like having a junior producer who actually remembers the brief.
Speaker 1 (01:53):
Exactly. The preview emphasizes better planning and sequencing, and it's positioned right in the 2025 arms race around reasoning and agents. VentureBeat also notes a huge context window, about 262K tokens, and support for context caching, which helps with those long multi-asset projects.
Speaker 2 (02:09):
Hold up, 262K? That's like stuffing an entire brand playbook, a season's worth of scripts, and your messy Google Docs into one session. Wild.
Speaker 1 (02:20):
Within limits, yes. VentureBeat outlines max input around 258K and max output roughly 32K tokens. In plain English: long memory, big dumps of reference material, and long replies if you need them.
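A rough way to sanity-check that envelope, assuming the 258K input and 32K output figures above and a crude four-characters-per-token estimate (Qwen's real tokenizer will count differently):

# Rough check: does a big reference dump fit the reported input budget?
# Limits are the figures VentureBeat reports; the chars-per-token ratio is a guess.
MAX_INPUT_TOKENS = 258_000
MAX_OUTPUT_TOKENS = 32_000

def rough_token_count(text: str) -> int:
    # ~4 characters per token is a common English heuristic, not Qwen's tokenizer
    return len(text) // 4

def fits_input_budget(*documents: str) -> bool:
    total = sum(rough_token_count(doc) for doc in documents)
    print(f"Estimated prompt tokens: {total:,} of {MAX_INPUT_TOKENS:,}")
    return total <= MAX_INPUT_TOKENS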
Speaker 2 (02:31):
Who's this really for? YouTubers grinding daily, podcasters, agencies?
Speaker 1 (02:36):
I'd split it like this. Daily YouTubers: big win on faster ideation to scripts to thumbnails, especially if you maintain tone across episodes. Podcasters: strong for research briefs, outlines, show notes, and making clip-ready timestamps if you pair it with a tool that reads transcripts. Photo and video teams: pre-production planning, shot lists, B-roll suggestions, metadata tagging, then
(02:57):
hand off to your editor or your AI gen stack. Agencies and brand teams: long-running voice consistency and multi-channel orchestration.
Speaker 2 (03:04):
And hobbyists? Or is this pro studio only, bring a budget?
Speaker 1 (03:08):
You can try it in Qwen Chat right now, so hobbyists can poke at it. But the API pricing matters if you're scaling. VentureBeat reports tiered pricing per million tokens, starting around $0.861 for input and $3.441 for output in the zero to 32K tier, and climbing as you go deeper into the long-context tiers. Bigger prompts cost more for big campaigns.
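For budgeting, here's a back-of-the-envelope sketch using only the zero to 32K tier rates quoted above; the deeper long-context tiers are priced higher, so treat anything past that as an underestimate:

# Back-of-the-envelope cost estimate at the 0-32K tier rates quoted above.
# Rates are USD per million tokens; longer-context tiers cost more than this.
INPUT_RATE_PER_M = 0.861
OUTPUT_RATE_PER_M = 3.441

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens / 1_000_000) * INPUT_RATE_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_RATE_PER_M

# Example: a 20K-token brief in, a 2K-token script out
print(f"${estimate_cost(20_000, 2_000):.3f} per run")  # roughly $0.024 at these rates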
Speaker 2 (03:28):
That adds up. Compared to the field, how spicy is that price?
Speaker 1 (03:32):
VentureBeat's pricing roundup has OpenAI's GPT-4.1 around two dollars per million input and eight dollars per million output, so Alibaba undercuts on the small-to-moderate input tier, especially if you don't push the full 262K context. There are cheaper options out there, like Writer's Palmyra X5 or even MiniMax in some contexts, but capability and reliability on
(03:53):
complex chains can justify paying more.
Speaker 2 (03:55):
Mm. So not bargain bin, but not luxury luxury either. Kind of aggressively competitive for the size.
Speaker 1 (04:03):
That's fair. Also, Alibaba's ecosystem angle matters: Model Studio, Qwen Chat, and, per the NVIDIA Developer Blog, deployment plays nicely with frameworks like TensorRT-LLM, vLLM, and SGLang. If you're building production agents, infra support is a plus.
Speaker 2 (04:16):
Okay, let's talk vibe in the edit bay. Does this
actually speed up the grind or just make outputs prettier?
Speaker 1 (04:22):
The speed gain is indirect: fewer prompt loops and steadier long threads. If you typically do five retries to lock tone and structure and you drop that to one or two, you save real time. For a solo creator, that might be thirty to sixty minutes per video, depending on the task chain.
Speaker 2 (04:36):
And with agentic workflows, imagine it pulling research, drafting a script, making a shot list, then spitting out TikTok hooks and YouTube descriptions. If the tool calling is reliable, that's a must-have, not a nice-to-have.
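As a sketch of what that kind of chain could look like in practice, here's a minimal sequential pipeline where each step feeds the next; call_model is a hypothetical stand-in for whichever client you use, and the prompts are purely illustrative:

# Minimal research -> script -> shot list -> social chain.
# call_model() is a hypothetical placeholder; wire it to your LLM client.
def call_model(prompt: str) -> str:
    raise NotImplementedError("connect this to your model API")

def run_chain(topic: str, brand_voice: str) -> dict:
    research = call_model(f"Summarize key facts and angles on: {topic}")
    script = call_model(f"Write a 3-minute video script in this voice:\n{brand_voice}\n\nResearch:\n{research}")
    shots = call_model(f"Turn this script into a numbered shot list with B-roll ideas:\n{script}")
    social = call_model(f"Write 5 TikTok hooks and a YouTube description for:\n{script}")
    return {"research": research, "script": script, "shots": shots, "social": social}

The value of a more reliable model here is simply fewer broken links in that chain.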
Speaker 1 (04:48):
Agreed, reliability is the key word. Alibaba is saying preview performance is better, but we need independent benchmarks to confirm. They're coming, presumably. Until then, treat it as promising but unverified outside Alibaba's tests.
Speaker 2 (05:02):
Comparisons: how does this stack up to the earlier Qwen3 235B?
Speaker 1 (05:04):
Five B on Alibaba's internal charts. Max Preview outperforms two
thirty five B on instruction following multi turn stability and
agent reliability. The message is same, DNA scaled up behaves better. Also,
the two thirty five B MEOE is already efficient per token.
Max Preview is about peak capability.
Speaker 2 (05:22):
And against the big three, OpenAI, Google, Anthropic, where's the heat?
Speaker 1 (05:27):
The 2025 meta is long context plus agents plus reasoning. Alibaba's pushing scale and context and claiming fast responses, per VentureBeat. If third-party tests bear it out, it puts pressure on everyone, especially for multilingual, multi-turn brand work. But we need those external benchmarks before we declare a leaderboard shake-up.
Speaker 2 (05:44):
Pop culture take: this is like Instagram dropping Stories. Suddenly everyone needs longer context and reliable agents baked in. Meme caption: me handing my brand guide and six months of content to one chat, don't fumble the bag.
Speaker 1 (05:57):
Aha. And I'll add, this could signal where content ops are heading: less micromanaging steps, more orchestrating end-to-end flows.
Speaker 2 (06:05):
Risks: what's the watch-out?
Speaker 1 (06:07):
A few. It's a preview, so performance could change. Stability under heavy real-world workflows isn't proven yet. Cost balloons with very large contexts; if you stuff 200K tokens in every time, you'll feel it. Tool-calling reliability needs validation outside demos. And, as always, check for safety guardrails that might constrain edgy creative briefs. Alibaba says more tuning is coming at general release.
Speaker 2 (06:28):
Token cap specifics again?
Speaker 1 (06:30):
VentureBeat cites a 262K context window, with roughly 258K input and 32K output max. That's the working envelope.
Speaker 2 (06:39):
Access: can people try it today, or are we doing the waitlist dance?
Speaker 1 (06:43):
You can use it now in Qwen Chat for hands-on. For production, Alibaba Cloud's Model Studio has the API; documentation and model IDs are in the Alibaba Cloud docs. Beginner friendly? Chat is totally beginner friendly, and the API is standard OpenAI-style patterns via Model Studio and the DashScope SDK, per Alibaba's docs. If you've integrated a modern LLM before, you'll be fine.
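For the curious, a minimal sketch of that OpenAI-style pattern; the base URL, model ID, and environment variable name below are assumptions to verify against Model Studio's documentation before you rely on them:

# Sketch of calling the model through Alibaba Cloud Model Studio's
# OpenAI-compatible endpoint. Base URL, model ID, and env var name are
# assumptions; confirm all three in the Alibaba Cloud docs.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DASHSCOPE_API_KEY"],  # assumed env var name
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",  # assumed endpoint
)

response = client.chat.completions.create(
    model="qwen3-max-preview",  # assumed model ID; check Model Studio
    messages=[
        {"role": "system", "content": "You are a careful scriptwriting assistant."},
        {"role": "user", "content": "Outline a 60-second product explainer."},
    ],
)
print(response.choices[0].message.content)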
Speaker 2 (07:03):
Fine, fine. Real creator scenarios, hit me. TikToker: give me five hook angles, three B-roll beats per hook, and captions with emojis aligned to this brand voice doc, then chain it to a scheduler. Podcaster: read this sixty-page prep doc, build my episode outline, section music cues, and write clip titles with timestamps after I upload a transcript.
(07:26):
Filmmaker: compare this treatment to our lookbook and output a location-specific shot list plus continuity notes for day two.
Speaker 1 (07:33):
Exactly. And brand teams can run week-over-week campaigns in the same thread, keeping tone consistent across emails, socials, shorts, and site copy. That's where long context plus steadier instruction following pays off.
Speaker 2 (07:45):
You mentioned infra. NVIDIA posted about deploying Qwen3 with TensorRT-LLM and friends. That's enterprise-folks-love language: speed and throughput.
Speaker 1 (07:55):
Yep, NVIDIA's developer blog highlights that path. For creators using cloud tools, it mostly means lower latency and more reliable scaling behind the scenes, if your platform adopts it.
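If your team is on that self-hosted path, here's a minimal vLLM sketch for an open-weight Qwen3 checkpoint, which is the route the NVIDIA post covers; the model ID is an assumption, and Max Preview itself is accessed through the hosted API rather than this way:

# Self-hosting an open-weight Qwen3 checkpoint with vLLM (sketch).
# Model ID and sampling settings are assumptions; pick the size your GPUs fit.
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen3-8B")  # assumed Hugging Face model ID
params = SamplingParams(temperature=0.7, max_tokens=256)

outputs = llm.generate(["Draft three B-roll ideas for a coffee-shop vlog."], params)
print(outputs[0].outputs[0].text)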
Speaker 2 (08:04):
Are there early benchmarks outside Alibaba?
Speaker 1 (08:07):
Not yet, that we've seen. VentureBeat calls out Alibaba's internal results and the preview status. We'll watch for independent evals across reasoning, instruction following, agent tasks, and long-context usage.
Speaker 2 (08:18):
Trend watch: do we think trillion scale is the future, or are we going to get smarter, not just bigger?
Speaker 1 (08:24):
Both. The Qwen3 technical reporting points to hybrid reasoning strategies, and the field is experimenting with MoE, scratchpads, and routing. Scale buys reliability, but algorithmic efficiency keeps costs sane. Expect a combo: large, smarter models with better planning and memory, for value.
Speaker 2 (08:40):
If I'm running a small studio, is this a must-try?
Speaker 1 (08:43):
Yes. Try it in chat. If your workflows depend on long-brief adherence or you constantly chain tasks, pilot the API on one project; measure retries, time saved, and tone consistency. If you see fewer corrections and smoother handoffs, it's worth a slot in your stack.
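If you want that pilot to be measurable rather than vibes, here's a tiny sketch of the kind of log you could keep per task; the fields are arbitrary, the point is comparing averages against your old workflow:

# Log retries and minutes per task during the pilot, then compare averages
# against your previous workflow. Field names are just one way to slice it.
from dataclasses import dataclass
from statistics import mean

@dataclass
class TaskRun:
    task: str        # e.g. "script", "shot list", "show notes"
    retries: int     # prompts needed before the output was usable
    minutes: float   # wall-clock time including review

def summarize(runs: list[TaskRun]) -> None:
    print(f"avg retries: {mean(r.retries for r in runs):.1f}")
    print(f"avg minutes: {mean(r.minutes for r in runs):.1f}")

runs = [TaskRun("script", 2, 35.0), TaskRun("shot list", 1, 12.5)]
summarize(runs)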
Speaker 2 (08:57):
And if I'm a student creator with ramen money?
Speaker 1 (09:01):
Use Qwen Chat for tests and keep prompts tight. You can get the reliability benefits without paying for giant context windows every time.
Speaker 2 (09:06):
One more angle. We covered Qwen's creative side recently, how it powers accessible AI video gen. This Max Preview feels like the brains upgrade behind those pipelines: better instructions in, better outputs out.
Speaker 1 (09:19):
Totally. Tie that with agents and you get a sturdier backbone for content ops: research to script to assets to QA, with less handholding.
Speaker 2 (09:27):
Final take: the vibes are Pixar magic more than Black Mirror doom for me, because it's about smoothing creative friction, not replacing humans.
Speaker 1 (09:35):
I'll co-sign. It's a preview, so keep your skepticism hat on. But if "scaling works" translates to fewer retries and steadier long-form dialogue, creators will feel that win immediately.
Speaker 2 (09:44):
Big thanks for hanging with us on Blue Lightning AI Daily. If you test Qwen3 Max Preview, tag us with your workflows. We want to see those agent chains. And check
Speaker 1 (09:54):
bluelightningtv.com for the full story, news, and fresh video tutorials on your favorite AI tools. Sources today include Alibaba Cloud documentation, VentureBeat's coverage of the Qwen3 Max Preview, and NVIDIA's developer blog on Qwen deployment.
Speaker 2 (10:07):
Appreciate you, catch you later.