
June 28, 2025 35 mins
Ready to peek behind the curtain of the world’s fastest-moving tech?

This week, we’re diving into the wildest new advancements in artificial intelligence—where the future is being built, one headline at a time. Microsoft just dropped “MW,” a compact language model that lives right inside Copilot Plus PCs, promising lightning-fast, on-device AI responses. Meanwhile, Google’s Magenta team is shaking up the music world with “Magenta RT,” an open-source, real-time music AI that lets you generate and manipulate audio on the fly. And if you’re a Mac user, meet “Similar”—the new AI agent that automates your web tasks locally, with you in the driver’s seat.

But it’s not all smooth sailing in AI land. OpenAI’s much-hyped hardware collab with design legend Jony Ive just hit a legal speed bump—a trademark dispute that forced them to pull their AI device promos (but don’t worry, the partnership is still alive and kicking).

From AI music generation to on-device language models, Mac automation, and the legal drama behind the scenes, this episode is your front-row seat to the latest AI news, breakthroughs, and controversies. If you want to stay ahead of the curve—and maybe even impress your friends with the freshest AI gossip—this is the podcast you can’t miss.

Hit play, share with your fellow tech enthusiasts, and subscribe for more unfiltered, up-to-the-minute AI updates every week. The future is happening now—don’t blink!


Become a supporter of this podcast: https://www.spreaker.com/podcast/tech-threads-sci-tech-future-tech-ai--5976276/support.


Episode Transcript

Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Speaker 1 (00:00):
Welcome everyone to the deep dive, where we plunge headfirst
into the most compelling stories and breakthroughs shaping our world. Hmm, okay,
imagine this for a second. You're sitting at your computer, right, yeah,
and it's not just responding instantly. It's almost anticipating your
next thought, like it's reading your mind.

Speaker 2 (00:17):
Or picture this. You're a musician maybe, and instead of
you know, painstakingly editing a track bit by bit, you're
just jamming live with an AI, an AI that follows
your every creative whim in real time, just flowing with
you exactly.

Speaker 1 (00:30):
Or how about this an AI agent that I don't
know silently handles all your most tedious online chores, navigating
complex websites, filling out endless forms, juggling multiple accounts, all
while you remain completely in charge, like a true digital
coworker that well, it never clocks out.

Speaker 2 (00:47):
Yeah. It might sound like a glimpse into some you know,
far off sci fi future, but these aren't distant dreams anymore.
The world of artificial intelligence is accelerating at this well,
frankly astonishing pace. And what's really groundbreaking, I think,
is that the most impactful innovations, they're no longer just
confined to those massive, distant cloud servers. We're witnessing incredible

(01:08):
advancements happening right there, right on your desktop, in your
web browser, even in the highly secretive design labs of
the big tech giants.

Speaker 1 (01:17):
Yeah, it's a fundamental shift, isn't it? Where AI actually
lives and how we interact with it. Absolutely, So, this
deep dive it's all about pulling back the curtain on
some of the most fascinating and honestly impactful developments in
AI today. Our mission, well, it's to unpack four distinct
yet deeply interconnected innovations that are fundamentally reshaping how we

(01:38):
interact with.

Speaker 2 (01:38):
Technology and how we create art, even how we manage
our daily digital lives. We're here to extract the most
important nuggets of knowledge, the insights that make these complex topics clear,
engaging and.

Speaker 1 (01:49):
Relevant, most importantly relevant to you. Consider this your shortcut,
maybe to being truly well informed about the AI revolution
that's happening right now, like right around you.

Speaker 2 (01:58):
Okay, So we'll start by exploring this kind of
quiet revolution happening inside our personal devices. We'll look at
how AI is getting smarter faster and much more localized,
literally living on your computer. Then we'll switch gears a
bit to the well the electrifying world of real time
creative AI, specifically in music, where the machine becomes you know,

(02:20):
a true co creator.

Speaker 1 (02:22):
Ooh, I like the sound of that.

Speaker 2 (02:23):
From there, we'll uncover how new AI agents are transforming
how you interact with the web, giving you unprecedented control
and efficiency tackling those tedious online tasks.

Speaker 1 (02:33):
Yes please.

Speaker 2 (02:34):
And finally we'll delve into some ambitious plans for future AI
hardware, devices designed from the ground up for AI
and the unexpected hurdles they face even before they reach
your hands.

Speaker 1 (02:45):
Okay, sounds like a plan. Get ready for some serious
aha moments, folks. So this should really change how you
think about your tech. So let's start with that first
one AI that feels alive on your device. What's actually
enabling that that instant feedback? Because I mean, for most
of us, right our mental model of powerful AI, it
still lives in those massive data centers right.

Speaker 2 (03:07):
Far away in the cloud, requires constant internet connection.

Speaker 1 (03:09):
Exactly, you ask a question, your device sends it off
to a server farm somewhere. The server chews on it,
sends the answer back. It's like, yeah, well it's like
a conversation with a built in delay.

Speaker 2 (03:18):
Uh huh a lag.

Speaker 1 (03:20):
But what if the smartest, most responsive AI was right
there on your device, whispering answers into your ear in
like milliseconds.

Speaker 2 (03:29):
Well that's precisely the quiet revolution Microsoft has been nurturing
with their new Copilot Plus PCs.

Speaker 1 (03:34):
Okay, tell me about that. What's Microsoft doing here?

Speaker 2 (03:37):
What's fascinating is how they've quietly integrated this specialized micro model
code named MW. MW, okay. And when we say micro, it's
truly a marvel of efficiency. We're talking about a model
built with just, get this, three hundred and thirty million parameters.

Speaker 1 (03:51):
Three hundred thirty million, okay? Is that small in the AI world?

Speaker 2 (03:54):
Oh, it's tiny. For context, many of the large language
models you hear about, you know, your GPT, Gemini, they
operate with billions, sometimes hundreds of billions of parameters, right, huge numbers.
So fitting that kind of capability, that intelligence into a
model that's orders of magnitude smaller, that's a significant engineering feat.

(04:15):
It's not just about shrinking a giant. It's about fundamentally
redesigning it to be super efficient without, you know, sacrificing
its smarts.
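For a sense of scale, the memory math works out in a few lines of Python. A rough sketch; the two larger parameter counts below are generic illustrations, not figures from this episode:

```python
# Rough weight-storage arithmetic for why parameter count matters on-device.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}

def footprint_gib(num_params: float, dtype: str) -> float:
    """Raw weight storage in GiB for a given parameter count and precision."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30

# 330M comes from the episode; 7B and 175B are illustrative stand-ins
# for "billions to hundreds of billions," not any specific product.
for name, n in [("330M micro model", 330e6), ("7B model", 7e9), ("175B model", 175e9)]:
    print(f"{name}: {footprint_gib(n, 'int8'):.2f} GiB at int8, "
          f"{footprint_gib(n, 'fp16'):.2f} GiB at fp16")
```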

Speaker 1 (04:23):
Right, you can't just make it dumb by making it
small exactly, And the why of doing this on the
device that seems foundational. What are the big benefits for
me the user?

Speaker 2 (04:32):
Well, think about the immediate tangible benefits. First off, no
reliance on a cloud server. That means zero network latency, no.

Speaker 1 (04:40):
Delay, okay, instant response and.

Speaker 2 (04:43):
Crucially, your data, your personal queries, your preferences, it all
stays right there on your machine.

Speaker 1 (04:49):
Ah, privacy, that's huge.

Speaker 2 (04:51):
It's a monumental win for privacy and security. You're not
sending sensitive information across the Internet to some third party
for processing. It's all happening locally, instantly.

Speaker 1 (05:00):
It's like having a brilliant personal assistant whose ears are
only for you.

Speaker 2 (05:03):
And whose memory is only yours.

Speaker 1 (05:04):
Yeah, that makes a lot of sense. So this local
processing that leads to the speed, Right, what's the secret
sauce there?

Speaker 2 (05:12):
It leads directly to MW's secret sauce, its incredible speed
and efficiency for everyday tasks. See, unlike many traditional AI
models that might process your request piece by piece or
reread the entire conversation every time you add a.

Speaker 1 (05:27):
New thought, yeah, and that sounds inefficient.

Speaker 2 (05:29):
It is. MW works smarter. It's designed to break your
question down once, understand its core meaning, and hold that
context in memory. Okay, then it builds a response from
that deep, persistent understanding. It avoids all that repetitive processing.
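A toy contrast of those two patterns in Python, purely illustrative. It mimics the "encode once, hold the state" idea in spirit; it is not MW's actual code:

```python
# Stand-in "encoder": in a real model this would be the expensive step.
def encode(turn: str) -> list[str]:
    return turn.lower().split()

class CachedConversation:
    """Encodes each turn once and holds the result across queries."""
    def __init__(self):
        self.state: list[list[str]] = []
    def ask(self, turn: str) -> int:
        self.state.append(encode(turn))          # only the new turn is processed
        return sum(len(t) for t in self.state)   # stand-in for decoding an answer

class NaiveConversation:
    """Re-encodes the entire history on every turn, the inefficient pattern."""
    def __init__(self):
        self.history: list[str] = []
    def ask(self, turn: str) -> int:
        self.history.append(turn)
        return sum(len(encode(t)) for t in self.history)  # rereads everything

cached, naive = CachedConversation(), NaiveConversation()
for turn in ["dim the keyboard backlight", "after five minutes", "on battery only"]:
    assert cached.ask(turn) == naive.ask(turn)   # same answer, far less rework
```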

Speaker 1 (05:44):
It's like a librarian who doesn't have to reread every
book on the shelf for each question. They just know
where the info is.

Speaker 2 (05:51):
That's a great analogy, a highly efficient, focused librarian who
instantly grasps the essence of your query and pulls out
precisely what's needed from their immediate grasp, maybe even recalls
previous conversations you've had.

Speaker 1 (06:04):
Okay, that's smart design, not just raw power exactly.

Speaker 2 (06:08):
It's incredibly intelligent design.

Speaker 1 (06:10):
And you mentioned a specialized chip playing a role here.

Speaker 2 (06:12):
The NPU, right, A huge part of this instant speed
comes from a specialized chip called the neural processing unit
or NPU.

Speaker 1 (06:20):
Okay, So, for those of us not constantly diving into
chip architectures, what is an NPU? Why is it so
critical for this on device AI stuff? Is it totally new?

Speaker 2 (06:32):
That's a really good question because NPUs are absolutely pivotal here.
Think of your main computer chip, the CPU, the central
processing unit, the workhorse. Yeah, the jack of all trades. It's
super versatile, handles lots of different tasks, but it's not
optimized for any one specific thing.

Speaker 1 (06:46):
Right.

Speaker 2 (06:47):
Then you have your GPU. The graphics processing unit became
popular for parallel processing, initially for games, right, but it
also found a niche in AI training because it can
do many calculations at once. The NPU, though, it's
a dedicated chip, or sometimes a dedicated part of
a larger chip, engineered specifically to accelerate AI workloads. So

(07:07):
just for AI pretty much, it's built to crunch the
complex math that underpins AI models, especially neural networks, with
incredible efficiency and, crucially, using less power. Efficiency again. Its
dedicated function means it hits speeds and power efficiency for
these specific tasks that a general CPU just can't match.

(07:29):
So it's an evolution, yeah, but one that marks a
distinct new category of computing purpose built for AI.

Speaker 1 (07:35):
And the performance numbers Microsoft is sharing, they're pretty impressive.
They really are.

Speaker 2 (07:40):
We're talking almost twice as quick for that first word
of a response, wow, and nearly five times faster overall
compared to similar models running without that dedicated.

Speaker 1 (07:49):
Hardware five times faster. You know, my personal experience trying
to adjust to setting or ask my computer something, it
often involves that pause, that spinning wheel, that general sense
of sluggishness, like the machine is sighing, ugh, do I
have to? Right?

Speaker 2 (08:04):
It's frustrating, like trying to talk to someone who keeps
pausing awkwardly.

Speaker 1 (08:07):
Exactly. Yeah. So the idea that you could just type
something like dim the keyboard backlight after five minutes and
it just happens instantly, no cloud delay. That feels like
a real game changer for just using your computer.

Speaker 2 (08:19):
It is. It makes the machine feel alive, almost like
it's flowing with your thoughts, not fighting you.

Speaker 1 (08:25):
Yeah, not being a frustrating bottleneck.

Speaker 2 (08:27):
It's definitely more than just a speed bump. It's a
fundamental shift in responsiveness, in the intimacy of how we
interact with our tech. And to keep MW so efficient,
the team at Microsoft did some really clever technical stuff. Well.
They designed the model to share certain pieces of its
own internal structure. That means less redundant processing, smaller memory.

Speaker 1 (08:48):
Footprint, okay, smart reuse.

Speaker 2 (08:50):
Yeah. They also divided the workload, letting the NPU handle
multiple tasks at the same time without slowing down. And
get this, they added sophisticated techniques to help it track
longer conversations without getting confused or losing.

Speaker 1 (09:02):
Context, which AI can sometimes do.

Speaker 2 (09:05):
Right, all while using less memory during training. It's meticulous
engineering focused on keeping it powerful but also lightweight and fast.
It's not just brute force. It's smart frugal design.

Speaker 1 (09:15):
Okay, so elegant design, not just muscle. Now what about
training this mini AI brain? That seems critical too. You
can't just feed it random internet junk, right, especially since it's small.

Speaker 2 (09:26):
Oh absolutely critical. Quality of training data is paramount, especially
when distilling knowledge into a compact form. They started with
meticulously curated, high quality educational material.

Speaker 1 (09:38):
Good start.

Speaker 2 (09:38):
Then they significantly distilled knowledge from much larger, more powerful
cloud based models distilled.

Speaker 1 (09:45):
How did that work?

Speaker 2 (09:46):
Think of it like learning the essential principles from the
world's smartest professors, but then condensing all that wisdom into
a highly efficient, focused study guide rather than memorizing every
single word they ever said. It allows MW to
grasp the essence of complex info and reasoning without needing
the massive size of the original models.
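Distillation is a standard technique, and the core of it fits in a few lines. A minimal sketch assuming the usual teacher-student setup with softened logits; Microsoft's actual recipe isn't public:

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend a soft-label term (match the teacher) with ordinary cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),   # student's softened guesses
        F.softmax(teacher_logits / T, dim=-1),       # teacher's softened targets
        reduction="batchmean",
    ) * (T * T)                                      # rescale gradients by T^2
    hard = F.cross_entropy(student_logits, labels)   # ground-truth term
    return alpha * soft + (1 - alpha) * hard
```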

Speaker 1 (10:06):
That makes sense. But it also needs to understand Windows itself, right,
not just general knowledge, but how to actually change settings,
find things. That seems like a different.

Speaker 2 (10:16):
Challenge precisely, And this is where the practical application really shines.
They specifically fine tuned MW with a huge data set
over three and a half million.

Speaker 1 (10:25):
Examples? Three point five million, yeah.

Speaker 2 (10:28):
Covering everything from simple screen brightness adjustments to complex privacy options.
And importantly, they included real world variations, common typos, different
ways people phrase things.

Speaker 1 (10:39):
Ah, so it understands normal human.

Speaker 2 (10:41):
Messiness exactly, not just perfect robotic commands. After all this
fine tuning and optimization, MW now performs nearly as well
as much larger cloud models, but it stays lightweight, incredibly fast,
outputting over two hundred words per second on devices like
the Surface Laptop 7, all locally.
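To make that concrete, fine-tuning pairs for a settings assistant might look something like the entries below. These examples and action strings are invented for illustration; the real dataset isn't public:

```python
# Hypothetical (query -> action) training pairs, including messy human phrasing.
SETTINGS_EXAMPLES = [
    {"query": "make the screen brighter",       "action": "display.brightness:+20"},
    {"query": "mkae screen brigther pls",       "action": "display.brightness:+20"},  # typos
    {"query": "turn up brightness a bit",       "action": "display.brightness:+20"},  # rephrasing
    {"query": "stop apps tracking my location", "action": "privacy.location:deny_all"},
    {"query": "dim the keyboard backlight after five minutes",
     "action": "keyboard.backlight.timeout:300s"},
]
```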

Speaker 1 (11:00):
That's amazing speed. So complex queries get answered in under
half a second.

Speaker 2 (11:03):
Under half a second, yeah. And here's a smart detail, yeah.
For quick one word searches, it cleverly still uses the
regular search function under the hood. Why do that? It
avoids those sometimes weird guesses or overly wordy, unhelpful AI
responses you get when a simple search is really all
you needed. Prevents the AI from overthinking simple things.
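The routing idea is simple enough to sketch. The threshold and names here are assumptions for illustration, not Microsoft's actual logic:

```python
def route(query: str) -> str:
    """Send trivial lookups to plain search; real intents to the local model."""
    if len(query.strip().split()) <= 1:   # one-word queries: classic search wins
        return "classic_search"
    return "on_device_model"              # multi-word intents go to the small LM

assert route("bluetooth") == "classic_search"
assert route("dim the keyboard backlight after five minutes") == "on_device_model"
```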

Speaker 1 (11:25):
That is smart. My favorite detail though, is the dual
monitor thing. If you have two screens and ask it
to adjust brightness.

Speaker 2 (11:32):
It doesn't just blindly adjust both.

Speaker 1 (11:33):
Yeah, it pauses and asks which one do you mean?
That level of contextual understanding, that proactive clarification, that feels
really sophisticated. It's not just following orders, it's understanding intent.

Speaker 2 (11:45):
That contextual awareness, that proactive interaction. It really is a
paradigm shift. For so long we've relied on that cloud
first model. Every query a round trip, delays, privacy worries. As
on device intelligence arrives, the implications for user autonomy, for privacy,
for just the feel of responsiveness, they're enormous. It's not
delegating to some remote brain that takes your data. It's

(12:06):
having a smart, dedicated private assistant right there with.

Speaker 1 (12:09):
You, always ready, always private.

Speaker 2 (12:11):
It really democratizes access to advanced AI, too, makes it
available without needing constant super fast internet, opens up possibilities
for everyone everywhere.

Speaker 1 (12:21):
I can definitely relate to the frustration of slow tech
trying to find some buried setting, clicking through endless menus
lag lag lag. It's like watching a video that constantly
buffers. The promise of MW's instant responses, that nuanced understanding, yeah,
it makes me genuinely excited about a future where our
devices feel like seamless extensions of our thoughts, not these

(12:43):
clunky intermediaries where the machine just gets out of the
way and works.

Speaker 2 (12:46):
Indeed, it's a fundamental step towards computing that feels truly integrated, intuitive,
making tech less of a barrier more of an amplifier.

Speaker 1 (12:55):
Okay, so that incredible leap in on device intelligence, making
our PCs feel truly responsive and private. It highlights
this broader shift, doesn't it? This push for AI that
enhances our control not takes it.

Speaker 2 (13:07):
Away. The philosophy of user empowerment. Yeah yeah, AI is
a sophisticated assistant, not an autonomous.

Speaker 1 (13:12):
Overlord, which brings us neatly to our next topic, a
new breed of AI agent that's rethinking how you interact
with the web. So we've seen AI as this hyper
efficient assistant on your PC, but what if an AI
could be your diligent co worker for online stuff, handling
complex web tasks, navigating pages, filling forms, all while keeping

(13:35):
you firmly in the driver's seat.

Speaker 2 (13:36):
This brings us to Similar, a tiny startup aiming to
reinvent web browsing, starting on the Mac. Similar, they're introducing
a new kind of AI agent specifically focused on what
they cleverly call web chores.

Speaker 1 (13:48):
Web chores I like it.

Speaker 2 (13:50):
We all have those, don't we? And their core
principle is absolutely rooted in local control and privacy, unlike
a lot of current web automation tools that rely on
cloud hosted puppetry.

Speaker 1 (14:00):
Puppetry, you mean someone else is pulling the strings remotely.

Speaker 2 (14:03):
Essentially, yeah, your browsing session is controlled from a distant server.
Similar instead spins up a fully sandboxed WebKit window directly
on your.

Speaker 1 (14:11):
Desktop. Sandboxed, so like isolated?

Speaker 2 (14:13):
Completely isolated. It's its own secure mini browser environment on
your machine, separate from your main browser. Nothing leaves that
isolated window. It's a self contained digital workspace just for
the AI.
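Similar hasn't published its implementation, but the same property, an isolated local WebKit session that shares nothing with your main browser, is easy to demonstrate with an off-the-shelf tool like Playwright. A minimal sketch, not their actual code:

```python
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.webkit.launch(headless=False)  # local WebKit process, no cloud relay
    context = browser.new_context()            # fresh cookies/storage, fully isolated
    page = context.new_page()
    page.goto("https://example.com")           # nothing here touches your main profile
    print(page.title())
    browser.close()
```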

Speaker 1 (14:26):
Okay, that sounds huge for privacy, no cookies, no log
in tokens, no auto filled data getting sent off somewhere exactly.

Speaker 2 (14:33):
It all lives right there on your own SSD.

Speaker 1 (14:35):
That's a massive relief. I've always been wary of extensions
or services that want to automate things, especially if they
need access to all my browsing data. The moment your
data leaves your machine.

Speaker 2 (14:45):
Yeah, you lose control, you do? You introduce potential vulnerabilities.
Knowing everything stays local, that's reassuring. Think about filling out
complex web forms, maybe for a new account, a tax thing, setting.

Speaker 1 (14:56):
Up a utility, Oh yeah, painful.

Speaker 2 (14:58):
You're typing away, page after page, and you start wondering,
is this secure? Is my autofill data going somewhere weird?
This local approach really simplifies that anxiety. Puts you and
your data back in charge. It absolutely does. It tackles
one of the biggest privacy headaches in the cloud era.
So many AI apps need you to upload sensitive data,
often without full transparency on how it's handled or secured.

(15:21):
Similar's on device operation just sidesteps that completely. Your personal
and browsing data stays under your direct local control. It
shifts the power back to the user, away from implicitly
trusting third parties, towards trust built into the architecture.

Speaker 1 (15:36):
Itself user empowerment.

Speaker 2 (15:37):
Again, definitely, it's a significant step, and this.

Speaker 1 (15:40):
Leads to their core philosophy. You mentioned shared control. The
AI drives, but you never lose the wheel. So it's
not full automation.

Speaker 2 (15:47):
No, it's not about fully autonomous AI taking over your browser.
It's about a dynamic, collaborative partnership. The AI handles the mundane,
but you keep ultimate authority.

Speaker 1 (15:57):
How does that work in practice? What if it makes a mistake?

Speaker 2 (16:01):
That's where instant override comes in, essential to this. Imagine
the agent is on an e commerce site. Maybe it
tries to hit buy too soon, or it misreads a
CAPTCHA, or just fills a form field wrong. With
Similar, you don't wait for some clunky handoff dialogue or
lag while a remote model tries to figure out the
page again. You just instantly nudge it aside with your trackpad.

Speaker 1 (16:24):
Just like that with the trackpad.

Speaker 2 (16:26):
Yeah, overwrite a form entry, steer it to a different tab,
click a button it missed. No anxious waiting. The control
is fluid, immediate. That responsiveness is key for building trust.
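The shared-control loop reduces to "human input always wins." A sketch with invented event names, just to show the shape of it:

```python
import queue

user_events: queue.Queue = queue.Queue()   # fed by trackpad/keyboard handlers

def run_step(agent_action):
    """Execute one agent action unless the user has intervened first."""
    try:
        override = user_events.get_nowait()  # user input preempts instantly
    except queue.Empty:
        return agent_action()                # no human input: agent proceeds
    return override                          # the human's action replaces the step

print(run_step(lambda: "agent clicked 'checkout'"))   # agent acts
user_events.put("user steered to a different tab")
print(run_step(lambda: "agent clicked 'checkout'"))   # user override wins
```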

Speaker 1 (16:37):
Okay, so it's not like that annoying GPS voice yelling
turn right when you know it's wrong and you can't
shut it up.

Speaker 2 (16:44):
No, definitely not.

Speaker 1 (16:45):
It's more like having a really efficient coworker. They
push the mail cart, sort the documents, but they still
tap you when a signature is needed or there's a
judgment call.

Speaker 2 (16:54):
That's the analogy they use, a diligent coworker who gladly
pushes the mail cart, but still taps you when a
signature is required.

Speaker 1 (17:00):
I like that it builds trust, which, like you said,
is crucial if we're going to accept AI helping with
more complex tasks.

Speaker 2 (17:07):
That analogy perfectly captures their thoughtful approach. And this isn't
just a simple script. It's built on some seriously sophisticated tech.
The founders they cut their teeth on multi agent systems
at deep.

Speaker 1 (17:19):
Mind. Ah, DeepMind pedigree, okay.

Speaker 2 (17:21):
They leverage that deep expertise to build an open source
framework called S2 to power Similar. Think of it
like a highly specialized modular pit crew for your web browser.

Speaker 1 (17:31):
A modular pit crew, love it. So how does this
pit crew work? What are the different members doing?

Speaker 2 (17:37):
It breaks down into several layers, each smart in its
own way. Top level, a high level planner sketches out
the overall goal, complete this grocery run or gather tracking.

Speaker 1 (17:47):
Numbers okay, the strategist.

Speaker 2 (17:49):
Then there's a vision layer. It actually reads the live
pixels on the screen like a human sees, spotting buttons,
text fields, interactive bits, adapting to different website layouts so it.

Speaker 1 (17:59):
Sees the page does and just read the code.

Speaker 2 (18:01):
Exactly, which makes it more robust to weird website designs.
Below that, you have smaller specialized subagents handling specific granular tasks,
scrolling precisely, entering text correctly, clicking a specific link. The
specialists and a crucial component sort of the brain of
the operation is a short term memory thread. It keeps
track of past actions, observations, what was clicked two minutes ago,

(18:24):
what info was just extracted to maintain context, avoid repeating
steps or getting lost.
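Reduced to a skeleton, that layering might look like the following. Every name here is a placeholder for illustration, not S2's actual API:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Step:
    kind: str      # "click", "type", "scroll", ...
    target: str

@dataclass
class Memory:
    events: list = field(default_factory=list)     # past actions and observations
    def remember(self, e): self.events.append(e)
    def recent(self, n: int = 10): return self.events[-n:]

def planner(goal: str, context: list) -> list[Step]:
    # Stand-in strategist: a real planner would decompose the goal dynamically.
    return [Step("click", "search box"), Step("type", goal)]

def vision() -> dict:
    # Stand-in for the layer that reads live pixels and locates elements.
    return {"search box": (120, 48)}

SUBAGENTS: dict[str, Callable[[Step, dict], str]] = {
    "click": lambda s, ui: f"clicked {s.target} at {ui.get(s.target)}",
    "type":  lambda s, ui: f"typed '{s.target}'",
}

memory = Memory()
for step in planner("gather tracking numbers", memory.recent()):
    result = SUBAGENTS[step.kind](step, vision())   # specialist executes the step
    memory.remember((step.kind, result))            # held context avoids repeats
print(memory.events)
```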

Speaker 1 (18:30):
That memory part sounds vital for complex tasks.

Speaker 2 (18:33):
Absolutely. This modularity makes it robust and adaptable to the
chaos of the web, and.

Speaker 1 (18:39):
The benchmark seem to back that up. It's not just
good in theory.

Speaker 2 (18:42):
Indeed, Similar posted a really impressive ninety point zero five
percent completion score on the WebVoyager benchmark, which is
designed to be messy and reflect the real world.

Speaker 1 (18:52):
Ninety percent. That's high, and.

Speaker 2 (18:54):
What's maybe even more telling is that it significantly edged
out models from giants like open ai and n on
benchmarks like O World and Android World.

Speaker 1 (19:03):
Why is that significant?

Speaker 2 (19:04):
Because those environments are full of the stuff that usually
trips up automated bots, pop ups everywhere, infinite scroll pages,
poorly labeled inputs.

Speaker 1 (19:11):
That need inference, the bane of my existence.

Speaker 2 (19:14):
Right. The fact that similar excels there shows its robust design.
It can handle the unpredictable, frustrating reality of the modern web.

Speaker 1 (19:22):
That is seriously impressive, because, yeah, those are the things
that make you want to tear your hair out. Pop
ups you can't close, scrolling forever. Yeah, it's a minefield.
So are people actually using this effectively? Any cool stories?

Speaker 2 (19:35):
Oh yeah, the anecdotes really bring it to life. One
user described using it for a three store grocery run
for a cookout.

Speaker 1 (19:42):
Okay, juggling multiple sites.

Speaker 2 (19:44):
Yeah, imagine getting sriracha from Instacart, organic buns from Amazon Fresh,
and a specialty gluten free cake from some local bakery's
obscure portal, managing confirmation pop ups before each checkout.

Speaker 1 (19:58):
That sounds stressful just thinking about it.

Speaker 2 (20:00):
It handled it, popped up a quick sound good confirmation before
each final step, seamlessly integrating three different shopping experiences.

Speaker 1 (20:07):
Nice. What else?

Speaker 2 (20:08):
Another user kept it running quietly in the background to
pull tracking numbers from five different e commerce sites.

Speaker 1 (20:14):
Oh I need that.

Speaker 2 (20:15):
Right, scraped every in transit status update, then dropped a tidy
summary right into Notion. Saved hours of tedious copy pasting.

Speaker 1 (20:25):
Huge time saver, total stress reducer. I'm often juggling orders
or trying to consolidate info for projects. That sounds like magic.

Speaker 2 (20:34):
There was also a journalist using it to harvest a
dozen research articles from various academic databases, boiled them down
into tidy abstracts, and then took back control to cherry
pick specific quotes for their piece.

Speaker 1 (20:47):
Wow. So it's like a hyper efficient research assistant doing
the grunt work, freeing you up for the creative part.

Speaker 2 (20:52):
Exactly, and crucially, because it's all local, the agent can
log into paywalled sites using your existing Keychain credentials from Safari.
Nothing leaks out, nothing stored off device, but.

Speaker 1 (21:02):
No re entering passwords or messing with security nope.

Speaker 2 (21:05):
And if your Wi Fi flakes out on the train
or something, your workflow doesn't just collapse. The agent keeps
chugging along on.

Speaker 1 (21:10):
Your device. Reliability.

Speaker 2 (21:12):
That's key too, that blend of robust local autonomy plus
that instant fluid override. That's really their big philosophical swing.
The team behind similar they don't think people will fully
trust hands free browsers for years. Frankly, neither do.

Speaker 1 (21:27):
I. Yeah, probably not completely.

Speaker 2 (21:29):
So their whole design focuses on this collaborative trust building approach,
giving users superpowers without making them feel like they've lost control.
It's a really interesting blueprint for human AI collaboration.

Speaker 1 (21:41):
It's a compelling vision. Okay, So from making our digital
lives easier with these efficient private web agents, let's switch
gears again. Let's explore how AI is making waves in
a completely different realm, real time creative expression.

Speaker 2 (21:55):
Ah yes, music.

Speaker 1 (21:56):
Specifically music where Google's Magenta team has delivered what feels
like a surprising left hook innovation with Magenta Real Time
or Magenta RT. And this isn't just about generating static tracks, right,
this is about live jamming.

Speaker 2 (22:09):
Magenta RT is absolutely a breakthrough in generative audio, especially
because of how accessible it is. The model itself.

Speaker 1 (22:15):
It's open weight. Open weight meaning?

Speaker 2 (22:17):
Meaning the model's parameters, its brain are publicly available. It's
under the Apache two point zero license, lives freely on
GitHub and Hugging Face, super accessible for anyone with the
tech know how.

Speaker 1 (22:27):
That's huge for the creative community. No black boxes exactly.

Speaker 2 (22:30):
It's an eight hundred million parameter model, so it's substantial,
capable of rich musical complexity. But here's the kicker, the
really revolutionary part, its speed.

Speaker 1 (22:40):
Okay, how fast.

Speaker 2 (22:41):
It can generate two seconds of high fidelity forty eight
kilohertz stereo audio in about one point twenty five seconds,
even on a free tier Colab TPU.

Speaker 1 (22:50):
Wait, generating two seconds takes less than two seconds?

Speaker 2 (22:53):
Yeah, that's a forward real time factor of roughly point
six two five, which means they.

Speaker 1 (22:57):
Actually jam live with it. This, this is revolutionary creative work.
I've dabbled in music creation, and the biggest bottleneck is
always render time. Lay down a track, add an effect,
generate a part. Wait. It kills the flow, totally kills the flow.
It's so frustrating. The idea of feeding it style prompts
mid performance, changing the music as you play, and never

(23:19):
waiting for a giant batch render that sounds incredibly liberating.
Turns the machine from a render farm into well into
a bandmate.
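The real-time claim is just arithmetic on the two numbers quoted above:

```python
chunk_audio_s = 2.0    # seconds of audio per generated chunk
chunk_gen_s = 1.25     # seconds it takes to generate that chunk
rtf = chunk_gen_s / chunk_audio_s       # 0.625: under 1.0 means faster than playback
headroom = chunk_audio_s - chunk_gen_s  # 0.75 s spare per chunk for a live stream
print(rtf, headroom)
```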

Speaker 2 (23:28):
It absolutely is liberating. You're not waiting for the AI
to think, breaking your concentration. You're in this continuous improvisational
conversation with it, shaping the output as it unfolds. If
you think bigger picture, this moves generative music from static
output, type a prompt, get a file, to a dynamic
interactive experience. It unlocks totally new improvisational possibilities, AI as

(23:51):
a responsive creative partner.

Speaker 1 (23:53):
It's not just an evolution, it's a paradigm shift in
how musicians can work, spontaneous composition, exploration, stuff you couldn't really
do before, exactly. So, how does it work technically? How
do they get that seamless low latency generation? What's this
streamer pipeline?

Speaker 2 (24:07):
Okay, So the pipeline for Magenta RT, while it shares
some ideas with other models like MusicLM or MusicFX,
rolls everything into this streamer. Audio isn't processed as one
big file. It's intelligently broken down into discrete CODEC.

Speaker 1 (24:21):
Tokens. Codec tokens, little digital bits of sound?

Speaker 2 (24:25):
Essentially, yeah, small highly efficient digital representations. The magic happens
in how each segment is generated. It's conditioned on
a running ten second history of the music that just played.

Speaker 1 (24:35):
Okay, context, plus.

Speaker 2 (24:36):
Your real time text prompts or even tiny audio clips.

Speaker 1 (24:39):
You feed it audio clips too cool.

Speaker 2 (24:41):
And the sophisticated conditioning happens through something called a joint
embedding named MusicCoCa.

Speaker 1 (24:46):
MusicCoCa, sounds delicious.

Speaker 2 (24:48):
It's a hybrid of MuLan and CoCa. Technically, think of
MusicCoCa as this smart interpreter that understands both the
raw audio and the nuances of human language. It lets
it seamlessly blend your live input with your text prompts,
guiding the AI's improvisation.
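That streaming behavior reduces to a short loop: each new chunk is generated from a rolling window of recent output plus whatever prompt is active right now. The generate_chunk below is a placeholder stand-in, not Magenta RT's API:

```python
from collections import deque

CHUNK_S, CONTEXT_S = 2.0, 10.0
history = deque(maxlen=int(CONTEXT_S / CHUNK_S))   # last five 2-second chunks

def generate_chunk(context: list, prompt: str) -> str:
    return f"audio<{prompt} | ctx={len(context)} chunks>"  # placeholder generator

def stream(prompts):
    for prompt in prompts:                 # the prompt can change mid-performance
        chunk = generate_chunk(list(history), prompt)
        history.append(chunk)              # context rolls forward with the output
        yield chunk

for c in stream(["lo fi house groove", "lo fi house groove", "add warm piano"]):
    print(c)
```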

Speaker 1 (25:03):
So you could like type lo fi house groove.

Speaker 2 (25:06):
Exactly, type lo fi house groove with warm roads piano.
It starts generating that vibe. Then maybe a few bars later,
you play a little guitar lick. The model seamlessly blends
that guitar into the track, or even adapts its future
generations based on that new melody. The key is it
maintains a consistent vibe and musical context as new chunks

(25:27):
roll in. Makes it feel.

Speaker 1 (25:28):
Fluid, natural, so it's not just cutting in awkwardly. It's
evolving the musical idea in real time precisely.

Speaker 2 (25:36):
That's what makes it feel like jamming with another musician,
not just using a.

Speaker 1 (25:39):
Tool. That seamless blending is absolutely vital for a live feel,
avoids those jarring transitions you sometimes get with AI audio. So
how do they train this thing and optimize it for
real time?

Speaker 2 (25:52):
Training involved a vast amount of data around one hundred
and ninety thousand hours of instrumental stock tracks. Gives it
a deep diverse understanding of music structure, genres, instruments. All
that data went through a hierarchical neural codec.

Speaker 1 (26:05):
Fancy words, what's that do?

Speaker 2 (26:06):
It's a sophisticated algorithm that learns to compress and decompress
sounds super efficiently while keeping all the essential musical detail.
Ensures the fidelity stays crisp, professional, but keeps the token
sequences short and manageable for real time processing.

Speaker 1 (26:21):
Efficiency again seems to be a theme today.

Speaker 2 (26:23):
It really is. The Transformer model itself is also highly
optimized for this chunked streaming. Each two second window is
generated with overlapping boundaries AH.

Speaker 1 (26:33):
To avoid clicks.

Speaker 2 (26:34):
And pops, exactly, to ensure smooth audio output. Plus they used
advanced stuff like XLA compilation and heavy cache reuse to
stop the device memory from thrashing.

Speaker 1 (26:44):
Under the load. So optimized data handling, like a chef
prepping ingredients.

Speaker 2 (26:48):
Good analogy, preprocessing and smartly storing data to ensure a smooth,
continuous process without bottlenecks.
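Why overlapping boundaries matter is easiest to hear in code: an abrupt splice clicks, a short equal-power crossfade doesn't. Generic audio DSP for illustration, not Magenta RT's exact implementation:

```python
import numpy as np

def crossfade(a: np.ndarray, b: np.ndarray, overlap: int) -> np.ndarray:
    """Join two chunks, blending the last `overlap` samples of a into b."""
    t = np.linspace(0.0, np.pi / 2, overlap)
    fade_out, fade_in = np.cos(t), np.sin(t)   # equal-power curves: cos^2+sin^2=1
    blended = a[-overlap:] * fade_out + b[:overlap] * fade_in
    return np.concatenate([a[:-overlap], blended, b[overlap:]])

sr = 48_000                                        # 48 kHz, as in the episode
a = np.sin(2 * np.pi * 220 * np.arange(sr) / sr)   # two toy one-second chunks
b = np.sin(2 * np.pi * 330 * np.arange(sr) / sr)
out = crossfade(a, b, overlap=sr // 100)           # 10 ms overlap, no click
print(out.shape)
```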

Speaker 1 (26:55):
So it's not just creative, it's technically clean. What about
creative freedom for you know, everyday musicians or game devs?
Can you really morph genres on the fly without it
sounding like a mess?

Speaker 2 (27:05):
Google claims you can, and the demos look remarkable. Prompted
to go from synthwave to bossa nova piano mid bar,
it supposedly performs the shift without dropping tempo or coherence.

Speaker 1 (27:17):
That's wild, unprecedented control for improv.

Speaker 2 (27:21):
And remember it's open weight, self hostable, Apache two point
zero license, no API bills, no rate limits throttling you. Music to.
music to.

Speaker 1 (27:29):
My ears, literally no vendor lock in.

Speaker 2 (27:32):
Right artists, developers, creators. They have full control, download it,
run it locally, experiment freely, fine tune it with their
own music, all without huge costs or limits. Significantly lowers
the barrier to entry for advanced generative music.

Speaker 1 (27:47):
And they're hinting at even more.

Speaker 2 (27:48):
Possibilities beyond self hosting.

Speaker 1 (27:50):
Yeah, tantalizing hints, future developments like on device inference.

Speaker 2 (27:54):
Pathways running right on your laptop or phone.

Speaker 1 (27:56):
Potentially yeah, with specialized chips, even less latency, more portability,
and personal fine tunes, meaning you could train
the model on your own music library or style.

Speaker 2 (28:05):
Imagine DJs baking in their sonic fingerprints for live sets,
game devs generating dynamic soundtracks that react to gameplay, individual
artists creating a truly unique AI collaborator that gets their
specific vision.

Speaker 1 (28:19):
That's next level personalization.

Speaker 2 (28:20):
If you compare this to other approaches, diffusion models
like Riffusion or big autoregressive ones like OpenAI's Jukebox,
which sound good but take ages.

Speaker 1 (28:29):
To generate, right you wait for the file dump.

Speaker 2 (28:31):
Magenta RT's breakthrough is unequivocally that low latency. You're actively
steering the music in real time, improvising, interacting. It's a
totally different, much more fluid, deeply integrated creative workflow.

Speaker 1 (28:43):
What an exciting time for creators. The machine as a partner,
not just a post production tool. Okay, so we've seen
AI as a swift private assistant on the PC.

Speaker 2 (28:53):
A diligent collaborative web coworker, and now a responsive creative
partner in music.

Speaker 1 (28:58):
There's a clear trend here, AI getting deeply embedded in
our immediate environment.

Speaker 2 (29:02):
Our workflows becoming more integrated, responsive, personal.

Speaker 1 (29:05):
Which brings us to its most ambitious form, perhaps entirely
new dedicated hardware designed to redefine our physical interaction with AI.
But uh oh, it's already hit some snags.

Speaker 2 (29:17):
We're talking about OpenAI's hardware ambitions. Yeah, yeah, specifically
that buzzy partnership with ex Apple design legend Jony Ive.

Speaker 1 (29:25):
Ah yes, Jony Ive, big news.

Speaker 2 (29:27):
Huge excitement around the announced six point five billion dollar
acquisition of his design shop Io, with plans for ultra
minimal AI devices. That phrasing alone hints it's something radically
different from phones and laptops.

Speaker 1 (29:40):
Jony Ive. I mean the guy shaped the iPhone, Apple
Watch, AirPods. He redefined how tech looks and feels. I've
always admired that Apple design philosophy, blending form and function
so seamlessly. It feels almost inevitable.

Speaker 2 (29:52):
Right, beautiful simplicity.

Speaker 1 (29:53):
So the idea of his touch on a revolutionary AI
device built from the ground up for AI, that's incredibly compelling.
He sounded super excited in early chats, telling Sam Altman
everything he learned in thirty years led to this moment.

Speaker 2 (30:05):
And Sam Altman flashed a prototype, called it the coolest
piece of technology that the world will have ever seen.

Speaker 1 (30:11):
Okay, bold claim, high.

Speaker 2 (30:12):
Stakes, high praise, maybe bordering on hyperbole, but it speaks
to the vision. Details are still frustratingly scarce, though insiders
describe it as an unobtrusive desktop companion.

Speaker 1 (30:23):
Unobtrusive so not demanding attention.

Speaker 2 (30:25):
Seems like it something that maybe quietly stays aware of
your environment, listening, perceiving, designed to fit naturally alongside your
MacBook and iPhone, complementing, not replacing.

Speaker 1 (30:37):
The bigger vision. Sounds like ubiquitous AI seamlessly integrated into
our physical space, moving.

Speaker 2 (30:42):
Beyond screens exactly, AI becoming part of the fabric of
daily life, almost invisibly, always available, subtly enhancing how we
interact with the world. A big leap towards ambient.

Speaker 1 (30:52):
Intelligence where the environment itself becomes smart. Okay, very sci
fi, yeah. But then the snag. Yeah, just as the
hype was building, reality bit back.

Speaker 2 (31:01):
A very real, very traditional hurdle, a trademark tangle, classic
case of cutting edge innovation colliding with messy business realities.

Speaker 1 (31:09):
What happened exactly?

Speaker 2 (31:10):
An existing hearing aid startup named Iyo. Iyo, yeah. They
launched a lawsuit against OpenAI's new venture Io over
the almost identical.

Speaker 1 (31:20):
Name oh I oversus iyo.

Speaker 2 (31:22):
Close, and the consequence was swift, a court ordered takedown
of OpenAI's promotional stuff related to the venture.

Speaker 1 (31:29):
Really what got taken down overnight?

Speaker 2 (31:31):
The detailed blog post announcing the partnership, that slick nine
minute video of Ive and Altman chatting about their vision vanished
from OpenAI dot.

Speaker 1 (31:41):
Com, wow pulled completely from.

Speaker 2 (31:43):
Their official platform. Yeah, you can still find the video
floating around YouTube from third party uploads, but OpenAI
scrubbed it, clear legal directive.

Speaker 1 (31:53):
That's dramatic, especially for OpenAI, they usually dominate the news.
So is the whole Jony Ive deal dead or just
the name?

Speaker 2 (32:01):
Thankfully for OpenAI, no, the deal itself is totally intact.
According to their spokesperson Kayla Wood, it's just the branding,
the name Io, and the related promo material that needed scrubbing.
It'll stay in limbo until the lawyers sort it out.

Speaker 1 (32:12):
Okay, so the project lives, but under a cloud or
maybe without a name for now, pretty much.

Speaker 2 (32:18):
This whole incident really highlights the messy real world challenges.
Even giants like OpenAI run into existing legal frameworks,
especially with trademarks, when they push into hardware.

Speaker 1 (32:28):
It's a powerful reminder, isn't it. Innovation doesn't happen in
a vacuum. It has to navigate the world of existing rights, names, regulations.

Speaker 2 (32:36):
Absolutely.

Speaker 1 (32:36):
It shows the path to break through hardware isn't just
tech innovation and brilliant design. It's also navigating the complex,
often litigious landscape of IP and existing businesses. These aren't
just speed bumps. They affect timelines, perception.

Speaker 2 (32:50):
Precisely, and the device itself isn't expected until next year anyway.
That implies a long development process before the legal tangle.
Now they have this added lag. Yeah, it raises that
important question for all of us in a field moving
this fast. How do traditional legal frameworks adapt to new
unforeseen applications and branding. How do we balance protecting existing

(33:13):
rights with fostering groundbreaking innovation.

Speaker 1 (33:16):
It's a tricky balancing act and likely to get more
complex as AI gets woven into everything. For sure. Wow,
what an incredible journey today. We've really taken a deep
dive into some truly fascinating advancements, haven't we from AI
that lives right on your computer, faster, more personal,
profoundly private.

Speaker 2 (33:33):
To intelligent agents that streamline your online tasks with unprecedented control,
real collaboration.

Speaker 1 (33:39):
Yeah, And exploring that thrilling potential of AI for real
time creative expression in music, making the machine a true
co creator.

Speaker 2 (33:48):
And finally looking at the ambitious future of dedicated AI hardware,
but also recognizing that even the biggest innovations face those
unexpected real world hurdles.

Speaker 1 (33:58):
The common thread, it seems, through all of this, it's
this clear accelerating move towards AI experiences that are more integrated,
more responsive, more personally tailored.

Speaker 2 (34:09):
Absolutely, these aren't just abstract tech concepts and research papers anymore.
They are actively shaping how you work, how you create,
manage info, interact with the digital world every single day.

Speaker 1 (34:19):
They're making technology feel less like this rigid tool you
have to operate, more like I don't know, an intuitive,
fluid extension of your own will, almost dissolving into the background.
Well put, so, as AI gets more deeply embedded in
our devices, even our physical environments, maybe here's a final
thought to chew on, how will this increasing intelligence, this autonomy,

(34:40):
redefine our fundamental relationship with technology itself?

Speaker 2 (34:43):
Hm. Moving beyond tools, yeah.

Speaker 1 (34:46):
Moving beyond tools to something more like intelligent companions or
even co creators sharing our space, our creative work. What
new forms of human AI collaboration might emerge when the
AI isn't just in the distant cloud, but truly there
on your device, capable of real time understanding, nuanced interaction,
seamless integration into your daily flow.

Speaker 2 (35:07):
It's a future that's rapidly unfolding right now, immense potential, profound.

Speaker 1 (35:11):
Questions, and we're definitely just getting started.

Speaker 2 (35:13):
Thanks for joining us for this deep dive.

Speaker 1 (35:15):
We'll catch you next time.