
April 14, 2025 • 25 mins
ChatGPT will now remember your old conversations
OpenAI Open-Sources BrowseComp Benchmark to Enhance AI Web Browsing Capabilities
YouTube Just Dropped a Free AI Music-Making Tool
Spammers Exploit GPT-4o-mini Model to Bypass Filters and Target 80,000 Websites
Gemini 2.5 Pro Launches Deep Research for Enhanced AI Assistance
Elon Musk's AI company, xAI, launches an API for Grok 3
Google's Ironwood TPU Redefines the AI Chip Game
#AI, #ChatGPT, #OpenAI, #Google, #YouTube, #ElonMusk, #xAI

Episode Transcript

Available transcripts are automatically generated. Complete accuracy is not guaranteed.
(00:00):
Welcome to Innovation Pulse, your quick, no-nonsense update on the latest in AI.

(00:09):
First, we will cover the latest news.
OpenAI is enhancing ChatGPT's memory for Pro users.
YouTube adds AI music tools, spammers misuse AI tech, and Gemini 2.5 Pro expands research
capabilities.
After this, we'll dive deep into Google's strategic move with their new Ironwood TPU,

(00:33):
transforming AI inference and its impact on the market.
Stay tuned.
OpenAI is enhancing ChatGPT with a memory feature that allows it to recall past conversations
even if users haven't manually saved them.
CEO Sam Altman announced that this update supports the goal of creating AI that grows

(00:54):
alongside users.
This new capability builds on a previous memory feature that retained limited data for future
interactions.
The update distinguishes between saved memories and reference chat history, using insights
from past interactions to improve future responses.
However, it's not available in the European Union, United Kingdom, Switzerland, Norway,

(01:20):
Iceland, and Liechtenstein due to strict AI regulations.
The rollout is initially for users on ChatGPT's $200 Pro subscription, with plans to
extend to $20 Plus subscribers and other tiers soon.
Users can opt out of this feature through personalization settings, ensuring their privacy preferences

(01:42):
are respected.
Next we'll discuss the impact of BrowseComp.
OpenAI has introduced BrowseComp, a new open-source benchmark for evaluating AI agents'
web-browsing capabilities.
BrowseComp contains 1,266 challenging questions designed to test how well AI can locate and

(02:05):
analyze complex information on the web, simulating real-world scenarios like academic research
or market analysis.
This benchmark emphasizes finding hidden treasures in information-rich environments rather than
answering common questions.
By open-sourcing BrowseComp, OpenAI aims to foster global collaboration and innovation,

(02:30):
offering developers a tool to optimize AI performance in real-world web settings.
Initial evaluations show that models trained for deep web research, like Deep Research, excel
at complex tasks, highlighting BrowseComp's effectiveness in distinguishing model capabilities.
BrowseComp's release paves the way for smarter AI applications, crucial for businesses, academia,

(02:57):
and individual users.
It also prompts discussions on AI ethics, such as data privacy and algorithmic bias, encouraging
a safer AI ecosystem.
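To make the benchmark idea concrete, here is a minimal Python sketch of the kind of evaluation loop a question-and-answer browsing benchmark implies. The JSONL record fields, the run_browsing_agent hook, and the exact-match scoring are illustrative assumptions, not BrowseComp's actual schema or grading method.

```python
import json

def run_browsing_agent(question: str) -> str:
    """Hypothetical hook: plug in an AI agent that browses the web and returns a short answer."""
    raise NotImplementedError("wire up your own web-browsing agent here")

def evaluate(benchmark_path: str) -> float:
    """Score an agent on a JSONL file of {"question": ..., "answer": ...} records (assumed format)."""
    with open(benchmark_path) as f:
        items = [json.loads(line) for line in f]

    correct = 0
    for item in items:
        prediction = run_browsing_agent(item["question"])
        # Naive exact-match scoring; real benchmarks typically use more forgiving grading.
        if prediction.strip().lower() == item["answer"].strip().lower():
            correct += 1
    return correct / len(items)
```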
YouTube has introduced an AI-powered music feature for creators, allowing them to generate
royalty-free instrumental tracks directly in Creator Music.

(03:20):
This tool helps streamline the process of finding music by enabling creators to describe
the vibe they want, including instruments and mood.
The AI then creates a custom track, free from licensing worries.
Initially available to US-based creators in the YouTube Partner Program, this feature
aims to reduce time spent searching for music, allowing more focus on content creation.

(03:45):
While it doesn't replace human composers, it simplifies soundtrack creation at scale.
This development showcases YouTube's commitment to enhancing creative control, setting a foundation
for creators to integrate original music into their brand effortlessly.
The move reflects a strategic step beyond simply competing with platforms like TikTok

(04:07):
or Reels.
Spammers exploited OpenAI's capabilities to send unique messages to over 80,000 websites,
evading spam filters.
This was detailed by SentinelLabs researchers.
They highlighted the dual nature of large language models, which can be used for both

(04:28):
helpful and harmful purposes.
The spammers, using a framework called AkiraBot, automated sending messages to promote dubious
SEO services.
AkiraBot employed Python scripts to vary domain names and used OpenAI's chat API with the
GPT-4o mini model to tailor messages for each site.

(04:49):
This customization helped bypass filters that block identical content.
Messages were sent via contact forms and live chats on targeted websites.
Researchers emphasized the challenges AI poses in defending against spam.
OpenAI, recognizing the misuse of its service, revoked the spammer's account and thanked

(05:11):
the researchers for their findings.
Join us as we step into the future of advanced AI research.
Gemini 2.5 Pro Experimental now includes Deep Research.
Available to Gemini Advanced subscribers, this powerful AI model, recognized for its

(05:32):
capabilities, enhances the research process.
In testing, reports generated by Gemini Deep Research were favored over those from other
providers by more than a two-to-one margin.
Users note significant improvements in analytical reasoning and information synthesis, resulting
in more insightful reports.

(05:53):
This feature is accessible on the web, Android, and iOS, allowing users to generate comprehensive
reports on various topics, saving hours.
Additionally, the Audio Overviews feature can transform reports into podcast-style conversations
for easy listening on the go.

(06:14):
Users can explore these features by selecting Gemini 2.5 Pro Experimental and choosing Deep
Research in the prompt bar.
Elon Musk's xAI, despite facing a lawsuit from OpenAI, is pushing forward with its
Grok 3 AI model, now available via API.

(06:37):
Grok 3, launched months ago, competes with models like GPT-4o and Google's Gemini, offering
image analysis and question responses.
xAI provides Grok 3 and its mini version, both featuring reasoning capabilities at varying
costs.
The pricing matches Anthropic's Claude 3.7 Sonnet, but is higher than Google's Gemini

(07:02):
2.5 Pro.
Grok 3's context window is smaller than advertised, handling up to 131,072 tokens instead of the
claimed 1 million.
Musk initially described Grok as edgy and controversial, willing to tackle topics other
AIs avoid.

(07:22):
However, earlier versions leaned politically left, which Musk attributes to its training
data.
He aims to make Grok politically neutral, though the effectiveness of this shift remains
uncertain.
And now, let's pivot our discussion towards the main AI topic.

(07:46):
Today we're going to explore Google's latest advancement in AI hardware, the Ironwood
Tensor Processing Unit.
During Google Cloud Next '25, the tech giant unveiled this new custom chip designed specifically
for AI inference, marking a significant shift in their hardware strategy.

(08:06):
We're joined by tech analyst Yakov Lasker to discuss what this means for the industry,
Google's competition with NVIDIA, and the economics of AI computing.
Welcome to the show, Yakov.
Thanks for having me, Robbie. I've been following Google's TPU journey for years,
and this announcement represents a fascinating pivot in their strategy.

(08:27):
I'm excited to dive into the details with you today.
Feel free to ask your first question.
Let's start with the basics.
For our listeners who might not be familiar, what exactly is a TPU, and how does Google's
development of these chips fit into the broader AI hardware landscape?
A Tensor Processing Unit, or TPU, is a specialized chip designed by Google specifically to accelerate

(08:50):
AI workloads.
Unlike general-purpose CPUs or even GPUs, TPUs are application-specific integrated circuits,
or ASICs, built from the ground up for tensor operations, the mathematical calculations
that power modern machine learning models.
Google's been developing these chips in-house for over a decade now, through six previous

(09:11):
generations before Ironwood.
While companies like Nvidia focus on creating versatile chips sold to everyone, Google's
TPU strategy was initially about creating custom silicon for their own AI research needs.
This allowed them to optimize specific workloads that they knew they'd be running repeatedly.
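As a concrete picture of the tensor operations in question, here is a minimal JAX sketch of a dense layer built on matrix multiplication, the workload TPU matrix units are designed to accelerate. The shapes are arbitrary; on a Cloud TPU host, JAX compiles this through XLA onto the TPU, and the same code runs unchanged on CPU or GPU.

```python
import jax
import jax.numpy as jnp

@jax.jit  # XLA-compiles the function for whatever accelerator is available (TPU, GPU, or CPU)
def dense_layer(x, w, b):
    # Matrix multiply plus bias and a ReLU: the basic tensor operation
    # that TPU matrix units execute in bulk.
    return jax.nn.relu(x @ w + b)

key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (128, 512))   # a batch of 128 input vectors
w = jax.random.normal(key, (512, 256))   # weight matrix
b = jnp.zeros(256)                       # bias vector

out = dense_layer(x, w, b)
print(out.shape, jax.devices()[0].platform)  # (128, 256), and 'tpu' when running on a TPU host
```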
That's helpful context.

(09:31):
What makes this new Ironwood TPU announcement particularly significant compared to previous
versions?
What's truly groundbreaking about Ironwood is that it's the first TPU Google has positioned
primarily for inference rather than training.
Previous generations were either focused on training, the resource-intensive process of
developing neural networks, or pitched as dual-purpose chips that could handle both training and

(09:56):
inference.
Ironwood marks a strategic pivot to focus on inference, which is the process of using
already-trained AI models to make predictions for real-world requests from users.
This shift reflects the maturing AI industry, where we're moving from primarily research
and development to actual widespread deployment of AI systems, serving billions of users daily.

(10:20):
You mentioned the distinction between inference and training.
Could you elaborate on that difference and why it matters for chip design?
Absolutely.
Training is the process where an AI model learns patterns from data.
Think of it as the education phase.
This requires massive computational power, but happens relatively infrequently.
For a company like Google, they might train a major new model version just once or twice

(10:44):
a year.
Inference, on the other hand, is when the trained model makes predictions or generates
content for real users, like when Google Assistant answers your question or Gmail suggests replies.
Inference happens billions of times daily across Google's products.
The computational requirements differ significantly.

(11:06):
Training needs to process huge data sets repeatedly, while inference must deliver low-latency responses
to millions of concurrent users.
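To ground the distinction, here is a toy JAX sketch: training repeatedly updates parameters over a dataset, while inference is a single forward pass per user request. The one-layer logistic-regression model and the synthetic data are made up purely for illustration.

```python
import jax
import jax.numpy as jnp

def predict(params, x):
    # Inference: one cheap forward pass, repeated billions of times for real users.
    return jax.nn.sigmoid(x @ params["w"] + params["b"])

def loss(params, x, y):
    p = predict(params, x)
    return -jnp.mean(y * jnp.log(p + 1e-7) + (1 - y) * jnp.log(1 - p + 1e-7))

@jax.jit
def train_step(params, x, y, lr=0.1):
    # Training: compute gradients and update parameters, looped over the dataset many times.
    grads = jax.grad(loss)(params, x, y)
    return {k: params[k] - lr * grads[k] for k in params}

key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (1000, 8))             # synthetic dataset
y = (x[:, 0] > 0).astype(jnp.float32)             # synthetic labels
params = {"w": jnp.zeros(8), "b": jnp.array(0.0)}

for _ in range(500):                              # the expensive, occasional phase
    params = train_step(params, x, y)

print(predict(params, x[:1]))                     # the low-latency, always-on phase
```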
That makes sense.
So why is Google suddenly shifting its TPU focus toward inference now after years of
emphasizing training capabilities?
The timing aligns with an economic inflection point in AI.

(11:27):
First, we're seeing companies move from experimental AI projects to deploying production models
that serve real customers, shifting demand from training to inference.
Second, the emergence of reasoning AI models, like Google's Gemini, has dramatically increased
inference costs.
These reasoning models don't just generate simple responses.

(11:50):
They perform multi-step thinking processes that can require 10 to 100x more computation
than earlier models.
As Google noted when describing Ironwood, reasoning and multi-step inference is shifting
the incremental demand for compute, and therefore cost, from training to inference time.
With inference now consuming more resources than ever, optimizing those costs becomes

(12:15):
critical to Google's bottom line.
Speaking of economics, can you help us understand the market dynamics between training chips
and inference chips in the industry?
Training chips occupy what we call a lower volume market.
Only research labs and a handful of tech giants have the resources to train massive foundation
models, and they only do so periodically.

(12:35):
These customers might buy thousands of chips for their training clusters.
Inference, however, is a high volume market.
Every company deploying AI, from startups to enterprises to cloud providers, needs inference
chips to serve their customers daily.
This could mean millions of chips distributed across data centers worldwide.

(13:00):
From a chip manufacturer's perspective, inference represents a much larger revenue opportunity,
which explains why Google might be repositioning its TPU program to capture this growing market.
Let's talk about the competition.
How do Google's TPUs compare to Nvidia's GPUs, which seem to dominate the AI chip market?

(13:21):
Nvidia GPUs and Google TPUs represent fundamentally different approaches to AI acceleration.
Nvidia's GPUs were originally designed for computer graphics but have evolved to be remarkably
effective for AI through their parallel processing capabilities and robust software ecosystem,
with CUDA being the crown jewel that developers love.
TPUs, being purpose-built for AI, can achieve higher performance and energy efficiency for

(13:46):
specific workloads.
They excel at the matrix multiplication operations central to neural networks.
However, Google's TPUs have historically been available only within Google's ecosystem,
while Nvidia sells to everyone.
TPUs also lack the software flexibility of GPUs.
They're optimized for Google's TensorFlow and JAX frameworks specifically.

(14:08):
We've seen reports about the enormous costs of building and running large AI models.
How might Ironwood impact Google's economics in this AI arms race?
The economics here are fascinating.
Wall Street analysts estimate that if Google were selling TPUs as hardware to Nvidia customers,
they could have generated up to $24 billion in revenue last year, indicating the substantial

(14:31):
value of this technology.
For Google itself, greater adoption of in-house TPUs could significantly reduce their AI infrastructure
costs compared to purchasing chips from vendors like Nvidia, whose GPUs command premium prices
due to overwhelming demand. With projects like DeepSeek and Stargate pushing AI infrastructure
costs into the hundreds of billions, even modest efficiency improvements translate to

(14:55):
enormous savings.
This cost advantage could help Google remain competitive with other tech giants investing
heavily in AI.
You mentioned that Google has been developing TPUs for over a decade.
How has their chip strategy evolved over the generations?
Google's TPU journey reflects their evolving AI ambitions.
The first generation TPU, revealed in 2016, was focused solely on inference for existing

(15:21):
models.
With the second generation in 2017, Google began talking about combined abilities for
both training and inference, reflecting their expanding research goals.
As Google's AI models grew more complex, so did their chips.
The fourth and fifth generations brought massive performance improvements.
The sixth-generation Trillium TPU, which became generally available just last December,

(15:46):
was positioned as a versatile chip for both training and serving predictions.
Ironwood represents yet another pivot, acknowledging that inference is now where the largest computational
demands and costs are emerging.
Google Cloud currently relies heavily on chips from Intel, AMD, and Nvidia.
Does Ironwood signal a change in that relationship?

(16:08):
This is perhaps the most significant strategic implication.
According to research from KeyBank Capital Markets, Intel, AMD, and Nvidia chips currently
make up about 99% of processors used in Google Cloud instances, with TPUs accounting for less
than 1%.
Ironwood likely signals Google's intent to reduce this dependency on external chip suppliers.

(16:30):
Google had previously described TPUs as necessary for cutting-edge research but not as
alternatives to commercial chips; this explicit push into the inference market suggests
they're now willing to compete more directly with traditional suppliers.
This could dramatically alter the power dynamics between Google and chip vendors like Nvidia,

(16:51):
whose AI chips command premium prices due to unprecedented demand.
We've been hearing a lot about new reasoning AI models.
Could you explain what these are and why they particularly impact inference costs?
Reasoning AI models represent the next evolution in large language models.
Traditional LLMs like earlier versions of ChatGPT or Bard generate responses based on pattern

(17:15):
recognition and prediction.
Reasoning models like Google's Gemini or Anthropic's Claude add an extra layer.
They simulate a thought process, breaking problems into steps and thinking things through before
responding.
This approach dramatically improves capabilities for complex tasks, but comes at a computational

(17:35):
cost.
A reasoning model might generate 10 to 50 times more internal statements than what the
user sees in the final output.
This test-time scaling, as Google calls it, shifts the computational burden heavily toward
inference time.
Optimizing for these workflows requires specialized hardware designs, which is precisely what

(17:58):
Ironwood aims to address.
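A quick back-of-the-envelope sketch of why hidden reasoning tokens push cost toward inference time. The per-token serving price and token counts below are invented round numbers, not Google's actual figures.

```python
# Rough cost model: serving cost scales roughly with total tokens generated per request.
price_per_1k_tokens = 0.01      # hypothetical serving cost, dollars per 1,000 generated tokens
visible_answer_tokens = 300     # tokens the user actually sees

for reasoning_multiplier in (1, 10, 50):
    total_tokens = visible_answer_tokens * reasoning_multiplier   # hidden reasoning plus answer
    cost = total_tokens / 1000 * price_per_1k_tokens
    print(f"{reasoning_multiplier:>2}x reasoning -> {total_tokens:>6} tokens, ${cost:.4f} per request")
```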
Let's talk about the manufacturing side.
Google doesn't fabricate these chips themselves, right?
How does production work?
That's correct.
While Google designs their TPUs in-house, they partner with external manufacturers for
production.
For several generations, they've worked with chipmaker Broadcom to help take each new TPU

(18:21):
design into commercial production.
This arrangement allows Google to focus on their core competency: chip architecture
optimized for their AI workloads, while leveraging Broadcom's manufacturing
expertise.
It's worth noting that this differs from Nvidia's approach, which works with foundries
like TSMC but handles more of the chip design and commercialization process itself.

(18:45):
Google's approach might offer flexibility, but potentially at the cost of some margin
that goes to their manufacturing partners.
How might Ironwood impact developers and businesses using Google Cloud?
For Google Cloud customers, Ironwood could translate to better performance and potentially
lower costs.
When running inference workloads, developers might see new service tiers or instance types

(19:08):
optimized specifically for serving AI predictions at scale.
Businesses deploying reasoning-heavy AI applications could benefit most significantly, as these
workloads align precisely with Ironwood's design goals.
I expect Google will position these chips as differentiators against competing cloud
providers, particularly for customers who need cost-effective scaling for advanced AI

(19:32):
capabilities.
The key question is whether Google will pass those cost savings on to customers, or capture
them as improved margins.
The article mentioned DeepSeek AI.
Can you elaborate on how developments like that have influenced Google's chip strategy?
DeepSeek AI represents a fascinating case study in AI economics.
They've created models that can outperform much more expensive options, but at a fraction

(19:56):
of the cost.
Wall Street analysts have been particularly focused on this development because it demonstrates
that spending efficiency, not just raw investment, matters tremendously in AI.
This focus on cost efficiency puts pressure on Google and other tech giants to optimize
their AI infrastructure spending.
When analysts see companies achieving comparable results with smaller budgets, they naturally

(20:22):
question whether the massive capital expenditures by bigger players are justified.
Ironwood appears to be part of Google's response, demonstrating that they're innovating to improve
the economics of AI at scale, rather than simply throwing more resources at the problem.
Let's talk energy efficiency.
How does this factor into Google's chip strategy with Ironwood?

(20:45):
Energy efficiency is absolutely critical in modern data centers, both for operational
costs and environmental impact.
TPUs have traditionally offered better performance per watt for specific AI workloads compared
to general-purpose processors or GPUs.
With Ironwood focusing on inference, which runs 24/7 across global data centers, the

(21:05):
energy savings could be substantial.
Google's data centers already consume enormous amounts of electricity, and AI workloads are
increasingly driving that consumption.
By optimizing specifically for inference efficiency, Google can better manage their carbon
footprint while keeping operating costs in check.
This is especially important as inference workloads continue to grow, with the proliferation

(21:28):
of AI-powered services.
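To see why performance per watt matters for always-on inference, here is a small back-of-the-envelope calculation. The power draws, fleet size, and electricity price are invented round numbers for illustration, not figures for any real chip.

```python
# Hypothetical comparison of two accelerators serving the same inference workload around the clock.
HOURS_PER_YEAR = 24 * 365
PRICE_PER_KWH = 0.08      # dollars per kWh, an illustrative industrial electricity rate
NUM_CHIPS = 100_000       # size of a hypothetical inference fleet

def annual_energy_cost(watts_per_chip: float) -> float:
    kwh = watts_per_chip / 1000 * HOURS_PER_YEAR * NUM_CHIPS
    return kwh * PRICE_PER_KWH

baseline = annual_energy_cost(700)    # e.g. a general-purpose accelerator drawing 700 W
efficient = annual_energy_cost(450)   # e.g. a chip delivering the same throughput at 450 W

print(f"baseline:  ${baseline:,.0f} per year")
print(f"efficient: ${efficient:,.0f} per year")
print(f"savings:   ${baseline - efficient:,.0f} per year")
```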
How might Ironwood impact Google's competitive position against other cloud providers like
Microsoft and Amazon, who are heavily investing in AI?
This represents a potential competitive advantage for Google in the cloud wars.
Both Microsoft and Amazon rely heavily on NVIDIA GPUs for their AI offerings, which

(21:49):
means they're subject to supply constraints and premium pricing that have characterized
the GPU market recently.
If Google can scale deployment of Ironwood TPUs effectively, they could offer inference
capabilities at price points or performance levels that competitors can't match using
off-the-shelf GPUs.
This would be particularly valuable for attracting enterprise customers who are looking to deploy

(22:12):
AI at scale, but are concerned about costs.
However, the advantage depends on whether Google can manufacture TPUs at sufficient
volume and whether their software ecosystem can match the development flexibility that
NVIDIA's CUDA provides.
Looking ahead, what does this shift toward inference-focused chips tell us about the

(22:33):
future direction of AI hardware?
I believe we're witnessing the beginning of greater specialization in AI silicon.
The first wave of AI acceleration was about getting models to work at all.
NVIDIA's GPUs were remarkably well suited for this due to their parallel processing
capabilities.
The second wave was about massive scaling for training ever larger models, leading to

(22:57):
specialized training clusters.
This third wave, exemplified by Ironwood, focuses on making AI economically viable at
scale.
We'll likely see increasing divergence between training and inference hardware, with each
optimized for their specific workloads.
I also expect vertically integrated companies like Google to gain advantages by tailoring

(23:20):
hardware precisely to their software needs, creating performance and efficiency gains
that general purpose solutions can't match.
Finally, do you think we'll see Google offer these TPUs to external customers similar to
how NVIDIA sells their GPUs?
While Google hasn't explicitly stated plans to sell TPUs as standalone hardware, the economics

(23:40):
suggest they might consider it.
If Wall Street analysts are correct that Google could generate $24 billion selling
TPUs to the market, that's a compelling business opportunity.
However, I think it's more likely that Google will keep TPUs primarily within their ecosystem,
offering them through Google Cloud Services rather than as purchasable hardware.

(24:03):
This allows them to maintain control over the full stack and capture recurring revenue
through Cloud Services rather than one-time hardware sales.
It also aligns with their broader strategy of using proprietary technology to differentiate
their cloud offerings in an increasingly competitive market.
Yakov, thank you for sharing your insights on Google's Ironwood TPU and helping us understand

(24:25):
the broader implications for the AI hardware landscape.
This has been an enlightening conversation about the economics and technology driving
the next generation of AI computing.
It's been my pleasure, Robbie.
The AI hardware space is evolving rapidly and developments like Ironwood reveal how
the industry is maturing from research projects to production scale deployment.

(24:47):
I look forward to seeing how Google's inference-focused strategy unfolds and how competitors respond
in this increasingly crucial battleground for tech supremacy.
That's a wrap for today's podcast.
OpenAI is enhancing ChatGPT with memory features and tackling spam misuse, while Google

(25:09):
pivots to the Ironwood TPU for AI inference efficiency, potentially challenging Nvidia's
market dominance.
Don't forget to like, subscribe and share this episode with your friends and colleagues
so they can also stay updated on the latest news and gain powerful insights.
Stay tuned for more updates.

(25:33):
Thanks for watching.
I'll see you in the next one.
Bye.
Bye.