Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
(00:00):
Welcome to Innovation Pulse, your quick, no-nonsense update on the latest in AI.
(00:09):
First, we will cover the latest news.
NASA and Google are teaming up on an AI medical assistant for space.
Google enhances Notebook LM.
DeepSeek releases a powerful language model.
Meta faces challenges, and Apple adds live translation to AirPods Pro 2.
(00:30):
After this, we'll dive deep into Chroma's groundbreaking research on context rot in AI models
and explore Jeff's innovative approach to context engineering.
NASA and Google are collaborating to develop an AI-powered medical assistant for space missions,
dubbed the Crew Medical Officer Digital Assistant.
(00:54):
This system aims to support astronauts' healthcare needs during long missions to the moon,
Mars, or beyond, where communication delays with Earth can be significant.
While current missions to the International Space Station have strong connectivity,
future missions will experience much longer communication lags.
(01:16):
The AI space doctor is designed to bridge this gap by providing timely medical guidance
when immediate Earth-based support is unavailable.
Initial testing has shown promising results in diagnosing medical issues.
While a human medical officer will be on board, the AI can offer additional support,
(01:36):
especially if the officer is the one needing care.
This technology could eventually extend to improve healthcare in remote or underserved
areas on Earth.
Google is integrating Deep Research into Notebook LM, enhancing users' ability to find and incorporate
sources directly into their notes.
(01:59):
A new interface in the sources tab includes a search box, a toggle for web or Google Drive sources,
and a Deep Research option.
Previously part of Gemini, this feature helps users, such as students and professionals,
efficiently gather and filter information.
Deep Research provides contextually relevant data, saving time on verification.
(02:23):
An upcoming tutor feature aims to transform Notebook LM into a learning assistant,
offering guided explanations and interactive support for educational needs.
This aligns with Google's strategy to incorporate AI across its tools,
making research and learning more seamless.
While still in early development, these enhancements promise to streamline the process of
(02:47):
integrating diverse research into users' notes.
Up next, we're exploring DeepSeek v3.1's groundbreaking features.
DeepSeek, a Chinese AI startup, has launched its latest language model, DeepSeek v3.1.
This model enhances reasoning, tool use, and coding performance,
(03:09):
rivaling OpenAI and Anthropic models, but at a lower cost.
DeepSeek v3.1 can switch between chain of thought reasoning and direct generation,
offering flexibility.
It excels at tool calling and agent tasks with structured formats, and supports
custom code and search agents.
(03:30):
With 671 billion parameters, it uses a mixture-of-experts design to reduce inference costs.
The model's context window of 128,000 tokens surpasses most competitors.
It demonstrates strong performance in benchmarks for coding,
math, and tool use.
Its thinking mode surpasses previous versions, and the model supports tool and code
(03:55):
agent integration, allowing for scriptable workflows.
DeepSeek v3.1 is open source, available under the MIT license,
and compatible with prior versions, promoting both research and commercial use.
Meanwhile, Anthropic's Enterprise and Team customers can now enhance their productivity with premium
(04:18):
seats that include Claude and Claude Code under one subscription.
This integration allows seamless transition from ideation to implementation,
offering developers capabilities to explore and execute advanced code with ease.
Admins benefit from enhanced visibility and control, ensuring smooth scaling across
(04:40):
organizations.
The new compliance API offers programmatic access to usage data for better governance and auditing.
Admins can allocate standard or premium seats based on user needs, providing flexibility
and predictable billing.
Premium seats enable users to collaborate with Claude throughout the development life
(05:02):
cycle, from research to implementation.
Success stories from companies like Behavox and Altana highlight the transformational
impact, showcasing improved productivity and ambitious project engagement.
Admins can manage seats, monitor usage, and set spending limits to maintain control.
(05:24):
The compliance API enhances regulatory adherence, offering real-time data access
and automated policy systems.
Wall Street is pausing on AI investments after news of Meta's AI team restructuring and
hiring freeze.
Initially, Meta attracted over 50 AI experts with lucrative offers,
(05:46):
signaling a strong commitment to AI.
However, this spending halt raises concerns.
While Meta's AI investments boosted its stock by over 25%, surpassing competitors like
Alphabet and Microsoft, CEO Mark Zuckerberg has now stopped such extravagant offers.
(06:08):
This decision isn't just about ending big contracts, it highlights Meta's internal issues.
According to tech columnist John Herrman, Meta's need to spend heavily on talent
reveals a negative company culture, unclear vision, and flawed strategies.
Idealistic AI professionals are drawn to companies like OpenAI and xAI rather than Meta.
(06:33):
Herrman suggests Meta's high offers reflect desperation, not strength, in attracting talent.
Let's now switch to the importance of user consent.
NBC Universal offers privacy options for residents in certain states.
To stop the sale or sharing of personal information for targeted ads, residents should
(06:57):
use the opt-out form provided. This requires toggling a switch to deactivate the sale of data
and then submitting the form.
It's crucial to manage these settings on each device and browser used, as preferences must be
set individually. Even after opting out, ads will still appear, but they might be less personalized.
(07:20):
For mobile apps, users can adjust geo-location permissions through device settings.
Always-active cookies are necessary for basic site functions and security.
If you clear cookies, preferences will need to be reset.
NBC Universal stresses the importance of completing the opt-out form to ensure privacy
(07:41):
rights are respected, as per their policy.
OpenAI CEO Sam Altman expressed concerns about the United States potentially
underestimating China's AI capabilities. He suggests export controls alone may not be effective,
warning against oversimplifying the competition between the United States and China in AI progress.
(08:06):
Altman noted China's potential to quickly advance in AI research and product development.
Nvidia CEO Jensen Huang echoed these concerns, criticizing export bans on AI chips like Nvidia's
H20 and AMD's MI308 as they may harm United States technological leadership.
(08:26):
Despite bans, China reportedly acquires banned chips through the black market.
Chinese media encourages reliance on domestic hardware, but Nvidia's advanced products remain
attractive. Although China's domestic chips aren't as powerful, they compensate with high
production and electricity capacity, challenging the United States in the AI sector.
(08:51):
AirPods Pro 2, despite being three years old, are set for another upgrade without a new hardware
launch. Rumors suggest that Apple will reveal a live translation feature for AirPods Pro 2
and AirPods 4 at the upcoming iPhone 17 event. This feature will allow real-time conversations
(09:12):
with speakers of different languages, enhancing global communication and travel experiences.
Previously, Apple surprised users with health-focused updates for AirPods Pro 2,
including hearing aid, hearing test, and hearing protection features. With the absence of AirPods Pro 3 this fall,
Apple continues to keep its older model relevant through significant software updates.
(09:38):
The upcoming live translation feature underscores Apple's commitment to enhancing user experience,
even without new hardware, and could make cross-cultural interactions more seamless.
Let's now turn our attention to the India-exclusive subscription benefits. OpenAI has
(09:59):
launched ChatGPT Go, a new subscription plan exclusive to India, priced at 399 rupees per month,
making it the company's most affordable option. This move targets India's vast market of nearly
one billion internet users, aligning with a common strategy among global companies to offer
(10:19):
cost-effective services in price-sensitive regions. ChatGPT Go allows users to send and
generate 10 times more messages and images compared to the free version, with faster response times.
Higher-tier plans offer even more benefits. The ChatGPT Pro plan is available at 19,900
(10:40):
rupees per month, while ChatGPT Plus costs 1,999 rupees per month. Earlier this year,
OpenAI CEO Sam Altman met with India's IT minister to discuss creating a low-cost AI ecosystem.
India is currently OpenAI's second-largest market and could surpass the United States soon.
(11:02):
And now, let's pivot our discussion to the main AI topic.
Welcome back to Innovation Pulse. I'm Alex, and today we're diving into something that's going
to completely change how you think about building AI applications. With me is Yakov Lasker,
(11:24):
who's been tracking the infrastructure side of AI development. And Yakov, you brought me this story
about a company called Chroma that just dropped some research that's making waves.
Oh, this is wild, Alex. So picture this. Every major AI company is out there
bragging about their models handling millions of tokens perfectly, right? Needle in a haystack,
(11:45):
perfect green charts, the whole marketing machine. Well, Chroma just published research showing that's
basically, well, let's call it creative marketing. Wait, hold up. Are you telling me that when
Claude or GPT says it can handle a million tokens, it's not actually handling them well?
That's exactly what I'm saying. They call it context rot. The more tokens you feed these models,
(12:07):
the worse they get at actually using the information. It's like trying to have a conversation while
someone's reading you an encyclopedia. Sure, you can hear all the words, but good luck remembering
what was on page 47 when you're on page 300. Okay, this is already blowing my mind,
but let's step back. What's Chroma and why should we trust their research on this?
Great question. So Chroma built what's become the most popular open source vector database.
(12:31):
Think of it as the Google for your AI application's memory. They've got five million monthly downloads,
and here's the kicker. They're used by pretty much every major AI project you've heard of.
When researchers at places like Stanford need to build AI agents, they reach for Chroma.
So they're not just theorizing, they're seeing this problem first hand in real applications.
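For listeners who want the "Google for your AI application's memory" idea in concrete terms, here is a toy sketch of what a vector database does. The two-dimensional vectors, document ids, and `ToyVectorStore` class are all invented for illustration; real systems like Chroma use learned embeddings and approximate nearest-neighbour indexes, and Chroma's actual API looks nothing like this.

```python
import math

# Toy sketch of a vector database: store embeddings next to documents,
# then return the closest matches to a query vector by cosine similarity.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

class ToyVectorStore:
    def __init__(self):
        self.items = []  # (id, embedding, document)

    def add(self, doc_id, embedding, document):
        self.items.append((doc_id, embedding, document))

    def query(self, embedding, n_results=1):
        ranked = sorted(self.items,
                        key=lambda item: cosine(item[1], embedding),
                        reverse=True)
        return [(doc_id, doc) for doc_id, _, doc in ranked[:n_results]]

store = ToyVectorStore()
store.add("a", [1.0, 0.0], "notes about orbital mechanics")
store.add("b", [0.0, 1.0], "notes about sourdough baking")
print(store.query([0.9, 0.1]))  # the orbital-mechanics note ranks first
```

A query vector near a stored embedding pulls back the matching document, which is the whole trick: retrieval by meaning rather than by exact keywords.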
(12:53):
Exactly. And the founder, Jeff, has this fascinating background. He's worked in
applied machine learning for a decade. And he says the thing that drove him crazy was this gap
between demos and production systems. Like you can build a cool AI demo in an afternoon,
but making it actually work reliably? That felt like alchemy, not engineering.
Okay, I love this analogy. It's like the difference between making a perfect meal
(13:18):
once versus running a restaurant that has to serve 500 customers every night without fail.
Yes. And here's where it gets interesting. Jeff says the entire industry has been using
the wrong vocabulary to think about these problems. He absolutely hates the term RAG,
you know, retrieval-augmented generation, that everyone throws around.
Oh, why is that? I thought RAG was like the standard approach.
(13:42):
That's his point. He says RAG mashes together three completely different concepts,
retrieval, augmentation, and generation, into one confusing term. It's like calling your car a
combustion-steering-braking system. Each piece does something different and has different problems.
So instead of RAG, what does he propose? This is where it gets really exciting.
(14:05):
He's coined this term context engineering. And I think this is going to become a major job
category. Context engineering is the discipline of figuring out what information should go into
an AI model's context window at any given moment. Wait, that sounds obvious, though.
Don't you just put in what's relevant? Oh, Alex, this is where it gets wild.
Jeff breaks this down into two loops. There's the inner loop,
(14:28):
what goes in the context window right now for this specific task.
But then there's the outer loop. How do you get better over time at making these decisions?
Okay, so it's not just about being smart once. It's about building systems that get smarter.
Exactly. And here's the thing that really hit me. Jeff says this is actually what separates
(14:48):
successful AI companies from unsuccessful ones. Every AI startup that's doing really well,
their secret sauce isn't just having access to good models. It's that they've gotten really,
really good at context engineering. That actually makes sense. If everyone has access to the same
models through APIs, the differentiator is how cleverly you use them. Right. But here's where
(15:13):
the context-rot research becomes crucial. Most developers today are just dumping everything
they can into the context window, thinking more context equals better results.
But Chroma's research shows that beyond a certain point, you're actually making your AI dumber.
So what's the solution? How do you do context engineering well?
This is where Jeff describes this really elegant pattern that's emerging. Think of it like having
(15:38):
a really good research assistant. First stage, you use fast, cheap tools like keyword search and
vector search to go from maybe 100,000 possible documents down to 300 candidates.
Okay. So that's like your research assistant gathering every book that might be relevant.
Perfect analogy. Then, and this is the clever part, you use an LLM as a re-ranker. You give it
(15:58):
those 300 candidates and ask it to pick the 20 or 30 most relevant ones. Jeff says this is way more
cost-effective than people realize because you're using very small, focused prompts.
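The two-stage pattern described here can be sketched in a few lines. The corpus, the keyword-overlap scorer, and the `llm_rerank` stub are all invented for illustration: in a real pipeline the first pass would hit a keyword or vector index, and the re-ranker would send each candidate to an LLM with a small, focused prompt.

```python
# Toy sketch of two-stage retrieval: a cheap broad pass narrows a large
# corpus to a candidate set, then a second pass re-ranks the candidates.

def first_pass(query, corpus, k):
    """Cheap broad retrieval: rank by keyword overlap, keep the top k."""
    q = set(query.lower().split())
    scored = sorted(corpus,
                    key=lambda doc: len(q & set(doc.lower().split())),
                    reverse=True)
    return scored[:k]

def llm_rerank(query, candidates, k):
    """Stand-in for an LLM re-ranker; a crude phrase check is used here
    so the sketch runs offline."""
    return sorted(candidates,
                  key=lambda doc: query.lower() in doc.lower(),
                  reverse=True)[:k]

corpus = [
    "context engineering decides what goes into the model's context window",
    "sourdough starters need regular feeding",
    "vector search retrieves semantically similar documents",
]
candidates = first_pass("context engineering", corpus, k=2)
final = llm_rerank("context engineering", candidates, k=1)
print(final[0])  # the context engineering document survives both stages
```

The design point is the cost asymmetry: the cheap pass touches everything, while the expensive re-ranker only ever sees the short candidate list.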
Oh, so you're using AI to curate information for AI. That's recursive in a fascinating way.
Yes. And it gets even more interesting when we talk about memory. Jeff argues that what we call
(16:21):
AI memory is really just applied context engineering. When ChatGPT remembers something about you,
it's not storing that information in some mystical way. It's gotten better at retrieving
and organizing relevant information from your conversation history.
Wait, so memory is really just smart search? In a way, yes. But here's what I find fascinating
(16:41):
about Jeff's perspective. He's not just thinking about the technical solutions.
He's thinking about the cultural and philosophical implications of building AI systems.
How so? Well, he makes this observation that Silicon Valley has become obsessed with following
gradients, you know, measure user behavior, follow the data, optimize for engagement.
(17:02):
His critique is that if you just follow what people want in the moment, you end up building
a gotcha app for middle schoolers. Ouch. But I mean, that's kind of what happened with social
media, right? Exactly his point. Instead, he advocates for having a strong contrarian vision
and being maniacally focused on executing that vision well. For Chroma, that meant taking years
(17:25):
to build their cloud service properly instead of rushing something basic to market.
That takes serious conviction, though. How do you maintain that focus when everyone around
you is raising hundreds of millions and making big splashes? This is where Jeff's philosophy gets
really interesting. He talks about viewing life as short and precious. So you should only do work
(17:45):
you absolutely love with people you love working with, serving customers you love serving. He'd
rather build something truly great than optimize for quick money. That's beautiful in theory.
But does it work in practice? Well, look at the results. Chroma has 70 million all-time downloads
and is the number one choice in the vector database space. They've proven you can build a
(18:06):
successful business by prioritizing craft and developer experience over everything else.
Okay, but let's bring this back to practical implications. What does this mean for developers
who are building AI applications today? Great question. First, stop thinking that bigger
context windows automatically mean better performance. Start measuring your retrieval
(18:26):
quality with what Jeff calls golden data sets: small, high-quality examples of queries and the
chunks that should be returned. How small are we talking? Jeff says even a couple hundred
high-quality examples can be transformative. And his advice? Get your team together for a pizza
party and spend a few hours labeling data. That's it. That's how Google, OpenAI, and Anthropic do it
too. I love that this billion-dollar AI innovation comes down to pizza and manual work. Right. But
(18:52):
here's the deeper insight. Jeff argues we need to stop making memory artificially complicated.
Instead of inventing 10 different types of AI memory, focus on the fundamentals. How do you curate
information? How do you measure if you're getting better at it? How do you systematically improve?
So it's really about building systems, not just prompts. Yes. And this connects to something bigger
(19:16):
Jeff mentioned about the future. He thinks we'll move away from this crude pattern of converting
everything to natural language and back. Future AI systems might stay in embedding space,
that mathematical representation that models actually understand. That's a fascinating thought.
Like why are we translating everything into English for a system that thinks in math? Exactly.
(19:37):
And he also predicts we'll move from retrieving information once at the beginning to continually
retrieving throughout the generation process. Imagine an AI that can pause mid sentence to go
look up exactly the right information it needs rather than trying to load everything up front.
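The "pause mid-sentence and look something up" idea can be illustrated with a toy generation loop. The knowledge table, `lookup`, and the placeholder template format are all invented for illustration; a real system would interleave retrieval calls with actual model decoding.

```python
# Sketch of retrieval interleaved with generation: instead of loading
# everything up front, the generator pauses at each placeholder and
# fetches only the fact it needs at that moment.

knowledge = {
    "downloads": "five million monthly downloads",
    "context_window": "a 128,000-token context window",
}

def lookup(key):
    """Stand-in for a retrieval call made mid-generation."""
    return knowledge.get(key, "[unknown]")

def generate(template):
    """Emit the template, resolving {key} parts by retrieving on demand."""
    out = []
    for part in template:
        if part.startswith("{") and part.endswith("}"):
            out.append(lookup(part[1:-1]))  # pause and retrieve one fact
        else:
            out.append(part)
    return " ".join(out)

print(generate(["Chroma reports", "{downloads}", "this month."]))
```

The contrast with today's pattern is the point: nothing enters the working context until the exact moment it is needed.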
That would be like having a conversation with someone who can instantly access their entire
(19:57):
memory and all of human knowledge, but only pulls out exactly what's relevant to the current moment.
Perfect analogy. But here's what I find most compelling about Jeff's vision.
He's not just talking about making AI more powerful. He's talking about making AI development
more predictable, more measurable, more like traditional engineering. And less like alchemy.
(20:19):
Exactly. Instead of stirring the pot and hoping something good happens, you have clear principles,
measurable outcomes and systematic ways to improve. Context engineering gives developers a
framework to think critically about their AI systems. So if someone's listening to this and
they're building AI applications, what should they do differently tomorrow? Start simple.
(20:40):
Look at your data. Actually, look at it. Don't just assume you know what's in there.
Create a small golden data set of queries and expected results.
Measure your current retrieval quality. Then experiment with that two-stage approach,
broad retrieval followed by LLM re-ranking. And stop cramming everything into the context window,
hoping for magic. Right. Jeff's research shows that less can literally be more when it comes to
(21:05):
context. Quality curation beats quantity dumping every time. This feels like one of those insights
that's obvious in retrospect, but revolutionary when you first hear it. Like, of course, context
quality matters more than context quantity. But somehow the entire industry got distracted by
bigger and bigger context windows. And that's the power of having someone like Jeff who's deeply
(21:29):
embedded in the practical side of AI infrastructure, but also thinking philosophically about where this
is all headed. He's seeing the problems that developers face every day and translating that
into better abstractions and tools. Well, I know I'm going to be thinking about context engineering
the next time I'm frustrated with an AI system giving me irrelevant results. Thanks for bringing
(21:50):
this to us, Yakov. Thanks for having me, Alex. Next time your AI assistant gives you a terrible
answer, remember: it might not be the model that's the problem. It might be the context engineering.
And that's a wrap on today's Innovation Pulse. Until next time, keep building the future.
(22:12):
We've covered NASA and Google's collaboration on space mission AI, Google's Notebook LM
enhancements, DeepSeek's latest model and Meta's AI challenges, plus NBC Universal's privacy options,
OpenAI's ChatGPT Go in India, Apple's AirPods updates, and insights into AI context
(22:32):
engineering from Chroma. Don't forget to like, subscribe and share this episode with your friends
and colleagues so they can also stay updated on the latest news and gain powerful insights.
Stay tuned for more updates.
(22:55):
Thanks for watching.