
November 17, 2025 • 21 mins
OpenAI Launches GPT-5.1 with Enhanced Conversational Skills and Customizable Communication Styles
OpenAI readies ChatGPT Group Chats with custom controls
Google to Enhance Product Integration with Secure Generative AI and Private AI Compute
NotebookLM's Deep Research Automates Tasks, Integrates Findings, and Supports Multiple File Types
Nano Banana 2 Launches November 11 with Enhanced Features and Creative Flexibility
Beyond Words - Why AI Needs to Think in 3D

#AI, #ChatGPT, #OpenAI, #GenerativeAI, #3DThinking, #GoogleAI, #TechLaunch

Episode Transcript

Available transcripts are automatically generated. Complete accuracy is not guaranteed.
(00:00):
Welcome to Innovation Pulse, your quick, no-nonsense update on the latest in AI.

(00:10):
First, we will cover the latest news.
OpenAI's GPT-5.1 models enhance ChatGPT with conversational warmth and group chat features.
Google boosts privacy in AI, and Nano Banana 2 advances image generation.
After this, we will dive deep into AI's spatial intelligence challenges

(00:32):
and how world models could revolutionize our interaction with physical spaces.
OpenAI recently launched GPT-5.1 Instant and GPT-5.1 Thinking,
updated AI models in ChatGPT designed to be warmer, more conversational, and better at following instructions.
These releases come amid earlier criticism of overly cheerful interactions

(00:57):
and concerns about changes made after lawsuits related to mental health.
The models aim for better technical performance and offer eight communication styles,
like professional and friendly, to suit user preferences while maintaining core capabilities.
OpenAI plans a phased rollout starting with paid users

(01:19):
and will later integrate both models into its API.
With over 800 million users, OpenAI's CEO emphasized personalization
but warned against excessive customization that could limit users' growth.
OpenAI is addressing potential unhealthy attachments to AI with expert input,

(01:40):
balancing engagement with user safety.
The new models aim to cater to diverse needs while avoiding potential harm.
OpenAI is developing a group chats feature for ChatGPT, aimed at launching in December.
This feature will allow multiple users to join a shared conversation,
interacting with each other and the AI within a single chat.

(02:04):
It offers customizable controls such as setting system prompts
and managing when the AI should respond, enhancing structured collaboration.
This setup is designed to facilitate teamwork, brainstorming and problem solving
by reducing the need for separate chats and minimizing context switching.
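
To make these controls concrete, here's a minimal sketch in Python of what such group chat settings might look like. Every name and field here is a hypothetical illustration of the concept, not OpenAI's actual API.

```python
from dataclasses import dataclass, field

# Hypothetical sketch -- not OpenAI's API. It models the controls described
# above: a shared system prompt plus a policy for when the AI should speak.
@dataclass
class GroupChatConfig:
    system_prompt: str = "You are a facilitator for a brainstorming session."
    respond_when_mentioned_only: bool = True   # AI stays quiet unless @-mentioned
    max_replies_per_minute: int = 2            # throttle AI participation
    participants: list[str] = field(default_factory=list)

def should_respond(config: GroupChatConfig, message: str) -> bool:
    """Decide whether the AI joins in, per the configured policy."""
    if config.respond_when_mentioned_only:
        return "@assistant" in message
    return True
```

The design point is the policy layer: the group, not the model, decides when the assistant participates.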

(02:25):
Unlike Microsoft's Copilot, which focuses on basic user invites,
OpenAI's approach introduces more detailed management of AI participation,
appealing to professional and academic users.
Though still under development, this feature is expected to enhance ChatGPT's
utility as a collaborative tool, aligning with OpenAI's tradition of releasing features in December.

(02:53):
Join us as we explore the integration of AI.
Google is integrating generative AI into its products, emphasizing privacy and security
with its new Private AI Compute system.
This system uses custom tensor processing units (TPUs)
and a trusted execution environment to encrypt and protect user data.

(03:15):
Google claims it offers security comparable to local processing
but with enhanced power from the cloud, enabling advanced AI models like Gemini.
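
To illustrate the general idea of a trusted execution environment, here's a rough conceptual sketch in Python using a generic symmetric cipher. It is an assumption-laden toy, not Google's actual Private AI Compute protocol, and it omits the attestation handshake that makes the real thing trustworthy.

```python
from cryptography.fernet import Fernet  # pip install cryptography

# Conceptual toy: data is encrypted before leaving the device, and only
# code running inside the (attested) enclave holds the key to process it.
session_key = Fernet.generate_key()   # in reality, negotiated after verifying TEE attestation
channel = Fernet(session_key)

def device_send(user_data: str) -> bytes:
    """On-device: encrypt the request so only the enclave can read it."""
    return channel.encrypt(user_data.encode())

def enclave_process(ciphertext: bytes) -> bytes:
    """Inside the TEE: decrypt, run the model, re-encrypt the response."""
    prompt = channel.decrypt(ciphertext).decode()
    response = f"model output for: {prompt}"  # stand-in for the actual model call
    return channel.encrypt(response.encode())

reply = channel.decrypt(enclave_process(device_send("summarize my notes"))).decode()
```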
The Pixel 10 phone benefits from these advancements, especially in its Magic Cue feature,
which generates suggestions based on screen context using cloud power.

(03:36):
Although Magic Cue is still developing, Google plans to enhance its functionality
with the Private AI Compute system.
Despite the cloud's processing power, local AI processing remains valuable
for its lower latency and reliability without internet dependency.
Google's hybrid approach aims to balance the strengths of both local and cloud AI processing.

(03:59):
NotebookLM is enhancing research capabilities with new features for finding and using sources effectively.
The introduction of Deep Research automates complex online research, acting as a personal researcher.
It creates detailed reports and suggests relevant articles, refining searches as it learns.
Users can add these reports and sources directly into their notebooks,

(04:24):
building a comprehensive knowledge base without interrupting workflow.
Additionally, NotebookLM now supports more file types for research,
including Google Sheets, Drive files shared as URLs, images, PDFs from Google Drive, and Microsoft Word documents.
These features will roll out in the coming weeks, providing users with a richer research experience.

(04:50):
Whether for quick searches or in-depth studies, NotebookLM aims to streamline the process,
allowing users to focus on gaining insights and understanding topics more thoroughly.
Nano Banana 2 is set to launch on November 11, offering notable improvements over its predecessor.

(05:11):
Built on Gemini 3.0 Pro, it will feature enhanced data handling
alongside capabilities like precise coloring, advanced control over views,
and more accurate text rendering in image generation.
Early previews show significant quality enhancements, addressing previous limitations.

(05:35):
Despite expectations, it might still rely on Gemini 2.5 Flash initially, with future upgrades planned.
The model introduces a new multi-step process for generating and refining images, ensuring higher accuracy.
Rebranded as Nano Banana Pro, it promises up to three times greater accuracy and consistency.
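
The multi-step process is only loosely described, but the general generate-critique-refine loop can be sketched. The following Python toy is purely illustrative; every function here is a made-up stand-in, not Google's pipeline.

```python
# Illustrative stand-ins only -- not Google's actual pipeline.
def base_generate(prompt: str) -> str:
    return f"draft image for '{prompt}'"        # stand-in for a first diffusion pass

def critique(image: str, prompt: str) -> list[str]:
    return [] if "refined" in image else ["text is garbled"]  # toy consistency check

def refine(image: str, issues: list[str]) -> str:
    return f"refined {image} (fixed: {', '.join(issues)})"    # targeted correction pass

def generate_image(prompt: str, max_steps: int = 3) -> str:
    """Draft, critique against the prompt, refine; stop once the critique passes."""
    image = base_generate(prompt)
    for _ in range(max_steps):
        issues = critique(image, prompt)
        if not issues:
            break
        image = refine(image, issues)
    return image

print(generate_image("a storefront sign reading OPEN"))
```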

(05:59):
New features include expanded aspect ratios and resolutions, including 1K, 2K and 4K,
offering flexibility for various creative applications.
As Google prepares for a broad release, the update is expected to transform workflows for AI users across different sectors.
And now, let's pivot our discussion towards the main AI topic.

(06:29):
Welcome back to Innovation Pulse.
I'm Alex, and today I've got something that's going to completely reshape how you think about artificial intelligence.
We're talking about a capability so fundamental to how humans understand reality that we don't even realize we're using it constantly,
but current AI, it's basically blind to it.

(06:50):
And joining me to unpack this is Yakov Lasker, who's been diving deep into the latest developments in AI.
Yakov, welcome.
Thanks for having me, Alex.
And look, I know people are getting tired of hearing that AI is about to change everything again,
but here's something wild.
Right now, at this very moment, you could ask ChatGPT to describe what happens when you toss your keys across a room,

(07:16):
and it'll give you a beautiful, eloquent answer.
But if you actually asked it to predict the trajectory, to understand the physics of that motion in three-dimensional space,
it would fail spectacularly.
Wait, really?
Because these models can write code, analyze complex documents, even generate images now.
Exactly. That's what makes this so fascinating.

(07:38):
We've built these incredibly sophisticated language models, but they're what Fei-Fei Li calls wordsmiths in the dark.
Elegant, but fundamentally ungrounded in physical reality.
And she should know.
She's the Stanford professor who created ImageNet, which basically launched modern AI as we know it.
Oh, the ImageNet that everyone credits with the deep learning revolution.

(08:01):
The very same. And now she's saying we've hit a wall.
She founded a company called World Labs early last year specifically to tackle what she believes is AI's next frontier: spatial intelligence.
Okay, but spatial intelligence. Break that down for me, because it sounds like just fancy terminology for understanding space.
It is, but think about how profound that actually is.

(08:24):
You know when you park your car and you instinctively know whether you can fit into that tight spot?
You're not doing math in your head.
You're not calculating angles and distances.
Your brain is running this incredibly sophisticated simulation of the physical world.
You're imagining the narrowing gap between your bumper and the curb.
Huh. I've literally never thought about that consciously, but you're right. It's completely automatic.

(08:49):
Exactly. And it's not just parking. It's catching those keys you mentioned.
Navigating a crowded sidewalk without bumping into people.
Pouring coffee into your mug while half asleep.
Every single day, we're processing this three-dimensional understanding of how objects relate to each other in space.
How they move, how they interact.
But here's where I'm stuck. We have computer vision now.

(09:12):
AI can recognize objects in images, right? What's the difference?
Oh, this is where it gets interesting.
Current AI can look at a picture and tell you, that's a chair or that's a dog.
But ask it to estimate the distance between two objects in that picture?
Or predict what happens if someone pushes that chair?
The accuracy drops to basically chance.

(09:33):
These models have no real understanding of three-dimensional space, physics, or how the world actually works.
That's actually kind of shocking, because I've seen those AI-generated videos that look incredibly realistic.
You mean the ones that lose coherence after about three seconds?
Yeah, that's exactly the problem.
They can create something that looks visually convincing in the moment,

(09:55):
but they can't maintain consistency because they don't understand underlying spatial structure.
It's like the difference between memorizing a speech phonetically in a language you don't speak versus actually understanding what you're saying.
Okay, so give me the why should I care moment here?
Why does this matter beyond just making better AI videos?
Think about everything we can't do with AI right now, despite all the hype.

(10:18):
Autonomous robots that can actually function in normal homes?
Still mostly science fiction.
AI that could help accelerate drug discovery by modeling how molecules interact in three dimensions?
We're barely scratching the surface.
Immersive educational experiences where students could actually walk through a human cell and see how it works?

(10:39):
Not really happening yet.
And spatial intelligence is the missing piece.
It's THE missing piece.
Fei-Fei Li traces this back to evolution, actually.
She points out that long before animals could communicate with language or build civilizations,
the simple act of sensing the environment sparked the entire journey toward intelligence.
Perception and action.

(11:01):
That loop is what drove the evolution of nervous systems, of brains, of everything we think of as intelligence.
So, intelligence didn't start with thinking, it started with moving through space.
Exactly.
And here's where the argument gets really compelling.
She talks about how spatial intelligence isn't just about moving around.
It's foundational to human creativity and imagination.

(11:23):
When filmmakers create worlds, when architects design buildings, when kids play Minecraft,
they're all using spatial intelligence to imagine and create.
Oh, that reminds me of something.
She mentions Watson and Crick discovering DNA's structure, right?
How'd spatial intelligence play into that?
Perfect example.
They didn't figure out DNA's double helix structure by writing equations.

(11:46):
They built physical three-dimensional models with metal plates and wire,
manipulating them until the spatial arrangement of the base pairs clicked into place.
They had to physically work with objects in space to make that breakthrough.
That's wild.
So what's the solution?
How do you teach AI to think spatially?
This is where World Labs comes in.
Fei-Fei Li and her co-founders are building something called world models.

(12:10):
And this is not just incremental improvement.
This is a fundamentally different approach to AI.
Different how?
So language models like ChatGPT are trained on sequential data, one token after another,
predicting what comes next in a sentence.
World models need to understand and generate entire three-dimensional environments
that obey the laws of physics, maintain geometric consistency,

(12:32):
and can interact with inputs in real time.
The complexity is orders of magnitude higher.
Hold up, are you telling me they're trying to teach AI to simulate reality?
Not just simulate, understand, reason about, and interact with reality.
Li defines world models through three essential capabilities.
First, they need to be generative, meaning they can create diverse virtual worlds

(12:56):
that are geometrically and physically consistent.
Second, they need to be multimodal, processing images, videos, text instructions,
even gestures just like humans do.
And third, this is the game changer.
They need to be interactive.
Interactive meaning what, exactly?
Meaning if you give the model an action or a goal, it should predict the next state of the world.

(13:18):
If you tell it the robot picks up the cup, it needs to understand not just what that looks like,
but how the physics works, how the environment changes, what happens next.
It's predicting the future state of a three-dimensional world based on actions taken within it.
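
In code terms, that interactive capability reduces to a step function: current state plus action in, predicted next state out. Here is a self-contained Python sketch; the class name, state representation, and stand-in dynamics are all assumptions for illustration, not World Labs' interface.

```python
import numpy as np

class WorldModel:
    """Hypothetical interface: predict the next world state given an action."""
    def step(self, state: np.ndarray, action: str) -> np.ndarray:
        # A real model would run a learned dynamics network here; this
        # stand-in just perturbs the state so the sketch runs end to end.
        rng = np.random.default_rng(abs(hash(action)) % (2**32))
        return state + 0.01 * rng.standard_normal(state.shape)

model = WorldModel()
state = np.zeros(64)                          # e.g., a latent encoding of a 3D scene
state = model.step(state, "robot picks up the cup")
```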
Okay, but let's get real for a second.
This sounds incredibly difficult. What are the actual technical hurdles?

(13:40):
Oh, it's massive.
First, they need to figure out the training function.
With language models, it's relatively simple.
Predict the next word.
But how do you create an equivalent for predicting the next state of a three-dimensional world?
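
The contrast between the two training signals can be made concrete with a minimal PyTorch sketch. The tensors below are illustrative assumptions; in particular, the MSE-over-latents placeholder is exactly that, a placeholder, because choosing the right state representation and loss is the open problem being described.

```python
import torch
import torch.nn.functional as F

# Language models: cross-entropy on the next token.
logits = torch.randn(1, 50_000)          # model's scores over the vocabulary
next_token = torch.tensor([42])          # the token that actually came next
lm_loss = F.cross_entropy(logits, next_token)

# World models: some loss on the predicted next *state* of a 3D scene.
# What that state even is (voxels? latents? frames?) is the hard part;
# a naive placeholder is reconstruction error on a latent encoding.
predicted_state = torch.randn(1, 512)    # hypothetical latent scene encoding
true_next_state = torch.randn(1, 512)
world_loss = F.mse_loss(predicted_state, true_next_state)
```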
Then there's the data problem.
Language models train on text scraped from the internet.
World models need to extract spatial information from two-dimensional images and videos,

(14:04):
which is like trying to reconstruct a sculpture by only looking at photographs of it.
That's a great analogy. So how are they approaching this?
Multiple fronts. They're working on new architectures that go beyond current video diffusion models,
which just tokenize everything into sequences.
They're exploring spatially grounded representations, treating space and time as actual dimensions,

(14:26):
rather than flattening everything.
They've already published research on something called RTFM,
a real-time generative model that uses spatial frames as a form of memory.
Wait, they're already building this? I thought this was all theoretical.
No, this is where it gets exciting.
World Labs has already released a limited preview of something called Marble,

(14:48):
the first world model that can take multimodal inputs
and generate consistent three-dimensional environments that people can explore and interact with.
So what can you actually do with Marble right now?
Right now, creators and storytellers can generate explorable 3D worlds without traditional 3D design software.
You give it text descriptions, images, whatever prompts you want,

(15:10):
and it creates this geometrically consistent environment that maintains persistence.
Meaning if you turn around and look back, things are still where they were.
That's actually huge for game designers and filmmakers, right?
Because normally creating 3D environments is incredibly time-intensive.
Exactly, and this connects to something Li is really passionate about.
She keeps emphasizing that AI should augment human capability, not replace it.

(15:35):
Marble isn't replacing the creative vision of a filmmaker.
It's giving them a tool to rapidly prototype and explore ideas that would have been cost-prohibitive before.
Okay, but let's play Devil's Advocate for a second.
We've seen lots of impressive AI demos that don't really translate to practical applications.
What makes this different?
Fair question, but think about the applications beyond just creative tools.

(15:58):
Li maps out a timeline.
Creative tools like Marble are emerging right now.
Robotics is the midterm horizon, and spatial intelligence is absolutely critical there.
You can't have robots functioning in real-world environments without understanding three-dimensional space and physics.
Oh, that's the missing link for why robots are still so clunky in unstructured environments.

(16:19):
Exactly.
Current robots work great in highly controlled factory settings,
but put them in a normal home where furniture moves, lighting changes, and unexpected things happen?
They struggle because they lack true spatial understanding.
World models could change that by generating massive amounts of simulation data for training.
Closing the gap between virtual practice and real-world performance.

(16:42):
This actually connects to something bigger about AI training, doesn't it?
Like language models have the entire internet to learn from.
But robots.
Right.
There's a fundamental data scarcity problem in robotics.
You can't just scrape millions of examples of robots doing tasks because they barely exist yet.
But if you have world models that can simulate realistic environments and interactions,

(17:04):
suddenly you can generate all the training data you need.
You can test thousands of scenarios in simulation before ever touching a physical robot.
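
A hedged sketch of that data-generation loop: roll a hypothetical world model forward under sampled actions and log the transitions. The action set and the world.step interface are illustrative assumptions, matching the toy world model sketched earlier.

```python
import random

ACTIONS = ["move_forward", "turn_left", "turn_right", "grasp", "release"]

def rollout(world, start_state, horizon=50):
    """Roll one simulated episode and collect (state, action, next_state) tuples."""
    data, state = [], start_state
    for _ in range(horizon):
        action = random.choice(ACTIONS)          # a real setup would use a policy
        next_state = world.step(state, action)   # the world model predicts the outcome
        data.append((state, action, next_state))
        state = next_state
    return data

# Thousands of episodes, zero wear on physical hardware:
# dataset = [t for _ in range(10_000) for t in rollout(world, initial_state)]
```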
So what does the longer-term future look like?
Because Li talks about scientific applications too, right?
She does, and this is where the vision gets really ambitious.
In scientific research, spatial intelligence could simulate experiments that are too dangerous,

(17:28):
too expensive, or literally impossible to run in reality.
Modeling climate systems, designing new materials at the molecular level,
exploring environments humans can't physically access: deep oceans, distant planets.
And healthcare?
I feel like there's huge potential there.
Massive potential.
Drug discovery is a big one.
Modeling how molecules interact in three dimensions to find new treatments faster.

(17:52):
But she also talks about ambient monitoring systems in hospitals and elder care facilities.
Her students at Stanford have worked extensively with healthcare settings,
and she's convinced that spatial intelligence can help caregivers and patients
without replacing the human connection that healing requires.
That's an interesting distinction.
Augmenting rather than replacing.

(18:15):
It's central to her entire philosophy.
She's very explicit that extreme narratives about AI, either utopia or apocalypse, miss the point.
AI is developed by people, used by people, governed by people.
It should extend our capabilities, make us more creative, more productive, more fulfilled.
That's what drives her work.
This is fascinating, but I want to bring it back to something concrete for our listeners.

(18:41):
What's the practical takeaway here?
Here's what strikes me.
We're at this inflection point where the next generation of AI won't just be about processing language
or recognizing patterns and data.
It'll be about understanding and interacting with the physical world
in ways that actually mirror how humans think.
When that happens, the applications touch everything.

(19:04):
How we learn, how we create, how we design products, how we conduct research,
how we receive medical care.
And for someone listening right now, maybe working in tech or just following AI developments,
what should they be watching for?
Watch what happens with world models over the next few years.
World Labs is making Marble available to the public soon.

(19:25):
And that's just their first step.
But beyond any single company, this represents a fundamental shift in what AI can do.
We've had the language revolution; ChatGPT proved that out.
The next revolution is spatial.
And when it arrives, it's going to change which problems AI can actually solve.
Because right now, AI can talk about reality, but it can't actually understand or interact with reality.

(19:48):
Precisely.
It's the difference between reading about swimming and actually getting in the water.
And once AI can get in the water, so to speak, everything changes.
So next time you effortlessly pour that morning coffee without looking or parallel park in a tight spot,
or catch something someone tosses you,
remember that you're using a form of intelligence that took nature half a billion years to evolve.

(20:13):
And that we're just now figuring out how to build into machines.
That's spatial intelligence.
And it might just be the most important capability we haven't given AI yet.
And honestly, I can't wait to see what people build with it.
Yakov, thanks for breaking this down.
This has been Innovation Pulse, and we'll catch you next time with more ideas reshaping our world.

(20:37):
That's the end of today's podcast where we explored OpenAI's new advancements in GPT-5.1
and group chats for enhanced communication.
And Google's focus on integrating AI with privacy in mind,
alongside a discussion on spatial intelligence and its potential to transform AI interaction with the physical world.

(21:01):
Don't forget to like, subscribe, and share this episode with your friends and colleagues
so they can also stay updated on the latest news and gain powerful insights.
Stay tuned for more updates.