All Episodes

May 19, 2025 13 mins
In this episode, we start by introducing Codex by OpenAI, exploring its launch, integration, and practical applications. We discuss the appointment of Fidji Simo as the new CEO of Applications at OpenAI, highlighting her potential impact. The episode delves into the launch of the Safety Evaluations Hub, focusing on addressing AI challenges. We also cover the UAE-US AI Campus collaboration, examining its vision for future advancements. The expansion of OpenAI's Stargate project is discussed, emphasizing its significance. We conclude with closing remarks and a sign-off, encapsulating the episode's main themes.
Mark as Played
Transcript

Episode Transcript

Available transcripts are automatically generated. Complete accuracy is not guaranteed.
(00:00):
Could AI soon handle your coding tasks aseasily as ordering a pizza!?

(00:06):
Welcome to The OpenAI Daily Brief, your go-tofor the latest AI updates.
Today is Monday, May 19, 2025.
Here’s what you need to know about OpenAI'slatest game-changer, Codex, a powerful new
software engineering agent.
Let’s dive in.

(00:26):
OpenAI has just launched Codex, a researchpreview of a cloud-based software engineering
agent that promises to automate some of themost common development tasks.
We're talking about writing code, debugging,testing, and even generating pull requests.
Imagine having an AI assistant that can handlethese tasks, freeing up developers to focus on

(00:49):
more creative and complex challenges.
Integrated into ChatGPT for Pro, Team, andEnterprise users, Codex operates in a secure
sandbox environment that's already set up withthe user's codebase.
It's like having a virtual developer who canwork within your existing setup without any
hiccups.
Codex is powered by OpenAI's o3 model,optimized for programming tasks, and trained

(01:15):
using reinforcement learning on real-worldexamples.
This means it can generate code that alignswith human conventions, ensuring that the
output is not only functional but also familiarto developers.
What's really intriguing is how Codexiteratively runs code and tests until it
reaches a correct solution.

(01:35):
Once a task is completed, it commits thechanges within the sandbox and provides test
outputs and terminal logs for completetransparency.
This level of detail is crucial for developerswho want to ensure quality and reliability in
their software.
Now, you might be wondering about thepracticalities.
Codex supports AGENTS.md files, which arerepository-level instructions that guide the

(02:00):
agent through project-specific practices andtesting procedures.
Plus, there's the Codex CLI, a command-linecompanion interface that's open source and uses
API credits.
Fouad Matin, a member of the technical staff atOpenAI, has clarified that Codex access within
ChatGPT is included with Pro, Team, andEnterprise subscriptions.

(02:24):
While it doesn't yet support full applicationtesting with live user interfaces, it's a
promising start.
Some users have raised questions about howCodex handles web development tasks, given the
complexity of separate layers, environmentvariables, and user interface interfaces.
Currently, Codex runs in an isolated containerwithout internet access or user interface

(02:47):
execution capabilities.
It can handle test suites, linters, and typecheckers, but final verification and
integration remain the responsibility of humandevelopers.
In addition to Codex, OpenAI has introducedCodex mini, a lighter model designed for faster
interactions and lower latency.

(03:08):
It's now the default engine in Codex CLI and isavailable via API.
This move reflects OpenAI’s broader strategy toeventually support both real-time AI coding
assistants and asynchronous agent workflows.
While Codex currently connects with GitHub andis accessible from ChatGPT, OpenAI envisions

(03:29):
deeper integrations in the future.
This includes support for assigning tasks fromCodex CLI, ChatGPT Desktop, and tools like
issue trackers or continuous integrationsystems.
It's clear that OpenAI is paving the way for anew era of AI-assisted software engineering.
OpenAI is making waves by naming Fidji Simo,former vice president at Facebook and current

(03:54):
chief executive officer of Instacart, as itsnew chief executive officer of Applications.
This is a significant move as Simo will bejoining OpenAI full-time later this year, after
serving on its board for the past year.
Reporting directly to OpenAI's chief executiveofficer, Sam Altman, Simo's new role will focus

(04:14):
on scaling OpenAI’s "traditional" companyfunctions as it gears up for its next phase of
growth.
Simo brings a wealth of experience from hertenure at Instacart and her time at Facebook,
where she held senior positions such as head ofthe Facebook app and vice president of video,
games, and monetization.
Her appointment is a strategic step for OpenAIas it continues to expand its influence and

(04:39):
capabilities in the artificial intelligencelandscape.
In a statement, Simo expressed her excitementabout joining OpenAI at such a pivotal moment.
"Joining OpenAI at this critical moment is anincredible privilege and responsibility," she
said.
"This organization has the potential ofaccelerating human potential at a pace never

(05:00):
seen before, and I am deeply committed toshaping these applications toward the public
good."
Sam Altman, OpenAI's chief executive officer,emphasized the importance of Simo's leadership
in the company’s future.
He stated, "Fidji’s leadership makes me evenmore optimistic about our future as we continue
advancing toward becoming the superintelligencecompany we aim to be." Altman will continue to

(05:25):
oversee the core pillars of OpenAI, ensuringalignment across Research, Compute, and
Applications.
OpenAI is taking bold steps to address thegrowing concerns around artificial intelligence
safety with its new Safety Evaluations Hub.
This platform is all about transparency, givingus a window into how OpenAI's AI models are

(05:48):
assessed and secured.
As AI models become more capable and adaptable,the traditional methods of evaluation just do
not cut it anymore.
OpenAI acknowledges this shift, recognizingthat older methods often become outdated or
ineffective, a phenomenon they refer to assaturation.

(06:08):
So, they are regularly updating theirevaluation techniques to keep up with new
modalities and emerging risks.
The Safety Evaluations Hub is particularlyfocused on how AI models handle inappropriate
or dangerous prompts, like hate speech orillegal activities.
OpenAI employs an automated evaluation systemthey call an autograder to assess these

(06:30):
responses.
Most of their models are hitting it out of thepark, scoring close to perfect at 0.99 for
declining harmful prompts.
However, some models like GPT-4o-2024-08-16,GPT-4o-2024-05-13, and GPT-4-Turbo did not

(06:51):
quite make that cut.
But what’s fascinating is that the models areless consistent when it comes to more benign
queries.
The top performer here was OpenAI’s o3-mini,which scored 0.80, while others ranged between
0.65 and 0.79.
Another area the hub tackles is the resistanceto jailbreak attempts.

(07:14):
Jailbreaking is when users try to trick AI intoproducing restricted or unsafe content.
To test the models' resilience, OpenAI uses theStrongReject benchmark, targeting common
automated jailbreak techniques, alongsidehuman-generated jailbreak prompts.
The models showed a range of vulnerabilities,scoring between 0.23 and 0.85 against

(07:39):
StrongReject.
Interestingly, they fared much better againsthuman-generated attacks, scoring from 0.90 to
1.00.
This suggests that while the models aregenerally robust against manual exploits, they
still have some work to do with automated ones.
Then there’s the issue of hallucinations, whichis AI-speak for when models generate inaccurate

(08:04):
or nonsensical responses.

OpenAI tested this using two benchmarks: SimpleQA and PersonQA. (08:06):
undefined
The results showed accuracy scores ranging from0.09 to 0.59 for SimpleQA, with hallucination
rates from 0.41 to 0.86.
In the PersonQA evaluations, accuracy scoreswere between 0.17 and 0.70, and hallucination

(08:34):
rates ranged from 0.13 to 0.52.
These findings highlight the ongoing challengeAI faces in providing reliable, accurate
responses, particularly to straightforwardquestions.
The hub also looks at how models balanceconflicting instructions, such as those from

(08:54):
system, developer, and user-generated messages.
Here, the scores showed variability, withsystem-versus-user instruction conflicts
scoring between 0.50 and 0.85,developer-versus-user conflicts from 0.15 to
0.77, and system-versus-developer conflictsranging from 0.55 to 0.93.

(09:20):
This suggests a general respect for establishedhierarchies, particularly system instructions,
but there are still inconsistencies in handlingdeveloper instructions relative to user
directives.
What is really exciting about the SafetyEvaluations Hub is how it is driving
improvements in AI safety.
The insights gained from this initiative aredirectly shaping how OpenAI refines its current

(09:45):
models and plans future developments.
It is a step toward more accountable andtransparent AI advancements, highlighting
weaknesses and charting a path for improvement.
For users, it means a unique opportunity tounderstand the safety protocols behind the AI
technologies they interact with every day,giving them more confidence in these powerful

(10:06):
tools.
Nvidia, Cisco, Oracle, and OpenAI are teamingup to take on a massive project in the Middle
East.
This collaboration is part of the newlyannounced UAE-US AI Campus, a groundbreaking
initiative that was unveiled by United StatesPresident Donald Trump and his Emirati
counterpart.

(10:27):
The campus is set to be a powerhouse, spanningan impressive 5 gigawatts, with its first phase
kicking off at 1 gigawatt.
This is a major development in the AIinfrastructure world, and it’s catching
everyone's attention.

Picture this (10:42):
a sprawling 10 square mile plot in the United Arab Emirates, bustling with
cutting-edge technology and innovation.
That’s the vision behind the UAE-US AI Campus,where tech giants like Nvidia, Cisco, Oracle,
and OpenAI are setting up shop.
It’s not just about the space; it's about theambition to create a hub for AI development and

(11:05):
deployment, particularly with OpenAI's'Stargate' concept leading the charge.
Now, why does this matter?
Well, the UAE-US AI Campus is poised to becomea critical node in the global AI network,
serving as a regional megacampus under OpenAI’s'Stargate for Countries' initiative.

(11:26):
This means that the campus will not onlyenhance AI capabilities in the Middle East but
also position the region as a key player in theAI arena.
The involvement of major players like Nvidiaand Cisco ensures that the infrastructure will
be top-notch, leveraging Nvidia’s GPUs andCisco’s networking prowess.
In a bold move, OpenAI is expanding itsStargate project beyond the United States,

(11:51):
where it’s already a $500 billion endeavor.
By partnering with countries like the UAE,OpenAI is laying the groundwork for a network
of data centers that’ll support AI training andinference on a global scale.
This initiative is not just about buildinginfrastructure; it’s about fostering
international cooperation and innovation in AI.

(12:13):
The UAE-US AI Campus is more than just a techhub; it’s a symbol of collaboration between
nations and companies aiming to push theboundaries of what's possible in AI.
As OpenAI looks to extend its reach withStargate, the campus is set to be a beacon of
technological advancement, driving bothregional and global AI development forward.

(12:36):
That’s it for today’s OpenAI Daily Brief.
The UAE-US AI Campus is setting the stage forunprecedented AI innovation, with giants like
Nvidia, Cisco, Oracle, and OpenAI at the helm.
Thanks for tuning in—subscribe to stay updated.
This is Bob, signing off.
Until next time.
Advertise With Us

Popular Podcasts

Stuff You Should Know
The Joe Rogan Experience

The Joe Rogan Experience

The official podcast of comedian Joe Rogan.

Dateline NBC

Dateline NBC

Current and classic episodes, featuring compelling true-crime mysteries, powerful documentaries and in-depth investigations. Special Summer Offer: Exclusively on Apple Podcasts, try our Dateline Premium subscription completely free for one month! With Dateline Premium, you get every episode ad-free plus exclusive bonus content.

Music, radio and podcasts, all free. Listen online or download the iHeart App.

Connect

© 2025 iHeartMedia, Inc.