Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
(00:00):
Hi everyone.
(00:00):
Andy here and welcome back to The AI Breakdown.
Yesterday was OpenAI's Dev Day 2025, and Sam Altman and the team dropped some big announcements, and while there were no big model upgrades, the sheer volume of what they shipped is still kind of staggering.
Before we dive into the details, let me give you a sense of scale here.
OpenAI now has 4 million developers building on their platform.
(00:24):
That number has doubled in the past two years.
They've got 800 million weekly ChatGPT users.
And their API is processing 6 billion tokens per minute.
Just to put that in perspective, two years ago, that number was 300 million tokens per minute.
That's a 20 fold increase in API usage.
So yeah, OpenAI isn't just growing.
They're operating at a completely different scale to almost anyone else in this space.
(00:49):
Today, I'm going to walk you through the key announcements from Dev Day.
Share my take on what actually matters.
And help you figure out which of these tools might be worth your attention.
So let's dive in.
(01:09):
First up is something OpenAI is calling apps within ChatGPT. This is probably the most consumer-facing announcement from the event, though it's got real implications for developers too. The idea is simple but powerful: you can now access third-party services directly inside ChatGPT, without ever leaving the interface. Think Spotify, Zillow, Canva, Expedia. Instead of ChatGPT just giving you information or suggestions, it can now actually do things on these platforms.
(01:38):
So let's say you want to create a playlist. ChatGPT is pretty decent at suggesting songs or putting together themed playlists, but now, with the Spotify app integration, it can go ahead and create that playlist in your actual Spotify account. You don't need to copy and paste anything or switch apps. It's the same thing with property search: ask ChatGPT about homes in a specific area and it can pull live data from Zillow, complete with prices, photos, and neighborhood details. Or if you need to knock together a quick graphic for social media, you can work with Canva right there in the chat interface.
(02:14):
Now, here's where it gets interesting. From a developer perspective, OpenAI has released what they're calling the Apps SDK, which is built on MCP, the Model Context Protocol. This is basically an open standard that lets developers build these integrations and eventually publish them for everyone to use.
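To make the MCP idea a bit more concrete, here's a minimal, stdlib-only Python sketch of the pattern an MCP server implements: it advertises named tools along with JSON schemas for their arguments, and the model invokes a tool by name with JSON arguments. The tool name, schema, and handler below are hypothetical illustrations, not the real Spotify integration or the actual MCP wire protocol.

```python
import json

# Toy registry mimicking the shape of MCP tool definitions: each tool
# advertises a description and a JSON schema for its arguments, plus a
# handler the server runs when the model calls it. (Hypothetical tool,
# not the real Spotify app.)
TOOLS = {
    "create_playlist": {
        "description": "Create a playlist in the user's account.",
        "input_schema": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "tracks": {"type": "array", "items": {"type": "string"}},
            },
            "required": ["name", "tracks"],
        },
        "handler": lambda args: {
            "playlist": args["name"],
            "track_count": len(args["tracks"]),
        },
    }
}

def handle_tool_call(request_json: str) -> str:
    """Dispatch a model-issued tool call, MCP-style: look the tool up
    by name and run its handler on the parsed arguments."""
    request = json.loads(request_json)
    tool = TOOLS[request["name"]]
    result = tool["handler"](request["arguments"])
    return json.dumps(result)

call = json.dumps({
    "name": "create_playlist",
    "arguments": {"name": "Focus Mix", "tracks": ["Song A", "Song B"]},
})
print(handle_tool_call(call))  # {"playlist": "Focus Mix", "track_count": 2}
```

The real protocol adds discovery, transport, and auth on top, but the core contract is exactly this: declared tools in, structured results out.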
(02:31):
What I find clever about this approach is that OpenAI is positioning ChatGPT not just as a chatbot, but as a platform. They want to be the place where you do everything, not just the place where you ask questions. This is smart positioning, but it's also risky. We've seen this playbook before with platforms like Facebook and others who tried to keep users inside their walled gardens. It works brilliantly when the platform is growing and dominant, but users can get frustrated if integrations feel clunky or limited compared to just using the native apps.
(03:05):
The real test will be how seamless and useful these integrations actually are, and whether developers embrace the SDK. If the experience is smooth and adoption is strong, this could genuinely change how people interact with ChatGPT. But if it feels like a half-baked feature, people will just keep switching between apps like they always have.
(03:27):
Now, this is the one that got developers really excited. OpenAI announced AgentKit, which is essentially a complete platform for building AI agents without needing to write a single line of code. Think of it like n8n or Zapier, but specifically designed for orchestrating AI agents. You get a visual interface where you can drag and drop different components, set up conditional logic, and build out entire agent workflows.
(03:52):
Sam Altman positioned this as a way to take agents from prototype to production with way less friction. And honestly, that's exactly what the market needs right now, because while everyone's been talking about AI agents for the past year or so, actually building reliable, production-ready agents has been quite difficult.
(04:10):
AgentKit includes a few key pieces. There's Agent Builder, which is basically a visual workflow creator. You can map out your agent's logic, define different paths based on conditions, and tie in various tools and capabilities. Then there's ChatKit, which gives you an embeddable chat interface that you can customize and drop into your own applications. They've also included what they call guardrails: safety screening tools for both inputs and outputs. This is crucial, because one of the biggest challenges with agents is making sure they don't go off the rails and do something unexpected or harmful.
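The guardrail idea itself is easy to sketch. Here's an illustrative, stdlib-only Python version of input and output screening around an agent step. The blocked-terms lists and the `run_agent` stub are made up for the example; they are not AgentKit's actual checks, which are far more sophisticated than keyword matching.

```python
# Illustrative guardrail wrapper: screen the user's input before the
# agent runs, and screen the agent's output before it reaches the user.
# The rules below are toy stand-ins for real safety checks.
BLOCKED_INPUT = ("delete all", "drop table")
BLOCKED_OUTPUT = ("ssn:", "password:")

def run_agent(user_input: str) -> str:
    # Stand-in for the actual agent call.
    return f"Here is a summary of: {user_input}"

def guarded_run(user_input: str) -> str:
    lowered = user_input.lower()
    if any(term in lowered for term in BLOCKED_INPUT):
        return "Request refused by input guardrail."
    output = run_agent(user_input)
    if any(term in output.lower() for term in BLOCKED_OUTPUT):
        return "Response withheld by output guardrail."
    return output

print(guarded_run("quarterly sales report"))
print(guarded_run("please DELETE ALL customer records"))
```

The key design point is that both directions get screened: a safe prompt can still produce an unsafe response, so checking inputs alone isn't enough.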
(04:46):
And finally, there's an evaluation system built right in. You can create datasets, trace what your agent is doing, and even have the system automatically optimize your prompts based on performance.
(04:57):
Now here's where I think this gets really interesting. OpenAI is clearly trying to own the entire agent development life cycle. They want you to prototype in AgentKit, refine in AgentKit, deploy through their infrastructure, and monitor everything using their tools. From a business perspective, this makes total sense: the more they can embed themselves into your development workflow, the stickier their platform becomes. But from a developer perspective, you've got to think carefully about lock-in. If you build your entire agent infrastructure on OpenAI's tools, switching to another platform later becomes significantly harder.
(05:33):
That said, I think for most teams the productivity gains will outweigh the lock-in concerns. Building agents from scratch can be hard, and if AgentKit can genuinely simplify that process, it's probably worth the trade-off for most use cases. I'm planning to spend some time with AgentKit, so I'll share more detailed thoughts once I've actually built something with it.
(05:55):
Now we get to Sora 2, OpenAI's video generation model, which is now available through the API. This is huge for anyone building applications that need to generate video content. Sora 2 can create realistic video clips from text prompts or images, and it includes synchronized audio. We're talking richly detailed, dynamic scenes with a proper understanding of 3D space and motion. The model can generate clips that maintain scene continuity and realistic physics.
(06:25):
From a technical perspective, accessing this through the API is straightforward. You pick your model (there's Sora 2 and Sora 2 Pro), you pass in your prompt, and then you poll for the status of the generation, because obviously generating video takes time. The obvious use cases are marketing content, social media, product demos, educational videos: anywhere you need quick video content without the time and cost of traditional video production.
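That submit-then-poll flow is the standard pattern for long-running generation jobs. Here's a schematic, stdlib-only Python sketch of it; the function names and status strings are illustrative stand-ins, not the exact Sora 2 endpoint names or responses, so check the official API reference before relying on them.

```python
import time

def poll_until_done(get_status, poll_interval=0.0, max_polls=100):
    """Generic submit-then-poll loop: call get_status() until the job
    reports a terminal state. get_status stands in for whatever status
    call the video API actually exposes."""
    for _ in range(max_polls):
        job = get_status()
        if job["status"] in ("completed", "failed"):
            return job
        time.sleep(poll_interval)
    raise TimeoutError("generation did not finish in time")

# Fake status source simulating a render that finishes after a few checks.
states = iter(["queued", "in_progress", "in_progress", "completed"])

def fake_status():
    return {"status": next(states), "video_url": "https://example.com/clip.mp4"}

result = poll_until_done(fake_status)
print(result["status"])  # completed
```

In production you'd use a real poll interval (seconds, often with backoff) rather than hammering the status endpoint in a tight loop.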
(06:52):
But here's what I'm curious about: how good is it really? Because we've seen impressive demos before that don't quite live up to expectations when you start using them at scale. Can it maintain consistency across multiple generated clips? How well does it handle specific brand guidelines or visual styles? These are the questions I want answers to, and I suspect we'll start seeing real-world reviews and case studies over the coming weeks as developers get their hands on it.
(07:21):
Although there weren't any big model upgrades announced, GPT-5 Pro is now available through the API. This is OpenAI's most powerful model, and they're positioning it as the absolute top tier for when you need maximum intelligence and capability. The context window is massive: you can pass in up to 400,000 tokens of input, and it can generate up to 272,000 tokens of output. For context, that's enough to handle entire code bases or very long documents.
(07:51):
But let's talk about the pricing, because this is where it gets eye-watering. It's $15 per million input tokens and $120 per million output tokens. That's expensive relative to other models. To put that in perspective, regular GPT-5 costs $1.25 per million input tokens and $10 per million output tokens. So GPT-5 Pro is roughly 12 times more expensive.
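To make that arithmetic concrete, here's a quick Python cost calculator using the per-million-token rates quoted above (as announced at Dev Day; always confirm current prices on OpenAI's pricing page before budgeting).

```python
# Per-million-token prices mentioned above (USD).
PRICES = {
    "gpt-5":     {"input": 1.25, "output": 10.0},
    "gpt-5-pro": {"input": 15.0, "output": 120.0},
}

def request_cost(model, input_tokens, output_tokens):
    """Cost in dollars for a single request at the quoted rates."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# One large request: 400k tokens in, 20k tokens out.
base = request_cost("gpt-5", 400_000, 20_000)
pro = request_cost("gpt-5-pro", 400_000, 20_000)
print(f"GPT-5: ${base:.2f}, GPT-5 Pro: ${pro:.2f}, ratio: {pro / base:.0f}x")
# → GPT-5: $0.70, GPT-5 Pro: $8.40, ratio: 12x
```

Run at volume, that gap compounds fast: a thousand requests like that is $700 on GPT-5 versus $8,400 on GPT-5 Pro.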
(08:18):
This pricing positions GPT-5 Pro as a model you use very selectively. This isn't for general chatbot applications or high-volume processing. This is for when you absolutely need the best possible performance on complex reasoning tasks, where the cost is justified by the value of getting the right answer. Think high-stakes decision support, complex code generation, detailed research analysis: use cases where you're willing to pay a premium because the alternative is much more expensive human time, or because the cost of getting it wrong is high.
(08:51):
For most applications, regular GPT-5 or even GPT-5 mini will probably be more than sufficient. But having GPT-5 Pro available is valuable for those edge cases where you need maximum capability.
(09:06):
OpenAI also announced gpt-realtime-mini, which is a cheaper version of their realtime voice API. It's 70% cheaper than the standard Realtime API, which makes it significantly more accessible for building voice applications. If you're building anything with voice interfaces, conversational AI, phone systems, or interactive voice experiences, this is worth paying attention to. The Realtime API lets you capture all those natural intonations, understand tone, and handle interruptions properly: all the things that make voice interactions feel natural rather than robotic.
(09:42):
The fact that they've made it 70% cheaper means this technology is now viable for a much wider range of applications. Previously, the cost might have been prohibitive for anything except high-value use cases. Now you could realistically build voice interfaces for customer support, productivity tools, or accessibility features.
(10:02):
They also announced gpt-image-1 mini, which is a more affordable version of their image generation model. Again, same pattern: take the existing technology, make it cheaper, open it up to more use cases. I think this is a smart strategy from OpenAI. The top-tier models are where they showcase capability and push the boundaries, but the mini versions are where they drive volume and adoption. And ultimately, getting millions of developers building on your platform is more valuable than having a handful of customers paying premium prices.
(10:34):
Finally, OpenAI announced a bunch of updates to Codex, which is their AI coding assistant. You can now access Codex through Slack, which is a nice quality-of-life improvement for teams that live in Slack. They've also released a Codex SDK, which lets you build custom agents on top of Codex. The example they gave was building your own version of something like Lovable or Bolt, those vibe-coding app builder tools. Basically, if you want to create a custom coding agent that's tailored to your specific workflow or tech stack, you can now do that with the Codex SDK. And they've added usage analytics, so you can actually see how your team is using Codex and where it's providing the most value.
(11:17):
This is one of those features that seems small, but is actually really important for teams trying to measure ROI on AI tools.
(11:26):
So that's the rundown of OpenAI's Dev Day announcements: apps in ChatGPT, AgentKit for building agents, Sora 2 API access, GPT-5 Pro for maximum capability, cheaper voice and image models, and Codex improvements.
(11:42):
Looking at all of this together, a few themes emerge. First, OpenAI is clearly trying to build a complete developer ecosystem. They don't just want to provide models; they want to provide the entire stack for building, deploying, and monetizing AI applications. Second, they're pursuing a strategy of offering different tiers at different price points. You've got your premium models for when you need maximum capability, and you've got your mini models for when you need to keep costs down. This gives developers flexibility to choose the right tool for each specific use case. Third, they're making big bets on agents and on keeping users inside ChatGPT. They clearly believe the future is about AI systems that can actually do things, not just chat, and they want ChatGPT to be the interface where all of that happens.
(12:30):
Now, is all of this going to work? I think some of it will land better than others. AgentKit and the model improvements feel like solid, valuable additions. Apps within ChatGPT could be transformative if the integrations are good, but it could also feel gimmicky if they're not. And the pricing on GPT-5 Pro is going to limit its adoption to very specific use cases, which is probably intentional, but it also means most developers won't be able to justify using it.
(12:58):
The bigger question is whether OpenAI can maintain its lead, because while they're shipping all of this, Anthropic is pushing Claude forward, Google has Gemini, and there are increasingly capable open-source alternatives. The AI market is moving incredibly fast, and what works today might be obsolete in six months. So OpenAI needs to keep shipping, keep improving, and keep giving developers reasons to build on their platform rather than someone else's.
(13:26):
That's the breakdown. Thanks for listening, and catch you next time.