Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Ejaaz:
All right, Josh, the AI nerds are (00:03):
Ejaaz:
fighting again. This past weekend, there was (00:06):
Ejaaz:
a very prestigious competition called the International Math Olympiad, which (00:09):
Ejaaz:
hosts some of the brightest, smartest mathematicians of our time, and they're (00:14):
Ejaaz:
typically high schoolers. And basically, they come together and they take a really (00:18):
Ejaaz:
hard math test. This is like four to five hours, and those that score the highest get medals. (00:22):
Ejaaz:
You can get bronze, silver, and the highest scorers get gold medals. (00:27):
Ejaaz:
So what's this got to do with AI? (00:31):
Ejaaz:
Well, recently, over the last couple of years, the organizers of this International (00:33):
Ejaaz:
Math Olympiad decided to start inviting AI models to participate as contestants. (00:37):
Ejaaz:
And they did terribly. Like, no one's come even near the human geniuses. (00:43):
Ejaaz:
Except this year, Josh, where they came to play and not one, (00:50):
Ejaaz:
but two AI models achieved not silver, but gold medals, which is just an insane thing, right? (00:53):
Ejaaz:
So it should be all fun and games, right? What a fairytale story. (01:01):
Ejaaz:
Well, unfortunately, OpenAI and Google got into an online spat where they started (01:05):
Ejaaz:
accusing each other of cheating. (01:11):
Ejaaz:
Now, remember, these are trillion dollar companies. So essentially, (01:13):
Ejaaz:
Josh, I was teleported this weekend back to my high school days where I felt (01:17):
Ejaaz:
like the teacher had to come in, separate the kids from arguing over some kind (01:20):
Ejaaz:
of random homework problem and get them to chill out. (01:25):
Josh:
We will look back at this episode and laugh at it like it's a joke because these (01:28):
Josh:
AIs, they're competing against high schoolers. That's so lame. (01:31):
Josh:
Only high schoolers? Like, come on, and you're just barely getting gold. (01:35):
Ejaaz:
Well, in their defense, Josh, these are some pretty smart high schoolers, (01:38):
Ejaaz:
man. Like I was looking at some of these math problems. (01:42):
Ejaaz:
I don't know if you can see my screen here. I'm sharing the official site. (01:44):
Ejaaz:
And if you look at some of these problems, here we go. (01:48):
Ejaaz:
And then like, okay, so they have basically, they host this competition in a (01:53):
Ejaaz:
different country each year. (01:56):
Ejaaz:
And you can kind of like download the test yourselves after the fact to see (01:57):
Ejaaz:
how well you could do it. I had a look at this one, Josh, from the Afrikaans. (02:01):
Ejaaz:
I basically don't understand anything. One second. All right, (02:07):
Ejaaz:
take a look at that. Take a look at this. (02:10):
Josh:
That looks like quite a bit of squiggly lines on a page. (02:13):
Ejaaz:
You know what? That could be mistaken for a piece of art in a gallery if you (02:17):
Ejaaz:
didn't peer too closely at it. This looks insane. (02:22):
Josh:
Okay, so I take it back. So the high schoolers are probably pretty smart then. (02:25):
Josh:
And I guess the AI performing as well as the high schoolers is probably a pretty big deal, right? (02:28):
Josh:
Because that looks like very complicated math problems that I'm assuming most (02:33):
Josh:
of the smartest people in the world cannot solve. (02:37):
Ejaaz:
Exactly. Yeah. This is like something that is technically set for high schoolers (02:39):
Ejaaz:
and sometimes college kids, but is meant to demonstrate prowess in the field. (02:44):
Ejaaz:
So there's a lot of university academics, which obviously do math degrees and (02:50):
Ejaaz:
they do PhDs, but those are in very specific problems. So you kind of like in (02:54):
Ejaaz:
science, you just need to kind of pick and choose your lane and then dedicate your life to it. (02:58):
Ejaaz:
High schoolers and college kids are kind of like the last point before (03:03):
Ejaaz:
you jump into your specialization. (03:07):
Ejaaz:
So really, if you're the best at generalized maths, you're going to compete in this competition. (03:09):
Ejaaz:
And what's so interesting is typically AI models haven't been able to perform (03:13):
Ejaaz:
very well because they needed a lot of context beforehand about the problem, Josh. (03:17):
Ejaaz:
So they needed to know that, you know, there was certain, you know, (03:22):
Ejaaz:
X equals something and Y equals something. (03:27):
Ejaaz:
And they had to have defined parameters to kind of figure out the problem. (03:30):
Ejaaz:
But this was the first time that AI models basically were just given a blank (03:33):
Ejaaz:
sheet of paper, or not a blank sheet of paper, (03:37):
Ejaaz:
but they stared at the problem just as we just looked at it just now and had (03:38):
Ejaaz:
to read the words, read the characters, interpret what that meant in the context (03:42):
Ejaaz:
of that situation and the way that the question was framed and then figure it out themselves. (03:47):
Ejaaz:
So it's as if the AI models had a camera that looked at a paper, (03:51):
Ejaaz:
similar way that we look at test papers as kids through our eyes and figure it out themselves. (03:55):
Josh:
So what changed? What happened in the last year that made it so much better? (04:01):
Josh:
Because it went from, what, basically zero of six to now six or five of six questions answered. (04:06):
Josh:
Now it's a gold medalist. So what happened? (04:11):
Ejaaz:
So listen, I'm not going to try and explain it, but maybe you and I can decipher (04:14):
Ejaaz:
it through the legends themselves that built these models, right? (04:18):
Ejaaz:
Okay, so let me paint the scene for you, Josh. (04:22):
Ejaaz:
It is Saturday evening. (04:24):
Ejaaz:
You know, normal people are usually out and about. They're having fun. (04:27):
Ejaaz:
They're probably having dinner, catching up with friends or chilling at home, watching a movie. (04:30):
Ejaaz:
And this guy called Alexander Wei, who is OpenAI's head of reasoning. (04:34):
Ejaaz:
Reasoning is basically this new fancy technique that AI models have typically (04:39):
Ejaaz:
demonstrated, which has brought them up to like the frontier level of AI models. (04:43):
Ejaaz:
Basically, if your model can do reasoning, it's typically a pretty smart model, right? (04:47):
Ejaaz:
And he posts this tweet saying, I'm excited to share that our latest OpenAI (04:52):
Ejaaz:
Experimental Reasoning LLM has achieved a longstanding grand challenge in AI, (04:57):
Ejaaz:
a gold medal level performance on the world's most prestigious math competition, (05:02):
Ejaaz:
the International Math Olympiad. (05:07):
Ejaaz:
And he goes on to describe, you know, how the model basically took on each problem (05:09):
Ejaaz:
in its own regard and solved it and how this is a massive success and win for (05:14):
Ejaaz:
AI models and how, most importantly, (05:19):
Ejaaz:
OpenAI was the first ever model to complete this. (05:21):
Ejaaz:
And not too long after he posts that tweet, Josh, Sam Altman jumps in here, right? (05:25):
Ejaaz:
And he goes, again, he kind of echoes similar thoughts. We achieved gold medal (05:31):
Ejaaz:
level performance on the 2025 IMO competition with general purpose reasoning. (05:34):
Ejaaz:
And then he kind of like shills GPT-5 at the end. Basically, (05:38):
Ejaaz:
it's like a promotive thing for OpenAI. (05:41):
Ejaaz:
And I will say that this is really cool because what they've achieved is something (05:44):
Ejaaz:
that hasn't been done before, right? So very impressive feat. (05:49):
Ejaaz:
And in terms of how this works specifically, Sheryl Hsu here gives a really good breakdown. (05:53):
Ejaaz:
She says, the model solves these problems without tools like coding or Lean, (05:58):
Ejaaz:
which is another coding tool. (06:04):
Ejaaz:
It just uses natural language. So as I said earlier, it kind of reads the paper (06:05):
Ejaaz:
and just kind of interprets what it thinks it means. (06:09):
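For listeners unfamiliar with the Lean reference: Lean is a proof assistant, a language in which a proof is written as code and checked mechanically. The sketch below is only an illustration of what "using Lean" looks like; it is a trivial statement, nowhere near IMO difficulty, and it is not taken from any model's output. The theorem name `add_comm_example` is made up for this example, while `Nat.add_comm` is a standard Lean 4 library lemma.

```lean
-- Illustrative only: a tiny machine-checked statement in Lean 4.
-- Lean accepts the file only if the proof term really proves the claim.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

The point made in the episode is that the model did not lean on a checker like this; it wrote its proofs in ordinary prose, the way a human contestant does.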
Ejaaz:
And it also has the same amount of time to do the test as other kids, so 4.5 hours. (06:12):
Ejaaz:
And she says, we see the model reason at a very high level, trying out different (06:17):
Ejaaz:
strategies, making observations from examples, and testing different hypotheses out. (06:22):
Ejaaz:
And she says, it's crazy how we've gone from 12% on the AIME test, (06:27):
Ejaaz:
which is what GPT-4o, which is OpenAI's early model, got to IMO gold, (06:32):
Ejaaz:
International Math Olympiad gold medal in 15 months. (06:38):
Ejaaz:
So just to set that in context, Josh, that is a crazy leap in 15 months. (06:41):
Ejaaz:
Imagine going from eighth grade level math to the best (06:45):
Ejaaz:
mathematician in the world in 15 months. It's a pretty insane thing. (06:51):
Ejaaz:
Yeah, I'd say so. So essentially the breakthrough that Sheryl is highlighting (06:55):
Ejaaz:
here is number one, the model didn't need any context. (06:58):
Ejaaz:
Number two, it used really high level reasoning to figure out the problems from first principles. (07:03):
Ejaaz:
And number three, it was able to test out multiple hypotheses at the same time (07:08):
Ejaaz:
instead of trying to one shot the problem. (07:14):
Ejaaz:
Typically in the past when AI models have been given a prompt or a problem, (07:15):
Ejaaz:
it tries to just like give it its best shot and give you one solution, Josh. (07:19):
Ejaaz:
Whereas what these models, these reasoning models do really well is they are (07:23):
Ejaaz:
able to hypothetically entertain many different scenarios and then pick the (07:27):
Ejaaz:
best one, which it thought was the answer. (07:30):
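The "try several approaches, then pick one" idea described here can be sketched in a few lines. This is a hedged illustration, not the method either lab used: it assumes the simplest possible selection rule (a majority vote over independent attempts) and a toy model in which each attempt is right with the same probability `p`. The function names `majority_vote` and `majority_correct_probability` are hypothetical.

```python
from collections import Counter
from math import comb


def majority_vote(answers):
    """Return the most common final answer among several independent attempts."""
    return Counter(answers).most_common(1)[0][0]


def majority_correct_probability(p, n):
    """Chance that more than half of n independent attempts are right.

    Toy assumption: every attempt is right with the same probability p,
    attempts are independent, and n is odd so there are no ties.
    """
    needed = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(needed, n + 1))


# Five attempts at one problem; three of them agree on "42".
print(majority_vote(["42", "17", "42", "42", "9"]))  # 42

# With a 60% per-attempt success rate, voting over more attempts helps.
for attempts in (1, 5, 25):
    print(attempts, round(majority_correct_probability(0.6, attempts), 3))
```

Under these assumptions a single attempt is right 60% of the time, while a vote over five attempts is right about 68% of the time, which is the sense in which running many attempts and comparing them lowers the error rate.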
Ejaaz:
And it ended up with the gold medal, which is insane, right? (07:32):
Ejaaz:
But it wasn't entirely without a few glitches here and there, Josh. (07:34):
Ejaaz:
So if you look at this post from Jasper, he read through the entire kind of (07:38):
Ejaaz:
like problem set that OpenAI's model went through, and he points out some weird anomalies. (07:42):
Ejaaz:
So he kind of like talks about like how it kind of like analyzed and a bunch of things. (07:49):
Ejaaz:
And he goes, however, the write-up is kind of messy. He goes, (07:52):
Ejaaz:
it overuses shorthand and sentence fragments. (07:55):
Ejaaz:
It introduces new terms without definitions, for example, forbidden and sunny partners. (07:58):
Ejaaz:
I have no idea what either of those terms could mean, but it was just apparently (08:04):
Ejaaz:
just interspersing these phrases during its analysis. (08:10):
Ejaaz:
And so as a reviewer, or as an examiner, they were reading this, (08:13):
Ejaaz:
they were like, sorry, wait, what is it talking about? (08:17):
Ejaaz:
It got to the right answer, but what is it talking about, right? (08:20):
Ejaaz:
The other key point from this post is it was unable to solve one problem, problem six. (08:23):
Ejaaz:
And I'm not even gonna try and get into why it failed on that problem, (08:29):
Ejaaz:
but it was just particularly hard for it to figure out. (08:33):
Ejaaz:
But it still scored a high enough percentage that it got a gold medal. (08:36):
Ejaaz:
So it's basically a win for OpenAI, but that's when the drama starts unfolding. (08:40):
Ejaaz:
So I've got this post up from Mikhail Samin, which kind of like sparks this entire fight, Josh. (08:44):
Ejaaz:
He goes, according to a friend, the IMO, which is the International Math Olympiad, (08:51):
Ejaaz:
asked AI companies not to steal the spotlight from kids and to wait a week after (08:55):
Ejaaz:
the closing ceremony to announce the results. (09:01):
Ejaaz:
OpenAI instead announced the results before the closing ceremony. Yeah. (09:04):
Ejaaz:
And then he goes on to basically say how this is essentially like some kind (09:09):
Ejaaz:
of clout chasing move from OpenAI. (09:13):
Ejaaz:
And OK, I tried to evaluate this, Josh, from OpenAI's kind of perspective, (09:16):
Ejaaz:
which is they basically want to steal the limelight, (09:20):
Ejaaz:
but also say that they were the first AI model to ever achieve gold on this (09:23):
Ejaaz:
competition, which puts them in a good light and makes users want to choose (09:27):
Ejaaz:
OpenAI and solidify the branding that OpenAI is the best, right? (09:31):
Ejaaz:
But on the other side, you know, they're kind of like stealing the spotlight (09:35):
Ejaaz:
from the kids, as this post says. But that's not actually the main trope. (09:39):
Ejaaz:
The main trope here, Josh, is OpenAI wasn't the only model to achieve a gold, right? (09:44):
Ejaaz:
At the same time, during the same testing period, you had Google achieving the exact same score. (09:50):
Ejaaz:
So then the question becomes, okay, well, it was whoever was ethical about announcing their own result. (09:58):
Ejaaz:
This post from Demis Hassabis, who is Google's head of AI, (10:04):
Ejaaz:
basically posts, and I'll note two days later, Official results are in. (10:09):
Ejaaz:
Gemini, which is their flagship model, achieved gold medal level in the International Math Olympiad. (10:14):
Ejaaz:
An advanced version was able to solve five out of six problems. (10:20):
Ejaaz:
So same as OpenAI, same thing, struggled on the sixth problem. (10:23):
Ejaaz:
Incredible progress. Huge congrats to the team. (10:26):
Ejaaz:
And a tweet here says that Google (10:29):
Ejaaz:
basically had to wait for marketing to approve the tweet until Monday. (10:31):
Ejaaz:
But OpenAI shared theirs first at 1 a.m. (10:35):
Ejaaz:
on Saturday and stole the spotlight. (10:38):
Ejaaz:
And we see the screenshot from Demis Hassabis, which, you know, (10:40):
Ejaaz:
he further clarifies this, basically saying, (10:44):
Ejaaz:
by the way, as an aside, we didn't announce on Friday because we respected the (10:46):
Ejaaz:
IMO board's original request that all AI labs share the results only after (10:50):
Ejaaz:
the official results have been verified. (10:55):
Ejaaz:
Now that we've been given permission to share, blah, blah, blah, (10:57):
Ejaaz:
he shares. So Demis is playing the like good Samaritan here. (10:59):
Ejaaz:
He's like, ah, you know, we also have the good model, but we, (11:02):
Ejaaz:
you know, we have some pride and some manners about how we deal with these things. (11:06):
Ejaaz:
That's where it starts to get a little uglier, Josh, because we have OpenAI (11:10):
Ejaaz:
chiming in to this tweet, which basically says, and this is some random commenting (11:15):
Ejaaz:
on OpenAI and this entire situation. (11:21):
Ejaaz:
So OpenAI basically has zero advantages except the size of the team, (11:24):
Ejaaz:
aka the OpenAI team was claimed to be smaller than Google Gemini's team. (11:30):
Ejaaz:
So what he's inferring here is there's no real difference between OpenAI's models (11:34):
Ejaaz:
and Google Gemini's models. You can pretty much use either or. (11:38):
Ejaaz:
OpenAI maybe has a smaller team to build that model, but who the hell cares? (11:42):
Ejaaz:
And then one of the AI model researchers at OpenAI basically comes in and says, (11:46):
Ejaaz:
well, I think it's also interesting that they, they (11:52):
Ejaaz:
being Google, curated and provided useful context (11:55):
Ejaaz:
to the model, which we did not. Feels like (11:59):
Ejaaz:
taking your tutor's cheat sheet with you into the exam. So shots basically being (12:02):
Ejaaz:
fired from OpenAI saying, hey, um, you cheated, you gave context to your model, (12:07):
Ejaaz:
and that was why it was able to achieve gold. We, OpenAI, didn't provide any of (12:12):
Ejaaz:
that context and it was able to reason from first principles. There you have it. (12:17):
Ejaaz:
But then directly beneath it, Vinay Ramasesh, who is a Google DeepMind AI researcher, responds, (12:21):
Ejaaz:
it's worth noting actually that a Deep Think system, which is Google's AI system (12:27):
Ejaaz:
with no access to this corpus, so no context, also got gold. (12:32):
Ejaaz:
Again, according to the official graders, and he puts this in brackets because (12:36):
Ejaaz:
OpenAI didn't wait for the official graders to mark their score, (12:40):
Ejaaz:
with exactly the same score. (12:44):
Ejaaz:
So basically, this is like a pissing contest between two of the top AI model providers. (12:45):
Ejaaz:
Here's my take, Josh. And then I really want to kind of lean into what you think (12:52):
Ejaaz:
about this whole debacle. (12:56):
Ejaaz:
Number one, this seems so childish to me. (12:57):
Ejaaz:
Like, eventually, AI models were eventually going to get smarter or smart enough (13:00):
Ejaaz:
to solve these mathematical problems. (13:05):
Ejaaz:
And I think you said this earlier on. (13:07):
Ejaaz:
This is something that they're going to probably laugh about 10 years from now, (13:10):
Ejaaz:
right? That they were able to solve whatever, the most complex mathematical problems (13:14):
Ejaaz:
for humans, mere humans. (13:17):
Ejaaz:
And now AI is off creating wonderful scientific discoveries for us that we would (13:19):
Ejaaz:
have never comprehended or figured out ourselves, right? (13:24):
Ejaaz:
So firstly, you're arguing over something that's so silly. (13:27):
Ejaaz:
But number two, this kind of seems desperate on the OpenAI side. (13:31):
Ejaaz:
And maybe I'm being biased, but I'm just going to give you my take. (13:36):
Ejaaz:
OpenAI has kind of had a series of stumbles recently. (13:39):
Ejaaz:
They claimed that they were going to release GPT-5, which (13:43):
Ejaaz:
is their brand new frontier model, but they've delayed it many months (13:45):
Ejaaz:
now. Um, they got outperformed by (13:49):
Ejaaz:
Grok 4 from xAI, uh, so now (13:52):
Ejaaz:
they have a new benchmark that they need to beat, a new model that they basically (13:55):
Ejaaz:
need to outcompete. Uh, they claimed that they were going to release a new open (13:58):
Ejaaz:
source model and then delayed it after a Chinese open source model was released (14:02):
Ejaaz:
and had one trillion parameters and outperformed not just their model, (14:07):
Ejaaz:
but any other open source model out there. (14:11):
Ejaaz:
And so I feel like they're looking (14:13):
Ejaaz:
for a win, right? They released their agent this week or last week. (14:16):
Ejaaz:
And so, you know, that had mixed review, mixed feedback. (14:21):
Ejaaz:
So I feel like Sam is desperate for a win. (14:24):
Ejaaz:
People are criticizing consistently their moat, asking what has OpenAI got? (14:26):
Ejaaz:
They've lost a ton of researchers to Meta and other companies. (14:32):
Ejaaz:
I feel like their back's against the wall. (14:35):
Ejaaz:
Sam's scared and he basically needs to grab any kind of win. (14:37):
Ejaaz:
So it reeks of desperation. (14:41):
Ejaaz:
What's your take, Josh? (14:43):
Josh:
I do empathize with the team. They've been coming under fire from every single angle. (14:44):
Josh:
I mean, you have Zuck poaching all of their talent, and then all of the other (14:49):
Josh:
open-source AI models are beating them at their own game. (14:54):
Josh:
And they're just kind of, they're really getting beat up now. (14:57):
Josh:
And I think that they're looking to get some footing. I'm sure this probably plays a role in it. (15:00):
Josh:
But I'm sure behind the scenes, they're really trying to fight hard to put their (15:04):
Josh:
feet back on stable ground, to get GPT-5 out the door, to build Project Stargate (15:09):
Josh:
and make this big infrastructure network. (15:13):
Josh:
They need some wins. So sure, this was probably an attempt to get ahead, (15:14):
Josh:
make them look good, win over some more hearts and minds. (15:18):
Josh:
But I think the most interesting part of the whole story is less the drama and (15:21):
Josh:
more the fact that these models were able to accomplish a really impressive (15:25):
Josh:
feat over such a short period of time. (15:28):
Josh:
From what I understand, previously when they attempted to solve these problems, (15:30):
Josh:
they used a custom training data set. (15:35):
Josh:
They used custom tool sets. It was mostly a model trained on solving mathematical problems. (15:37):
Josh:
And with this version, both the OpenAI version and the Gemini models, (15:42):
Josh:
they were both general purpose models. (15:49):
Josh:
They were not trained specifically with the intention of solving mathematical problems. (15:51):
Josh:
These are the general models that people day to day are using. (15:55):
Josh:
They're just now able to solve these math problems using this new general intelligence. (15:58):
Josh:
So it's a really interesting breakthrough that I think we get from reinforcement (16:02):
Josh:
learning that now there is not so much of an advantage to training a model specific (16:05):
Josh:
to one skill set when you could just make it great at everything. (16:11):
Josh:
There was one thing that I noticed that some people call it cheating, other people don't. (16:14):
Josh:
But so with the mathematical, with the actual test that high schoolers had (16:19):
Josh:
to take, they're not allowed to use tools and they have a limited amount of (16:23):
Josh:
time per question to answer. (16:26):
Josh:
The models that, the OpenAI model and the Gemini model, they had infinite amount (16:28):
Josh:
of time to answer and they were allowed to use tools. (16:32):
Josh:
So there still are small differences in these. (16:34):
Ejaaz:
Were they allowed to like use the internet? (16:37):
Josh:
I don't know the specifics. I would imagine at least calculators, (16:39):
Josh:
at most probably the full repertoire of what we have currently available to (16:42):
Josh:
us, which is full internet search, code writing abilities. They could do their (16:46):
Josh:
own mathematical checks. (16:49):
Josh:
So I would just assume the minimum amount of constraints possible. (16:51):
Josh:
So there were far fewer constraints on the models, but they did solve the questions. (16:54):
Josh:
And I think that's super impressive. They got five out of six right, (16:58):
Josh:
which was gold and better than almost every student, if I'm not mistaken. (17:02):
Josh:
Only a few students got the six out of six completely correct. (17:06):
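The "five out of six was gold" arithmetic can be checked with a short sketch. Each IMO problem is graded out of 7 points, so six problems give a maximum of 42. The medal cutoffs below are an assumption for the 2025 contest (they are set fresh every year and are not stated in this episode), and the function name `medal_for` is hypothetical.

```python
POINTS_PER_PROBLEM = 7  # each IMO problem is graded out of 7
NUM_PROBLEMS = 6        # so a perfect paper scores 42

# Assumed 2025 cutoffs; they change every year, so verify before reusing them.
MEDAL_CUTOFFS = [("gold", 35), ("silver", 28), ("bronze", 19)]


def medal_for(score):
    """Return the medal name for a total score, or None below the bronze cutoff."""
    if not 0 <= score <= POINTS_PER_PROBLEM * NUM_PROBLEMS:
        raise ValueError("score must be between 0 and 42")
    for medal, cutoff in MEDAL_CUTOFFS:  # ordered from highest cutoff to lowest
        if score >= cutoff:
            return medal
    return None


# Five fully solved problems and nothing on the sixth.
five_full_solutions = 5 * POINTS_PER_PROBLEM
print(five_full_solutions, medal_for(five_full_solutions))  # 35 gold
```

Under the assumed cutoffs, five complete solutions land exactly on the gold line, which is why missing problem six still counted as gold-medal level.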
Josh:
It's just cool to see the rate of progress of these models getting better. (17:10):
Josh:
That over the course of the last 15 months or so, they went from horrible and (17:12):
Josh:
narrowly trained to incredible and generally trained. (17:17):
Josh:
And as long as that trend keeps going, I think the drama matters less than the (17:20):
Josh:
output, which is models are getting really good at solving really hard math problems. (17:24):
Josh:
And original ones too, that the world has never seen before. (17:28):
Ejaaz:
Yeah, well, that last point is actually the main takeaway that I had, (17:31):
Ejaaz:
Josh, which is it's original, never-before-seen problems. (17:35):
Ejaaz:
Typically, these AI models are trained on things that they've seen before, as you said, right? (17:39):
Ejaaz:
They're trained on data sets. So they've already seen the problem, (17:44):
Ejaaz:
and then they have to work out, they know the answer, and they have to work (17:46):
Ejaaz:
out how to get there, right? So they kind of have a leading factor. (17:49):
Ejaaz:
Here, it's just kind of like completely unknown. (17:52):
Ejaaz:
The other thing is, this is kind of like the culmination of a trend, (17:56):
Ejaaz:
Josh, which is these AI models are really good at doing kind of binary tasks. (18:01):
Ejaaz:
And I don't want to reduce mathematics to binary tasks, but technically it's (18:07):
Ejaaz:
numbers, sequential formulas, that kind of stuff, right? (18:12):
Ejaaz:
So if you can run enough compute at a thing, and if you can get that AI model (18:18):
Ejaaz:
to consider all different decision paths, it's going to eventually get to the answer, right? (18:23):
Ejaaz:
But it's always a specific answer at the end of that, right? (18:28):
Ejaaz:
Whereas when it comes to more subjective things, more human experiential things, (18:32):
Ejaaz:
AI has typically struggled to (18:37):
Ejaaz:
improve at the same rate that it has for like all these different scientific (18:40):
Ejaaz:
and math problems. So I'm glad that we've reached this pinnacle feat. I think (18:43):
Ejaaz:
AI models are really good at one thing and not so great at other things, (18:47):
Ejaaz:
and I'm excited to see how like they kind of like try to start leapfrogging (18:53):
Ejaaz:
each other over the next couple of years. (18:57):
Josh:
Yeah, it's that directional progress that we like. (18:59):
Josh:
Math is clearly the first because you can write down (19:02):
Josh:
proofs and you could check your work and there is an actual verifiable solution. (19:05):
Josh:
And I think that's why we're seeing a lot of the progress start early (19:08):
Josh:
in math and then hopefully go on to these other places. But (19:12):
Josh:
what we are seeing is these first signs of (19:15):
Josh:
new knowledge breakthroughs where it's solving a (19:18):
Josh:
new and novel problem that hasn't been (19:21):
Josh:
released before based on its previous data set. (19:24):
Josh:
So it's not just pattern matching like you mentioned earlier, where it has (19:28):
Josh:
this data set of questions, it's kind of finding the right examples and (19:31):
Josh:
then applying that logic to the question. It's actually (19:34):
Josh:
reasoning, and it's reasoning in many instances, and (19:37):
Josh:
then it's comparing its work and it's coming to a conclusion. (19:40):
Josh:
And we saw this with the Grok Heavy model last week too, when (19:44):
Josh:
it released, um, where I think the new (19:46):
Josh:
meta is many instances solving hard (19:49):
Josh:
problems and then comparing, so you lower that error rate more (19:52):
Josh:
and more and more each time. And what we're seeing is great progress. So, (19:55):
Josh:
I mean, although OpenAI and Google are fighting again, they're both (19:59):
Josh:
fighting over exciting progress. And sure, maybe one tried to sweep in and (20:04):
Josh:
steal the valor, but they both did an excellent job in actually completing these (20:08):
Josh:
problems and placing gold in a test that was previously not possible to do from an AI model. (20:13):
Ejaaz:
You know who the real winners are here out of this, Josh? (20:19):
Josh:
Who's that? (20:21):
Ejaaz:
High school kids who now have an AI model that can do all their math homework for them. (20:24):
Josh:
Isn't that incredible? Like, man, think about it. (20:28):
Ejaaz:
I wish I had that. (20:30):
Josh:
You have an AI model that is as smart as the smartest people on planet Earth (20:31):
Josh:
in high school. If it could solve those math problems, it could solve anything. (20:35):
Ejaaz:
It sounds human as well, Josh. So, like, your teacher is going to struggle unless (20:38):
Ejaaz:
they use AI themselves to figure out whether you just did that yourself or completely (20:43):
Ejaaz:
just ran that through GPT, your mom's GPT subscription. (20:48):
Josh:
It really forces you to re-evaluate the school model, right? (20:51):
undefined
Josh:
Because now that this information is so readily accessible, it's so easy to solve these problems. (20:54):
undefined
Josh:
Is that the actual thing worth learning? Or is it how to use these tools that's (20:59):
undefined
Josh:
more important to get to the answer? (21:04):
undefined
Josh:
And there's this there's this dual pronged approach and we see we see (21:06):
undefined
Josh:
developers and programmers talk about this a lot where as soon (21:09):
undefined
Josh:
as they start to rely too heavily on the tools they start (21:12):
undefined
Josh:
to lose their touch they start to lose their ability to to deeply (21:14):
undefined
Josh:
understand how it reaches conclusions um but (21:18):
undefined
Josh:
is that worth it in exchange for getting to the answer much quicker and then (21:22):
undefined
Josh:
being able to seek many more answers i don't know it's weird dynamic if i was (21:25):
undefined
Josh:
a teacher i'd be worried because i mean similar to what we saw with the calculator (21:29):
undefined
Josh:
it just replace the thinking process and just yield you an answer and (21:32):
undefined
Ejaaz:
The thing with the calculator is like you you're (21:38):
undefined
Ejaaz:
using the calculator so it figures out the answer for you but you kind of (21:42):
undefined
Ejaaz:
loosely understand how it is working right you (21:44):
undefined
Ejaaz:
know what numbers it's crunching to get to that answer and then typically you (21:48):
undefined
Ejaaz:
do a few things on a calculator and then you get to your eventual answer for (21:52):
undefined
Ejaaz:
whatever the original question was the issue with or the concern that you're (21:57):
undefined
Ejaaz:
highlighting here with AI is it's doing really complex problems, (22:01):
Ejaaz:
which kids don't even need to understand in the first place just to get an answer, (22:07):
Ejaaz:
which they can then give to their teacher, get a grade and then go to university. (22:11):
Ejaaz:
But the kids don't actually learn actively in that process. (22:15):
Ejaaz:
And it's going to be a concerning trend if we see kids just trying to go from (22:19):
Ejaaz:
zero to 100% without understanding anything in between. (22:24):
Ejaaz:
A trend to watch. (22:29):
Josh:
This is our episode from a few weeks ago. Is AI making you dumber? (22:31):
Josh:
Yes. And I think that's just going to continue to be the question. (22:34):
Josh:
Oh, God. And I think the answer is it's all dependent on how you choose to use (22:37):
Josh:
the tools that you're given. (22:41):
Josh:
And if you use these tools as further leverage. So I'm sure these math Olympians (22:42):
Josh:
who can actually complete the problems would love to have this model to check (22:46):
Josh:
the problems and to work through the problems and to figure out shortcuts on (22:50):
Josh:
solving these problems. (22:53):
Josh:
Where if you deeply understand it, then this becomes an amazing tool to check (22:54):
Josh:
your work, to generate new questions for you. (22:57):
Josh:
It's a great study buddy. Or if you are not an Olympian and you still want (22:59):
Josh:
to get to the answer, well, you just kind of cheat your way through and you just (23:04):
Josh:
ask it for exactly what you want. So it's that split again, and it's (23:07):
Josh:
up to the person to take their own agency, solve their own problems, and try to (23:11):
Josh:
use these as tools of leverage instead of just problem-solving machines. (23:14):
Ejaaz:
That actually reminds me of this tweet I saw yesterday, Josh. So what you're looking (23:19):
Ejaaz:
at here is a tweet from Dave White. Dave White is a very prestigious investment (23:23):
Ejaaz:
slash research advisor at this fund called Paradigm, (23:28):
Ejaaz:
which basically it's a crypto fund, but it is one of the wealthiest funds out there. (23:32):
Ejaaz:
So a lot of the investments they made were massive wins. And a lot of the reasoning (23:37):
Ejaaz:
of those wins was from Dave White's analysis. (23:42):
Ejaaz:
He is a deeply thoughtful mathematician at his core, and he is famed for doing (23:44):
Ejaaz:
a lot of analyses on companies, mathematical analyses that have ended up, you know, (23:50):
Ejaaz:
determining whether a fund puts $100 million in a company or zero, right? (23:57):
Ejaaz:
So a very important job worth hundreds of millions of dollars, right? (24:01):
Ejaaz:
And what he says here, basically, is him having an identity crisis, (24:04):
Ejaaz:
because he has looked up to the IMO, the International Math Olympiad. (24:08):
Ejaaz:
And he goes on to say in this tweet that subconsciously, whenever he's met a (24:13):
Ejaaz:
gold medalist IMO champion, he's always subconsciously thought that they were (24:18):
Ejaaz:
smarter than him, that he is more respectful of them. (24:22):
Ejaaz:
And now with this news that AI models basically can do his job for him, (24:25):
Ejaaz:
can reason better than him at some of these math problems, he now has an identity crisis. (24:30):
Ejaaz:
He doesn't know kind of where to go from this. And if people like Dave White (24:35):
Ejaaz:
are having this kind of, like, disillusioned sentiment from how smart AI is, (24:39):
Ejaaz:
you can imagine how this is going to happen for everyone else in all of the (24:45):
Ejaaz:
other sectors, Josh, right? (24:49):
Ejaaz:
It doesn't matter if you're a mathematician or an investment research advisor, (24:50):
Ejaaz:
you could be a technician in some kind of engineering industrial role, (24:54):
Ejaaz:
or you could be a teacher, or you could be a kid or a high schooler. (24:59):
Ejaaz:
I think this disillusionment is going to spread. And I think it's super important (25:02):
Ejaaz:
for people to kind of like evolve their thinking, like you said, (25:06):
Ejaaz:
Josh, and learn how to leverage these tools versus just consume. (25:09):
Josh:
Yeah, this is, I mean, this is crazy. There's a lot of people that are going (25:13):
Josh:
to have to adapt to this new world order of intelligence, where if you build (25:16):
Josh:
up your entire identity around being intelligent, well, perhaps you're going to have to alter the way (25:21):
Josh:
you present yourself as intelligent, because the meaning of intelligence is becoming (25:26):
Josh:
commoditized among these tools that are now reduced down to a single chat box. (25:29):
Ejaaz:
Yep. Benchmarks are going to have to reset themselves completely. (25:34):
Ejaaz:
But folks, that is the end of this episode. Thank you so much for tuning in again. (25:37):
Ejaaz:
Josh and I are going hammer and tongs at Limitless. (25:42):
Ejaaz:
Our goal is to get you the hottest and trending topics and news fresh out the (25:46):
Ejaaz:
door, give you our commentary, our thoughts, and hopefully some useful insights for you. (25:51):
Ejaaz:
If you enjoyed this episode, if you enjoyed any of our previous episodes, please (25:55):
Ejaaz:
continue to share and spread them with all your friends and family and whoever (25:58):
Ejaaz:
you think might be interested in this. We are getting tons of feedback from you (26:02):
Ejaaz:
guys, and with every episode that we release, we're getting better. So please remember (26:05):
Ejaaz:
to like, subscribe, and follow us. It's hugely appreciated and helpful for us, and (26:09):
Ejaaz:
we'll see you on the next one. (26:13):