July 24, 2025 • 26 mins

In this episode, we celebrate OpenAI and Google's historic gold medal wins at the International Math Olympiad, showcasing significant advancements in problem-solving abilities.

We discuss the technological breakthroughs enabling these achievements and the implications for education as AI challenges traditional notions of intelligence.

However, the competition was not without its share of AI drama, as the giants continue to compete at all costs in the AI game of thrones.

------
💫 LIMITLESS | SUBSCRIBE & FOLLOW
https://limitless.bankless.com/
https://x.com/LimitlessFT

------
TIMESTAMPS

0:00 Intro
1:35 AI vs. Math Olympiad
4:11 OpenAI's Breakthrough
6:54 The Gold Medal Debate
8:39 The Controversy Unfolds
12:51 The Google OpenAI Drama
13:42 OpenAI's Desperate Moves
15:20 The Models' Progress
17:38 A New Era of Intelligence
21:05 The Impact on Education
25:13 Redefining Intelligence
25:37 Conclusion and Farewell

------
RESOURCES

Josh: https://x.com/Josh_Kale

Ejaaz: https://x.com/cryptopunk7213

------
Not financial or tax advice. See our investment disclosures here:
https://www.bankless.com/disclosures

Episode Transcript

Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Ejaaz: All right, Josh, the AI nerds are (00:03):

Ejaaz: fighting again. This past weekend there was (00:06):

Ejaaz: a very prestigious competition called the International Math Olympiad, which (00:09):

Ejaaz: hosts some of the brightest, smartest mathematicians of our time, and they're (00:14):

Ejaaz: typically high schoolers. Basically, they come together and they take a really (00:18):

Ejaaz: hard math test. This is like four to five hours, and those that score the highest get medals. (00:22):

Ejaaz: You can get bronze, silver, and the highest scorers get gold medals. (00:27):
undefined

Ejaaz: So what's this got to do with AI? (00:31):

Ejaaz: Well, recently, over the last couple of years, the organizers of this International (00:33):
undefined

Ejaaz: Math Olympiad decided to start inviting AI models to participate as contestants. (00:37):
undefined

Ejaaz: And they did terribly. Like, no one's come even near the human geniuses. (00:43):
undefined

Ejaaz: Except this year, Josh, where they came to play and not one, (00:50):
undefined

Ejaaz: but two AI models achieved not silver, but gold medals, which is just an insane thing, right? (00:53):
undefined

Ejaaz: So it should be all fun and games, right? What a fairytale story. (01:01):
undefined

Ejaaz: Well, unfortunately, OpenAI and Google got into an online spat where they started (01:05):
undefined

Ejaaz: accusing each other of cheating. (01:11):
undefined

Ejaaz: Now, remember, these are trillion dollar companies. So essentially, (01:13):
undefined

Ejaaz: Josh, I was teleported this weekend back to my high school days where I felt (01:17):
undefined

Ejaaz: like the teacher had to come in, separate the kids from arguing over some kind (01:20):
undefined

Ejaaz: of random homework problem and get them to chill out. (01:25):
undefined

Josh: We will look back at this episode and laugh at it like it's a joke because these (01:28):
undefined

Josh: AIs, they're competing against high schoolers. That's so lame. (01:31):
undefined

Josh: Only high schoolers? Like, come on, and you're just barely getting gold. (01:35):
undefined

Ejaaz: Well, in their defense, Josh, these are some pretty smart high schoolers, (01:38):
undefined

Ejaaz: man. Like I was looking at some of these math problems. (01:42):
undefined

Ejaaz: I don't know if you can see my screen here. I'm sharing the official site. (01:44):
undefined

Ejaaz: And if you look at some of these problems, here we go. (01:48):
undefined

Ejaaz: And then like, okay, so they have basically, they host this competition in a (01:53):
undefined

Ejaaz: different country each year. (01:56):
undefined

Ejaaz: And you can kind of like download the test yourselves after the fact to see (01:57):
undefined

Ejaaz: how well you could do it. I had a look at this one, Josh from the Afrikaans. (02:01):
undefined

Ejaaz: I basically don't understand anything. One second. All right, (02:07):
undefined

Ejaaz: take a look at that. Take a look at this. (02:10):
undefined

Josh: That looks like quite a bit of squiggly lines on a page. (02:13):
undefined

Ejaaz: You know what? That could be mistaken for a piece of art in a gallery if you (02:17):
undefined

Ejaaz: didn't peer too closely at it. This looks insane. (02:22):
undefined

Josh: Okay, so I take it back. So the high schoolers are probably pretty smart then. (02:25):
undefined

Josh: And I guess the AI performing as well as the high schoolers is probably a pretty big deal, right? (02:28):
undefined

Josh: Because that looks like very complicated math problems that I'm assuming most (02:33):
undefined

Josh: of the smartest people in the world cannot solve. (02:37):
undefined

Ejaaz: Exactly. Yeah. This is like something that is technically set for high schoolers (02:39):
undefined

Ejaaz: and sometimes college kids, but is meant to demonstrate prowess in the field. (02:44):
undefined

Ejaaz: So there's a lot of university academics, which obviously do math degrees and (02:50):
undefined

Ejaaz: they do PhDs, but those are in very specific problems. So you kind of like in (02:54):
undefined

Ejaaz: science, you just need to kind of pick and choose your lane and then dedicate your life to it. (02:58):
undefined

Ejaaz: High schoolers and college kids are kind of at the last point before (03:03):

Ejaaz: you jump into your specialization. (03:07):
undefined

Ejaaz: So really, if you're the best at generalized maths, you're going to compete in this competition. (03:09):
undefined

Ejaaz: And what's so interesting is typically AI models haven't been able to perform (03:13):
undefined

Ejaaz: very well because they needed a lot of context beforehand about the problem, Josh. (03:17):
undefined

Ejaaz: So they needed to know that, you know, there was certain, you know, (03:22):
undefined

Ejaaz: X equals something and Y equals something. (03:27):
undefined

Ejaaz: And they had to have defined parameters to kind of figure out the problem. (03:30):
undefined

Ejaaz: But this was the first time that AI models basically were just given a blank (03:33):
undefined

Ejaaz: sheet of paper or not a blank sheet of paper. (03:37):
undefined

Ejaaz: But they stared at the problem just as we just looked at it just now and had (03:38):
undefined

Ejaaz: to read the words, read the characters, interpret what that meant in the context (03:42):
undefined

Ejaaz: of that situation and the way that the question was framed and then figure it out themselves. (03:47):
undefined

Ejaaz: So it's as if the AI models had a camera that looked at a paper, (03:51):
undefined

Ejaaz: similar way that we look at test papers as kids through our eyes and figure it out themselves. (03:55):
undefined

Josh: So what changed? What happened in the last year that made it so much better? (04:01):
undefined

Josh: Because it went from, what, basically zero of six to now six or five of six questions answered. (04:06):
undefined

Josh: Now it's a gold medalist. So what happened? (04:11):
undefined

Ejaaz: So listen, I'm not going to try and explain it, but maybe you and I can decipher (04:14):
undefined

Ejaaz: it through the legends themselves that built these models, right? (04:18):
undefined

Ejaaz: Okay, so let me paint the scene for you, Josh. (04:22):
undefined

Ejaaz: It is Saturday evening. (04:24):
undefined

Ejaaz: You know, normal people are usually out and about. They're having fun. (04:27):
undefined

Ejaaz: They're probably having dinner, catching up with friends or chilling at home, watching a movie. (04:30):
undefined

Ejaaz: And this guy called Alexander Wei, who is OpenAI's head of reasoning. (04:34):
undefined

Ejaaz: Reasoning is basically this new fancy technique that AI models have typically (04:39):
undefined

Ejaaz: demonstrated, which has brought them up to like the frontier level of AI models. (04:43):
undefined

Ejaaz: Basically, if your model can do reasoning, it's typically a pretty smart model, right? (04:47):
undefined

Ejaaz: And he posts this tweet saying, I'm excited to share that our latest OpenAI (04:52):
undefined

Ejaaz: Experimental Reasoning LLM has achieved a longstanding grand challenge in AI, (04:57):
undefined

Ejaaz: a gold medal level performance on the world's most prestigious math competition, (05:02):
undefined

Ejaaz: the International Math Olympiad. (05:07):
undefined

Ejaaz: And he goes on to describe, you know, how the model basically took on each problem (05:09):
undefined

Ejaaz: in its own regard and solved it and how this is a massive success and win for (05:14):
undefined

Ejaaz: AI models and how, most importantly, (05:19):

Ejaaz: OpenAI was the first ever model to complete this. (05:21):
undefined

Ejaaz: And not too long after he posts that tweet, Josh, Sam Altman jumps in here, right? (05:25):
undefined

Ejaaz: And he goes, again, he kind of echoes similar thoughts. We achieved gold medal (05:31):
undefined

Ejaaz: level performance on the 2025 IMO competition with general purpose reasoning. (05:34):
undefined

Ejaaz: And then he kind of like shills GPT-5 at the end. Basically, (05:38):

Ejaaz: it's like a promotional thing for OpenAI. (05:41):

Ejaaz: And I will say that this is really cool because what they've achieved is something (05:44):
undefined

Ejaaz: that hasn't been done before, right? So very impressive feat. (05:49):
undefined

Ejaaz: And in terms of how this works specifically, Sheryl Hsu here gives a really good breakdown. (05:53):

Ejaaz: She says, the model solves these problems without tools like coding or Lean, (05:58):

Ejaaz: which is another coding tool. (06:04):

Ejaaz: It just uses natural language. So as I said earlier, it kind of reads the paper (06:05):

Ejaaz: and just kind of interprets what it thinks it means. (06:09):

Ejaaz: And it also has the same amount of time to do the test as other kids, so 4.5 hours. (06:12):

Ejaaz: And she says, we see the model reason at a very high level, trying out different (06:17):
undefined

Ejaaz: strategies, making observations from examples, and testing different hypotheses out. (06:22):
undefined

Ejaaz: And she says, it's crazy how we've gone from 12% on the AIME test, (06:27):

Ejaaz: which is what GPT-4o, which is OpenAI's early model, got, to IMO gold, (06:32):

Ejaaz: International Math Olympiad gold medal, in 15 months. (06:38):

Ejaaz: So just to set that in context, Josh, that is a crazy leap in 15 months. (06:41):

Ejaaz: Imagine going from eighth grade level math to the best (06:45):

Ejaaz: mathematician in the world in 15 months. It's a pretty insane thing. (06:51):

Ejaaz: Yeah, I'd say so. So essentially the breakthrough that Sheryl is highlighting (06:55):

Ejaaz: here is number one, the model didn't need any context. (06:58):
undefined

Ejaaz: Number two, it used really high level reasoning to figure out the problems from first principles. (07:03):
undefined

Ejaaz: And number three, it was able to test out multiple hypotheses at the same time (07:08):
undefined

Ejaaz: instead of trying to one shot the problem. (07:14):
undefined

Ejaaz: Typically in the past when AI models have been given a prompt or a problem, (07:15):
undefined

Ejaaz: it tries to just like give it its best shot and give you one solution, Josh. (07:19):
undefined

Ejaaz: Whereas what these models, these reasoning models do really well is they are (07:23):
undefined

Ejaaz: able to hypothetically entertain many different scenarios and then pick the (07:27):
undefined

Ejaaz: best one of which it thought it was an answer. (07:30):
undefined

Ejaaz: And it ended up with the gold medal, which is insane, right? (07:32):
undefined

Ejaaz: But it wasn't entirely without a few glitches here and there, Josh. (07:34):
undefined

Ejaaz: So if you look at this post from Jasper, he read through the entire kind of (07:38):
undefined

Ejaaz: like problem set that OpenAI's model went through, and he points out some weird anomalies. (07:42):

Ejaaz: So he kind of like talks about like how it kind of like analyzed and a bunch of things. (07:49):
undefined

Ejaaz: And he goes, however, the write-up is kind of messy. He goes, (07:52):
undefined

Ejaaz: it overuses shorthand and sentence fragments. (07:55):
undefined

Ejaaz: It introduces new terms without definitions, for example, forbidden and sunny partners. (07:58):
undefined

Ejaaz: I have no idea what either of those terms could mean, but it was just apparently (08:04):
undefined

Ejaaz: just interspersing these phrases during its analysis. (08:10):
undefined

Ejaaz: And so as a reviewer, or as an examiner, they were reading this, (08:13):
undefined

Ejaaz: they were like, sorry, wait, what is it talking about? (08:17):
undefined

Ejaaz: It got to the right answer, but what is it talking about, right? (08:20):
undefined

Ejaaz: The other key point from this post is it was unable to solve one problem, problem six. (08:23):
undefined

Ejaaz: And I'm not even gonna try and get into why it failed on that problem, (08:29):
undefined

Ejaaz: but it was just particularly hard for it to figure out. (08:33):
undefined

Ejaaz: But it still scored a high enough percentage that it got a gold medal. (08:36):
undefined

Ejaaz: So it's basically a win for OpenAI, but that's when the drama starts unfolding. (08:40):
undefined

Ejaaz: So I've got this post up from Mikhail Samin, which kind of like sparks this entire fight, Josh. (08:44):
undefined

Ejaaz: He goes, according to a friend, the IMO, which is the International Math Olympiad, (08:51):

Ejaaz: asked AI companies not to steal the spotlight from kids and to wait a week after (08:55):

Ejaaz: the closing ceremony to announce the results. (09:01):
undefined

Ejaaz: OpenAI instead announced the results before the closing ceremony. Yeah. (09:04):
undefined

Ejaaz: And then he goes on to basically say how this is essentially like some kind (09:09):
undefined

Ejaaz: of clout chasing move from OpenAI. (09:13):
undefined

Ejaaz: And OK, I tried to evaluate this, Josh, from OpenAI's kind of perspective, (09:16):
undefined

Ejaaz: which is they basically want to steal the limelight, (09:20):
undefined

Ejaaz: but also say that they were the first AI model to ever achieve gold on this (09:23):
undefined

Ejaaz: competition, which puts them in a good light and makes users want to choose (09:27):
undefined

Ejaaz: OpenAI and solidify the branding that OpenAI is the best, right? (09:31):

Ejaaz: But on the other side, you know, they're kind of like stealing the spotlight (09:35):
undefined

Ejaaz: from the kids, as this post says. But that's not actually the main trope. (09:39):
undefined

Ejaaz: The main trope here, Josh, is OpenAI wasn't the only model to achieve a gold, right? (09:44):

Ejaaz: At the same time, during the same testing period, you had Google achieving the exact same score. (09:50):
undefined

Ejaaz: So then the question becomes, okay, well, it was whoever was ethical about announcing their own result. (09:58):
undefined

Ejaaz: This post from Demis Hassabis, which is Google's head of AI, (10:04):
undefined

Ejaaz: basically posts, and I'll note two days later, Official results are in. (10:09):
undefined

Ejaaz: Gemini, which is their flagship model, achieved gold medal level in the International Math Olympiad. (10:14):
undefined

Ejaaz: An advanced version was able to solve five out of six problems. (10:20):
undefined

Ejaaz: So same as OpenAI, same thing, struggled on the sixth problem. (10:23):
undefined

Ejaaz: Incredible progress. Huge congrats to the team. (10:26):
undefined

Ejaaz: And a tweet here says that Google (10:29):
undefined

Ejaaz: basically had to wait for marketing to approve the tweet until Monday. (10:31):
undefined

Ejaaz: But OpenAI shared theirs first at 1 a.m. (10:35):

Ejaaz: on Saturday and stole the spotlight. (10:38):

Ejaaz: And we see the screenshot from Demis Hassabis, which, you know, (10:40):
undefined

Ejaaz: he further clarifies this, basically saying, (10:44):
undefined

Ejaaz: by the way, as an aside, we didn't announce on Friday because we respected the (10:46):
undefined

Ejaaz: IMO's board's original request that all AI labs share the results only after (10:50):
undefined

Ejaaz: the official results have been verified. (10:55):
undefined

Ejaaz: Now that we've been given permission to share, blah, blah, blah, (10:57):
undefined

Ejaaz: he shares. So Demis is playing the like good Samaritan here. (10:59):
undefined

Ejaaz: He's like, ah, you know, we also have the good model, but we, (11:02):
undefined

Ejaaz: you know, we have some pride and some manners about how we deal with these things. (11:06):
undefined

Ejaaz: That's where it starts to get a little uglier, Josh, because we have OpenAI (11:10):
undefined

Ejaaz: chiming in to this tweet, which basically says, and this is some random commenting (11:15):
undefined

Ejaaz: on OpenAI and this entire situation. (11:21):
undefined

Ejaaz: So OpenAI basically has zero advantages except the size of the team, (11:24):
undefined

Ejaaz: aka the OpenAI team was claimed to be smaller than Google Gemini's team. (11:30):
undefined

Ejaaz: So what he's inferring here is there's no real difference between OpenAI's models (11:34):
undefined

Ejaaz: and Google Gemini's models. You can pretty much use either or. (11:38):
undefined

Ejaaz: OpenAI maybe has a smaller team to build that model, but who the hell cares? (11:42):
undefined

Ejaaz: And then one of the AI model researchers at OpenAI basically comes in and says, (11:46):
undefined

Ejaaz: well, I think it's also interesting that they, they (11:52):

Ejaaz: being Google, curated and provided useful context (11:55):

Ejaaz: to the model, which we did not. Feels like (11:59):

Ejaaz: taking your tutor's cheat sheet with you into the exam. So shots basically being (12:02):

Ejaaz: fired from OpenAI, saying, hey, you cheated, you gave context to your model, (12:07):

Ejaaz: and that was why it was able to achieve gold. We, OpenAI, didn't provide any of (12:12):

Ejaaz: that context, and it was able to reason from first principles. There you have it. (12:17):

Ejaaz: But then directly beneath it, Vinay Ramasesh, who is a Google DeepMind AI researcher, responds, (12:21):

Ejaaz: it's worth noting, actually, that a Deep Think system, which is Google's AI system, (12:27):

Ejaaz: with no access to this corpus, so no context, also got gold. (12:32):
undefined

Ejaaz: Again, according to the official graders, and he puts this in brackets because (12:36):
undefined

Ejaaz: OpenAI didn't wait for the official graders to mark their score, (12:40):
undefined

Ejaaz: with exactly the same score. (12:44):
undefined

Ejaaz: So basically, this is like a pissing contest between two of the top AI model providers. (12:45):
undefined

Ejaaz: Here's my take, Josh. And then I really want to kind of lean into what you think (12:52):
undefined

Ejaaz: about this whole debacle. (12:56):
undefined

Ejaaz: Number one, this seems so childish to me. (12:57):
undefined

Ejaaz: Like, eventually, AI models were eventually going to get smarter or smart enough (13:00):
undefined

Ejaaz: to solve these mathematical problems. (13:05):
undefined

Ejaaz: And I think you said this earlier on. (13:07):
undefined

Ejaaz: This is something that they're going to probably laugh about 10 years from now, (13:10):
undefined

Ejaaz: right? That they were able to solve whatever, the most complex mathematical problems (13:14):

Ejaaz: for humans, mere humans. (13:17):
undefined

Ejaaz: And now AI is off creating wonderful scientific discoveries for us that we would (13:19):
undefined

Ejaaz: have never comprehended or figured out ourselves, right? (13:24):
undefined

Ejaaz: So firstly, you're arguing over something that's so silly. (13:27):
undefined

Ejaaz: But number two, this kind of seems desperate on the OpenAI side. (13:31):

Ejaaz: And maybe I'm being biased, but I'm just going to give you my take. (13:36):

Ejaaz: OpenAI has kind of had a series of stumbles recently. (13:39):

Ejaaz: They claimed that they were going to release GPT-5, which (13:43):

Ejaaz: is their brand new frontier model, but they've delayed it many months (13:45):

Ejaaz: now. They got outperformed by (13:49):

Ejaaz: Grok 4 from xAI, so now (13:52):

Ejaaz: they have a new benchmark that they need to beat, a new model that they basically (13:55):

Ejaaz: need to outcompete. They claimed that they were going to release a new open (13:58):

Ejaaz: source model and then delayed it after a Chinese open source model was released (14:02):

Ejaaz: and had one trillion parameters and outperformed not just their model, (14:07):

Ejaaz: but any other open source model out there. (14:11):
undefined

Ejaaz: And so I feel like they're looking (14:13):
undefined

Ejaaz: for a win, right? They released their agent this week or last week. (14:16):
undefined

Ejaaz: And so, you know, that had mixed review, mixed feedback. (14:21):
undefined

Ejaaz: So I feel like Sam is desperate for a win. (14:24):
undefined

Ejaaz: People are criticizing consistently their moat, asking what has OpenAI got? (14:26):
undefined

Ejaaz: They've lost a ton of researchers to Meta and other companies. (14:32):
undefined

Ejaaz: I feel like their back's against the wall. (14:35):
undefined

Ejaaz: Sam's scared and he basically needs to grab any kind of win. (14:37):
undefined

Ejaaz: So it reeks of desperation. (14:41):
undefined

Ejaaz: What's your take, Josh? (14:43):
undefined

Josh: I do empathize with the team. They've been coming under fire from every single angle. (14:44):
undefined

Josh: I mean, you have Zuck poaching all of their talent, and then all of the other (14:49):
undefined

Josh: open-source AI models are beating them at their own game. (14:54):
undefined

Josh: And they're just kind of, they're really getting beat up now. (14:57):
undefined

Josh: And I think that they're looking to get some footing. I'm sure this probably plays a role in it. (15:00):
undefined

Josh: But I'm sure behind the scenes, they're really trying to fight hard to put their (15:04):
undefined

Josh: feet back on stable ground, to get GPT-5 out the door, to build Project Stargate (15:09):
undefined

Josh: and make this big infrastructure network. (15:13):
undefined

Josh: They need some wins. So sure, this was probably an attempt to get ahead, (15:14):
undefined

Josh: make them look good, win over some more hearts and minds. (15:18):
undefined

Josh: But I think the most interesting part of the whole story is less the drama and (15:21):
undefined

Josh: more the fact that these models were able to accomplish a really impressive (15:25):
undefined

Josh: feat over such a short period of time. (15:28):
undefined

Josh: From what I understand, previously when they attempted to solve these problems, (15:30):
undefined

Josh: they used a custom training data set. (15:35):
undefined

Josh: They used custom tool sets. It was mostly a model trained on solving mathematical problems. (15:37):
undefined

Josh: And with this version, both the OpenAI version and the Gemini models, (15:42):
undefined

Josh: they were both general purpose models. (15:49):
undefined

Josh: They were not trained specifically with the intention of solving mathematical problems. (15:51):
undefined

Josh: These are the general models that people day to day are using. (15:55):
undefined

Josh: They're just now able to solve these math problems using this new general intelligence. (15:58):
undefined

Josh: So it's a really interesting breakthrough that I think we get from reinforcement (16:02):
undefined

Josh: learning that now there is not so much of an advantage to training a model specific (16:05):
undefined

Josh: to one skill set when you could just make it great at everything. (16:11):

Josh: There was one thing that I noticed that some people call it cheating, other people don't. (16:14):
undefined

Josh: But so with the actual test that high schoolers had (16:19):

Josh: to take, they're not allowed to use tools and they have a limited amount of (16:23):
undefined

Josh: time per question to answer. (16:26):
undefined

Josh: The models that, the OpenAI model and the Gemini model, they had infinite amount (16:28):
undefined

Josh: of time to answer and they were allowed to use tools. (16:32):
undefined

Josh: So there still are small differences in these. (16:34):
undefined

Ejaaz: Were they allowed to like use the internet? (16:37):
undefined

Josh: I don't know the specifics. I would imagine at least calculators, (16:39):
undefined

Josh: at most probably the full repertoire of what we have currently available to (16:42):
undefined

Josh: us, which is full internet search, code writing abilities. They could do their (16:46):
undefined

Josh: own mathematical checks. (16:49):
undefined

Josh: So I would just assume the minimum amount of constraints possible. (16:51):
undefined

Josh: So there were far fewer constraints on the models, but they did solve the questions. (16:54):

Josh: And I think that's super impressive. They got five out of six right. (16:58):
undefined

Josh: Which was gold and better than almost every student, if I'm not mistaken. (17:02):
undefined

Josh: Only a few students got the six out of six completely correct. (17:06):
undefined

Josh: It's just cool to see the rate of progress of these models getting better. (17:10):
undefined

Josh: That over the course of the last 15 months or so, they went from horrible and (17:12):
undefined

Josh: narrowly trained to incredible and generally trained. (17:17):
undefined

Josh: And as long as that trend keeps going, I think the drama matters less than the (17:20):
undefined

Josh: output, which is models are getting really good at solving really hard math problems. (17:24):
undefined

Josh: And original ones too, that the world has never seen before. (17:28):
undefined

Ejaaz: Yeah, well, that last point is actually the main takeaway that I had, (17:31):
undefined

Ejaaz: Josh, which is it's original, never-before-seen problems. (17:35):
undefined

Ejaaz: Typically, these AI models are trained on things that they've seen before, as you said, right? (17:39):
undefined

Ejaaz: They're trained on data sets. So they've already seen the problem, (17:44):
undefined

Ejaaz: and then they have to work out, they know the answer, and they have to work (17:46):
undefined

Ejaaz: out how to get there, right? So they kind of have a leading factor. (17:49):
undefined

Ejaaz: Here, it's just kind of like completely unknown. (17:52):
undefined

Ejaaz: The other thing is, this is kind of like the culmination of a trend, (17:56):
undefined

Ejaaz: Josh, which is these AI models are really good at doing kind of binary tasks. (18:01):
undefined

Ejaaz: And I don't want to reduce mathematics to binary tasks, but technically it's (18:07):
undefined

Ejaaz: numbers, sequential formulas, that kind of stuff, right? (18:12):
undefined

Ejaaz: So if you can run enough compute at a thing, and if you can get that AI model (18:18):
undefined

Ejaaz: to consider all different decision paths, it's going to eventually get to the answer, right? (18:23):

Ejaaz: But it's always a specific answer at the end of that, right? (18:28):
undefined

Ejaaz: Whereas when it comes to more subjective things, more human experiential things, (18:32):
undefined

Ejaaz: AI has typically struggled to (18:37):

Ejaaz: improve at the same rate that it has for like all these different scientific (18:40):

Ejaaz: and math problems. So I'm glad that we've reached this pinnacle feat. I think (18:43):

Ejaaz: AI models are really good at one thing and not so great at other things, (18:47):

Ejaaz: and I'm excited to see how they kind of like try to start leapfrogging (18:53):

Ejaaz: each other over the next couple of years. (18:57):
undefined

Josh: Yeah, it's that directional progress that we like. (18:59):

Josh: Math is clearly the first because you can write down (19:02):

Josh: proofs and you could check your work and there is an actual verifiable solution. (19:05):

Josh: And I think that's why we're seeing a lot of the progress start early (19:08):

Josh: in math and then hopefully go on to these other places. But (19:12):

Josh: what we are seeing is these first signs of (19:15):

Josh: new knowledge breakthroughs, where it's solving a (19:18):

Josh: new and novel problem that hasn't been (19:21):

Josh: released before based on its previous data set. (19:24):

Josh: So it's not just pattern matching, like you mentioned earlier, where it has (19:28):

Josh: this data set of questions, it's kind of finding the right examples, and (19:31):

Josh: then applying that logic to the question. It's actually (19:34):

Josh: reasoning, and it's reasoning in many instances, and (19:37):

Josh: then it's comparing its work and it's coming to a conclusion. (19:40):

Josh: And we saw this with the Grok Heavy model last week too, when (19:44):

Josh: it released, where I think the new (19:46):

Josh: meta is many instances solving hard (19:49):

Josh: problems and then comparing, so you lower that error rate more (19:52):

Josh: and more and more each time. And what we're seeing is great progress. So, (19:55):

Josh: I mean, although OpenAI and Google are fighting again, they're both (19:59):

Josh: fighting over exciting progress. And sure, maybe one tried to sweep in and (20:04):

Josh: steal the valor, but they both did an excellent job in actually completing these (20:08):

Josh: problems and placing gold in a test that was previously not possible to do from an AI model. (20:13):

Ejaaz: You know who the real winners are here out of this, Josh? (20:19):

Josh: Who's that? (20:21):

Ejaaz: High school kids, who now have an AI model that can do all their math homework for them. (20:24):

Josh: Isn't that incredible? Like, man, think about it. (20:28):
undefined

Ejaaz: I wish I had that. (20:30):
undefined

Josh: You have an AI model that is as smart as the smartest people on planet Earth (20:31):
undefined

Josh: in high school. If it could solve those math problems, it could solve anything. (20:35):
undefined

Ejaaz: It sounds human as well, Josh. So, like, your teacher is going to struggle unless (20:38):
undefined

Ejaaz: they use AI themselves to figure out whether you just did that yourself or completely (20:43):
undefined

Ejaaz: just ran that through GPT, your mom's GPT subscription. (20:48):
undefined

Josh: It really forces you to re-evaluate the school model, right? (20:51):
undefined

Josh: Because now that this information is so readily accessible, it's so easy to solve these problems. (20:54):
undefined

Josh: Is that the actual thing worth learning? Or is it how to use these tools that's (20:59):
undefined

Josh: more important to get to the answer? (21:04):

Josh: And there's this dual-pronged approach, and we see (21:06):

Josh: developers and programmers talk about this a lot, where as soon (21:09):

Josh: as they start to rely too heavily on the tools, they start (21:12):

Josh: to lose their touch. They start to lose their ability to deeply (21:14):

Josh: understand how it reaches conclusions. But (21:18):

Josh: is that worth it in exchange for getting to the answer much quicker and then (21:22):

Josh: being able to seek many more answers? I don't know. It's a weird dynamic. If I was (21:25):

Josh: a teacher, I'd be worried, because, I mean, similar to what we saw with the calculator, (21:29):

Josh: it just replaces the thinking process and just yields you an answer. (21:32):

Ejaaz: The thing with the calculator is, like, you're (21:38):

Ejaaz: using the calculator so it figures out the answer for you, but you kind of (21:42):

Ejaaz: loosely understand how it is working, right? You (21:44):

Ejaaz: know what numbers it's crunching to get to that answer. And then typically you (21:48):

Ejaaz: do a few things on a calculator and then you get to your eventual answer for (21:52):

Ejaaz: whatever the original question was. The issue, or the concern that you're (21:57):

Ejaaz: highlighting here, with AI is it's doing really complex problems, (22:01):

Ejaaz: which kids don't even need to understand in the first place just to get an answer, (22:07):

Ejaaz: which they can then give to their teacher, get a grade and then go to university. (22:11):

Ejaaz: But the kids don't actually learn actively in that process. (22:15):

Ejaaz: And it's going to be a concerning trend if we see kids just trying to go from (22:19):

Ejaaz: zero to 100% without understanding anything in between. (22:24):

Ejaaz: A trend to watch. (22:29):

Josh: This is our episode from a few weeks ago. Is AI making you dumber? (22:31):

Josh: Yes. And I think that's just going to continue to be the question. (22:34):

Josh: Oh, God. And I think the answer is it's all dependent on how you choose to use (22:37):

Josh: the tools that you're given. (22:41):

Josh: And you can use these tools as further leverage. So I'm sure these math Olympians (22:42):

Josh: who can actually complete the problems would love to have this model to check (22:46):

Josh: the problems and to work through the problems and to figure out shortcuts on (22:50):

Josh: solving these problems. (22:53):

Josh: Where if you deeply understand it, then this becomes an amazing tool to check (22:54):

Josh: your work, to generate new questions for you. (22:57):

Josh: It's a great study buddy. Or if you are not an Olympian and you still want (22:59):

Josh: to get to the answer, well, you just kind of cheat your way through and you just (23:04):

Josh: ask it for exactly what you want. So it's that split again, and it's (23:07):

Josh: up to the person to take their own agency, solve their own problems, and try to (23:11):

Josh: use these as tools of leverage instead of just problem-solving machines. (23:14):

Ejaaz: That actually reminds me of this tweet I saw yesterday, Josh. So what you're looking (23:19):

Ejaaz: at here is a tweet from Dave White. Dave White is a very prestigious investment (23:23):

Ejaaz: slash research advisor at this fund called Paradigm, (23:28):

Ejaaz: which basically it's a crypto fund, but it is one of the wealthiest funds out there. (23:32):

Ejaaz: So a lot of the investments they made were massive wins. And a lot of the reasoning (23:37):

Ejaaz: of those wins was from Dave White's analysis. (23:42):

Ejaaz: He is a deeply thoughtful mathematician at his core, and he is famed for doing (23:44):

Ejaaz: a lot of analyses on companies, mathematical analyses that have ended up, you know, (23:50):

Ejaaz: determining whether a fund puts $100 million in a company or zero, right? (23:57):

Ejaaz: So a very important job worth hundreds of millions of dollars, right? (24:01):

Ejaaz: And what he says here, basically, is him having an identity crisis, (24:04):

Ejaaz: because he has looked up to the IMO, the International Math Olympiad. (24:08):

Ejaaz: And he goes on to say in this tweet that subconsciously, whenever he's met a (24:13):

Ejaaz: gold medalist IMO champion, he's always subconsciously thought that they were (24:18):

Ejaaz: smarter than him, that he respects them more. (24:22):

Ejaaz: And now with this news that AI models basically can do his job for him, (24:25):

Ejaaz: can reason better than him at some of these math problems, he now has an identity crisis. (24:30):

Ejaaz: He doesn't know kind of where to go from this. And if people like Dave White (24:35):

Ejaaz: are having this kind of, like, disillusioned sentiment about how smart AI is, (24:39):

Ejaaz: you can imagine how this is going to happen for everyone else in all of the (24:45):

Ejaaz: other sectors, Josh, right? (24:49):

Ejaaz: It doesn't matter if you're a mathematician or an investment research advisor, (24:50):

Ejaaz: you could be a technician in some kind of engineering industrial role, (24:54):

Ejaaz: or you could be a teacher, or you could be a kid or a high schooler. (24:59):

Ejaaz: I think this disillusionment is going to spread. And I think it's super important (25:02):

Ejaaz: for people to kind of like evolve their thinking, like you said, (25:06):

Ejaaz: Josh, and learn how to leverage these tools versus just consume. (25:09):

Josh: Yeah, this is, I mean, this is crazy. There's a lot of people that are going (25:13):

Josh: to have to adapt to this new world order of intelligence, where if you build (25:16):

Josh: up your entire identity around being intelligent, well, perhaps you're going to have to alter the way you (25:21):

Josh: present yourself as intelligent because the meaning of intelligence is becoming (25:26):

Josh: commoditized among these tools that are now reduced down to a single chat box. (25:29):

Ejaaz: Yep. Benchmarks are going to have to reset themselves completely. (25:34):

Ejaaz: But folks, that is the end of this episode. Thank you so much for tuning in again. (25:37):

Ejaaz: Josh and I are going hammer and tongs at Limitless. (25:42):

Ejaaz: Our goal is to get you the hottest and trending topics and news fresh out the (25:46):

Ejaaz: door, give you our commentary, our thoughts, and hopefully some useful insights for you. (25:51):

Ejaaz: If you enjoyed this episode, if you enjoyed any of our previous episodes, please (25:55):

Ejaaz: continue to share and spread them with all your friends and family and whoever (25:58):

Ejaaz: you think might be interested in this. We are getting tons of feedback from you (26:02):

Ejaaz: guys, and with every episode that we release we're getting better. So please remember (26:05):

Ejaaz: to like, subscribe, follow us. It's hugely appreciated and helpful for us, and (26:09):

Ejaaz: we'll see you on the next one. (26:13):