OpenAI and Google Just Beat the World's Smartest Mathematicians

Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Ejaaz: All right, Josh, the AI nerds are (00:03):

Ejaaz: fighting again. This past weekend there was (00:06):

Ejaaz: a very prestigious competition called the International Math Olympiad, which (00:09):

Ejaaz: hosts some of the brightest, smartest mathematicians of our time, and they're (00:14):

Ejaaz: typically high schoolers. And basically they come together and they take a really (00:18):

Ejaaz: hard math test. This is like four to five hours, and those that score the highest get medals. (00:22):

Ejaaz: You can get bronze, silver, and the highest scorers get gold medals. (00:27):

Ejaaz: So what's this got to do with AI? (00:31):

Ejaaz: Well, recently, over the last couple of years, the organizers of this International (00:33):

Ejaaz: Math Olympiad decided to start inviting AI models to participate as contestants. (00:37):

Ejaaz: And they did terribly. Like, no one's come even near the human geniuses. (00:43):

Ejaaz: Except this year, Josh, where they came to play and not one, (00:50):

Ejaaz: but two AI models achieved not silver, but gold medals, which is just an insane thing, right? (00:53):

Ejaaz: So it should be all fun and games, right? What a fairytale story. (01:01):

Ejaaz: Well, unfortunately, OpenAI and Google got into an online spat where they started (01:05):

Ejaaz: accusing each other of cheating. (01:11):

Ejaaz: Now, remember, these are trillion dollar companies. So essentially, (01:13):

Ejaaz: Josh, I was teleported this weekend back to my high school days where I felt (01:17):

Ejaaz: like the teacher had to come in, separate the kids from arguing over some kind (01:20):

Ejaaz: of random homework problem and get them to chill out. (01:25):

Josh: We will look back at this episode and laugh at it like it's a joke because these (01:28):

Josh: AIs, they're competing against high schoolers. That's so lame. (01:31):

Josh: Only high schoolers? Like, come on, and you're just barely getting gold. (01:35):

Ejaaz: Well, in their defense, Josh, these are some pretty smart high schoolers, (01:38):

Ejaaz: man. Like, I was looking at some of these math problems. (01:42):

Ejaaz: I don't know if you can see my screen here. I'm sharing the official site. (01:44):

Ejaaz: And if you look at some of these problems, here we go. (01:48):

Ejaaz: And then, like, okay, so they have basically, they host this competition in a (01:53):

Ejaaz: different country each year. (01:56):

Ejaaz: And you can kind of like download the test yourselves after the fact to see (01:57):

Ejaaz: how well you could do it. I had a look at this one, Josh, from the Afrikaans. (02:01):

Ejaaz: I basically don't understand anything. One second. All right, (02:07):

Ejaaz: take a look at that. Take a look at this. (02:10):

Josh: That looks like quite a bit of squiggly lines on a page. (02:13):

Ejaaz: You know what? That could be mistaken for a piece of art in a gallery if you (02:17):

Ejaaz: didn't peer too closely at it. This looks insane. (02:22):

Josh: Okay, so I take it back. So the high schoolers are probably pretty smart then. (02:25):

Josh: And I guess the AI performing as well as the high schoolers is probably a pretty big deal, right? (02:28):

Josh: Because that looks like very complicated math problems that I'm assuming most (02:33):

Josh: of the smartest people in the world cannot solve. (02:37):

Ejaaz: Exactly. Yeah. This is like something that is technically set for high schoolers (02:39):

Ejaaz: and sometimes college kids, but is meant to demonstrate prowess in the field. (02:44):

Ejaaz: So there's a lot of university academics, which obviously do math degrees and (02:50):

Ejaaz: they do PhDs, but those are in very specific problems. So you kind of, like, in (02:54):

Ejaaz: science, you just need to kind of pick and choose your lane and then dedicate your life to it. (02:58):

Ejaaz: High schoolers, kind of college kids, are kind of like the last point before (03:03):

Ejaaz: you jump into your specialization. (03:07):

Ejaaz: So really, if you're the best at generalized maths, you're going to compete in this competition. (03:09):

Ejaaz: And what's so interesting is typically AI models haven't been able to perform (03:13):

Ejaaz: very well because they needed a lot of context beforehand about the problem, Josh. (03:17):

Ejaaz: So they needed to know that, you know, there was certain, you know, (03:22):

Ejaaz: X equals something and Y equals something. (03:27):

Ejaaz: And they had to have defined parameters to kind of figure out the problem. (03:30):

Ejaaz: But this was the first time that AI models basically were just given a blank (03:33):

Ejaaz: sheet of paper, or not a blank sheet of paper. (03:37):

Ejaaz: But they stared at the problem just as we just looked at it just now and had (03:38):

Ejaaz: to read the words, read the characters, interpret what that meant in the context (03:42):

Ejaaz: of that situation and the way that the question was framed and then figure it out themselves. (03:47):

Ejaaz: So it's as if the AI models had a camera that looked at a paper, (03:51):

Ejaaz: similar way that we look at test papers as kids through our eyes, and figure it out themselves. (03:55):

Josh: So what changed? What happened in the last year that made it so much better? (04:01):

Josh: Because it went from, what, basically zero of six to now six or five of six questions answered. (04:06):

Josh: Now it's a gold medalist. So what happened? (04:11):

Ejaaz: So listen, I'm not going to try and explain it, but maybe you and I can decipher (04:14):

Ejaaz: it through the legends themselves that built these models, right? (04:18):

Ejaaz: Okay, so let me paint the scene for you, Josh. (04:22):

Ejaaz: It is Saturday evening. (04:24):

Ejaaz: You know, normal people are usually out and about. They're having fun. (04:27):

Ejaaz: They're probably having dinner, catching up with friends or chilling at home, watching a movie. (04:30):

Ejaaz: And this guy called Alexander Wei, who is OpenAI's head of reasoning. (04:34):

Ejaaz: Reasoning is basically this new fancy technique that AI models have typically (04:39):

Ejaaz: demonstrated, which has brought them up to like the frontier level of AI models. (04:43):

Ejaaz: Basically, if your model can do reasoning, it's typically a pretty smart model, right? (04:47):

Ejaaz: And he posts this tweet saying, "I'm excited to share that our latest OpenAI (04:52):

Ejaaz: Experimental Reasoning LLM has achieved a longstanding grand challenge in AI, (04:57):

Ejaaz: a gold medal level performance on the world's most prestigious math competition, (05:02):

Ejaaz: the International Math Olympiad." (05:07):

Ejaaz: And he goes on to describe, you know, how the model basically took on each problem (05:09):

Ejaaz: in its own regard and solved it and how this is a massive success and win for (05:14):

Ejaaz: AI models and how, most importantly, (05:19):

Ejaaz: OpenAI was the first ever model to complete this. (05:21):

Ejaaz: And not too long after he posts that tweet, Josh, Sam Altman jumps in here, right? (05:25):

Ejaaz: And he goes, again, he kind of echoes similar thoughts. "We achieved gold medal (05:31):

Ejaaz: level performance on the 2025 IMO competition with general purpose reasoning." (05:34):

Ejaaz: And then he kind of like shills GPT-5 at the end. Basically, (05:38):

Ejaaz: it's like a promotive thing for OpenAI. (05:41):

Ejaaz: And I will say that this is really cool because what they've achieved is something (05:44):

Ejaaz: that hasn't been done before, right? So very impressive feat. (05:49):

Ejaaz: And in terms of how this works specifically, Sheryl Hsu here gives a really good breakdown. (05:53):

Ejaaz: She says, the model solves these problems without tools like coding or Lean, (05:58):

Ejaaz: which is another coding tool. (06:04):

Ejaaz: It just uses natural language. So as I said earlier, it kind of reads the paper (06:05):

Ejaaz: and just kind of interprets what it thinks it means. (06:09):

Ejaaz: And it also has the same amount of time to do the test as other kids, so 4.5 hours. (06:12):

Ejaaz: And she says, we see the model reason at a very high level, trying out different (06:17):

Ejaaz: strategies, making observations from examples, and testing different hypotheses out. (06:22):

Ejaaz: And she says, it's crazy how we've gone from 12% on the AIME test, (06:27):

Ejaaz: which is what GPT-4o, which is OpenAI's early model, got, to IMO gold, (06:32):

Ejaaz: International Math Olympiad gold medal, in 15 months. (06:38):

Ejaaz: So just to set that in context, Josh, that is a crazy leap in 15 months. (06:41):

Ejaaz: Imagine going from eighth grade level math to the best (06:45):

Ejaaz: mathematician in the world in 15 months. It's a pretty insane thing. (06:51):

Ejaaz: Yeah, I'd say so. So essentially the breakthrough that Sheryl is highlighting (06:55):

Ejaaz: here is number one, the model didn't need any context. (06:58):

Ejaaz: Number two, it used really high level reasoning to figure out the problems from first principles. (07:03):

Ejaaz: And number three, it was able to test out multiple hypotheses at the same time (07:08):

Ejaaz: instead of trying to one shot the problem. (07:14):

Ejaaz: Typically in the past when AI models have been given a prompt or a problem, (07:15):

Ejaaz: it tries to just like give it its best shot and give you one solution, Josh. (07:19):

Ejaaz: Whereas what these models, these reasoning models do really well is they are (07:23):

Ejaaz: able to hypothetically entertain many different scenarios and then pick the (07:27):

Ejaaz: best one of which it thought it was an answer. (07:30):
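The "make observations from examples, test several hypotheses, keep what survives" loop described above can be illustrated with a toy Python sketch. This is not OpenAI's method, which the conversation does not detail. The target problem (the sum of the first n odd numbers), the candidate formulas, and the names `sum_first_odds`, `HYPOTHESES`, and `surviving_hypotheses` are all hypothetical choices made for illustration.

```python
from typing import Callable, Dict, List


def sum_first_odds(n: int) -> int:
    """Ground truth computed by brute force: 1 + 3 + 5 + ... (n terms)."""
    return sum(2 * k + 1 for k in range(n))


# Hypothetical candidate closed forms a solver might entertain.
HYPOTHESES: Dict[str, Callable[[int], int]] = {
    "2n - 1": lambda n: 2 * n - 1,
    "n(n+1)/2": lambda n: n * (n + 1) // 2,
    "n^2": lambda n: n * n,
}


def surviving_hypotheses(max_n: int = 10) -> List[str]:
    """Return the names of candidates that match every small example."""
    # Step 1: make observations from small cases.
    observations = {n: sum_first_odds(n) for n in range(1, max_n + 1)}
    # Step 2: test each hypothesis against all observations.
    return [
        name
        for name, formula in HYPOTHESES.items()
        if all(formula(n) == value for n, value in observations.items())
    ]


if __name__ == "__main__":
    print(surviving_hypotheses())
```

Surviving the small cases only makes a hypothesis a candidate; an olympiad answer still needs a written proof, which this sketch does not attempt.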

Ejaaz: And it ended up with the gold medal, which is insane, right? (07:32):

Ejaaz: But it wasn't entirely without a few glitches here and there, Josh. (07:34):

Ejaaz: So if you look at this post from Jasper, he read through the entire kind of (07:38):

Ejaaz: like problem set that OpenAI's model went through, and he points out some weird anomalies. (07:42):

Ejaaz: So he kind of like talks about like how it kind of like analyzed and a bunch of things. (07:49):

Ejaaz: And he goes, however, the write-up is kind of messy. He goes, (07:52):

Ejaaz: it overuses shorthand and sentence fragments. (07:55):

Ejaaz: It introduces new terms without definitions, for example, "forbidden" and "sunny partners." (07:58):

Ejaaz: I have no idea what either of those terms could mean, but it was just apparently (08:04):

Ejaaz: just interspersing these phrases during its analysis. (08:10):

Ejaaz: And so as a reviewer, or as an examiner, they were reading this, (08:13):

Ejaaz: they were like, sorry, wait, what is it talking about? (08:17):

Ejaaz: It got to the right answer, but what is it talking about, right? (08:20):

Ejaaz: The other key point from this post is it was unable to solve one problem, problem six. (08:23):

Ejaaz: And I'm not even gonna try and get into why it failed on that problem, (08:29):

Ejaaz: but it was just particularly hard for it to figure out. (08:33):

Ejaaz: But it still scored a high enough percentage that it got a gold medal. (08:36):

Ejaaz: So it's basically a win for OpenAI, but that's when the drama starts unfolding. (08:40):

Ejaaz: So I've got this post up from Mikhail Samin, which kind of like sparks this entire fight, Josh. (08:44):

Ejaaz: He goes, according to a friend, the IMO, which is the International Math Olympiad, (08:51):

Ejaaz: asked AI companies not to steal the spotlight from kids and to wait a week after (08:55):

Ejaaz: the closing ceremony to announce the results. (09:01):

Ejaaz: OpenAI instead announced the results before the closing ceremony. Yeah. (09:04):

Ejaaz: And then he goes on to basically say how this is essentially like some kind (09:09):

Ejaaz: of clout chasing move from OpenAI. (09:13):

Ejaaz: And OK, I tried to evaluate this, Josh, from OpenAI's kind of perspective, (09:16):

Ejaaz: which is they basically want to steal the limelight, (09:20):

Ejaaz: but also say that they were the first AI model to ever achieve gold on this (09:23):

Ejaaz: competition, which puts them in a good light and makes users want to choose (09:27):

Ejaaz: OpenAI and solidify the branding that OpenAI is the best, right? (09:31):

Ejaaz: But on the other side, you know, they're kind of like stealing the spotlight (09:35):

Ejaaz: from the kids, as this post says. But that's not actually the main trope. (09:39):

Ejaaz: The main trope here, Josh, is OpenAI wasn't the only model to achieve gold, right? (09:44):

Ejaaz: At the same time, during the same testing period, you had Google achieving the exact same score. (09:50):

Ejaaz: So then the question becomes, okay, well, it was whoever was ethical about announcing their own result. (09:58):

Ejaaz: This post from Demis Hassabis, which is Google's head of AI, (10:04):

Ejaaz: basically posts, and I'll note two days later, "Official results are in. (10:09):

Ejaaz: Gemini," which is their flagship model, "achieved gold medal level in the International Math Olympiad. (10:14):

Ejaaz: An advanced version was able to solve five out of six problems." (10:20):

Ejaaz: So same as OpenAI, same thing, struggled on the sixth problem. (10:23):

Ejaaz: "Incredible progress. Huge congrats to the team." (10:26):

Ejaaz: And a tweet here says that Google (10:29):

Ejaaz: basically had to wait for marketing to approve the tweet until Monday. (10:31):

Ejaaz: But OpenAI shared theirs first at 1 a.m. (10:35):

Ejaaz: on Saturday and stole the spotlight. (10:38):

Ejaaz: And we see the screenshot from Demis Hassabis, which, you know, (10:40):

Ejaaz: he further clarifies this, basically saying, (10:44):

Ejaaz: by the way, as an aside, we didn't announce on Friday because we respected the (10:46):

Ejaaz: IMO board's original request that all AI labs share the results only after (10:50):

Ejaaz: the official results have been verified. (10:55):

Ejaaz: Now that we've been given permission to share, blah, blah, blah, (10:57):

Ejaaz: he shares. So Demis is playing the like good Samaritan here. (10:59):

Ejaaz: He's like, ah, you know, we also have the good model, but we, (11:02):

Ejaaz: you know, we have some pride and some manners about how we deal with these things. (11:06):

Ejaaz: That's where it starts to get a little uglier, Josh, because we have OpenAI (11:10):

Ejaaz: chiming in to this tweet, which basically says, and this is some random commenting (11:15):

Ejaaz: on OpenAI and this entire situation. (11:21):

Ejaaz: So OpenAI basically has zero advantages except the size of the team, (11:24):

Ejaaz: aka the OpenAI team was claimed to be smaller than Google Gemini's team. (11:30):

Ejaaz: So what he's inferring here is there's no real difference between OpenAI's models (11:34):

Ejaaz: and Google Gemini's models. You can pretty much use either or. (11:38):

Ejaaz: OpenAI maybe has a smaller team to build that model, but who the hell cares? (11:42):

Ejaaz: And then one of the AI model researchers at OpenAI basically comes in and says, (11:46):

Ejaaz: well, I think it's also interesting that they, they (11:52):

Ejaaz: being Google, curated and provided useful context (11:55):

Ejaaz: to the model, which we did not. Feels like (11:59):

Ejaaz: taking your tutor's cheat sheet with you into the exam. So shots basically being (12:02):

Ejaaz: fired from OpenAI, saying, hey, you cheated, you gave context to your model, (12:07):

Ejaaz: and that was why it was able to achieve gold. We, OpenAI, didn't provide any of (12:12):

Ejaaz: that context and it was able to reason from first principles. There you have it. (12:17):

Ejaaz: But then directly beneath it, Vinay Ramasesh, who is a Google DeepMind AI researcher, responds, (12:21):

Ejaaz: it's worth noting, actually, that a Deep Think system, which is Google's AI system, (12:27):

Ejaaz: with no access to this corpus, so no context, also got gold. (12:32):

Ejaaz: Again, according to the official graders, and he puts this in brackets because (12:36):

Ejaaz: OpenAI didn't wait for the official graders to mark their score, (12:40):

Ejaaz: with exactly the same score. (12:44):

Ejaaz: So basically, this is like a pissing contest between two of the top AI model providers. (12:45):

Ejaaz: Here's my take, Josh. And then I really want to kind of lean into what you think (12:52):

Ejaaz: about this whole debacle. (12:56):

Ejaaz: Number one, this seems so childish to me. (12:57):

Ejaaz: Like, eventually, AI models were eventually going to get smarter or smart enough (13:00):

Ejaaz: to solve these mathematical problems. (13:05):

Ejaaz: And I think you said this earlier on. (13:07):

Ejaaz: This is something that they're going to probably laugh about 10 years from now, (13:10):

Ejaaz: right? That they were able to solve, whatever, the most complex mathematical problems (13:14):

Ejaaz: for humans, mere humans. (13:17):

Ejaaz: And now AI is off creating wonderful scientific discoveries for us that we would (13:19):

Ejaaz: have never comprehended or figured out ourselves, right? (13:24):

Ejaaz: So firstly, you're arguing over something that's so silly. (13:27):

Ejaaz: But number two, this kind of seems desperate on the OpenAI side. (13:31):

Ejaaz: And maybe I'm being biased, but I'm just going to give you my take. (13:36):

Ejaaz: OpenAI has kind of had a series of stumbles recently. (13:39):

Ejaaz: They claimed that they were going to release GPT-5, which (13:43):

Ejaaz: is their brand new frontier model, but they've delayed it many months (13:45):

Ejaaz: now. They got outperformed by (13:49):

Ejaaz: Grok 4 from xAI, so now (13:52):

Ejaaz: they have a new benchmark that they need to beat, a new model that they basically (13:55):

Ejaaz: need to outcompete. They claimed that they were going to release a new open (13:58):

Ejaaz: source model and then delayed it after a Chinese open source model was released (14:02):

Ejaaz: and had one trillion parameters and outperformed not just their model, (14:07):

Ejaaz: but any other open source model out there. (14:11):

Ejaaz: And so I feel like they're looking (14:13):

Ejaaz: for a win, right? They released their agent this week or last week. (14:16):

Ejaaz: And so, you know, that had mixed review, mixed feedback. (14:21):

Ejaaz: So I feel like Sam is desperate for a win. (14:24):

Ejaaz: People are criticizing consistently their moat, asking what has OpenAI got? (14:26):

Ejaaz: They've lost a ton of researchers to Meta and other companies. (14:32):

Ejaaz: I feel like their back's against the wall. (14:35):

Ejaaz: Sam's scared and he basically needs to grab any kind of win. (14:37):

Ejaaz: So it reeks of desperation. (14:41):

Ejaaz: What's your take, Josh? (14:43):

Josh: I do empathize with the team. They've been coming under fire from every single angle. (14:44):

Josh: I mean, you have Zuck poaching all of their talent, and then all of the other (14:49):

Josh: open-source AI models are beating them at their own game. (14:54):

Josh: And they're just kind of, they're really getting beat up now. (14:57):

Josh: And I think that they're looking to get some footing. I'm sure this probably plays a role in it. (15:00):

Josh: But I'm sure behind the scenes, they're really trying to fight hard to put their (15:04):

Josh: feet back on stable ground, to get GPT-5 out the door, to build Project Stargate (15:09):

Josh: and make this big infrastructure network. (15:13):

Josh: They need some wins. So sure, this was probably an attempt to get ahead, (15:14):

Josh: make them look good, win over some more hearts and minds. (15:18):

Josh: But I think the most interesting part of the whole story is less the drama and (15:21):

Josh: more the fact that these models were able to accomplish a really impressive (15:25):

Josh: feat over such a short period of time. (15:28):

Josh: From what I understand, previously when they attempted to solve these problems, (15:30):

Josh: they used a custom training data set. (15:35):

Josh: They used custom tool sets. It was mostly a model trained on solving mathematical problems. (15:37):

Josh: And with this version, both the OpenAI version and the Gemini models, (15:42):

Josh: they were both general purpose models. (15:49):

Josh: They were not trained specifically with the intention of solving mathematical problems. (15:51):

Josh: These are the general models that people day to day are using. (15:55):

Josh: They're just now able to solve these math problems using this new general intelligence. (15:58):

Josh: So it's a really interesting breakthrough that I think we get from reinforcement (16:02):

Josh: learning, that now there is not so much of an advantage to training a model specific (16:05):

Josh: to one skill set when you could just make it great at everything. (16:11):

Josh: There was one thing that I noticed that some people call it cheating, other people don't. (16:14):

Josh: But so with the mathematical, with the actual test that high schoolers had (16:19):

Josh: to take, they're not allowed to use tools and they have a limited amount of (16:23):

Josh: time per question to answer. (16:26):

Josh: The models, the OpenAI model and the Gemini model, they had infinite amount (16:28):

Josh: of time to answer and they were allowed to use tools. (16:32):

Josh: So there still are small differences in these. (16:34):

Ejaaz: Were they allowed to like use the internet? (16:37):

Josh: I don't know the specifics. I would imagine at least calculators, (16:39):

Josh: at most probably the full repertoire of what we have currently available to (16:42):

Josh: us, which is full internet search, code writing abilities. They could do their (16:46):

Josh: own mathematical checks. (16:49):

Josh: So I would just assume the minimum amount of constraints possible. (16:51):

Josh: So there was much less constraints on the models, but they did solve the questions. (16:54):

Josh: And I think that's super impressive. They got five out of six right, (16:58):

Josh: which was gold and better than almost every student, if I'm not mistaken. (17:02):

Josh: Only a few students got the six out of six completely correct. (17:06):
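The arithmetic behind "five out of six was gold" can be sketched in a few lines of Python. Each IMO problem is marked out of 7 points, so a paper is worth 42 in total. The 2025 gold cutoff of 35 points used below is not stated in the conversation; it is an assumption taken from the widely reported medal boundaries, and the names `total_score` and `is_gold` are hypothetical. The sketch also ignores partial credit, which real grading awards.

```python
POINTS_PER_PROBLEM = 7  # each IMO problem is marked out of 7
NUM_PROBLEMS = 6        # two days, three problems per day
GOLD_CUTOFF_2025 = 35   # ASSUMPTION: reported 2025 cutoff, not from the transcript


def total_score(fully_solved: int) -> int:
    """Score for a paper with `fully_solved` complete solutions and no partial credit."""
    if not 0 <= fully_solved <= NUM_PROBLEMS:
        raise ValueError("fully_solved must be between 0 and 6")
    return fully_solved * POINTS_PER_PROBLEM


def is_gold(score: int, cutoff: int = GOLD_CUTOFF_2025) -> bool:
    """True if the score meets the (assumed) gold cutoff."""
    return score >= cutoff


if __name__ == "__main__":
    five_of_six = total_score(5)
    print(five_of_six, "out of", total_score(NUM_PROBLEMS), "gold:", is_gold(five_of_six))
```

Under that assumed cutoff, five complete solutions land exactly on the gold boundary, which is why missing problem six did not cost either model the medal.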

Josh: It's just cool to see the rate of progress of these models getting better. (17:10):

Josh: That over the course of the last 15 months or so, they went from horrible and (17:12):

Josh: narrowly trained to incredible and generally trained. (17:17):

Josh: And as long as that trend keeps going, I think the drama matters less than the (17:20):

Josh: output, which is models are getting really good at solving really hard math problems. (17:24):

Josh: And original ones too, that the world has never seen before. (17:28):

Ejaaz: Yeah, well, that last point is actually the main takeaway that I had, (17:31):

Ejaaz: Josh, which is it's original, never-before-seen problems. (17:35):

Ejaaz: Typically, these AI models are trained on things that they've seen before, as you said, right? (17:39):

Ejaaz: They're trained on data sets. So they've already seen the problem, (17:44):

Ejaaz: and then they have to work out, they know the answer, and they have to work (17:46):

Ejaaz: out how to get there, right? So they kind of have a leading factor. (17:49):

Ejaaz: Here, it's just kind of like completely unknown. (17:52):

Ejaaz: The other thing is, this is kind of like the culmination of a trend, (17:56):

Ejaaz: Josh, which is these AI models are really good at doing kind of binary tasks. (18:01):

Ejaaz: And I don't want to reduce mathematics to binary tasks, but technically it's (18:07):

Ejaaz: numbers, sequential formulas, that kind of stuff, right? (18:12):

Ejaaz: So if you can run enough compute at a thing, and if you can get that AI model (18:18):

Ejaaz: to consider all different decision paths, it's going to eventually get to the answer, right? (18:23):

Ejaaz: But it's always a specific answer at the end of that, right? (18:28):

Ejaaz: Whereas when it comes to more subjective things, more human experiential things, (18:32):

Ejaaz: AI has typically struggled to... (18:37):

Ejaaz: improve at the same rate that it has for like all these different scientific (18:40):

Ejaaz: and math problems. So I'm glad that we've reached this pinnacle feat. I think (18:43):

Ejaaz: AI models are really good at one thing and not so great at other things, (18:47):

Ejaaz: and I'm excited to see how like they kind of like try to start leapfrogging (18:53):

Ejaaz: each other over the next couple of years. (18:57):

Josh: Yeah, it's that directional progress that we like. (18:59):

Josh: Math is clearly the first because you can write down (19:02):

Josh: proofs and you could check your work and there is an actual verifiable solution, (19:05):

Josh: and I think that's why we're seeing a lot of the progress start early (19:08):

Josh: in math and then hopefully go on to these other places. But (19:12):

Josh: what we are seeing is these first signs of (19:15):

Josh: new knowledge breakthroughs, where it's solving a (19:18):

Josh: new and novel problem that hasn't been (19:21):

Josh: released before based on its previous data set. (19:24):

Josh: So it's not just pattern matching, like you mentioned earlier, where it has (19:28):

Josh: this data set of questions, it's kind of finding the right examples and (19:31):

Josh: then applying that logic to the question. It's actually (19:34):

Josh: reasoning, and it's reasoning in many instances, and (19:37):

Josh: then it's comparing its work and it's coming to a conclusion. (19:40):

Josh: And we saw this with the Grok Heavy model last week too, when (19:44):

Josh: it released, where I think the new (19:46):

Josh: meta is many instances solving hard (19:49):

Josh: problems and then comparing, so you lower that error rate more (19:52):

Josh: and more and more each time. And what we're seeing is great progress. (19:55):
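The claim that running many instances and comparing their answers lowers the error rate can be made concrete with a small Python sketch of majority voting. It rests on strong simplifying assumptions that are not from the conversation: each attempt is independent, each is correct with the same probability `p_correct`, and the final answer is whatever a strict majority agrees on. Real model runs are correlated, and nothing here describes how Grok Heavy or the IMO systems actually aggregate their attempts. The function name `majority_error` is hypothetical.

```python
from math import comb


def majority_error(p_correct: float, n: int) -> float:
    """Probability that a strict majority of n independent attempts is NOT correct.

    Assumes n is odd (so there are no ties) and that every attempt is
    independently correct with probability p_correct.
    """
    if n < 1 or n % 2 == 0:
        raise ValueError("n must be a positive odd integer")
    if not 0.0 <= p_correct <= 1.0:
        raise ValueError("p_correct must be between 0 and 1")
    needed = n // 2 + 1  # smallest number of correct attempts that forms a majority
    p_majority_correct = sum(
        comb(n, k) * p_correct**k * (1 - p_correct) ** (n - k)
        for k in range(needed, n + 1)
    )
    return 1.0 - p_majority_correct


if __name__ == "__main__":
    for n in (1, 3, 5, 11, 21):
        print(f"n={n:2d}  error={majority_error(0.7, n):.4f}")
```

With `p_correct = 0.7` the error falls from 0.3 for a single attempt to 0.216 for three. The effect only helps when each attempt is right more often than wrong; below 50% accuracy, voting makes the result worse.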

Josh: I mean, although OpenAI and Google are fighting again, they're both (19:59):

Josh: fighting over exciting progress. And sure, maybe one tried to sweep in and (20:04):

Josh: steal the valor, but they both did an excellent job in actually completing these (20:08):

Josh: problems and placing gold in a test that was previously not possible to do from an AI model. (20:13):

Ejaaz: You know who the real winners are here out of this, Josh? (20:19):

Josh: Who's that? (20:21):

Ejaaz: High school kids, who now have an AI model that can do all their math homework for them. (20:24):

Josh: Isn't that incredible? Like, man, think about it. (20:28):

Ejaaz: I wish I had that. (20:30):

Josh: You have an AI model that is as smart as the smartest people on planet Earth (20:31):

Josh: in high school. If it could solve those math problems, it could solve anything. (20:35):

Ejaaz: It sounds human as well, Josh. So, like, your teacher is going to struggle unless (20:38):

Ejaaz: they use AI themselves to figure out whether you just did that yourself or completely (20:43):

Ejaaz: just ran that through GPT, your mom's GPT subscription. (20:48):

Josh: It really forces you to re-evaluate the school model, right? (20:51):
undefined

Josh: Because now that this information is so readily accessible, it's so easy to solve these problems. (20:54):
undefined

Josh: Is that the actual thing worth learning? Or is it how to use these tools that's (20:59):
undefined

Josh: more important to get to the answer? (21:04):
undefined

Josh: And there's this there's this dual pronged approach and we see we see (21:06):
undefined

Josh: developers and programmers talk about this a lot where as soon (21:09):
undefined

Josh: as they start to rely too heavily on the tools they start (21:12):
undefined

Josh: to lose their touch they start to lose their ability to to deeply (21:14):
undefined

Josh: understand how it reaches conclusions um but (21:18):
undefined

Josh: is that worth it in exchange for getting to the answer much quicker and then (21:22):
undefined

Josh: being able to seek many more answers i don't know it's weird dynamic if i was (21:25):
undefined

Josh: a teacher i'd be worried because i mean similar to what we saw with the calculator (21:29):
undefined

Josh: it just replace the thinking process and just yield you an answer and (21:32):
undefined

Ejaaz: The thing with the calculator is like you you're (21:38):
undefined

Ejaaz: using the calculator so it figures out the answer for you but you kind of (21:42):
undefined

Ejaaz: loosely understand how it is working right you (21:44):
undefined

Ejaaz: know what numbers it's crunching to get to that answer and then typically you (21:48):
undefined

Ejaaz: do a few things on a calculator and then you get to your eventual answer for (21:52):
undefined

Ejaaz: whatever the original question was. The issue, or the concern that you're (21:57):
undefined

Ejaaz: highlighting here with AI is it's doing really complex problems, (22:01):
undefined

Ejaaz: which kids don't even need to understand in the first place just to get an answer, (22:07):
undefined

Ejaaz: which they can then give to their teacher, get a grade and then go to university. (22:11):
undefined

Ejaaz: But the kids don't actually learn actively in that process. (22:15):
undefined

Ejaaz: And it's going to be a concerning trend if we see kids just trying to go from (22:19):
undefined

Ejaaz: zero to 100% without understanding anything in between. (22:24):
undefined

Ejaaz: A trend to watch. (22:29):
undefined

Josh: This is our episode from a few weeks ago. Is AI making you dumber? (22:31):
undefined

Josh: Yes. And I think that's just going to continue to be the question. (22:34):
undefined

Josh: Oh, God. And I think the answer is it's all dependent on how you choose to use (22:37):
undefined

Josh: the tools that you're given. (22:41):
undefined

Josh: And if you use these tools as further leverage. So I'm sure these math olympians (22:42):
undefined

Josh: who can actually complete the problems would love to have this model to check (22:46):
undefined

Josh: the problems and to work through the problems and to figure out shortcuts on (22:50):
undefined

Josh: solving these problems. (22:53):
undefined

Josh: Where if you deeply understand it, then this becomes an amazing tool to check (22:54):
undefined

Josh: your work, to generate new questions for you. (22:57):
undefined

Josh: It's a great study buddy. Or if you are not an olympian and you still want (22:59):
undefined

Josh: to get to the answer, well, you just kind of cheat your way through and you just (23:04):
undefined

Josh: ask it for exactly what you want. So it's that split again, and it's (23:07):
undefined

Josh: up to the person to take their own agency, solve their own problems, and try to (23:11):
undefined

Josh: use these as tools of leverage instead of just problem-solving machines. (23:14):
undefined

Ejaaz: That actually reminds me of this tweet I saw yesterday, Josh. So what you're looking (23:19):
undefined

Ejaaz: at here is a tweet from Dave White. Dave White is a very prestigious investment (23:23):
undefined

Ejaaz: slash research advisor at this fund called Paradigm, (23:28):
undefined

Ejaaz: which basically it's a crypto fund, but it is one of the wealthiest funds out there. (23:32):
undefined

Ejaaz: So a lot of the investments they made were massive wins. And a lot of the reasoning (23:37):
undefined

Ejaaz: of those wins was from Dave White's analysis. (23:42):
undefined

Ejaaz: He is a deeply thoughtful mathematician at his core, and he is famed for doing (23:44):
undefined

Ejaaz: a lot of analyses on companies, mathematical analyses that have ended up, you know, (23:50):
undefined

Ejaaz: determining whether a fund puts $100 million in a company or zero, right? (23:57):
undefined

Ejaaz: So a very important job worth hundreds of millions of dollars, right? (24:01):
undefined

Ejaaz: And what he says here, basically, is him having an identity crisis, (24:04):
undefined

Ejaaz: because he has looked up to the IMO, the International Math Olympiad. (24:08):
undefined

Ejaaz: And he goes on to say in this tweet that subconsciously, whenever he's met a (24:13):
undefined

Ejaaz: gold medalist IMO champion, he's always subconsciously thought that they were (24:18):
undefined

Ejaaz: smarter than him, that he is more respecting of them. (24:22):
undefined

Ejaaz: And now with this news that AI models basically can do his job for him, (24:25):
undefined

Ejaaz: can reason better than him at some of these math problems, he now has an identity crisis. (24:30):
undefined

Ejaaz: He doesn't know kind of where to go from this. And if people like Dave White (24:35):
undefined

Ejaaz: are having this kind of, like, disillusioned sentiment from how smart AI is, (24:39):
undefined

Ejaaz: you can imagine how this is going to happen for everyone else in all of the (24:45):
undefined

Ejaaz: other sectors, Josh, right? (24:49):
undefined

Ejaaz: It doesn't matter if you're a mathematician or an investment research advisor, (24:50):
undefined

Ejaaz: you could be a technician in some kind of engineering industrial role, (24:54):
undefined

Ejaaz: or you could be a teacher, or you could be a kid or a high schooler. (24:59):
undefined

Ejaaz: I think this disillusionment is going to spread. And I think it's super important (25:02):
undefined

Ejaaz: for people to kind of like evolve their thinking, like you said, (25:06):
undefined

Ejaaz: Josh, and learn how to leverage these tools versus just consume. (25:09):
undefined

Josh: Yeah, this is, I mean, this is crazy. There's a lot of people that are going (25:13):
undefined

Josh: to have to adapt to this new world order of intelligence, where if you build (25:16):
undefined

Josh: up your entire identity around being intelligent, well, perhaps you're going to have to alter the way (25:21):
undefined

Josh: you present yourself as intelligent because the meaning of intelligence is becoming (25:26):
undefined

Josh: commoditized among these tools that are now reduced down to a single chat box. (25:29):
undefined

Ejaaz: Yep. Benchmarks are going to have to reset themselves completely. (25:34):
undefined

Ejaaz: But folks, that is the end of this episode. Thank you so much for tuning in again. (25:37):
undefined

Ejaaz: Josh and I are going hammer and tongs at Limitless. (25:42):
undefined

Ejaaz: Our goal is to get you the hottest and trending topics and news fresh out the (25:46):
undefined

Ejaaz: door, give you our commentary, our thoughts, and hopefully some useful insights for you. (25:51):
undefined

Ejaaz: If you enjoyed this episode, if you enjoyed any of our previous episodes, please (25:55):
undefined

Ejaaz: continue to share and spread them with all your friends and family and whoever (25:58):
undefined

Ejaaz: you think might be interested in this. We are getting tons of feedback from you (26:02):
undefined

Ejaaz: guys, and with every episode that we release, we're getting better. So please remember (26:05):
undefined

Ejaaz: to like, subscribe, follow us. It's hugely appreciated and helpful for us, and (26:09):
undefined

Ejaaz: we'll see you on the next one. (26:13):
undefined
