Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Ejaaz (00:03):
A bunch of AI researchers from China just released a brand new AI model called Kimi K2, which is not only as good as any other top model like Claude, but is also 100% open source, which means it's free to take, customize, and build into your own brand new AI model. This thing is amazing at coding, it beats any other model at creative writing, and it also has a pretty insane voice mode. Oh, and I should probably mention that it is one trillion parameters in size, which makes it one of the largest models ever created. Josh, we were winding down on a Friday night when the news broke that this team had released this model. An absolutely crazy bomb, especially with OpenAI rumored to release their open source model this week. You've been jumping into this. What's your take?

Josh (00:54):
Yeah. So last week we crowned Grok 4 as the new leading private, closed source model. This week we've got to give the crown to Kimi K2. We've got another crown going to the open source team; they are winning. I mean, this is better than DeepSeek and DeepSeek R2; this is basically DeepSeek R3, I would imagine. And if you remember back a couple of months, DeepSeek really flipped the world on its head because of how efficient it was and the algorithmic upgrades it made. I think what we see with Kimi K2 is a lot of the same thing: these novel breakthroughs that come as a downstream effect of their needing to be resourceful. China doesn't have the mega GPU clusters we have, they don't have all the cutting-edge hardware, but they do have the software prowess to find these efficiencies. I think that's what makes this model so special. And that's what we're going to get into here: specifically what they did to make this model so special.

Ejaaz (01:41):
Yeah, I mean, look at these stats here, Josh: one trillion parameters in total, with 32 billion active in a mixture-of-experts architecture. So what this means is that although it's really large in size, and these AI models can typically become pretty inefficient at that size, it uses this technique called mixture of experts, which means that whenever someone queries the model, it only uses, or activates, the parameters that are relevant to the query itself. So it's smarter, it's much more efficient, and it doesn't consume as much energy as it otherwise would if you wanted to run it locally at home or wherever that might be. It's also super cheap. I think I saw somewhere that this was 20% of the cost of Claude, Josh, which, we love that. Insane. For all the nerds who want to run really long tasks, or just set and forget the AI to run on your codebase or whatever that might be, you can now do it at a much more affordable rate, one-fifth the cost of some of the top models out there, and it is as good as those models. Just insane, Josh. I know there's a bunch of things you wanted to point out here on benchmarks. What do you want to get into?

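Before Josh expands on this, here is a toy Python sketch of the top-k routing Ejaaz just described. The 384-expert, 8-active split mirrors the numbers discussed in the episode; the shapes, router, and expert layers are illustrative stand-ins, not Moonshot's actual architecture.

```python
import numpy as np

# Toy sketch of top-k mixture-of-experts routing (not Moonshot's code).
# Numbers mirror the conversation: 384 experts, 8 active per token.
NUM_EXPERTS = 384
TOP_K = 8
HIDDEN = 16  # tiny hidden size for illustration

rng = np.random.default_rng(0)
router_weights = rng.standard_normal((HIDDEN, NUM_EXPERTS))
# Each "expert" here is just a small feed-forward matrix.
experts = rng.standard_normal((NUM_EXPERTS, HIDDEN, HIDDEN))

def moe_forward(token: np.ndarray) -> np.ndarray:
    # 1. The router scores every expert for this token.
    scores = token @ router_weights                          # (NUM_EXPERTS,)
    # 2. Keep only the top-k experts; the rest stay idle, which is
    #    why only ~32B of the 1T parameters do work per query.
    top = np.argsort(scores)[-TOP_K:]
    gates = np.exp(scores[top]) / np.exp(scores[top]).sum()  # softmax over winners
    # 3. Return the gate-weighted sum of the chosen experts' outputs.
    return sum(g * (token @ experts[i]) for g, i in zip(gates, top))

out = moe_forward(rng.standard_normal(HIDDEN))
print(out.shape)  # (16,)
```
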
Josh (02:50):
Yeah, it's really amazing. So they took 15 and a half trillion tokens and condensed those down into a one trillion parameter model. And what's amazing is that when you use this model, like Ejaaz said, it uses a thing called mixture of experts. So it has, I believe, 384 experts, and each expert is good at a specific thing. Let's say you want to do a math problem: it will take a 32 billion parameter subset of the one trillion total parameters, choosing eight of these different experts for that specific thing. So in the case of math, it'll find an expert that has the calculator tool, it'll find an expert that has a fact-checking tool or a proof tool to make sure the math is accurate. It'll have a series of tools to help itself. And that's how it works so efficiently: instead of using a trillion parameters at once, it uses just 32 billion, the eight best specialists out of the 384 available to it. It's really impressive.

And what we see here is the benchmarks we're showing on screen. And the benchmarks are really good. It's up there in line with just about any other top model, except that this is open source.

And there was another breakthrough, which was the way they handled the training. This is the loss curve. What you're looking at on screen, for the people who are listening, is this really pretty smooth curve that starts at the top and trends down in a very predictable and smooth way. Most curves don't look like this, and if they do, it's because the company has spent tons and tons of money on error correction to make the curve that smooth. So basically what you're seeing is the training run of the model. A lot of times what happens is you get these very sharp spikes and it starts to diverge from the normal training run, and it takes a lot of compute to recalibrate and push it back the right way. What they've managed to do is make it very smooth, and they've done this by increasing these efficiencies.

So there's this analogy I was thinking of right before we hit the record button, and it's as if you were teaching a chef how to cook, right? So we have Chef Ejaaz here. I am teaching him how to cook; I am an expert chef. And instead of telling him every ingredient and every step for every single dish, what I tell him is: hey, if you're making this amazing dinner recipe, all that matters is this amount of salt applied at this time and this amount of heat applied for this length of time; the other stuff doesn't matter as much, so just put in whatever you think is appropriate and you'll get the same answer. And that's what we see with this model: an increased amount of efficiency by being direct, by being intentional about the data they used to train it, the data they fetch in order to give you high-quality answers.

And it's a really novel breakthrough. They call it the MuonClip optimizer, which, I mean, it's a Chinese company, maybe it means something special there, but it is a new type of optimizer. And what you're seeing in this curve is that it's working really well and really efficiently. That's part of the benefit of it being open source: now we have this novel breakthrough, and we can take it and use it for even more breakthroughs, even more open source models. That's been really cool to see.

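Moonshot's public write-up describes MuonClip as the Muon optimizer paired with a "QK-Clip" step that rescales the query/key projection weights whenever attention logits grow past a cap, which is what suppresses the loss spikes Josh describes. Below is a heavily simplified, single-head sketch of just the clipping idea; the threshold value and this exact rescaling are assumptions for illustration, and the real method operates per attention head during training.

```python
import numpy as np

TAU = 100.0  # illustrative logit cap, not the real hyperparameter

def qk_clip(W_q, W_k, X):
    """After an optimizer step, bound the maximum attention logit.

    If max |q . k| exceeds TAU, scale both projections by
    sqrt(TAU / max_logit) so their product shrinks back under the cap,
    preventing the logit blow-ups that show as loss spikes.
    """
    Q, K = X @ W_q, X @ W_k
    max_logit = np.abs(Q @ K.T).max()
    if max_logit > TAU:
        scale = np.sqrt(TAU / max_logit)
        W_q *= scale
        W_k *= scale
    return W_q, W_k

# Tiny usage example with random weights and activations.
rng = np.random.default_rng(0)
W_q, W_k = rng.standard_normal((8, 8)), rng.standard_normal((8, 8))
X = rng.standard_normal((4, 8))
W_q, W_k = qk_clip(W_q, W_k, X)
```
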
Ejaaz (05:48):
I mean, this is just time and again from China. So amazing from their research team. Just to pick up your comment on DeepSeek: at the end of last year, we were utterly convinced that the only way to create a breakthrough model was to spend billions of dollars on compute clusters, and that it was therefore a pay-to-play game. Then DeepSeek, a team out of China, released their model and completely open-sourced it as well. And it was as good as OpenAI's frontier model, which was the top model at the time. And the revelation there was: oh, you don't actually need to just chuck a bunch of compute at this. There are different techniques and different methods; if you get creative about how you design your model and how you run the training run, which is basically what you need to do to make your model smart, you can do it in ways that are more efficient, consume less energy and therefore less money, but are as smart, if not smarter, than the frontier models American AI companies are making. And this is just a repeat of that, Josh. I mean, look at this curve, for those watching this episode on video. It is just so clean. Yeah, it's beautiful.

The craziest part about this is that when DeepSeek was released, they pioneered reasoning and reinforcement learning, two techniques that made the model super smart with less energy and less compute spend. With this model, they didn't even implement that technique at all, so theoretically this model can get so much smarter than it already is; they just leveraged a new method to make it as smart as it is right now. Such fascinating research progress from China, and it just keeps on coming. It's so impressive.

Josh (07:42):
Yeah, this was the exciting part to me: we're seeing exponential algorithmic improvements in so many different categories. This was considered a breakthrough by all means, and it wasn't even the same type of breakthrough that DeepSeek had. So we get this compounding effect where we have this new training breakthrough, and then we have DeepSeek with reinforcement learning, which hasn't even been applied to this new model yet. So we get the exponential growth on one end, the exponential growth on the reasoning end, those come together, and then you get the exponential growth on the hardware stack, where the GPUs are getting much faster. There are all of these different subsets of AI that are compounding on each other, growing and accelerating quicker and quicker, and what you get is this unbelievable rate of progress. That's what we're seeing. So reasoning isn't even here yet, and we're going to see it soon, because it is open source, so people can apply their own reasoning on top of it. I'm sure the Moonshot team is going to be doing their own reasoning version of this model, and I'm sure we're going to be getting even more impressive results soon. I see you have a post up here about the testing and overall performance. Can you please share?

Ejaaz (08:46):
Yeah, so this is a tweet that summarizes really well how this model performs in relation to other frontier models. The popular comparison for Kimi K2 is against Claude. So Claude has a bunch of models out: Claude 3.5 is its earlier model, and Claude 4 is its latest. And the general take is that this model is just better than those models, which is insane to say, because for so long, Josh, we've said that Claude was the best coding model. And indeed it was. And then within the span of, what is it, five days? Grok 4 released and completely blew Claude 4 out of the water in terms of coding. Now Kimi K2, an open source model out of China, which doesn't even have access to the research and proprietary knowledge that a lot of American AI companies have, beat it as well, right? So it beats Claude at its own game, but it's also cheaper: it's 20% of the cost of Claude 3.5, which is just an insane thing to say. Which means that if you're a developer out there who wants to try your hand at vibe coding a bunch of things, or at seriously coding something quite novel when you don't have the hands on deck to do it, you can now spin up a Kimi K2 AI agent, actually multiple of them, for a very cost-efficient, reasonable salary. You don't have to pay hundreds of thousands of dollars, or hundreds of millions of dollars, which is what Meta is doing to buy up a bunch of these software engineers. You can spend the equivalent of maybe a Netflix subscription, or $500 to $1,000 a month, and spin up your own app. So super, super cool.

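For a sense of what "one-fifth the cost" means for a set-and-forget agent, here is some back-of-envelope math. The token volume and the $15-per-million output price are illustrative placeholders, not official rate cards.

```python
# Back-of-envelope agent cost comparison. Prices and volumes are
# illustrative assumptions, not quoted from either provider.
tokens_per_day = 20_000_000           # a busy coding agent, assumed
claude_price = 15.00                  # $/M output tokens, assumed
k2_price = claude_price * 0.20        # the "20% of the cost" claim

for name, price in [("claude", claude_price), ("kimi-k2", k2_price)]:
    monthly = tokens_per_day / 1e6 * price * 30
    print(f"{name}: ${monthly:,.0f}/month")
# claude: $9,000/month vs kimi-k2: $1,800/month at these assumptions.
```
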
Josh (10:23):
And one added perk: if you have a lot of GPUs sitting around, you can actually run this model for free. That pricing is the cost if you query it from the servers, but I'm sure there are going to be companies with access to excess GPUs. They can just download the model, because it's open source, open weights, and run it on their own. And that brings the cost of compute down to the cost per kilowatt-hour of the energy required to run the GPUs. So because it's open source, you really start to see these costs decline, but the quality doesn't. And every time we see this, we see a huge productivity unlock in coding output and in the number of queries used. It's like, this is freaking awesome.

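A rough sketch of Josh's point that self-hosting open weights reduces the marginal cost to electricity. Every number here (node size, board power, electricity price, throughput) is an assumption for illustration only.

```python
# Rough self-hosting math: if the weights are free, the marginal cost
# is electricity. All numbers below are assumptions.
gpu_count = 16            # assumed node size for a big MoE
watts_per_gpu = 700       # H100-class board power, approximate
price_per_kwh = 0.12      # $/kWh, assumed
tokens_per_second = 300   # assumed cluster throughput

kwh_per_hour = gpu_count * watts_per_gpu / 1000
cost_per_hour = kwh_per_hour * price_per_kwh
cost_per_m_tokens = cost_per_hour / (tokens_per_second * 3600 / 1e6)
print(f"~${cost_per_hour:.2f}/hour, ~${cost_per_m_tokens:.2f} per million tokens")
```
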
Ejaaz (10:58):
Yeah, Josh, I saw something else come up as well. Do you remember when Claude first released their frontier model, I think it was 3.5, or maybe it was 4? One of their bragging rights was that it had a one million token context window, which...

Josh (11:16):
Oh yes, which was huge.

Ejaaz (11:17):
Yeah, which, for listeners of the show, is huge. It's like several novels' worth of words or characters that you could just bung into one single prompt. And the reason that was such an amazing thing was that for a while, people struggled to communicate with these AIs because they couldn't set the context. There wasn't enough bandwidth within their chat window for them to say, you know, and don't forget this, and then there was this, and this detail and that detail. There just wasn't enough space, and models weren't performing well enough to consume all of this in one go. And then Claude came out and was like, hey, we have a one million token context window, don't worry about it: chuck in all the research papers you want, chuck in your essay, chuck in reference books, and we've got you. I saw this tweet that was deleted. I think you sent this to me.

Josh (12:04):
We've got the screenshots; we always come with receipts. Yeah.

Ejaaz (12:07):
I wonder why they deleted it, but good catch from you. Yeah, let's get into this.

Josh (12:11):
What's my take on it? It was first posted, I think, earlier today, like an hour ago, and then deleted pretty shortly afterwards. And this is from a woman named Crystal. Crystal works with the Moonshot team; she is part of the team that released Kimi K2. And in this post it says: Kimi isn't just another AI; it went viral in China as the first to support a 2 million token context window. And then she goes on to say, we're an AI lab with just 200 people, which is minuscule compared to a lot of the other labs they're competing with. So it was an acknowledgement that they had a 2 million token context window.

And just a quick refresher on the context window stuff: imagine you have a gigantic textbook, you've read it once, you close it, and you kind of have a fuzzy memory of all the pages. The context window allows you to lay all of those pages out in clear view and directly reference every single one. So when you have two million tokens, which is roughly a million and a half words of context, we're talking about dozens of books and textbooks worth of knowledge; you could really dump a lot of information in there for the AI to readily access. And if they release a two million token open source model, that's a huge deal. I mean, even Grok 4 recently, what did we say it was, a 256,000 token context window, something like that? So Grok 4 is one-eighth of what they supposedly have accessible right now, which is a really, really big deal. So I'm hoping it was deleted because they just don't want to share that, not because it's not true. I would like to believe that it's true, because, man, that'd be pretty epic.

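A quick rule-of-thumb calculation for context windows. The characters-per-token and words-per-token ratios below are common English-text heuristics, not any particular tokenizer's behavior, so treat every number as an estimate.

```python
# Rough context-window budgeting with common English-text heuristics:
# ~4 characters per token, ~0.75 words per token.
def approx_tokens(text: str) -> int:
    return max(len(text) // 4, 1)

print(approx_tokens("a prompt" * 1000))        # ~2,000 tokens
window_tokens = 2_000_000                      # the rumored Kimi window
words = window_tokens * 0.75                   # ~1.5M words
novels = words / 90_000                        # assumed average novel length
print(f"~{words/1e6:.1f}M words, roughly {novels:.0f} novels per prompt")
```
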
Ejaaz (13:42):
And the people are loving it, Josh. Check out this graph from OpenRouter, which basically shows the split of usage between everyone on their platform querying different models. For context, OpenRouter is a website you can go to and type up a prompt, just like you do on ChatGPT, and you can decide which model your prompt goes to, or you can let OpenRouter decide for you, and it divvies up your query. So if you have a coding query, it's probably going to send it to Claude, or now Kimi K2 or Grok 4; but if you have something more like creative writing, or something like a case study, it might send it to OpenAI's o3 model, right? So it decides for you.

OpenRouter released this graphic, which shows that Kimi K2 surpassed xAI in token market share just a few days after launching. Which basically means that xAI spent billions of dollars training up their Grok 4 model, which just beat out the competition last week; then Kimi K2 gets released, completely open source, and everyone starts to use it more than Grok 4. Which is just an insane thing to say, and it shows how rapidly these AI models compete with and surpass each other. I think part of the reason for this, Josh, is that it's open source, right? Which means that not only are retail users like you and me using it for our daily queries, you know, create this recipe for me or whatever, but researchers and builders all over the world, who have so far had the obstacle of needing pots of money to start their own AI company, now have access to a frontier, world-renowned model and can create whatever application, website, or product they want to make. So I think that's part of the usage there as well. Do you have any takes on this?

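For readers who want to try the setup Ejaaz describes, here is a minimal sketch of calling Kimi K2 through OpenRouter's OpenAI-compatible chat endpoint. The model slug is what OpenRouter listed for Kimi K2 at the time of writing; treat it, and the placeholder API key, as assumptions to verify on the site.

```python
import requests

# Minimal OpenRouter chat call (OpenAI-compatible schema).
# "moonshotai/kimi-k2" is assumed to be the current slug; check
# openrouter.ai's model list if this errors.
resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": "Bearer YOUR_OPENROUTER_KEY"},  # placeholder key
    json={
        "model": "moonshotai/kimi-k2",
        "messages": [{"role": "user", "content": "Write a haiku about open weights."}],
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```
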
Josh (15:40):
Yeah, and it's downstream of cost, right? We always see this: when a model is cheaper and mostly equivalent, the money will always flow to the cheaper model. It'll always get more queries. I think it's important to note the different use cases of these models, though; they're not directly competing head to head on the same benchmarks. When we talk about Claude, it's generally known as the coding model, and I don't think OpenAI's o3 is really competing directly with Claude, because it's more of a general intelligence versus a coding-specific intelligence. K2 is probably closer to a Claude, I would assume, where it's really good at coding because it uses this mixture of experts. And I think that helps it find the tools; it uses this cool new novel thing called multiple tool use, where each one of these experts can use a tool simultaneously, and they can use these tools and work together to get better answers. So in the case of coding, this is a home run: very cheap cost per token, very high quality outputs.

Ejaaz (16:33):
I actually think it can compete with OpenAI's o3, Josh. Check this out. So Rowan, yeah, Rowan Cheung put this out yesterday, and he basically goes: I think we're at the tipping point for AI-generated writing. It's been notoriously bad, but China's Kimi K2, an open-weight model, is now topping creative writing benchmarks. So just to put that into context, that's like having the top, I don't know, smartest or slightly autistic software engineer at the top engineering company working on AI models also being the best poet, writing the best creative script and directing the next best movie, or creating a Harry Potter novel series. This model can basically do both. And what it's pointing out here is that compared to o3, it tops it. Look at this. Completely beats it.

Josh (17:27):
Okay, so I take that back. Maybe it is just better at everything. Yeah, those are some pretty impressive results.

Ejaaz (17:32):
I think what's worth pointing out here, and I don't know whether any of the American AI models do this, Josh, is that mixture of experts seems to be clearly a win. The ability to create an incredibly smart model doesn't come without this large storage load, right? One trillion parameters. But then you combine it with the ability to say: hey, you don't need to query the entire thing, we've got you. We have a smart router, which pulls in the best experts, as you described earlier, for whatever query you have. So if you have a creative writing task, or if you have a coding thing, it'll send them to two different departments of the model. That's a really huge win. Do any other American models use this?

Josh (18:16):
Well, the first thing that came to my mind when you said that is Grok 4, which doesn't exactly use this but uses a similar thing: instead of a mixture of experts, it uses a mixture of agents. So Grok 4 Heavy uses a bunch of distributed agents that are basically clones of the large model. But that takes up a tremendous amount of compute, and that is the $300-a-month plan.

Ejaaz (18:36):
That's replicating Grok 4, though, right? So that's taking the model and copy-pasting it. Let's say Grok 4 was one trillion parameters, just for ease of comparison: if there were four agents, that's four trillion parameters, right? So it's still pretty costly and inefficient. Is that what you're saying?

Josh (18:53):
No, it's actually the opposite direction from K2. And again, this is kind of similar to tracking sentiment between the United States and China, where the United States will throw compute at it and China will throw clever resourcefulness at it. So Grok, when they use their mixture of agents, it actually just costs a lot more money; whereas K2, when they use their mixture of experts, it costs a lot less. Instead of using 4 trillion parameters, it uses just 32 billion in this case, and it reuses that 32 billion selection over and over. It's a really elegant solution that seems to be yielding pretty comparable results. So as we see these efficiency upgrades, I'm sure they will eventually trickle down into the United States' models, and when they do, that is going to be a huge unlock in terms of cost per token and in terms of the smaller distilled models we're going to be able to run on our own computers. But yeah, I don't know of anyone else using it at this scale; it might be novel just to K2 right now.

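The contrast Josh draws can be put in rough numbers. The four-agent, one-trillion-parameter figures below are the episode's own for-ease-of-comparison hypotheticals, not Grok 4's real (unpublished) specs.

```python
# Active parameters touched per query, under the episode's framing.
# Grok 4's true size is not public; 1T is Ejaaz's hypothetical number.
moe_active = 32e9                    # K2: top-8 experts, ~32B active total
agents, agent_params = 4, 1e12       # hypothetical mixture-of-agents setup

print(f"mixture of agents:  {agents * agent_params:.0e} params per query")
print(f"mixture of experts: {moe_active:.0e} params per query")
# ~4e12 vs ~3.2e10: a ~125x difference in parameters doing work.
```
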
Ejaaz (19:53):
And I think this is the method that probably scales the best, Josh.

Josh (19:58):
Yeah, it makes sense. Efficiency...

Ejaaz (20:00):
...always wins in the end, right? And to see this kind of innovation come this early in a technology's life cycle is just super impressive. Another thing I saw is that there are two different versions of this model, I believe. There's something called Kimi K2 Base, which is basically the model for researchers who want full control for fine-tuning and custom solutions, right? So imagine this model as the entire parameter set: you have access to one trillion parameters, all the weights, the designs, everything. And if you're a nerd who wants to nerd out, you can go crazy, you know, if you have your own GPU cluster at home, or if you happen to have a convenient warehouse full of servers that you weirdly have access to. If you think about the early gaming days of Counter-Strike, when you could mod it: you can basically mod this model to your heart's desire. And then there's a second version called K2 Instruct, which is for drop-in, general-purpose chat and AI agent experiences. So this is kind of at the consumer level: if you're experimenting with these things, or if you want to run an experiment at home on a specific use case, you can take that away and do it for yourself. That's how I understand it, Josh. Do you have any takes on this?

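A minimal sketch of pulling the open weights with Hugging Face transformers. The repo name matches what Moonshot published at release, but verify it before relying on it; and note that a one-trillion-parameter MoE needs serious multi-GPU hardware to actually load.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hedged sketch of downloading the open weights. "moonshotai/Kimi-K2-Instruct"
# is the Hugging Face repo name as published at release; verify before use.
# A 1T-parameter MoE needs a multi-GPU server, not a laptop.
name = "moonshotai/Kimi-K2-Instruct"
tok = AutoTokenizer.from_pretrained(name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    name, trust_remote_code=True, device_map="auto"  # spread across GPUs
)
```
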
Josh (21:22):
That makes sense. And I think that second version you're describing is what's actually available publicly on their website, right? So if you go to Kimi.com, it has a text box; it looks just like the ChatGPT you're used to. And that's where you can run that second-tier model, which, as you described, is the drop-in, general-purpose chat. And then, yeah, for the hardcore researchers there's a GitHub repo, and the repo has all the weights and all the code, and you can download it, dive in, and use the full thing. I was playing around with the Kimi tool, and it's really cool. It's fast. Oh, I mean, it's lightning fast. If you go from a reasoning model to an inference model like Kimi, you get responses like this. When I'm using Grok 4 or o3, I'm sometimes sitting there for a couple of minutes waiting for an answer; with this, you type it in and it just types back right away, no time waiting. So it's kind of refreshing to see that, but it's also a testament to how impressive it is: I'm getting great answers and it's just spitting them right out. So what happens when they add the reasoning layer on top? Well, it's probably going to get pretty freaking good.

Ejaaz (22:16):
So the trend we're seeing, and we saw this last week with Grok 4, is that typically we're expected to wait a while when we send a prompt to a breakthrough model, because it's thinking; it's trying to basically replicate what we have in our brains up here. And now it's just getting much quicker and much smarter and much cheaper. So, long story short, these are incredibly powerful. I kind of think about it as how we went from massive desktop computers to slick cell phones, Josh, and then we're going to eventually have chips in our brains. AI is just fast-tracking that entire life cycle within a couple of years, which is just insane.

Josh (22:57):
And these efficiency improvements are really exciting, because you can see how quickly they're shrinking, eventually allowing those incredible models to just run on our phones. So there's totally a world a year from now in which a Grok 4-, o3-, or Kimi K2-capable model is small enough that it could just run on a mobile device, or run locally on a laptop, or while you're offline, and you kind of have this portable intelligence that's available everywhere, anytime, even if you're not connected to the world. And that seems really cool. We were talking a few episodes ago about Apple's local, free AI inference running on an iPhone, but how the base models still kind of suck: they don't really do anything super interesting; they're basically good enough to do what you would expect Siri to do but can't. And as we get more and more breakthroughs like this that allow you to run much larger parameter counts on a much smaller device, it's going to start really superpowering these mobile devices. And I can't help but think about the OpenAI hardware device. I'm like, wow, that'd be super cool if you had, say, o3 running locally in the middle of the jungle somewhere with no service and you still had access to all of its capabilities. That's probably coming downstream of breakthroughs like this, where we get really big efficiency unlocks.

Ejaaz (24:10):
I mean, it's not just efficiency, though, right? It's the fact that if you can run it locally on your device, it can have access to all your private data without exposing all of that to the model providers themselves. So one of the major concerns with not just AI models but also mobile phones is privacy. I don't want to share all my private health, financial, and social media data, because then you're just going to have everything on me, and you're going to use me. You're going to use me as a product, right? And that's kind of been the norm for the last decade in tech. And with AI, it's a supercharged version of that. The information gets more personal: it's not just your likes, it's where Josh shops every day, and who he's dating, and all these kinds of things, right? And that becomes quite personal and intrusive very quickly. So the question then becomes: how can we have the magic of an AI model without it being so intrusive? And the answer is open source, locally run AI, or privately run AI. And Kimi K2 is a frontier model that can technically run on your local device if you set up the right hardware for it. And the way we're trending, you can basically end up having that on your device, which is just a huge unlock. And if you can imagine how you use OpenAI's o3 right now, Josh, right? I know you use it as much as I do. The reason you and I use it so much isn't just because it's so smart; it's because it remembers everything about us. But I hate that Sam knows, or has access to, all that data. I hate that if he chooses to switch on personalized ads, which is currently how most of these tech companies make money, he can, and I can do nothing about it, because I don't want to use any other model apart from that one. But if there were a locally run model that had access to all the memory and context, I'd use that instead.

Josh (25:56):
And this is suspicious. I mean, this is a different conversation entirely, but isn't it interesting how other companies haven't really leaned into memory when it's seemingly the most important moat there is? Grok 4 doesn't have good memory rolled out, Gemini doesn't really have memory, Claude doesn't have memory the way OpenAI does. Yet it's the single biggest reason why we both continue to go back to ChatGPT and OpenAI. So that's just been an interesting thing. I mean, Kimi is open source; I wouldn't expect them to lean too much into it. But for these closed source models, it's another interesting observation: the most important thing doesn't seem to be prioritized by other companies just yet.

Ejaaz (26:31):
Why do you think that is? So my theory, at least from xAI or Grok 4's perspective, is that Elon's like: okay, I'm not going to be able to build a better chatbot or chat messenger than OpenAI has. There aren't too many features I can use to set Grok 4 apart that o3 doesn't already have, right? But where I can beat o3 is at the app layer. I can create a better app store than they have, because they haven't really created one that is sticky enough for users to continually use. And I can use that data set to then unlock memory and context at that point, right? So I just saw today that they, they being xAI, released a new feature for Grok 4 called, I think it's Companions, Josh. And it's basically these animated, avatar-like characters; they basically look like they're from an anime show. And you know how you can use voice mode in OpenAI and talk to this realistic, human-sounding AI? You now have a face and a character on Grok 4, and it's really entertaining, Josh. I find myself engaged with this thing, because I'm not just typing words; it's not this binary to-and-fro with a chat messenger. It's this human, this cute, attractive human, that I'm now speaking to. And I think that's the strategy a lot of these AI companies, if I had to guess, are taking to seed their user base before they unlock memory. I don't know whether you have a take on that.

Josh (28:10):
Yeah, I have a fun little demo. I actually played around with it this morning, and I was using it: totally unhinged, no filter, very vulgar, but kind of fun. It's like a fun little party trick. And yeah, I mean, that was a surprise to me this morning when I saw it rolled out. I was like, huh, that doesn't really seem like it makes sense. But I think they're just having fun with it.

Ejaaz (28:29):
Can we talk for a second about the team? So we've mentioned just now how they've all come from China, and how China is really advancing open source AI models; they've completely beaten out the competition in America, Meta's Llama being the obvious one. We've got Qwen from Alibaba, we've got DeepSeek R1, and now we have Kimi K2. The team is basically the AI Avengers of China, Josh. These three co-founders all have deep AI and ML backgrounds and hail from top American universities, such as Carnegie Mellon. One of them has a PhD from Carnegie Mellon in machine learning, which, for those of you who don't know, is basically a God-tier degree for AI; it means you're desirable and hireable by every other AI company after you graduate. But it's not just that. They also have credibility and degrees from the top universities in China, especially one university called Tsinghua, which seems to be at the top of the field. I looked it up on global rankings of AI universities, and it often comes in at number three or four in the top 10. So pretty impressive. But what I found really interesting, Josh, was that one of the co-founders is an expert in training AI models on low-cost, optimized hardware. And the reason I mention this is that it's no secret that if you want a top frontier AI model, you need to train it on NVIDIA's GPUs, on NVIDIA's hardware. NVIDIA's market cap, I think, at the end of last week surpassed $4 trillion. That's $4 trillion with a T. That is more than the current GDP of the entire British economy.

Josh (30:18):
Where I hail from. And the largest in the world.

Ejaaz (30:19):
And there's never been...

Josh (30:20):
A bigger company.

Ejaaz (30:21):
There's never been a bigger company. It's just insane to wrap your head around, and it's not without reason: they basically supply, they have a grasp on, a monopoly on, the hardware that is needed to train top models. Now Kimi K2 comes along and casually drops a one trillion parameter model, one of the largest models ever released, and it's trained on hardware that isn't NVIDIA's. And Jensen Huang, I need to find this clip, Josh, but Jensen Huang was on stage, I think at a private conference maybe yesterday, and he was quoted as saying 50% of the top AI researchers are Chinese and from China. And what he was implicitly getting at is that they're a real threat now. I think for the last decade we've kind of been like, ah, yeah, China's just going to copy-paste everything that comes out of America's tech sector. And when it comes to AI, we've maintained that same mindset up until now, when they're really just competing with us. And if they have the hardware, they have the ability to research new techniques to train these models, like DeepSeek's reinforcement learning and reasoning, and then Kimi K2's efficient training run, which you showed earlier. They've come to play, Josh. And I think it's worth highlighting that China has a very strong grasp on the top AI researchers in the world and on the models that are coming out of it.

Josh (31:45):
Where are their $100 million offers? I haven't seen any of those coming through. None, dude. The most impressive thing is that they do it without the resources that we have. Imagine if they did have access to the clusters of these H100s that NVIDIA is making. I mean, would they crush us? And we kind of have this timeline here where we're running up against the edge of the energy we have available to train these massive models, whereas China does not have that constraint; they have significantly more energy to power these. So in the event, the inevitable event, that they do get the chips and are able to train at the scale that we are, I'm not sure we're able to continue our rate of acceleration, in terms of hardware manufacturing and large training runs, as fast as they will. And they've already done the hard work on the software efficiency side; they've cranked out every single efficiency because they're doing it on constrained hardware. So it's going to create this really interesting effect where they're coming at it from the ingenuity, software approach, and we're coming at it from the brute force, throw-a-lot-of-compute-at-it approach, and we'll see where both sides end up. But it's clear that China is still behind, because they are the ones open-sourcing the models, and we know at this point that if you're open-sourcing your model, you're doing it because you're behind.

Ejaaz (33:00):
Yeah. I mean, one thing that did surprise me, Josh, was that they released a one trillion parameter open source model. I didn't expect them to catch up that quickly; one trillion is a lot. Another thing I was thinking about is that China has dominated hardware for so long now, so it wouldn't really surprise me if, I don't know, a couple of years from now, they're producing better models at specific things basically because they have better hardware than America, than the West. But where I think the West will continue to dominate is at the application layer. And if I were a betting man, I would say that most of the money is eventually going to be made on the application side of things. I think Grok 4 is starting to show that with all these different kinds of novel features they're releasing. I don't know if you've seen some of the games being produced with Grok 4, Josh, but it is ultimately insane, and I haven't seen any similar examples come out of Asia from any of their AI models, even when they have access to American models. So I still think America dominates at the app layer. But Josh, I just came across this tweet, which you reminded me of earlier. Tell me about OpenAI's strategy to open source a model, because I've got this tweet pulled up from Sam Altman, which is kind of hilarious.

Josh (34:19):
Yeah. All right. So this week, if you remember from our episode last week, we were excited about OpenAI's new open source model. OpenAI, open source model, it all checks out. This was going to be the big week: they'd release their new flagship open source model. Well, conveniently, I think the same day as K2 launched, later in the day, or perhaps the very next morning, Sam Altman posted a tweet. He says: "Hey, we plan to launch our open weights model next week. We are delaying it. We need time to run additional safety tests and review high-risk areas. We are not yet sure how long it will take us. While we trust the community will build great things with this model, once weights are out, they can't be pulled back. This is new for us and we want to get it right. Sorry to be the bearer of bad news. We are working super hard."

So there are a few points of speculation. The first, obviously, being: did you just get your ass handed to you, and now you're going back to reevaluate before you push out the model? So that's one possibility: they saw K2 and were like, oh boy, this is pretty sweet; this is our first open source model, and we probably don't want it to land below theirs. And there's a second point of speculation, which, Ejaaz, you mentioned to me a little earlier today: maybe something went wrong with the training run, and it's not that they're getting beaten by a Chinese company but that they actually made a mistake of their own accord. Can you explain to me specifically what that might be, what the speculation is, at least?

Ejaaz (35:40):
Well, I'll keep it short: I think it was a little racist under the hood. I can't find the tweet, but basically one of these AI researcher slash product builders on X got access to the model, supposedly, according to him, and he tested it out in the background. And he said, yeah, it's not really an intelligence thing; it's just worse than what you'd expect from an alignment and consumer-facing standpoint. It was ill-mannered, it was saying some pretty wild shit, kind of the stuff you'd expect coming out of 4chan. And so Sam Altman decided to delay while they figured out why it was acting out.

Josh:
Got it okay so we'll leave (36:21):
undefined
Josh:
that speculation where it is there's a there's a funny post (36:24):
undefined
Josh:
that i'll actually share with you if you want to throw it up which was actually from elon (36:27):
undefined
Josh:
and we'll abbreviate but it was like elon was basically saying um (36:30):
undefined
Josh:
it's hard to avoid the the libtard slash (36:33):
undefined
Josh:
mecha hitler like approach both of them (36:37):
undefined
Josh:
because they're on so polar opposite ends of the spectrum and he said he spent (36:40):
undefined
Josh:
several hours trying to solve this problem with the system prompt but there's (36:43):
undefined
Josh:
too much garbage coming in at the foundation model level so basically i mean (36:47):
undefined
Josh:
what happens with these models is you train them based on all the human knowledge (36:50):
undefined
Josh:
that exists right so everything that we've believed all the ideas that we've (36:53):
undefined
Josh:
shared it's been fed into these models. (36:57):
undefined
Josh:
And what happens is you can try to adjust how they interpret this data through (36:59):
undefined
Josh:
the system prompt, which is basically an instruction that every single query (37:03):
undefined
Josh:
gets passed through, but at some point is reliant on this swath of human data that is just (37:06):
undefined
Josh:
It's too overbearing. And that's kind of what Elon shared. (37:13):
undefined
Josh:
And the difference between OpenAI and Grok is that Grok will just ship the crazy (37:16):
undefined
Josh:
update. And that's what they did. And they caught a lot of backlash from it. (37:20):
undefined
Josh:
But what I find interesting and what I'm sure OpenAI will probably follow is (37:22):
undefined
Josh:
this last paragraph where he says, our V7 foundation model should be much better. (37:26):
undefined
Josh:
And we're being far more selective about training data rather than just training on the entire internet. (37:30):
undefined
Josh:
So what they're planning to do to solve this problem, which is what I assume (37:34):
undefined
Josh:
OpenAI probably ran into in the case that the AI training model kind of went (37:37):
undefined
Josh:
off the rails and it started saying bad things about lots of people is that (37:40):
undefined
Josh:
you kind of have to rebuild the foundation model with new sets of data. (37:44):
undefined
Josh:
And in the case of Grok, I know one of the intentions for v7 is actually to (37:49):
undefined
Josh:
generate its own database of data based on synthetic data from their models. (37:52):
undefined
Josh:
And I'm assuming OpenAI will probably have to do this too if they want to calibrate. A lot of times people talk about the temperature, which controls how much variance, how much randomness, the model uses when it generates output. And I don't know, I think we're going to start to see interesting approaches there, because as these models get smarter, you really don't want them to have these evil traits as the default. And it's very hard to get around that when you train them on the data they've been trained on so far. (38:01):
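For the curious, here is a minimal sketch of what temperature actually does during sampling: it rescales the model's scores before they are turned into probabilities, so low values make the output near-deterministic and high values make it more varied. The logits below are made-up numbers for three hypothetical candidate tokens.

```python
# Temperature rescales logits before softmax: low T sharpens the
# distribution (predictable), high T flattens it (more random).
import math

def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens
print(softmax_with_temperature(logits, 0.2))  # near-deterministic
print(softmax_with_temperature(logits, 1.0))  # the model's raw distribution
print(softmax_with_temperature(logits, 2.0))  # flatter, more varied
```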
Ejaaz:
It just goes to show how cumbersome it is to train these models, Josh. It's such a hard thing. (38:24):
Josh:
Yeah. Yeah. (38:31):
Ejaaz:
It's not something where you can just jump into the code and tweak a few things. Most of the time you don't know what's wrong with the model or where it went wrong. We've talked about this on a previous episode, but essentially, you build out this model, you spend hundreds of millions of dollars, and then you feed it a query. You put something in and you wait to see what it spits out. You don't really know what it's going to spit out. You can't predict it; it's completely probabilistic. And so if you release a model and it starts being a little racist or, you know, kind of crazy, you have to go back to the drawing board and analyze many different parts of the model: was it the data that was poisoned, or was it the way we trained it, or maybe it was a particular model weight we tweaked too much, or whatever that might be. So I think over time it's going to get a lot easier once we understand how these models actually work, but my God, it must be so expensive to just continually rerun and retrain these models. (38:32):
Josh:
Yeah, when you think about a coherent cluster of 200,000 GPUs, the amount of energy, the amount of resources, just to retrain away a mistake is huge. So the deeper we get into this, the more it makes sense to pay so much money for talent to avoid these mistakes: if you pay $100 million for one employee whose strategic advantage lets you avoid another training run that would cost you more than $100 million, you're already in profit. So you start to see the scale, the complexity, the difficulties. (39:32):
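Here is a back-of-envelope version of that trade-off. Every figure below is an assumption invented for illustration; the episode only mentions the 200,000-GPU cluster size and the $100 million compensation figure.

```python
# Back-of-envelope sketch of the talent-vs-retraining trade-off.
# All figures are illustrative assumptions, not reported numbers.
salary = 100e6            # hypothetical $100M compensation package
gpus = 200_000            # cluster size mentioned in the episode
cost_per_gpu_hour = 2.0   # assumed blended $/GPU-hour
run_hours = 30 * 24       # assume a month-long training run

failed_run_cost = gpus * cost_per_gpu_hour * run_hours
print(f"One wasted training run: ${failed_run_cost:,.0f}")          # $288,000,000
print(f"Net saved by the hire:   ${failed_run_cost - salary:,.0f}")  # $188,000,000
```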
Josh:
I do not envy the challenges that some of these engineers have to face. Although I do envy the salary. (40:03):
Ejaaz:
I envy the salary, Josh. (40:09):
Josh:
I envy the salary and I envy the adventure. Like, how cool must it be trying to build superintelligence for the world for the first time in the history of everything? It's got to be pretty fun. So this is where we're at now with the open source and closed source models. K2 is pretty epic. I think that's a home run; I think we've crowned a new model today. Do you have any closing thoughts, anything you want to add before we wrap up here? This is pretty amazing. (40:11):
Ejaaz:
I think I'm most excited for the episode that we're probably going to release a week from now, Josh, when we've seen what people have built with this open source model. That's the best part about this, by the way. Just to remind the listener: anyone can take this model right now. If you're listening to this, you can take this model right now, run it locally at home, and tweak it to your preference. Now, yes, you kind of need to know how to tweak model weights and stuff, but I think we're going to see some really cool applications get released over the next week, and I'm excited to play around with them personally. (40:35):
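For anyone who wants to try, here is a hedged sketch of what loading open weights locally typically looks like with the Hugging Face transformers library. The repo id is an assumption about where the weights are published, so check the official release; and as Josh notes next, at a trillion total parameters this needs far more hardware than a typical home machine.

```python
# Hedged sketch of running an open-weights model locally with the Hugging
# Face transformers library. The repo id below is an assumption; verify it
# against the official Kimi K2 release before use.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "moonshotai/Kimi-K2-Instruct"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    device_map="auto",       # shard across whatever accelerators you have
    torch_dtype="auto",      # keep the checkpoint's native precision
    trust_remote_code=True,  # custom MoE architectures often ship their own code
)

inputs = tokenizer("Explain mixture-of-experts in one sentence.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```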
Josh:
Yeah, if you're listening to this and you can run this model, let us know, because that means you have quite a solid rig at your home. I'm not sure the average person is going to be able to run this, but that is the beauty of open weights: anybody with the capability of running it can do so. They can tweak it how they like, and now they have access to the new best open source model in the world, which just a couple of months ago would have been the best model in the world, period. So it's moving really quickly, it's really accessible, and I'm sure as the weeks go by, hopefully we'll get OpenAI's open source model in the next few weeks and we'll be able to cover that. But until then, lots of stuff going on. This was another great episode, so thank you everyone for tuning in again and rocking with us. We actually planned on making this like 20 minutes, but we just kept tailing off into more interesting things. There's a lot of interesting stuff to talk about; you could take this in a lot of places. So hopefully this was interesting. Go check out Kimi K2. It's really, really impressive. It's really fast, it's really cheap. If you're a developer, give it a try. And yeah, that's been another episode. We'll be back again later this week with another topic, and we'll just keep chugging along as the frontier of AI models continues to head west. (41:07):
Ejaaz:
Also, we'd love to hear from you guys. If you have any suggestions on things you want us to talk more about, or maybe there's some weird model or feature that you just don't understand and we can do a job explaining it, just message us. Our DMs are open, or respond to any of our tweets and we'll be happy to oblige. (42:18):
Josh:
Yeah, let us know. If there's anything cool that we're missing, send it our way and we'll cover it. That'd be great. But yeah, we're all going on this journey together; we're learning this as we go. So hopefully today was interesting, and if you did enjoy it, please share with friends, like, comment, subscribe, all the great things. And we will see you on the next episode. (42:37):
Ejaaz:
Thanks for watching. See you guys. See you. (42:52):