July 16, 2025 42 mins

Kimi K2 is a groundbreaking open-source AI model from China with 1 trillion parameters. We discuss its competitive advantages, including low operational costs and superior coding capabilities through a "mixture of experts" approach. 

Josh highlights the implications for AI competition as Kimi K2 emerges in the market alongside OpenAI’s plans for an open-source model.

We also explore Kimi K2’s two versions—Base and Instruct—its impact on the AI landscape, and the challenges faced by OpenAI's ChatGPT, xAI's Grok, and Anthropic's Claude. Tune in for key insights on how Kimi K2 could reshape AI development!

------
💫 LIMITLESS | SUBSCRIBE & FOLLOW
https://limitless.bankless.com/
https://x.com/LimitlessFT

-----
TIMESTAMPS

0:00 Intro
0:58 The Rise of Kimi K2
2:49 Efficiency and Cost Benefits
3:53 Training Breakthroughs Explained
5:37 Innovations in AI Training
6:30 The Impact of Open Source
8:05 Competitive Landscape of AI
9:41 Context Window Capabilities
12:55 The Surge of Kimi K2
15:36 Market Adoption Insights
19:57 Versions of Kimi K2
24:21 Privacy and Local AI
26:30 The AI Talent Landscape
31:04 China's AI Competitive Edge
32:40 Open Source vs. Closed Source
40:19 Closing Thoughts and Future Prospects
42:49 Get Involved

-----
RESOURCES

Josh: https://x.com/Josh_Kale

Ejaaz: https://x.com/cryptopunk7213

------
Not financial or tax advice. See our investment disclosures here:
https://www.bankless.com/disclosures

Episode Transcript

Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Ejaaz (00:03): A bunch of AI researchers from China just released a brand new AI model called Kimi K2, which is not only as good as any other top model like Claude, but it is also 100% open source, which means it's free to take, customize, and create into your own brand new AI model. This thing is amazing at coding, it beats any other model at creative writing, and it also has a pretty insane voice mode. Oh, and I should probably mention that it is one trillion parameters in size, which makes it one of the biggest models ever created. Josh, we were winding down on a Friday night and this news broke that this team had released this model. Absolutely crazy bomb, especially with OpenAI rumored to release their open source model this week. You've been jumping into this. What's your take?

Josh (00:54): Yeah. So last week we crowned Grok 4 as the new leading private, closed-source model. This week we've got to give the crown to Kimi K2. We've got another crown going to the open source team; they are winning. I mean, this is better than DeepSeek and DeepSeek R2; this is basically DeepSeek R3, I would imagine. And if you remember back a couple months, DeepSeek really flipped the world on its head because of how efficient it was and the algorithmic upgrades it made. I think what we see with Kimi K2 is a lot of the same thing: it's these novel breakthroughs that come as a downstream effect of their needing to be resourceful. China, they don't have the mega GPU clusters we have, they don't have all the cutting-edge hardware, but they do have the software prowess to find these efficiencies. I think that's what makes this model so special. And that's what we're going to get into here: specifically what they did to make this model so special.

Ejaaz (01:41): Yeah, I mean, look at these stats here, Josh: 1 trillion parameters in total, with 32 billion active in a mixture-of-experts design. So what this means is, although it's really large in size, and typically these AI models can become pretty inefficient at that size, it uses this technique called mixture of experts, which means that whenever someone queries the model, it only uses or activates the parameters that are relevant for the query itself. So it's smarter, it's much more efficient, and it doesn't consume as much energy as you would if you wanted to run it locally at home or whatever that might be. It's also super cheap. I think I saw somewhere that this was 20% the cost of Claude, Josh, which, we love that. Insane. For all the nerds who want to run really long tasks, or just set and forget the AI to run on your coding log or whatever that might mean, you can now do it at a much more affordable rate, at one-fifth the cost of some of the top models that are out there, and it is as good as those models. So just insane kinds of things, Josh. I know there's a bunch of things you wanted to point out here on benchmarks. What do you want to get into?

Josh (02:50): Yeah, it's really amazing. So they took 15 and a half trillion tokens and they condensed those down into a one trillion parameter model. And then what's amazing is when you use this model, like Ejaaz said, it uses a thing called mixture of experts. So it has, I believe, 384 experts, and each expert is good at a specific thing. So let's say you want to do a math problem: it will take a 32 billion parameter subset of the one trillion total parameters, and it will choose eight of these different experts for a specific thing. So in the case of math, it'll find an expert that has the calculator tool. It'll find an expert that has a fact-checking tool or a proof tool to make sure the math is accurate. It'll have just a series of tools to help itself. And that's kind of how it works so efficiently: instead of using a trillion parameters at once, it uses just 32 billion, and it uses the eight best specialists out of the 384 that it has available to it. It's really impressive.
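The routing Josh describes, activating only a small top-k subset of experts per query, can be sketched roughly as below. The expert count (384) and top-k (8) match the figures quoted in the episode; the linear scoring router and hidden size are toy stand-ins for the model's learned router, and real MoE models route per token per layer rather than once per query.

```python
import numpy as np

rng = np.random.default_rng(0)

N_EXPERTS, TOP_K = 384, 8       # figures quoted in the episode for Kimi K2
D_MODEL = 64                    # toy hidden size, not the real model's

# Toy learned router: a linear layer producing one score per expert.
router_w = rng.normal(size=(D_MODEL, N_EXPERTS))

def route(token_vec):
    """Return the TOP_K highest-scoring experts and their softmax gate weights."""
    scores = token_vec @ router_w
    top = np.argsort(scores)[-TOP_K:]          # indices of the 8 best experts
    gate = np.exp(scores[top] - scores[top].max())
    gate /= gate.sum()                         # normalize over the chosen 8 only
    return top, gate

experts, gate = route(rng.normal(size=D_MODEL))
print(sorted(int(e) for e in experts), round(float(gate.sum()), 6))
```

Only the chosen experts' weights are loaded and run for that step, which is why a 1-trillion-parameter model can serve a query at roughly the cost of a 32-billion-parameter one.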

Josh (03:44): And what we see here is the benchmarks that we're showing on screen. And the benchmarks are really good. It's up there in line with just about any other top model, except that this one is open source. And there was another breakthrough, which was the actual way that they handled the training of this. This is the loss curve. So what you're looking at on screen, for the people who are listening, is this really pretty smooth curve that starts at the top and trends down in a very predictable and smooth way. Most curves don't look like this, and if they do, it's because the company has spent tons and tons of money on error correction to make the curve so smooth. So basically what you're seeing is the training run of the model. A lot of times what happens is you get these very sharp spikes and it starts to deviate from the normal training run, and it takes a lot of compute to recalibrate and push that back the right way. What they've managed to do is make it very smooth, and they've done this by increasing these efficiencies. There's this analogy I was thinking of right before we hit the record button: it's as if you were teaching a chef how to cook. So we have Chef Ejaaz here. I am teaching him how to cook; I am an expert chef. And instead of telling him every ingredient and every step for every single dish, what I tell him is, hey, if you're making this amazing dinner recipe, all that matters is this amount of salt applied at this time and this amount of heat applied for this length of time, and the other stuff doesn't matter as much. So just put in whatever you think is appropriate, and you'll get the same answer. And that's what we see with this model: an increased amount of efficiency by being direct, by being intentional about the data they used to train it on and the data they fetch in order to give you high quality queries. And it's a really novel breakthrough. They call it the MuonClip optimizer, which, I mean, it's a Chinese company, maybe it means something special there, but it is a new type of optimizer. And what you're seeing in this curve is that it's working really well and really efficiently. And that's part of the benefit of having this open source: now we have this novel breakthrough, and we can take it and use it for even more breakthroughs, even more open source models. That's been really cool to see.
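Moonshot has publicly described MuonClip as the Muon optimizer paired with a "qk-clip" step that tames the attention-logit blowups behind those loss spikes. The sketch below is a loose reconstruction of that clipping idea only, not Moonshot's code; the function name, threshold, and shapes are illustrative assumptions.

```python
import numpy as np

def qk_clip(W_q, W_k, x, tau=100.0):
    """If the largest |attention logit| over a batch x exceeds tau, rescale
    the query and key projection weights so the maximum drops back to tau.
    Splitting the correction as sqrt(tau/m) on each side scales every
    logit by exactly tau/m, so the clip is smooth rather than a hard cutoff."""
    logits = (x @ W_q) @ (x @ W_k).T   # pre-softmax attention logits
    m = np.abs(logits).max()
    if m > tau:
        gamma = np.sqrt(tau / m)
        W_q = W_q * gamma
        W_k = W_k * gamma
    return W_q, W_k
```

Because the rescale acts on the weights themselves after each optimizer step, the training run never has to burn compute recovering from a divergence, which is one plausible reading of why the curve on screen stays so smooth.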

Ejaaz (05:48): I mean, this is just time and again from China. So amazing from their research team. So, to pick up your comment on DeepSeek: at the end of last year we were utterly convinced that the only way to create a breakthrough model was to spend billions of dollars on compute clusters, and so therefore it was a pay-to-play game. And then DeepSeek, a team out of China, released their model and completely open-sourced it as well. And it was as good as OpenAI's frontier model, which was the top model at the time. And the revelation there was: oh, you don't actually just need to chuck a bunch of compute at this. There are different techniques and different methods, if you get creative about how you design your model and how you run the training run, which is basically what you need to do to make your model smart. You can run it in ways that are more efficient, consume less energy, and therefore less money, but are as smart, if not smarter, than the frontier models that American AI companies are making. And this is just a repeat of that, Josh. I mean, look at this curve, for those who are watching this episode on video. It is just so clean. Yeah, it's beautiful. The craziest part about this is that when DeepSeek was released, they pioneered reasoning and reinforcement learning, two techniques that made the model super smart with less energy and less compute spend. With this model, they didn't even implement that technique at all, so theoretically this model can get so much smarter than it already is, and they just leveraged a new method to make it as smart as it is right now. So just such fascinating progress in research from China. And it just keeps on coming. It's so impressive.

Josh (07:42): Yeah, this was the exciting part to me: we're seeing algorithmic, exponential improvements in so many different categories. So this was considered a breakthrough by all means, and this wasn't even the same type of breakthrough that DeepSeek had. So we get this compounding effect, where we have this new training breakthrough, and then we have DeepSeek, who has the reinforcement learning, and that hasn't even yet been applied to this new model. So we get the exponential growth on one end, the exponential growth on the reasoning end, those come together, and then you get the exponential growth on the hardware stack, where the GPUs are getting much faster. There are all these different subsets of AI that are compounding on each other and growing and accelerating quicker and quicker, and what you get is this unbelievable rate of progress. That's what we're seeing. So reasoning isn't even here yet, and we're going to see it soon, because it is open source, so people can apply their own reasoning on top of it. I'm sure the Moonshot team is going to be doing their own reasoning version of this model, and I'm sure we're going to be getting even more impressive results soon. I see you have a post up here about the testing and overall performance. Can you please share?

Ejaaz (08:46): Yeah, so this is a tweet that summarizes really well how this model performs in relation to other frontier models. And the popular comparison for Kimi K2 is against Claude. So Claude has a bunch of models out: Claude 3.5 is its earlier model, and then Claude 4 is its latest. And the general take is that this model is just better than those models, which is just insane to say, because for so long, Josh, we've said that Claude was the best coding model. And indeed it was. And then within the span of, what is it, five days? Grok 4 released and just completely blew Claude 4 out of the water in terms of coding. Now Kimi K2, an open source model out of China, which doesn't even have access to the research and proprietary knowledge that a lot of American AI companies have, has beaten it as well, right? So it kind of beats Claude at its own game, but it's also cheaper. It's 20% the cost of Claude 3.5, which is just an insane thing to say. That means that if you are a developer out there who wants to try your hand at vibe coding a bunch of things, or actually seriously coding something that's quite novel but you don't have the hands on deck to do it, you can now spin up a Kimi K2 AI agent, actually multiple of them, for a very cost-efficient, reasonable salary. You don't have to pay hundreds of thousands of dollars, or hundreds of millions of dollars, which is what Meta is doing to buy a bunch of these software engineers. You can spend the equivalent of maybe a Netflix subscription, or $500 to $1,000 a month, and spin up your own app. So super, super cool.

Josh (10:23): And one added perk is that if you have a lot of GPUs sitting around, you can actually run this model for free. So that's the cost if you query it from the servers, but I'm sure there are going to be companies with access to excess GPUs. They can just download the model, because it's open source, open weights, and run it on their own. And that brings the cost of compute down to the cost per kilowatt of the energy required to run the GPUs. So because it's open source, you really start to see these costs decline, but the quality doesn't. And every time we see this, we see a huge productivity unlock in coding output and the number of queries used. It's like, this is freaking awesome.

Ejaaz (10:58): Yeah, Josh, I saw something else come up as well. Do you remember when Claude first released their frontier model? I think it was 3.5, or maybe it was 4. One of their bragging rights was that it had a one million token context window.

Josh (11:16): Oh yes, which was huge.

Ejaaz (11:17): Yeah, which for listeners of the show is huge. It's like several novels' worth of words or characters you could just bung into one single prompt. And the reason that was such an amazing thing was that for a while, people struggled to communicate with these AIs because they couldn't set the context. There wasn't enough bandwidth within their chat window for them to say, you know, and don't forget this, and then there was this, and then this detail and that detail. There just wasn't enough space, and models weren't performing well enough to consume all of this in one go. And then Claude came out and was like, hey, we have a one million token context window. Don't worry about it: chuck in all the research papers that you want, chuck in your essay, chuck in reference books, and we've got you. I saw this tweet that was deleted. I think you sent this to me.

Josh (12:04): We got the screenshots. We always come with receipts.

Ejaaz (12:07): Yeah, I wonder why they deleted it, but good catch from you. Let's get into this. What's your take on it?

Josh (12:11): It was first posted, I think, earlier today, like an hour ago, and then deleted pretty shortly afterwards. And this is from a woman named Crystal. Crystal works with the Moonshot team; she is part of the team that released Kimi K2. And in this post it says: Kimi isn't just another AI. It went viral in China as the first to support a 2 million token context window. And then she goes on to say, we're an AI lab with just 200 people, which is minuscule compared to a lot of the other labs they're competing with. And it was an acknowledgement that they had a 2 million token context window. And just a quick refresher on the context window stuff: imagine you have a gigantic textbook, and you've read it once and closed it, and you kind of have a fuzzy memory of all the pages. The context window allows you to lay all of those pages out in clear view and directly reference every single one. So when you have two million tokens, which is roughly two million words of context, we're talking about hundreds and hundreds of books and textbooks and knowledge, and you can really dump a lot of information in there for the AI to readily access. If they release a two million token open source model, that's a huge deal. I mean, even Grok 4 recently, I believe, what did we say it was, a 256,000 token context window, something like that? So Grok 4 is one eighth of what they supposedly have accessible right now, which is a really, really big deal. So I'm hoping it was deleted because they just don't want to share it, not because it's not true. I would like to believe that it's true, because man, that'd be pretty epic.
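As a rough sanity check on these sizes: "two million tokens is roughly two million words" is a simplification on air; a common rule of thumb is closer to three-quarters of a word per token. The figures below are that rule of thumb and a typical novel length, not Kimi-specific specs.

```python
# Back-of-envelope arithmetic for the context sizes discussed.
WORDS_PER_TOKEN = 0.75       # common rule of thumb, not a Kimi spec
WORDS_PER_BOOK = 90_000      # a typical full-length novel

def books_that_fit(context_tokens: int) -> float:
    """Approximate number of novels fitting in a context window."""
    return context_tokens * WORDS_PER_TOKEN / WORDS_PER_BOOK

print(round(books_that_fit(2_000_000), 1))   # roughly 16-17 novels in 2M tokens
print(2_000_000 // 256_000)                  # Grok 4's 256k window fits ~7x into 2M
```

Either way, the "hundreds of books" framing overstates it a bit, but a 2M window would still comfortably hold a dozen-plus full novels at once.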

Ejaaz (13:42): And the people are loving it, Josh. Check out this graph from OpenRouter, which basically shows the split of usage between everyone on their platform querying different models. For context, OpenRouter is a website you can go to and type a prompt, just like you do at ChatGPT, and you can decide which model your prompt goes to, or you can let OpenRouter decide for you, and it divvies up your query. So if you have a coding query, it's probably going to send it to Claude, or now Kimi K2 or Grok 4, but if you have something more to do with creative writing, or something like a case study, it might send it to OpenAI's o3 model, right? So it kind of decides for you. OpenRouter released this graphic, which basically shows that Kimi K2 surpassed xAI in token market share just a few days after launching. Which basically means that xAI spent hundreds of millions of dollars training up their Grok 4 model, which just beat out the competition last week; then Kimi K2 gets released, completely open source, and everyone starts to use it more than Grok 4, which is just an insane thing to say, and it shows how rapidly these AI models compete with and surpass each other. I think part of the reason for this, Josh, is that it's open source, right? Which means that not only are retail users like myself and yourself using it for our daily queries, you know, create this recipe for me or whatever, but researchers and builders all over the world, who have so far faced the obstacle of needing pots of money to start their own AI company, now have access to a frontier, world-renowned model and can create whatever application, website, or product they want to make. So I think that's part of the usage there as well. Do you have any takes on this?
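Ejaaz's description of OpenRouter's auto mode, divvying up a prompt to whichever model suits it, can be sketched as a toy dispatcher. The keyword heuristics and the model slugs below are illustrative assumptions, not OpenRouter's actual routing logic or catalog.

```python
def route_query(prompt: str) -> str:
    """Toy router in the spirit of OpenRouter's auto mode: pick a model
    slug based on crude keyword matching. Slugs are illustrative only."""
    p = prompt.lower()
    if any(w in p for w in ("bug", "function", "code", "compile")):
        return "moonshotai/kimi-k2"   # strong coder, cheap per token
    if any(w in p for w in ("poem", "story", "essay")):
        return "openai/o3"            # general-purpose writing
    return "auto"                     # otherwise, let the platform decide

print(route_query("Fix this function so it compiles"))
```

A real router weighs price, latency, and measured quality per task rather than keywords, which is exactly why usage share shifts so fast when a cheaper, equally capable model like K2 appears.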

Josh (15:40): Yeah, and it's downstream of cost, right? We always see this: when a model is cheaper and mostly equivalent, the money will always flow to the cheaper model. It'll always get more queries. I think it's important to note the different use cases of these models; they're not directly competing head to head on the same benchmarks. When we talk about Claude, it's generally known as the coding model, and I don't think OpenAI's o3 is really competing directly with Claude, because it's more of a general intelligence versus a coding-specific intelligence. K2 is probably closer to a Claude, I would assume, where it's really good at coding because it uses this mixture of experts. And I think that helps it find the tools. It uses this cool new novel thing called multiple tool use, so each one of these experts can use a tool simultaneously, and they can use these tools and work together to get better answers. So in the case of coding, this is a home run: very cheap cost per token, very high quality outputs.
Ejaaz: I actually think you can compete with OpenAO3, Josh. Check this out. (16:33):
undefined

Ejaaz: So Rowan, yeah, Rowan Cheng put this out yesterday And he basically goes, (16:38):
undefined

Ejaaz: I think we're at the tipping point for AI-generated writing. (16:42):
undefined

Ejaaz: It's been notoriously bad, but China's Kimi K2, an open-weight model, (16:46):
undefined

Ejaaz: is now topping creative writing benchmarks. (16:51):
undefined

Ejaaz: So just to put that into context, that's like having the top most, I don't know, (16:53):
undefined

Ejaaz: smartest or slightly autistic software engineer, at the top engineering company (17:00):
undefined

Ejaaz: working on AI models, also being the best poet or creative script and directing (17:04):
undefined

Ejaaz: the next best movie or whatever that might be, (17:12):
undefined

Ejaaz: or creating a Harry Potter novel series. (17:14):
undefined

Ejaaz: This model can basically do both. And what it's pointing out here is that compared (17:17):
undefined

Ejaaz: to 03, it tops it. Look at this. Completely beats it. (17:22):
undefined

Josh (17:27): Okay, so I take that back. Maybe it is just better at everything. Yeah, those are some pretty impressive results.

Ejaaz (17:32): I think what's worth pointing out here, and I don't know whether any of the American AI models do this, Josh, is that mixture of experts seems to be clearly a win. The ability to create an incredibly smart model doesn't come without this large storage load, right? One trillion parameters. But then you combine it with the ability to say: hey, you don't need to query the entire thing. We've got you. We have a smart router, which pulls in the best experts, as you described earlier, for whatever query you have. So if you have a creative writing task, or if you have a coding thing, it will send them to two different departments of the model. That's a really huge win. Do any other American models use this?

Josh (18:16): Well, the first thing that came to my mind when you said that is Grok 4, which doesn't exactly use this but uses a similar thing: instead of a mixture of experts, it uses a mixture of agents. So Grok 4 Heavy uses a bunch of distributed agents that are basically clones of the large model. But that takes up a tremendous amount of compute, and that is the $300 a month plan.

Ejaaz (18:36): That's replicating Grok 4, though, right? So that's like taking the model and copy-pasting it. So let's say Grok 4 was one trillion parameters, just for ease of comparison. If there were four agents, that's four trillion parameters, right? So it's still pretty costly and inefficient. Is that what you're saying?

Josh (18:53): No, it's actually the opposite direction from K2. And again, this is kind of similar to tracking the sentiment between the United States and China, where the United States will throw compute at it, whereas China will throw clever resourcefulness at it. So Grok, when they use their mixture of agents, it actually just costs a lot more money, whereas K2, when they use their mixture of experts, it costs a lot less. Instead of using 4 trillion parameters, in this case it uses just 32 billion, and it kind of reuses that 32 billion over and over. It's a really elegant solution that seems to be yielding pretty comparable results. So I think as we see these efficiency upgrades, I'm sure they will eventually trickle down into the United States' models, and when they do, that is going to be a huge unlock in terms of cost per token and in terms of the smaller distilled models we're going to be able to run on our own computers. But yeah, I don't know of any who are also using it at this scale. It might be novel just to K2 right now.

Ejaaz (19:53): And I think this is the method that probably scales the best, Josh.

Josh (19:58): Yeah, it makes sense. Efficiency always wins in the end.

Ejaaz (20:00): Right. And to see this kind of innovation come pretty early in a technology's life cycle is just super impressive. Another thing I saw is that there are two different versions of this model, I believe. There's something called Kimi K2 Base, which is basically the model for researchers who want full control for fine-tuning and custom solutions, right? So imagine this model as the entire parameter set: you have access to one trillion parameters, all the weights and everything. And if you're a nerd who wants to nerd out, you can go crazy, you know, if you have your own GPU cluster at home, or if you happen to have a convenient warehouse full of servers that you weirdly have access to. If you think about the early gaming days of Counter-Strike, when you could mod it: you can basically mod this model to your heart's desire. And then there's a second version called K2 Instruct, which is for drop-in, general-purpose chat and AI agent experiences. So this is kind of at the consumer level: if you're experimenting with these things, or if you want to run an experiment at home on a specific use case, you can take that away and do it yourself. That's how I understand it, Josh. Do you have any takes on this?

Josh: That makes sense. And I think that second version you're describing is what's actually available publicly on their website, right? So if you go to Kimi.com, it has a text box; it looks just like the ChatGPT you're used to, and that's where you can run that second model, the drop-in, general-purpose chat. (21:22):

Josh: And then for the hardcore researchers, there's a GitHub repo, and the repo has all the weights and all the code, so you can really download it, dive in, and use the full thing. (21:37):

Josh: I was playing around with the Kimi tool, and it's really cool. It's fast. I mean, it's lightning fast. When you go from a reasoning model to a pure inference model like Kimi, you get responses instantly. When I'm using Grok 4 or o3, I'm sometimes sitting there for a couple of minutes waiting for an answer; with this, you type it in and it just types back right away, no waiting. (21:46):

Josh: So it's kind of refreshing to see that, but it's also a testament to how impressive it is: I'm getting great answers and it's just spitting them right out. So what happens when they add a reasoning layer on top? Well, it's probably going to get pretty freaking good. (22:05):

Ejaaz: So the trend we're seeing, and we saw this last week with Grok 4, is that typically we're expected to wait a while when we send a prompt to a breakthrough model, because it's thinking, trying to basically replicate what we have in our brains up here. And now it's just getting much quicker, much smarter, and much cheaper. (22:16):

Ejaaz: So long story short, these models are getting incredibly powerful. I kind of think about it as how we went from massive desktop computers to slick cell phones, Josh, and how we're eventually going to have chips in our brains. AI is fast-tracking that entire life cycle within a couple of years, which is just insane. (22:38):

Josh: And these efficiency improvements are really exciting, because you can see how quickly they're shrinking the models, eventually allowing those incredible models to just run on our phones. (22:57):

Josh: So there's totally a world a year from now in which a Grok 4-, o3-, or Kimi K2-class model is small enough that it could run on a mobile device, or locally on a laptop, or while you're offline, and you have this portable intelligence that's available everywhere, anytime, even if you're not connected to the world. That seems really cool. (23:06):

Josh: We were talking a few episodes ago about Apple's local, free AI inference running on an iPhone, but how the base models still kind of suck; they don't really do anything super interesting. They're basically good enough to do what you would expect Siri to do but it can't. (23:27):

Josh: And as we get more breakthroughs like this that let you run much larger parameter counts on a much smaller device, it's going to start really superpowering these mobile devices. I can't help but think about the OpenAI hardware device. I'm like, wow, it would be super cool to have something like o3 running locally in the middle of the jungle somewhere, with no service, and still have access to all of its capabilities. That's probably coming downstream of breakthroughs like this, where we get really big efficiency unlocks. (23:41):
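A quick back-of-the-envelope calculation shows why active-parameter counts and quantization decide what can run locally. The precisions and sizes below are illustrative assumptions, not published specs for any of these models:

```python
def model_memory_gb(params_billions, bits_per_weight):
    """Approximate RAM needed just to hold the weights (ignoring KV cache and activations)."""
    bytes_total = params_billions * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

# A dense 1-trillion-parameter model at 16-bit precision:
dense = model_memory_gb(1000, 16)   # ~2000 GB, far beyond any phone or laptop

# A 32-billion-parameter model aggressively quantized to 4-bit:
active = model_memory_gb(32, 4)     # ~16 GB, plausible on a high-end laptop

print(round(dense), round(active))  # 2000 16
```

One caveat: a mixture-of-experts model still needs all of its experts resident in memory, since the router may pick any of them for the next token, which is why distillation into genuinely smaller models matters for phones.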

Ejaaz: I mean, it's not just efficiency, though, right? It's the fact that if you can run it locally on your device, it can have access to all your private data without exposing any of that to the model providers themselves. (24:10):

Ejaaz: So one of the major concerns, not just with AI models but also with mobile phones, is privacy. I don't want to share all my private health, financial, and social media data, because then you have everything on me and you're going to use me as a product, right? That's kind of been the status quo in tech for the last decade. (24:22):

Ejaaz: And with AI, it's a supercharged version of that. The information gets more personal. It's not just your likes; it's where Josh shops every day and who he's dating, all those kinds of things. And that becomes quite personal and intrusive very quickly. (24:42):

Ejaaz: So the question then becomes: how can we have the magic of an AI model without it being so intrusive? And the answer is open-source, locally run AI, or privately run AI. Kimi K2 is a frontier model that can technically run on your local device if you set up the right hardware for it. And the way we're trending, you can basically end up having that on your device, which is just a huge unlock. (24:57):

Ejaaz: And imagine how you use OpenAI's o3 right now, Josh. I know you use it as much as I do. The reason you and I use it so much isn't just that it's so smart; it's that it remembers everything about us. (25:20):

Ejaaz: But I hate that Sam has access to all that data. I hate that if he chooses to switch on personalized ads, which is how most of these tech companies make money right now, he can, and there's nothing I can do about it, because I don't want to use any other model. But if there were a locally run model that had access to all that memory and context, I'd use that instead. (25:33):

Josh: And this is suspicious. I mean, this is a different conversation entirely, but isn't it interesting how other companies haven't really leaned into memory, when it's seemingly the most important moat there is? Grok 4 doesn't have good memory rolled out. Gemini doesn't really have memory. Claude doesn't have memory the way OpenAI does. Yet it's the single biggest reason why we both keep going back to ChatGPT and OpenAI. (25:56):

Josh: So that's just been interesting. I mean, Kimi is open source, so I wouldn't expect them to lean too much into it. But for the closed-source models, it's another interesting observation: the most important thing doesn't seem to be prioritized by other companies just yet. (26:19):

Ejaaz: Why do you think that is? So my theory, at least from xAI's or Grok 4's perspective, is that Elon is thinking: I'm not going to be able to build a better chatbot or chat messenger than OpenAI has; there aren't many features that could set Grok 4 apart that o3 doesn't already do, right? (26:31):

Ejaaz: But where I can beat o3 is at the app layer. I can create a better app store than they have, because they haven't really created one that's sticky enough for users to keep coming back to. And I can use that data set to then unlock memory and context at that point, right? (26:55):

Ejaaz: So I just saw today that they, xAI, released a new feature for Grok 4 called, I think it's Companions, Josh. It's basically these animated, avatar-like characters that look like they're from an anime show. And you know how you can use voice mode in OpenAI and talk to this realistic, human-sounding AI? You now have a face and a character on Grok 4, and it's really entertaining, Josh. (27:15):

Ejaaz: I find myself genuinely engaged with this thing, because I'm not just typing words. It's not this binary to-and-fro with a chat messenger; it's this cute, attractive human that I'm now speaking to. And I think that's the strategy a lot of these AI companies, if I had to guess, are taking to seed their user base before they unlock memory. I don't know whether you have a take on that. (27:47):
Josh: Yeah, I have a fun little demo. I actually played around with it this morning, and I was using it totally unhinged: no filter, very vulgar, but kind of fun. It's like a fun little party trick. And yeah, it was a surprise to me this morning when I saw that rolled out. I was like, huh, that doesn't really seem like it makes sense. But I think they're just having fun with it. (28:10):

Ejaaz: Can we talk about the team for a second? So we've mentioned just now how they've all come from China, and how China is really advancing open-source AI models; they've completely beaten out the competition in America, Meta's Llama being the obvious one. We've got Qwen from Alibaba, we've got DeepSeek R1, and now we have Kimi K2. (28:29):

Ejaaz: The team is basically the AI Avengers of China, Josh. The three co-founders all have deep AI and ML backgrounds that hail from top American universities, such as Carnegie Mellon. One of them has a PhD in machine learning from Carnegie Mellon, which, for those of you who don't know, is basically a God-tier degree for AI; it means you're desirable and hireable by every other AI company after you graduate. (28:53):

Ejaaz: But it's not just that. They also have credentials and degrees from the top universities in China, especially one called Tsinghua, which seems to be at the top of the field. I looked it up on global rankings for AI universities, and it often comes in third or fourth in the top ten. So pretty impressive. (29:19):

Ejaaz: But what I found really interesting, Josh, was that one of the co-founders is an expert in training AI models on low-cost, optimized hardware. And the reason I mention this is that it's no secret that if you want a top frontier AI model, you need to train it on NVIDIA's GPUs, on NVIDIA's hardware. (29:41):

Ejaaz: NVIDIA's market cap, I think, surpassed $4 trillion at the end of last week. That's $4 trillion with a T; more than the current GDP of the entire British economy. (30:06):

Josh: Where I hail from. And the largest in the world. (30:18):

Ejaaz: And there's never been... (30:19):

Josh: A bigger company. (30:20):

Ejaaz: There's never been a bigger company. It's just insane to wrap your head around, and it's not without reason: they supply, they have a grasp, a monopoly, on the hardware that's needed to train top models. Now Kimi K2 comes along and casually drops a one-trillion-parameter model, one of the largest models ever released, and it's trained on hardware that isn't NVIDIA's. (30:21):

Ejaaz: And Jensen Huang, I need to find this clip, Josh, but he was on stage, I think at a private conference maybe yesterday, and he was quoted as saying that 50% of the top AI researchers are Chinese and are from China. And what he was implicitly getting at is that they're a real threat now. (30:46):

Ejaaz: I think for the last decade, we've kind of been like, ah, China's just going to copy-paste everything that comes out of America's tech sector. And when it comes to AI, we maintained that same mindset up until now, when they're genuinely competing with us. If they have the hardware, they have the ability to research new techniques for training these models, like DeepSeek's reinforcement learning and reasoning, and then Kimi K2's efficient training run, which you showed earlier. They've come to play, Josh. And I think it's worth highlighting that China has a very strong grasp on the top AI researchers in the world and the models coming out of it. (31:05):

Josh: Where are their $100 million offers? I haven't seen any of those coming through. None, dude. The most impressive thing is that they do it without the resources we have. Imagine if they did have access to clusters of these H100s that NVIDIA is making. Would they crush us? (31:45):

Josh: And we kind of have this timeline where we're running up against the edge of the energy available to us to train these massive models, whereas China does not have that constraint; they have significantly more energy to power this. So in the inevitable event that they do get the chips and they're able to train at the scale that we are, I'm not sure we can keep accelerating hardware manufacturing and large training runs as fast as they will. (32:03):

Josh: And they've already done the hard work on the software efficiency side. They've cranked out every single efficiency because they're doing it on constrained hardware. So it's going to create this really interesting effect, where they're coming at it from the ingenuity, software approach and we're coming at it from the brute-force, throw-a-lot-of-compute-at-it approach, and we'll see where both sides end up. But it's clear that China is still behind, because they're the ones open-sourcing their models, and we know at this point that if you're open-sourcing your model, you're doing it because you're behind. (32:32):

Ejaaz: Yeah. I mean, one thing that did surprise me, Josh, was that they released a one-trillion-parameter open-source model. I didn't expect them to catch up that quickly; one trillion is a lot. (33:00):

Ejaaz: Another thing I was thinking about is that China has dominated hardware for so long now that it wouldn't really surprise me if, a couple of years from now, they're producing better models at specific things basically because they have better hardware than America, than the West. (33:14):

Ejaaz: But where I think the West will continue to dominate is the application layer. And if I were a betting man, I'd say that most of the money is eventually going to be made on the application side of things. I think Grok 4 is starting to show that, with all these different kinds of novel features they're releasing. I don't know if you've seen some of the games being produced with Grok 4, Josh, but it's ultimately insane, and I haven't seen similar examples come out of Asia from any of their AI models, even when they have access to American models. So I still think America dominates at the app layer. (33:33):

Ejaaz: But Josh, I just came across this tweet, which you reminded me of earlier. Tell me about OpenAI's strategy for their open-source model, because I've got this tweet pulled up from Sam Altman, which is kind of hilarious. (34:06):

Josh: Yeah. All right. So this week, if you remember from our episode last week, we were excited about OpenAI's new open-source model. OpenAI, open-source model; it all checks out. This was going to be the big week when they released their new flagship open-source model. Well, conveniently, I think the same day K2 launched, later in the day or perhaps the very next morning, Sam Altman posted a tweet. (34:19):

Josh: He says: hey, we planned to launch our open-weights model next week. We are delaying it; we need time to run additional safety tests and review high-risk areas. We are not yet sure how long it will take us. While we trust the community will build great things with this model, once weights are out, they can't be pulled back. This is new for us and we want to get it right. Sorry to be the bearer of bad news. We are working super hard. (34:39):

Josh: So there are a few points of speculation. The first, obviously, being: did you just get your ass handed to you, and now you're going back to re-evaluate before you push out your model? That's one possibility: they saw K2 and went, oh boy, this is pretty sweet; this is our first open-source model and we probably don't want to land below them. (35:01):

Josh: And there's a second point of speculation, which, Ejaaz, you mentioned to me a little earlier today: maybe something went wrong with the training run, and it's not that they're getting beat by a Chinese company but that they actually made a mistake of their own accord. Can you explain to me specifically what that might be, or at least what the speculation is? (35:21):

Ejaaz: Well, I'll keep it short: I think it was a little racist under the hood. I can't find the tweet, but basically one of these AI researcher slash product builders on X got access to the model, supposedly, according to him, and tested it out in the background. And he said, yeah, it's not really an intelligence thing; it's just worse than what you'd expect from an alignment and consumer-facing standpoint. It was ill-mannered, saying some pretty wild shit, the kind of stuff you'd expect out of 4chan. And so Sam Altman decided to delay while they figured out why it was acting out. (35:40):

Josh: Got it. Okay, so we'll leave that speculation where it is. There's a funny post I'll actually share with you, if you want to throw it up, which was from Elon. We'll abbreviate, but Elon was basically saying it's hard to avoid both the "libtard" and the "MechaHitler" failure modes, because they're on polar opposite ends of the spectrum, and he said he spent several hours trying to solve the problem with the system prompt, but there's too much garbage coming in at the foundation-model level. (36:21):

Josh: So basically, what happens with these models is that you train them on all the human knowledge that exists, right? Everything we've believed, all the ideas we've shared, get fed into these models. And you can try to adjust how they interpret that data through the system prompt, which is basically an instruction that every single query gets passed through, but at some point the model is still reliant on this swath of human data, and it's too overbearing. And that's kind of what Elon shared. (36:50):
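The system prompt Josh mentions is just a hidden instruction prepended to every request before it reaches the model. A minimal sketch in the style of common chat-completion APIs; the prompt text and function name here are invented for illustration:

```python
SYSTEM_PROMPT = "You are a helpful assistant. Be truthful and avoid extreme rhetoric."

def build_request(history, user_query):
    """Prepend the same system instruction to every single query."""
    return ([{"role": "system", "content": SYSTEM_PROMPT}]
            + history
            + [{"role": "user", "content": user_query}])

messages = build_request([], "Summarize the Kimi K2 release.")
print(messages[0]["role"])  # system - the instruction always rides along
```

The limitation Elon ran into is visible here: this instruction only steers the model at inference time; it can't remove what the foundation model absorbed during pretraining.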

Josh: And the difference between OpenAI and Grok is that Grok will just ship the crazy update. That's what they did, and they caught a lot of backlash for it. But what I find interesting, and what I'm sure OpenAI will probably follow, is this last paragraph, where he says their V7 foundation model should be much better, and that they're being far more selective about training data rather than just training on the entire internet. (37:16):

Josh: So what they're planning to do to solve this problem, which is what I assume OpenAI probably ran into when their model went off the rails and started saying bad things about lots of people, is essentially rebuild the foundation model with new sets of data. In Grok's case, I know one of the intentions for V7 is actually to generate its own database of synthetic data from their models, and I'm assuming OpenAI will probably have to do this too if they want to recalibrate. (37:34):

Josh: A lot of times people talk about the temperature, which is basically how much variance, how much wildness, a model allows when it picks its next word. And I don't know, I think we're going to start to see interesting approaches here, because as these models get smarter, you really don't want them to have these evil traits as the default, and it's very hard to get around that when you train them on the data they've been trained on so far. (38:01):
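For reference, temperature in sampling is a rescaling of the model's next-token scores before they're turned into probabilities: low temperature sharpens the distribution toward the top choice, high temperature flattens it and invites riskier picks. A small sketch with made-up token scores:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Rescale raw scores by temperature, then normalize into probabilities."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)                          # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]                     # hypothetical scores for three tokens

cold = softmax_with_temperature(logits, 0.2) # sharpens: top token dominates
hot = softmax_with_temperature(logits, 2.0)  # flattens: choices get riskier

print(round(cold[0], 3), round(hot[0], 3))   # 0.993 0.502
```

The "evil traits as the default" problem Josh describes sits a level deeper than this knob: temperature only changes how boldly the model samples from the distribution its training data gave it.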

Ejaaz: It just goes to show how, I guess, cumbersome it is to train these models, Josh. It's such a hard thing. (38:24):

Josh: Yeah. Yeah. (38:31):

Ejaaz: It's not something where you can just jump into the code and tweak a few things. Most of the time you don't know what's wrong with the model or where it went wrong. I mean, we've talked about this on a previous episode, but essentially, you build out this model, you spend hundreds of millions of dollars, and then you feed it a query. You put something in and wait to see what it spits out. You don't really know what it's going to spit out; you can't predict it; it's completely probabilistic. (38:32):

Ejaaz: And so if you release a model and it starts being a little racist or, you know, kind of crazy, you have to go back to the drawing board and analyze many different parts of the model: was it the data that was poisoned, or the way we trained it, or maybe a particular model weight we tweaked too much, whatever it might be. So I think over time it's going to get a lot easier, once we understand how these models actually work, but my God, it must be so expensive to continually rerun and retrain these models. (38:58):

Josh: Yeah. When you think about a coherent cluster of 200,000 GPUs, the amount of energy and resources needed just to retrain away a mistake is huge. So I think the deeper we get into this, the more it makes sense that they're paying so much for talent to avoid these mistakes: if you pay $100 million for one employee who gives you the strategic advantage of avoiding another training run that would cost more than $100 million, you're already in profit. So you start to see the scale, the complexity, the difficulties. I do not envy the challenges that some of these engineers have to face. (39:32):

Josh: Although I do envy the salary. (40:07):

Ejaaz: I envy the salary, Josh. (40:09):

Josh: I envy the salary and I envy the adventure. Like, how cool must it be, trying to build superintelligence for the world, as a human, for the first time in the history of everything? So it's got to be pretty fun. And this is where we're at now with the open-source and closed-source models. K2 is pretty epic; I think that's a home run, and I think we've crowned a new model today. Do you have any closing thoughts, anything you want to add before we wrap up here? This is pretty amazing. (40:11):

Ejaaz: I think I'm most excited for the episode we'll probably release a week from now, Josh, when we've seen what people have built with this open-source model. That's the best part about this, by the way; just to remind the listener: anyone can take this model right now. If you're listening to this, you can take this model, run it locally at home, and tweak it to your preference. Now, yes, you kind of need to know how to tweak model weights and such, but I think we're going to see some really cool applications released over the next week, and I'm excited to play around with them personally. (40:35):

Josh: Yeah, and if you're listening to this and you can run this model, let us know, because that means you have quite a solid rig at home. I'm not sure the average person is going to be able to run it, but that's the beauty of open weights: anybody with the capability of running it can do so. They can tweak it how they like, and now they have access to the new best open-source model in the world, which, just a couple of months ago, would have been the best model in the world, period. So it's moving really quickly, and it's really accessible. (41:07):

Josh: And I'm sure as the weeks go by, hopefully we'll get OpenAI's open-source model in the next few weeks, and we'll be able to cover that. But until then, there's just lots of stuff going on. This was another great episode, so thank you, everyone, for tuning in again, for rocking with us. We actually planned on making this about 20 minutes, but we just kept trailing off into more interesting things. There's a lot of interesting stuff to talk about; you could really take this in a lot of directions. So hopefully this was interesting. (41:33):

Josh: Go check out Kimi K2. It's really, really impressive. It's really fast, and it's really cheap. If you're a developer, give it a try. And yeah, that's been another episode. We'll be back again later this week with another topic, and we'll just keep on chugging along as the frontier of AI models continues to head west. (42:00):

Ejaaz: So, also, we'd love to hear from you guys. So if you have any suggestions on things (42:18):

Ejaaz: that you want us to talk more about, or maybe there's, like, some weird model (42:23):

Ejaaz: or feature that you just don't understand and maybe we can do a good job of explaining it, just message us. (42:27):

Ejaaz: Our DMs are open, or respond to any of our tweets, and we'll be happy to oblige. (42:32):

Josh: Yeah, let us know. If there's anything cool that we're missing, (42:37):

Josh: send it our way and we'll cover it. That'd be great. (42:40):

Josh: But yeah, we're all going on this journey together. We're learning this as we go. (42:42):

Josh: So hopefully today was interesting. And if you did enjoy it, (42:45):

Josh: please share with friends, like, comment, subscribe, all the great things. (42:47):

Josh: And we will see you on the next episode. (42:50):

Ejaaz: Thanks for watching. See you guys. See you. (42:52):