
December 9, 2025 27 mins

The groundbreaking Alpha Arena experiment pitted eight AI trading models against each other. Grok 4.2 emerged as the standout winner, returning more than 60% in just two weeks despite the volatility that hurt many of its competitors.

What does this experiment mean for you? Looking at the models' strategies and behavioral patterns, we question the balance between AI trading success and the need for human oversight.

------
🌌 LIMITLESS HQ: LISTEN & FOLLOW HERE ⬇️
https://limitless.bankless.com/
https://x.com/LimitlessFT

------
TIMESTAMPS

0:00 Intro
1:39 Season 1 Results
2:39 Transition to Season 1.5
4:22 Mystery Model Revealed
5:55 Competition Breakdown
8:09 Insights from Competition
9:56 Model Trading Styles
12:16 AI Personalities in Trading
14:11 Comparing Model Performances
16:36 Limitations and Future Potential
19:53 Trusting AI with Investments
24:20 Future of AI Trading Tutorials

------
RESOURCES

Josh: https://x.com/JoshKale

Ejaaz: https://x.com/cryptopunk7213

------
Not financial or tax advice. See our investment disclosures here:
https://www.bankless.com/disclosures


Episode Transcript

Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Josh (00:00): Imagine this: you give eight of the world's most powerful AI models $10,000 each and tell them, go trade real stocks. No paper trading, but real money with real risk. And two weeks later, most of them have lost a painful amount of cash, which I guess is kind of expected. The kind of drawdowns that would get a human portfolio manager totally fired. But now they've run the same experiment again, except this time with much higher stakes: there's $320,000 at stake. We've talked about Alpha Arena before in a previous episode, which I highly recommend checking out.

Josh (00:29): But now we have the results from the new season, season 1.5. And what was exciting is that there was a very clear and obvious winner, but that winner was a mystery. We didn't know who the winner was until recently. In fact, it won all four of the trading competitions in this new season, leaving the other top models like ChatGPT 5.1 and Google's Gemini 3.0 fighting for second place. So at the core of this is: one, who is this model, and two, how on earth did they do it? How are they outperforming everyone so much as to make 65 percent in two weeks in one of these competitions? So, Ejaaz, I want to walk everyone through what just happened, what the model is, and what Alpha Arena is. Give us the lowdown on who made so much money.

Ejaaz (01:16): Oh, yeah, well, we will get into all of that today. So Alpha Arena is basically a competition, or a test, to see how well AI models can trade. And they do this in a few different ways, Josh. Number one, they give each model $10,000, as you mentioned. And then they allow them to trade a range of different financial instruments over a period of two weeks. So each season lasts two weeks, and we see which AI models do the best. And they get all your AI models in there. You've got ChatGPT, you've got Gemini, you've got Anthropic's Claude, and you have Grok as well. They've gone through about two seasons now, and the results have been absolutely crazy.

Ejaaz (01:55): So they started off with season one. You can think of this as the degen crypto season. They gave seven models $10,000 each and allowed them to trade crypto assets like Bitcoin, Ethereum, stuff like that. And they did this with something called perpetuals, so they could trade with leverage. It was the only instrument they were allowed to trade. And the results were, as you'd probably expect, that a lot of these AI models lost a lot of money. Some of them actually ended up making a decent chunk of money, and they were primarily Chinese models: Qwen, and I think it was DeepSeek, ended up making money. So there were a lot of takeaways there. As you mentioned, we've got a previous episode where we spoke about this. Definitely go give that a watch. There's a lot of alpha in that one.

Ejaaz (02:39): And then that brings us to season 1.5, where the AI models, instead of being given crypto to trade, were given the ability to trade U.S. stocks. We're talking about equities, which is something a lot of people listening to this show are very familiar with. And I think this is for a few reasons, Josh. Primarily, crypto is very volatile, and we want to figure out how AI models would handle the kind of assets where the majority of money in the financial markets is actually traded. A few things they kept the same: each AI model still got $10,000.

Ejaaz (03:11): But there were a number of differences with season 1.5. Number one, they were allowed to trade U.S. equities. Number two, two new models were introduced. One was a model called Kimi K2, which is a really good open-source Chinese model. But the other was this thing called a mystery model. I'm going to reveal which model this was in a second. But before I do, do you have any guesses as to what model this might have been?

Josh (03:38): Well, I cheated. I know the answer. But what I think is important to highlight is that these models made hundreds to even thousands of trades each. And the question I care about more than this mystery model is: is this real signal, or is this just, as Luke said earlier, a GPU-intensive scratch-off game? Is there any real signal? I guess we'll talk about the reality of that and what it means for your portfolio, if you ever want to hand over its management. But to me, that's the important thing to highlight. We should probably just spill the beans, Ejaaz. Do you want to just tell them? Who's the mystery model?

Ejaaz (04:11): I can't keep it in any longer. It was an unofficial version of Grok, aptly named Grok 4.2, or 4.20 for the memers out there. And this was revealed by none other than the Grok man himself, Elon Musk. The reason this mystery model was getting so much attention, Josh, was that it ended up being the winner. It made more money than any other AI model. And what was more impressive is that there wasn't just one competition running throughout season 1.5. There were four, all running at the same time. That was $320,000 deployed at any one instant, which is a crazy amount of money to stake on an experiment. A lot of money could have been lost here. And Grok 4.20 ended up performing the best.
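
The $320,000 figure follows from the setup described here, assuming eight models (the count given in the episode description), $10,000 per model, and four competitions running in parallel. A minimal sketch of that arithmetic:

```python
# Capital at stake in Alpha Arena season 1.5, assuming (per the episode)
# eight models, $10,000 per model, and four simultaneous competitions.
MODELS = 8
STAKE_PER_MODEL_USD = 10_000
COMPETITIONS = 4


def total_capital(models: int, stake: int, competitions: int) -> int:
    """Total dollars deployed across all parallel competitions."""
    return models * stake * competitions


total = total_capital(MODELS, STAKE_PER_MODEL_USD, COMPETITIONS)
print(f"${total:,}")  # prints $320,000
```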

Ejaaz (05:09): Josh, I want to go through a few different stats here that show how amazing this particular model was. First, for some context, there were four different competitions that these AI models were being tested on. Competition number one was called New Baseline. This basically gave the AI models, while trading AI stocks, access to all the common news that you and I can read online and in newspapers, so they could figure out, okay, what kind of news would affect my stock positions. They also got access to sentiment data, to see how the markets and retail traders would react to certain bits of news. So they had access to a much wider range of data in competition number one.

Ejaaz (05:56): Competition number two was called Monk Mode. They amended the investing prompt here, so the models traded more conservatively. Competition number three was called Situational Awareness, Josh. Each model was aware of the other models' trades and where it ranked relative to them, so there was this ecosystem of peer pressure on each model. And competition number four was just outright degeneracy: Max Leverage. You could only trade with like 20 to 50x leverage. I don't think it's 50x, but like 30x. Just a crazy amount of risk, an adjustment to test whether a model would take that risk or trade more conservatively. Josh, did you have any reactions to the results of this competition?
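
To see why 20x to 50x leverage is so punishing, here is a simplified sketch for a leveraged long position. It assumes no fees, no funding payments, and no maintenance-margin buffer, and treats the whole margin as lost once the loss reaches 100% of it; real venues liquidate earlier and charge costs, so actual thresholds are tighter.

```python
def margin_return(price_move: float, leverage: float) -> float:
    """Return on posted margin for a leveraged long position.

    Simplified: no fees or funding, and the loss is capped at -100%
    (the entire margin), as if liquidation happened exactly at zero equity.
    """
    return max(price_move * leverage, -1.0)


def wipeout_move(leverage: float) -> float:
    """Adverse price move (as a fraction) that consumes all of the margin."""
    return 1.0 / leverage


for lev in (20, 30, 50):
    print(f"{lev}x: a {wipeout_move(lev):.1%} drop wipes out the margin; "
          f"a 2% rise returns {margin_return(0.02, lev):.0%} on margin")
```

The symmetry is the point: the same multiplier that turns a 2% rally into a large gain turns a move of only a few percent against the position into a total loss.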

Josh (06:41): The results we're looking at right now are actually the ones I found most interesting. This is from the New Baseline competition, which is basically the full-info mode. One of the big differences between this mode and previous competitions is, like you mentioned earlier, that it has access to a lot of data. This is the first time an AI trading model has had access to real-time information beyond just looking at a chart. So in that sense, this is the competition closest to how a human quant fund would actually operate. If you're looking for high signal on which AI can actually make you real money in the real world, this is the one. And what we're seeing here is that Grok 4.20, the memetic mystery model, outperformed OpenAI's ChatGPT 5.1, the clear second place, by a fairly large margin. Those are the only two that actually made a profit. Everybody else lost money in the real-world competition, which to me signals a few things. Perhaps one is really good at understanding real-world information. Perhaps it understands company fundamentals better. Perhaps it just has better access to real-world information, like Grok having access to X. There's a lot to speculate about here, but for me the New Baseline chart we're looking at right now was the high-signal one. I'm like, oh my god, wait, this has the same type of information flows that I'm getting, so now we're on the same playing field.

Ejaaz (08:01): I actually had a different answer, which is that I was more impressed, Josh, by the Situational Awareness competition. This was a competition where each model had access to data and news, but also knew who it was competing against. So Grok 4.20, the winner, knew that GPT-5 was in second place. And so he was always keeping an eye on GPT-5, being like, oh, what trades is GPT-5 making? Why did they make that trade? Oh, that's interesting. And then he would look at Gemini and be like, oh, what trades is Gemini making? So he had this awareness of his competitors, which you didn't have in season one, where they were just trading in silos, right? And why this competition was so interesting, Josh, is that this was technically where Grok 4.20 made the most money. In fact, if you look at the top of this leaderboard right here, the account value at the end of season 1.5 was $16,656, which is technically a 60%-plus return in two weeks on $10,000 of capital.
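
A quick check of the "60%-plus" figure, using only the two numbers quoted from the leaderboard (starting capital and final account value) and ignoring any fees the episode doesn't mention:

```python
def simple_return(start_value: float, end_value: float) -> float:
    """Fractional (non-annualized) return from start_value to end_value."""
    return end_value / start_value - 1.0


start, end = 10_000.00, 16_656.00  # figures quoted in the episode
r = simple_return(start, end)
print(f"{r:.2%}")
```

The result is about 66.6% over two weeks, in line with both "60%-plus" and Josh's earlier "65 percent".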

Josh (09:09): It needs to take my money immediately.

Ejaaz (09:12): Isn't that insane? If you had to pick a competition in which to give an AI model money, going just from this data (and I'm not saying you should do that), you would be most bullish on Situational Awareness. And I'm going to draw some implications here that I haven't tested yet, but it seems that this competitive setup, where the models were aware of and exposed to their competitors' trades and thinking (we'll get to the model chat in a second), gave them a trading advantage, at least in some cases.

Josh (09:45): Yeah. Like you mentioned, one of my favorite parts of this competition in particular, and I think we share this, is that you can actually see all of the trades. With private quant funds, you don't know what the hell is going on. But with these models, you can see exactly what they're thinking every time they make a decision. So maybe you can go through a few of them and show what the models are thinking, how they're processing this real-world data, and whether there are any tips for us to learn from how they process it, because clearly they're much better traders than I am.

Ejaaz (10:14): Yeah. I have a few examples pulled up here on the right side of the screen, under Model Chat. By the way, any of you listening can go to this website, see for yourself, and scroll through hundreds and hundreds of their posts. It basically gives us insight into how each model thinks about a trade it currently has open, or is thinking about opening or closing, or whatever that might be. So it's like being in the mind of an actual investor and seeing how they make their decisions.

Ejaaz (10:42): An example here at the top of the screen is Gemini 3 Pro. He goes: I'm betting on a breakout in NVIDIA, seeing a strong setup as it holds support and leads the market, with a target of $189 and a stop just below $180. What he's referring to there is a typical quant style of trading, where he's looking at technicals, evaluating charts and the momentum of the stock price. It's very price-driven trading, right, Josh? But if you look just below it, you've got GPT-5.1, which actually came in second at the end of this competition, who goes: my analysis indicates continued strength in AI names like NVIDIA and Microsoft, so I'm holding existing long positions over the weekend through potential macro event risk.

Ejaaz (11:28): Now, the point I want to make about this particular model is that it's less price-specific and more focused on general themes, news, and data it's seeing outside of price. That really demonstrates that some of these models are very price- and quant-focused, whereas others are more thesis-driven over a shorter period of time. And it gives rise to these kinds of personalities, right, Josh?
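
Gemini 3 Pro's NVIDIA note describes a classic technical setup: a profit target above the entry and a stop-loss below it. The quote gives only the $189 target and a stop "just below $180", so the $183 entry and $179.50 stop in this sketch are hypothetical values chosen for illustration; it only shows how such a setup's reward-to-risk ratio is computed.

```python
from dataclasses import dataclass


@dataclass
class LongSetup:
    entry: float   # price paid per share (hypothetical in this example)
    target: float  # take-profit level
    stop: float    # stop-loss level, below entry for a long position

    def reward(self) -> float:
        """Dollars gained per share if the target is hit."""
        return self.target - self.entry

    def risk(self) -> float:
        """Dollars lost per share if the stop is hit."""
        return self.entry - self.stop

    def reward_to_risk(self) -> float:
        if self.risk() <= 0:
            raise ValueError("stop must sit below entry for a long setup")
        return self.reward() / self.risk()


# Target of $189 is from the quote; entry and stop are illustrative assumptions.
setup = LongSetup(entry=183.00, target=189.00, stop=179.50)
print(f"reward {setup.reward():.2f}, risk {setup.risk():.2f}, "
      f"R:R {setup.reward_to_risk():.2f}")
```

A ratio above 1 means the trade needs to win less than half the time to break even (ignoring costs); where the entry actually sat changes that ratio considerably.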

Josh (11:55): Yeah. Well, now we have to answer the uncomfortable question: is this evidence that Grok is some kind of money-printing god? Or is this just really well-produced content that happens to involve real money? That comes down to understanding the AI: understanding the personalities, how each model considers these trades, and how it positions itself. So I want to go through all of the models one by one and what their personalities are like.

Josh (12:22): With DeepSeek, we see a lot, and we mentioned this in a previous episode as well, that it behaves like a very disciplined quant fund. DeepSeek, for those who don't know, is an open-source Chinese model. It's very systematic, very mathematical, very comfortable with leverage, but able to hedge and adjust mid-trade based on its decisions and new information. Qwen is even kind of similar to this. If you remember from the last episode, Ejaaz, Qwen was my early favorite. I had hoped Qwen was going to win. Unfortunately, that's not the case at all in season 1.5: Qwen has gotten crushed, right there with DeepSeek. I imagine it as more similar to me, and maybe that's why I resonated with it: it has one big thesis and then sizes aggressively around that thesis. If you remember, Qwen would only buy Bitcoin or Ether last season, and it wouldn't buy any other altcoins. It just had a thesis that these major coins were going up and nothing else was.

Josh (13:12): Claude is interesting. It's very reflective of how the actual Claude model behaves when you engage with it. It's patient and thoughtful, but it occasionally sizes up too much and then gets crushed by leverage.

Josh (13:24): And as we go through these, Ejaaz, I also noticed you assigned a masculine personality to Gemini. You said "he" when you were talking about Google Gemini, and that's kind of because it's daddy, right? Gemini's been the big boy on top. But in this trading competition, I don't know if it is. I was going through the trades, and it very much panic flip-flops from shorts to longs after losing. In a way, Gemini was the most reflective of retail behavior, and I'm not sure what we could tie that to, but Gemini was very reactionary: if it lost money, it would flip its position, and if it gained money, it would hedge quickly. So that was interesting.

Josh (13:58): And then we have GPT-5, which has very sophisticated reasoning. But in season one, it over-traded and over-leveraged and got absolutely wiped out. And it was very timid in the way it went about this. So that's how you can think about these.

Josh (14:11): The final one is the secret model, Grok 4.2. If we know anything about Grok, it's that it's a very high risk-taker, but a calculated one, and that's probably what put it at the top. So that's how I would think about all of these models. They're a little different, and if you've used them yourself, you can understand the thinking behind the trades.

Ejaaz (14:30): Yeah, I want to dig into a few things around the personalities, or rather the trading styles here, Josh, because it may not be as clear-cut as we're laying it out. Grok 4.20 was the winner by far, and it made money. It came out on top across all four competitions. That's great, but did you look at the results of Grok 4, its predecessor?

Josh (14:55): It was absolutely crushed.

Ejaaz (14:57): It was the worst-performing model in this entire competition, which is crazy, because in season one, where it was trading crypto, it came in second or third. And for about 75% of that competition, Josh, it was number one. So it had some kind of advantage from trading very riskily, right? That might be because of the nature of the instruments it was trading. Crypto is very volatile, and it was kind of going blase. So when it was 20x long Bitcoin, it benefited a lot when the Bitcoin price went up, but obviously it suffered when the price went down.

Ejaaz (15:32): It's interesting to see the contrast between these two models in 1.5, right? Grok 4.20, the winner, seems to be a more mature version of Grok 4. It seems to think more about its trades, and it has more risk limits and boundaries in place, whereas Grok 4 seems to be its usual degenerate self. I don't know how much of that is down to the fact that it's trading stocks, which is generally a less volatile market, versus Grok 4.20 simply being a more thesis-driven, sensible trader, as you described.

Ejaaz (16:07): The other one we have to call out, because it's the elephant in the room here: GPT-5 came in second in season 1.5.

Josh (16:15): Right. 5.1.

Ejaaz (16:17): Sorry, 5.1, right. In the previous season, season one, it was the second-worst performer. No, sorry, it was the worst performer.

Josh (16:26): It was horrible.

Ejaaz (16:28): It was GPT-5.

Josh (16:30): It was an abomination.

Ejaaz (16:30): And Gemini. So whatever OpenAI has cooked up in the .1, congrats, because you must have trained on some kind of financial data, or implemented some kind of risk-management strategy that made it a lot more sensible, because it made some really great trades this season. So those are the two jumps from season one to 1.5 that I had to call out.

Josh (16:53): Yeah, it makes me excited to see significant improvements from incremental models, because we normally talk about 5 to 5.1 as pretty marginal, like there's nothing really noteworthy or exciting. And yet the results, in this small sample at least, are pretty reassuring that, hey, there is something new going on under the hood. Maybe this is an appropriate time to address the limitations, the bear case, starting with the sample size. We do have to say, this is two weeks, Ejaaz. That is not a long time. They placed some trades. Some of them maybe got lucky; some maybe did not. Is there any real signal here? I'm curious about your take: do you think this is reflective of future performance? What here is actually valuable, and what is just luck?
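
Josh's luck question can be made concrete with a toy Monte Carlo: give eight traders zero expected edge, let them compound random daily returns for ten trading days, and count how often the best of them still clears +60%. Every parameter here (the normal return distribution, 8% daily volatility, ten days) is a hypothetical assumption, and the output is not an estimate of what happened in Alpha Arena; it only shows that big winners can appear in short windows without any skill.

```python
import random


def best_of_random_traders(n_traders: int, n_days: int, daily_vol: float,
                           rng: random.Random) -> float:
    """Best final return among traders with zero expected daily return.

    Each trader's daily return is drawn from N(0, daily_vol) and compounded;
    equity is floored at zero to mimic a blown-up account.
    """
    best = -1.0
    for _ in range(n_traders):
        equity = 1.0
        for _ in range(n_days):
            equity = max(0.0, equity * (1.0 + rng.gauss(0.0, daily_vol)))
        best = max(best, equity - 1.0)
    return best


def prob_best_exceeds(threshold: float, trials: int = 5_000, seed: int = 0,
                      **kwargs) -> float:
    """Fraction of simulated contests whose best trader beats `threshold`."""
    rng = random.Random(seed)
    hits = sum(best_of_random_traders(rng=rng, **kwargs) > threshold
               for _ in range(trials))
    return hits / trials


# Hypothetical parameters: 8 traders, 10 trading days, 8% daily volatility.
p = prob_best_exceeds(0.60, n_traders=8, n_days=10, daily_vol=0.08)
print(f"P(best of 8 zero-edge traders > +60%) ~ {p:.1%}")
```

Raising the assumed volatility (for example, through more leverage) makes a lucky standout more likely, which is why two weeks of results alone cannot separate skill from chance.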

Ejaaz (17:40): I don't think we have enough information to make that call, at least not for me. I'll speak for myself personally. I asked myself before we recorded this episode: would I give my money to Grok 4.2, the winner across all categories? And the simple answer is no. I don't know if it's going to repeat that over week three, week four, week five. It was only two weeks, to your point, right? So I want to see this experiment rerun like a million times before I'm like, okay, that's cool. Even then, it's still risky, right? I can justify giving my money to a human I can relate to, someone I can call up and speak to, less so an AI model. But maybe that's my thing; it needs to evolve.

Ejaaz (18:24): The other way I'm thinking about this is that there are just a lot of unknowns here, Josh. I can see its thinking, and I can see how the model completes its trades, but I don't really know what's going on under the hood. Is this just a pattern-matching thing? Does it inherit the same risks that a lot of humans have already taken, because it's trained on the same corpus of trading data that we've learned from? Or is it net better? Do you feel the same?

Josh (18:54): Yeah. I mean, it's not the new gold standard of AI benchmarks. But it is a standard I find interesting, because it's a benchmark that happens in the real world, with real, dynamic data that can't be gamed. So in that sense, I love it. But I saw one writer call it Schrödinger's benchmark, because it's serious and degenerate at the same time. It's entertainment with real money that happens to produce some legitimate insights about AI behavior, but it's not really indicative of future returns at this small a sample size, at least.

Josh (19:27): That's kind of how I feel about it. There is one breakthrough we mentioned earlier that does provide real value, which is the transparency. Every trade being on-chain and every reasoning step being logged is really helpful for understanding how these models think, and how you might think. For example, you could pull up every decision Grok 4.20 made on Tesla after the Fed announcement, or something like that, and it'll walk you through its chain of thought. If anything, that can make you a better investor.
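
Josh's example (every decision one model made on one ticker after a given event) amounts to filtering a decision log. The episode doesn't describe Alpha Arena's actual data format, so the `Decision` record, its fields, the sample entries, and the event time below are all hypothetical; this is a minimal sketch of the query, not a real API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Decision:
    # Hypothetical log record; not Alpha Arena's real schema.
    model: str
    ticker: str
    timestamp: datetime
    action: str      # e.g. "buy", "sell", "hold"
    reasoning: str   # the model's logged rationale


def decisions_after(log: list[Decision], model: str, ticker: str,
                    after: datetime) -> list[Decision]:
    """All decisions by `model` on `ticker` at or after `after`, oldest first."""
    hits = [d for d in log
            if d.model == model and d.ticker == ticker and d.timestamp >= after]
    return sorted(hits, key=lambda d: d.timestamp)


# Made-up sample entries, for illustration only.
utc = timezone.utc
log = [
    Decision("grok-4.20", "TSLA", datetime(2025, 11, 10, 15, 0, tzinfo=utc), "buy", "example rationale A"),
    Decision("grok-4.20", "NVDA", datetime(2025, 11, 10, 16, 0, tzinfo=utc), "hold", "example rationale B"),
    Decision("gpt-5.1", "TSLA", datetime(2025, 11, 10, 17, 0, tzinfo=utc), "sell", "example rationale C"),
    Decision("grok-4.20", "TSLA", datetime(2025, 11, 12, 14, 30, tzinfo=utc), "sell", "example rationale D"),
]
event_time = datetime(2025, 11, 11, 0, 0, tzinfo=utc)  # hypothetical announcement time

for d in decisions_after(log, "grok-4.20", "TSLA", event_time):
    print(d.timestamp.isoformat(), d.action, d.reasoning)
```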

Josh (19:54): Would I trust the model with my own money? No. Maybe a little bit, a small amount. How much is a good question. I'd give it a couple thousand dollars to play around with and see what happens. I think that would be interesting and fun, and it's low enough stakes, and I'd trust it enough not to lose it all. I'd say I would probably trust Grok more with my money than the average day trader off the street, who, granted, doesn't have a very good reputation, but I think there is some sort of edge there that the average person doesn't have. And if you assume these models will keep getting better and better, you have to assume they'll develop some sort of edge, though I don't know how much.

Josh (20:37): It's an interesting question, because if you're a quant trading fund, or just a trader in general, and your job is to make money from trading, what are you doing with this information? Are you leaning into AI? Are you trying to get these models to help with your information flows and decisions? Are you using them to actually execute trades? Or are you looking the other way and saying, oh, this is just a dumb experiment to benchmark models, there's no actual signal here? The answer is probably somewhere in the middle, right?

Ejaaz: I mean, well, my initial reaction to that is, (21:04):

Ejaaz: okay, quant funds already use algorithms. It would make a lot of sense if they (21:07):

Ejaaz: started using AI algorithms, right? (21:13):

Ejaaz: If you could get a smarter algorithm to trade for your fund, absolutely, right? (21:15):

Ejaaz: So it's a no-brainer to me that these hedge funds and quant funds are going to (21:18):

Ejaaz: be using AI, and are probably already using AI. (21:21):

Ejaaz: Where I have maybe a hot take is that the transparency is just a nice-to-have. (21:24):

Ejaaz: It's in no way going to extend to the best models. (21:29):

Ejaaz: Why? Because if you have an AI model that is better than all the other (21:32):

Ejaaz: AI models at trading, why would you make that public? (21:36):

Ejaaz: Right. So I'm kind of torn on this, (21:39):

Ejaaz: because I think the transparency is a really good thing for bringing (21:43):

Ejaaz: up the floor of trading credibility for people who get access to this type of information. (21:46):

Ejaaz: Like, I have loved reading through these trade logs here, (21:52):

Ejaaz: seeing how each model thinks and being like, okay, yeah, wow. (21:55):

Ejaaz: I actually didn't think about that myself when I was buying that stock. (21:58):

Ejaaz: Right. And these are stocks that I've seen, that I can buy, (22:01):

Ejaaz: right. The Amazon trade, the NVIDIA trade, I'm just like, oh, okay. (22:04):

Ejaaz: I didn't think about that, right, yesterday whenever they made this trade. (22:06):

Ejaaz: But if I am a hedge fund and I've fine-tuned a model that is (22:10):

Ejaaz: beating all these models, I don't really want to expose that. (22:15):

Ejaaz: So it's kind of a push and pull. (22:18):

Ejaaz: The other thought I had, Josh, and maybe this is kind of semi-adjacent (22:20):

Ejaaz: to what we're discussing here: (22:26):

Ejaaz: I couldn't get the thought out of my head that if you could get Grok, in X, (22:28):

Ejaaz: trading some money for you or guaranteeing you, like, a 5% to 10% annual (22:33):

Ejaaz: return, that is something that, if framed correctly, (22:37):

Ejaaz: I would put some money into, right? (22:41):

Ejaaz: Maybe not over two weeks, but (22:43):

Ejaaz: over an adjusted, kind of yearly period, that would be super cool to see. (22:45):

Josh: Yeah, it's such a fun question to ask: (22:49):

Josh: what happens when this kind of system runs for two years, but with, (22:51):

Josh: let's say, a large pension fund that just wants a manager (22:56):

Josh: that doesn't take fees and does a pretty good job? (22:59):

Josh: Like, is there going to be enough trust in these systems to reliably place money at scale with them? (23:02):

Josh: And you have to assume, given the signal this early on, that the answer will be yes. (23:07):

Josh: The question is, how much of a yes will it be? (23:12):

Josh: What percentage of management will be AI as it gets better over time? (23:15):

Josh: And the sample size sucks. I wish it was more than two weeks. I wish it was two years. (23:20):

Josh: Two years from now, think about the progress we're going to see and what (23:23):

Josh: type of impact that's going to have on trading models. (23:26):

Josh: So it's interesting. It's fascinating. (23:29):

Josh: In fact, I'm really curious to actually run this experiment for ourselves. (23:32):

Josh: I'd love to try to come up with a little trading model that runs these things (23:35):

Josh: and test it out, because it's fun and there is some sort of an edge there. (23:38):

Ejaaz: I would say, okay, if I were to summarize my lesson from this entire competition (23:42):

Ejaaz: or experiment so far, Josh, (23:48):

Ejaaz: it's that I'm not convinced to give AI models money to trade, but I'm convinced (23:50):

Ejaaz: to use AI models to help me trade. (23:56):

Ejaaz: So a human and an AI model working together to become (23:59):

Ejaaz: a better trader overall, I think, is the main takeaway for me here. Do you feel the same? (24:04):

Josh: It's funny. I mean, this is how agents work today, right? (24:09):

Josh: Like, if you go on ChatGPT and say, go book me a reservation, (24:12):

Josh: it'll take you to the finish line. (24:16):

Josh: And then you as the human provide the final filter and approve or deny. (24:17):

Josh: And I think that's probably the happy middle ground while (24:20):

Josh: we still don't really trust these models too much: give me (24:23):

Josh: the thesis, give me the trade, and I will either approve (24:27):

Josh: or deny, and that's how the money gets managed. So (24:30):

Josh: it's cool. This is a great experiment. I love that we got Season 1.5. (24:33):

Josh: I mean, it's fascinating. Even more fascinating is that we (24:36):

Josh: have an early look at Grok 4.2, which by all (24:38):

Josh: accounts is the best trading model in the world. Where will (24:41):

Josh: it rank in the other benchmarks? We will see. We will be covering it as soon as (24:44):

Josh: it comes out. But I guess that's really it for this episode on Season (24:48):

Josh: 1.5. The question I want to leave everyone with is: would you trust (24:51):

Josh: an AI with part of your portfolio? Like, how much money would you actually (24:55):

Josh: give to an AI? Currently, Grok 4.2 just made (24:58):

Josh: 60% in two weeks in one of these trading competitions. Is that enough for you to risk your money? (25:02):

Josh: Or is it still just this dumb AI system that you don't really trust? (25:07):

Ejaaz: Well, if you're interested in this experiment, Josh and I were actually discussing (25:12):

Ejaaz: potentially giving you guys a tutorial on how to use an AI to trade money (25:17):

Ejaaz: for you, kind of like an experiment, an N-of-1 experiment of our own. (25:24):

Ejaaz: But we want to get a little more signal from you guys. Let us know in the comments (25:29):

Ejaaz: whether this is something that you'd be interested in seeing. (25:32):

Ejaaz: And, Josh, I have a requirement for the listeners (25:35):

Ejaaz: if you want us to put the tutorial out. Our last video that we did on AI trading (25:40):

Ejaaz: reached 100,000 views and 3,000 likes. (25:45):

Ejaaz: So I'm not going to ask for the 100,000 views, but I will ask for the likes. (25:49):

Ejaaz: If this video gets 3,000 likes or more, (25:54):

Ejaaz: we will definitely put out that tutorial by the end of the year. (25:58):

Ejaaz: And we have a lot of thoughts around this, about how we're going to do it. (26:01):

Ejaaz: We're super excited to do it. So help us get there. (26:05):

Ejaaz: It is another week of really exciting news. Josh, I don't know if you saw the (26:08):

Ejaaz: rumors. Did you see the rumors about OpenAI? (26:12):

Josh: Tell me, fill me in. (26:14):

Ejaaz: About OpenAI releasing a potential new groundbreaking model? (26:15):

Josh: As a matter of fact, Polymarket is showing that OpenAI is heavily favored to (26:18):

Josh: release the best model of the year. (26:22):

Josh: And last I checked, Gemini is the best model of the year. So that implies we're (26:25):

Josh: getting something big in the next few weeks. (26:28):

Ejaaz: I think we will. And like you said, Polymarket is kind of tipping (26:30):

Ejaaz: its hand, so maybe there's some inside information coming out here. (26:35):

Ejaaz: So stay tuned to Limitless. Put the notifications on, (26:37):

Ejaaz: guys, and also subscribe if you want to get the latest videos. (26:41):

Ejaaz: We put out the best content out there. (26:44):

Ejaaz: It's unchallenged right now. Josh and I are sitting here unchallenged. (26:47):

Ejaaz: You have to like and subscribe if you want to get our content on your feed. (26:50):

Ejaaz: Thank you so, so much for listening. Again, let us know what you thought of (26:53):

Ejaaz: this episode in the comments. Get that like number up and we will see you on the next one. (26:56):

© 2025 iHeartMedia, Inc.