Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Josh:
Imagine this: you give eight of the world's most powerful AI models $10,000 (00:00):
Josh:
each and tell them, go trade real stocks. (00:04):
Josh:
No paper trading, but real money with real risk. And two weeks later, (00:06):
Josh:
most of them have lost a painful amount of cash, which I guess is kind of expected. (00:10):
Josh:
The kind of drawdowns that would get a human portfolio manager totally fired. (00:14):
Josh:
But now, they ran the same experiment again, except this time with much higher stakes. (00:17):
Josh:
There's $320,000 at stake. And we've talked about Alpha Arena before in a previous (00:22):
Josh:
episode, which I highly recommend checking out. (00:27):
Josh:
But now we have the new results from the new season, season 1.5. (00:29):
Josh:
And what was exciting is that there was a very clear and obvious winner, (00:33):
Josh:
but that winner was a mystery. (00:36):
Josh:
We don't actually know, or we didn't know, who the winner was up until recently. (00:37):
Josh:
In fact, it won all four of the trading competitions in (00:42):
Josh:
this new season while leaving the other top models like ChatGPT (00:45):
Josh:
5.1 and Google Gemini 3.0 fighting for (00:49):
Josh:
second place. So at the core of this is, one, who is (00:52):
Josh:
this model, and two, how on earth did they (00:55):
Josh:
do it? How are they outperforming everyone so much so as to make 65 percent in (00:58):
Josh:
two weeks in one of these competitions? So Ejaaz, I want to walk everyone through (01:02):
Josh:
what just happened, what the model is, and what Alpha Arena is. So give (01:07):
Josh:
us the lowdown on who this was that made so much money. (01:11):
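The $320,000 figure is consistent with the numbers quoted in this intro, assuming all eight models were funded with $10,000 in each of the four competitions. A minimal sketch of that arithmetic (the function and parameter names are illustrative, not from the show):

```python
def total_capital_at_risk(models: int, stake_per_model: float, competitions: int) -> float:
    """Total capital deployed if every model is funded in every competition."""
    return models * stake_per_model * competitions


# Assumption: 8 models x $10,000 x 4 simultaneous competitions.
total = total_capital_at_risk(models=8, stake_per_model=10_000, competitions=4)
print(f"${total:,.0f}")  # prints $320,000
```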
Ejaaz:
Yeah, well, we will get into all of that today. (01:16):
Ejaaz:
So Alpha Arena is basically a competition or test to see how well AI models can trade. (01:19):
Ejaaz:
And they do this in a few different ways, Josh. Number one, they give each model (01:28):
Ejaaz:
$10,000, as you mentioned. (01:32):
Ejaaz:
And then they allow them to trade a range of different financial instruments (01:34):
Ejaaz:
over a period of two weeks. (01:37):
Ejaaz:
So there's like a season, two weeks, and we see which AI models do the best. (01:39):
Ejaaz:
And they get all your AI models in there. You've got ChatGPT, (01:43):
Ejaaz:
you have got Gemini, you've got Anthropic's Claude, and you have Grok as well. (01:47):
Ejaaz:
And so they've gone through about two seasons now, and the results have been (01:51):
Ejaaz:
absolutely crazy. So they started off with season one. (01:55):
Ejaaz:
And you can think of this as like the degen crypto season. (01:59):
Ejaaz:
They gave seven models $10,000 each and allowed them to trade crypto assets (02:02):
Ejaaz:
like Bitcoin, Ethereum, stuff like that. (02:07):
Ejaaz:
And they did this in something called perpetuals, so they could leverage trade. (02:10):
Ejaaz:
It was the only instrument that they were allowed to do this with. (02:14):
Ejaaz:
And the results were, as you'd probably expect, a lot of these AI models lost a lot of money. (02:17):
Ejaaz:
Some of them actually ended up making a decent chunk of money, (02:23):
Ejaaz:
and they were primarily Chinese models. (02:26):
Ejaaz:
They were Qwen, and I think it was DeepSeek, that ended up making money. (02:28):
Ejaaz:
So there were a lot of takeaways there. As you mentioned, we've got a previous (02:31):
Ejaaz:
episode where we spoke about this. (02:34):
Ejaaz:
Definitely go give that a watch. There's a lot of alpha in that one. (02:35):
Ejaaz:
And then that brings us to season 1.5, where the AI models, instead of being (02:39):
Ejaaz:
given crypto to trade, were given the ability to trade U.S. stocks. (02:45):
Ejaaz:
And we're talking about equities, which is something that a lot of us listening (02:50):
Ejaaz:
to this show are very familiar with. And I think this is for a few reasons, Josh. (02:53):
Ejaaz:
Primarily, crypto is very volatile, and we kind of want to figure out how the (02:57):
Ejaaz:
majority of money that is traded in the financial markets can translate into (03:01):
Ejaaz:
AI models trading that. So a few things that they kept the same is that they (03:05):
Ejaaz:
gave each AI model $10,000. (03:09):
Ejaaz:
But there were a number of differences with Season 1.5. Number one, (03:11):
Ejaaz:
they were allowed to trade US equities and stocks. (03:15):
Ejaaz:
Number two, there were two new models that were introduced. (03:17):
Ejaaz:
One was a model called Kimi K2, which is a really good open-source Chinese model. (03:21):
Ejaaz:
But the other was this thing called a mystery model. (03:26):
Ejaaz:
I'm going to reveal which model this was in a second. But before I do, (03:29):
Ejaaz:
do you have any guesses as to what model this might have been? (03:34):
Josh:
Well, I cheated. I know the answer. But what I think is very exciting about (03:38):
Josh:
this is that, and I think it's important to highlight, these models made hundreds (03:42):
Josh:
to even thousands of trades per model. Yes. (03:45):
Josh:
And what we want to answer, like the question that I want answered more than this mystery (03:48):
Josh:
model, is, like, is this real signal or is this just, I mean, Luke said earlier, (03:51):
Josh:
is this a GPU-intensive scratch-off game? (03:56):
Josh:
Is there any real signal? And I guess we'll talk about the reality (03:58):
Josh:
of that and what this means for your portfolio if you ever want to manage it, but to me, (04:02):
Josh:
I think that's the important thing to highlight. We probably should just spill (04:06):
Josh:
the beans, Ejaaz. Do you want to just tell them? Who's the mystery model? (04:09):
Ejaaz:
I can't keep it in any longer. It was an unofficial version of Grok, (04:11):
Ejaaz:
aptly named Grok 4.2 or 4.20 for the memers out there. (04:18):
Ejaaz:
And this was revealed by none other than the Grok man himself, Elon Musk. (04:23):
Ejaaz:
And the reason why this mystery model was getting so much attention, (04:28):
Ejaaz:
Josh, was because it ended up being the winner. It made the most money of all the AI models. (04:32):
Ejaaz:
And what was more impressive is there wasn't just one competition being run throughout season 1.5. (04:41):
Ejaaz:
There were four at the same time. So these AI models were running across four (04:47):
Ejaaz:
different competitions at the same time. That was $320,000 (04:54):
Ejaaz:
at any one instant, which is a crazy amount of money to stake on (04:58):
Ejaaz:
an experiment. That's a lot of money that could have been lost here. (05:03):
Ejaaz:
And Grok 4.20 ended up performing the best. (05:06):
Ejaaz:
Josh, I want to go through a few different stats here, which kind of show (05:09):
Ejaaz:
how amazing this particular model was. (05:14):
Ejaaz:
So firstly, for some context, there were four different competitions that were (05:18):
Ejaaz:
being run that these AI models were being tested on. (05:21):
Ejaaz:
Competition number one was something called New Baseline. This is basically (05:24):
Ejaaz:
the ability for these AI models, (05:28):
Ejaaz:
trading AI stocks, to get access to all the common news that you and I can read (05:32):
Ejaaz:
online and in newspapers to kind of figure out, okay, what kind of news (05:37):
Ejaaz:
would affect my stock positions. (05:42):
Ejaaz:
They would also get access to sentiment data to see how the markets (05:43):
Ejaaz:
and retail traders would kind of react to certain bits of news. (05:47):
Ejaaz:
They had access to a much wider spread of data in competition number one. (05:50):
Ejaaz:
Competition number two was called Monk Mode. They kind of amended the investing (05:56):
Ejaaz:
prompt here, so the models traded more conservatively. (06:00):
Ejaaz:
Competition number three was called Situational Awareness, Josh. (06:04):
Ejaaz:
So each model had an awareness of other models trading and where they ranked relative to them. (06:07):
Ejaaz:
So there was this kind of ecosystem of peer pressure being put on by each model. (06:13):
Ejaaz:
And competition number four was just outright degeneracy: max (06:17):
Ejaaz:
leverage. You could only trade with like 20 to 50x leverage, which is just kind (06:21):
Ejaaz:
of, I don't think it's 50x, but like 30x, just a crazy amount of risk, an adjustment (06:26):
Ejaaz:
to test whether a model would take that risk or whether it would trade more (06:30):
Ejaaz:
conservatively. Josh, did you have any reactions on the results of this competition? (06:35):
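To make the max-leverage numbers concrete, here is a rough sketch of what 20x or 30x leverage does to a small price move. It is a simplification under stated assumptions: fees, funding costs, and maintenance-margin rules are ignored, so the "wipeout" threshold is simply one divided by the leverage. The function names are illustrative and not part of Alpha Arena.

```python
def leveraged_return(price_move: float, leverage: float) -> float:
    """Return on posted margin for a given price move (no fees, no liquidation)."""
    return price_move * leverage


def wipeout_move(leverage: float) -> float:
    """Adverse price move that erases the whole margin (simplified model)."""
    return 1.0 / leverage


# Compare an unlevered position with the leverage levels mentioned in the episode.
for lev in (1, 20, 30):
    gain = leveraged_return(0.02, lev)  # a 2% move in the trader's favor
    print(f"{lev}x: 2% move -> {gain:+.0%} on margin, "
          f"wiped out by a {wipeout_move(lev):.1%} adverse move")
```

Under this simplification, a 20x position is erased by a 5% move against it, which is why the mode works as a test of whether a model will dial its own risk down.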
Josh:
The results that we're looking at right now, actually, I found most interesting. This (06:41):
Josh:
is from the New Baseline competition. It's basically the full info mode. (06:44):
Josh:
And one of the big differences between this mode versus previous competitions (06:47):
Josh:
that have been held is, like you mentioned earlier, it has access to a lot of data. (06:51):
Josh:
This is the first time an AI trading model has had access to real-time information (06:55):
Josh:
outside of just looking at a chart. So I think, (06:59):
Josh:
in that sense, this is the closest competition to how a human quant fund would actually operate. (07:02):
Josh:
So if you're looking for high signal in terms of which AI can actually make (07:07):
Josh:
you real money in the real world, this is the one. (07:11):
Josh:
And what we're seeing here is that the Grok 4.20 model, the memetic mystery (07:13):
Josh:
model, outperformed OpenAI's ChatGPT 5.1 by a fairly large margin, (07:17):
Josh:
which is the clear second place. (07:23):
Josh:
And those are the only two that actually made a profit. Everybody else lost money (07:24):
Josh:
in the real-world competition, which to me signals a few things, one of them being, (07:28):
Josh:
well, perhaps one is really good at (07:33):
Josh:
understanding real-world information. Perhaps it understands company fundamentals (07:36):
Josh:
better. Perhaps it just has access to real (07:40):
Josh:
world information that's better, like Grok having access to the xAI model. (07:43):
Josh:
So there's a lot of things to speculate on here, but for me the New Baseline (07:47):
Josh:
chart that we're looking at right now was the high-signal one. I'm like, oh my (07:51):
Josh:
god, wait, this has the same type of information flows that I'm now getting. So (07:54):
Josh:
now we're even. We're on the same playing field. (07:58):
Ejaaz:
I actually had a different answer to that, which is I was more impressed, (08:01):
Ejaaz:
Josh, by the Situational Awareness competition. (08:06):
Ejaaz:
So this was a competition where each model had access to data and news, (08:10):
Ejaaz:
but they also had awareness of who they were competing against. (08:15):
Ejaaz:
So Grok 4.20, the winner, knew that GPT-5 was in second place. (08:19):
Ejaaz:
And so he was always keeping an eye on GPT-5, being like, oh, (08:25):
Ejaaz:
what trades is GPT-5 making? Why did they make that trade? Oh, that's interesting. (08:28):
Ejaaz:
And then he would look at Gemini and be like, oh, what trades is Gemini making? (08:32):
Ejaaz:
So he would have this awareness of his competitors, which you didn't have in (08:35):
Ejaaz:
season one, where they were just kind of trading in silos, right? (08:39):
Ejaaz:
And why this competition was so interesting, Josh, is this was technically where (08:42):
Ejaaz:
Grok 4.20 made the most money. (08:47):
Ejaaz:
In fact, if you look at the top of this leaderboard right here, (08:50):
Ejaaz:
the account value at the end of season 1.5 was $16,656, (08:53):
Ejaaz:
which is technically a 60%-plus return in two weeks on $10,000 worth of capital. (09:00):
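The "60%-plus" figure can be checked directly from the two numbers quoted: a $10,000 starting balance and a $16,656 ending account value. A minimal sketch (the function name is illustrative):

```python
def simple_return(start_value: float, end_value: float) -> float:
    """Fractional gain or loss over the period, before any fees or taxes."""
    return (end_value - start_value) / start_value


r = simple_return(start_value=10_000, end_value=16_656)
print(f"{r:.2%}")  # prints 66.56%
```

That is in line with the "65 percent in two weeks" mentioned at the top of the episode.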
Josh:
I need it to take my money immediately. (09:09):
Ejaaz:
Isn't that insane? Like, if you had to pick a competition where you (09:12):
Ejaaz:
would have given an AI model money, just going from this data, (09:15):
Ejaaz:
and I'm not saying you should do that, you would be most bullish on Situational Awareness. (09:19):
Ejaaz:
And I'm going to make some inferences here that I haven't tested yet, (09:24):
Ejaaz:
but it seems to imply that (09:29):
Ejaaz:
this kind of competitive nature, where the models were kind of aware of and exposed (09:31):
Ejaaz:
to their competitors' trades and thinking, and we're going to get to the model (09:35):
Ejaaz:
chat thinking in a second, seems (09:40):
Ejaaz:
to have given them a better trading advantage, at least in some cases. (09:41):
Josh:
Yeah, so like you mentioned, one of my favorite parts, and I think we share this (09:45):
Josh:
as one of our favorite parts about this competition in particular, (09:48):
Josh:
is that you can actually see all of the trades. (09:51):
Josh:
One thing about these private quant funds, you don't know what the hell is going on. (09:53):
Josh:
But with these models, you can see exactly what they're thinking every time (09:56):
Josh:
they think and make a decision. (10:01):
Josh:
So maybe you guys can go through a few of them and see kind of what the model (10:02):
Josh:
is thinking, how they're processing this real-world data. (10:06):
Josh:
And if there's any tips for us to learn from processing this real-world data, (10:08):
Josh:
because clearly they're a much better trader than I am. (10:12):
Ejaaz:
Yeah. So I have a few examples pulled up here on the right side of the screen. (10:14):
Ejaaz:
It's under model chat. By the way, any of you listening to this can go onto (10:19):
Ejaaz:
this website and see for yourself and scroll through their hundreds and hundreds of posts. (10:23):
Ejaaz:
But it basically gives us an insight into how each model thinks about a trade (10:26):
Ejaaz:
that they currently either have open or they're thinking about opening or closing (10:31):
Ejaaz:
or whatever that might be, right? (10:35):
Ejaaz:
So it's like being in the mind of an actual investor and figuring out how they make their decisions. (10:36):
Ejaaz:
An example here at the top of the screen is Gemini 3 Pro. (10:42):
Ejaaz:
He goes, "I'm betting on a breakout in NVIDIA, seeing a strong setup as it holds (10:47):
Ejaaz:
support and leads the market, with a target of $189 and a stop just below $180." (10:51):
Ejaaz:
So what he's referring to there is kind of a typical quant style of trading (10:57):
Ejaaz:
where he's looking at technicals, he's evaluating kind of (11:02):
Ejaaz:
graphs, momentum of the stock price. (11:05):
Ejaaz:
It's a very price-driven type of trading, right, Josh? But if you look just (11:07):
Ejaaz:
below it, you've got GPT 5.1, which actually came in second at the end of this (11:11):
Ejaaz:
competition, who goes, "My analysis indicates continued strength in AI names (11:16):
Ejaaz:
like NVIDIA and Microsoft. (11:20):
Ejaaz:
So I'm holding out on existing long positions over the weekend and potential macro event risk." (11:22):
Ejaaz:
Now, the point I want to make about this particular model is it's less price (11:28):
Ejaaz:
specific and it's more focused on just kind of general themes, (11:32):
Ejaaz:
news and data that it's seeing outside of price. (11:37):
Ejaaz:
And that really goes to demonstrate that some of these models are very kind (11:41):
Ejaaz:
of price and quantitative focused, whereas other models are kind of more thesis (11:45):
Ejaaz:
driven over a shorter period of time. (11:49):
Ejaaz:
And it kind of gives rise to these types of personalities, right, Josh? (11:51):
Josh:
Yeah, well, now we have to answer the uncomfortable question, which is, (11:55):
Josh:
is this evidence that Grok is some kind of money-printing god? (11:58):
Josh:
Or is this just really well-produced content that happens to involve real money? (12:01):
Josh:
And that kind of comes down to understanding the AI, understanding the personalities, (12:05):
Josh:
understanding how each model considers these trades and how they place themselves (12:10):
Josh:
in different positions. (12:15):
Josh:
So I kind of want to go through, one by one, all of the models and kind of what (12:17):
Josh:
their personalities are like. (12:20):
Josh:
We see with DeepSeek a lot that it behaves, and we mentioned this on a previous episode (12:22):
Josh:
as well, it behaves like a very disciplined quant fund. (12:26):
Josh:
And DeepSeek, for those that don't know, it's an open-source Chinese model. (12:29):
Josh:
It is very systematic, very mathematical, very comfortable with leverage, (12:32):
Josh:
but able to hedge and adjust mid-trade based on its decisions and new information. (12:36):
Josh:
So that's DeepSeek, and Qwen even is kind of similar to this. (12:41):
Josh:
If you remember from the last episode, Ejaaz, Qwen was my early favorite. (12:45):
Josh:
I had hoped that Qwen was going to win. (12:48):
Josh:
Unfortunately, that's not the case at all in season 1.5. Qwen has gotten crushed (12:51):
Josh:
right there with DeepSeek. (12:55):
Josh:
I can kind of imagine it as more similar to me, maybe that's why I resonated (12:56):
Josh:
with it, where it has one big thesis and then it sizes aggressively around that thesis. (13:00):
Josh:
So if you remember, Qwen would only buy Bitcoin or Ether in the last one and (13:04):
Josh:
it wouldn't buy any other altcoins. (13:07):
Josh:
It just had a thesis that these major coins were going up, nothing else was. (13:09):
Josh:
Claude is interesting. It's very (13:12):
Josh:
reflective of how the actual Claude model works when you engage with it. (13:14):
Josh:
It's very patient and it's thoughtful, but it occasionally sizes up too much (13:17):
Josh:
and then it gets crushed by leverage. (13:21):
Josh:
And as we go through these, Ejaaz, I also noticed you assigned a masculine (13:24):
Josh:
personality to Gemini. You said "he" when you were talking about Google Gemini, (13:29):
Josh:
and that's kind of because it's daddy, right? Like, Gemini's been (13:32):
Josh:
the big boy on top. But in (13:35):
Josh:
this trading competition, I don't know if it is. I was going through the trades, (13:39):
Josh:
and it very much panic flip-flops from shorts to longs after losing. And it kind (13:42):
Josh:
of, in a way, Gemini was most reflective of retail behavior, and I'm not (13:46):
Josh:
sure what we could tie that to, but Gemini was very reactionary, where if it lost (13:51):
Josh:
money it would flip its position, and if it gained money it would kind of hedge quickly. (13:55):
Josh:
So that was interesting. And then we have GPT-5, which has very sophisticated reasoning. (13:58):
Josh:
But in season one, it overtraded and over-leveraged and got absolutely wiped (14:03):
Josh:
out. And it was very timid in the way that it went about this. (14:06):
Josh:
So that's kind of how you can think about these. (14:09):
Josh:
The final one, which is the secret model, Grok 4.20. If we know anything about (14:11):
Josh:
Grok, we know that it is a very high risk taker, but a calculated risk taker. (14:15):
Josh:
And that's probably what put it at the top there. So that's kind of how I would (14:19):
Josh:
consider all of these models. (14:22):
Josh:
They're a little different and they are reflective of, if you've used these (14:23):
Josh:
in person, you could kind of understand the thinking that gets placed behind the trades. (14:26):
Ejaaz:
Yeah, I want to dig into a few (14:30):
Ejaaz:
things around the personality, or rather the trading styles, here, Josh, because (14:33):
Ejaaz:
it may not be as explicit as we kind of lay it out. Like, so Grok 4.20 was (14:37):
Ejaaz:
the winner, right, by far. And it made money. It was the top across all of the (14:44):
Ejaaz:
competitions, all four competitions. That's great. But did you look at the results of Grok 4, (14:48):
Ejaaz:
its predecessor? (14:53):
Josh:
It was absolutely crushed. (14:55):
Ejaaz:
It was the worst performing model in this entire competition, (14:57):
Ejaaz:
which is crazy because in season one, where it was trading crypto, (15:01):
Ejaaz:
it came in at second or third. (15:06):
Ejaaz:
And for about 75% of the competition, Josh, it was number one. (15:08):
Ejaaz:
So it had some kind of an advantage trading kind of very riskily, right? (15:13):
Ejaaz:
And that might be because of the nature of the instruments that it was trading. (15:19):
Ejaaz:
Crypto is very volatile and it was kind of going blasé. (15:22):
Ejaaz:
So when it was like 20x bullish Bitcoin, it benefited a lot when Bitcoin price (15:24):
Ejaaz:
went up, but obviously it suffered when it went down. (15:29):
Ejaaz:
It's interesting to see the divergence between these two models in 1.5, right? (15:32):
Ejaaz:
Grok 4.20, the winner, seems to be a kind of more mature version of Grok 4. (15:38):
Ejaaz:
It seems to be thinking more about its trades. (15:44):
Ejaaz:
It has more kind of risk percentiles and boundaries in place, (15:47):
Ejaaz:
whereas Grok 4 seems to be its kind of usual degenerate self. (15:52):
Ejaaz:
And I don't know how much of that is reliant on the fact that it's trading stocks, (15:55):
Ejaaz:
which is generally a less volatile market, versus Grok 4.20 being a more thesis- (15:59):
Ejaaz:
driven, sensible trader, as you kind of described. (16:04):
Ejaaz:
The other one that we have to call out, because it's the elephant in the room (16:07):
Ejaaz:
here: GPT-5 came in at second in season 1.5. (16:10):
Josh:
Right? 5.1. 5.1. (16:15):
Ejaaz:
Sorry, 5.1, right? In the previous season, season one, it was the second worst (16:17):
Ejaaz:
performing. No, sorry, it was the worst performing. (16:24):
Josh:
It was horrible. (16:26):
Ejaaz:
It was GPT-5. (16:28):
Josh:
It was an abomination. (16:30):
Ejaaz:
And Gemini. So whatever OpenAI has cooked up in the .1, congrats. (16:30):
Ejaaz:
Because you must have trained on some kind of financial data, or you've (16:36):
Ejaaz:
kind of implemented a kind of risk trading strategy that made (16:40):
Ejaaz:
it a lot more sensible, because it made some really great trades this season. (16:44):
Ejaaz:
So just two different kind of jumps from season one to 1.5 that I had to call out. (16:48):
Josh:
Yeah, it makes me excited to see the improvements in these, like, (16:53):
Josh:
significant improvements with incremental models, because we (16:56):
Josh:
normally talk about 5 to 5.1 being pretty marginal. Like, (16:59):
Josh:
there's nothing really noteworthy or exciting, and yet the results, in the (17:02):
Josh:
small sample size at least, are pretty reassuring that, hey, there is something (17:05):
Josh:
new going on under the hood. And maybe this is an appropriate time to address the, (17:09):
Josh:
I guess, the limitations, the kind of bear case of this, starting with the (17:13):
Josh:
sample size. We do have to say, I mean, this is two weeks, Ejaaz. This is not a long time. They (17:18):
Josh:
placed some trades. Some people maybe got lucky. Some models maybe did not. (17:25):
Josh:
Is there any real signal here? (17:29):
Josh:
I'm curious about your take: do you think this is reflective of future performance? (17:31):
Josh:
Like, what is here that's actually valuable versus what is here that's actually kind of lucky? (17:35):
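One way to frame the luck question is a toy simulation: give several traders no edge at all, let them run for two weeks, and look at how good the best one appears. This is only a sketch under loud assumptions that are not from the show: 10 trading days, independent normally distributed daily returns with zero mean, and a 3% daily volatility standing in for a leveraged account.

```python
import random


def best_of_n_return(n_traders: int, n_days: int, daily_vol: float, rng: random.Random) -> float:
    """Best compounded return among n zero-edge traders over n_days."""
    best = float("-inf")
    for _ in range(n_traders):
        value = 1.0
        for _ in range(n_days):
            # Zero-mean daily return: by construction this trader has no skill.
            value *= 1.0 + rng.gauss(0.0, daily_vol)
        best = max(best, value - 1.0)
    return best


def simulate(trials: int = 2000, n_traders: int = 8, n_days: int = 10,
             daily_vol: float = 0.03, seed: int = 0) -> float:
    """Average 'winner' return across many repeated two-week contests."""
    rng = random.Random(seed)
    results = [best_of_n_return(n_traders, n_days, daily_vol, rng) for _ in range(trials)]
    return sum(results) / len(results)


avg_best = simulate()               # the leaderboard winner among 8 no-skill traders
avg_single = simulate(n_traders=1)  # any one no-skill trader, for comparison
print(f"average winner of 8 zero-edge traders: {avg_best:.1%}")
print(f"average single zero-edge trader:       {avg_single:.1%}")
```

The point of the comparison is that picking the top of a leaderboard after the fact produces a positive-looking return even when nobody has skill, which is why a single two-week season says little on its own.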
Ejaaz:
I don't think we have enough information to make that call, at least for me. (17:40):
Ejaaz:
I'll speak for myself personally. (17:45):
Ejaaz:
The real test is, you know, I asked myself before we recorded this episode, (17:47):
Ejaaz:
would I give my money to Grok 4.20, the winner that won across all categories? (17:51):
Ejaaz:
And the simple answer is, like, no. Like, I don't know if it's going to (17:55):
Ejaaz:
repeat that over week three, week four, week five. It was only two weeks, to your (17:59):
Ejaaz:
point, right? So I want to see this experiment kind of (18:03):
Ejaaz:
rehashed like a million times before I'm like, okay, that's cool. Even then, (18:06):
Ejaaz:
it's still kind of risky, right? It's like, I can justify giving my money (18:10):
Ejaaz:
to a human that I can kind of relate to, that I can call up and speak to, (18:14):
Ejaaz:
less so when it comes to an AI model, right? But maybe that's my thing, (18:18):
Ejaaz:
it needs to kind of evolve. (18:22):
Ejaaz:
The other way I'm thinking about this is there's just a lot of unknowns around this, Josh, right? (18:24):
Ejaaz:
Like, I can see its thinking, I can see kind of how the model completes its trades, but (18:31):
Ejaaz:
I don't really know what's going on under the hood. Is this just kind of like a (18:37):
Ejaaz:
pattern matching thing? (18:41):
Ejaaz:
Does it inherit the risks that a lot of humans have already taken? (18:42):
Ejaaz:
Because it's trained on the same kind of corpus of trading data that we have (18:45):
Ejaaz:
kind of evaluated on? Or is it kind of net better? (18:49):
Ejaaz:
Do you feel the same, or...? (18:52):
Josh:
Yeah, it's probably, I mean, it's not the new gold standard of AI benchmarks. (18:54):
Josh:
But it is a standard that I think is interesting. Because this is a benchmark (18:59):
Josh:
that happens in the real world with real dynamic data that cannot be gamed. (19:03):
Josh:
So in that case, I love it. (19:07):
Josh:
But I saw one writer, they called it Schrodinger's Benchmark, (19:08):
Josh:
because it's simultaneously serious and degenerate. (19:12):
Josh:
And it's like, it's entertainment with real money that happens to produce some (19:16):
Josh:
legitimate insights about AI behavior, but it's not really indicative of future (19:20):
Josh:
returns at this small of a sample size, at least. (19:24):
Josh:
And that's kind of how I feel about it. There is one breakthrough that we (19:27):
Josh:
mentioned earlier that does provide real value, which is the transparency. (19:30):
Josh:
Every trade being on chain and every reasoning step being logged is actually really (19:35):
Josh:
helpful to understanding how these models think and how you can consider thinking. (19:39):
Josh:
So for example, you could say, show me every decision Grok 4.20 made on Tesla after (19:43):
Josh:
the Fed announcement, or something like that. And it'll walk you through a chain of thought (19:47):
Josh:
and, if anything, make you into a better investor. (19:50):
Josh:
Would I trust the model with my own money? No. (19:54):
Josh:
Maybe a little bit, maybe with a small sample size. How (19:57):
Josh:
much is it? That's a (20:00):
Josh:
good question. I'd give it a couple thousand dollars to play around with and see (20:03):
Josh:
what happens. I think that would be interesting and fun, and it's (20:06):
Josh:
low enough stakes, but I would trust it enough to not lose it. Like, I'd say I (20:09):
Josh:
would probably trust Grok more with my money than I would the average day trader (20:14):
Josh:
off the street, who, granted, don't have a very good reputation. But I (20:20):
Josh:
think there is some sort of an edge there that doesn't exist in the average person. (20:23):
Josh:
And if you assume that these models are going to continue to get better and (20:28):
Josh:
better, well, you have to assume that they're going to form some sort of an (20:31):
Josh:
edge, but I don't know how much. (20:35):
Josh:
It's an interesting question because as a quant trading fund, too, (20:37):
Josh:
or as just a trader in general, if your job is to make money off (20:41):
Josh:
of trading, what are you doing with this information? Are you leaning into AI? (20:44):
Josh:
Are you trying to get these models to help you with your information flows and make decisions? (20:48):
Josh:
Are you using them to help you actually transact trades, or are you just kind (20:53):
Josh:
of looking the other way and saying, oh, this is just a dumb experiment to benchmark (20:56):
Josh:
models, there's no actual signal here? And the answer is probably somewhere in the middle, right? (21:00):
Ejaaz:
I mean well my initial reaction to that is um, (21:04):
undefined
Ejaaz:
Okay, quant funds already use algorithms. It would make a lot of sense if they (21:07):
undefined
Ejaaz:
started using AI algorithms, right? (21:13):
undefined
Ejaaz:
If you could get a smarter algorithm to trade for your fund, absolutely, right? (21:15):
undefined
Ejaaz:
So it's a no-brainer to me that these hedge funds, quant funds are going to (21:18):
undefined
Ejaaz:
be using AI, probably already using AI. (21:21):
undefined
Ejaaz:
Where I have maybe a hot take is that the transparency is just a nice to have. (21:24):
undefined
Ejaaz:
It is no way going to win in the best of models. (21:29):
undefined
Ejaaz:
Why? Because if you have an AI model that is like better than all the other (21:32):
undefined
Ejaaz:
AI models at trading, why would you make that public? (21:36):
undefined
Ejaaz:
Right. So like, I'm kind of like torn on this thing, (21:39):
undefined
Ejaaz:
because I think the transparency is a really good thing in kind of like bringing (21:43):
undefined
Ejaaz:
up the floor of trading credibility for people that get access to this type of information. (21:46):
undefined
Ejaaz:
Like I have loved reading through these kind of like trade logs here, (21:52):
undefined
Ejaaz:
seeing how each model thinks and being like, okay, yeah, wow. (21:55):
undefined
Ejaaz:
I actually didn't think about that myself when I was buying that stock. (21:58):
undefined
Ejaaz:
Right. And these are like stocks that I've seen that I, that I can buy, (22:01):
undefined
Ejaaz:
right. The Amazon trade, the NVIDIA trade, I'm just like, oh, okay. (22:04):
undefined
Ejaaz:
I didn't think about that, right, yesterday whenever they made this trade. (22:06):
undefined
Ejaaz:
If I am a hedge fund, I'm like, yeah, if I've fine-tuned a model that is like (22:10):
undefined
Ejaaz:
beating all these models, I don't really want to expose that really. (22:15):
undefined
Ejaaz:
So it's kind of like a push and pull. (22:18):
undefined
Ejaaz:
The other thought I had, Josh, is, and maybe this is kind of like kind of semi-adjacent (22:20):
undefined
Ejaaz:
to what we're discussing here. (22:26):
undefined
Ejaaz:
I couldn't get the thought out of my head that if you could get Grok in X, (22:28):
undefined
Ejaaz:
trading some kind of money for you or guaranteeing you like a 5% to 10% annual (22:33):
undefined
Ejaaz:
return, that is something that I would like if framed correctly, (22:37):
undefined
Ejaaz:
I would put some money into, right? (22:41):
undefined
Ejaaz:
Maybe not over two weeks, but (22:43):
undefined
Ejaaz:
maybe over an adjusted kind of yearly period would be super cool to see. (22:45):
undefined
Josh:
Yeah, that's such a, it's such a fun question to ask is like, (22:49):
undefined
Josh:
what happens when this kind of system runs for two years, but with your, (22:51):
undefined
Josh:
like, let's say it's a large pension management fund and they just want a manager (22:56):
undefined
Josh:
that doesn't take fees and does a pretty good job. (22:59):
undefined
Josh:
Like, is there going to be enough trust in these systems to reliably place money at scale with them? (23:02):
undefined
Josh:
And you have to assume, given the signal this early on, that the answer will be yes. (23:07):
undefined
Josh:
The question is, how much of a yes will it be? (23:12):
undefined
Josh:
What percentage of management will be AI as it gets better over time? (23:15):
undefined
Josh:
And the sample size sucks. I wish it was more than two weeks. I wish it was two years. (23:20):
undefined
Josh:
In two years from now, think about the progress we're going to see and what (23:23):
undefined
Josh:
type of impact that's going to have on trading models. (23:26):
undefined
Josh:
So this is, it's interesting. It's fascinating. (23:29):
undefined
Josh:
In fact, I'm really curious to actually run this experiment for ourselves. (23:32):
undefined
Josh:
I'd love to try to come up with a little trading model that runs these things (23:35):
undefined
Josh:
and test it out because it's fun and there is some sort of an edge there. (23:38):
undefined
Ejaaz:
I would say, okay, if I were to summarize my lesson from this entire competition (23:42):
undefined
Ejaaz:
or experiment so far, Josh, (23:48):
undefined
Ejaaz:
it is I'm not convinced to give AI models money to trade, but I'm convinced (23:50):
undefined
Ejaaz:
to use AI models to help me trade. (23:56):
undefined
Ejaaz:
So kind of like a human and AI model kind of work together and kind of become (23:59):
undefined
Ejaaz:
a better trader overall, I think is the main takeaway for me here. Do you share the same? (24:04):
undefined
Josh:
It's funny. I mean, this is how agents work today, right? (24:09):
undefined
Josh:
Like if you go on ChatGPT and you say, go book me a reservation, (24:12):
undefined
Josh:
it'll take you to the finish line. (24:16):
undefined
Josh:
And then you as the human provide the final filter and approve or deny. (24:17):
undefined
Josh:
And I think that's probably the happy middle ground while (24:20):
undefined
Josh:
we still don't really trust these models too much is: give me (20:23):
undefined
Josh:
the thesis, give me the trade, I will either approve (24:27):
undefined
Josh:
or deny, and that's how the money gets managed. So (24:30):
undefined
Josh:
it's cool. This is a great experiment. I love that we got season 1.5. (24:33):
undefined
Josh:
I mean, it's fascinating. Even more fascinating is that we (24:36):
undefined
Josh:
have an early look at Grok 4.2, which by all (24:38):
undefined
Josh:
means is the best trading model in the world. Where will (24:41):
undefined
Josh:
it rank in the other benchmarks? We will see. We will be covering it as soon as (24:44):
undefined
Josh:
it comes out. But I guess that's really it for this episode on season (24:48):
undefined
Josh:
1.5. The question I want to leave everyone else with is, I mean, would you trust (24:51):
undefined
Josh:
an AI with part of your portfolio? Like, how much money would you actually (24:55):
undefined
Josh:
give to an AI, currently Grok 4.2, which just made (24:58):
undefined
Josh:
60% in two weeks in one of these trading competitions. Is that enough for you to risk your money? (25:02):
undefined
Josh:
Or is it still just this dumb AI system that you don't really trust? (25:07):
undefined
Ejaaz:
Well, if you're interested in this experiment, Josh and I were actually discussing (25:12):
undefined
Ejaaz:
about potentially giving you guys a tutorial on how to use an AI to trade money (25:17):
undefined
Ejaaz:
for you and kind of like an experiment, this n-of-one experiment, but our own. (25:24):
undefined
Ejaaz:
But we want to get a little more signal from you guys. Let us know in the comments (25:29):
undefined
Ejaaz:
whether this is something that you'd be interested in seeing. (25:32):
undefined
Ejaaz:
And I have, Josh, I have a requirement for the listeners, (25:35):
undefined
Ejaaz:
if we do want to put the tutorial out. Our last video that we did on AI trading (25:40):
undefined
Ejaaz:
reached 100,000 views and 3,000 likes. (25:45):
undefined
Ejaaz:
So I'm not going to ask for the 100,000 views, but I will ask for the likes. (25:49):
undefined
Ejaaz:
If this video can get more than 3,000, if it gets 3,000 likes, (25:54):
undefined
Ejaaz:
we will definitely put out that tutorial by the end of the year. (25:58):
undefined
Ejaaz:
And we have a lot of thoughts around this, about how we're going to do it. (26:01):
undefined
Ejaaz:
We're super excited to do it. So help us get there. (26:05):
undefined
Ejaaz:
It is another week of really exciting news. Josh, I don't know if you saw the (26:08):
undefined
Ejaaz:
rumors. Did you see the rumors about OpenAI? (26:12):
undefined
Josh:
Tell me, fill me in. (26:14):
undefined
Ejaaz:
About OpenAI releasing a potential new groundbreaking model? (26:15):
undefined
Josh:
As a matter of fact, the Polymarket is showing that OpenAI is very favored to (26:18):
undefined
Josh:
release the best model of the year. (26:22):
undefined
Josh:
And last I checked, Gemini is the best model of the year. So that implies we're (26:25):
undefined
Josh:
getting something big in the next few weeks. (26:28):
undefined
Ejaaz:
I think we will. And like you said, the Polymarket is kind of like revealing (26:30):
undefined
Ejaaz:
its hand, so maybe there's some inside information coming out here. (26:35):
undefined
Ejaaz:
So kind of stay tuned to Limitless. Put the notifications on, (26:37):
undefined
Ejaaz:
guys, and also subscribe if you want to get the latest videos. (26:41):
undefined
Ejaaz:
We put out the best content out there. (26:44):
undefined
Ejaaz:
It's unchallenged right now. Josh and I are sitting here unchallenged. (26:47):
undefined
Ejaaz:
You have to like and subscribe if you want to get our content on your feed. (26:50):
undefined
Ejaaz:
Thank you so, so much for listening. Again, let us know what you thought of (26:53):
undefined
Ejaaz:
this episode in the comments. Get that like number up and we will see you on the next one. (26:56):
undefined