Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Ejaaz (00:03):
What if I told you there was a single website you could go to where you can chat to any major AI model from one single interface? It's kind of like ChatGPT, but instead every prompt gets routed to the exact AI model that will do the best job for whatever your prompt might be. Well, on today's episode, we're joined by Alex Atallah, the founder and CEO of OpenRouter AI. It's the fastest-growing AI model marketplace, with access to over 400 LLMs, making it the only place that really knows how people use AI models and, more importantly, how they might use them in the future. It sits at the intersection of every single prompt that anyone writes and every model they might ever use. Alex Atallah, welcome to the show. How are you, man?
Alex (00:46):
Thanks, guys. Great. Thanks so much for having me on.
Ejaaz (00:48):
So it is a Monday. How does the founder of OpenRouter spend his weekend? Presumably, you know, out and about, chilling, relaxing, not at all focused on the company.
Alex (00:58):
Oh, usually I love weekends with no meetings planned, and I just go to a coffee shop and have tons of hours stacked in a row to do things that require a lot of momentum buildup. So I did that at coffee shops on Saturday and Sunday, and then I watched Blade Runner again.
Ejaaz (01:20):
Again? Okay. Well, when we were preparing for this episode, Alex, I couldn't help but think that you've had a pretty insane decade of startup foundership, right? OpenRouter is kind of like your second major thing that you've done, but prior to that you were the co-founder and CTO of OpenSea, the biggest NFT marketplace out there. And now you're focused on one of the biggest AI companies out there. So it sounds like you're at the pivot point of two of the most important technology sectors of the last decade. Can you just give us a bit of background as to how you ended up here? And more importantly, where you started. Walk us through the journey of OpenSea and how you ended up at OpenRouter AI.
Alex (02:10):
Yeah, so I co-founded OpenSea with Devin Finzer at the very beginning of 2018, the very end of 2017. It was the first NFT marketplace. And it was not dissimilar to OpenRouter, in that there was a really fragmented ecosystem of NFT metadata and media that gets attached to these tokens. It was the first example of something in crypto that could be non-fungible, meaning it's a single thing that can be traded from person to person. Most things in the world are non-fungible. A chair is non-fungible; a currency is fungible. Back in 2018, no one was really thinking about crypto in terms of non-fungible goods, and the problem with non-fungible goods is that there weren't any real standards set up. There were a lot of heterogeneous implementations for how to get a non-fungible item represented and tradable in a decentralized way. So OpenSea organized this very heterogeneous inventory and put it together in one place. We came up with a metadata standard, and we did a lot of work to really make the experience super good for each collection. And you see a lot of similarities with how AI works today, too, where there's also a very heterogeneous ecosystem, with a lot of different APIs and different features supported by language model providers. OpenRouter similarly does a lot of work to organize it all.

I was at OpenSea until 2022, when I was kind of feeling the itch to do something new. I left in August, and then ChatGPT came out a few months later. My biggest question around that time was whether it was going to be a winner-take-all market, because OpenAI was very far ahead of everybody else. We had Cohere Command, we had a couple of open-source models, but OpenAI's was the only really usable one. I was doing little projects to experiment with the GPT-3 API, and then Llama came out in January. It was really exciting: about a tenth the size, and it won on a couple of benchmarks, but it wasn't really chattable yet. It wasn't until a few months later that a team at Stanford distilled it into a new model called Alpaca. Distillation means you take the model and customize it, or fine-tune it, on a set of synthetic data, which they made using ChatGPT as a research project. It was the first successful major distillation that I'm aware of, and it was an actually usable model. I was on an airplane talking to it, and I was like, wow, if it only took six hundred dollars to make something like this, then you don't need ten million dollars to make a model. There might be tens of thousands, hundreds of thousands of models in the future, and suddenly this started to look like a new economic primitive, a new building block, that kind of deserved its own place on the internet. And there wasn't one. There wasn't a place where you could discover new language models and see who uses them and why. And that's how OpenRouter got started.
Josh (06:02):
That's amazing. So one of the things that we're obsessed with on this channel in particular is exploring frontiers: how to properly see these frontiers, analyze them, and understand when they're going to happen. And when I was going through your history, you've had this talent consistently over time. Even as far back as early on, I read you were hacking Wi-Fi routers in a hackathon. You were very early to that. You were early to NFTs. You were early to understanding AI and the impact it would have. What I'd love for you to explain is the thought process and the indicators you look for when exploring these new frontiers, because clearly there's some sort of pattern matching going on. Clearly you have some sort of awareness of what will be important and why it will be important, and then you insert yourself into that narrative. So are there patterns? Are there certain things you look for when searching for these new opportunities that led you to make the decisions you have?
Alex (06:49):
I think there's a lot to be said for finding enthusiast communities and seeing if you're going to join them. Like, can you be an enthusiast with them? Whenever something new comes out that has some kind of ecosystem potential, there are going to be enthusiast communities that pop up, and the internet has made it so easy to just join those communities. Discord, I think, is an incredible and super underrated platform, because the communities feel kind of private. You don't feel like you're seeing somebody trying to, you know, advertise something for SEO juice. There's no SEO juice in Discord. It's just people talking about what they're passionate about, and it gets really niche. And when you find an interest group in Discord that has to do with some new piece of technology that's just being developed right now and doesn't really work very well at all, you get people who are just trying to figure out what to do with it and how to make it better. I think that's the first core piece of magic that jumps to mind.

There's got to be a willingness to be weird, because if you jump into any of these communities, at face value it's stupid. Like, oh, this is just a game, or it's a really weird game, and I'm not really into collectible games, so I'm going to leave right now. Not only do you have to be aware, but you have to be creative. Like, okay, these are just cats on the blockchain, and people are just trading cats back and forth. You can't look at the community as simply that. Think about what you could do with it. Like, what is the unlock that wasn't achievable before?

And I think there are people who are good at this, who will join the communities and brainstorm live, and you can see everybody brainstorming in real time. Another incredible example of this was the Midjourney Discord. It became the biggest server on Discord by far. And why did that happen? Well, it started with something weird, silly, maybe not super useful, but you could see all the enthusiasts remixing and brainstorming live how to turn it into something beautiful and how to make it useful. And then, you know, it just exploded. It's the most incredible niche community I think Discord has ever seen, because of how useless it started and how insanely exciting it became.

So, I mean, I was playing around with this model called Big Sleep in 2021 that let you generate images that looked kind of like DeviantArt. They were all animated images, and none of them really made sense, but you could get some really cool stuff, like potentially something you'd want to make your desktop wallpaper. And if you're really deep in some DeviantArt communities, you kind of appreciate it. So that was like, oh, there's a kernel of something here. And it took another year or two before Midjourney started to pick up, but that was...
Ejaaz (10:51):
Where were you seeing all of this, Alex? Like, where were you scouring? Just random forums, or just wherever your nose told you to go?
Alex (10:59):
Basically, there's this Twitter account, I'm trying to remember what it's called, that posts AI research papers and kind of tries to show what you can do with them. I discovered this Twitter account in like 2021. It wasn't at all related to crypto, but Big Sleep was the first thing I saw that used AI to generate things that could potentially be NFTs. So I started experimenting with how much you could direct it to make an NFT collection that would make any sense. It was very, very difficult, but that was like the first generative...
Ejaaz (11:54):
This was before you were even thinking about starting OpenRouter, right?
Alex (12:01):
Yeah, yeah, this was back when I was full-time at OpenSea. Oh, I remember the Twitter account now. I really recommend it. They basically post papers and explain and explore how each paper gets useful. They post animations. They make AI research kind of fun to engage with, and that was my first experience.
Ejaaz (12:33):
Okay, so I mean, that's a massive win for X, or Twitter as it was formerly known back then, as a platform, right? It gave birth to kind of like two of the biggest technologies: crypto, also known as Crypto Twitter, and now apparently all the AI research stuff, which kind of put you onto the path that led you to OpenRouter. So if I've got this right, Alex, you were full-time at OpenSea, a multi-billion-dollar company with loads of important stuff to do there, but you still found the time to scour this fringe technology, because that's what AI was at the time. Prior to GPT-2 or GPT-3, no one really knew about this. And you were playing around with these gen AI models, these generative AI models, that would create this magical little substance, and maybe it came in the form of a picture or a weird little cat. And you jumped into these niche forums of enthusiasts, as you say, and explored that further.

And it sounds like you honed that even beyond your journey from OpenSea, when you left. I remember actually meeting you in that kind of abyss between you leaving OpenSea and starting OpenRouter, where you were brainstorming a bunch of these ideas. And I remember a snippet from our conversation, in one of the WeWorks here, where you had just whiteboarded a bunch of AI stuff. And one of those things was kind of like the whole topic of inference. And if I'm being honest with you, I had no idea what that word even meant back then. I was extremely focused on all the NFT stuff and all the crypto stuff; my background's in all of that. But I just found it fascinating that you always had your nose in some of the early communities, and I think there's a really important lesson there.

I want to pick up on something you actually brought up when you said you discovered your path to OpenRouter, Alex. And that is, you said you were playing around with these early AI models. So not the GPTs, before Claude was even created. You were playing around with these random models that you would find either on forums, on Twitter, or on Reddit, right? And you would experiment with them. And I find it fascinating that back then, even when GPT became a thing, you were convinced that there would be tens of thousands, or did you say hundreds of thousands, of AI models? Back then, that wasn't a normal view. Back then, everyone was like, you need hundreds of millions of dollars. Maybe it was tens of millions of dollars back then. And it was going to be a rich man's game.
Alex (15:11):
Yeah, it was basically the Alpaca project that kind of put me over the edge on there being many, many, many models instead of just a very small number.
Ejaaz (15:24):
And can you explain what the Alpaca project is for the audience?
Alex (15:29):
Yeah. So the Alpaca project: after Llama came out, you really could not chat with it very well. It was a text-completion model. There were a couple of benchmarks where it beat GPT-3, and it was about a tenth the size of what most people thought GPT-3 was. So it was a pretty incredible achievement, but the user experience wasn't there. The Alpaca project took ChatGPT and generated a bunch of synthetic outputs, and then they fine-tuned Llama on those synthetic outputs. This did two things to Llama: it taught it style, and it taught it knowledge. The style is how to chat, which was the big user experience gap, and it made it smarter. Fine-tuning transfers both style and knowledge, and the content of the synthetic data was reflected in the model's performance on benchmarks after that point.

So if you can do that without revealing all the data that goes in, now there's a way you could sell data via API without just dumping all the data out to the world and never being able to monetize it again. So there's a brand-new business model around data that emerges, and with it the ability to work towards open intelligence, to build new architectures, test them more quickly, and fine-tune them quickly. Basically, you can build on top of the work of giants; you don't have to start from zero every time. A lot of the biggest developer experience innovations just involve giving developers a higher stair to start walking up, so they don't have to start at the bottom of the staircase every single time. And that was the big, generous gift that Llama gave the community.

And that wasn't the only company doing open-source models. Mistral came out with 7B Instruct a few months later, and it was an incredible model. Then they came out with the first open-weight mixture-of-experts model a few months after that. It felt like actual intelligence, but completely open. And all of these provide higher and higher stairs for other developers, basically letting you crowdsource new ideas from the whole planet and letting those new ideas build on top of really good foundations. When that whole picture started to fall into place, it felt like, okay, this is going to be a huge inventory situation, kind of like NFT collections were a huge inventory situation, though obviously completely different, with really different market dynamics and a really different type of goal that buyers have.

So a lot of my early experimentation, like a Chrome extension I made called Window AI and a few other things, was just about learning how the ecosystem works, what makes it different, and what people really want, what developers really want.
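To make the Alpaca-style recipe Alex describes a bit more concrete, here is a minimal sketch of the first half of that pipeline: using a stronger "teacher" model to generate synthetic instruction-response pairs, which a smaller open-weight model would then be fine-tuned on. This is not the Stanford Alpaca code; the teacher model name, seed instructions, and file layout are illustrative assumptions.

```python
# Sketch: generate synthetic instruction/response pairs with a teacher model,
# then save them in a JSONL format a fine-tuning script could consume.
# Assumptions: the `openai` Python package (v1+) is installed, an API key is
# set in the environment, and the model name and prompts are placeholders.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

seed_instructions = [
    "Explain what a non-fungible token is in two sentences.",
    "Write a short Python function that reverses a string.",
]

records = []
for instruction in seed_instructions:
    # Ask the teacher model to answer; its output becomes the target the
    # smaller "student" model is fine-tuned to imitate.
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder teacher model
        messages=[{"role": "user", "content": instruction}],
    )
    records.append({
        "instruction": instruction,
        "output": response.choices[0].message.content,
    })

# One JSON object per line, the shape many instruction-tuning scripts expect.
with open("synthetic_alpaca_style.jsonl", "w") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")

# The second half of the recipe (not shown) is supervised fine-tuning of a
# small open-weight model on this file, which transfers both chat style and
# some knowledge, as described above.
```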
Josh (19:17):
So that leads us to OpenRouter itself, right? I kind of want you to help explain to the listeners who aren't familiar with OpenRouter what it does. Because I think for a lot of people, the way they interact with an AI is they send a prompt to their model of choice. They use ChatGPT, or they use the Grok app, or they're on Gemini, and they kind of live in these siloed worlds. And then the next step up from those people are the ones who use it professionally, who are developers. They're interacting with APIs. Maybe they're not interfacing with the actual UI, but they're calling a single model. And OpenRouter kind of exists on top of all this, right? Can you walk us through how it works and why so many people love using OpenRouter?
Alex (19:50):
OpenRouter is an aggregator and marketplace for large language models. You can kind of think of it as a Stripe meets Cloudflare for language models. It's like a single pane of glass: you can orchestrate, discover, and optimize all of your intelligence needs in one place. One billing provider gets you all the models, and there are 470-plus now. All the models sort of implement features, but they do it differently, and there are also a lot of "intelligence brownouts," as Andrej Karpathy calls them, where models just go down all the time, even the top models from Anthropic and Gemini and OpenAI. So developers need a lot of choice. CTOs need a lot of reliability. CFOs need predictable costs. CISOs need complex policy controls. All of these are inputs to what we do, which is build a single pane of glass that makes models more reliable, lowers costs, gives you more choice, and helps you choose between all the options for where to source your intelligence.
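As a concrete illustration of the "one billing provider, all the models" idea, here is a minimal sketch of calling OpenRouter from Python. OpenRouter exposes an OpenAI-compatible chat completions endpoint, so the standard OpenAI client can be pointed at it; the model slug and prompt below are placeholders, and the base URL and headers should be checked against OpenRouter's current documentation.

```python
# Sketch: one API key and one endpoint, any model on the marketplace.
# Assumptions: the `openai` Python package (v1+) is installed, an OpenRouter
# API key is set, and the base URL / model slug match OpenRouter's docs.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",   # OpenAI-compatible endpoint
    api_key=os.environ["OPENROUTER_API_KEY"],  # single billing account
)

completion = client.chat.completions.create(
    model="anthropic/claude-3.5-sonnet",  # placeholder slug; 470+ models to choose from
    messages=[{"role": "user", "content": "Summarize what an LLM router does."}],
)

print(completion.choices[0].message.content)
```

Swapping models is then just a change to the `model` string; billing, uptime handling, and provider selection stay behind the same interface.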
Josh (21:17):
How does it work? Because, as EJ and I frequently talk about on the show, there are benchmarks, right, where a certain model is the best at coding, and that implies that maybe you should go to that model for all of your coding needs because it's the best at it. But it would appear that's not quite how it works if you're routing through a lot of different providers. So how do you decide which provider gets routed to, and when, to get the best result for what you're asking?
Alex (21:41):
So we've taken a different approach so far, which is that instead of focusing on a production router that picks the model for you, we try to help you choose the model. We create lots of analytics, both on your account and on our rankings page, to help you browse and discover the models that the power users are really using successfully on a certain type of workload. We think developers today primarily want to choose the model themselves; switching between model families can result in very unpredictable behavior.

But once you've chosen your model, we try to help developers not need to think about the provider. There are sometimes dozens of providers for a given model: all kinds of companies, including the hyperscalers like AWS, Google Vertex, and Azure, scaling startups like Together, Fireworks, and DeepInfra, and a long tail of providers that offer very unique features or exceptional performance. There are all kinds of differentiators for them. So what we do is collect them all in one place. If you want a feature, you just get the providers that support it. If you want performance, you get prioritized to the providers that have high performance. If you're really cost sensitive, you get prioritized to the providers that are really low cost today. We basically create all these lanes. There are innumerable ways you could get routed, but you're in full control of the overall user experience you're aiming for, and that's what we found was missing from the whole ecosystem.

And, you know, we get on average a five to ten percent uptime boost over going to providers directly, just by load balancing and sending you to the top provider that's up and able to handle your request. We really focus hard on efficiency and performance. We only add about 20 to 25 milliseconds of latency on top of your request, and it all gets deployed very close to your servers, at the edge. Overall, we stack providers, figure out what you can benefit from that everybody else is doing, and give you the power of big data as a developer just accessing your model of choice.
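The "lanes" Alex describes (feature support, performance, price) are expressed per request. Below is a rough sketch of what a request with provider preferences can look like, sent as plain JSON to the same chat completions endpoint. The `provider` field names (`sort`, `allow_fallbacks`) are recalled from OpenRouter's documentation and should be treated as assumptions to verify against the current API reference; the model slug and prompt are placeholders.

```python
# Sketch: steer routing toward cheap providers for a single request.
# Assumptions: `requests` is installed, OPENROUTER_API_KEY is set, and the
# `provider` preference fields match OpenRouter's current API reference.
import os
import requests

payload = {
    "model": "meta-llama/llama-3.1-70b-instruct",  # placeholder open-weight model slug
    "messages": [{"role": "user", "content": "Give me three test cases for a URL parser."}],
    # Routing preferences: prioritize low price, but allow fallback to other
    # providers serving the same model if the cheapest one is down or slow.
    "provider": {
        "sort": "price",
        "allow_fallbacks": True,
    },
}

response = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json=payload,
    timeout=60,
)
print(response.json()["choices"][0]["message"]["content"])
```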
Josh (24:44):
So it kind of allows you to harness the collective knowledge of everybody, right? You get all of the data, you have all of the queries, you know which yields the best result, and you're able to deliver the best product for them. Now, in terms of actual LLMs, EJ has actually pulled this up just before, which is a leaderboard. And I'm interested in how you guys think about LLMs: which are the best, how to benchmark them, and how you route people through them. Do you believe that benchmarks are accurate, and do you reflect those in the way that you route traffic through these models?
Alex (25:13):
In general, we've taken the stance that we want to be the capitalist benchmark for models: what is actually happening? And part of this is that I really think both the law of large numbers and the enthusiasm of power users are really, really valuable for everybody else. Like, when you're routing to Claude, let's say you're routing to Claude 4 and you're based in Europe, all of a sudden there might be a huge variance in throughput from one of the providers, and you're only able to detect that if some other users have discovered it before you. So we route around the provider that's running kind of slow in Europe and send you, if your data policies allow it, to a much faster provider somewhere else. And that allows you to get faster performance. That's how the numbers help on the provider level.

On the model selection level, which is what you see on this rankings page, when we put up a new model, like the one we put up today from a new model lab called Z.ai, the power users instantly discover it. We have this LLM enthusiast community that dives in and really figures out what a model is good for along a bunch of core use cases. The power users figure out which workloads are interesting, and then you can just see in the data what they're doing. And everybody can benefit from it. That's why we open up our data and share it for free on the rankings page here.
Ejaaz (27:11):
I'm seeing this one consistent unit across all these rankings, Alex, which is tokens, right? Josh and I have spoken about this on the show before, but I'm wondering why you've chosen this specific unit to measure how good or effective these models are, or how consumed or used they are. Can you tell us a bit more about why you picked this particular unit, and what it tells you, as the OpenRouter platform, about how a user is using a particular model?
Alex (27:41):
Yeah, I think dollars is a good metric too. The reason we chose tokens is primarily because we were seeing prices come down really quickly. OpenRouter has been around since the beginning of 2023, and I didn't want a model to be penalized in the rankings just because its prices were going down really dramatically. Now, there's a paradox called Jevons paradox, which is that when prices decrease, say, 10x, users' use of some component of infrastructure increases by more than 10x. But that catch-up can lag, so right after a price cut the dollar numbers might not show that growth at all.

I thought there were some other advantages to using tokens, too. Tokens don't have this penalty and don't rely on Jevons paradox, which can have a lot of lag. They're also a little bit of a proxy for time. A model that is generating a lot of tokens, and doing so for a while across a lot of users, means that a lot of people are reading those tokens and actually doing something with them. And the same goes for input. If I really want to send an enormous number of documents and the model has a really, really tiny prompt price, I think that's still valuable and something we want to see. We want to see that this model is processing an enormous number of documents; that's a use case that should show up in the rankings.

And so we decided to go with tokens. We might add dollars in the future, but tokens don't have this Jevons-paradox lag. And there wasn't anything else; nobody was doing any kind of overall analytics. We didn't see any other company even do it until Google did a few months ago, when they started publishing the total number of tokens processed by Gemini. So we'll see which use cases really need dollars, but tokens have been holding up pretty well.
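A tiny worked example of the ranking effect Alex is describing, with made-up numbers: if a model's price drops 10x and usage grows 12x, a token-based ranking registers the full 12x, while a dollar-based ranking shows only 1.2x growth.

```python
# Toy numbers illustrating why token counts were chosen over dollars for the
# rankings: a steep price cut can mask real usage growth in dollar terms.
old_price_per_mtok = 10.00   # dollars per million tokens (made-up)
new_price_per_mtok = 1.00    # after a 10x price drop

old_tokens = 1_000_000_000   # tokens served before the cut (made-up)
new_tokens = 12_000_000_000  # usage grew 12x after the cut (Jevons-style response)

old_spend = old_tokens / 1e6 * old_price_per_mtok
new_spend = new_tokens / 1e6 * new_price_per_mtok

print(f"token-based growth:  {new_tokens / old_tokens:.1f}x")  # 12.0x
print(f"dollar-based growth: {new_spend / old_spend:.1f}x")    # 1.2x
```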
Ejaaz (30:09):
Yeah, I mean, this dashboard is awesome, and I recommend anyone listening who can't see our screen to get on OpenRouter's website and check it out. I've been following it pretty rigorously for the last two weeks, Alex, and what I love is you can literally see it happen. So two weeks ago, Grok 4 got released, right? And Josh and I were making a ton of videos on it; we were using it for pretty much everything we could. And then this other model came out of China a few days after, called Kimi K2, and I was like, oh yeah, whatever, this is just some random Chinese model, I'm not going to focus on it. And then I kept seeing it in my feed, and I thought, okay, maybe I'll give this a go. And I went straight to OpenRouter to gauge the interest from a wider set of AI users, and I saw that it was skyrocketing, right? And then I saw that Qwen dropped their models last week, and again, I came to OpenRouter and it preceded the trend, right? People had already started using it. So I love how OpenRouter works as this kind of prophetic orb, basically, where the enthusiasts and the community itself can kind of front-run very popular trends. And I think that's a very powerful moat.

And on this path, Alex, I noticed that a lot of these major model providers see the value in this, right? So if I'm not mistaken, OpenAI kind of used your platform to secretly launch their frontier model before they officially launched it, right? Can you walk us through how that comes about, and more importantly, why they want to do that and why they chose OpenRouter to do it?
Alex (31:54):
OpenAI will sometimes give early access to their models to some of their customers for testing, and we asked them if they wanted to try a stealth model with us, which we had never done before. It involved launching it under another name and seeing how users respond to it without having any bias or inclination for or against the model at the onset. It would be a new way of testing it; it was an experiment for both us and them. And they generously decided to take the leap of faith and try it. We launched GPT-4.1 with them, and we called it Quasar Alpha. It was a million-token context length model, OpenAI's first very, very long context model, and it was also optimized for coding.

There were a couple of incredible things that happened. First, we have this community of benchmarkers who run open-source benchmarks, and we give a lot of them grants of OpenRouter tokens to help fund the benchmarks. They'll just run their suite of tests against all the models, and some of them are very creative. There's one that tests the ability to generate fiction. There's one that tests whether a model can build a 3D object in Minecraft, called MCBench. There are a few that test different types of coding proficiency. There's one that just focuses on how good a model is at Ruby, because it turns out a lot of the models are not great at Ruby; there are a lot of languages that all the models are pretty bad at. So we have this long tail of very niche benchmarks, and all the benchmarkers ran their benchmarks on Quasar Alpha, for free, and found pretty incredible results for most of them.

So OpenAI got this feedback in real time; we kind of helped them find it. And they made another snapshot, which we launched as Optimus Alpha, and they could compare the feedback they got from the two snapshots. And then, like two weeks later, they launched GPT-4.1 live for everybody. So it was an experiment for us, and we've done it again since, with another model provider that's still working on their model. It's kind of a cool way of crowdsourcing benchmarks that you wouldn't have expected, and also getting unbiased community sentiment.
Josh (35:09):
That's great. So now, when we see a new model pop up and we want to test GPT-5, we know where to come to try it early. We'll see, because the rumor is it's coming soon, so we're on your watch list. But I do want to ask you about open source versus closed source, because this has been an important thing for us; we talk about it a lot, and you have a ton of data on it. I'm looking at the leaderboards, and there are open-source models that are doing very well, and closed source too. What are your takes in general? How do you feel about open-source versus closed-source models, particularly around how you serve them to users?
Alex (35:40):
Both types of models have supply problems, but the supply problems are very different. Typically, what we see with closed-source models is that there are very few suppliers, usually just one or two. With Grok, for example, there's Grok direct and there's Azure. With Anthropic, there's Anthropic direct, there's Google Vertex, there's AWS Bedrock. And then we also deploy them in different regions; we have an EU deployment for customers who only want their data to stay in the EU, and we do custom deployments for the closed-source models too, just to guarantee good throughput and high rate limits for people.

The tricky part is that the closed-source models are doing most of the tokens on OpenRouter. It's dominant, probably 70 to 80 percent closed-source tokens today. But the open-source models have a much more fragmented supply, like a sell-side order book, and the rate limits for each provider are less stable on average. It usually takes a while for the hyperscalers to serve a new open-source model. So the load-balancing work that we do on open-source models tends to be a lot more valuable, while the load-balancing work that we do for closed-source models tends to be very focused on caching and feature awareness, making sure you're getting clean cache hits and only transitioning over to new providers when your cache has expired. For open-source models, there's way less caching; very, very few open-source models implement caching, and so switching between providers becomes more common.

We also track a lot of quality differences between the open-source providers. Some of them will deploy at lower quantization levels, which is kind of like a way of compressing the model. It generally doesn't have an impact on the quality of the output, but we still see some odd things from some of the open-source providers. So we run tests internally to detect those outputs, and we're building up a lot more muscle here soon, so that they get pulled out of the routing lane and don't affect anyone.
Josh (38:33):
So closed source accounts for 80 percent or something like that, a very large amount. Do you see that changing? Because that post we just had said nine out of the ten fastest-growing LLMs last week were open source. And every time it seems like China comes out with another model (it was Kimi K2 a week or two ago), it really pushes the frontier of open source forward. And the rate of acceleration of open source seems to be as fast, if not faster, than closed source; it's making these improvements very quickly, and it has the benefit of being able to compound in speed because it's open source and everyone can contribute. Do you think that starts to change, where more of the tokens you're serving come from open-source models versus closed source? Or do you continue to see a trend where it's going to be Google, it's going to be OpenAI, that are serving a majority of these tokens to users?
Alex (39:20):
In the short term, we're likely to see open-source models continue to dominate the fastest-growing model category on OpenRouter. And the reason for that is that a lot of users come for a closed-source model but then decide they want to optimize later: either they want to save on costs, or they want to try out a new model that's supposed to be a little bit better in some direction their app or use case cares about. Then they leave the closed-source model and go to an open-source model. So open source tends to be a last-mile optimization thing; that's a big generalization, because the reverse can happen too. And because it's a last-mile optimization thing, the jump from "this model is not being used at all" to "this model is really being used by a couple of people who have left Claude 4 and want to try some new coding use case" will be bigger than for the closed-source models, which start at a really high base and don't have growth quite as dramatic. But the other part of your question was whether there's going to be a flippening of...
Josh (40:43):
Closed source, or some sort of chipping away at that monopoly of closed-source tokens.
Alex (40:49):
It's hard to predict these things, because I think the biggest problem today with open-source models is that the incentives are not as strong for the model lab and the model provider. There are sort of established incentives for how to grow as a company and attract good, high-quality AI talent, and giving the model weights away impairs those incentives. Now, this is where we might see decentralized providers helping in the future. A really good incentive scheme that allows high-quality talent to work on an open-source model, one that remains open-weights at least, could fix this. I try to stay close to the decentralized providers and learn a lot from them. On the provider side, on running inference, I think there are some really cool incentive schemes being worked on. But on actually developing the models themselves, I haven't seen too much, unfortunately. So if we see one, a flippening is on the radar; until we do, I personally doubt it.
Josh (42:20):
TBD. Do you have personal takes on how you feel about open source versus closed source? Because this has been a huge topic we've been debating too, just the ethical concerns around alignment in closed-source models versus open source. When you look at the competitors, China, generally speaking, is associated with open source, whereas the United States is generally associated with closed source. And we saw Llama and Meta release the open-source models, but now they're raising a ton of money to pay a lot of employees a lot of money to probably develop a closed-source model. So it seems like the trends are kind of split between the US and China. And I'm curious if you have any personal takes, even outside of OpenRouter, on which you think serves better for the long-term outlook on, I mean, the position of the United States, or just the general safety and alignment conversation around AI?
Alex (43:06):
I mean, a very simple, fundamental difference between the two is that an innovation in open-source models can be copied more quickly than an innovation in closed-source models. So in terms of velocity, and how far ahead one is over the other, that is a massive structural difference. That means closed-source models should theoretically always be ahead, until a really interesting incentive scheme develops, like I mentioned before, and I don't see evidence that that's going to change.

In terms of China versus the US, I think it's very interesting that China has not had a major closed-source model, and I don't really see a great reason why; I'm not aware of any reason that won't change in the future. My prediction is that there's going to be a closed-source model from China. It's possible that DeepSeek and Moonshot and Qwen have built up really sticky talent pools, but generally with talent pools, after enough years have passed, people quit and go create new companies and build new talent pools. So we should see some of that. It's not the case that the AI space has the NDAs or non-competes that the hedge fund space has. That might happen in the future too, but assuming the current non-compete culture continues, there should be more companies that pop up in China over time, and I'm betting that some of them will be closed source. My guess is that the two nations will start to look more similar.
Ejaaz (45:18):
Yeah, I guess that's why you have Zuck dishing out $300 million to billion-dollar salary offers to a bunch of these guys, right? One more question on China versus the US. I kind of agree with you; I didn't really expect China to be the one to lead open source in anything, let alone the most important technology of our time. What do you think is their secret sauce for building these models, Alex? And I know this might be outside the forte of OpenRouter specifically, but as someone who has studied this technology for a while now, I'm struggling to figure out what advantage they had. They're discovering all these new techniques, and maybe the simple answer is constraints, right? They don't have access to all of Nvidia's chips, they don't have access to infinite compute, so maybe they're forced to figure out other ways around the same kinds of problems that Western companies are focused on. But it's pretty clear that America, with all its funding, hasn't been able to make these frontier breakthroughs. So I'm curious whether you're aware of, or know of, some kind of technical moat that Chinese AI researchers, or these AI teams featuring on OpenRouter day in and day out, have over the US.
Alex (46:41):
Well, I don't know. There are certainly some techniques they've come up with; DeepSeek had a lot of very cool inference innovations that they published in their paper. But a lot of what they published in the original R1 paper were things that OpenAI had done independently themselves many months before. On the inference side, and on some of the model side, we had talked to the DeepSeek team for years before R1 came out. They had many models before that, and they were always a pretty sharp team at optimizing inference. They came up with the best user experience for caching prompts long before DeepSeek R1 came out, and they had very good pricing. They were by far the strongest Chinese team we were aware of, well before that happened. So I'm guessing there was some talent accumulation they were working on in China, for people who wanted to stay in China, and that's a huge advantage; American companies are obviously not doing that.

And Zuck is very on point that a lot of this is just based on talent. A lot of AI is open and out there, and very composable, like a big tree of knowledge. A paper comes out and it cites 20 other papers, and you can go and read all the cited papers, and then you kind of have a basis for understanding the paper. But you really have to go one level deeper and read the cited papers two levels down to really understand what's going on, and very few people can do that. It takes a lot of years of experience to actually apply that knowledge and learn all the things that have not been written in any paper at all. There's just such a small number of people who can really lead research on all the different dimensions that go into making a model. And the border between China and the US is pretty defined; you have to leave China, move to the US, and really establish yourself here. So I do think there's country arbitrage, there's the hedge-fund-background arbitrage, there's hardware arbitrage; there's a ton of hardware that's only available in China and not here, and vice versa, and that creates an opportunity. And this will just continue to happen.
Ejaaz:
Yeah, I think this arbitrage is fascinating. (49:33):
undefined
Ejaaz:
I read somewhere that there are probably fewer than 200 or 250 researchers in the (49:37):
undefined
Ejaaz:
world that are worthy of working at some of these frontier AI model labs. (49:44):
undefined
Ejaaz:
And I looked into some of the backgrounds of the team behind Kimi K2, (49:49):
undefined
Ejaaz:
which is this recent open source model out of China, which broke all these crazy rankings. (49:54):
undefined
Ejaaz:
I think it was like a trillion parameter model or something crazy like that. (50:01):
undefined
Ejaaz:
And a lot of them worked at some of the top American tech companies. (50:05):
undefined
Ejaaz:
And they all graduated from this one university in China. (50:08):
undefined
Ejaaz:
I think it's Tsinghua, which apparently is like, you know, the Harvard of AI (50:11):
undefined
Ejaaz:
in China, right? So pretty crazy. (50:16):
undefined
Ejaaz:
But Alex, I wanted to shift the focus of the conversation to a point that you (50:19):
undefined
Ejaaz:
brought up earlier in this episode, which is around data. (50:25):
undefined
Ejaaz:
Okay, so here's the context that like Josh and I have spoken about this at length, right? (50:29):
undefined
Ejaaz:
We are obsessed with this feature on OpenAI, which is memory, right? (50:35):
undefined
Ejaaz:
And I know a lot of the other memory, sorry, a lot of the other AI models have memory as well. (50:39):
undefined
Ejaaz:
But the reason why we love it so much is I feel like the model knows me, Alex. (50:44):
undefined
Ejaaz:
I feel like it knows everything about me. It can personally curate any of my prompts. (50:49):
undefined
Ejaaz:
It just gets me. It knows what I want, and it just serves it up to me on a platter, (50:54):
undefined
Ejaaz:
and off I go, you know, doing my thing. (50:59):
undefined
Ejaaz:
Now, OpenRouter sits on top of, kind of, the query layer, right? (51:02):
undefined
Ejaaz:
So you have all these people writing all these weird and wonderful prompts, and (51:07):
undefined
Ejaaz:
kind of routing them through to different AI models. (51:12):
undefined
Ejaaz:
You hold all of that data or maybe you have access to all of that data. (51:18):
undefined
Ejaaz:
And I know you have something called private chat as well, where you don't have access to it. (51:21):
undefined
Ejaaz:
Talk to me about like what OpenRouter and what you guys are thinking about doing (51:25):
undefined
Ejaaz:
with this data, because presumably, (51:29):
undefined
Ejaaz:
or in my opinion, you guys actually have the best moat, arguably better than (51:31):
undefined
Ejaaz:
ChatGPT, because you have all these different types of prompts coming from all (51:36):
undefined
Ejaaz:
these different types of users for all these different types of models. (51:40):
undefined
Ejaaz:
So theoretically, you could spin up some of the most personal AI models for (51:43):
undefined
Ejaaz:
each individual user if you wanted to. (51:47):
undefined
Ejaaz:
Do I have that correct? Or am I, you know, speaking crazy? (51:49):
undefined
Alex:
No, that's true. No, it's something we're thinking about. (51:54):
undefined
Alex:
By default, your prompts are not logged at all. (52:00):
undefined
Alex:
We don't have prompts or completions for new users by default. (52:05):
undefined
Alex:
You have to toggle it on in settings. (52:09):
undefined
Alex:
But a lot of people do toggle it on. And as a result, (52:15):
undefined
Alex:
I think we have by far the largest multi-model prompt data set. (52:21):
undefined
Alex:
Uh, but what we've done today, we've barely done anything with it. (52:26):
undefined
Alex:
We classify a tiny, tiny, tiny subset of it. And that's what you see in the rankings page. (52:32):
undefined
Alex:
But what could be done on a per-account level is really (52:37):
undefined
Alex:
like three main things. (52:43):
undefined
Alex:
One, memory right out of the box. You can get this today by combining (52:45):
undefined
Alex:
OpenRouter with a memory-as-a-service. We've got a couple of companies (52:51):
undefined
Alex:
that do this, like Mem0 and SuperMemory. (52:56):
undefined
Alex:
And we can partner with one of those companies or do something similar and just (52:59):
undefined
Alex:
provide a lot of distribution. (53:03):
undefined
Alex:
And that basically gets you a chat GPT as a service where it feels like the (53:05):
undefined
Alex:
model really knows you and the right context gets added to your prompt. (53:09):
undefined
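To make that concrete, here is a minimal sketch of what pairing OpenRouter with a memory layer could look like. OpenRouter's OpenAI-compatible chat completions endpoint is real, but the tiny in-process memory store and the model slug below are only illustrative assumptions, a stand-in for a hosted memory service like Mem0 or SuperMemory, whose actual SDKs differ.

```python
# Sketch: "memory as a service" layered on top of OpenRouter's OpenAI-compatible API.
# The memory store here is a hypothetical stand-in, not Mem0's or SuperMemory's real SDK.
import os
import requests

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"
API_KEY = os.environ["OPENROUTER_API_KEY"]

# Per-user facts that get prepended to every prompt as extra context.
user_memories: dict[str, list[str]] = {}

def remember(user_id: str, fact: str) -> None:
    user_memories.setdefault(user_id, []).append(fact)

def recall(user_id: str) -> str:
    return "\n".join(user_memories.get(user_id, []))

def chat(user_id: str, prompt: str, model: str = "openai/gpt-4o") -> str:
    messages = []
    context = recall(user_id)
    if context:
        # Inject remembered facts so the model "knows you" on every request.
        messages.append({"role": "system",
                         "content": f"Known facts about this user:\n{context}"})
    messages.append({"role": "user", "content": prompt})
    resp = requests.post(
        OPENROUTER_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"model": model, "messages": messages},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

remember("ejaaz", "Prefers concise answers and is building AI agents.")
print(chat("ejaaz", "What should I read this week?"))
```

A real memory service would also extract new facts from each conversation and retrieve them semantically rather than prepending everything, but the shape of the integration is the same: look up context, add it to the messages, route the call through OpenRouter.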
Alex:
The other things that we can do are help you select the right model more intelligently. (53:16):
undefined
Alex:
There are a lot of models where there's a super clear migration decision that needs to be made. (53:25):
undefined
Alex:
And, and we can just see this very clearly in the data. (53:33):
undefined
Alex:
But right now, if we have some (53:36):
undefined
Alex:
kind of communication channel open with the customer, we can just tell them, (53:41):
undefined
Alex:
hey, we know you're using this model a ton. (53:44):
undefined
Alex:
It's been deprecated, this model is significantly better, you (53:48):
undefined
Alex:
should move this kind of workload over to it. Or, (53:51):
undefined
Alex:
for this workload, you'll get way better pricing if you do this. (53:54):
undefined
Alex:
And that's basically the (53:58):
undefined
Alex:
only sort of guidance and (54:01):
undefined
Alex:
opinionated routing we've done so far, and it could (54:04):
undefined
Alex:
be a lot more intelligent, a lot more out of the box, a lot more (54:06):
undefined
Alex:
built into the product. And then (54:09):
undefined
Alex:
the last thing (54:13):
undefined
Alex:
we can do, I mean, there are probably tons of (54:16):
undefined
Alex:
things we're not even thinking about, but (54:19):
undefined
Alex:
the last one is getting really, (54:22):
undefined
Alex:
really smart about how (54:26):
undefined
Alex:
models and providers are responding to prompts and (54:30):
undefined
Alex:
showing you just the coolest (54:33):
undefined
Alex:
data, just telling you (54:37):
undefined
Alex:
what kinds of prompts are (54:40):
undefined
Alex:
going to which models and how those models are replying, and (54:44):
undefined
Alex:
characterizing the reply in all kinds of interesting ways. (54:48):
undefined
Alex:
Like, did the model refuse to answer? What's the refusal rate? (54:51):
undefined
Alex:
Did the (54:54):
undefined
Alex:
model successfully make a tool call, or did it decide to (54:57):
undefined
Alex:
ignore all the tools that you passed in? That's a huge one. (55:00):
undefined
Alex:
Did the model pay (55:02):
undefined
Alex:
attention to its context? Did some kind of truncation (55:06):
undefined
Alex:
happen before you sent it to the model? So there are all kinds of (55:12):
undefined
Alex:
edge cases that cause developers' apps to just get dumber, and they're all detectable. (55:16):
undefined
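As a sketch of the kind of per-response diagnostics being described here, the snippet below inspects a single chat completion in the OpenAI-compatible shape that OpenRouter returns and flags a few of the failure modes mentioned: refusals, ignored tools, and truncated output. The heuristics are illustrative assumptions, not OpenRouter's actual analytics.

```python
# Sketch: flag common "silent failure" modes in one OpenAI-compatible chat completion.
def diagnose(response: dict, tools_were_passed: bool) -> dict:
    choice = response["choices"][0]
    msg = choice.get("message", {})
    content = (msg.get("content") or "").lower()
    finish = choice.get("finish_reason")

    return {
        # Crude refusal heuristic: look for common refusal phrasing in the reply.
        "looks_like_refusal": any(p in content for p in
                                  ("i can't help", "i cannot help", "i'm sorry, but")),
        # Tools were supplied, but the model answered without calling any of them.
        "ignored_tools": tools_were_passed and not msg.get("tool_calls"),
        # finish_reason == "length" usually means the output hit the token limit.
        "output_truncated": finish == "length",
    }
```

Run over a stream of logged responses, even simple checks like these yield refusal rates, tool-use rates, and truncation rates per model, which is the kind of characterization being described.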
Ejaaz:
I'm so happy you said that because I have this kind of like hot take, (55:25):
undefined
Ejaaz:
but maybe not so hot take, which is I actually think all the Frontier models (55:31):
undefined
Ejaaz:
right now are good enough to do the craziest stuff ever for each user. (55:35):
undefined
Ejaaz:
But we just haven't been able to unlock it because it just doesn't have the context. (55:40):
undefined
Ejaaz:
Sure, you can attach it to a bunch of different tools and stuff, (55:44):
undefined
Ejaaz:
but if it doesn't know when to use the tool or how to process a certain prompt (55:48):
undefined
Ejaaz:
or if the users themselves don't know how to read (55:53):
undefined
Ejaaz:
the output of the AI model themselves, like you just said, we need some kind (55:57):
undefined
Ejaaz:
of analytics into all of this, (56:01):
undefined
Ejaaz:
then we're just kind of walking around like headless chickens almost. (56:03):
undefined
Ejaaz:
So I'm really happy that you said that. One other thing that I wanted to get (56:07):
undefined
Ejaaz:
your take on on the data side of things is, I just think this whole concept (56:11):
undefined
Ejaaz:
or notion of AI agents is becoming such a big trend, Alex. (56:15):
undefined
Ejaaz:
And I noticed a lot of Frontier Model Labs release new models that kind of spin (56:21):
undefined
Ejaaz:
up several instances of their AI model. (56:28):
undefined
Ejaaz:
And they're tasked with a specific role, right? (56:30):
undefined
Ejaaz:
Okay, you're going to do the research. You're going to do the orchestrating. (56:33):
undefined
Ejaaz:
You're going to look online via a browser, blah, blah, blah, (56:37):
undefined
Ejaaz:
blah, blah. And then they coalesce together at the end of that little search (56:40):
undefined
Ejaaz:
and refine their answer and then present it to someone, right? (56:44):
undefined
Ejaaz:
You know, Grok4 does this, Claude does this, and a few other models. (56:48):
undefined
Ejaaz:
I feel like with this data that you're describing, OpenRouter could be or could (56:52):
undefined
Ejaaz:
offer that as a feature, right? (56:58):
undefined
Ejaaz:
Which is essentially, you can now have super intuitive, context-rich agents (57:00):
undefined
Ejaaz:
that can do a lot more than just talk to you or answer your prompts. (57:05):
undefined
Ejaaz:
But they could probably do a bunch of other actions for you. (57:08):
undefined
Ejaaz:
Is that a fair take, or is that something that might be out of the realm of OpenRouter? (57:11):
undefined
Alex:
Our strategy is to be the best inference layer for agents. (57:18):
undefined
Alex:
And what I think developers want... (57:23):
undefined
Alex:
Is control over how their agents work. (57:30):
undefined
Alex:
And our developers at least want to use us as a single pane of glass for doing (57:35):
undefined
Alex:
inference, but they want to see and control the way an agent looks. (57:42):
undefined
Alex:
An agent is basically just something (57:48):
undefined
Alex:
that is doing inference in a loop and controlling the direction it goes. (57:52):
undefined
Alex:
So what (57:57):
undefined
Alex:
we want to do is just build incredible docs, (58:01):
undefined
Alex:
really good primitives that make that easy (58:04):
undefined
Alex:
to do. I think (58:08):
undefined
Alex:
a lot of our developers are just people building agents, and so (58:11):
undefined
Alex:
what they want is they want the primitives to (58:14):
undefined
Alex:
be solved so that they can just keep creating new (58:17):
undefined
Alex:
versions and new ideas without worrying (58:20):
undefined
Alex:
about re-implementing tool calling over (58:24):
undefined
Alex:
and over again. (58:27):
undefined
Alex:
And it's (58:31):
undefined
Alex:
a tough problem, given how many models there are; there's a new model or provider every (58:34):
undefined
Alex:
day, and people actually want them and use them. So to standardize this, (58:37):
undefined
Alex:
to make these tools really dependable, that's kind of where (58:44):
undefined
Alex:
we want to focus, so that agent developers don't have to worry about it. (58:49):
undefined
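As a rough illustration of the "inference in a loop" framing above, here is a minimal agent loop against an OpenAI-compatible endpoint such as OpenRouter's. The single search_docs tool, its stub implementation, and the model slug are illustrative assumptions, not primitives OpenRouter ships.

```python
# Sketch: an agent is just inference in a loop -- call the model, execute any tool
# calls it asks for, feed the results back, and stop when it answers directly.
import json
import os
import requests

URL = "https://openrouter.ai/api/v1/chat/completions"
HEADERS = {"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"}

TOOLS = [{
    "type": "function",
    "function": {
        "name": "search_docs",
        "description": "Search internal docs and return a short snippet.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

def search_docs(query: str) -> str:
    # Stand-in tool implementation for the sketch.
    return f"No results found for '{query}' (stub)."

def run_agent(task: str, model: str = "anthropic/claude-3.5-sonnet", max_steps: int = 5) -> str:
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        resp = requests.post(URL, headers=HEADERS, json={
            "model": model, "messages": messages, "tools": TOOLS,
        }, timeout=120).json()
        msg = resp["choices"][0]["message"]
        messages.append(msg)
        tool_calls = msg.get("tool_calls") or []
        if not tool_calls:
            return msg.get("content", "")  # no tool requested: the loop is done
        for call in tool_calls:
            args = json.loads(call["function"]["arguments"])
            messages.append({
                "role": "tool",
                "tool_call_id": call["id"],
                "content": search_docs(**args),  # only one tool in this sketch
            })
    return "Stopped after max_steps without a final answer."

print(run_agent("Find what our docs say about rate limits."))
```

Everything agent-specific lives in the loop and the tool definitions; the inference call itself stays a plain, provider-agnostic request, which is the part OpenRouter standardizes.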
Josh:
As we level up closer and closer to AGI and beyond, (58:54):
undefined
Josh:
I'm curious what Open Router's kind of endgame is. (58:58):
undefined
Josh:
If you have one, what is the master plan where you hope to end up? (59:00):
undefined
Josh:
Because the assumption is as these systems get more intelligent, (59:03):
undefined
Josh:
as they're able to kind of make their own decisions and choose their own tool (59:06):
undefined
Josh:
sets, what role does Open Router play in continuing to route that data through? (59:09):
undefined
Josh:
Do you have a kind of master plan, a grand vision of where you see this all heading to? (59:15):
undefined
Alex:
You're saying, as agents get better at choosing the tools that they use, (59:19):
undefined
Alex:
what becomes our role when the agents are really good at that, yes? (59:24):
undefined
Josh:
Yes. And where do you see OpenRouter fitting into the picture, and what (59:31):
undefined
Josh:
would be the best-case scenario for this future of OpenRouter? (59:34):
undefined
Alex:
Right now, OpenRouter is a bring-your-own-tool (59:38):
undefined
Alex:
platform. We don't have a (59:42):
undefined
Alex:
marketplace of MCPs yet. And (59:45):
undefined
Alex:
I do think most of the most-used tools will (59:49):
undefined
Alex:
be ones that developers configure themselves, and agents just work with whatever they're given (59:55):
undefined
Alex:
access to. I think there's a holy grail for OpenRouter here. (01:00:00):
undefined
Alex:
Basically, my (01:00:07):
undefined
Alex:
prediction for how the ecosystem is going to evolve is that (01:00:12):
undefined
Alex:
all the models are going to be adding state and (01:00:16):
undefined
Alex:
other kinds of stickiness that just make you want to stick (01:00:19):
undefined
Alex:
with them. So they're going to add server-side tool calls, (01:00:22):
undefined
Alex:
they're going to add web search that is stateful, they're (01:00:25):
undefined
Alex:
going to add memory. They're going to add all kinds of things that try to prevent (01:00:31):
undefined
Alex:
developers from leaving and increase lock-in. (01:00:35):
undefined
Alex:
And OpenRouter is doing the opposite. (01:00:42):
undefined
Alex:
We want developers to not feel vendor lock-in. (01:00:45):
undefined
Alex:
We want them to feel like they have choice and they can use the best intelligence, (01:00:49):
undefined
Alex:
even if they didn't before. (01:00:53):
undefined
Alex:
It's never too late to switch to a more intelligent model. That would be like, (01:00:56):
undefined
Alex:
you know, a good always on outcome for us. (01:01:00):
undefined
Alex:
And so what I think we'll end up doing is partnering with other companies, (01:01:05):
undefined
Alex:
or building the tools ourselves if we have to, so that developers don't feel stuck. (01:01:13):
undefined
Alex:
That's how I, you know, there's a lot of ways the ecosystem could evolve, (01:01:20):
undefined
Alex:
but that's how I would put it in a nutshell. (01:01:23):
undefined
Josh:
Okay, now there's another personal question that I was really curious about, (01:01:25):
undefined
Josh:
because I was also right there with you in the crypto cycle when NFTs got absolutely (01:01:29):
undefined
Josh:
huge, was a big user of OpenSea. (01:01:34):
undefined
Josh:
And it was kind of this trend that went up and then went down. (01:01:36):
undefined
Josh:
And NFTs kind of fizzled out, it wasn't as hot anymore, and AI kind of took the wind out of their sails. (01:01:39):
undefined
Josh:
And it's a completely separate audience, but a similar thing where now it's (01:01:44):
undefined
Josh:
the hottest thing in the world. (01:01:48):
undefined
Josh:
And I'm curious how you see the trend continuing. Is this a cyclical thing that (01:01:49):
undefined
Josh:
has ups and downs, or is this a one-way trajectory of more tokens every day, more (01:01:54):
undefined
Josh:
AI every day? Do you see it being a cyclical thing, or is this a one-way (01:01:59):
undefined
Josh:
trend towards up and to the right? (01:02:03):
undefined
Alex:
NFTs kind of follow crypto in an (01:02:06):
undefined
Alex:
indirect way. When crypto (01:02:12):
undefined
Alex:
has ups and downs, NFTs generally lag a bit, (01:02:15):
undefined
Alex:
but they have similar ups and downs. And (01:02:18):
undefined
Alex:
crypto is an extremely long-term play on building a new financial system, (01:02:22):
undefined
Alex:
and there are so many reasons that it's not going to happen overnight. And they're (01:02:30):
undefined
Alex:
very, very entrenched reasons. (01:02:40):
undefined
Alex:
Whereas AI, there are some overnight business transformations going on. (01:02:43):
undefined
Alex:
And the reason AI, I think, moves a lot, one of the reasons that AI moves a (01:02:49):
undefined
Alex:
lot faster is it's just about making computers behave more like humans. (01:02:55):
undefined
Alex:
So if a company already works with a bunch of humans, then there's, (01:02:59):
undefined
Alex:
you know, there's some engineering that needs to be done. (01:03:05):
undefined
Alex:
There's some thinking about how (01:03:07):
undefined
Alex:
to scale this. But (01:03:10):
undefined
Alex:
in general, I think that, (01:03:14):
undefined
Alex:
after seeing what can be possible, inference (01:03:17):
undefined
Alex:
will be the fastest-growing operating expense for all companies. (01:03:21):
undefined
Alex:
It'll be like, oh, we can just hire (01:03:24):
undefined
Alex:
high-performing employees at the click of a button, (01:03:27):
undefined
Alex:
and they work 24/7, they (01:03:37):
undefined
Alex:
scale elastically. (01:03:41):
undefined
Alex:
It's not that hard, it's not a huge mental (01:03:44):
undefined
Alex:
model shift, it's just a huge upgrade to the way companies work today (01:03:47):
undefined
Alex:
in most cases. So it's just completely different from crypto and NFTs. (01:03:51):
undefined
Alex:
Other than both being new, (01:03:55):
undefined
Alex:
they're fundamentally very different changes. (01:04:00):
undefined
Ejaaz:
You're probably one of very few people in the world right now that has crazy (01:04:04):
undefined
Ejaaz:
insights to every single AI model. (01:04:11):
undefined
Ejaaz:
Definitely more than the average user, right? Like I have like three or four (01:04:13):
undefined
Ejaaz:
subscriptions right now and I think I'm a hotshot. (01:04:17):
undefined
Ejaaz:
You get access to, like, 400... what is it, 457 models right now on OpenRouter. (01:04:20):
undefined
Ejaaz:
So an obvious question that I have for you is (01:04:25):
undefined
Ejaaz:
I'm not going to say in the next couple of years, because everything moves way (01:04:29):
undefined
Ejaaz:
too quickly in this sector. (01:04:33):
undefined
Ejaaz:
But over the next six months, is there anything really obvious to you that should (01:04:34):
undefined
Ejaaz:
be focused on within the AI sector? (01:04:41):
undefined
Ejaaz:
Maybe it's like the way that certain models should be designed, (01:04:44):
undefined
Ejaaz:
or perhaps it's at the application layer that no one's talking about right now. (01:04:47):
undefined
Ejaaz:
Because going on from our earlier part of the conversation, you just pick these (01:04:52):
undefined
Ejaaz:
trends out really early. and I'm wondering if you see anything. (01:04:56):
undefined
Ejaaz:
It doesn't have to be OpenRouter-related. It could just be AI-related. (01:04:59):
undefined
Alex:
I've seen the models trending towards caring more about how resourceful they (01:05:03):
undefined
Alex:
are than what knowledge they have in the bank. (01:05:11):
undefined
Alex:
Not all of them. I feel like a lot of the applications, and I think the model labs maybe, (01:05:14):
undefined
Alex:
a lot of them, I don't know how many of them really deeply believe that, (01:05:21):
undefined
Alex:
but a couple of them talk about it, and I don't think it's really hit the (01:05:24):
undefined
Alex:
application space yet. Because people will ask ChatGPT things, and if (01:05:31):
undefined
Alex:
the knowledge is wrong, they think the model is stupid, (01:05:37):
undefined
Alex:
and that's just kind of a bad way of evaluating a model. (01:05:40):
undefined
Alex:
Whatever knowledge a person has, whatever (01:05:44):
undefined
Alex:
a person can recall at a certain time, (01:05:48):
undefined
Alex:
is not a proxy for how smart they are. (01:05:51):
undefined
Alex:
The intelligence and usefulness of a model (01:05:55):
undefined
Alex:
is going to trend towards how good it is at using tools, and (01:05:57):
undefined
Alex:
how good it is at paying (01:06:01):
undefined
Alex:
attention to its context over a very long context, (01:06:05):
undefined
Alex:
its total memory capacity and accuracy. So I think those two things need (01:06:11):
undefined
Alex:
to be emphasized more. (01:06:17):
undefined
Alex:
It might be that models pull all (01:06:23):
undefined
Alex:
of their knowledge from online databases, (01:06:26):
undefined
Alex:
from real-time scraped (01:06:30):
undefined
Alex:
indices of the web, along with a (01:06:34):
undefined
Alex:
ton of real-time updating data sources, and (01:06:36):
undefined
Alex:
they're always kind of relying on some sort of database (01:06:40):
undefined
Alex:
for knowledge, but relying on their reasoning process for tool calling. You (01:06:44):
undefined
Alex:
know, we spend probably the plurality of our time every week (01:06:49):
undefined
Alex:
on tool calling and figuring out how to make it work really well. (01:06:56):
undefined
Alex:
Humans, the big difference between us and animals is that we're tool users and tool builders. (01:06:59):
undefined
Alex:
And that's where human acceleration and innovation has happened. (01:07:06):
undefined
Alex:
So how do we get models creating tools and using tools very, (01:07:11):
undefined
Alex:
very effectively? There's very little, (01:07:19):
undefined
Alex:
There are very few benchmarks. There's very little priority. (01:07:22):
undefined
Alex:
There's the Tau Bench for measuring how good a model is at tool calling. (01:07:25):
undefined
Alex:
And there are maybe a few others. (01:07:29):
undefined
Alex:
There's SWE-bench for measuring how good a model is at multi-turn programming tasks. (01:07:32):
undefined
Alex:
It's very, very hard to run, though. It costs like, you know, (01:07:39):
undefined
Alex:
for Sonnet, it could cost like $1,000 to run it. (01:07:43):
undefined
Alex:
And it's like the user experience for kind of like evaluating the real intelligence (01:07:48):
undefined
Alex:
of these models is not good. (01:07:54):
undefined
Alex:
And so, as much as we don't have benchmarks listed on OpenRouter (01:07:56):
undefined
Alex:
today, I love benchmarks. (01:08:00):
undefined
Alex:
And I think like the app ecosystem and like developer ecosystem should spend (01:08:02):
undefined
Alex:
a lot more time making very cool and interesting ones. (01:08:07):
undefined
Alex:
Also, we will give credit grants for all the best ones. So I highly encourage it. (01:08:10):
undefined
Ejaaz:
Well, Alex, thank you for your time today. I think we're coming up on a close (01:08:17):
undefined
Ejaaz:
now. That was a fascinating conversation, man. (01:08:22):
undefined
Ejaaz:
And I think your entire journey from just non-AI stuff, so OpenSea all the way (01:08:25):
undefined
Ejaaz:
to OpenRouter has just been a great indicator of where these technologies are (01:08:31):
undefined
Ejaaz:
progressing and more importantly, where we're going to end up. (01:08:37):
undefined
Ejaaz:
I'm incredibly excited to see where OpenRouter goes beyond just prompt routing. (01:08:39):
undefined
Ejaaz:
I think some of the stuff you spoke about on the data side of things is going (01:08:45):
undefined
Ejaaz:
to be fascinating and arguably one of your bigger features. So I'm excited for future releases. (01:08:49):
undefined
Ejaaz:
And as Josh said earlier, if GPT-5 is releasing through your platform first, (01:08:53):
undefined
Ejaaz:
please give us some credits. We would love to use it. (01:08:59):
undefined
Ejaaz:
But for the listeners of this show, as you know, we're trying to bring on the (01:09:03):
undefined
Ejaaz:
most interesting people to chat about AI and Frontier Tech. We hope you enjoyed this episode. (01:09:07):
undefined
Ejaaz:
And as always, please like, subscribe, and share it with any of your friends (01:09:14):
undefined
Ejaaz:
who would find this interesting. And we'll see you on the next one. Thanks, folks. (01:09:18):
undefined