
July 17, 2025 • 47 mins

Jakob Uszkoreit is the CEO and co-founder of Inceptive, a biotech start-up. He’s also a co-author of “Attention Is All You Need,” the paper that created transformer models. Today, transformers power chatbots like ChatGPT and Claude. They’ve also led to breakthroughs in everything from generating images to predicting the structure of proteins.

On today’s show, Jakob talks about the invention of transformer models. And he discusses how he’s using those models to try to invent new kinds of medicine, with a particular focus on RNA.



Episode Transcript

Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Speaker 1 (00:15):
Pushkin.

Speaker 2 (00:20):
If I were going to pick one paper from the
past decade that had the biggest impact on the world,
I would choose one called Attention Is All You Need,
published in twenty seventeen. That paper basically invented transformer models.
You've almost certainly used a transformer model if you have
used ChatGPT or Gemini or Claude or DeepSeek.

(00:42):
In fact, the T in ChatGPT stands for transformer,
and transformer models have turned out to be wildly useful,
not just at generating language, but also at everything from
generating images to predicting what proteins will look like.

Speaker 1 (00:58):
In fact, transformers.

Speaker 2 (01:00):
Are so ubiquitous and so powerful that it's easy to
forget that some guy just thought them up.

Speaker 1 (01:06):
But in fact, some guy did.

Speaker 2 (01:08):
Just think up transformers, and I'm talking to him today
on the show. I'm Jacob Goldstein and this is What's
Your Problem, the show where I talk to people who
are trying to make technological progress. My guest today is
Jakob Uszkoreit. And just to be clear, Jakob was one

(01:28):
of several co-authors on that transformer paper, and on
top of that, lots of other researchers were working on
related things at the same time, so a lot of
people were working on this, but the key idea did
seem to come from Jakob. Today, Jakob is the CEO
of Inceptive. That's a company that he co founded to
use AI to develop new kinds of medicine, and the

(01:51):
company is particularly focused on RNA. We talked about his
work at Inceptive in the second part of our conversation.
In the first part, we talked about his work on
transformer models. At the time he started working on the
idea for transformers, this is around a decade ago now,
there were a couple of big problems with existing language models.

(02:11):
For one thing, they were slow. They were in fact
so slow that they could not even keep up with
all the new training data that was becoming available. A
second problem, they struggled with what are called long range dependencies.
Basically in language, that's relationships between words that are far
apart from each other in a sentence. So to start,

(02:32):
I asked Jakob for an example we could use to
discuss these problems and also how he came up with
his big idea for how to solve them. So, pick
a sentence that's going to be a good object lesson
for us.

Speaker 1 (02:44):
Okay, so we could have the frog didn't cross the
road because it was too tired. Okay, so we got
our sentence. Yep.

Speaker 2 (02:52):
How would the sort of big, powerful but slow to
train algorithm in twenty fifteen.

Speaker 1 (02:59):
Have processed that sentence? So basically it would have walked
through that sentence word by word, and so it would
walk through the sentence left to right. The frog did
not cross the road because it was too tired.
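To make that sequential bottleneck concrete, here is a toy sketch invented for this transcript (not the actual 2015-era model, which used learned weight matrices rather than this scalar update): an RNN-style loop where each step depends on the previous hidden state, so the words cannot be processed in parallel.

```python
# Toy sketch of sequential, left-to-right processing (illustrative only).
words = "the frog did not cross the road because it was too tired".split()

hidden = 0.0  # toy stand-in for the model's hidden state vector
steps = 0
for word in words:
    # each update needs the PREVIOUS hidden state, so step t must wait
    # for step t-1 to finish: no parallelism across the sentence
    hidden = 0.9 * hidden + 0.1 * len(word)
    steps += 1

print(steps)  # one sequential step per word: 12
```

The point is the data dependency: because each iteration consumes the result of the last one, a GPU's thousands of parallel lanes sit idle while the loop crawls along the sentence.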

Speaker 2 (03:15):
Which is logical, which is how I would think a
system would work.

Speaker 1 (03:18):
It's more or less how we read, right, it's how
we read, but it's not necessarily how we understand. Uh huh.
That is actually one of the integral insights,
I would say, for how we then went about trying
to speed it all up.

Speaker 2 (03:32):
Well, I love that. I want you to say more
about it. When you say it's not how we understand,
what do you mean?

Speaker 1 (03:38):
So, on one hand, right, linearity of time forces us
to almost always feel that we're communicating language in order
and just linearly. It actually turns out that that's not
really how we read, not even in terms of our saccades,
in terms of our eye movements. We actually do jump

(03:59):
back and forth quite a bit while reading, and if
you look at conversations, you also have highly nonlinear elements
where there's repetition, there's reference, there's basically different flavors of interruption.
But sure, by and large, right, we would say we
certainly write them left to right. So if you
write a proper text, you don't write it as you

(04:20):
would read it, and you also don't write it as
you would talk about it. You do write it in
one linear order. Now, as we read this and as
we understand this, we actually form groups of words that
then form meaning. Right. So an example of that is
you know, adjective-noun, right, or say, in this

(04:42):
case article-noun: it's not a frog, it's the frog. Right.
We could have also said it's the green frog or
the lazy frog.

Speaker 2 (04:50):
Right. Language has a structure, right, and there things can
modify other things, and things can modify the modifiers exactly exactly.

Speaker 1 (04:58):
But the interesting thing now is that that structure, as
a tree-structured clean hierarchy, only tells you
half the story. There's so many exceptions where statistical dependencies,
where modification actually happens at a distance.

Speaker 2 (05:14):
So okay, So just to bring this back to your
sample sentence, The frog didn't cross the road because it
was too tired. That word it is actually quite far
from the word frog. And if you're an AI going
from left to right, you may well get confused there, right,
You may think it refers to road instead of to frog.

(05:34):
So this is one of the problems you were trying
to solve. And then the other one you were mentioning before,
which is these models were just slow because after each word,
the model just recalculates what everything means, and that just
takes a long time.

Speaker 1 (05:48):
They can't go fast enough exactly. It takes a long time,
and it doesn't play to the strengths of the computers,
of the accelerators that we're using there.

Speaker 2 (05:57):
And when you say accelerators, I know Google has their
own chips, but basically we mean GPUs.

Speaker 1 (06:02):
Now, right, we mean GPUs. We mean.

Speaker 2 (06:04):
The chips that Nvidia sells. What is the nature of.

Speaker 1 (06:08):
Those particular chips. Yeah. So the nature of those particular
chips is that instead of doing a broad variety of
complex computations in sequence, they are incredibly good. They excel
at performing many, many, many simple computations in parallel. And
so what this hierarchical or semi-hierarchical nature of language

(06:32):
enables you to do is instead of having, so to speak,
one place where you read the current word, you could
now imagine you actually look at everything
at the same time, and you apply many simple operations
at the same time to each position in your sentence.

Speaker 2 (06:55):
Huh So this is the big idea, I just want
to because this is it, right, this is the breakthrough happening. Yes,
it's basically, what if instead of reading the sentence one
word at a time from left to right, we read
the whole thing all at once.

Speaker 1 (07:10):
All at once. And now the problem is clearly something's
got to give, right, so there's no free lunch in
that sense. You have to now simplify what you can
do at every position when you do this all in parallel,
but you can now afford to do this a bunch
of times, one after another, and revise it over time or
over these steps. And so instead of walking through the

(07:32):
sentence from beginning to end, where an average sentence has
like twenty words or so, an average sentence in prose, instead
of walking those twenty positions, what you're doing is you're
looking at every word at the same time, but in
a simpler way. But now you can do that maybe
five or six times, revising your understanding, and that turns

(07:53):
out is faster, way faster on GPUs and because of
this hierarchical nature of language, it's also better.
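The idea described above can be sketched in a few lines of pure Python. Everything here is invented for illustration: the tiny random embeddings, the dimensions, and the six repetitions; the real transformer adds learned projections, multiple attention heads, and feed-forward layers. What the sketch does show is the core trade: each position applies the same simple operation to every other position at once, and the whole pass is repeated a handful of times.

```python
import math
import random

random.seed(0)
n_words, d = 12, 4   # a 12-word sentence with tiny 4-dim toy embeddings
x = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n_words)]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def self_attention(xs):
    """Every position looks at ALL positions in one pass; each row's
    computation is independent of the others, which is exactly the kind
    of work GPUs parallelize well."""
    out = []
    for q in xs:
        scores = [dot(q, k) / math.sqrt(d) for k in xs]  # compare to every word
        m = max(scores)
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        w = [wi / z for wi in w]                         # softmax weights
        # blend information from all positions, weighted by attention
        out.append([sum(wi * ki[j] for wi, ki in zip(w, xs)) for j in range(d)])
    return out

# "maybe five or six times, revising your understanding" -> stacked layers
for _ in range(6):
    att = self_attention(x)
    x = [[xi[j] + ai[j] for j in range(d)] for xi, ai in zip(x, att)]

print(len(x), len(x[0]))  # still one vector per word: 12 4
```

Note the contrast with the word-by-word loop earlier: here nothing inside `self_attention` waits on a previous word's result, so the twenty-ish positions of a typical sentence can be computed simultaneously, a handful of layers deep, instead of twenty sequential steps.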

Speaker 2 (08:02):
So you have this idea, and as I read the
little note on the paper, it was in fact your idea.
I know you were working with a team, but the
paper credits you with the idea. So let's let's take
this idea, this basic idea of look at the whole
input sentence all at once, yep, a few times, and
apply it to our frog sentence. Give me, give me
that frog sentence again.

Speaker 1 (08:23):
The frog did not cross the road because it was
too tired. Good.

Speaker 2 (08:28):
Tired is good because that's unambiguous.

Speaker 1 (08:30):
Hot could be either one. It could be the road
or the frog, right. Hot could be, hot could be
either one exactly. In fact, hot could
actually be either one, or non-referential, and non-referential because
it was too hot outside.

Speaker 2 (08:41):
Outside, it could be any of three things: the weather,
or the frog, or the road, exactly. I love that
tired solves the problem. So your model, this new way
of doing things, how does it parse that sentence, what
does it do?

Speaker 1 (08:58):
So basically, let's look at the word it and look
at it in every single step of these, you know,
say a handful of times repeated operations. Imagine you're looking
at this word it, that's the one that you are
now trying to understand better, and you now compare it
to every other word in the sentence. Okay, so you

(09:18):
compare it to the, to frog, to did, not, cross,
the road, because, to was, too, and tired. And
initially, in the first pass, already a very
simple insight the model can fairly easily learn is that

(09:40):
it could be strongly informed by frog, by road, by nothing,
but not so by too or by the, or maybe
only to a certain extent by was. But if you
want to know more about what it denotes, then it

(10:00):
could be, you know, it could be informed by by
all of these.

Speaker 2 (10:04):
And just to be clear, that sort of understanding arises
because it has been trained in

Speaker 1 (10:09):
This way on lots of data.

Speaker 2 (10:11):
It's encountering a new sentence after reading lots of other
sentences with lots of pronouns with different possible antecedents.

Speaker 1 (10:19):
Yeah, exactly, exactly. So now the interesting thing is that
which of the two it actually refers to doesn't depend
only on what those other two words are. And
this is why you need these subsequent steps. So
let's start with the first step. So what now happens

(10:39):
is that, say the model identifies frog and road could
have a lot to do with the word it. So
now you basically copy some information from both frog and
road over to it, and you don't just copy it,
you kind of transform it also on the way, but
you refine your understanding of it. And this is all learned,

(11:02):
not given by rules or, you know, in any
way pre-specified.

Speaker 2 (11:07):
Right, just by training on language, just by training, this emerges.
And so sort of the meaning of it after
this first step is kind of influenced by both frog
and road.

Speaker 1 (11:18):
Yes, both frog and road. Okay, so now we repeat
this operation again and we now know that it is
unsure or the model basically now has this kind of superposition. Right,
it could be road, it could be frog. But now
in the next step it also looks at tired, and
somehow the model has learned that when it means something inanimate,

(11:41):
that tired is not the thing. And so maybe in
context of tired, it is more likely to refer to frog.
And now, you know, well, it is more likely, and
now maybe the model has figured out already, or maybe needs
a bit more, a few more iterations, that it is
most likely to refer to frog because of the presence

(12:04):
of tired. So it has solved the problem. But it
has solved the problem.
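The two-pass story above can be put into toy numbers. All of the scores below are invented for this illustration; in a real transformer they are learned from data, not hand-set. The sketch just shows the shape of the mechanism: a softmax over how strongly each word could inform "it", first as a near 50/50 superposition between frog and road, then tipped toward frog once the (hypothetical) animacy cue from "tired" is folded in.

```python
import math

def softmax(scores):
    """Turn raw compatibility scores into attention-style weights."""
    m = max(scores.values())
    exps = {w: math.exp(s - m) for w, s in scores.items()}
    z = sum(exps.values())
    return {w: e / z for w, e in exps.items()}

# Pass 1: "it" compares itself to the other words. Frog and road both look
# plausible antecedents; function words like "the" and "too" do not.
# (All numbers invented for illustration.)
pass1 = softmax({"frog": 2.0, "road": 2.0, "the": -1.0, "too": -1.0})

# Pass 2: having also looked at "tired", the model has (hypothetically)
# learned that tired things are animate, boosting frog's score.
pass2 = softmax({"frog": 3.5, "road": 2.0, "the": -1.0, "too": -1.0})

print(round(pass1["frog"], 2), round(pass1["road"], 2))  # near 50/50 split
print(max(pass2, key=pass2.get))                          # frog wins
```

The design point is that no rule like "tired implies animate" is ever written down; the score boost in the second pass stands in for a regularity the model absorbs from seeing many pronouns with many antecedents during training.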

Speaker 2 (12:08):
So you do, you have this idea, you try it out.
There's a detail that you mentioned that's kind of fun,
and we kind of skipped it, but you mentioned that
another one of the co authors, who has also gone
on to do very big things, was about to leave
Google when you sort of wanted to test this idea,
and that fact that he was about to leave
Google was actually important to the history of this idea.

Speaker 1 (12:29):
Tell me about that. It was important. So this is Illia Polosukhin.
At the time that this started to gain
any kind of speed, Illia was managing a good chunk
of my organization. And the moment he really made the
decision to leave the company, he had to wait ultimately

(12:51):
for his co-founder, and for them
to then actually get going together in earnest and so
he had a few months where he knew and I
also knew that he was about to leave and where
you know, the right thing would of course be to
transition his team to another manager, which we did immediately,

(13:11):
but where he then suddenly was in a position of
having nothing to lose, and yet quite some time left
to play with Google's resources and do cool stuff with interesting,
interesting people. And and so that's one of those moments
where suddenly your appetite for risk as a researcher just spikes, right, huh,

(13:32):
because you have, for a few more months, you
have these resources at your disposal, you've transitioned your responsibilities.
At that stage, you're just like, okay, let's try this
crazy shit. And that's literally, in
so many ways, one of the integral catalysts,
because that also enabled, right, this kind of mindset of

(13:55):
we're going for this now, whatever the reason. It still
you know affects other people. And so there were others
who joined that collaboration really really early on, who I
feel were much more excited as a result, much more likely
to really work on this and to really give it
their all, because of his, you know, nothing left to lose,

(14:17):
I'm going to go for this attitude at this point.

Speaker 2 (14:19):
Right, was there a moment when you realized it worked?

Speaker 1 (14:23):
There were actually a few moments. And it's interesting because
on one hand, right, it's a very gradual thing, right,
And initially, actually it took us many months to get
to the point where we saw significant first signs of
life of this not just being a curiosity but really
being something that would end up being competitive. So there

(14:45):
certainly was a moment when that started. There was another
moment when we, for the first time, had
won a machine translation challenge, one language pair of the WMT
task, as it's called, where our score, our
model, performed better than any other single model. The point
in time when I think all of us realized this

(15:07):
is special was when we not only had the best
one in one of these tasks, but in multiple and
we didn't just have the best number. We also at
that point were able to establish that we've gotten there
with about ten times less energy or training compute spend.

Speaker 2 (15:27):
Wow, So you do one tenth the work and you
get a better result.

Speaker 1 (15:31):
One tenth the work and you get a better result,
not just across one specific challenge, but across multiple, including
the hardest, or one of the harder ones. Right.
And then at that stage we were still improving rapidly,
and then you realize, okay, this is for real,
because, right, it wasn't like, it wasn't that

(15:53):
we had to squeeze those last little bits and pieces
of gain out of it. It was still improving fairly rapidly,
to the point where actually, by the time we actually
published the paper, we again reduced the compute requirements, not
quite by an entire order of magnitude, but almost, right,
so it still was getting faster and better at a

(16:14):
pretty rapid rate. Wow. So in the paper
we had some results that were roughly ten x
faster on eight GPUs, and what we demonstrated in terms of
quality on those eight GPUs, by the time we actually
published the paper properly, we were able to do with one.

Speaker 2 (16:29):
GPU. One GPU meaning one chip of the kind that
people buy one hundred thousand of now to build a
data center, exactly. So the paper actually at the end
mentions other possible uses beyond language for this technology. It
mentions images, audio, and video, I think explicitly. How much

(16:51):
were you thinking about that at the time. Was that
just like an afterthought or were you like, hey, wait
a minute, it's not just language.

Speaker 1 (16:57):
By the time it was actually published at a conference,
not just the preprint, by December, we had initial models
on other modalities, on generating images. We had the first,
the first at that stage. At that time they were
not performing that well yet, but you know, they were
rapidly getting better. We had the first prototypes actually of
models working on genomic data, working on protein structure. That's

(17:20):
good foreshadowing. Good foreshadowing, exactly. But then we
ended up for a variety of reasons, we ended up
at first focusing on applications in computer vision.

Speaker 2 (17:31):
The paper comes out, you know, you're working on these
other applications, you're presenting the paper, it's published in various forms.

Speaker 1 (17:38):
What's the response like? It was interesting because the response
built in deep learning AI circles basically between the
preprint, that I think came out in, I want to say, June
twenty seventeen, and then the actual publication,
to the extent that by the time the poster session

(17:59):
happened at the conference, there was quite a crowd at
the poster, so we had to be shoved out of
the hall in which the poster session happened
by security, and had very hoarse voices by the end
of the evening. You guys were like the Beatles of
the AI conference. I wouldn't say that, because we weren't

(18:23):
the Beatles, because it was really it was still very specific.

Speaker 2 (18:26):
You were more that you were more of the cool
hipster band. You were the hipster.

Speaker 1 (18:29):
Band, certainly more the cool hipster band. But it was
an interesting experience because there were some folks and including
some greats in the field, who came by and said, Wow,
this is this is cool.

Speaker 2 (18:40):
What has happened since has been wild.

Speaker 1 (18:44):
It seems wild, to say the least. Yes. Is it
surprising to you? Of course, many aspects are surprising. For sure.
We definitely saw pretty early on already back in twenty eighteen,
twenty nineteen, that something really exciting was happening here. Now

(19:05):
I'm still surprised that, with the advent of ChatGPT,
something that didn't go way beyond those language models that
we had already seen a few years before was suddenly
the world's fastest growing consumer product.

Speaker 2 (19:23):
Ever, right, I think ever?

Speaker 1 (19:25):
Ever? Yes?

Speaker 2 (19:27):
And by the way, GPT stands for generative pre-trained transformer, right?
Transformer is your word. That's right. So there's an interesting,
I don't know, business side to this right, which is,
you were working for Google when you came up with this.
Google presumably owned the idea, had intellectual property around the idea.

Speaker 1 (19:47):
Has filed many a patent.

Speaker 2 (19:50):
Was it just a choice Google made to let everybody
use it? Like when you see the fastest growing consumer
product in the history of the world, not only built
on this idea, but using the name, like, and it's
a different company that was five years later.

Speaker 1 (20:04):
Five years later.

Speaker 2 (20:04):
But a patent's good for more than five years? Is
that a choice?

Speaker 1 (20:08):
Is that a strategic choice? What's going on there?
So the choice to do it in the first place,
to publish it in the first place, is really based
on and rooted in a deep conviction of Google
at the time, and I'm actually pretty sure it still
is the case, that these developments are

(20:31):
the tide that floats all boats, that lifts.

Speaker 2 (20:33):
All votes, like a belief in progress, a belief in progress,
a good old fashioned Now.

Speaker 1 (20:40):
It's also the case that at the time, organizationally, that
specific research arm was unusually separated from the product organizations.
And the reason why Brain or in general, the deep
learning groups were more separated was in part historical, namely

(21:02):
that when they started out there were no applications and
the technology was not ready for being applied, and so
it's completely understandable and just you know a consequence of
organic developments that when this technology suddenly is on the
cusp of being incredibly impactful, you're probably still under utilizing

(21:24):
it internally and potentially also not yet treating it in
the same way as you would have maybe otherwise treated
previous trade secrets.

Speaker 2 (21:34):
For example, because it feels like this out-there research project,
not like what's going to be this consumer

Speaker 1 (21:42):
Product, exactly, exactly. And to be fair, it took OpenAI
in this case a fair amount of time
to then turn this into this product, and most
of that time it also, from their vantage point, wasn't
a product. Right. So up until all the way through
ChatGPT, OpenAI had published all of their GPT developments,

(22:07):
maybe not all, but you know, their large fraction of
their work on this.

Speaker 2 (22:11):
Yeah, their early models.

Speaker 1 (22:12):
The whole models were open, exactly. They were more true
to their name really, also believing in the same thing.
And it was only really after ChatGPT, and after
this, to them also to a certain extent surprising, success,
that they started to become more closed as well when
it comes to scientific developments in this space. We'll be

(22:37):
back in just a minute. Let's talk about your company.
When'd you decide to start Inceptive? The decision took a

(22:58):
while and was influenced by events that happened over the
course of about two to three months in
late twenty twenty, starting with the birth of my first child.
So when my daughter was born, two things happened. Number one,
witnessing a pregnancy and a birth during a pandemic where

(23:21):
there's a pathogen that's rapidly spreading, and so all of
that was a pretty daunting experience, and everything went great,
But having this new human in my arms also really
made me question if I couldn't more directly affect people's

(23:41):
lives positively with my work. And so I was at
the time quite confident that indirectly it would have an effect
also on things like medicine, biology, etc. But I was wondering,
couldn't this happen more directly if I focused more on it.
The next thing that happened was that the AlphaFold two
results at CASP fourteen were published. CASP fourteen is this

(24:05):
biennial challenge for protein structure prediction and some other related problems.
This is the protein folding problem, and this is the
protein folding problem exactly.

Speaker 2 (24:13):
The machine learning solving the protein folding problem, which had
been a problem for decades: given a chain of amino
acids, predict the three-D structure of a protein. Precisely, and
humans failed and machine learning succeeded.

Speaker 1 (24:24):
Just amazing. Yes, it's a great example. Humans failed despite
the fact that we actually understand the physics fundamentally, but
we still couldn't create models that were good enough using
our conceptual understanding of the processes involved.

Speaker 2 (24:39):
You would think an algorithm would work on that one, right,
You would just think an old school set of rules,
like we know what the molecules look like, we know
the laws of physics. It's amazing that we couldn't predict
it that way. Right. All you want to know is
what shape is the protein going to be? You know
all of the constituent parts, you know every atom in it,
and you still couldn't predict it with a set of rules,

(25:00):
but AI machine learning could.

Speaker 1 (25:03):
Amazing, yes, and it is amazing. Actually, when you put
it like this, it's important to point out that
when we say we understand it, we make massive oversimplifying
assumptions, because we ignore all the other players that are
present when a protein folds. We ignore a lot of
the kinetics of it because we say we know the structure,
but the truth is, we don't know all the wiggling

(25:25):
and all the shenanigans that happen on the way there, right,
and we don't know about, uh, you know, chaperone proteins
that are there to influence the folding. We don't know
about all sorts of other things. I'm doing the physics one.

Speaker 2 (25:36):
I'm doing the assume-a-frictionless-plane version of protein folding, precisely.

Speaker 1 (25:40):
Precisely, precisely. And the beauty is that deep learning doesn't
need to make this assumption. AI doesn't need to make
this assumption. AI just looks at data, and it
can look at more data than any human or even
humanity eventually could look at together. It's such a good
example problem to demonstrate that these models are ready for
prime time in this field and ready for lots of applications,

(26:02):
not just one or two, but many. Sold, and so
that happens, so sold exactly. And then the third thing
was that the COVID mRNA vaccines came out with astonishing
ninety-plus percent efficacy out of

Speaker 2 (26:17):
The gate that they were still so underraty. Under the
beginning of the pandemic, people were like, it'll be two
or three years, and if there's sixty percent effective, that'll be.

Speaker 1 (26:27):
Great, exactly exactly, And so everybody forgets. Everybody forgets it.
And when you look at it, this is a molecule
family that was for you know, most of the time
that we've known about it since the sixties, I suppose
we've treated it like a neglected stepchild of molecular biology,
because you're talking about marine in general. In general.

Speaker 2 (26:47):
Everybody loves DNA, right, DNA.

Speaker 3 (26:50):
Everybody loves DNA. DNA is the movie star. Yeah, exactly, exactly, even though,
now looking back, DNA is merely, you know, the place
where life takes its notes, maybe the hard drive and
the memory.

Speaker 1 (27:01):
It's the book, right, it's the book. But
at the end of the day, it was this molecule
family that was about to save, you know, depending on the estimate,
tens of millions of lives, and in rapid time. So
all these things hold, but we have no training data
to apply anything like AlphaFold to this specific molecule family,

(27:21):
no training data to speak of. We had two hundred
thousand known protein structures at the time, I believe, maybe optimistically,
we had maybe twelve hundred known RNA structures. And on
top of that, it was also fairly clear that for
RNA going directly to function would be much much more important,
because it's in a certain sense a less strongly structured molecule,

(27:41):
and other aspects of the molecule might play a bigger role.
And then on top of that, the attention that generative
AI was receiving overall, also now in the field of
pharma or of medicine, was building, And so I ended
up finding myself in a conversation where a very, I would

(28:02):
say, wise longtime mentor of mine pointed out that, you know,
maybe ten years from now or so, somebody could tell
my daughter that there was this perfect storm where this
miracle molecule with no training data was about to save
the world and could do so much more in the
direction of positively impacting people's lives. We didn't have training data,

(28:24):
would be very expensive to create it, but using the
technology, or technologies, that I'd been working
on for the last, I don't know, ten-plus years,
and the ability because of the attention that people were
now giving to AI in this field the ability to
raise quite a bit of money. I, in that position,
chose to stay back at my cushy dream job in

(28:47):
big tech and not actually take this opportunity to really
positively impact people's lives, And that idea was not one
I was willing to entertain.

Speaker 2 (28:59):
You couldn't just coast it out at Google and let
somebody else go figure out RNA.

Speaker 1 (29:03):
Yeah, and it's not just RNA. I think RNA is
a great starting point at the end of the day,
but building models that learn from first of all, all
the publicly available data that we can possibly get our
hands on, but also from data that we can reasonably
effectively create in our own lab. How to design molecules

(29:24):
for specific functions is something that now is within reach
and that will in the next years, in the years
to come, have completely transformational impact on how we even
think about what medicines are. So any opportunity to speed
this up, to make this happen, even just a day

(29:44):
sooner than it could have otherwise happened, is incredibly valuable
in my opinion.

Speaker 2 (29:49):
As you're talking about this idea, the absence of
training data kind of seems to be at the
center of it, right. It seems to be the core,
yeah, problem, which makes sense, right. Like the reason language
works so well is basically because of the Internet. I know,
now we're going beyond it, but like it just happened
to be that there was this incredibly giant set of
natural life language that became available. We don't have anything

(30:12):
like that for RNA. So are you, I mean, is it
kind of step one at Inceptive, creating the data? Is
that kind of what's happening?

Speaker 1 (30:22):
So step one at Inceptive is, or was, learning to use all
the data. I think we've made a lot
of progress in that direction, learning to use all the
data that is available already and identify what other data
we're missing, and then see how far we can get
with just the publicly available data and at the same
time scale up generating our own data. And it turns

(30:42):
out that actually, because of the nature of evolution, because
of how evolution isn't actually incentivized to really explore the
entire space of possibilities, it is almost always a given that
if you are trying to design exceptional molecules, especially ones
that are not, say, you know, natural formats, you are

(31:08):
basically going to need novel training data.

Speaker 2 (31:11):
Yeah, basically you're saying you build RNAs that don't exist
in the world that have therapeutic uses, and there's,
kind of definitionally, no training data.

Speaker 1 (31:19):
Yes, that exist. The funny thing is we have a
few of them, and so we have existence proofs of
RNA molecules, for example RNA viruses, that actually exhibit incredibly
complex different functions in our cells, that do all sorts of
things that we don't usually like. But if we could

(31:40):
use those, you know, for good, If we could use those,
you know, in ways that would actually be aimed at
fighting disease rather than creating them, those kinds of functions,
even just a small subset of them, would really transform
medicine already. And so we know it's possible. What are
you dreaming of when you say that? What are you
thinking of specifically? Okay. So, for example, right, one estimate

(32:03):
is that in order for COVID to infect you, you
would need potentially as few as five COVID genomes inside
your organism. That's it, five viral particles. Five
viral particles. Yeah, you inhale those. You wouldn't have to
inject them, you wouldn't even have to swallow them, you

(32:24):
inhale them.

Speaker 2 (32:25):
If we could have a medicine that worked as well
as a disease is a version of your.

Speaker 1 (32:29):
True, exactly, exactly. So at the end of the day, right,
this medicine is able to spread in your body only
into certain types of organs and tissues and cells. It
does certain things there that are really quite complex, right,
changing the cell's behavior, again not usually, in this case,
in favorable ways, but still in ways that wouldn't have

(32:50):
to be modified that much in order to potentially be
exactly what you would need for complex multifactorial medicine. And
if you could make all of that happen by just
inhaling five of those molecules, then again, that would completely
change how you think about medicine. Right, you have viruses
that aren't immediately active, but that are inactive for long
periods of time in your organism, and only under certain conditions,

(33:13):
say under certain immune conditions, really start being reactivated. Why
can't we have medicines that work in a similar way?
Where, not only in a vaccination sense, but for a
genetic predisposition for a certain disease, you are able
to take a medicine designed so that it

(33:33):
waits until the disease actually starts to develop, and only
then, and only where that disease then starts to develop, becomes
active and actually affects it, and potentially also then alerts
the doctor through a blood test.

Speaker 2 (33:45):
Like for cancer cells or something. So you have some
kind of prophylactic medicine in your body and it is
encoded in such a way that it just hangs out there,
like herpes, to take a pathological example, and
only in certain settings does it do anything. And those
settings are: if you see a cancer cell, destroy it;
otherwise, just sit there.

Speaker 1 (34:07):
Precisely. And if you can design those also in ways where
you can just make them all go away when you take,
say, a completely harmless small molecule, that's again entirely feasible.

Speaker 2 (34:17):
Sure. So, I mean, you're dreaming big. These are wonderful,
big, you know, science-fiction-y dreams that I hope
you figure out. On a practical level, what's happening
at the company right now? How many people work there,
what are they doing, and what have they figured out
so far?

Speaker 1 (34:31):
We're around forty. What we're doing is really exactly what
we just talked about. We're basically scaling data generation experiments
in our lab that allow us to assess a variety
of different functions of different, mostly RNA molecules, actually mostly
mRNA molecules at the moment, that are relevant to

(34:55):
a pretty broad variety of different diseases. And so this
ranges from things like infectious disease vaccines to cell therapies
that can be applied in oncology or
against autoimmune disease. We have mRNAs that we hope will
eventually be effective as enzyme replacement therapies

(35:16):
for a large family of rare diseases, and
the list goes on. And so we're creating, or
growing, this training data set that eventually, on top of
foundation models that we pretrained on all publicly
available data, allows us to tune those foundation models towards

(35:39):
designing exceptional molecules for exactly those applications and many more
sharing similar properties.

Speaker 2 (35:45):
So you basically build new mRNA molecules
and test them, and then you give that data to
your model and presumably it tells you what to build next,
or it helps you figure out what to build next.
It's sort of a loop in that way.

Speaker 1 (35:59):
The models are definitely one interesting source, for proposals if
you wish, of what to synthesize and test next. They're
not the only such source; we basically also explore,
kind of, maybe less guided or heuristically guided ways.
But exactly, so in some of the cases it's really
quite iterative. For some of those functions and for some

(36:21):
of those modalities and diseases or disease targets, we're actually
already at a point where our models can spit out
entirely novel molecules that really are unlike anything they've ever
seen or we've ever seen in nature, that very consistently
perform quite favorably compared to pretty strong baselines by incumbents

(36:43):
in the field.
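
The loop described here, models proposing candidates, the lab measuring them, and the measurements growing the training set, is essentially active learning. A minimal sketch in Python, with a random proposer and a toy GC-content score standing in for Inceptive's actual generative models and wet-lab assays; every name and scoring rule below is hypothetical:

```python
import random

random.seed(0)                     # reproducible demo
BASES = "ACGU"

def propose(batch, length=12):
    # Stand-in for a generative model proposing candidate mRNA sequences.
    return ["".join(random.choice(BASES) for _ in range(length))
            for _ in range(batch)]

def assay(seq):
    # Stand-in for a wet-lab measurement: here, a toy score (GC fraction).
    return sum(base in "GC" for base in seq) / len(seq)

def design_loop(rounds=3, batch=8):
    # Propose -> measure -> grow the training set, round after round.
    training_data = []
    for _ in range(rounds):
        for seq in propose(batch):
            training_data.append((seq, assay(seq)))
        # A real system would fine-tune the proposal model on
        # training_data here before the next round.
    return max(training_data, key=lambda pair: pair[1])

best_seq, best_score = design_loop()
print(best_seq, round(best_score, 2))
```

In the real pipeline the proposal step would be a trained model refit on the accumulated data after each round, which is what makes the loop iterative rather than a one-shot screen.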

Speaker 2 (36:44):
When you say perform quite favorably compared to baselines by
incumbents in the field, does that on some level
mean better than what experts would think up?

Speaker 1 (36:54):
Better than what experts can think up, and also
better than more traditional machine learning tools can easily produce.

Speaker 2 (37:01):
It's like that famous moment in the Go match when
AlphaGo made some move that like no human being
ever would have thought of.

Speaker 1 (37:09):
Yes, so I would say we've long passed move
thirty-seven, in the sense that our understanding of the
underlying biological phenomena is so incomplete that for most of
the things that we're able to design for, we don't
really understand why they happen.

Speaker 2 (37:28):
Huh. When you say 'we,' do you mean at Inceptive,
or do you mean just medicine in general?

Speaker 1 (37:33):
I would say just medicine in general.

Speaker 2 (37:35):
Okay. So Inceptive is doing this very kind of high-level
work, right? I mean, building what will hopefully be
the foundation. What's the right amount of time in the
future to ask about when will we know if it works?
You think five years?

Speaker 1 (37:50):
So the general idea of using generative AI and similar
techniques to generate therapeutics: there are some things in clinical
trials that were largely designed with AI. As far as
I know, maybe now we have the first

(38:11):
trials just now starting for molecules that were truly entirely
designed by AI.

Speaker 2 (38:17):
As opposed to sort of selected from a library.

Speaker 1 (38:20):
Or selected, influenced, exactly: selected, adjusted, you know, and tweaked,
et cetera. Right. So that's really still only happening just now,
but we will see I believe, the first success or
a first success of such molecules, certainly within the next
five years.

Speaker 2 (38:38):
What about more narrowly, the project at inceptive.

Speaker 1 (38:41):
It's a similar timeframe. We should be able to get
molecules into the clinic in the next few years, certainly
in the next handful of years. Now, these will not
be molecules where the objective that we used in
their design is, you know, even remotely as complex, or

(39:02):
you know, kind of, the different functions that we're designing
for are not going to be even remotely as
diverse as, say, what you would find, because we used
this example earlier, in an RNA virus. These will really be
simpler. Those will be molecules that don't do things
that we couldn't possibly have done before, but that do

(39:24):
them much better, in ways that are more accessible, in
ways that come with fewer side effects.

Speaker 2 (39:30):
What biotech largely is, is they make protein drugs. And
so if you could make an mRNA drug where you
put the mRNA into the body and the body
makes the protein, it wouldn't be some crazy sleeper cell
that sits in your body for twenty years or whatever,
but it might be a more practical alternative to today's biotech drugs.

Speaker 1 (39:49):
Absolutely.

Speaker 2 (39:50):
So you've had a kind of crash course in biology
in the last few years. Yes. And I'm curious, like,
what is something that has been particularly compelling
or surprising or interesting to you that you have learned
about biology.

Speaker 1 (40:03):
There are countless things. The biggest one, or the red thread
across many of them, is really just how effective life
is at finding solutions to problems that on one hand
are incredibly robust, surprisingly robust, and on the other hand,

(40:28):
are so different from how we would design solutions to
similar problems.

Speaker 2 (40:36):
Uh huh.

Speaker 1 (40:37):
This really comes back to this idea that we
might just not be particularly well equipped, in terms of
cognitive capabilities, to understand biology. Basically, you know,
we would never think to do it this way,
and how we think to do it is oftentimes much
more brittle.

Speaker 2 (40:58):
Uh huh. Brittle is an interesting word. Less resilient,
less able to persist under different.

Speaker 1 (41:03):
Conditions, exactly exactly. I mean, you know, we still haven't
built machines that can fix themselves, for one.

Speaker 2 (41:09):
Which is fundamentally the miracle of being a human being.

Speaker 1 (41:12):
Exactly, exactly. And of course
this is true across the scales, right, from, you know,
single cells all the way to complex organisms like ourselves.
And really just how many very different kinds

(41:33):
of solutions life has found, and constantly
is finding. And you see this all over the place,
and it's both daunting, humbling, but also incredibly inspiring when
it comes to applying AI in this area, because again,
I think that at least so far, it's the best

(41:54):
tool, and maybe actually the only tool we have so
far, in the face of this kind of complexity, to really design
interventions, medicines, that go way beyond what we were
able to do, or are able to do, just based
on our own conceptual understanding.

Speaker 2 (42:14):
We'll be back in a minute with the lightning round.
Mm hm. Let's finish with the lightning round. As an
inventor of the transformer model, are there particular possible uses

(42:38):
of it that worry you, slash, make you sad?

Speaker 1 (42:42):
I am quite concerned about the p(doom), doomerism, whatever
you want to call it, existential-fear-instilling rhetoric that
is in some cases actually also promoted by people, by
entities, in the space.

Speaker 2 (43:00):
So just to be clear, you're not worried about
the existential risk. You're worried about people talking.

Speaker 1 (43:05):
I'm worried about the existential risk being inflated
or the perception being inflated to the extent that we
actually don't look enough at some of the much more
concrete and much more immediate risks. Right. I'm not going
to say that the existential risk is zero. That would

(43:27):
be silly.

Speaker 2 (43:27):
What is a concrete and immediate risk that is, you
think, underdiscussed?

Speaker 1 (43:32):
These large-scale models are such effective tools in
manipulating people in large numbers already today, and it's happening
everywhere for many, many different purposes by in some cases
benevolent and in many cases malevolent actors that I really

(43:53):
firmly believe we need to look much more at things
like enabling cryptographic certification of human-generated content, because doing
that with machine-generated content is not going to work.
But we definitely can cryptographically certify human-generated content as such.

Speaker 2 (44:09):
Basically watermarking or something, some way to say that
a human made this.
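
The certification being described here is, at bottom, a digital signature: a human (or their device) signs content with a private key, and anyone can check the signature against the public key. A toy sketch using textbook RSA; this assumes nothing about any real certification scheme, the tiny primes are for illustration only, and a production system would use a vetted library and a modern scheme such as Ed25519:

```python
import hashlib

# Toy RSA key pair (textbook numbers; hopelessly insecure, demo only).
p, q = 61, 53
n = p * q                    # public modulus
phi = (p - 1) * (q - 1)
e = 17                       # public exponent
d = pow(e, -1, phi)          # private exponent (Python 3.8+ modular inverse)

def digest(content: str) -> int:
    # Hash the content, reduced mod n so the toy key can sign it.
    return int.from_bytes(hashlib.sha256(content.encode()).digest(), "big") % n

def sign(content: str) -> int:
    # The author signs with the private key.
    return pow(digest(content), d, n)

def verify(content: str, signature: int) -> bool:
    # Anyone verifies with the public key (e, n).
    return pow(signature, e, n) == digest(content)

msg = "I, a human, wrote this paragraph."
sig = sign(msg)
print(verify(msg, sig))      # True: the signature checks out
# Any edit to msg changes the digest, so verifying tampered text fails.
```

The cryptography is the easy part; a real scheme would also have to bind public keys to actual humans (certificates, hardware attestation), which is where the hard problems live.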

Speaker 1 (44:14):
Exactly.

Speaker 2 (44:15):
What would you be working on if you were not
working in biology, on drug development?

Speaker 1 (44:20):
Education. Using artificial intelligence to democratize access to education.

Speaker 2 (44:26):
What have you seen that has been impressive or compelling
to you in that regard?

Speaker 1 (44:31):
There are lots of little examples so far, really countless.
There's what's happening at Khan Academy. There are many
examples of AI applied to education problems in places like China,
for example. You have a bunch of very compelling examples
in fiction. A book I really like, by
Neil Stephenson: The Diamond Age; or, A Young Lady's Illustrated

(44:54):
Primer, which I recommend.

Speaker 2 (44:56):
Everybody in AI talks about that. Well, now they do.

Speaker 1 (44:59):
Yeah, it's yeah, well.

Speaker 2 (45:01):
Now they do.

Speaker 1 (45:01):
You liked it before?

Speaker 2 (45:02):
It was cool?

Speaker 1 (45:03):
I'm sure at one point I thought it was really,
really important to make sure that Neil Stephenson knows that
we are about to be able to build the Primer,
and so I ended up having coffee with him to
tell him. Oh, that's great. So at the end of
the day, maybe the biggest inspiration there is my daughter.

(45:25):
She's four and a half now. She can read okay,
but I think she could today read at, you know,
grade-school level, if she had access to, you know, an
AI tutor teaching her how to read.

Speaker 2 (45:41):
Does your daughter use AI, you know, AI chatbots?

Speaker 1 (45:49):
Not directly, without me. But we've
actually used ChatGPT to implement an AI reading tutor
that works reasonably well. I mean, we basically, you know,
kind of, as I call it now, vibe coding, vibe coded it.
And she wasn't there for all of it; it took some time,
but she was there for some of it. Oh, you
vibe coded it with her? Yeah, well, I mean,
she was there. You know, she witnessed a good chunk

(46:12):
of it, yes, although she was more interested in the
image generation parts. But yeah, we have a sketch of
one that she quite enjoys. So that's kind of like
the extent of her, at this stage, using AI directly.

Speaker 2 (46:32):
Jakob Uszkoreit is the CEO and co-founder of Inceptive and the co-author of the
paper Attention Is All You Need. Just a quick note,
This is our last episode before a break of a
couple of weeks, and then we'll be back with more episodes.
Please email us at problem at Pushkin dot fm. We
are always looking for new guests for the show. Today's

(46:53):
show was produced by Trinamanino and Gabriel Hunter Chang. It
was edited by Alexander Garretton and engineered by Sarah Muguerrett.


© 2025 iHeartMedia, Inc.