Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
SPEAKER_00 (00:00):
I wish there was just one model that we could
(00:01):
pick and that was the best for everything and we could move on with our lives. But the reality is, especially when you're building products on top of models, that you have to test the task that you're asking the model to do and evaluate its performance on that task. So when we do this, we...
SPEAKER_01 (00:37):
Hey Jeremy, we're back again. Hey John, good to be back. Yeah, so we did this topic a few weeks ago on practical AI in clinical research and got some great feedback. Appreciate everybody who listened and who sent us a direct message, a comment, or just sent me a note in general. The funny thing is, in response, we actually got some unpractical questions around AI.
(00:58):
So I think there's definitely a lot of individuals interested in AI, interested in how it works, interested in application around clinical research. There's actually a lot of questions about sort of more foundational things around how does it work on the back end and how do you deal with challenges and gaps they've maybe heard about or they've seen, but they're not really sure how they practically work in tools like what we're doing today.
(01:21):
And so I thought what we could do is take four questions. I thought we could unpack them just kind of live here together, and hopefully they'll help our listeners to really think about and maybe understand just a small segment of AI and how it applies to research a little differently. With that, one of the key questions we got was: what are LLMs and how do they work? So Jeremy, start by unpacking that.
SPEAKER_00 (01:41):
Yeah, absolutely. LLM stands for large language model. So think of it as a big computer program that ingests tons of data. And by tons of data, I mean all the text on the internet. It takes in all that data and it is trained to predict the next word in whatever string of text you provide.
(02:01):
So given a sentence, it will predict the next word. And so this model has been trained on just massive, massive amounts of data. And it's all been trained to predict the next word. And on top of that, we've been able to build a lot of interesting kind of applications using that relatively simple goal or task for the program.
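To make the next-word idea concrete, here is a toy Python sketch. The phrases and probabilities are invented for illustration; a real LLM learns billions of parameters from its training text rather than using a small lookup table, but the loop of "predict a word, append it, predict again" is the same.

```python
# Toy illustration of next-word prediction. A real LLM learns these
# probabilities from massive amounts of text; here they are hard-coded
# just to show the repeated "predict the next word" step.
toy_model = {
    "the patient reported": {"mild": 0.6, "severe": 0.3, "no": 0.1},
    "the patient reported mild": {"nausea": 0.7, "fatigue": 0.3},
    "the patient reported mild nausea": {".": 1.0},
}

def predict_next_word(context: str) -> str:
    """Return the most probable next word for the given context."""
    candidates = toy_model.get(context, {".": 1.0})
    return max(candidates, key=candidates.get)

prompt = "the patient reported"
for _ in range(3):
    next_word = predict_next_word(prompt)
    print(f"{prompt!r} -> {next_word!r}")
    prompt = f"{prompt} {next_word}"
```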
SPEAKER_01 (02:21):
Yeah. And then LLMs, like you think about the brands of LLMs, can you talk a little bit about what the brands are so people sort of can start to put the names in the right order?
SPEAKER_00 (02:29):
Yeah. So, you know, you kind of start with the model providers, the big three being OpenAI, Anthropic, and Google. They are the companies building these new LLMs from scratch or building them on top of each other. And the actual model itself is what you may know as GPT-4, Claude 3.5, Gemini 2.5.
(02:53):
They typically have kind of a family name and then a version number, essentially. OpenAI is famously really bad at naming them, and the version number doesn't always mean what you think it would mean. GPT-4.1 is actually more advanced than GPT-4.5. I don't know why they chose that naming scheme,
(03:13):
but the model name is the actual thing that you're interacting with the most.
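As a small illustration of "the model name is the thing you interact with," here is a minimal sketch assuming the OpenAI Python SDK and an API key in the environment; the model string is what you would swap out when a newer version is released.

```python
# Minimal sketch, assuming the `openai` package and an OPENAI_API_KEY
# environment variable. The model name ("gpt-4.1" here) is the string
# you change to point at a different or newer model.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4.1",
    messages=[{"role": "user", "content": "In one sentence, what is an LLM?"}],
)
print(response.choices[0].message.content)
```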
SPEAKER_01 (03:19):
Yeah, it makes sense. And then when you think, so you've got these LLMs, then you have the models. One of the questions we got, the second question that came up quite a bit, was, you know, what are the models, but it was more around how do you pick the best model? Like, what's the number one model to use, and do you use it?
SPEAKER_00 (03:34):
Yeah, absolutely. I wish there was just one model that we could pick and that was the best for everything and we could move on with our lives. But the reality is, especially when you're building products on top of models, that you have to test the task that you're asking the model to do and evaluate its performance on that task. So when we do this, we specify a task.
(03:57):
It's, let's say, a classification problem. So we have a task that is predicting the quality of someone's response. So it's a one to five scale, and it has a rubric with a description of what is a one, what is a five, what is everything in between. And we test that on new models as they come out. So for example, o3 is kind of the largest, most advanced
(04:21):
reasoning model that OpenAI has released to date. And we tested it on this quality score task, and it actually performed really badly, worse than GPT-4.1, worse than even GPT-4. And when we dug into the data and looked at the results, the reason it was performing so poorly was because
(04:43):
it was making up what it thought was the best answer possible to a question and saying they didn't talk about this, this, and this, therefore it's not a good answer. And so, you know, it just proves that just because it's the most advanced model doesn't mean it's actually the best at performing the tasks that you want it to perform.
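To sketch what this kind of task-specific evaluation can look like, here is a minimal Python example. The rubric text, the labeled examples, and `call_model` are placeholders for illustration; in practice `call_model` would call each provider's API with the rubric prompt and the model name being tested.

```python
# Minimal sketch of comparing models on one fixed task: score each
# model's predictions against human-labeled examples.

RUBRIC = (
    "Rate the quality of this participant response on a 1-5 scale. "
    "1 = off-topic or empty, 5 = specific and complete. Reply with the number only."
)

# Hypothetical labeled data: (participant response, human-assigned score)
labeled_examples = [
    ("I felt dizzy for two days after the second dose.", 5),
    ("It was fine, I guess.", 2),
]

def call_model(model_name: str, prompt: str) -> int:
    """Stub standing in for a real API call that returns a 1-5 score."""
    return 3

def accuracy(model_name: str) -> float:
    """Fraction of examples where the model's score matches the human label."""
    hits = 0
    for response, human_score in labeled_examples:
        prompt = f"{RUBRIC}\n\nResponse: {response}"
        if call_model(model_name, prompt) == human_score:
            hits += 1
    return hits / len(labeled_examples)

# The "most advanced" model is not always the best on your task.
for model in ["gpt-4.1", "o3"]:
    print(model, accuracy(model))
```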
SPEAKER_01 (05:03):
So the model was smarter than the task that it had to complete, by a long shot. Wow, that's crazy. Obviously, this is an evolving space. By the time we publish this, it's going to have changed 10 more times at this rate, I'm sure, or some new companies are going to pop up. Thanks. I hope that helps people, because this is a really dynamic question. And I think the key is that the phrase "best model," which we hear a lot,
(05:25):
is not the case, right? It's models for purpose. It's fit-for-purpose use. One of the other questions that we got, and maybe it's just because people like using the word hallucination, but this question came up, which is: what are these hallucinations, and how do I avoid them? I can't have any of them in clinical research. Too risky. So how do you make sure there are zero hallucinations?
(05:46):
So when someone asks you that, Jeremy, or the word hallucination comes up, how do you answer that? How do you unpack that for people?
SPEAKER_00 (05:52):
Yeah, I mean, it's a real challenge. I think these models are so good at confidently responding to whatever input you provide. It's going to give you a confident answer more often than not. And it's really easy to just trust the confidence. You know, as humans, we're used to uncertainty or hedging or things like that. Whereas these models are trained to always provide the right
(06:14):
answer, and where things can go wrong is when they're overconfident in the answer they're providing to you. And I think for us, that means having to build specific steps within the process in order to validate the output from an LLM. So it's a validator step that takes something that the model said was part of the response from a study, in our case, and
(06:37):
looking at the source data and ensuring: does this string of text actually exist in our source data? If it doesn't, we need to escalate this and get it corrected. And that's one example of just working within the limitations of the model and adding some extra checks to try to reduce
(06:58):
the likelihood that a hallucination would be shown to an end user.
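Here is a minimal sketch of what a validator step like that could look like: before a model-cited quote is shown to anyone, confirm the text actually exists in the study's source data. The function names and example data are illustrative, not the actual production pipeline.

```python
# Hedged sketch of a citation validator: flag any model-cited text
# that cannot be found verbatim in the source transcript.
import re

def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting
    differences don't cause false alarms."""
    return re.sub(r"\s+", " ", text.lower()).strip()

def validate_citation(cited_text: str, source_transcript: str) -> bool:
    """Return True only if the cited text appears (after normalization)
    in the source transcript."""
    return normalize(cited_text) in normalize(source_transcript)

source = "Participant 001: I felt dizzy for two days after the second dose."
claim = "I felt dizzy for two days"

if validate_citation(claim, source):
    print("Citation verified against source data.")
else:
    # In a real pipeline this would escalate the record for human review.
    print("Possible hallucination: escalate and get it corrected.")
```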
SPEAKER_01 (07:04):
Yeah, I like that. So let me say it in kind of, um, non-AI English. I'm going to try my best here. If you have participants, like representative patients for a study, and you have some principal investigators and study coordinators, and you screen them, you go out and recruit them. You ask them to answer a series of questions, whether it's Likert scales or talking to their phone, their voice, and
(07:25):
they get feedback. And then that data is in the system, and you're using AI to actually process that data because there's a lot of it. The way you're making sure you don't have hallucinations in the mix is because you've basically trained and you have a process and a model that references everything to make sure that that voice came from patient 001. The AI didn't make up what it
(07:48):
thought the voice could say, which is where you're going to get to the hallucination. So it, in effect, removes that and adds a reference, so that somebody can trust and know that that's the data that came directly out of their study. Is that... did I get that right?
SPEAKER_00 (08:00):
Exactly. No, that's exactly right. We have a ground truth of what exactly was said in the study, and we're able to compare that with what the AI references or cites. And that's how we maintain integrity with our systems.
SPEAKER_01 (08:12):
Yeah, it's really interesting. I don't know if I told you, but this actually happened to me like a couple of weeks ago. I was doing some research on decentralized clinical trials and I was asking it for some good quotes of the week. And it gave me these three quotes. And one of the quotes was from me. And I read the quote and I was like, that sounds awesome. I didn't say that at all. And I think I messaged you right away. I was like, this looks like a hallucination.
(08:33):
And you were like, yeah, you probably said it. And I literally, like, searched and couldn't find it. And so, funny enough, I just left the chat window open and I asked it for the reference. I said, can you reference this for me? And long story short, I'm pretty sure that, like, ChatGPT just said, hey, if you post it on LinkedIn today, it'll be your quote and then I can reference it for you. And so I thought, no, that's called a hallucination.
(08:54):
Like, that's incorrect data and there was no reference point for it. So I think you're right, like these systems and these processes to manage and make sure the data is referenceable and doesn't have hallucinations are so important. And so, great question, and I appreciate those who offered it. So Jeremy, those were the four questions. Lots more we'll take, and maybe we'll add this to a future
(09:14):
session. So if someone wants to get ahold of you, learn more about what you're up to in AI, or just ask you a really specific question, what's the best way they can get you?
SPEAKER_00 (09:21):
Yeah, just find me on LinkedIn. Message me there or comment on this video, and I'd be happy to answer any questions you might have.
SPEAKER_01 (09:30):
That's great. Well, hey Jeremy, thanks for your time, and, uh, if we get more of these questions, I'll see you again real soon. Thanks.