
August 26, 2024 25 mins

Duplicate reports are a big problem when it comes to signal detection, but with the help of machine learning and new ways of comparing reports, we may more effectively detect them. 

This episode is part of the Uppsala Reports Long Reads series – the most topical stories from UMC’s pharmacovigilance news site, brought to you in audio format. Find the original article here.

After the read, we speak to author Jim Barrett, Senior Data Scientist at UMC, to learn more about the duplicate detection algorithm and UMC’s work to develop AI resources for pharmacovigilance.

Tune in to find out:

  • How the new algorithm handles duplicates in VigiBase
  • About different approaches for developing algorithms
  • Why it can be challenging to evaluate the performance of an algorithm


Want to know more?

Finally, don’t forget to subscribe to the monthly Uppsala Reports newsletter for free regular updates from the world of pharmacovigilance.

Join the conversation on social media
Follow us on X, LinkedIn, or Facebook and share your thoughts about the show with the hashtag #DrugSafetyMatters.

Got a story to share?
We’re always looking for new content and interesting people to interview. If you have a great idea for a show, get in touch!

About UMC
Read more about Uppsala Monitoring Centre and how we work to advance medicines safety.


Episode Transcript

Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Fredrik Brounéus (00:09):
Duplicate reports are a big problem when it comes to signal detection, but with the help of machine learning and new ways of comparing reports, we may more effectively detect them.
My name is Fredrik Brounéus and this is Drug Safety Matters, a podcast by Uppsala Monitoring Centre, where we explore current issues in pharmacovigilance and patient safety.

(00:29):
This episode is part of the Uppsala Reports Long Reads series, where we select the most topical stories from our news site, Uppsala Reports, and bring them to you in audio format.
Today's article is "Weeding out duplicates to better detect side effects", written by Jim Barrett, senior data scientist at Uppsala Monitoring Centre, and published online in April

(00:52):
2024.
After the read, I sit down with Jim to learn more about duplicate detection and other ways that we can use artificial intelligence in pharmacovigilance.
So make sure you stay tuned till the end.
But first, let's hear the article read by Jim Barrett.

Jim Barrett (01:14):
VigiBase is fast approaching 40 million reports of adverse events following drugs and vaccines, with no indication of slowing down its growth.
So far in 2024, VigiBase has received on average about 50,000 new reports per week.
The sheer size of VigiBase makes it an amazing resource for pharmacovigilance.
However, a natural consequence of this high rate of reporting

(01:37):
is that we can sometimes get more than one report in VigiBase about the same adverse event in the same patient.
There are many ways this can happen.
Sometimes there are multiple reporters of the same event, or a single patient may report to multiple places.
Another is that follow-up information can mistakenly end up unlinked from the original report.

(01:57):
Duplicate reports pose several problems for pharmacovigilance.
A key example arises when doing statistical signal detection, which is when we try to identify the adverse events which are happening more frequently in combination with a drug than we would otherwise expect to see by chance.
Imagine we have some adverse event reports for a drug that specify experiencing a headache.

(02:18):
Given the background rates of reporting on headaches, we would expect 10 of the reports to mention headache by chance.
Then imagine that for each of the patients who experienced the headache, VigiBase had received two independent reports of their adverse event.
Suddenly, this combination looks like it's happening twice as often as we would expect.
This might lead us to investigate the combination as a

(02:41):
potential safety signal, wasting valuable time that could be spent investigating other potential signals.
Clearly, it would be better to remove duplicate reports from the database before we do our statistical analyses.
For VigiBase, this task is impossible to do manually due to the large number of reports it receives daily, so it becomes necessary to come up with an algorithm to do it for us.
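To make the arithmetic in the headache example concrete, here is a minimal sketch of how duplicates inflate an observed-to-expected ratio for a drug and event combination. The numbers mirror the example above and are purely illustrative; this is not UMC's actual signal detection code.

```python
# Minimal sketch: how duplicates inflate an observed-to-expected (O/E) ratio.
# Hypothetical numbers mirroring the headache example; not UMC's actual code.

def observed_to_expected(observed_reports: int, expected_reports: float) -> float:
    """Ratio of observed to expected report counts for a drug-event pair."""
    return observed_reports / expected_reports

expected = 10          # reports mentioning headache expected by chance
true_patients = 10     # patients who actually experienced a headache

# Without duplicates: one report per patient.
print(observed_to_expected(true_patients, expected))        # 1.0 -> nothing unusual

# With duplicates: every patient's event was reported twice.
duplicated_observed = true_patients * 2
print(observed_to_expected(duplicated_observed, expected))  # 2.0 -> looks like a signal
```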

(03:04):
This is a more challenging problem than it sounds.
Just because two reports are duplicates of one another doesn't mean that they look identical.
Different reports might use different terms to describe the same adverse event, or they might include more or less information about the patient.
Conversely, two reports may not contain enough information to

(03:26):
reliably decide whether they are duplicate reports or not.
Previous efforts to detect duplicates have focused on probabilities, comparing the likelihood of a specific combination of drugs, reactions, sexes, ages and so on occurring on a given pair of reports, based on the background reporting rates derived from VigiBase.
If it seems too unlikely to have occurred by chance, then we

(03:50):
suspect they're duplicates.
This approach has been used with great success by Uppsala Monitoring Centre for several years.
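The article doesn't spell out the calculation, but the general idea behind such probabilistic comparisons can be roughly sketched as follows: agreement on a field that is rare in the database counts as stronger evidence than agreement on a common one. The fields, background probabilities, and threshold below are invented for illustration and are not VigiMatch's actual model.

```python
import math

# Toy sketch in the spirit of the probabilistic approach described above:
# matching fields that are rare in the database contribute more evidence
# that two reports describe the same case. All numbers are illustrative.

# Hypothetical background rates: how often a randomly chosen report would
# share this field value with another report purely by chance.
BACKGROUND_MATCH_PROB = {
    "drug": 0.01,       # same suspected drug
    "reaction": 0.02,   # same reported reaction term
    "sex": 0.5,         # same sex
    "age_group": 0.1,   # same age group
}

def duplicate_evidence(report_a: dict, report_b: dict) -> float:
    """Sum of -log(probability) over fields that match on both reports.

    Higher scores mean the observed agreement is less likely by chance.
    """
    score = 0.0
    for field, prob in BACKGROUND_MATCH_PROB.items():
        if report_a.get(field) is not None and report_a.get(field) == report_b.get(field):
            score += -math.log(prob)
    return score

a = {"drug": "drug X", "reaction": "headache", "sex": "F", "age_group": "10-19"}
b = {"drug": "drug X", "reaction": "headache", "sex": "F", "age_group": "10-19"}

score = duplicate_evidence(a, b)
print(f"evidence score: {score:.2f}")
# A pair is flagged as a suspected duplicate if the score exceeds a threshold
# calibrated against the database; the 10.0 used here is arbitrary.
print("suspected duplicate" if score > 10.0 else "insufficient evidence")
```

Note how a match on a common value such as sex contributes very little evidence, which is exactly why near-identical vaccine reports, as in the HPV example that follows, may still not clear the bar.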
However, methods like these can run into problems, especially in databases as large and diverse as VigiBase.
One place where previous approaches are known to perform poorly is with reports of adverse events following

(04:11):
vaccinations.
Consider the vaccine against human papillomavirus.
Most vaccine recipients are going to be girls around the same age, with many patients being vaccinated on the same day.
If you have two HPV vaccine reports and both report the same sex, age, date of vaccination and adverse event, this may

(04:32):
still not be sufficient evidence to suspect them of being duplicates.
These challenges have made duplicate detection among vaccine reports unreliable.
Over the past two years, researchers at UMC have been working on a new algorithm for duplicate detection for both drug and vaccine reports.
It builds upon the strengths of earlier approaches, but also

(04:52):
implements new methods of comparing pairs of reports.
For example, we use a new way of capturing any date information mentioned on the report, from drug administration periods to the start and end date of the drug, to dates contained in the free text narrative.
We use this date information to determine whether the timelines described in the reports are compatible.
If they are, the reports are more likely to be duplicates.

(05:15):
If they aren't, then they may be separate reports.
The method also uses machine learning to understand how to effectively weigh evidence from different parts of the reports to decide whether to suspect a pair as being duplicates.
In all our tests, this new approach works as well as, or better than, previous approaches for both drugs and vaccines.
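The model itself isn't described here, so the following is only a hedged sketch of the general idea: learn, from labelled pairs, how much weight each piece of pairwise evidence should carry. The features, training data, and the choice of logistic regression are assumptions for illustration, not the actual VigiMatch implementation.

```python
# Hedged sketch of the machine-learning step described above: learn how much
# weight each piece of pairwise evidence deserves when deciding whether two
# reports are suspected duplicates. Features and labels are invented for
# illustration; this is not the actual VigiMatch model.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row describes one PAIR of reports:
# [same_drug, same_reaction, same_sex, same_age_group, timelines_compatible]
X_train = np.array([
    [1, 1, 1, 1, 1],   # near-identical pair with compatible timelines
    [1, 1, 1, 1, 0],   # identical fields but incompatible timelines
    [1, 0, 1, 0, 1],
    [0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0],
    [0, 1, 1, 0, 1],
])
y_train = np.array([1, 0, 0, 0, 0, 0])  # 1 = labelled as a duplicate pair

model = LogisticRegression().fit(X_train, y_train)

# Score a new pair of reports: estimated probability that the pair is a duplicate.
new_pair = np.array([[1, 1, 1, 1, 1]])
print(f"P(duplicate) = {model.predict_proba(new_pair)[0, 1]:.2f}")
```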

(05:35):
Effective duplicate detection is just one cog in the machine of pharmacovigilance, but once the new method is in place, pharmacovigilance practitioners worldwide will have a sharper tool to find true safety signals, ultimately improving patient safety.

Fredrik Brounéus (05:59):
That was Jim Barrett, senior data scientist at Uppsala Monitoring Centre, reading his article "Weeding out duplicates to better detect side effects", and he's with me here in the studio now.
Welcome back to the show, Jim.

Jim Barrett (06:12):
Thanks for having me, Fredrik.
Good to be back.

Fredrik Brounéus (06:15):
It's been a couple of years, but listeners interested in AI will remember the last time you were here to tell us about an algorithm that you and your colleagues had developed to improve signal detection in VigiBase.

Jim Barrett (06:29):
Yes, exactly.
So, I was here a couple of years ago speaking about a method called VigiGroup, which is kind of a new way of doing signal detection by grouping together similar reports.
I think at that time we were specifically talking about how we would use that method for the COVID vaccine rollout, looking for new and unknown side effects at that time.

Fredrik Brounéus (06:47):
Right, and today we're going to talk about another algorithm, one that's specifically designed to detect duplicate reports in VigiBase.
And you said VigiGroup; does this one have a name yet?

Jim Barrett (07:01):
It does.
So we have an existing algorithm that has been in use for several years at UMC called VigiMatch, which is designed to tackle the same problem of detecting duplicates, and I think we're going to continue with the brand there and name this new algorithm an improved VigiMatch, or just keep calling it VigiMatch.

Fredrik Brounéus (07:19):
In the media, I mean, we often see terms such as AI, algorithms and machine learning used almost interchangeably, and I was wondering whether you could perhaps give us just a quick rundown of the meaning of these concepts, because they're not completely the same, are they?

Jim Barrett (07:37):
No, they're not completely the same.
I mean, I think it's kind of funny with AI specifically, it's very difficult to chase down a real concrete definition.
It feels like if you put three data scientists in a room you'd come away with five definitions of AI.
So for me personally, I like a definition that some of my colleagues have adopted, which is that AI is a branch of

(07:57):
computer science that involves the ability of a machine to emulate aspects of human behavior and to deal with tasks that are normally regarded as primarily proceeding from human cerebral activity.
This is a definition first put forward by Jeffrey Aronson a few years ago.
So I quite like that definition for AI, but it's quite a broad definition, necessarily, and then going on to a definition of

(08:20):
machine learning, I would call machine learning kind of a class of algorithms which basically learn from data, so you don't have any sort of hard-coded knowledge in them necessarily.
They instead learn by example.
And then algorithms is an even more broad kind of definition, I

(08:40):
would say.
It's just, you would kind of class an algorithm as a set of instructions to follow to achieve a certain task.

Fredrik Brounéus (08:47):
How do we go about developing algorithms here
at UMC?
Do we build them from scratch, or do we start from models created by other actors, such as, we hear about OpenAI and ChatGPT, and do we have some kind of basis and then tweak it

(09:09):
for our own specific needs?

Jim Barrett (09:11):
Yeah, so we work on quite a wide, diverse set of problems within research and data science at UMC, and so I would say the answer to this question varies a lot depending on the problem we're working on.
So, for example, we do a lot of work in the area of NLP (natural language processing), which is learning from and inferring things from free text or natural language, and in

(09:34):
those cases we typically take models off the shelf and then tweak them to our use case.
Or, as you mentioned, OpenAI, we've been doing some work and investigation into using OpenAI's GPT models for certain tasks, but then for other tasks, such as VigiMatch, which I was describing in this article, the kind of precursor to the new

(09:55):
algorithm which I described in the article was developed completely in-house, and then the improvements I've made on it have been largely figuring out how to best represent features on the reports to compare them to one another.
So a lot of the work has been developed from scratch in-house in that instance.

Fredrik Brounéus (10:14):
And then the next question is how do we
evaluate their performance, the algorithms, when we have
developed them?

Jim Barrett (10:21):
Yeah, absolutely.
I mean, this is an enormous problem, an enormous topic.
We could probably do three more podcast episodes just on this.
I mean, as it happens, I was recently in San Diego at DIA Global, the Drug Information Association's global conference, which is a conference where many drug manufacturers and

(10:43):
developers and regulators meet to discuss current topics, and I was chairing a session on exactly this problem: how do we evaluate AI solutions in the context of pharmacovigilance?
And it's very much not an easy problem.
I think we all came away from that session with more questions than answers.
I mean, we can talk specifically in the context of

(11:03):
VigiMatch.
So one of the real difficulties with VigiMatch is that duplicates are very rare.
If you were to just pick two random reports from VigiBase, you would expect them to be a duplicate pair about one time in 250 million.
And the issue then with this is that when you're trying to kind

(11:26):
of generate a number of examples of true duplicates so that you can test to see if your algorithm is successfully finding them, your data set then is going to be necessarily biased, because you can't just kind of randomly sit there and label billions of pairs of reports with the hope of finding a few duplicates.
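To put that rarity into numbers, a quick back-of-the-envelope calculation using the one-in-250-million figure quoted above:

```python
# Back-of-the-envelope: how many randomly chosen pairs would you need to label
# to find true duplicates, given the roughly 1-in-250-million rate quoted above?
duplicate_rate = 1 / 250_000_000

labelled_pairs = 1_000_000  # a million hand-labelled random pairs
expected_duplicates = labelled_pairs * duplicate_rate
print(f"{expected_duplicates:.3f}")  # ~0.004 expected duplicates

# Pairs you'd need to label, on average, to expect to see a single duplicate:
print(f"{1 / duplicate_rate:,.0f}")  # 250,000,000
```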

(11:47):
So this presents a significant challenge, and it becomes an exercise more in understanding and correcting for the biases in the way you're evaluating this algorithm than anything else.
Another significant challenge that we faced, I think, and that has been a challenge facing VigiMatch for quite a long time,

(12:09):
is that VigiBase is extremely diverse.
It's a global database with over 150 contributing countries at this point, and not all countries have exactly the same pharmacovigilance landscape.
They don't necessarily have the same standards or best practices of reporting, and so we sometimes see that in some

(12:33):
countries the reporting distribution, the reporting patterns, can be significantly different from other countries, and so making sure that the algorithm is performing well in all settings, and not just in the most common setting, for example, also presents a significant challenge.
And the reality of it is that we just have to kind of roll our

(12:58):
sleeves up and go in and really verify the algorithm and look at real examples of where it's succeeding and where it's failing in all of these cases, to get a good sense of how well it's performing.

Fredrik Brounéus (13:07):
So just to have an idea, approximately how many algorithms are we talking about, all in all, that we have developed here at UMC?

Jim Barrett (13:16):
Yeah, I was trying to count these on my way into
work this morning.
I mean, it's certainly in the tens.
It's a difficult thing to quantify, I would say, but yes, so I mean, as I mentioned earlier, we work on a diverse set of problems and there's many kinds of problems within pharmacovigilance which can somewhat yield to data science

(13:38):
techniques.
So we have approaches to many of these.
But yeah, I would say it's a difficult thing to count.

Fredrik Brounéus (13:45):
You mentioned in your article that VigiBase is
now approaching 40 million reports of adverse drug events, and, yeah, speaking about difficult numbers to count, how many of those may be duplicates?

Jim Barrett (13:59):
I mean, so this is another extremely challenging problem, to count these, and it's something that I would really like to take another stab at making a good estimate of.
So once we've published the new VigiMatch method, we're in a better place to maybe quantify this a bit better.
So the kind of classical estimate of this is that,

(14:22):
roughly speaking, around about one in 10 reports will have a detectable duplicate somewhere in the database.
The true rate of duplication is extremely difficult to measure, especially seeing as, as I mentioned in the article, sometimes you have a pair of reports which simply don't have enough information to be assessed as being duplicates,

(14:49):
even though they may truly be duplicates.
Moreover, you have different factors affecting duplication.
So if a patient has suffered a serious or fatal adverse event, then that may well motivate more people to report that, or kind of stimulate a greater deal of reporting.
So this is not necessarily uniform across all different adverse events.
So this is definitely a study that I would very much like to

(15:11):
do, to try and get a better handle on this number.
I would say we don't really know.
I think this one in 10 number is about correct-ish, or not a bad estimate.

Fredrik Brounéus (15:24):
That's a fair amount of duplicates.

Jim Barrett (15:26):
It's a fair amount of duplicates, yes.

Fredrik Brounéus (15:27):
Yeah, so, but let's say, then, that our algorithm has helped us identify duplicate reports; how do we then decide which report to keep?
We are talking about weeding out duplicates here, and so which do we keep and which do we weed out?
Because my guess is that the reports, although they are

(15:49):
about the same case, may differ significantly, for instance with regards to their level of detail and perhaps how useful they are to us.
So what do we keep?
What do we weed out?

Jim Barrett (16:01):
Yeah, absolutely.
So, you're completely right.
So the way that we choose which report is the "preferred report", as we say, or the kind of canonical report, is we use an algorithm which was developed some time ago to quantify how complete a report is.
It's called VigiGrade, and it takes into account various

(16:22):
aspects, like whether dates have been reported, and dosages, and if there's free text information, and things like this.
So if you have a set of duplicates, then you would choose the most complete report of those.
If you then find that you have several reports which are equally complete, then we go to the one which has the most

(16:42):
recent update.
So, yeah, if you find that we have several reports which have the same completeness, then we choose the report which has the most recent update in VigiBase.
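As a rough illustration of that selection rule, most complete first with the most recent update as the tie-breaker, here is a minimal sketch. The completeness values stand in for VigiGrade scores and the field names are invented; this is not VigiBase's actual schema.

```python
from datetime import date

# Minimal sketch of the selection rule described above: among a group of
# suspected duplicates, keep the most complete report, breaking ties by the
# most recent update. The 'completeness' values stand in for VigiGrade scores;
# field names are illustrative only.
duplicate_group = [
    {"report_id": "A1", "completeness": 0.62, "last_updated": date(2024, 1, 10)},
    {"report_id": "B7", "completeness": 0.85, "last_updated": date(2023, 11, 2)},
    {"report_id": "C3", "completeness": 0.85, "last_updated": date(2024, 3, 5)},
]

preferred = max(duplicate_group, key=lambda r: (r["completeness"], r["last_updated"]))
print(preferred["report_id"])  # -> "C3": as complete as B7, but updated more recently
```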

Fredrik Brounéus (16:58):
Could there ever be a case where a less
complete report would have more interesting or valuable information than the more complete report?

Jim Barrett (17:04):
Yeah, I mean that's definitely a possibility, and I think this speaks to a point which is nice to raise actually, which is that duplicate detection can kind of be used in several different settings, right?
So I described in my article that duplicates can be a big problem for statistical signal detection, where you're typically just looking at the drugs and events on reports, and

(17:27):
in that instance it doesn't really matter which is the more complete one, you only care that you have the adverse events and drugs there.
But if you are a signal assessor and you're sitting with your case series of 100 reports and you're deduplicating that, then typically the way we use it in practice is we don't delete

(17:48):
and hide from you the duplicates.
We instead flag them and say, oh, these are the ones that are duplicates, so that then, in the case that you mentioned, when there's multiple reports referring to the same case and maybe they have different information, then the signal assessor can look at those and make an informed judgment about

(18:09):
that.

Fredrik Brounéus (18:10):
As you write towards the end of your article,
you say that effective duplicate detection is just one cog in the machine of pharmacovigilance.
What other parts of the pharmacovigilance machine are we using AI or machine learning for?
And you already told us about the VigiGroup algorithm, and now

(18:32):
we have this other algorithm also, for assessing how complete a report is.
But do you have any other examples for us?

Jim Barrett (18:42):
Sure.
So one of the problems that I've been working quite a bit on, I've had a couple of master's students over the last couple of years working on this problem with me, has been to try and extract information from product labels.
So the product labels are a document which is published alongside a drug when it is authorized in a certain market,

(19:06):
describing all sorts of things like guidelines for how to use that drug, known adverse events from the clinical trials or from post-marketing surveillance, and typically these documents are just a free text document.
They're just published as a PDF or a Word document on the website of the regulatory authority, and it turns out that

(19:30):
it's a very useful thing to be able to know what adverse events are already known for a drug, and these tend to only be listed in these documents.
The reason that it's important to know this is that when we're doing signal detection, at UMC or wherever you are, you don't want to waste your time looking at stuff that's already known.
You want to try and find the stuff that's hurting people

(19:54):
which is not already known.
So we've been using AI, machine learning techniques, to try and mine the natural language in these documents to try and extract all of the known adverse events for a given drug, so that we can then use that information to prioritize which

(20:14):
combinations are looked at by signal assessors downstream.
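The specific NLP technique isn't described here, so the following is only an illustrative sketch of the general idea: scan a label's free text for adverse event terms from a controlled vocabulary, so that already-known drug and event combinations can be deprioritised downstream. The term list, label text, and function name are all invented for illustration.

```python
import re

# Illustrative sketch only: find known adverse event terms in a product label's
# free text. Real systems would use a proper terminology (e.g. MedDRA) and more
# sophisticated NLP; the vocabulary and label text below are invented.
ADVERSE_EVENT_TERMS = ["headache", "nausea", "dizziness", "rash", "fatigue"]

def extract_known_adverse_events(label_text: str) -> set[str]:
    """Return the adverse event terms mentioned in the label text."""
    found = set()
    for term in ADVERSE_EVENT_TERMS:
        if re.search(rf"\b{re.escape(term)}\b", label_text, flags=re.IGNORECASE):
            found.add(term)
    return found

label = """4.8 Undesirable effects
Very common: headache, nausea.
Common: dizziness."""

known = extract_known_adverse_events(label)
print(sorted(known))  # ['dizziness', 'headache', 'nausea']

# Downstream, a drug-event combination already listed in the label could be
# given lower priority in signal detection than one that is not yet known.
```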

Fredrik Brounéus (20:18):
Looking at how fast things are moving in these areas, where do you think we will be in, say, two years' time, with regards to both how the community is using this technology and how we at UMC are using it?
What do you think?

Jim Barrett (20:35):
So I think, I mean, the elephant in the room to a certain extent is these large language models that have really taken the world by storm in the last year or two.
And I think the next one or two years is going to see a real explosion in the use of these within the context of pharmacovigilance.
We've already been experimenting with them, both

(20:56):
for extracting information from these product labels, as I mentioned earlier for that problem, but also for summarizing or making inferences about case reports, looking for those cases which are suggestive of possible causal relationships.
But I think the whole community is still kind of trying to

(21:20):
understand how best to use these models and also, critically, how to use them effectively and safely, because we all know that these models hallucinate a lot.
So I would say, yeah, in the next couple of years, that's where I imagine the biggest shift is going to come from, like grappling with and beginning to understand how to

(21:43):
really leverage these large language models in the context of pharmacovigilance.

Fredrik Brounéus (21:47):
So, you mentioned hallucinations now, and here, when we are heading into this future, are there any specific pitfalls you think we need to be particularly mindful of?

Jim Barrett (22:01):
Yes, I think there can be a tendency, because these models, when we use them, I mean, they appear to be so good, there's a tendency to over-trust them, I think.
So building systems, either for evaluation or just safety nets in any implementation of these in practice, to avoid that kind

(22:24):
of cognitive bias of blindly trusting them, is going to be extremely important going forward.

Fredrik Brounéus (22:29):
Thank you very much, Jim, and to finish off,
do you have a dream algorithm that, you know, something that you would like to pursue, given unlimited resources?

Jim Barrett (22:43):
Yeah, given unlimited resources.
So this is a very, very preliminary experiment we were running recently at UMC, but what we were playing around with was using large language models to take a case series and perform a signal assessment, in a sense.
So basically what the algorithm would do is it would go through

(23:06):
and it would look for different pieces of evidence in each report and then summarize that at the end, saying, there is, you know, evidence from five out of the 10 reports for a dechallenge, like the reaction stopping after the drug was discontinued, or different pieces of information like this, following the Bradford Hill criteria for pharmacovigilance, for case series assessment.

(23:28):
And, you know, given unlimited resources and unlimited research time, I think there's a lot of promise in this approach for being able to really bring forward and highlight the most suggestive case series for our human signal assessors to look at in more depth.
Yeah, I think that it's a really promising and exciting

(23:49):
area, but it would require unlimited resources, I think, to pull this off.
It would be expensive.

Fredrik Brounéus (23:57):
Thank you very much for coming to the show,
Jim.
I learned a lot today and look forward to having you here again someday soon to hear more about what you're working on.

Jim Barrett (24:07):
Thanks for having me.

Fredrik Brounéus (24:10):
If you'd like to know more about artificial
intelligence in pharmacovigilance, check out the episode show notes for useful links.
That's all for now, but we'll be back soon with more long reads, as well as our usual in-depth conversations with medicine safety experts.
In the meantime, we'd love to hear from you.
Reach out on Facebook, LinkedIn and X.

(24:31):
Send comments or suggestions for the show, or questions for our guests next time we open up for that, and visit our website to learn more about what we do to promote safer use of medicines and vaccines for everyone, everywhere.
If you like the podcast, please subscribe to make sure you won't miss an episode.
And spread the word so other listeners can find us too.

(24:51):
For Drug Safety Matters, I'm Fredrik Brounéus.
Thanks for listening.