NPM Nightmare: & Cloudflare AI That Secured End Users From 2 Billion Weekly Malicious Downloads

Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Speaker 1 (00:00):
Welcome back to Upwardly Mobile. I'm George and I'm Sky.

Speaker 2 (00:03):
Today we're jumping straight into something huge that hit early
September twenty twenty five, a really massive NPM supply chain compromise.

Speaker 1 (00:11):
Yeah, this wasn't some small incident. We're talking about a
major event in the open source world.

Speaker 2 (00:15):
Exactly. And when we say massive scale, we
mean it. Attackers got into trusted maintainer accounts. They pushed
malicious code into eighteen, you know, really widely used NPM packages,
fixtures like chalk and debug.

Speaker 1 (00:29):
Yeah, things developers use constantly.

Speaker 2 (00:30):
Right, and these libraries, they account for, get this,
over two billion downloads every single week.

Speaker 1 (00:38):
So the potential impact is just staggering. And for you listening
as a mobile developer, this is crucial. It wasn't just
about traditional websites, not at all.

Speaker 2 (00:46):
Think about modern cross platform apps, you know, the ones
built with React Native or Ionic Cordova. They lean heavily
on these NPM dependencies.

Speaker 1 (00:54):
Which means the malicious code wasn't just on some server.
It was executing right inside the mobile app's JavaScript runtime.

Speaker 2 (01:00):
Exactly, directly impacting end users, potentially stealing data and
compromising the entire development supply chain feeding into those apps.

Speaker 1 (01:09):
Okay, so our mission today seems clear. We need to
unpack the two quite distinct and dangerous attack methods they used.

Speaker 2 (01:17):
And then look at the defenses, specifically the advanced
AI-driven stuff that actually proved effective against this, well, this
really novel threat.

Speaker 1 (01:25):
Let's start right at the beginning, then. How did they
even get in? These are trusted maintainers, presumably following good
security practices.

Speaker 2 (01:32):
It really boils down to the classic vulnerability, doesn't it?
The human factor. But, uh, executed with scary precision. So,
phishing. A highly targeted phishing campaign. Yeah, the emails looked
exactly like official NPM security alerts: urgent tone, demanding maintainers
update their 2FA credentials, like, right now.

Speaker 1 (01:51):
Threatening account lockout, I bet.

Speaker 2 (01:54):
Yep, a hard deadline, the works. Designed to create panic and bypass careful thought.

Speaker 1 (01:57):
And the domain name they used was key, right?

Speaker 2 (02:00):
Very deceptive, incredibly so: npmjs.help, close enough to
the real npmjs.com to fool someone who's busy
or just not paying close enough attention.
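As a rough illustration of the lookalike-domain problem described here, the sketch below checks whether a link's registered domain is on an allow-list. The allow-list contents, the example URLs, and the two-label shortcut for finding the registered domain are assumptions made for the example; the episode does not describe any particular tooling for this.

```python
from urllib.parse import urlparse

# Assumption: the only domain we treat as official for npm security emails.
OFFICIAL_DOMAINS = {"npmjs.com"}


def registered_domain(url: str) -> str:
    """Return the last two labels of the URL's hostname.

    This is a simplification that ignores multi-part suffixes such as co.uk.
    """
    host = (urlparse(url).hostname or "").lower()
    return ".".join(host.split(".")[-2:])


def is_official_link(url: str) -> bool:
    """True only when the link's registered domain is on the allow-list."""
    return registered_domain(url) in OFFICIAL_DOMAINS


# Hypothetical URLs used purely for demonstration.
print(is_official_link("https://www.npmjs.com/settings"))   # True
print(is_official_link("https://www.npmjs.help/settings"))  # False
```

An exact allow-list comparison is used rather than a similarity score because a lookalike domain is, by design, nearly identical to the real one.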

Speaker 1 (02:10):
So fundamentally, it wasn't a technical flaw in NPM itself.
It was social engineering, pure and simple.

Speaker 2 (02:17):
That's the core of it. One trusted developer, maybe rushing,
maybe distracted, clicks the link, enters their credentials, and boom.

Speaker 1 (02:23):
The domino effect. That single compromise opens the door.

Speaker 2 (02:27):
Exactly. It cascaded downstream, exposing potentially thousands of systems. Once the
attackers had those credentials, they moved fast, injecting heavily obfuscated
JavaScript into those eighteen packages.

Speaker 1 (02:38):
Okay, code injected. What was plan A? What did they
go after first? Was it immediate profit or something more
long term?

Speaker 2 (02:44):
Well, that's what made this so nasty. They did both
at the same time. The most visible part, the quick
cash grab, was this end user payload. Let's call it the
crypto hijacker.

Speaker 1 (02:52):
Crypto hijacker, okay. So websites and apps using these compromised packages...

Speaker 2 (02:58):
...instantly became silent crypto drainers, just siphoning funds without the user knowing.

Speaker 1 (03:02):
And the mechanism for this crypto stealing, it wasn't just
generic malware, was it? It sounds like it was quite specific.

Speaker 2 (03:09):
Oh, absolutely, very targeted. The malicious code was
designed to activate right there on the client side, so
in the browser, or crucially for our audience, in the
mobile app's WebView component.

Speaker 1 (03:21):
It wasn't just sniffing all network traffic, then?

Speaker 2 (03:24):
Yeah, no. It was laser focused, actively hunting for API requests going
to known crypto wallet interfaces.

Speaker 1 (03:31):
You mean, like if my React Native app tries to
talk to a wallet app on the phone?

Speaker 2 (03:34):
Precisely. It specifically targeted things like Ethereum's
window.ethereum interface. So when a user tries to
make a transaction, Bitcoin, Ethereum, Solana, whatever, this JavaScript intercepts it.

Speaker 1 (03:47):
Right there on the device? Before it even leaves?

Speaker 2 (03:49):
Exactly, mid-flight. On the client side, it silently swaps
the intended recipient's crypto address with the attacker's address. Then
the transaction goes through, signed and sealed, but to the
wrong place.

Speaker 1 (04:00):
And the user likely wouldn't notice until it's far too late.
Completely seamless, happening client side. So for mobile developers, that
really underscores a harsh reality. Your back end APIs can
be locked down tight, but if your client side JavaScript
dependencies are compromised...

Speaker 2 (04:16):
...your app's execution environment is the vulnerability. It confirms that
your whole dependency stack is now a critical attack surface,
just as critical as, say, an exposed API gateway.

Speaker 1 (04:27):
Okay, so that's the crypto hijacker draining user funds. But you said
there was a second payload, something darker aimed at the
developers themselves.

Speaker 2 (04:34):
Yes, alongside the crypto theft, they deployed something far more
insidious for long term compromise. This is where the stolen
CI/CD tokens become critical. They deployed a worm.

Speaker 1 (04:46):
A worm? You mean, like, self-replicating?

Speaker 2 (04:48):
Exactly. They nicknamed it the Shai-Hulud worm, and this represents
a serious escalation. It's designed for autonomous propagation. It spreads
itself through the victim's development infrastructure.

Speaker 1 (04:57):
How did it spread? What was the mechanism?

Speaker 2 (04:59):
It leveraged those stolen GitHub access tokens first off, and
then it weaponized a core, often trusted NPM feature:
postinstall scripts.

Speaker 1 (05:09):
Ah, the scripts that run automatically after npm install, usually
for setup or compilation.

Speaker 2 (05:14):
Right, but here they were turned into vectors. The worm used
them to spread laterally across developers' projects, across their codebase.
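To make the postinstall risk concrete, here is a minimal sketch that lists installed packages declaring install-time lifecycle scripts, so a developer can review what would run automatically. The function name and the idea of walking `node_modules` this way are assumptions for illustration; this is not a tool mentioned in the episode.

```python
import json
from pathlib import Path

# npm lifecycle hooks that run automatically during `npm install`.
INSTALL_HOOKS = ("preinstall", "install", "postinstall")


def find_install_scripts(node_modules: Path) -> dict[str, dict[str, str]]:
    """Map package name -> {hook: command} for every package.json under
    `node_modules` that declares an install-time lifecycle script."""
    findings: dict[str, dict[str, str]] = {}
    for manifest in node_modules.rglob("package.json"):
        try:
            data = json.loads(manifest.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            continue  # unreadable or malformed manifest: skip it
        if not isinstance(data, dict):
            continue
        scripts = data.get("scripts") or {}
        if not isinstance(scripts, dict):
            continue
        hooks = {h: scripts[h] for h in INSTALL_HOOKS if h in scripts}
        if hooks:
            # Fall back to the directory path when the manifest has no name.
            findings[data.get("name", str(manifest.parent))] = hooks
    return findings
```

Calling `find_install_scripts(Path("node_modules"))` in a project root returns the packages worth a closer look. npm also offers an `--ignore-scripts` flag that skips these hooks during installation, at the cost of breaking packages that genuinely need a build step.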

Speaker 1 (05:21):
Okay, so once Shai-Hulud got a foothold inside a build
environment using these postinstall scripts and stolen tokens, what
did it actually do? What were its goals?

Speaker 2 (05:31):
It executed this really devastating three stage automated sequence, all
orchestrated via the GitHub API using those stolen credentials.

Speaker 1 (05:40):
Three stages. Okay, stage one?

Speaker 2 (05:42):
Stage one: repository theft, pure and simple. It listed every
repository that compromised account had access to: public, private, organizational
repos, everything.

Speaker 1 (05:51):
Not just cloning the main branch, I assume?

Speaker 2 (05:53):
No, a full mirror clone. The entire commit history, all branches, total capture.
And then the really damaging part. It programmatically pushed all
those stolen private repos to a public mirror under the
attackers' control.

Speaker 1 (06:05):
Oh wow, just instant mass data exposure, complete loss of
intellectual property. That's a nightmare.

Speaker 2 (06:12):
Total nightmare. Then came stage two: workflow injection, compromising the build process itself.

Speaker 1 (06:18):
Wow.

Speaker 2 (06:19):
The worm injected malicious GitHub Actions workflows into the repositories
it could access, often named something benign like
shai-hulud-workflow.yml.

Speaker 1 (06:29):
And the purpose of this workflow file?

Speaker 2 (06:31):
Just one purpose: wait for the next CI/CD run,
then systematically scrape every secret exposed during that build
runtime, API keys, cloud tokens, environment variables, everything, and exfiltrate them
immediately to webhooks controlled by the attackers.
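One way to look for the injected workflow files described here is to compare what sits in `.github/workflows` against a list the team has actually reviewed. The sketch below assumes such an approved list exists and treats the file name mentioned in the episode as a single indicator, not a complete signature set; the function and variable names are hypothetical.

```python
from pathlib import Path

# File name mentioned in the episode; one indicator, not an exhaustive list.
SUSPECT_NAMES = {"shai-hulud-workflow.yml"}


def audit_workflows(repo: Path, approved: set[str]) -> list[str]:
    """Return workflow file names that are known-suspicious or that are
    missing from the caller's approved list."""
    workflow_dir = repo / ".github" / "workflows"
    if not workflow_dir.is_dir():
        return []
    flagged = []
    for path in sorted(workflow_dir.iterdir()):
        if path.suffix not in {".yml", ".yaml"}:
            continue  # only YAML files are treated as workflows here
        if path.name in SUSPECT_NAMES or path.name not in approved:
            flagged.append(path.name)
    return flagged
```

A call such as `audit_workflows(Path("."), approved={"ci.yml"})` would flag anything beyond `ci.yml`. Checking by name alone is weak, since an attacker can pick any name, so reviewing the diff of each flagged file still matters.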

Speaker 1 (06:45):
So even if you found and fixed the original compromised
NPM package, the attackers now have persistent access through your
build secrets. They own the pipeline.

Speaker 2 (06:53):
Precisely, they established persistence. And then
stage three was kind of a final brute force sweep: harvesting.

Speaker 1 (07:00):
What did that involve?

Speaker 2 (07:01):
The worm actually programmatically downloaded and ran a legitimate, well
known open source secret scanning tool. You might have heard of it: TruffleHog.

Speaker 1 (07:06):
Using a security tool against the victim. That's bold, isn't it?

Speaker 2 (07:12):
The irony. Using TruffleHog to scan the entire local
file system, the cloned repo contents, looking for any high
entropy strings, keys, certs, anything sensitive that wasn't maybe exposed
as an environment variable but was still lying around in
the code or config files.

Speaker 1 (07:29):
And then it probably tried to delete the tool to
cover its tracks?

Speaker 2 (07:32):
You got it. Attempted cleanup afterwards.
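The phrase "high entropy strings" refers to tokens whose characters look random, as generated keys tend to. A minimal sketch of that idea follows, which a team could run against its own files before an attacker does. The length and entropy thresholds and the token pattern are assumptions chosen for the example, and this is not how TruffleHog itself is implemented.

```python
import math
import re
from collections import Counter

# Assumed thresholds for illustration; real scanners tune these and add
# provider-specific detectors.
MIN_LENGTH = 20
MIN_ENTROPY = 4.0
TOKEN = re.compile(r"[A-Za-z0-9+/=_\-]{%d,}" % MIN_LENGTH)


def shannon_entropy(text: str) -> float:
    """Shannon entropy of `text` in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def high_entropy_strings(source: str) -> list[str]:
    """Return long tokens whose entropy meets the assumed threshold."""
    return [t for t in TOKEN.findall(source) if shannon_entropy(t) >= MIN_ENTROPY]
```

A repeated-character string scores 0 bits per character and is ignored, while a 24-character token with no repeated characters scores about 4.58 and is reported. Entropy alone produces noise (hashes, minified code), so results are a starting point for review.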

Speaker 1 (07:34):
That is, that's an incredibly sophisticated attack architecture. The crypto
hijacker got a lot of attention because it was immediate
and widespread.

Speaker 2 (07:43):
Right, it was detected relatively quickly.

Speaker 1 (07:45):
But Trend Micro mentioned they haven't actually seen active detections
of this Shai-Hulud worm in the wild yet, which
suggests maybe it was deployed more selectively, or it's just waiting.

Speaker 2 (07:55):
Or it's incredibly stealthy. Either way, its complexity signals
a really significant escalation in threats targeting developer environments. We
need to be ready for this kind of thing becoming
more common.

Speaker 1 (08:06):
Which perfectly transitions us to the defenses. Because amidst all
this bad news, there was good news. Some systems did
catch this. Let's talk about how advanced security solutions like
Cloudflare Page Shield detected this attack even though it
was novel and heavily obfuscated.

Speaker 2 (08:23):
Yeah, this is where things get interesting on the defense side,
because you're right, traditional methods would likely struggle here.

Speaker 1 (08:28):
I mean, the scale alone is mind boggling. You mentioned
Cloudflare assesses something like three point five billion scripts
a day. That's like forty thousand per second. Signature based
scanning just can't cope with that volume and the novelty.

Speaker 2 (08:41):
Exactly. Signatures rely on matching known bad
patterns, known strings. Obfuscation is specifically designed to break that,
so you need a different approach.

Speaker 1 (08:51):
So if obfuscation hides the words of the code. How
did the defense read the intent?

Speaker 2 (08:56):
That's the perfect way to put it. Instead of looking
at the words, this AI approach inspects the grammar and
the sentence structure of the code.

Speaker 1 (09:02):
How does that work in practice?

Speaker 2 (09:04):
Okay, so they take the JavaScript, obfuscated or not, and
preprocess it into something called an abstract syntax tree,
an AST.

Speaker 1 (09:11):
Right, the AST. It's like a structural blueprint of the code,
stripping away variable names and comments, focusing on the
logic flow.

Speaker 2 (09:19):
Precisely. It reveals the underlying structure, regardless of
how the attacker tried to hide it. This AST, this
structural map, is then fed into a specialized AI model,
a message passing graph convolutional network, or MPGCN.
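The intuition that structure survives renaming can be shown in a few lines. The sketch below uses Python's built-in `ast` module on Python snippets as a stand-in for JavaScript, which is an assumption made so the example is self-contained; it illustrates only the preprocessing idea and is not Cloudflare's pipeline or model.

```python
import ast


def structural_signature(source: str) -> list[str]:
    """Return the sequence of AST node type names, ignoring identifiers
    and literal values, so renamed code yields the same signature."""
    tree = ast.parse(source)
    return [type(node).__name__ for node in ast.walk(tree)]


# Two hypothetical snippets: the second is the first with mangled names.
plain = "def send(address, amount):\n    return transfer(address, amount)\n"
renamed = "def _0x1a(_0x2b, _0x3c):\n    return _0x4d(_0x2b, _0x3c)\n"

print(structural_signature(plain) == structural_signature(renamed))  # True
```

A graph model would consume the tree's nodes and edges directly; the flat list here is just the simplest way to show that the "shape" is unchanged by renaming.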

Speaker 1 (09:34):
Okay, a graph neural network. So it's analyzing the relationships
within the code structure.

Speaker 2 (09:39):
You got it. It's not trained on specific malicious code snippets. It's trained
on the patterns of malicious behavior reflected in the structure.
Things like the attempt to access sensitive browser APIs like
window.ethereum, the logical steps involved in redirecting data,
the typical structure of code designed to exfiltrate information.

Speaker 1 (09:58):
Ah. So, because the underlying logic of the attack remains the
same even if the code looks different due to obfuscation,
the graph model can still spot it.

Speaker 2 (10:06):
That's the key. It makes the defense inherently resilient to obfuscation.
It can recognize novel attack stuff it's never seen before
that wasn't in its training data, purely based on the
malicious shape of the code's logic.

Speaker 1 (10:18):
And the performance numbers were impressive too.

Speaker 2 (10:20):
Staggering, really. Inferencing, making the decision, happens in under
0.3 seconds per script, and Cloudflare confirmed this MPGCN
approach would have successfully flagged all eighteen of those compromised
NPM packages. It demonstrates real resilience against zero day threats
like this.

Speaker 1 (10:37):
That speed is critical at their scale. But here's a
potential issue. If the AI is that good at spotting
suspicious behavior patterns in scripts, doesn't it generate a ton
of false positives?

Speaker 2 (10:50):
Ah, yeah. That is the absolute hardest part of this
whole field.

Speaker 1 (10:54):
Because lots of legitimate scripts do weird things, right? Heavy user tracking,
dynamic script injection, complex analytics.

Speaker 2 (11:00):
Exactly. Legitimate scripts, especially in areas like e-commerce or
advertising tech, often look structurally similar to malware. They read
form inputs, they monitor network activity. They might even use
their own obfuscation techniques to protect their IP.

Speaker 1 (11:12):
So how bad is the false positive problem?

Speaker 2 (11:15):
They mentioned seeing roughly two complex false positives every second.
That requires a massive amount of human review effort to
sift through.

Speaker 1 (11:22):
That just doesn't scale long term with human analysts alone.
What's the path forward? How do they improve the accuracy?

Speaker 2 (11:30):
The next step seems to be moving towards what they
called agentic AI approaches, meaning combining this static analysis,
looking at the code structure, the AST, with dynamic analysis:
actually executing the suspect code in a safe sandbox environment
and observing what it does.

Speaker 1 (11:49):
Ah, okay, so you look at the structure and you watch
its behavior.

Speaker 2 (11:52):
Right. This combination should help resolve those really tricky cases, for example,
dynamically checking if the domains the script tries to
connect to are actually trustworthy. That's often the key differentiator
between a sophisticated but legitimate tracker and actual malware. Static
analysis might struggle with that, but dynamic analysis can verify it.
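A toy version of that combination might look like the following: a static flag from structural analysis is combined with the set of domains a script contacted while running in a sandbox. Everything here is hypothetical, including the `ScriptReport` type, the verdict labels, and the trusted-domain list; it only sketches the decision logic described in the conversation.

```python
from dataclasses import dataclass

# Hypothetical allow-list; a real system would rely on reputation data.
TRUSTED_DOMAINS = {"analytics.example.com", "cdn.example.com"}


@dataclass
class ScriptReport:
    static_suspicious: bool      # verdict from structural (AST) analysis
    contacted_domains: set[str]  # hosts observed during a sandboxed run


def verdict(report: ScriptReport) -> str:
    """Combine the static flag with observed network destinations."""
    if not report.static_suspicious:
        return "allow"
    untrusted = report.contacted_domains - TRUSTED_DOMAINS
    # Suspicious structure plus an unknown destination is the strong signal.
    return "block" if untrusted else "review"
```

For instance, `verdict(ScriptReport(True, {"evil.example.net"}))` yields `"block"`, while a structurally suspicious script that only talks to trusted hosts is routed to `"review"` rather than blocked outright.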

Speaker 1 (12:11):
Okay, this whole incident paints a really vivid picture. It's
modern supply chain warfare hitting users and developers. So let's
bring it home for the mobile security pros and developers listening:
what are the absolute must-do actions right now, based on this?

Speaker 2 (12:24):
Okay, four key immediate actions. First, audit
your dependencies. Seriously, dig into your package-lock.json
or equivalent. Review recent updates, especially anything pulled in around
early September twenty twenty five. Pin your dependencies to known, good,
verified versions. Don't just trust latest.
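The dependency audit can be partly automated. The sketch below reads the `packages` map that npm writes in lockfile versions 2 and 3 and reports any entry matching a deny-list supplied by the caller. The deny-list contents must come from an actual advisory; the package names in the usage example are placeholders, not the packages involved in this incident.

```python
import json


def find_flagged(lock_text: str, flagged: dict[str, set[str]]) -> list[tuple[str, str]]:
    """Return sorted (name, version) pairs from a lockfile's "packages"
    map (lockfileVersion 2 or 3) that appear in the `flagged` deny-list."""
    lock = json.loads(lock_text)
    hits = []
    for path, meta in lock.get("packages", {}).items():
        if not path:
            continue  # "" is the root project entry
        # "node_modules/a/node_modules/b" -> "b"; scoped names keep "@scope/".
        name = path.rsplit("node_modules/", 1)[-1]
        version = meta.get("version", "")
        if version in flagged.get(name, set()):
            hits.append((name, version))
    return sorted(hits)


# Placeholder data for demonstration only.
example_lock = json.dumps({
    "lockfileVersion": 3,
    "packages": {
        "": {"name": "app"},
        "node_modules/left-example": {"version": "1.2.3"},
        "node_modules/a/node_modules/right-example": {"version": "2.0.0"},
    },
})
print(find_flagged(example_lock, {"left-example": {"1.2.3"}}))
```

Reading the lockfile rather than `package.json` matters because the lockfile records the exact resolved versions, including transitive dependencies.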

Speaker 1 (12:43):
Given the stealth and danger of that Shai-Hulud worm, the
second action feels non-negotiable.

Speaker 2 (12:48):
Absolutely. Second, credential rotation. Now. Revoke and reissue all CI/CD tokens,
cloud credentials, API keys, anything potentially exposed in your build environment.
Assume compromise. Assume those private repos were cloned and scanned.

Speaker 1 (13:01):
And circling back to the root cause, the human element.

Speaker 2 (13:04):
Third, MFA and least privilege. Enforce multi factor authentication everywhere:
on developer accounts, on CI/CD service accounts, everywhere. And critically,
tighten those access policies. Implement true least privilege. No single
compromised account should have the power to, for example, perform
a full mirror clone of all your organization's repositories.
Limit that blast radius.

Speaker 1 (13:25):
Makes sense. And the final action: proactively hunting for traces
of that developer focused payload.

Speaker 2 (13:32):
Yes. Fourth, monitor for scanning tools. Actively scan your build logs,
your system logs for any signs of unauthorized repository scanning.
Look for unexpected network activity during builds. Specifically, look for
evidence of tools like TruffleHog being run where they
shouldn't be. Automated defenses are vital, but some manual targeted
auditing for this specific threat is crucial right now.
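A first pass over build logs can be as simple as the sketch below, which searches `.log` files for mentions of the scanner. The directory layout, the `.log` extension, and the single pattern are assumptions for the example; a real hunt would cover more indicators and the CI provider's own log export format.

```python
import re
from pathlib import Path

# Case-insensitive match on the tool name discussed in the episode.
PATTERN = re.compile(r"truffle\s*hog", re.IGNORECASE)


def scan_logs(log_dir: Path) -> list[tuple[str, int, str]]:
    """Return (file name, line number, line) for every log line under
    `log_dir` that mentions the scanner."""
    hits = []
    for log in sorted(log_dir.rglob("*.log")):
        with log.open(encoding="utf-8", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if PATTERN.search(line):
                    hits.append((log.name, number, line.rstrip("\n")))
    return hits
```

A hit is not proof of compromise, since some teams run secret scanners on purpose, so each match needs to be checked against what the pipeline is expected to do.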

Speaker 1 (13:56):
Excellent summary. This has been a really thorough breakdown of
the source material on this NPM incident.

Speaker 2 (14:01):
Thank you, my pleasure. It's critical stuff to understand and
just for transparency. This analysis was put together using insights
from human security researchers and reporting, with some assistance from
AI tools to synthesize the information.

Speaker 1 (14:14):
Right. So here's a final thought for everyone listening. We
keep seeing how fragile the open source ecosystem can be.
Huge chunks of global software infrastructure fundamentally rely on the
security of individual developers, sometimes down to just their email
password security. It feels like the core challenge for the industry,
doesn't it?

Speaker 2 (14:31):
It really does.

Speaker 1 (14:32):
So the question is, how do we really shift our
security models? Is it all about zero trust? Is it
more AI defenses, like we discussed? How do we mitigate
the inevitable fallout from the next time human error opens
the door? And how fast can we get these more
advanced defenses deployed universally so that one compromised account doesn't
risk grinding everything to a halt? Something to think about.

Speaker 2 (14:57):
Definitely something to think about.

Speaker 2 (14:57):
Thanks for tuning in to Upwardly Mobile. We'll catch you next time.
