Course 13 - Network Forensics | Episode 4: Log Analysis, SIM Correlation, and Network Attack Signature Detection - CyberCode Academy

Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Speaker 1 (00:00):
Welcome back to the deep dive. Today, we are peering
into what you might call the silent history of your network.
We're going to be doing a deep dive into log analysis.
I mean, that's the record of every single tiny event
happening across your devices, your routers, your switches, everything.

Speaker 2 (00:17):
Think of it as the complete diary of your entire infrastructure.
And our mission today is really to explain why logging
the right information isn't just a chore, it's really the
foundational first line of defense in cybersecurity. We're going to
break down how analysts turn this absolute tidal wave
of digital noise into alerts that actually mean something.

Speaker 1 (00:38):
Okay, so let's jump right in with the biggest problem,
the one that every security operations center faces, which is
just the noise, especially with something like
an intrusion detection system, an IDS. Right out of the box,
they just seem to generate a crazy number of false positives.

Speaker 2 (00:53):
Oh, it's a massive issue. It's one that really undermines
the whole security posture of an organization. I mean, when ninety,
maybe even ninety-five percent of your alerts are just noise,
it's just junk. Yeah, it's just junk, and it creates
a serious psychological fatigue. It's a real human problem. If
the system is constantly crying wolf, what do you think happens?

Speaker 1 (01:14):
Well, your security teams, they're just conditioned to ignore it.
They just tune it out completely.

Speaker 2 (01:18):
Exactly. And we have a chilling real-world example of
where that exact thing happened.

Speaker 1 (01:24):
I think I know the one you're talking about.

Speaker 2 (01:25):
The Home Depot breach. They had FireEye appliances that were
correctly alerting them to suspicious activity. The system was doing
its job.

Speaker 1 (01:36):
But no one was listening.

Speaker 2 (01:37):
No one was listening. The organizational culture was already so
conditioned to disregard this high volume of false positives that
the critical, legitimate warnings were just missed. Wow, a sophisticated attack,
completely missed because the staff had literally learned to tune
out the noise.

Speaker 1 (01:54):
That really drives home a crucial point, then. Before you
can even think about analysis, you can't just collect data blindly.
You have to have a...

Speaker 2 (02:01):
Baseline. You absolutely must. You have to define what normal
looks like in your environment. Only then do you have
any idea what the right stuff to log even is.

Speaker 1 (02:10):
Otherwise you're just creating the conditions for failure yourself.

Speaker 2 (02:13):
That preparation, it's mandatory. So when we talk about
the best practices for logging, you have to start at
the strategic level. You need a formal logging strategy. You
need to structure your log data consistently, which is so
critical for machines to process it, centralize it all using
unique IDs, and of course keep it all in real time.

Speaker 1 (02:34):
So let's talk about the integrity of that record. You
say logs are this historical record for accountability. How far
does that go? It sounds like you have to treat
them almost like they're sacred.

Speaker 2 (02:44):
That's a great way to put it. Yeah, they must
be treated exactly like a general ledger in accounting. Okay,
think about it. If someone can go back and change
the history of financial transactions, the entire record is meaningless, worthless.
The same thing is true for security logs. Because they provide
that accountability, their integrity is paramount. This means your storage,

(03:05):
your system, it has to enforce read-only access. Even
your administrators shouldn't be able to write over old files.

Speaker 1 (03:12):
But wait, if an attacker gets control of the machine
that's generating the logs, couldn't they just change them before
they're even sent, or just turn logging off?

Speaker 2 (03:22):
That is the nightmare scenario, and it's why we have
layers of integrity controls. So for logs that are already
stored centrally, we rely on technical countermeasures. Yeah, the main
one is hashing.

Speaker 1 (03:33):
Okay, so you mean like SHA-256 or
something similar.

Speaker 2 (03:36):
Precisely. As soon as a log file is generated and stored,
you immediately run a strong hashing algorithm on it. Then
you store that resulting hash somewhere else, somewhere secure. Sometimes
you'll even use something called hash chaining.

Speaker 1 (03:49):
And hash chaining, that means if you change one log,
it messes up the entire sequence after it?

Speaker 2 (03:53):
Exactly. If you modify even a single character in that
log file, the cryptographic hash changes completely. Immediate proof of tampering.
But you have to store that hash off device, because
if the attacker owns the log file, they can just
make a new hash for their new file.
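
A minimal sketch of the hash chaining idea described above, assuming SHA-256 and a simple line-by-line chain. The function names, the all-zero seed, and the sample log lines are illustrative assumptions, not anything specified in the episode. Each digest covers the previous digest plus the current line, so editing one entry changes every digest after it.

```python
import hashlib

GENESIS = "0" * 64  # assumed starting value for the chain (hypothetical)

def chain_hashes(log_lines, seed=GENESIS):
    """Return one SHA-256 hex digest per line; each digest also covers the previous one."""
    digests = []
    prev = seed
    for line in log_lines:
        prev = hashlib.sha256((prev + line).encode("utf-8")).hexdigest()
        digests.append(prev)
    return digests

def first_tampered_index(log_lines, trusted_digests, seed=GENESIS):
    """Compare against digests kept off-device; return the first mismatching index, or None."""
    current = chain_hashes(log_lines, seed)
    for i, (got, want) in enumerate(zip(current, trusted_digests)):
        if got != want:
            return i
    return None

# Made-up sample entries for illustration only.
original = [
    "09:00:01 login ok user=alice",
    "09:00:07 login failed user=bob",
    "09:01:15 file read /etc/passwd user=bob",
]
trusted = chain_hashes(original)  # in practice, stored somewhere the log host cannot write

tampered = list(original)
tampered[1] = "09:00:07 login ok user=bob"  # attacker edits a single entry

print(first_tampered_index(original, trusted))
print(first_tampered_index(tampered, trusted))
```

The check only means something if `trusted` lives off the device, which is the point made above: whoever controls the log file can otherwise recompute the chain.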

Speaker 1 (04:08):
Got it. So that separate secure hash is your audit mechanism,
and so hashing is for integrity. But what about just
keeping people from reading them? Confidentiality?

Speaker 2 (04:18):
Right, And that's where encryption comes in. Logs can have
really sensitive stuff in them, personal info, network blueprints, so
they have to be stored in an encrypted form. So
you've got encryption for confidentiality protecting the content, and hashing
for integrity protecting the history.

Speaker 1 (04:33):
Okay, let's pivot a little to the sheer scale of this.
All these controls for every single network event that must
create a monster of a storage problem. I know NIST
has guidance on this, but what's the real world limitation here?

Speaker 2 (04:48):
The limitation is pretty simple. You cannot store everything forever.
It's just not feasible, and retention policies usually dictate, you know,
ninety days, six months, maybe a year. And attackers, they
know this. They are keenly aware of this storage problem,
which is why they use low and slow tactics.

Speaker 1 (05:06):
And how does low and slow play on those retention limits?

Speaker 2 (05:09):
So they'll spread their attack out over a very long period,
say eighteen months, just incredibly subtle little moves. By the
time an analyst finally starts seeing activity that looks like
a breach...

Speaker 1 (05:18):
The initial logs are gone.

Speaker 2 (05:19):
The initial reconnaissance logs from a year ago have already
been cycled out of retention. Poof, gone. And that makes
correlating the full timeline of the attack, well, basically impossible.

Speaker 1 (05:31):
So to fight that, we centralize. I know syslog is
sort of the old standard for collection, but what about
for the actual analysis.

Speaker 2 (05:38):
Yeah, syslog is still the most common open source protocol
for just getting all the logs in one place. But
syslog is really just the delivery truck. It's not the brain,
not at all. To really analyze and correlate logs from
all these different sources (you've got files, operating systems, network
traffic, applications), you need a security information and event management system.

Speaker 1 (06:00):
So the SIEM is the brain.

Speaker 2 (06:01):
The SIEM is the correlation engine, exactly. It's not a
logging system itself. It's an analysis utility that takes in
all those different log formats, normalizes them, and then gives
you near real time notifications by connecting the dots between
events that might be hours or even days apart on
totally different systems.

Speaker 1 (06:18):
Okay, let's get into some practical indicators then, starting with
what you said is the most basic but mandatory requirement:
log all unsuccessful authentication attempts. Why is that always the
first giveaway?

Speaker 2 (06:32):
It's purely a numbers game. You have your baseline, right?
Let's say your company averages, I don't know, one hundred
unsuccessful logins a day. People forget passwords, they have typos.
It's normal, sure. But then suddenly your SIEM dashboard lights
up with ten thousand unsuccessful attempts in a single day,
all coming from a handful of IPs.

Speaker 1 (06:52):
Yeah, that's not a typo.

Speaker 2 (06:53):
That is not a typo. That exponential spike is an
undeniable indicator of a brute-force or a dictionary attack.
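
The baseline comparison described above can be sketched roughly as follows. This is an illustration under stated assumptions: the `(source_ip, success)` event shape, the `factor` multiplier, and the sample addresses (taken from documentation ranges) are all hypothetical, and a real SIEM rule would be tuned to its own environment.

```python
from collections import Counter

def failed_login_alert(events, baseline_per_day, factor=10):
    """events: iterable of (source_ip, success) tuples for one day.

    Returns None when failures stay under baseline * factor, otherwise a
    summary of the total and the busiest source addresses.
    """
    failures = Counter(ip for ip, ok in events if not ok)
    total = sum(failures.values())
    if total < baseline_per_day * factor:
        return None
    return {"total_failures": total, "top_sources": failures.most_common(3)}

# A quiet day: a few typos and forgotten passwords.
quiet_day = [("10.0.0.7", False)] * 90 + [("10.0.0.8", True)] * 900

# A noisy day: thousands of failures from a handful of addresses.
noisy_day = (
    [("203.0.113.5", False)] * 6000
    + [("203.0.113.9", False)] * 4000
    + [("10.0.0.7", False)] * 50
    + [("10.0.0.8", True)] * 900
)

print(failed_login_alert(quiet_day, baseline_per_day=100))
print(failed_login_alert(noisy_day, baseline_per_day=100))
```

The same counting approach works for successful logins per account, which is the case the conversation turns to next.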

Speaker 1 (06:59):
No question. It's not just the failures. The successes can
be just as telling, maybe even more so, when an
account is already compromised.

Speaker 2 (07:05):
Oh absolutely, you have to log successful authentications. So if
your baseline is a thousand successful logins a day
for your eight hundred employees, but all of a sudden
you see twenty thousand successes and they're all coming from
just three user accounts from locations that are geographically impossible.

Speaker 1 (07:21):
That means the credentials are out there. Someone is using
them repeatedly.

Speaker 2 (07:26):
To get into multiple different resources. It's a clear sign
of compromise.

Speaker 1 (07:30):
This data problem keeps bringing us back to this idea
of reduction. You mentioned audit reduction tools or preprocessors. Why
do we still need to strip away volume, even with
all the modern big data tools like Hadoop and machine learning?

Speaker 2 (07:43):
Well, it really comes down to cost and the performance
of the SIEM itself. These audit reduction tools, they're designed
to shrink the sheer volume of records before they ever
hit that expensive correlation engine. Okay, and while yeah, big
data and ML help you analyze logs more efficiently, they
still can't efficiently process irrelevant junk. You're just wasting cycles.
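
As a rough illustration of what an audit reduction step might do before records reach the correlation engine, the sketch below drops low-severity records and collapses exact repeats into a single record with a count. The record fields (`host`, `severity`, `message`) and the choice of which severities to drop are assumptions for the example; what counts as irrelevant depends on your own baseline.

```python
def reduce_records(records, drop_severities=frozenset({"debug", "info"})):
    """Drop low-severity records and collapse exact repeats into one record with a count."""
    kept = {}
    for rec in records:
        if rec["severity"] in drop_severities:
            continue
        key = (rec["host"], rec["severity"], rec["message"])
        if key in kept:
            kept[key]["count"] += 1
        else:
            kept[key] = {**rec, "count": 1}
    # dicts preserve insertion order, so first-seen order is kept
    return list(kept.values())

# Made-up sample records for illustration only.
raw = [
    {"host": "fw1", "severity": "debug", "message": "heartbeat"},
    {"host": "fw1", "severity": "warning", "message": "denied tcp/23 from 198.51.100.9"},
    {"host": "fw1", "severity": "warning", "message": "denied tcp/23 from 198.51.100.9"},
    {"host": "web1", "severity": "info", "message": "GET /index.html 200"},
    {"host": "web1", "severity": "error", "message": "auth failure user=admin"},
]

reduced = reduce_records(raw)
for rec in reduced:
    print(rec["host"], rec["severity"], rec["count"], rec["message"])
```

Five raw records become two forwarded records here, and on an ingestion-priced system that ratio is what drives the bill.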

Speaker 1 (08:05):
And money, which leads us right into this cautionary tale
about the massive cloud bills. Tell us what the financial
risk is if you get this step wrong.

Speaker 2 (08:13):
It's an operational failure story I've seen way too many times, unfortunately.
I've had customers who get so excited about deploying a
new correlation sensor like Splunk, but they make one catastrophic mistake.
They bypass normalization and they just point all their raw
event logs, all their network traffic, directly at the correlation system.
And the consequence of that is a

(08:34):
sudden massive bill. The system is designed for filtered data,
but they're just flooding it with raw, unprocessed traffic. And
since the cost for a lot of these cloud-based
SIEMs is based directly on data ingestion, you get
a bill. In one case, I remember it was upwards
of eighty thousand dollars for that first period of ingestion,
just because they flooded the sensor. The panic sets in,

(08:56):
they shut it down, but the damage is done. Shows
you that filtering and reduction aren't just for efficiency, they're
a critical financial guardrail.

Speaker 1 (09:05):
All right, this is where the deep dive gets really technical.
Let's look at the actual signatures attackers leave behind even
when they're trying to cover their tracks. Starting with a
basic ping or ICMP.

Speaker 2 (09:15):
The beauty of ICMP is how predictable it is. In the
standard ping request, the payload has this really recognizable pattern.
It's just the alphabet, abcdefgh, repeating. That's your baseline.
So if a log shows an ICMP packet with a
payload that is anything other than that standard sequence, it
should trigger a high severity alert immediately.
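
A hedged sketch of the payload check described above. The repeating alphabet is what the Windows `ping` utility sends by default (the letters a through w, repeated to fill the payload); other operating systems fill echo requests differently, so treat the pattern below as one environment's baseline and not a universal rule. The function name and sample payloads are illustrative.

```python
WINDOWS_PING_PATTERN = b"abcdefghijklmnopqrstuvw"  # default Windows echo filler

def matches_windows_ping_pattern(payload: bytes) -> bool:
    """True when the payload is the repeating a..w filler, cut to the payload's length."""
    if not payload:
        return False
    repeats = len(payload) // len(WINDOWS_PING_PATTERN) + 1
    expected = (WINDOWS_PING_PATTERN * repeats)[: len(payload)]
    return payload == expected

default_payload = b"abcdefghijklmnopqrstuvwabcdefghi"  # 32-byte default echo data
odd_payload = b"user=admin;pass=hunter2;host=db01"      # made-up tunneled data

print(matches_windows_ping_pattern(default_payload))
print(matches_windows_ping_pattern(odd_payload))
```

A mismatch is a reason to look closer, not proof of tunneling on its own, since legitimate tools and non-Windows hosts also produce other payloads.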

Speaker 1 (09:38):
Why? What would a different payload even mean?

Speaker 2 (09:40):
It's a very strong indicator of covert communication of tunneling.
Attackers are trying to hide data exfiltration or command-and-control
traffic inside the ICMP protocol, trying to make it look
like a simple network health check. You basically just spotted
a secret tunnel.

Speaker 1 (09:55):
We can also track the very foundation of network communication,
right? The TCP three-way handshake: SYN, SYN-ACK, ACK.
Which applications should we be watching for after we see
that handshake.

Speaker 2 (10:07):
Well, if you see a successful handshake and then activity
on port twenty-one, that's FTP, file transfer. The logs
for that session will show you everything in plain text:
the login, the password, directory changes, any
file deletions.

Speaker 1 (10:18):
And port twenty-three?

Speaker 2 (10:20):
Port twenty-three is Telnet. And if you see Telnet
activity and it has references to NT LAN Manager, NTLM, authentication,
that's a huge red flag.

Speaker 1 (10:28):
Why is NTLM specifically such a big deal.

Speaker 2 (10:31):
NTLM is a legacy Windows authentication protocol. It's much less secure.
Modern networks should be using Kerberos. So if an attacker
is somehow triggering NTLM instead, it suggests they're trying to
force a downgrade attack or exploit older vulnerabilities. It's a
little technical nugget hiding right there in the logs.

Speaker 1 (10:51):
Now, let's talk about spotting reconnaissance. How do the logs
help us tell the difference between just a single connection
and someone who's actively mapping out our whole network.

Speaker 2 (11:01):
It all comes down to pattern recognition. You're looking for
predictable changes in either the destination IP or the destination port. Yeah,
a ping sweep is when the source IP stays the same,
but the destination IP just increments predictably.

Speaker 1 (11:14):
Like it's going down the list: dot thirty-four, dot thirty-five, dot thirty-six.

Speaker 2 (11:18):
Exactly, just checking to see who's home.

Speaker 1 (11:20):
And a port scan is just the opposite of that.

Speaker 2 (11:22):
Yes, in a port scan, the source IP is constant,
but now the destination ports are changing. You're scanning one
machine for all its open services, and if you see
it going sequentially, port one, port two, port three, that's
a really sloppy scan that any IDS will catch.

Speaker 1 (11:36):
But the modern scanners are smarter.

Speaker 2 (11:38):
Oh yeah, way smarter. A tool like Nmap, it's not
going to be sequential. It'll jump around to common ports.
Check four forty-five for SMB, then jump to three
three eighty-nine for RDP, then maybe one thirteen. It
does this specifically to try and avoid those simple detection rules.
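
One way to sketch the sweep versus port-scan distinction so that it does not depend on sequential ordering is to count distinct destinations per source. The `(src_ip, dst_ip, dst_port)` flow shape, the thresholds, and the sample addresses are assumptions for illustration, not how any particular IDS implements it.

```python
from collections import defaultdict

def classify_scanners(flows, host_threshold=10, port_threshold=10):
    """flows: iterable of (src_ip, dst_ip, dst_port); dst_port may be None for ICMP.

    One source touching many hosts looks like a sweep; one source touching
    many ports on a single host looks like a port scan, in any port order.
    """
    hosts_by_src = defaultdict(set)
    ports_by_pair = defaultdict(set)
    for src, dst, port in flows:
        hosts_by_src[src].add(dst)
        ports_by_pair[(src, dst)].add(port)

    findings = {}
    for src, hosts in hosts_by_src.items():
        if len(hosts) >= host_threshold:
            findings[src] = "host sweep"
    for (src, _dst), ports in ports_by_pair.items():
        if len(ports) >= port_threshold:
            findings.setdefault(src, "port scan")
    return findings

# Same source, destination address incrementing: .34, .35, .36, ...
sweep = [("198.51.100.7", f"10.0.0.{i}", None) for i in range(34, 50)]

# Same source and target, ports visited in no particular order.
jumpy_ports = [445, 3389, 113, 22, 80, 443, 21, 23, 25, 53, 110, 139]
scan = [("198.51.100.9", "10.0.0.5", p) for p in jumpy_ports]

normal = [("10.0.0.20", "10.0.0.5", 443)]

findings = classify_scanners(sweep + scan + normal)
print(findings)
```

Counting distinct ports in a time window catches the non-sequential pattern that a simple "port N then N+1" rule would miss, at the cost of needing a sensible threshold for busy clients.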

Speaker 1 (11:53):
We can also look for specific port numbers that are
basically known calling cards for malicious tools.

Speaker 2 (11:58):
Definitely. Logs will often just light up with known Trojan
activity based on the port. Port one two three four five,
for example, is famously linked to the old NetBus Trojan,
and of course there's the classic port three one three three seven.

Speaker 1 (12:10):
Spells "elite" in leetspeak.

Speaker 2 (12:12):
The old-school hacker calling card, a known indicator of backdoor access.

Speaker 1 (12:16):
Okay, let's get even deeper into the mechanics of how
an IDS, something like Snort, actually identifies these scans by
looking at the TCP flags.

Speaker 2 (12:24):
This is where it gets really granular. A normal TCP
packet starting a connection should just have the SYN
flag set. Yeah, that's it. So if an IDS sees
a packet coming from the outside that isn't a
SYN packet, it's suspicious.

Speaker 1 (12:35):
Well, like a null scan?

Speaker 2 (12:36):
A null scan is a perfect example. That's a packet
with no flags set at all. That is basically an
anatomically impossible packet for a legitimate connection, so it's a guaranteed alert.

Speaker 1 (12:46):
And then there's the infamous Christmas scan, the one that's
all lit up.

Speaker 2 (12:49):
The Christmas scan, right. It's defined by having the FIN,
the URG, and the PSH flags all set at the
same time. And to know why that's bad, you have
to know what they mean. FIN means I'm done with
this connection. PSH means push this data now. URG means
that data is urgent.

Speaker 1 (13:08):
So you're trying to close the connection while also
urgently pushing data.

Speaker 2 (13:13):
It makes no logical sense, No legitimate connection would ever
do that. The flags are all lit up like a
Christmas tree, hence the name, and it's designed to confuse firewalls.

Speaker 1 (13:21):
And the IDS is so specific it can even tell
what tool was used.

Speaker 2 (13:24):
Yes, the rules are that specific. A good IDS will
have rules that can tell the difference between, say, an
Nmap Christmas scan, which only sets FIN, URG, and PSH,
and a more traditional one that might set all the flags.
You know, not just what's happening, but maybe even the
exact tool they're using.
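
The flag combinations discussed above can be illustrated with the standard TCP flag bit values. This is a simplified sketch, not Snort's rule syntax; the function name and the labels it returns are hypothetical.

```python
# Standard TCP flag bits (low six bits of the flags byte).
FIN, SYN, RST, PSH, ACK, URG = 0x01, 0x02, 0x04, 0x08, 0x10, 0x20
ALL_SIX = FIN | SYN | RST | PSH | ACK | URG

def classify_flags(flags: int) -> str:
    """Label a TCP flags value using the scan signatures described in the discussion."""
    flags &= ALL_SIX  # ignore anything above the six classic flags
    if flags == 0:
        return "null scan (no flags set)"
    if flags == (FIN | PSH | URG):
        return "xmas scan (FIN+PSH+URG)"
    if flags == ALL_SIX:
        return "all flags set"
    if flags == SYN:
        return "normal connection attempt (SYN)"
    return "other"

for value in (0x00, 0x29, 0x3F, 0x02, 0x12):
    print(hex(value), classify_flags(value))
```

Separating the FIN+PSH+URG case from the all-flags case is what lets a rule set hint at the tool, since the Nmap Xmas scan sets exactly those three flags.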

Speaker 1 (13:38):
So this whole deep dive really reinforces that the end
goal is always correlation. Taking these raw logs from files,
the OS, network traffic, applications, and then using a SIEM
to put it all together into one single actionable picture.

Speaker 2 (13:54):
And while the industry is just saturated right now with
marketing about AI and machine learning that'll give you precognitive abilities...

Speaker 1 (14:01):
Like seeing attacks before they happen.

Speaker 2 (14:03):
Right, like the precogs in Minority Report. The reality
is we still rely heavily on very carefully tuned baselines,
very precise rules, and the ability to ruthlessly, ruthlessly cut
through the noise.

Speaker 1 (14:15):
It's that tension, isn't it, between the sheer scale of
the data and the fact that you just can't store
or process everything forever.

Speaker 2 (14:21):
That is the defining challenge, which brings us to a
final thought for you, the listener. Okay, if you are
a security analyst and you're dealing with millions of log
entries every day and your storage is severely limited, what
single non authentication data point would you prioritize logging above
all else to uncover a really sophisticated, low and slow
internal threat. Think about priority, integrity and reduction as you

(14:44):
mull that over.
