Course 12 - Maltego Advanced Course | Episode 4: Custom Entity Design and Implementation in Maltego - CyberCode Academy

Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Speaker 1 (00:00):
Welcome to the deep dive, the shortcut to being well informed. Today,
we are getting into the architecture of information itself.

Speaker 2 (00:07):
We are.

Speaker 1 (00:08):
I mean, if you're in complex investigations, cybersecurity, even data science,
you know that the standard software labels, things like IP
address or email, just don't always cut it, right?

Speaker 2 (00:20):
You run into these unique, specialized bits of data and
you need a way to integrate them properly into your
analysis platform.

Speaker 1 (00:26):
So that's the challenge we're tackling.

Speaker 2 (00:28):
That is the essential challenge. Yeah, when you're tracking, say,
a fraud ring or a specific piece of malware, you
might find some kind of proprietary ID or a custom
role that your software just sees as generic text.

Speaker 1 (00:41):
And it has no idea what to do with it.

Speaker 2 (00:43):
Exactly. So our mission today is to walk you through the
methodical process of creating what we call a custom entity,
a specialized piece of information within one of these major
analysis platforms.

Speaker 1 (00:55):
And this isn't just about making it look different, right?
This process actually defines its functionality.

Speaker 2 (01:00):
It absolutely does.

Speaker 1 (01:01):
We're talking about designing a data type that's smart enough
to know what analysis actions or transforms it can even perform.

Speaker 2 (01:08):
Precisely. We're going to uncover the systematic steps, from crucial
naming conventions all the way to advanced features like dynamic visuals.
The goal is an entity that's both informative and really
efficient for the investigator.

Speaker 1 (01:23):
If you can master this, you move from just being
a user of the tools to being a creator of
intelligence assets.

Speaker 2 (01:29):
That's a great way to put it. So let's start
with the foundation. What exactly is an entity in this context?

Speaker 1 (01:35):
It's essentially a set of rules. You can think of
it as a specification that lives on the client's machine.
It tells the software two main things. First, how
this piece of information should look to the user, its
visual representation; and second, what it can do, meaning which
transforms or analysis actions are available for it.

Speaker 2 (01:53):
And every single entity has to have at least one
property, which is called the main property.

Speaker 1 (01:57):
That's the core value.

Speaker 2 (01:58):
That's the core value, and the one you'll interact with the most.
Any other information you want to add, like a status
or a location, you can display right on the graph
using these things called overlays.

Speaker 1 (02:08):
Now, before we even start building, the one thing I
saw stressed over and over was naming conventions. It's absolutely critical.
Why is that technical ID so important?

Speaker 2 (02:17):
It's all about preventing conflicts, especially if you ever plan
to share your work or your transforms with anyone else.
Every entity needs a completely unique type ID.

Speaker 1 (02:26):
And the standard ones have their own prefix.

Speaker 2 (02:29):
Yes, the platform's standard entities always start with maltego,
that's "maltego." with a dot. It works like a namespace, you know,
to keep things separate.

Speaker 1 (02:37):
So for our custom entities, we need our own distinct
namespace?

Speaker 2 (02:43):
Exactly. The recommended way to do it is to start with
a unique identifier for your organization, maybe followed by a
category and then a short name.

Speaker 1 (02:50):
Can you give an example?

Speaker 2 (02:51):
Sure. Let's say you work for an organization called
InvestiTech. You might create an entity ID like
investitech.gov.customer. That unique ID prevents so many
headaches down the road.
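The naming rule, and the collision it prevents, can be illustrated with a small Python check. This is a minimal sketch, not Maltego's own validation: the dotted-ID regular expression, the `check_type_id` and `register` helpers, and the idea of rejecting the `maltego.` prefix for custom entities are assumptions drawn from the discussion above.

```python
import re

# Assumed rule: two or more dot-separated segments, each starting with a letter,
# e.g. "investitech.gov.customer" or "my.worker".
TYPE_ID_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]*(?:\.[A-Za-z][A-Za-z0-9_]*)+")

# Namespace used by the platform's standard entities; custom IDs should avoid it.
RESERVED_PREFIX = "maltego."


def check_type_id(type_id: str) -> list[str]:
    """Return the problems found with a proposed custom type ID (empty list if none)."""
    problems = []
    if not TYPE_ID_PATTERN.fullmatch(type_id):
        problems.append("not a dotted, namespaced ID")
    if type_id.lower().startswith(RESERVED_PREFIX):
        problems.append("uses the reserved 'maltego.' namespace")
    return problems


def register(registry: dict, type_id: str, owner: str) -> None:
    """Refuse a second, different definition under the same type ID (an entity collision)."""
    if type_id in registry and registry[type_id] != owner:
        raise ValueError(f"{type_id!r} is already defined by {registry[type_id]}")
    registry[type_id] = owner


print(check_type_id("investitech.gov.customer"))  # []
print(check_type_id("customer"))                  # ['not a dotted, namespaced ID']
```

With prefixes, the two teams from the next example would register `teama.crm.customer` and `teamb.billing.customer` side by side instead of fighting over a bare `customer`.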

Speaker 1 (03:01):
Walk us through what happens if you ignore that. Let's
say two different teams in my company both decide to
create a new entity and they just call it customer.

Speaker 2 (03:09):
Oh, that's a recipe for disaster. Yeah. I mean, if
one team's customer expects a Social Security number and the
other expects a customer ID, what happens when you try
to merge your data?

Speaker 1 (03:19):
The software has no idea which one is which?

Speaker 2 (03:21):
No idea. You get entity collisions, data gets lost, and transforms
start failing because they're looking for properties that just aren't there.
It's a mess.

Speaker 1 (03:29):
So even for personal use, you should always use a prefix?

Speaker 2 (03:33):
Even something simple like my.worker is enough to
keep things clean.

Speaker 1 (03:37):
Okay, that makes the risk very clear. Let's actually create
that basic worker entity now. We're modeling an employee. What's
step one?

Speaker 2 (03:45):
First we need to give it a human-readable name
and some context. So, step one, identification: you set the
display name, what the user actually sees, like "Worker," and
a short description, maybe "A person working at a job."

Speaker 1 (03:58):
And that description shows up in the entity palette.

Speaker 2 (04:00):
It does. It's the first piece of guidance for any
analyst using it.

Speaker 1 (04:03):
Okay, so then step two, classification. This is where that
unique namespace comes in.

Speaker 2 (04:09):
Right, that's it. You enter the unique type name, our
unique ID, my.worker. Then, just to keep things
organized in the tool, you pick a category, something like Personal.

Speaker 1 (04:20):
And finally, step three: visuals.

Speaker 2 (04:24):
Right. You pick an icon that will show up on the graph.
We could use a standard person icon, maybe find one
with a little gear on it to show they're a worker.

Speaker 1 (04:31):
All right, now for the most important part: defining the
main property.

Speaker 2 (04:35):
This is the critical field, absolutely. For our worker, we'll
give it a display name of "Worker Name" and a
unique name of just "name". Then we have to pick
a data type: string, date, integer, or
double. For a name, string is almost always the right choice.

Speaker 1 (04:51):
Now, what about the sample value, like putting John Doe
in there? Why is that so important? It feels like
just a placeholder.

Speaker 2 (04:57):
It's much more than that. It's really a format hint.
When an analyst drags that new worker entity onto the graph,
it comes pre-filled with John Doe.

Speaker 1 (05:05):
Ah, so it shows them the format of the
data you're expecting.

Speaker 2 (05:08):
Precisely. If your entity needs a complex sixteen-digit alphanumeric key,
putting a valid example in the sample value shows the
user exactly what to type. It drives consistency from the
very first click.
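As a rough mental model, the three setup steps plus the main property and its sample value can be written down as plain Python data. The class and field names below (`EntitySpec`, `PropertySpec`, `new_instance`) and the icon file name are hypothetical; this is not Maltego's entity file format.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PropertySpec:
    name: str               # unique (technical) property name, e.g. "name"
    display_name: str       # label the analyst sees, e.g. "Worker Name"
    data_type: str          # e.g. "string", "date", "integer", "double"
    sample_value: str = ""  # pre-filled value that hints at the expected format


@dataclass
class EntitySpec:
    type_id: str            # step 2: namespaced type ID, e.g. "my.worker"
    display_name: str       # step 1: human-readable name
    description: str        # step 1: shown in the entity palette
    category: str           # step 2: palette grouping
    icon: str               # step 3: graph icon
    main_property: PropertySpec
    extra_properties: list = field(default_factory=list)

    def new_instance(self) -> dict:
        """Roughly what a freshly dropped entity holds: the main property set to its sample."""
        return {"type": self.type_id, self.main_property.name: self.main_property.sample_value}


worker = EntitySpec(
    type_id="my.worker",
    display_name="Worker",
    description="A person working at a job",
    category="Personal",
    icon="person_gear.png",  # hypothetical icon file name
    main_property=PropertySpec("name", "Worker Name", "string", sample_value="John Doe"),
)
print(worker.new_instance())  # {'type': 'my.worker', 'name': 'John Doe'}
```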

Speaker 1 (05:22):
That's a great little detail for ensuring data quality. Okay,
let's talk about a huge efficiency booster: inheritance. This is
where we stop building from scratch.

Speaker 2 (05:30):
Inheritance is just a foundational concept in good software design,
and it's invaluable here. It lets your new specialized entity
just absorb all the properties and most importantly, all the
transforms from an existing base entity.

Speaker 1 (05:43):
Can you give us a classic network example?

Speaker 2 (05:45):
Of course. Think about a standard DNS name entity. It
already knows how to run transforms like find the IP
address or look up the registrar. Now, if you
create a more specific website entity and you tell it
to inherit from DNS name, your website entity automatically
gets all those lookups. You don't have to write a
single line of new code.

Speaker 1 (06:04):
So I'm saved from rewriting an IP lookup transform for
every new kind of website or domain data I create.

Speaker 2 (06:10):
You are. But here's where the specialization comes in. Because
a website is more specific than a DNS name, you
could add transforms to it that wouldn't make sense on
the generic one. Like what? Well, a website entity could
have a transform like find email addresses on the homepage
that requires scanning web page content, which is something a
generic DNS name entity just shouldn't be doing.
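How inheritance widens the set of available transforms can be sketched as a lookup up the parent chain. The transform names are hypothetical labels, and the lookup order (own transforms first, then inherited ones) is an assumption made for illustration, not a description of the platform's internals.

```python
class EntityType:
    def __init__(self, type_id: str, parent: "EntityType | None" = None, transforms=()):
        self.type_id = type_id
        self.parent = parent
        self.own_transforms = list(transforms)

    def available_transforms(self) -> list:
        """Own transforms first, then everything inherited up the parent chain."""
        inherited = self.parent.available_transforms() if self.parent else []
        return self.own_transforms + [t for t in inherited if t not in self.own_transforms]


dns_name = EntityType("maltego.DNSName", transforms=["To IP Address", "To Registrar"])
website = EntityType("maltego.Website", parent=dns_name,
                     transforms=["To Email Addresses on Homepage"])

print(website.available_transforms())
# ['To Email Addresses on Homepage', 'To IP Address', 'To Registrar']
print(dns_name.available_transforms())
# ['To IP Address', 'To Registrar']
```

The specialized transform lives only on the child, so the generic DNS name never offers a homepage scan, while the website gets the DNS lookups for free.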

Speaker 1 (06:33):
That sounds incredibly powerful, But what's the biggest trap here?
Is there a risk of doing it wrong?

Speaker 2 (06:38):
The trap is the main property. It's a huge one.
Let's say we want our worker entity to be compatible
with all the standard person-related transforms.

Speaker 1 (06:47):
Like public record lookups?

Speaker 2 (06:49):
Exactly. So we should inherit from the built-in maltego.Person
entity. But here's the crucial limitation. When you do that,
you have to decide right then and there if you
want to reuse the person entity's main property, which is
probably full name.

Speaker 1 (07:03):
And what if I get it wrong? Say I decide
later that I actually needed the worker's ID number
as the main property, not their name.

Speaker 2 (07:09):
You cannot change it. The main property cannot be changed
after the entity is created. You have to delete the
entire entity definition and start over from scratch.

Speaker 1 (07:17):
Wow, so all the other properties I might have added,
they're gone too?

Speaker 2 (07:20):
All gone. That choice is set in stone the
second you click create. It really underlines why that upfront
design and systematic approach is so non-negotiable.
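The "set in stone" behavior of the main property can be mimicked with a frozen Python dataclass. This is an analogy, not how the platform stores entity definitions; the property names `person.fullname` and `worker.id` are hypothetical.

```python
import dataclasses
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityDefinition:
    type_id: str
    main_property: str      # chosen once, at creation time
    properties: tuple = ()  # any other properties the designer added


worker = EntityDefinition("my.worker", "person.fullname", properties=("job", "gender"))

try:
    worker.main_property = "worker.id"  # trying to switch the main property later
except dataclasses.FrozenInstanceError:
    print("main property is locked; recreate the definition instead")

# Starting over means a brand-new definition; the extra properties are not carried over.
worker = EntityDefinition("my.worker", "worker.id")
print(worker.properties)  # ()
```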

Speaker 1 (07:31):
Okay, that is a high stakes moment to remember. So
now that we have foundations and inheritance, let's talk about
maximizing the data's utility and reliability.

Speaker 2 (07:40):
Reliability really comes down to how you define your properties. See,
transforms can often return extra bits of information, which we
call dynamic properties, and those are useful, but you
should never rely on them as input for your next
analysis step. If you need a piece of data to
be reliably present for another transform to work, you absolutely
must define it in the entity specification itself.

Speaker 1 (08:01):
And what kind of specialized property types can we use
beyond just strings and numbers?

Speaker 2 (08:06):
Well, there are some really useful ones like boolean, color,
and URL. And the type isn't just a label. It
actually changes the user interface. How so? A date type
gives you a calendar picker, a color type gives you
a color wheel that spits out an HTML color code, and a
boolean is just a simple checkbox. They enforce correct data entry.
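A sketch of how a declared type can drive both the editor widget and input checking. The `WIDGETS` table and `parse_value` helper are hypothetical, and the six-digit hex format assumed for colors follows the "HTML color code" mentioned above.

```python
import re
from datetime import date

# Assumed widget for each declared type, following the examples above.
WIDGETS = {"string": "text field", "date": "calendar picker",
           "color": "color wheel", "boolean": "checkbox"}


def parse_value(data_type: str, raw: str):
    """Coerce raw analyst input to the declared type; raise ValueError if it doesn't fit."""
    if data_type == "date":
        return date.fromisoformat(raw)  # e.g. "2024-03-01"
    if data_type == "boolean":
        if raw.lower() not in ("true", "false"):
            raise ValueError(f"not a boolean: {raw!r}")
        return raw.lower() == "true"
    if data_type == "color":
        if not re.fullmatch(r"#[0-9A-Fa-f]{6}", raw):  # HTML hex color such as "#FF0000"
            raise ValueError(f"not an HTML color code: {raw!r}")
        return raw.upper()
    if data_type == "string":
        return raw
    raise ValueError(f"unsupported type: {data_type!r}")


print(WIDGETS["date"], parse_value("date", "2024-03-01"))  # calendar picker 2024-03-01
```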

Speaker 1 (08:25):
That's smart. Now, to really maximize utility, we have to
talk about calculated properties. How do we make one property
get its value automatically from other properties?

Speaker 2 (08:36):
This is how you ensure data cleanliness without any manual work.
Calculated properties use special functions, usually in the property's default
value field. Imagine you have separate fields for first name
and last name.

Speaker 1 (08:47):
But you need a single clean full name field
for a report.

Speaker 2 (08:51):
Exactly, so we use special annotations to stick them together.
We'd use property to grab the value from the first
name field, add a space, then use property again for
the last name.

Speaker 1 (09:00):
But what if one of those fields is empty? You
could end up with a stray space.

Speaker 2 (09:04):
And that's where the other critical annotation, TRIM, comes in.
You wrap the entire calculation in TRIM and it automatically
removes any leading or trailing whitespace. It guarantees clean,
display-ready data every single time.
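The first-name/last-name example reduces to concatenation followed by trimming. This Python sketch assumes hypothetical property names `firstname` and `lastname` and uses `str.strip()` in the role of TRIM; it does not reproduce the platform's annotation syntax.

```python
def full_name(props: dict) -> str:
    """Calculated property: first name + space + last name, then trimmed."""
    first = props.get("firstname", "")  # hypothetical property name
    last = props.get("lastname", "")    # hypothetical property name
    return (first + " " + last).strip()  # strip() plays the role of TRIM


print(repr(full_name({"firstname": "Jane", "lastname": "Doe"})))  # 'Jane Doe'
print(repr(full_name({"firstname": "Jane"})))                     # 'Jane' (no stray space)
```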

Speaker 1 (09:16):
Excellent. Clean data is actionable data. Let's move on to
the visuals. I love this idea that an entity's icon
can literally change based on its data.

Speaker 2 (09:26):
It's a huge workflow accelerator. You can set the large
image on the graph to dynamically pull from the value
of one of its properties.

Speaker 1 (09:33):
So if a property's value is, say, "danger," and I
have an icon file named danger.png...

Speaker 2 (09:40):
The platform will instantly display that icon on the graph.
It's incredibly powerful for at-a-glance analysis.
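The dynamic large image amounts to mapping a property value to an icon file name. In this sketch the `.png` suffix, the set of loaded icons, and the fallback icon are assumptions; the discussion above does not say what happens when no file matches.

```python
# Hypothetical set of icon files loaded into the platform.
LOADED_ICONS = {"danger.png", "safe.png", "default.png"}


def large_image_for(value: str, loaded=None, fallback: str = "default.png") -> str:
    """Use '<value>.png' as the large image if that icon is loaded, else a fallback."""
    loaded = LOADED_ICONS if loaded is None else loaded
    candidate = f"{value}.png"
    return candidate if candidate in loaded else fallback


print(large_image_for("danger"))   # danger.png
print(large_image_for("unknown"))  # default.png
```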

Speaker 1 (09:46):
And what about the overlays you mentioned? How do we
use those without making the graph unreadable?

Speaker 2 (09:52):
That is the key question. Overlays are small visual cues
you can place in five spots around the entity. They
can show text, a color dot, or even a small image.
But you have to beware the Christmas tree effect.

Speaker 1 (10:04):
The Christmas tree effect? Too many lights, too much noise.

Speaker 2 (10:07):
Exactly that. I've seen graphs where every entity has four
different color dots and you need a legend to figure
out what you're even looking at.

Speaker 1 (10:14):
So the rule is only use an overlay if it
provides genuinely critical information for an instant decision.

Speaker 2 (10:20):
That's the rule. A single red dot for critical status? Perfect.
A rainbow of colors for different data sources? Probably not helping.
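The "only what's critical" rule could be enforced with a simple lint over an entity's overlays. The position names, the two-overlay budget, and the one-color-dot limit below are arbitrary assumptions chosen to illustrate the idea, not platform settings.

```python
from dataclasses import dataclass


@dataclass
class Overlay:
    position: str       # one of the five slots around the entity (placeholder names)
    kind: str           # "text", "color", or "image"
    property_name: str  # property whose value drives the overlay


def christmas_tree_warnings(overlays: list, max_overlays: int = 2) -> list:
    """Flag overlay setups likely to clutter the graph (thresholds are arbitrary)."""
    warnings = []
    if len(overlays) > max_overlays:
        warnings.append(f"{len(overlays)} overlays; consider at most {max_overlays}")
    dots = [o for o in overlays if o.kind == "color"]
    if len(dots) > 1:
        warnings.append(f"{len(dots)} color dots will need a legend")
    return warnings


critical_only = [Overlay("north", "color", "status")]
rainbow = [Overlay(p, "color", "source") for p in ("north", "south", "east", "west")]
print(christmas_tree_warnings(critical_only))  # []
print(christmas_tree_warnings(rainbow))
# ['4 overlays; consider at most 2', '4 color dots will need a legend']
```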

Speaker 1 (10:28):
Okay, there's one more advanced concept we need for making
sure our data is reliable from the moment it's entered:
regular expressions.

Speaker 2 (10:38):
Ah, regex. It plays a massive role in validating data. It has
two main jobs here: matching and extraction.

Speaker 1 (10:44):
Matching seems straightforward. It's how the platform recognizes an email
address when I paste it in, right?

Speaker 2 (10:49):
That's it. You paste a random string and the platform
runs it against a library of regex patterns to see
if it matches the format of an email, a
phone number, or an IP address, and then it creates
the right entity automatically.
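Matching can be sketched as trying a list of patterns in order and taking the first full match. The patterns are deliberately simplified (the IPv4 one, for example, also accepts 999.1.1.1), and the type labels are placeholders rather than platform type IDs.

```python
import re

# Simplified patterns, tried in order; the labels are placeholders, not platform type IDs.
PATTERNS = [
    ("email", re.compile(r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}")),
    ("ipv4", re.compile(r"(?:\d{1,3}\.){3}\d{1,3}")),
    ("phone", re.compile(r"\+?[\d\s()-]{7,}")),
]


def detect_entity_type(text: str, default: str = "phrase") -> str:
    """Return the label of the first pattern that matches the whole pasted string."""
    text = text.strip()
    for label, pattern in PATTERNS:
        if pattern.fullmatch(text):
            return label
    return default


for sample in ("jane@example.com", "192.0.2.10", "+1 555-010-0000", "hello world"):
    print(sample, "->", detect_entity_type(sample))
```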

Speaker 1 (11:01):
And extraction? That sounds more involved.

Speaker 2 (11:03):
It is. Using regex groups, you can match a complex
piece of data, like a GPS coordinate in a single
string, and parse it on the fly. You capture the
latitude part in one group and the longitude in another,
and route those values into separate properties on
your entity.

Speaker 1 (11:17):
So it validates and organizes the data in one shot.

Speaker 2 (11:20):
Exactly, ready for analysis right away.
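Extraction with groups might look like this for a "latitude, longitude" string. Named groups route each captured part to its own property; the pattern, the range check, and the `latitude`/`longitude` property names are assumptions for illustration.

```python
import re

# Hypothetical pattern for strings such as "51.5072, -0.1276".
GPS = re.compile(r"\s*(?P<lat>-?\d{1,2}(?:\.\d+)?)\s*,\s*(?P<lon>-?\d{1,3}(?:\.\d+)?)\s*")


def extract_gps(text: str):
    """Validate a coordinate string and split it into two properties, or return None."""
    m = GPS.fullmatch(text)
    if m is None:
        return None
    lat, lon = float(m["lat"]), float(m["lon"])
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        return None  # well-formed but out of range
    return {"latitude": lat, "longitude": lon}


print(extract_gps("51.5072, -0.1276"))  # {'latitude': 51.5072, 'longitude': -0.1276}
print(extract_gps("not a coordinate"))  # None
```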

Speaker 1 (11:23):
Okay, let's bring all this home. Let's build our advanced
worker entity, inheriting from that maltego.Person entity and
using all these cool dynamic features.

Speaker 2 (11:32):
Right. So first, we define three extra properties to describe
our worker: gender, skin tone, and job. We're going to
mark them as required to force the analyst to fill
them in, and we'll give them sensible defaults like "unknown."

Speaker 1 (11:46):
Now for the magic, the part that drives that dynamic icon,
we need to create a pointer that combines those three
variables into one string.

Speaker 2 (11:55):
Yes, we add a fourth property, let's call it combined,
and this is key. We mark this property as hidden
and read-only. The analyst should never see it or touch it.

Speaker 1 (12:03):
And its default value is a calculated property.

Speaker 2 (12:06):
Correct. We set its default value to a function that
concatenates the others with underscores: property gender, property skin tone, property job.

Speaker 1 (12:13):
And that resulting string, like female_medium_programmer, is
designed to perfectly match the file name of a custom
icon we've loaded.

Speaker 2 (12:20):
That's the connection. That calculated string becomes the direct pointer
to the visual element. We then go into the entity's
display settings and tell the large image to use the
value of our combined property.
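The worker's combined property can be modeled as a read-only, derived attribute. The attribute names, the "unknown" defaults, and the `.png` suffix on the icon name are assumptions; the point is that the icon name is computed, never typed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Worker:
    # Required properties with "unknown" defaults (names are assumptions).
    gender: str = "unknown"
    skin_tone: str = "unknown"
    job: str = "unknown"

    @property
    def combined(self) -> str:
        """Hidden, read-only value: derived from the other three, never entered by hand."""
        return f"{self.gender}_{self.skin_tone}_{self.job}"

    @property
    def large_image(self) -> str:
        """The combined string doubles as the icon file name (".png" is assumed)."""
        return f"{self.combined}.png"


print(Worker("female", "medium", "programmer").large_image)  # female_medium_programmer.png
print(Worker().combined)                                     # unknown_unknown_unknown
```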

Speaker 1 (12:32):
So the system handles the visual taxonomy automatically based on
the data. A huge efficiency gain.

Speaker 2 (12:38):
It's massive. And to finish it off, we can display
the job title directly on the entity using an overlay.

Speaker 1 (12:44):
Okay, how do we set that up?

Speaker 2 (12:45):
We set the north overlay location to display the job property.
But here's the crucial setting: you have to make sure
the overlay's type is set to text.

Speaker 1 (12:54):
Ah, because if you leave it as the default, which
is image, it'll look for an icon file named programmer
and find nothing.

Speaker 2 (13:01):
And display nothing. You set it to text and it
displays the actual job title clear as day.
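The text-versus-image overlay pitfall can be shown in a few lines. The `render_overlay` helper, the `.png` suffix, and returning `None` for "display nothing" are hypothetical choices made for this sketch.

```python
def render_overlay(kind: str, value: str, loaded_icons: set):
    """Return what an overlay would show: the text itself, an icon file name, or None."""
    if kind == "text":
        return value  # shows the job title as-is
    if kind == "image":
        icon = f"{value}.png"  # looks for a file named after the value
        return icon if icon in loaded_icons else None  # no such file: nothing displayed
    raise ValueError(f"unknown overlay kind: {kind!r}")


print(render_overlay("image", "programmer", {"danger.png"}))  # None
print(render_overlay("text", "programmer", {"danger.png"}))   # programmer
```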

Speaker 1 (13:06):
So what we have now is an advanced worker entity.
It inherits all the standard person functions, but it has
its own specialized dynamic personality, all driven by structured and
calculated properties.

Speaker 2 (13:18):
It's a perfect blueprint for building reliable, intelligent data types.
And you know, one last pro tip. You could even
hide this entity from the main palette if you want.

Speaker 1 (13:26):
Why would you do that?

Speaker 2 (13:27):
You'd do it if the entity is very complex and needs,
say, a database key to be valid. You force it
to be created only by a specific transform, which guarantees
its structural integrity from the start.

Speaker 1 (13:39):
So designing these custom entities, it's really a structured exercise
in reliability and clear communication. It's about turning messy specialized
data into a real asset.

Speaker 2 (13:49):
Indeed, we saw how a specialized website entity can just
automatically use transforms made for a generic DNS name just
by inheriting from it. So here's something for you to
think about. If you were designing a new entity for
a very specific type of malware and you wanted it
to automatically use analysis tools that were designed for generic
software files, what would you need to do to its
entity definition to make sure you get that essential cross-compatibility?

Speaker 1 (14:15):
We hope this deep dive helps you structure your own
custom data for better, faster analysis. Until next time.
