All Episodes

December 20, 2025 • 16 mins
Detailing the configuration and use of Traefik as an API gateway in a microservices environment. The text explains the shift from monolithic to microservices architecture and the need for dynamic gateway solutions like Traefik, which offers features such as load balancing (Round Robin, Weighted Round Robin, Mirroring) for HTTP, TCP, and UDP traffic. A significant portion of the material provides practical guidance on configuring Traefik's core components (Entrypoints, Routers, Services, Middleware) and implementing operational concerns, including TLS termination (with Let's Encrypt), logs and metrics (Prometheus, Zipkin/Jaeger tracing), and advanced resilience patterns like circuit breakers and retries, often demonstrated through integration with service registries like Consul and orchestration systems like Kubernetes using Custom Resource Definitions (CRDs). The book's authors, Rahul Sharma and Akshay Mathur, are noted as experienced software engineers, indicating the text is written from a practitioner's perspective.

You can listen and download our episodes for free on more than 10 different platforms:
https://linktr.ee/cyber_security_summary

Get the Book now from Amazon:
https://www.amazon.com/Traefik-API-Gateway-Microservices-Kubernetes/dp/1484263758?&linkCode=ll1&tag=cvthunderx-20&linkId=8d31c9b59c09cff8391cb380c012c4be&language=en_US&ref_=as_li_ss_tl

Discover our free courses in tech and cybersecurity, Start learning today:
https://linktr.ee/cybercode_academy

Episode Transcript

Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Speaker 1 (00:00):
Welcome back to the deep dive. Today, we are wrestling with, uh,
probably the defining architectural challenge of the last few years,
maybe the.

Speaker 2 (00:09):
Decade definitely feels like it.

Speaker 1 (00:11):
How do you reliably route traffic when the ground beneath
your feet is constantly shifting? We're talking about the shift
from stable, predictable monoliths.

Speaker 2 (00:19):
Right, the quarterly release cycle kind of.

Speaker 1 (00:22):
Thing, exactly. To this, well, sometimes chaotic world of microservices
that scale up and down constantly.

Speaker 2 (00:28):
Yeah, it's like comparing I don't know, a printed map
from the nineties to Google Maps in a city where
roads just appear and disappear every few minutes and buildings
resize themselves.

Speaker 1 (00:39):
That's a great analogy. And those older load balancers, the
ones built for the static map, they just can't cope.

Speaker 2 (00:45):
They're stuck in that static config mindset. They pretty much
melt down when faced with how dynamic a modern cloud
environment really.

Speaker 1 (00:52):
Is. And that operational headache? That's why we're diving deep
into Traefik today. It's an open source API gateway and
it's built specifically to handle that dynamic complexity.

Speaker 2 (01:02):
Right. The idea is to simplify deploying microservices, especially
if you're in the Kubernetes world, which, let's face it, many.

Speaker 1 (01:08):
Are. So, our mission today.

Speaker 2 (01:10):
Our mission is to unpack how Traefik acts as this
crucial link. Think of it as the intelligent gateway tier.
It connects that volatile ecosystem of services to the outside world.

Speaker 1 (01:23):
And by the end you listening should have a pretty
good handle on the cutting edge of network routing.

Speaker 2 (01:29):
A shortcut maybe, yeah, a shortcut to being well informed
about this stuff. Resilience patterns too.

Speaker 1 (01:34):
Okay, let's start with the big picture, the monolith problem.
We probably all remember it, right, tight coupling.

Speaker 2 (01:39):
Slow releases, Oh the.

Speaker 1 (01:42):
Pain. And that really expensive all or nothing scaling. Need
more horsepower for login? Scale.

Speaker 2 (01:48):
The whole thing. Huge waste of resources. That worked okay,
I guess, with the classic three tier model: presentation, application, data. Simple.

Speaker 1 (01:56):
Enough. But microservices break that model completely.

Speaker 2 (01:59):
When you shatter that app into, I don't know, dozens,
hundreds of tiny services, you need a different architecture.

Speaker 1 (02:04):
You have to evolve to the four tier model.

Speaker 2 (02:06):
Exactly, built for distributed systems.

Speaker 1 (02:08):
And that fourth tier is where Traefik lives, right? That's.

Speaker 2 (02:11):
Its home turf, right. So the four tiers you really
need are first, content delivery, the UI, the client stuff.

Speaker 1 (02:17):
Okay.

Speaker 2 (02:17):
Second, the gateway tier, that's Traefik. Discovery, routing, correlating requests,
aggregating responses sometimes, all that happens here at the traffic hub,
sort of. Yeah. Then third is the services tier, your
actual decoupled business logic units, high cohesion, loose coupling, all
that good stuff. And finally, the data tier: databases, message queues,

(02:40):
but now ideally exclusive to the services that own that data.

Speaker 1 (02:43):
Okay, So the gateway tier is critical, it's the front door.
What does a modern gateway like Traefik absolutely have to
do to handle that chaos in tier three?

Speaker 2 (02:52):
Right, It's got to be more than just a simple
port forwarder. Layer seven routing is non negotiable.

Speaker 1 (02:56):
Layer seven, meaning application layer.

Speaker 2 (02:59):
Exactly, routing based on HTTP headers, host names, paths, maybe
even stuff in the request body, not just layer four
like TCP or UDP ports. And it needs to speak
different languages, essentially: HTTP/1.1, HTTP/2, gRPC, REST, it
shouldn't care.
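
To make that concrete, here's a minimal sketch of what that layer seven routing looks like in a Traefik dynamic configuration file (YAML, file provider). The hostname, path, and backend address are hypothetical placeholders, not anything from the episode:

# dynamic.yml -- hypothetical example of layer 7 (HTTP) routing
http:
  routers:
    orders-router:
      rule: "Host(`api.example.com`) && PathPrefix(`/orders`)"   # match on host name and path
      entryPoints:
        - web                                                    # listen on the 'web' entrypoint (e.g. :80)
      service: orders-service
  services:
    orders-service:
      loadBalancer:
        servers:
          - url: "http://10.0.0.10:8080"                         # placeholder backend instance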

Speaker 1 (03:12):
And security that feels like a huge piece, especially with
all those services chattering away behind the gateway.

Speaker 2 (03:18):
Oh massive, Absolutely, the gateway must handle TLS termination, you know,
decrypting the incoming.

Speaker 1 (03:25):
Public traffic standard stuff, right, But.

Speaker 2 (03:27):
Then inside the cluster for service to service chat you
need mutual TLS, mTLS.

Speaker 1 (03:33):
So both sides prove who they are precisely.

Speaker 2 (03:36):
It's not just the client showing ID. The server demands
ID back, show me your papers too. It's essential for
locking things down inside your perimeter. If something goes wrong,
it limits the blast radius.

Speaker 1 (03:47):
Okay, that makes sense, which leads us right to maybe
the killer feature, autoconfiguration. Because, like you said, hundreds of services,
maybe thousands of instances, updating config files by hand? Impossible, right? It.

Speaker 2 (03:59):
Just doesn't scale. That's where Traefik fundamentally solves the service
discovery problem. Instead of a human editing.

Speaker 1 (04:05):
A file, which is always error prone.

Speaker 2 (04:07):
Always. Instead, Traefik talks directly to a service registry. Think Consul,
etcd, or Kubernetes itself. These things are like near real
time databases of where every active service instance lives on
the network.

Speaker 1 (04:19):
Ah, so Traefik doesn't need its own map. It just
asks the map maker constantly.

Speaker 2 (04:23):
Exactly, perfect analogy. Traefik calls these map makers providers. It
has first class support baked in, just sits there and
watches the provider. A new service instance spins up, Traefik
sees it. Yep. An old one dies, Traefik sees that too,
and it automatically reconfigures its own routing
tables, crucially, without needing a restart or dropping existing connections.

Speaker 1 (04:44):
Hot reloads zero downtime. That's the dream.

Speaker 2 (04:47):
That's critical. Dynamic configuration and hot reloads are absolutely key.

Speaker 1 (04:52):
How tricky is it if you're running, say, Docker and
Kubernetes and maybe Consul? Can one Traefik instance watch all
of them?

Speaker 2 (05:00):
Yeah? Surprisingly easily. That's the beauty of the provider concept.
Traefik kind of abstracts away the specific details of talking
to Kubernetes versus talking to Consul, so you can centralize
routing even in a mixed environment.
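
As a rough sketch, watching several providers at once is just a matter of listing them in Traefik's static configuration. The endpoint address below is a placeholder and the exact options vary by Traefik version, so treat this as illustrative:

# traefik.yml (static configuration) -- hypothetical multi-provider setup
providers:
  kubernetesCRD: {}              # watch Traefik's Kubernetes CRDs (IngressRoute, Middleware, ...)
  docker:
    exposedByDefault: false      # only route to containers that opt in via labels
  consulCatalog:
    endpoint:
      address: "consul:8500"     # placeholder address of the Consul agent
    exposedByDefault: false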

Speaker 1 (05:13):
So developers just deploy to whatever platform.

Speaker 2 (05:15):
They use, and Traefik figures out how to find it
and send traffic there. Developers focus on code. Traefik handles
the routing complexity.

Speaker 1 (05:22):
Okay, so Traefik knows where everything is. Now let's talk
about actually sending the traffic efficiently. We all know basic
round robin, right, just deal them out equally. Fine for
stateless stuff.

Speaker 2 (05:35):
Yeah, simple, effective, if all your servers are identical, but.

Speaker 1 (05:38):
They rarely are. So, weighted round robin, WRR. How does
that work?

Speaker 2 (05:42):
Right? WRR is about being smarter with resources. Maybe you
have an older, cheaper server with less CPU. You don't
want it getting the same traffic as your brand new
beefy cloud instance, makes sense. So WRR lets you assign weights.
You could say, send three requests to the powerful new
instance group for every one request you send to the
older one, a three to one ratio

(06:04):
for example.

Speaker 1 (06:04):
So it's not just load balancing, it's cost optimization too.

Speaker 2 (06:08):
Definitely in the cloud especially, WRR helps you squeeze maximum
value out of cheaper or older instances alongside the new ones.
Keeps everything utilized efficiently, saves money, no resource just sitting
idle or getting totally slammed.
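
A sketch of that weighted setup in Traefik's dynamic configuration might look like this; the service names, backend URLs, and the three to one weights are just the hypothetical example from the discussion:

http:
  services:
    my-app:
      weighted:
        services:
          - name: app-new        # the newer, beefier instances
            weight: 3
          - name: app-old        # the older, cheaper instance
            weight: 1            # 3:1 ratio in favour of the new group
    app-new:
      loadBalancer:
        servers:
          - url: "http://10.0.0.10:8080"
    app-old:
      loadBalancer:
        servers:
          - url: "http://10.0.0.11:8080"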

Speaker 1 (06:21):
Okay, let's flip that. What about apps where the user's
state matters, like a shopping cart stored in memory on
one specific server instance, Round Robin would break that.

Speaker 2 (06:30):
Yeah, that needs sticky sessions. If a user's second request
hits a different server, poof their cart is gone or
they get logged out. Bad experience.

Speaker 1 (06:39):
So how does Traefik handle that? It uses cookies.

Speaker 2 (06:41):
Typically, when the first request hits a back end instance,
Traefik sets a cookie in the response. For subsequent requests
from that same user, Traefik reads the cookie and makes
sure to send the request back to that same original instance.
Keeps the session alive.
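
In configuration terms, a sticky session is just a flag on the load balancer. A minimal sketch, with a made-up cookie name and backend addresses:

http:
  services:
    cart-service:
      loadBalancer:
        sticky:
          cookie:
            name: "lb_session"   # Traefik sets this cookie and pins the user to one instance
            httpOnly: true
        servers:
          - url: "http://10.0.0.10:8080"
          - url: "http://10.0.0.11:8080"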

Speaker 1 (06:55):
Okay, sticky sessions make sense. But underlying all this balancing,
you need health checks, right? Making sure you're not sending traffic
to a dead.

Speaker 2 (07:03):
Server. Absolutely fundamental. You only want to route traffic to
instances that are actually healthy, usually meaning they return a
2xx or a 3xx HTTP status code. Anything
else is an error.

Speaker 1 (07:15):
Doesn't constantly poking every instance add overhead, though? A performance tax?

Speaker 2 (07:19):
That's a fair question. It's a trade off. Traefik does
use active checks, where it sends a probe, and passive
checks, watching responses. But you can configure the interval. You
tune it so you find a balance, right, exactly.
The monitoring overhead isn't painful, but it's frequent enough to
pull an unhealthy instance out of the pool quickly when
it does fail. It's crucial for the stability of things

(07:41):
like round robin.
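
The active check is configured per service, and you pick the probe path and interval trade-off yourself. A minimal sketch, assuming each backend exposes a /health endpoint (the path and addresses are hypothetical):

http:
  services:
    catalog-service:
      loadBalancer:
        healthCheck:
          path: /health        # assumed health endpoint on each instance
          interval: "10s"      # how often Traefik probes; tune for overhead vs. reaction time
          timeout: "3s"        # failed probes pull the instance out of rotation
        servers:
          - url: "http://10.0.0.10:8080"
          - url: "http://10.0.0.11:8080"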

Speaker 1 (07:42):
Let's shift gears a bit to more advanced resilience patterns.
Traffic mirroring sometimes called shadowing, sounds useful for testing.

Speaker 2 (07:49):
Oh, it's fantastic for canary deployments, really safe testing. The
idea is you take your live production traffic, the real stuff,
and you copy a small percentage of it,
say ten percent, and send that copy asynchronously to a
new test environment, maybe your service v2.

Speaker 1 (08:05):
Version. Asynchronously, so the original user isn't waiting? Exactly.

Speaker 2 (08:09):
And critically, Traefik ignores the response from that mirror request.
It just fires it off and forgets about it, and it
lets you see how your new code behaves under real load, stability,
resource use, without any risk to the actual user experience.
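
A sketch of that mirroring setup, using the ten percent figure from the discussion and hypothetical v1/v2 service names:

http:
  services:
    my-app:
      mirroring:
        service: app-v1          # all real traffic still goes to v1
        mirrors:
          - name: app-v2         # a copy of 10% of requests is also sent to v2
            percent: 10          # responses from the mirror are ignored
    app-v1:
      loadBalancer:
        servers:
          - url: "http://10.0.0.10:8080"
    app-v2:
      loadBalancer:
        servers:
          - url: "http://10.0.0.20:8080"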

Speaker 1 (08:23):
That's clever. Okay, so we've handled load and safe testing.
But what about when things actually fail, not just one instance,
but maybe a whole downstream database or API becomes slow
or unresponsive in a micro services world. That seems like
it could cause chaos.

Speaker 2 (08:38):
It absolutely can. That's the dreaded cascading failure scenario. One
slow dependency makes its callers wait, they.

Speaker 1 (08:45):
Run out of threads or connections.

Speaker 2 (08:47):
Exactly, and then they fail, taking down the services that
call them. It ripples outwards.

Speaker 1 (08:52):
So how does Traefik act to prevent that ripple?

Speaker 2 (08:55):
That's the job of the circuit breaker pattern. Traefik middleware
can implement this. It watches for failures going to a
particular back end service.

Speaker 1 (09:03):
Failures, meaning errors or timeouts?

Speaker 2 (09:06):
Both typically yeah, if the failure rate or maybe latency
crosses a threshold you.

Speaker 1 (09:11):
Define like too many errors in the last minute.

Speaker 2 (09:13):
Right, or responses are taking too long. If that happens,
Traefik trips the breaker. It stops sending requests to that
struggling service altogether for.

Speaker 1 (09:21):
A period and just returns an error immediately.

Speaker 2 (09:23):
Yep, usually a 503 Service Unavailable. It does
this instantly, without even trying the failing service. This protects
the calling services from getting bogged down and saves resources
across the system. It's like the system saying, nope, that
road is closed for now, try again later.

Speaker 1 (09:38):
And the conditions for tripping, they can be quite sophisticated, I saw.

Speaker 2 (09:41):
Yeah. Traefik's implementation is pretty powerful. It's not
just simple failure counts. You can use expressions like, trip
if LatencyAtQuantileMS(50.0) > 100, meaning
the median response time is over one hundred.

Speaker 1 (09:54):
Milliseconds, or based on error ratio exactly.

Speaker 2 (09:57):
ResponseCodeRatio(500, 600, 0, 600) > 0.25, trip if more than twenty
five percent of recent responses were 5xx errors. Gives you fine grained control.
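
Both of those conditions go into a single circuit breaker middleware expression. A minimal sketch combining them, with a made-up middleware name:

http:
  middlewares:
    orders-breaker:
      circuitBreaker:
        # trip if median latency exceeds 100 ms OR more than 25% of responses are 5xx
        expression: "LatencyAtQuantileMS(50.0) > 100 || ResponseCodeRatio(500, 600, 0, 600) > 0.25"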

Speaker 1 (10:05):
Okay, circuit breakers handle the big failures. What about those
little annoying transient glitches, like a brief network hiccup, that
just needs a quick retry.

Speaker 2 (10:14):
Perfect use case for the retry middleware. Just like hitting refresh
in your browser when a page times out, right? Traefik
can be configured to automatically retry a request, maybe once
or twice, if it fails with specific errors like a
connection timeout or maybe a 502 Bad Gateway.
It provides a basic level of self healing for those
intermittent network blips.
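
The retry behaviour is also a middleware; a minimal sketch with a couple of attempts (the middleware name is hypothetical):

http:
  middlewares:
    quick-retry:
      retry:
        attempts: 2        # re-issue the request up to twice on connection-level failures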

Speaker 1 (10:31):
Makes sense. So we've got routing balancing resilience. But when
things do go wrong despite all this, we need to
figure out why. Let's talk observability.

Speaker 2 (10:41):
Crucial. Observability isn't just knowing that something is wrong, but
having the data to understand why. And Traefik, sitting at
the entry point, is perfectly placed to collect that data.

Speaker 1 (10:53):
Across the three pillars right, logs, traces, metrics exactly.

Speaker 2 (10:57):
Let's start with logs.

Speaker 1 (10:59):
Now people often say application logs alone aren't enough
in microservices. What makes Traefik's logs actually useful here?

Speaker 2 (11:06):
Well, it generates standard error logs, of course, but the
real value is often in the access logs. The trick
is logging everything for every request can be really resource intensive.

Speaker 1 (11:16):
Yeah, generates huge amounts of data.

Speaker 2 (11:18):
So Traefik lets you filter them intelligently. You might say,
only log requests that resulted in a redirect, status codes
300 to 302, or only log requests
that took longer than, say, five seconds to complete, using
a minDuration filter.
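
Those filters live in the static configuration's accessLog section. A sketch matching the two examples just mentioned; the file path is a placeholder, and in Traefik a log line is kept if any of the listed filters matches:

accessLog:
  filePath: "/var/log/traefik/access.log"   # placeholder path
  filters:
    statusCodes:
      - "300-302"        # keep redirects...
    minDuration: "5s"    # ...or requests that took longer than five seconds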

Speaker 1 (11:32):
Ah, so you capture the interesting or problematic events without
drowning in routine data.

Speaker 2 (11:37):
Precisely. Optimizes performance, gets you the diagnostic data you actually need.

Speaker 1 (11:40):
Okay, logs tell us what happened at the edge, But
to follow a request through multiple services, we need tracing.

Speaker 2 (11:46):
Right request tracing stitches the whole journey together. Each piece
of work done by a service is a span. All
the spans for one user request combine into a single trace, like.

Speaker 1 (11:57):
A timeline of the request's life.

Speaker 2 (11:59):
Exactly. And Traefik, being the first point of contact, can
generate standardized trace headers, often B3 propagation headers, things
like X-B3-TraceId. Think of them like a digital.

Speaker 1 (12:10):
Passport, and it passes that passport along.

Speaker 2 (12:13):
It injects those headers into the request before forwarding it
to the first back end service. That service, if it's
trace aware, adds its own span and passes the headers on.
So even if the request hits five different microservices,
you can.

Speaker 1 (12:25):
See the whole chain in a system like Zipkin or
Jaeger exactly.

Speaker 2 (12:29):
End to end visibility invaluable for debugging distributed systems.
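
Enabling that at the gateway is a few lines of static configuration. A sketch for a Zipkin backend, where the collector endpoint and sample rate are placeholders; a Jaeger block works along the same lines:

tracing:
  zipkin:
    httpEndpoint: "http://zipkin:9411/api/v2/spans"   # placeholder collector address
    sampleRate: 0.2                                   # trace only a fraction of requests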

Speaker 1 (12:32):
And the third pillar, metrics, the numbers? Yep.

Speaker 2 (12:35):
Traefik exposes key application level metrics, things like total request counts,
request latencies, averages and quantiles, error rates, information about the.

Speaker 1 (12:43):
Back end servers, and you feed that into.

Speaker 2 (12:45):
Standard monitoring systems, typically Prometheus. Prometheus scrapes these metrics from
Traefik periodically. Then you can use tools like Grafana to
visualize trends, plan capacity, and set up automated alerts if, say,
error rates spike or latency degrades.
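
On the Traefik side, exposing Prometheus metrics is again a small static configuration block; the histogram bucket values here are just an example:

metrics:
  prometheus:
    buckets: [0.1, 0.3, 1.2, 5.0]   # latency histogram buckets, in seconds
    addEntryPointsLabels: true
    addServicesLabels: true
# Prometheus then scrapes Traefik's metrics endpoint on its own schedule.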

Speaker 1 (13:01):
Got it? Okay, let's bring this home to the place
where Traefik seems most popular: Kubernetes. You mentioned earlier that
the original Kubernetes ingress API wasn't great.

Speaker 2 (13:11):
Yeah, it was, let's say, a bit underspecified, vague,
which forced vendors like Traefik, NGINX, and others to
rely heavily on custom.

Speaker 1 (13:19):
Annotations. Annotations being those kind of messy text strings in
the YAML.

Speaker 2 (13:23):
Exactly, you'd have dozens of vendor specific annotations to configure
basic things like timeouts or retries or sticky sessions. It
wasn't clean, wasn't standardized.

Speaker 1 (13:33):
So how did Traefik improve on that? They gave up
on Ingress in Traefik v2?

Speaker 2 (13:36):
They shifted strategy. They embraced Kubernetes Custom Resource Definitions, or CRDs.
They introduced their own resources like IngressRoute, Middleware, TLS.

Speaker 1 (13:45):
Option. So instead of annotations, you define routing rules using
these custom but still native feeling Kubernetes objects.

Speaker 2 (13:51):
Precisely, it's a much nicer experience. As they say, configuration
becomes structured, version controllable Kubernetes YAML, just like your Deployments and Services.
Any Kubernetes engineer can understand it. It follows familiar patterns,
no more digging through annotation documentation for different vendors.
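
For a feel of what that looks like, here's a minimal IngressRoute sketch. The host, service, and middleware names are hypothetical, and the apiVersion shown is the one used around Traefik v2; newer releases use traefik.io/v1alpha1:

apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: orders
  namespace: default
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`shop.example.com`) && PathPrefix(`/orders`)
      kind: Rule
      services:
        - name: orders-svc       # a plain Kubernetes Service
          port: 8080
      middlewares:
        - name: quick-retry      # reference to a Middleware object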

Speaker 1 (14:08):
That sounds like a huge improvement. And you also touched
on TLS simplification. Getting certificates is often a real pain.

Speaker 2 (14:15):
Oh, historically it was awful. Manual requests, validation hoops, remembering
to renew. High chance of error, high risk.

Speaker 1 (14:23):
So how does Traefik fix that?

Speaker 2 (14:25):
It integrates directly with the ACME protocol, which is the
standard Let's Encrypt uses for automating certificate issuance for public domains.

Speaker 1 (14:32):
Let's Encrypt, the free certificate authority, right.

Speaker 2 (14:35):
Well, in Traefik you basically just configure a certificate resolver
pointing to Let's Encrypt. Then when you define an IngressRoute
for a public host.

Speaker 1 (14:43):
Name, Traefik just handles it.

Speaker 2 (14:44):
It handles the entire life cycle automatically. It requests the certificate,
handles the domain validation challenge, often using something called the
TLS-ALPN-01 challenge. It's quite neat. Retrieves the certificate,
installs it, and even handles renewal before it expires.
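
In configuration terms, that's a certificate resolver in the static config plus a reference on the route. A minimal sketch; the resolver name, email, and storage path are placeholders:

# traefik.yml (static configuration)
certificatesResolvers:
  letsencrypt:
    acme:
      email: "ops@example.com"        # placeholder contact address
      storage: "/data/acme.json"      # where issued certificates are persisted
      tlsChallenge: {}                # use the TLS-ALPN-01 challenge

# A router or IngressRoute then just opts in:
#   tls:
#     certResolver: letsencrypt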

Speaker 1 (15:01):
Wow. So the developer just defines the route, asks for
TLS, and Traefik and Let's Encrypt do the rest.

Speaker 2 (15:08):
Pretty much. Focus on the application logic. The complicated, error
prone task of certificate management just happens.

Speaker 1 (15:15):
So wrapping it up, Traefik's core value seems to be
replacing that old, rigid manual configuration world.

Speaker 2 (15:21):
Which just breaks under microservice dynamism.

Speaker 1 (15:24):
With a dynamic, self configuring system built for that reality.
It's the traffic cop that learns the roads automatically as
they get built or torn down. Well put.

Speaker 2 (15:33):
And there's a final thought, maybe a provocative one, tied
to that certificate automation we just discussed. Traditionally, certificate
management was so painful and manual that people did it infrequently,
maybe once a year. This meant certificates were valid for
a long time. If one got compromised somehow, an attacker
had a year long window.

Speaker 1 (15:50):
Right. Long lived credentials are risky.

Speaker 2 (15:52):
Very, yeah. But because Traefik's integration with Let's Encrypt automates
the renewal process, certificates typically only live for ninety days now,
and the renewal is automatic, often with no human touch needed.

Speaker 1 (16:05):
So it drastically shrinks the window of opportunity for an
attacker using a compromised certificate.

Speaker 2 (16:11):
Exactly. It removes a tedious, error prone operational task and
significantly improves your security posture by enforcing short certificate lifetimes.
That whole category of operational security risk just kind of
melts away, thanks to automation.

Speaker 1 (16:25):
That's a really powerful side effect of adopting modern tooling.
A fantastic insight to end on. Thank you for taking
us through this deep dive into Traefik.

Speaker 2 (16:33):
My pleasure. It's fascinating technology.

Speaker 1 (16:35):
And thank you, our listeners, for joining us. We'll catch
you on the next deep dive.