Building Better Platforms with Dapr: Abstractions, Portability, and Durable Systems with Mark Fussell

Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
(00:04):
You're listening to the Platform Engineering Podcast, your
expert guide to the fascinating world of platform engineering.
Each episode brings you in-depth interviews with industry experts
and professionals who break down the intricacies of platform
architecture, cloud operations and DevOps practices. From tool reviews
to valuable lessons from real-world projects, to insights about

(00:27):
the best approaches and strategies, you can count on this
show to provide you with expert knowledge that will truly
elevate your own journey in the world of platform engineering.
Welcome back to the Platform Engineering Podcast. I'm your host
Cory O'Daniel and today I have with me Mark Fussell, co-creator
of Dapr, CEO of Diagrid. Before starting Diagrid, he led platform

(00:50):
strategy at Azure, spent years thinking about how developers build
distributed systems. With Dapr, he helped create a standard
runtime for cloud native apps, one that's now widely adopted by
platform teams. At Diagrid, he's focused on helping teams operate
these systems more reliably in production. Mark, thanks for coming
on the show today. Super excited to talk about Dapr.
It's fantastic to be here, Cory, and I'm excited to talk to

(01:12):
you about all things distributed.
Yeah, yeah. So why don't you give us a little bit about your background,
like what you were doing at Azure and like what kind of drove
you towards, you know, creating Dapr and Diagrid.
Yeah, yeah. Actually I spent many years at Microsoft. I spent
20 plus years there. When I first joined actually I worked on
XML technologies which was like the hot thing in 2000. Of course

(01:33):
no one mentions XML anymore now, but that was good. But after
going through lots of different developer technologies
and working on databases, I ended up in 2008 starting a project
which was fairly new at the time early in Azure called Service
Fabric. And it was all about building a distributed systems platform
for Azure to build and host and run their hardest services, which

(01:54):
were their database services like SQL Azure and Cosmos DB and
things like this.
Very cool.
And anyway, it taught a lot about building distributed systems,
building a platform, running things at scale and understanding
how you really manage and operate these things. Everything
from sort of replication technologies all the way up to what
it provided, which was an application programming not only

(02:18):
model but also a description of how the applications had dependencies
between each other. And so yeah, Service Fabric was a great
understanding of learning and transforming from what was in the
days of client server and how you built things to kind of the modern
way now of effectively distributed compute and running it
on multi-machines. It was a fun, fun journey.

(02:39):
Yeah, that sounds very cool. It's funny that you say XML. I had...
I don't know if you remember this... I remember I got it, I think...
Not Barnes and Noble. What's the one that went out of business
before Barnes and Noble?... I remember buying like the XML Bible
in like the early 2000s, like 2002. It was legit like two and
a half inches thick. And I was like, "Man, once I read this thing,
I'm going to be the best programmer on the planet." I did

(03:01):
not make it past the first chapter.
There's XML, transform XML, XSLT, there's all this sort of stuff.
And actually I spent also five very difficult years working on all
the W3C protocols. So W3C schema, W3 secure conversation and
all that sort of thing, which now is kind of in the past.

(03:24):
Yeah, yeah, very cool. So what I'd love to do is I'd love to just
kind of start from the top and like get... I feel like you're in
a place where you're interacting with a lot of platform
teams today.
Yes.
And it's interesting because I think that, you know, the idea...
I'm a big fan of creating abstractions... I think that, honestly
that's what I've always felt DevOps was about, was creating abstractions

(03:46):
to act as the collaboration versus like making other people do
your job - which is how it's surfaced for many teams. Seeing what
Dapr does and seeing the types of teams that you're working with
in the platform engineering space, I'd love to just get what
you think the current state of platform engineering is today. And
what are some of the biggest problems some of these platform teams
are trying to solve with and without Dapr?

(04:09):
I mean, let me give you a little bit of overview of what Dapr
is and then we'll sort of dive in and answer that question. And
just to kind of continue my background, when I was at Microsoft,
so six years ago, we started the Dapr project. And then left Microsoft
three and a half years ago to form Diagrid because we sort of wanted
to continue innovation with it. So today, you know, I'm the co-founder as well as
the co-creator of Dapr. But I mean, what Dapr is in essence is it's

(04:33):
a set of developer APIs that allow developers to not reinvent
the pattern, as we say, or not reinvent the wheel for common abstractions
that they need. For example, decoupled messaging. So if you do
event-driven system, event-driven application, you do Pub/Sub
messaging, it allows you to provide an API for Pub/Sub messaging.
Or if you want to kind of communicate a request reply semantics

(04:56):
between two services, it has this thing called service invocation
which does discovery, secure calls, and if an application fails
and falls over to another machine, it will reconnect it all.
Or things like secrets management. So these common APIs
are really what developers need in order to sort of be productive
building their applications running on top of any different type

(05:18):
of compute platform like Kubernetes for example. So to go
to your point of like, what do we see in platform engineering today?
What we see a lot is platform engineering teams focus on, "Okay,
I'm building a platform, it's got some services in it, like a database
service, or a messaging service like Kafka or RabbitMQ, or
you can get some secrets management from it as well like HashiCorp

(05:39):
Vault or one of the secrets management services, say, from AWS or Azure."
And they want to provide all these services, but they sort of
somewhat ignore the actual contract to the developers themselves.
So what happens is you see like a developer goes, "Well, okay,
I'm just going to use the Kafka SDK." And so they pull in the
Kafka SDK. Now if you want to do Pub/Sub messaging between services,

(06:00):
first the Kafka SDK isn't very nice. You have to sort of build this
sort of Pub/Sub semantics on top of it all - how I publish a message
and how I subscribe - because it's a sort of streaming thing to
begin with. So you have to sort of do that first. Then you sort
of bake the Kafka SDK into your code and then you start using
it all. And then you know, lo and behold, kind of in about three

(06:21):
months or four months time, someone says, "Oh, we have to move
everything to AWS now" or some other cloud, and you have to rip
out all that Kafka code and instead, you know, replace it with
AWS SNS or Azure Service Bus or one of these other ones. And all
of a sudden you've got this huge pain point where developers
have tightly coupled their code to the underlying platform.
You know, you don't want to really do this. One of the key benefits

(06:44):
that Dapr provides is a very clean abstraction between a Pub/Sub
behavior and semantics of an API and the ability to swap out the
underlying infrastructure - which are called components. So
you can literally keep the code exactly the same with the right
behaviors and swap between Kafka and say RabbitMQ or SNS literally
within hours with a component that's been very well tuned for that

(07:07):
particular message bus because all the right SDK has been wrapped
in right behavior. It has a YAML metadata description... I know
you're a YAML fan.
I love it.
So you have this description of this component and it has all
the metadata about it all. So you literally go and take the Kafka
YAML description and swap it for the AWS YAML SNS description,

(07:28):
drop it in place, and boom, you've literally migrated from Kafka
to SNS within hours. And off you go. And so that's one of the
things that we see happen a lot today, this sort of tension and
fight between the application developers and what the platform
provides and this sort of tight coupling. Also it provides
really for portability of code. So to address the problem...

(07:52):
we see a lot of financial companies that have to deploy multi-cloud,
they have to deploy on this cloud and that cloud. We also see
a lot of people who have to have choices as well in a platform
- I want Kafka and SNS in both of them... and also provide for design
flexibility. So as you move forward, and you decide one particular message
broker might not satisfy you, and you want to adopt and change,

(08:14):
you can. So those are kind of the benefits we get from a platform
engineering team perspective. Does that make sense?
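To make the "code stays the same, component swaps underneath" point concrete, here is a minimal Python sketch of an application publishing through its local Dapr sidecar over the HTTP publish endpoint. It assumes the sidecar is listening on Dapr's default HTTP port 3500; the component name "orders-pubsub" and the topic "orders" are hypothetical names chosen for illustration. The sketch only builds the request, so it runs without a sidecar.

```python
import json
import urllib.request

# Assumption: the sidecar listens on Dapr's default HTTP port.
DAPR_HTTP_PORT = 3500


def build_publish_request(pubsub_name: str, topic: str, event: dict) -> urllib.request.Request:
    """Build (but do not send) a publish call to the local Dapr sidecar.

    The application only names a Pub/Sub component and a topic. Whether that
    component is backed by Kafka, RabbitMQ, or SNS is decided by the
    component's YAML description, not by this code.
    """
    url = f"http://localhost:{DAPR_HTTP_PORT}/v1.0/publish/{pubsub_name}/{topic}"
    body = json.dumps(event).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# "orders-pubsub" and "orders" are hypothetical names for this example.
request = build_publish_request("orders-pubsub", "orders", {"orderId": 42})
print(request.full_url)

# Sending requires a running sidecar:
#   urllib.request.urlopen(request)
```

Nothing in the sketch mentions a broker, which is the property described above: migrating from Kafka to SNS would change the component description, while this calling code would stay as it is.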
Yeah, yeah, yeah, it does. So it's funny that, like, as you're
describing this Kafka thing... the word I like is calcify. Like
it just gets crusted in there and it's all over the place. And
it's funny because, like, when people talk about cloud lock-in,
like, I very rarely feel like cloud lock-in is your data. It's

(08:37):
like, "Oh, I've got some stuff in an event broker somewhere." Okay,
you write a little tiny lambda that just attaches and pulls that
out to another one. Like you can get that data elsewhere. I feel
like the real cloud lock-in happens in two places. Security level,
because you just can't see it.
Yeah.
And you forget about it. And then when you move to the new cloud,
you're like, "Oops." Then the other place I feel like you really
feel cloud lock-in is in the code base.

(08:59):
Yes.
Because their SDKs have just kind of permeated every single file.
And it's like, man, to move off of S3. It's not hard to sync
an S3 bucket to GCS or Azure Blob Store - it's very trivial. To
change the code...
Yes.
To change the code is going to take so long.
It's so long.
And it's like one of those things. It's like, it's funny. Like

(09:21):
if your ops person sits down to move the data and the developer
sits down to change the SDK, the part that we think about as cloud
lock-in happens way faster. Like moving that data is done...
Exactly.
But changing that SDK, it's like, "Man, all of our test mocks
are toast. Like so much stuff is toast."
Yes.
That is pretty cool. It's funny, I feel like this is kind of

(09:42):
covered a little bit by like, you know, hexagonal architectures
or something like that. But like that also feels like, for many
teams, a big, hard concept to swallow. Like that's one of those
things that like, we haven't really dried up. And it sounds like
that's kind of what Dapr is doing. It's saying, "Like, look,
there are these abstractions that make sense to developers." I
buy this. I know many people do. I know there's some Ops people

(10:04):
that still struggle with this, but developers tend to not. I don't
want to say they don't care, but they're not as concerned.
No. Yeah. I mean, you get a lot of, "I want to try and solve
this problem myself." So the biggest thing you rub against is
the developers want to build a platform themselves and get all very
excited about this. But that's not, you know, the business value
to the company.
No.
And you know, when the business value to the company is

(10:25):
like a level of portability, cross cloud. I mean, there's also
the same thing, comes down to the fact that developers can build
things locally. So you can build against this API and you can
run it. For example, often we see that people use Redis as a local
Pub/Sub testing environment, and then they can just swap out Redis
and put AWS SNS in, and they move from local development to cloud

(10:47):
with no code changes. And like that concept of like, I just switch
between this - I mean the developers love all this. And it's
a very clean API around these things. Pub/Sub epitomizes this but
you've got things like jobs as well, a cron job API. You've got
secrets management, you've got this concept of sort of bindings
where I can talk to an underlying state store or a SQL database

(11:08):
and then just swap that out. So, you know, all of these common
API definitions make it very clean for developers to build these
apps.
The local development test one's another one that's really interesting
to me. So we lean heavily into this idea internally, but it's a
ton of boilerplate for us. I've written about this a bit on
LinkedIn. Like we just moved from AWS to just completely cloud

(11:29):
native... we can define that term later, I think... like fully
into Kubernetes. And when we did this, we were just heavily into
adapters and it's... I'm very into TDD and I [don't want to hit] S3 every time I write
like a Blob store test, right? And so it's like
we have an abstraction that's like Blob storage. Development and
test, it's in-memory stuff. In production

(11:52):
it's S3, right? It sounds like this gives you that, without every team having to
be just like spending... [every time we] introduce a cloud service, we create an
abstraction around it. We create a test version, a prod
version. And it sounds like this just like gets you going.
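The hand-written adapter approach described here, one abstraction with an in-memory version for tests and a cloud version for production, can be sketched in a few lines of Python. Every name below (BlobStore, InMemoryBlobStore, save_report) is hypothetical and only illustrates the pattern; a production S3-backed class would implement the same two methods and is left out so the sketch stays self-contained.

```python
from typing import Dict, Protocol


class BlobStore(Protocol):
    """The abstraction application code depends on."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class InMemoryBlobStore:
    """Test/development adapter: keeps blobs in a dict, no network calls."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]


def save_report(store: BlobStore, report_id: str, text: str) -> str:
    """Application code: it sees only the BlobStore interface, never an SDK."""
    key = f"reports/{report_id}.txt"
    store.put(key, text.encode("utf-8"))
    return key


# In a test, the in-memory adapter stands in for the real blob service.
store = InMemoryBlobStore()
key = save_report(store, "q1", "hello")
print(key, store.get(key))
```

This is the boilerplate each team otherwise writes for every cloud service it introduces; the conversation's point is that a shared API layer with swappable components removes the need to repeat it.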
Exactly. That's exactly what it's designed for. And we just saw

(12:13):
this time and time again repeated across all these companies.
And it's not only that, you've got to start putting behaviors inside
it all. So let's take messaging as an example. Often when
you start to build these distributed applications and you
want to publish messages, you want to be able to say, "Well, I
only want these other services, from a security perspective,
to be able to receive them all and not these ones." So you want

(12:35):
to be able to lock down who can receive the messages, rather
than anyone to be able to see them all. So in the messaging behavior
of the Pub/Sub API, you can say only these applications are allowed
to subscribe to these messages and these ones can't. So you can
be very explicit about security. In fact, security is prevalent
across most of the concept of Dapr. A very key element to Dapr,

(12:59):
which again kind of gets missed, is that the actual pieces
of code have identity associated with them. And the fact
that you have identity associated with your code is you
can say these applications can't receive these messages, or
this piece of code is the only one that's allowed to connect to
the Kafka service, and these ones can't. So the platform team
can now deny certain applications attaching to them or

(13:20):
can approve certain ones as well.
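The idea of locking down who may subscribe can be pictured as a deny-by-default allow-list keyed on application identity. The Python sketch below is a conceptual illustration only, not Dapr's implementation or configuration format; the topic and application names are hypothetical.

```python
from typing import Dict, Set

# Hypothetical policy: topic name -> application identities allowed to subscribe.
SUBSCRIPTION_ALLOW_LIST: Dict[str, Set[str]] = {
    "orders": {"order-processing", "billing"},
}


def may_subscribe(app_id: str, topic: str) -> bool:
    """Deny by default: only identities listed for the topic may subscribe."""
    return app_id in SUBSCRIPTION_ALLOW_LIST.get(topic, set())


print(may_subscribe("billing", "orders"))
print(may_subscribe("marketing", "orders"))
```

Because the check is keyed on the identity of the calling code rather than on a network address, the same policy can be owned by the platform team without the application developers changing anything.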
Hmm, that's cool.
Or the application developer can say, "Only these messages or
this application can talk to this", because you have identity.
And this is another key thing that gets missed in application development.
Identity allows you to do lots of super cool things, including flowing
the identity of the actual process down to the underlying infrastructure

(13:43):
itself. And we do this in Dapr with SPIFFE as an identity. So a
SPIFFE identity gets associated with a piece of code.
Like in the AWS world, they can actually identify SPIFFE with
an underlying service like SNS. And now a developer literally
has no concerns whatsoever of like this piece of code providing
its identity and talking to underlying service. And the platform

(14:05):
team can approve or disallow which applications talk to all based
upon that identity. This is something that's also super key that
the platform teams love because now they have control over
what applications can communicate. Plus the developers
love it because security just flows naturally.
Yeah, we'll put a note in the show notes and a link to SPIFFE -
this is the Secure Production Identity Framework For Everyone.

(14:29):
Yes.
I'll tell you what, that's a spiffy acronym. I'll give it to them
- they landed that one.
Yeah.
Yes. So this is the first time I'm hearing about SPIFFE. I was googling
it real quick as you're saying it, so I could put a link. Can you
kind of tell us what SPIFFE is and how that actually integrates?
Like, is it at the... so it's at the code level, not at the service
level. So if I had like two different...

(14:49):
Yes.
You know, two different domains in my project that are storing
data in different ways. Like each of those domains - like my user
space and maybe whatever my other business domain is - would
have its own identity, even if it's in the same code base.
See, what developers want to be able to do is just say, "I want
to talk to the order processing application and I want
to call the order process on it. And that's it." Invariably what

(15:14):
you don't want is developers to say, "I have to look at this DNS
location and open up a port and think about everything at the
networking level." And all of that's just horrible for a developer.
So instead, developers simply think in terms of identities of pieces
of code. So now I make my application, it has identity... like
it has Mark's application, Cory's application. On Cory's application

(15:35):
it has method A, B and C. And so from a developer's perspective
I just say, "Go and call the order method on the order processing
application wherever it's running in my Kubernetes cluster."
I don't care where it is, it can be on a machine further away.
I'm not thinking about ports and endpoints and anything else like
this, which is sort of the natural behavior of a platform team.

(15:57):
And so all of a sudden you've got easy ways to send messages, communicate,
discover things. And that identity... going back to a SPIFFE
identity that gets allocated... can be flown by federation
because you can actually go into the underlying infrastructure
itself and federate that back to the identity that was given to
the actual process itself... it can now be federated to the underlying

(16:20):
service, like for example in the case of AWS SNS service. And
now you can make sure that only this application with this SPIFFE
identity... which is basically an X.509 certificate, SPIFFE is just
a way of wrapping a piece of identity like an X.509 certificate
or any other piece of identity credentials and then flowing that
down to the infrastructure... This concept of identity means that

(16:43):
it's very easy for an application developer to call other
services and to have security in the underlying services. And it's
something that, you know, just prevents application developers having
to think of ports and networks and other sorts of horrible things
like this, which is sort of not the natural base or knowledge
of a platform engineering team as a whole.
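Calling "the order method on the order processing application" by identity rather than by host and port looks roughly like this through the sidecar's HTTP service invocation endpoint. This is a sketch under stated assumptions: the sidecar is on Dapr's default HTTP port 3500, and the app ID "order-processing", the method "order", and the payload are hypothetical. The request is only built, not sent.

```python
import json
import urllib.request

# Assumption: the sidecar listens on Dapr's default HTTP port.
DAPR_HTTP_PORT = 3500


def build_invoke_request(app_id: str, method: str, payload: dict) -> urllib.request.Request:
    """Address another service by its app identity and method name.

    The caller never supplies the target's DNS name, IP address, or port;
    it talks only to its own sidecar on localhost.
    """
    url = f"http://localhost:{DAPR_HTTP_PORT}/v1.0/invoke/{app_id}/method/{method}"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# "order-processing" and "order" are hypothetical names for this example.
invoke = build_invoke_request("order-processing", "order", {"sku": "A-1", "quantity": 2})
print(invoke.full_url)
```

The only network location in the code is localhost; where the order processing application actually runs in the cluster is resolved on the caller's behalf.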
It's also just like, it's that undifferentiated heavy lifting. It's

(17:06):
like if everybody on the team has to think about it all the time,
it's just like... that's just like slices of brain that are like
not put towards the product. Right?
Exactly. Well, there are three cross-cutting concerns. So as well as APIs,
there are three things that a platform team has to think about
to service an application team. They are security, end to end
and what that looks like. There's observability of how you

(17:28):
track data and how it flows. And then there's resiliency in terms
of what you do in the event of network failures. And so this is
another thing that becomes a burden on the application teams.
So let's take an example of observability. Now if you can get
all your APIs flowing through Dapr, every single one of them is
tracked with OpenTelemetry calls. So you can go call from...

(17:49):
say to a secret store to get hold of a secret. You can send a
message with Pub/Sub messaging and call to a database. And in that
whole call flow, Dapr writes out OpenTelemetry events, which you
can push into your OpenTelemetry store. OpenTelemetry
is another CNCF protocol standard that's pretty much well
adopted now in terms of events. And then you can see the
whole call graph of what that looks like inside your Datadog, New

(18:15):
Relic, CloudWatch, whatever you choose as the environment
around these things. That's beautiful as well, because without
the observability, you can't run production applications. I mean,
you have no idea what's happening because you can't see.
And so now you can sort of pinpoint, "Oh, this happened here,
this happened here, this happened here." You can see the latency
between the calls and this sort of thing. And invariably this

(18:36):
is a thing that struggles a lot when developers pull in the Kafka
SDK, and then they pull in some other SDK, and they're sort
of disconnected and having to try and find a continuous way of
doing diagnostics and observability across it. So it solves
the observability problem as well.
Ops teams, you're probably used to doing all the heavy lifting

(18:57):
when it comes to infrastructure as code wrangling
root modules, CI/CD scripts and Terraform, just to keep things
moving along. What if your developers could just diagram what
they want and you still got all the control and visibility you
need? That's exactly what Massdriver does. Ops teams upload
your trusted infrastructure as code modules to our registry. Your
developers, they don't have to touch Terraform, build root modules,

(19:19):
or even copy a single line of CI/CD scripts. They just diagram
their cloud infrastructure. Massdriver pulls the modules and
deploys exactly what's on their canvas. The result, it's still
managed as code, but with complete audit trails, rollbacks,
preview environments and cost controls, you'll see exactly who's
using what, where and what resources they're producing,

(19:41):
all without the chaos. Stop doing twice the work. Start making
infrastructure as code simpler with Mass
So I feel like this is one of those things. It's hard. I love OpenTelemetry.
We actually... it is disturbing to new folks that like
work on our application. We have a quiet app, like if you attach

(20:01):
to its logs, it doesn't make a peep. And it's because we've just
leaned into events and OpenTelemetry, like spans and events,
heavily. And so it's like everything... we just send it all
there and we don't care. It's funny. People are like, "Oh, isn't
that expensive?" It's like, "No, your logs are..." It's expensive
when it's an additional expense, but when you're not talking
about logs, you can send 100% of stuff to OpenTelemetry and it

(20:22):
costs about the same. And it's more useful.
Exactly.
One of the things I think is hard, though, is if somebody's getting
into OpenTelemetry, or just tracing in general, is like where
to put it. And so does the call graph... like, how Dapr is kind
of this SDK that sits in between me making calls and handling
the distributed nature of it... Is it automatically putting

(20:43):
on those traces? Like on the calls for...
Yeah, every call creates a span, the spans get flowed and does
all that for you.
It's like a freebie. I mean, you can probably do some more in
your code, but like, you're not making tons of changes, like
adding spans around every function. It's just like, boom, your
service input came in.
You turn it on and all of a sudden it's like, boom. You just

(21:03):
get beautiful traces across all your APIs. If you think about
it, there you are, you know, you're calling some message broker
with like Kafka, and then you're wanting to get hold of a secret
and, let's say, call a database. And like, I want to see
what that looks like. And so, you know, developers want to see
the call graph operations, operations people want to see what's
happening. And all of these come back with metrics, information

(21:26):
as well. So you can see the latency, the error rates, the throughput
around all of these things as well. So that whole thing just gets
freebie around this.
Yeah.
Then the other freebie you get is resiliency as well. Because what
happens is an application developer will call onto something
like Pub/Sub service and it'll fail. And then they're like, "Oh,

(21:47):
it failed. What do I do?" Well, you have to build retry mechanisms
in where I failed and let me just retry the call again. And you
can just say, retry it three times before you do something. Or,
for example, you want to put things like circuit breakers in -
where something's going crazy and you want to sort of break it
off for a while until it recovers around these things. So

(22:07):
the resiliency can be put onto any API as well. So you can do retries
on the Pub/Sub API or you can do circuit breakers if things are
misbehaving around this sort of thing. And so you end up building
these on top of a lot of Pub/Sub systems and all these sort
of things as a whole. These cross-cutting concerns, security,
resiliency, observability, you just get for free - it's great.
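For a sense of what "retry it three times" and "break it off for a while until it recovers" mean in code, here is a hand-rolled Python sketch of the two mechanisms. It shows the kind of logic teams otherwise write themselves, not Dapr's implementation; the function and class names, thresholds, and timings are all hypothetical.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_retries(operation: Callable[[], T], max_attempts: int = 3,
                      delay_seconds: float = 0.0) -> T:
    """Retry a failing call up to max_attempts times (assumed to be >= 1)."""
    last_error: Optional[Exception] = None
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception as error:  # a real policy would match specific errors
            last_error = error
            if attempt < max_attempts:
                time.sleep(delay_seconds)
    assert last_error is not None
    raise last_error


class CircuitBreaker:
    """Stop calling a misbehaving dependency for a cool-down period."""

    def __init__(self, failure_threshold: int = 3, reset_after_seconds: float = 30.0) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after_seconds = reset_after_seconds
        self.consecutive_failures = 0
        self.opened_at: Optional[float] = None

    def call(self, operation: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after_seconds:
                raise RuntimeError("circuit open: call skipped")
            # Cool-down elapsed: let one trial call through (half-open state).
            self.opened_at = None
        try:
            result = operation()
        except Exception:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.failure_threshold:
                self.opened_at = time.monotonic()
            raise
        self.consecutive_failures = 0
        return result


# Usage: a hypothetical publish call that fails twice, then succeeds.
attempts = {"count": 0}


def flaky_publish() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("broker unavailable")
    return "delivered"


print(call_with_retries(flaky_publish, max_attempts=3))
```

When this logic lives in a layer outside the application, the application code contains neither the loop nor the breaker state.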

(22:32):
So let's say I've got just two services - they're not in cloud services,
they're two of my services. I [have] an old flaky one, it's written in Java
2 from a decade ago. But for whatever reason
we can't... we have to deal with it, like it's there. Now the
retries, is that happening from the... and like handling the retries like on like the last leg?

(22:55):
Well this is where Dapr comes in. So maybe I should just go into
the next thing. What Dapr does is Dapr runs as a sidecar process
to your application. So the way it works is every single one of your
applications that you launch... say I launch 100 applications...
they've all got their own little Dapr sidecar running next
to them all. And so Dapr has its APIs that you're calling and

(23:17):
it's doing all the heavy lifting for you. And so it's doing
not only the behavior of like the Pub/Sub messaging, but it's providing
a resiliency policy and doing that retry on your behalf.
Oh cool.
If you're doing a call, it is logging that telemetry call. Or if
you're doing security, it's checking that you're authorized before
it handles a call on. So it's doing all those things on your behalf

(23:37):
that you get for free. So instead of you having to like bake
this all into your application code or try and mix and match all
these different SDKs together, Dapr just does this as a general
purpose library, as it were, but launched as a little sidecar
process next to your application.
So you don't have like a hundred apps coming back like on
one Dapr server, it's all sitting right inside your Kubernetes

(24:00):
deployments - right there, fast localhost.
Exactly.
Everybody's got their own. That's tight. That's awesome.
Yeah, yeah. So when you launch your pod for your application, Dapr
runs a sidecar inside your pod. So you know, it's just a localhost
call around these things. There is a version of Dapr that you
can deploy which has a per-machine version of Dapr if you want.

(24:21):
You know, because some people like to have the idea that, you know,
they have less resources or they want to have a central place
of this. But the most common deployment is, you know, a single
pod with a Dapr sidecar running next to it for your application
and you have, you know, 10 applications in 10 pods and that's
running on Kubernetes and it all just scales out for you.
Yeah, that's very cool. So let me ask you a question about the abstractions

(24:43):
because I've seen this in a few places where people have tried
to do multi-cloud. I'm curious how Dapr handles it. So what's the
SDK for like Blob Storage called?
Well, right now there's a Blob Storage binding we have for that.
But let's just talk about... probably the best one is
to go for state management, where we sort of
do key value storage into any form of database that you want.

(25:03):
There's so many of these in S3 that are great illustrations, but
like, how does it deal with like the lowest common denominator
of the providers behind the scenes? Like, let's say that I need
a feature, can I get that feature through the Dapr bindings
and SDK and it's just like I only get that feature if the backend
supports it? Or is that feature just not accessible to me
because the lowest common denominator doesn't offer it?

(25:23):
Yeah, so you nailed one of the most important questions that people
ask. It's like, "Oh, you know, do I not get the features of the
underlying platform?" And the answer is you do, because although
the API is consistent, you can still choose the features of the
underlying provider that gets surfaced in the component itself.
Let's take an example around this. So Dapr has an API for saving

(25:46):
state as key value pairs. So you say, here's key value pairs and then
you could plug in any underlying database for that. And
there's like 30 different databases, but some of those databases
support transactions and some of them don't support transactions.
And so you can decide that when I'm using the key value API
and I have a multi transaction database and I want to do sort of
multi update writes I can, but this one, if I swap it out can't.

(26:10):
And so the API will still stay the same - here's a set of key value
pairs I want to store - but some databases can do a transaction,
other ones can't. And so you do get all the behaviors. Or, for
example, another common one, more so, is with the Pub/Sub message
providers. Some of them provide sort of different capabilities
on their consumer groups or they might provide their own retry

(26:33):
mechanisms. And so all the behaviors of the SDK itself for the
particular infrastructure do get surfaced up in a YAML description
in a component. But the interface itself, I should say the
API itself is very clean. For Pub/Sub it's just publish and subscribe
- literally, that's all it is. But you can choose the behaviors

(26:54):
in a YAML format for the underlying message broker that you
want. So for example, Kafka has its own retry mechanism and you
can choose that if you want, or you can choose its own security
authentication mechanism, things like this. So you do get those
behaviors of the underlying component to choose from as well.
And people often submit updates to the components for a particular

(27:16):
SDK just to surface it in the component YAML to use. So to answer
your question, yes, you can use a component... the underlying
behaviors of the component or infrastructure.
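The "same API, different capabilities underneath" point can be sketched against the sidecar's HTTP state endpoints. The Python below builds a plain key value save and a multi-item transaction. It assumes Dapr's default HTTP port 3500; the store name "statestore" and the keys are hypothetical, and whether the transaction call succeeds depends on the database configured behind the component.

```python
import json
from typing import Any, Dict, Tuple

# Assumption: the sidecar listens on Dapr's default HTTP port.
DAPR_HTTP_PORT = 3500


def build_save_state(store_name: str, items: Dict[str, Any]) -> Tuple[str, str]:
    """URL and JSON body for saving key value pairs to a state store component."""
    url = f"http://localhost:{DAPR_HTTP_PORT}/v1.0/state/{store_name}"
    body = json.dumps([{"key": k, "value": v} for k, v in items.items()])
    return url, body


def build_transaction(store_name: str, items: Dict[str, Any]) -> Tuple[str, str]:
    """URL and JSON body for writing several keys as one transaction.

    Only state stores that support transactions can honor this call.
    """
    url = f"http://localhost:{DAPR_HTTP_PORT}/v1.0/state/{store_name}/transaction"
    operations = [
        {"operation": "upsert", "request": {"key": k, "value": v}}
        for k, v in items.items()
    ]
    return url, json.dumps({"operations": operations})


# "statestore" and the keys are hypothetical names for this example.
save_url, save_body = build_save_state("statestore", {"order-1": {"status": "paid"}})
tx_url, tx_body = build_transaction("statestore", {"order-1": "paid", "invoice-1": "sent"})
print(save_url)
print(tx_url)
```

Both calls name only the component; swapping the database behind "statestore" leaves this code unchanged, and only the availability of the transactional path differs.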
Yeah, that's one of the things where you'll look at one service and you're like, they spent a lot more time
working on this than the other team did. And S3 is one of those where it's just like you go,

(27:39):
moving the data is easy, but then you get over there and
you're like, "Ah, multipart"... ouch.
Yes. So you do get that if you do choose between moving from one
message broker to another, for example. There is a forced choice
of like, okay, you might have had a feature in one that isn't inherent
in another. And so you'll lose it or gain it, whatever.

(28:01):
Yeah, but my code doesn't change.
But your code doesn't change. Yes.
Which is still... that's the lock-in hard part.
Yes.
Okay, so I think the other hard part is. So Dapr is definitely
not like, "Hey, I'm a startup, I don't have code yet, let's go to
Dapr." Like I'm going to Heroku or whatever first, right?
So you're pitching teams that have debt, code, just tons of stuff
deployed. Like what does their 0 to 1 look like with Dapr? Like

(28:23):
what are the baby steps to, like, how do we introduce this to
an environment? And how do we start using it without overhauling
all of our services to start getting the benefits?
Yeah, exactly. I mean, just to be clear here, Dapr isn't just for
new greenfield applications. In fact, most commonly it's used
in the modernization or migration of existing applications.
And so I mean generally Dapr itself... I think the bigger challenge

(28:47):
more than anything is that Dapr, the open source project itself,
mostly is run from a production point on top of Kubernetes.
And so just generally Kubernetes is hard for startups as
a whole, I would say. But we also get people that just run Dapr
on VMs - you can just do that. But you know, Kubernetes is still
more of the challenging environment itself, but there's a
very easy integration to it in that you deploy a control plane service

(29:12):
that does all of the management and launching of the Dapr
sidecar for you. So actually from a Kubernetes perspective, it's
very easy to deploy, and run, and upgrade, and manage. But you
know, it more depends on are you prepared to take on Kubernetes
as your underlying hosting system to host and run your code.
But to get started, there's some great getting started guides.

(29:34):
It's very easy to install Dapr locally on your local machine. There's
just a CLI, you can deploy it all locally. You can just do dapr
init and it sets up a development environment for you with
Redis and Zipkin and a bunch of other local services. You can
just run all your services locally. It sets up Redis as a local
Pub/Sub and state store for you - just run and test it all out.

(29:56):
And then what really happens is you then deploy Dapr as a control
plane service into your Kubernetes environment and then really
switch out these components for the environment you want there,
and you're off to the races. So it's pretty easy to set up and
manage itself. I say the greater challenge is a Kubernetes
environment around these things.
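To make the local setup concrete, here is a hedged sketch of publishing a message through a locally running sidecar using Dapr's HTTP publish endpoint (/v1.0/publish/<pubsub-name>/<topic>). It assumes the sidecar listens on port 3500 unless DAPR_HTTP_PORT is set, and that the Redis-backed component created by dapr init is named pubsub; check your own component files before relying on either. The orders topic and the build_publish_request helper are hypothetical. The request is only built here, not sent.

```python
import json
import os
import urllib.request


def build_publish_request(pubsub_name: str, topic: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST to the sidecar's publish endpoint."""
    # Assumption: 3500 is used as the fallback port when DAPR_HTTP_PORT is unset.
    port = os.environ.get("DAPR_HTTP_PORT", "3500")
    url = f"http://localhost:{port}/v1.0/publish/{pubsub_name}/{topic}"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumption: "pubsub" is the component name in the local environment.
request = build_publish_request("pubsub", "orders", {"id": "A1"})

# Sending only works with a sidecar running, so it is left commented out:
# urllib.request.urlopen(request)
```

Because the application only names the component, moving from the local Redis component to a different broker in Kubernetes is a component configuration change rather than a change to this code.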

(30:16):
Well, so for teams that are already like... let's say they're
running Kubernetes, I mean, I'm sure there's plenty of like getting
started use cases, but like, do you typically see teams like,
"Hey, we're going to start using Dapr and we have a new service
that we're building, so we'll build it using Dapr."? Or do you
see them saying, "Hey, you know what? Like we know that we want
to move off of Kafka to RabbitMQ or something else, and so we're

(30:37):
going to use Dapr to make it where we don't have to go through
like the code thrashing a second time."?
I would generally say, it would be my opinion that you should
not build anything on Kubernetes as an application without
Dapr. I would make that claim. If you were trying to do it yourself...
Ooh baby. Hard take.
Yeah, I think it solves so many more of the problems that you

(30:58):
will just end up reinventing the wheel. And there was a good case
study actually came out just a few days ago from a company called
DataGalaxy, and they were modernizing an existing... they called
it their own spaghetti code thing. And so they kind of went to
a more... still a single binary, but it was a modular architecture
inside all that. So they didn't necessarily break it up into
lots and lots of different pieces of code, but they went from

(31:19):
a spaghetti architecture to modular architecture using Dapr Pub/Sub
messaging. And they did introduce another new service that
they were sending messages to. So if you look up the DataGalaxy
case study, I think that's a great example of how they modernized
existing code and split it and started using Dapr Pub/Sub APIs.
Introduced Dapr into a new piece of code that they wrote that

(31:41):
was receiving the messages, because it was effectively going
off and doing some remote processing and then sending a message
back. I would strongly recommend that anyone who's starting
to build on Kubernetes should start to use Dapr from the get-go because it just really solves
a whole bunch of problems. Not just in terms of behavior, but in

(32:03):
terms of these cross-cutting concerns I talked about.
I mean honestly I feel like... that quote's
going in the video preview. We're going to get views on this
one. But I feel like also, I mean going back to that good, well-tested hexagonal code, there's so much pain to
boilerplate the adapters around every single cloud

(32:25):
service you use. And I feel like as soon as you start getting off of a PaaS and into the cloud, that proliferation of services can explode
quickly.
Exactly right. Here's another thing that we encounter a lot - there's
a lot of frameworks out there. And I will point particularly to
Spring Boot as one who would also say, "Yeah, we do a lot of the
things that Dapr does" and it's true, they do. You know, they

(32:46):
have discoverability, and they have messaging, and they have abstractions
over Pub/Sub and things like this. And, you know, even the Spring
Boot community is very nice. But that's just the Spring community.
And what we've seen is that most of the development nowadays
has a mixture of languages. In fact, you'll be amazed of how much
Python and JavaScript now is coming in to existing applications.

(33:07):
And so Dapr also plays across different language boundaries where
the Python people or Python developers can work nicely with the
existing Java developers and, you know, send messages between them
all. So we see a lot of that happening as well.
Yeah, and that's the hard part. I feel like, especially bigger companies,
they're buying other organizations and you have no idea

(33:29):
what language and cloud just gets introduced overnight. So tell
me more about Diagrid. Let's say I'm using Dapr today. This is
great. I love it. My Devs love it. It's super easy for us to swap
out underlying layers. Their code's not changing. I'm having a

(33:49):
great time. When do I start looking at Diagrid as an enterprise
version of this?
Great question. So just to be clear, by the way, Dapr is part of
the Cloud Native Computing Foundation (CNCF).
Hell yeah.
It's a graduated project there, which is the highest level
of endorsement that you get, which means that they did all their
due diligence around it all. And actually, myself and my co-founder,

(34:12):
Yaron Schneider, we were the ones who actually started the Dapr
project when we were at Microsoft and left to form Diagrid.
And today at Diagrid, we are actually the primary maintainers
of the Dapr open source project, along with lots of other
companies - Microsoft and Nvidia and Alibaba and Intel all
contribute to the project. There's a number of contributing

(34:33):
companies to the open source project, but what we do at Diagrid
is that we sort of provide two things. First, we provide Diagrid
Enterprise, which is effectively enterprise support for
organizations who've adopted Dapr but need sort of the core skills
and things to be able to help not only with architecture guidance,
but, you know, incident support and fixing things upstream.

(34:55):
So we provide support. We also provide a very cool tool called Diagrid
Conductor that allows you to manage Dapr on top of Kubernetes.
It allows you to do rollouts of new versions of Dapr Control Plane.
It gives you great visualizations. It gives you this
advisor tool that looks across all your components and your infrastructure

(35:16):
and finds things that you may have misconfigured. Effectively it
just makes an awesome experience for when you manage Dapr
in a Kubernetes environment to kind of get all of this incredible
data that's coming out of Dapr in terms of metrics, visualization,
and behaviors, and just visualize it all and manage it. Then
we also provide our own version of Dapr Distribution which

(35:39):
is just a more secure version of this that has some other features
inside it all. But yeah, most of our engagement and efforts is
kind of making Dapr Upstream an amazing product to use. So that's
what we do there. And then where we transitioned to recently...
which also is about taking people down the journey... is that
we've created a server product for Dapr which allows you to, rather

(36:02):
than take the open source one and host and run it on Kubernetes yourself
and deploy Dapr on Kubernetes, our server product is called Diagrid
Catalyst and you can deploy this into your environment. And now
you can run and deploy your application on any form of compute
and call into Diagrid Catalyst as a Dapr server and use all the

(36:22):
APIs there without you having to manage and upgrade and deploy
Dapr itself, because we take care of all that for you. When you
really go down your journey, this really helps you kind of use
Dapr in a multitude of environments rather than just being
bound to Kubernetes.
Yeah. So then I can run VMs, or if I've got like a Nomad cluster

(36:45):
I can schedule work there.
Exactly, yes. And then that sort of thing. I also think that
one of the most important APIs that we introduced into Dapr in the
last release was a workflow engine. Are you familiar with workflow
engines?
Baby, I am a workflow engine. Sorry, yeah. I mean there's a lot

(37:05):
of different ones. There's like CI workflows, ML workflow engines,
like the general purpose ones, like the Argos of the world.
Yeah. Well, I mean here this is kind of like a business activity
workflow engine. So if you're looking at the likes of Camunda and
Airflow and Temporal... you know, if you love those, Dapr has
a workflow engine built into it all that you write your business
logic into. So it goes, "Call this service here, send a message

(37:29):
to this person here, check that this order is in process." Imagine
that you've got a hundred-step process. Think durable
execution. Durable execution is probably the most important thing in an application, because irrespective of the failure of the underlying
infrastructure or the hardware, durable execution guarantees that your application and its code and the state

(37:53):
machine will complete, or the series
of steps will complete. So if you're on step 50 of 100 and it dies because the machine died, you don't want
to have to restart the whole thing again and you don't want to go through all
50 steps again. And how do you track that? It's a very, very hard problem. Durable execution saves
all the state and all of the previous state of where

(38:14):
it was and recovers it and carries on. So workflow in business
is critically important, and everyone is picking it up for coordination.
So with Dapr's, you write it in code yourself rather than going...
it's a code-first model, and you can sort of easily debug it then and things like that.
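A minimal sketch of the durable execution idea described above: save progress after every step so that a restart resumes where the run died instead of starting over. This is a deliberately simplified illustration and is not how Dapr Workflow is implemented; run_durably, make_step, and the JSON checkpoint file are hypothetical, and the crash is simulated with an exception.

```python
import json
import os
import tempfile
from typing import Callable, List


def run_durably(steps: List[Callable[[], str]], checkpoint_path: str) -> List[str]:
    """Run steps in order, persisting results so a rerun skips finished steps."""
    results: List[str] = []
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            results = json.load(f)  # results of steps completed before the crash
    for step in steps[len(results):]:
        results.append(step())
        with open(checkpoint_path, "w") as f:
            json.dump(results, f)  # checkpoint after every completed step
    return results


calls: List[int] = []          # records which steps actually executed
crash_once = {"armed": True}   # makes step 3 fail on its first attempt only


def make_step(n: int) -> Callable[[], str]:
    def step() -> str:
        if n == 3 and crash_once["armed"]:
            crash_once["armed"] = False
            raise RuntimeError("machine died")
        calls.append(n)
        return f"step-{n}-done"
    return step


steps = [make_step(n) for n in range(1, 6)]
path = os.path.join(tempfile.mkdtemp(), "workflow.json")

try:
    run_durably(steps, path)   # first attempt dies at step 3
except RuntimeError:
    pass

final = run_durably(steps, path)  # second attempt resumes at step 3
```

In this sketch, steps 1 and 2 run exactly once even though the workflow was started twice, which is the property being described: completed work is not repeated after a failure.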

(38:38):
Yeah, so like inventory systems, logistics systems... this
is a good fit here.
Exactly, yes. And being code first means that you can write it
in Python or Java or C# or Go or any one of these languages.
And you can set breakpoints on it. And at its simplest level, you
have a workflow and have a set of activities - I do activity one,

(38:58):
activity two, I might do activities in parallel and when they've
all completed come back. So there's a number of different sort
of workflow or state machine patterns in the business process
that it enables. And it's very, very powerful when combined
with the other Dapr APIs like Pub/Sub and Secrets and State and
things like this.
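The two patterns mentioned here, running activities in sequence and then running several in parallel and waiting for all of them, can be sketched with the standard library. This is plain Python, not the Dapr Workflow SDK; the activity functions (reserve_stock, charge, notify) and the order shape are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def reserve_stock(order: dict) -> dict:
    """Activity one (hypothetical)."""
    return {**order, "reserved": True}


def charge(order: dict) -> dict:
    """Activity two (hypothetical)."""
    return {**order, "charged": True}


def notify(channel: str, order: dict) -> str:
    """An activity that can run independently for each channel."""
    return f"{channel}:{order['id']}"


def order_workflow(order: dict) -> Tuple[dict, List[str]]:
    # Chaining: activity two starts only after activity one has finished.
    order = reserve_stock(order)
    order = charge(order)

    # Fan-out/fan-in: run the notifications in parallel and come back
    # only when all of them have completed.
    channels = ["email", "sms", "warehouse"]
    with ThreadPoolExecutor() as pool:
        receipts = list(pool.map(lambda c: notify(c, order), channels))
    return order, receipts


result_order, result_receipts = order_workflow({"id": "A1"})
```

Because the workflow is ordinary code, a breakpoint can be set on any line of order_workflow, which is the debugging benefit of the code-first approach described above.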
Yeah, I feel like one of the use cases that I'm kind of hearing

(39:19):
that I think I'll get a lot of value out of Dapr - and I'm definitely
going to be playing with it in the next couple of days - I see a
lot of stuff in our code base where I'm like, this would have made
moving so much easier. The question is, is there an Erlang and
Elixir version? So that's what I'll have to... well, we have stuff
in Go as well. I feel like one of the things that can be difficult
for a developer who's working in the cloud is like, I've gotten
(39:43):
to the point where I'm like, "Oh, you know what? I realize that
I need event services for this or I need Blob Store and my app right
now just has Postgres... 13, 14, whatever. I need to add this
new thing." But like, I don't know, just looking at AWS, like how many
message services do they offer? There's a lot depending on
how you decide you want to cut that thing up. Right? And so like,
(40:04):
I'll sometimes spend some time, like, "Let me see if I can
just do this with like SNS and SQS." It's like, "Okay, does it have
the properties that I need?" Or maybe I had to go over and look
at MSK, right? And I'm just trying to figure out like, which
one of these services is going to give me the properties that I
need so I can build my business logic. And I feel like Dapr,
if I'm building around this SDK, that makes the code part a lot

(40:25):
easier. And now I'm just kind of switching out the config like,
"Hey, let's see. Oh, you know what? This doesn't support first
in, first out." Like if I need first in, first out for whatever
reason in this app, that one's just gone. I don't have to rewrite
my code. I just swap the config to something that supports
FIFO, right? I feel like a place where this might be an amazing
use case to introduce in some greenfields for probably some pretty

(40:48):
big companies is just around all the stuff that's happening in
AI. Internally, we've built our own stuff on SageMaker, we've
used some MCP stuff, we've used some Claude things, and we're
trying to figure out which one's the right thing for it. And
it's like every time we change what we're using, it's like, "Man,
it's reflecting that in code." Is Dapr doing anything in the LLM

(41:08):
abstractions, like MCP abstractions?
Oh, yeah, yeah. I mean, we're big in this space and I think that
there are two kind of very key trends that are happening. So one
is just the integration of LLMs into your code. And here's what
I call an LLM. So we introduced, in the last release,
a conversation API. And the conversation API is literally a prompt

(41:30):
API - here's a prompt, and then you can plug in the different
components for any one of the underlying language models. So you
can, you know, plug in OpenAI or Hugging Face or Anthropic or DeepSeek
or whatever else you want to behind it all. I can just call the
Conversation API, call a prompt, and just like we've talked
about swapping out message brokers, I can swap out underlying

(41:52):
LLM API calls and just switch them all out without having to change
my code. Plus it provides additional capabilities on that.
For example, it does obfuscation of any data that comes
back and it actually does prompt caching generically across
all of them, even if they don't have prompt caching. So it
adds those sort of features. So you could do that, just add that.
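A rough sketch of the shape being described: one prompt call in front of interchangeable model providers, with caching and obfuscation of returned data added generically in the wrapper. This is not the Dapr Conversation API. The Conversation class, the two fake providers, and the email-only redaction rule are all hypothetical stand-ins chosen so the example runs without any network access.

```python
import re
from typing import Callable, Dict, Tuple

Provider = Callable[[str], str]

# Hypothetical redaction rule: scrub anything that looks like an email address.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def fake_provider_a(prompt: str) -> str:
    """Stand-in for one model provider."""
    return f"[A] reply to: {prompt}"


def fake_provider_b(prompt: str) -> str:
    """Stand-in for another provider whose reply leaks an email address."""
    return f"[B] reply to: {prompt}; contact bob@example.com"


class Conversation:
    """One prompt() call in front of whichever provider is configured."""

    def __init__(self, providers: Dict[str, Provider], active: str) -> None:
        self.providers = providers
        self.active = active
        self.cache: Dict[Tuple[str, str], str] = {}
        self.provider_calls = 0

    def prompt(self, text: str) -> str:
        key = (self.active, text)
        if key not in self.cache:
            # Cache miss: call the provider, then scrub the reply once.
            self.provider_calls += 1
            raw = self.providers[self.active](text)
            self.cache[key] = EMAIL.sub("<redacted>", raw)
        return self.cache[key]


conversation = Conversation({"a": fake_provider_a, "b": fake_provider_b}, active="a")
first = conversation.prompt("summarize today's orders")
again = conversation.prompt("summarize today's orders")  # served from the cache
```

Changing conversation.active to "b" swaps the provider without touching the calling code, and the caching and redaction apply to both providers because they live in the wrapper rather than in either provider.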
I think the more interesting and exciting thing that's happening

(42:14):
is this rise of agents. And I don't know how much you're getting
in and hearing talk about agents generally, but, you know,
agents now is where effectively we're trying to produce
and automate human processes a lot more with language models to
do those things on their behalf. So we've always done lots
of automation, we've liked to automate things and we have very

(42:35):
deterministic automation. But now the automation of business processes
using language models to make decisions is kind of becoming a hot
thing around all this. Let me give you an example. I was talking
to a logistics company a couple of weeks ago and they have
a warehouse manager that sits in a warehouse and his job is literally

(42:56):
to go and query a database every hour and look at what orders
haven't been sent and what needs to be sent for the rest of
the day. He's doing a bunch of queries and looking these things
up. And they're like, "Well, it would be so much easier if he
could just have a little agent on his behalf who was like just doing
all these things." Like, "What orders are critical or what do I
need to do next because this order hasn't gone?", "You need to

(43:17):
push this one to the top of the queue because if it doesn't go
out tomorrow, they're going to be pissed." And so they wanted to
build a little agent that was basically a warehouse manager that's
looking at all the orders and looking at their database and all
these things. Working on their behalf with the language model, making
those decisions and basically taking hold of that process or helping
this warehouse manager. That's just one example. The way I look

(43:41):
at this is that all this talk about agents is really a Distributed
Systems problem with LLMs. It's Distributed Systems, but everyone's
made up lots of new names like "memory" and "agentic systems" and
things like this. Dapr is very well grounded in key distributed
systems principles like workflow and messaging and state

(44:03):
management. And so back a couple of months ago, we introduced
this framework in the Python SDK called Dapr Agents. And it really
helps you build agents that are stateful, long-running and durable
because it's built on workflow. That they communicate really
well with Pub/Sub messaging. And they very much are sort of, I
would say, enterprise ready. To a large degree, I think a lot

(44:26):
of the agent systems or frameworks out there today are still
very immature in their enterprise readiness in terms of
being durable and recoverable. And most of them are sort of like
- as soon as the thing fails, I have no idea what I did before
and I've got to start again. So yeah, I see the rise of this agents
thing as part of the continuation of distributed computing

(44:49):
with models being the sort of decision makers amongst us all. And
we're going to be plugging these things everywhere. But you
know, the Dapr Agents introduction that builds on the core
APIs of Dapr is an amazing piece of technology to help you build
these sort of long-running, durable, stateful agents.
That is very cool. Well, I know we're at time. I really appreciate

(45:09):
you coming on. This sounds super cool. I'd say if you're a Developer
and you're listening, and you're like, "This sounds pretty
awesome. Like, how can I start using it?" It's a CLI. You can just
drop the CLI locally on your machine and like start playing through
your tests. Like, let's try using the Dapr SDK. If you're an
Ops person, sounds like if this is exciting to you - Platform
Ops, DevOps person - like, it might be good to find an engineer

(45:30):
on your team that you know is like struggling with a cloud service
that may need to be swapped and like, that might be a good getting
started for you.
Exactly, yes. Yeah. Super easy to deploy, try out, test. Yeah, exactly.
You summed it up beautifully. Ops people should care about it because
it makes their lives easy with the application developers. Application
developers should care about it because it takes away all the

(45:50):
boilerplate code.
Yeah. And Ops people, we know... we all know that you have
it the hardest.
Yeah, you do.
We know that you don't have enough time. You're outnumbered.
You need somebody on your side. Dapr's gonna put you in a top
hat and a tie, it's gonna make you all professional. Like, you're
gonna love it.
You're squashed in the middle there between the application developers,
who get all the demands, and the infrastructure below you, and

(46:11):
you're like, "I'm in the middle here. Help me."
And the CFO's mad at you because the cloud's expensive. Like,
everybody's on your back. Compliance team's after you. Like,
we're running and we need help, and it sounds like Dapr is
one of those tools that can definitely get you there.
Yes.
Where can people find Dapr? Where can they find Mark? Where can
they find Diagrid?
Yeah, if you go to Dapr.io, start there and dive in. And I would

(46:32):
suggest that that's where you go and look at some of the case studies
and then sort of try out the quick starts. If you want to come
to Diagrid, it's Diagrid.io. We have a set of services there.
We talk about enterprise support for Dapr or our Catalyst
Server, which is a Dapr server around these things. Reach out to
us and feel free to send me an email anytime at mark@diagrid.io.

(46:53):
Happy to hear from you. Happy to hear about what you're interested
in. Happy to talk about Dapr being used in any one of your solutions.
I just love talking to developers and platform engineers
and just hearing their stories. So please reach out.
Yeah. And that Catalyst project actually sounds really interesting.
I mean, there's so many teams that are stuck like, hybrid. Like,
they've started moving things to Kubernetes, but they've still
got a bunch of VM workloads. That sounds like a really good fit

(47:16):
for many operations teams that are kind of straddling these two
environments today.
Exactly. Yes. I mean, Dapr is just about helping you ease your
journey for building and running these applications at scale
around these things. And the Catalyst server just eases that whole
burden.
Awesome. Well, thanks so much for coming on the show, Mark. I really
appreciate it. This was super exciting to learn about Dapr. And

(47:37):
I'm honestly... I wish I'd learned about Dapr like two and a
half years ago before I just took this like...
You need to do some re-engineering.
Oh, my gosh.
A mini Dapr.
Yeah. I mean, I could probably kill half our code base by just deleting
all the abstractions that we've built, man. I'm going to thin
this thing up. Awesome. Well, thanks so much. And yeah, hit that

(48:00):
subscribe button. Please rate and review on Spotify or whatever
podcasting platform you're using and I'll see you next time.
