All Episodes

December 3, 2024 • 48 mins


What happens when your mobile app needs to perform flawlessly across thousands of different devices? Meet Hanson Ho - Android Architect at Embrace. In this episode, Hanson shares battle-tested strategies from his experience at Twitter and beyond. Diving deep into real-world challenges, Hanson reveals how mobile observability has evolved from basic crash reporting to sophisticated performance measurement. Learn why traditional monitoring falls short in mobile environments, how to measure what truly matters for user experience, and why the industry is rallying around OpenTelemetry as a standard.



Episode Transcript

Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Hanson (00:00):
At Twitter, we had to build an entire protocol based on the infrastructure that we have to measure these client performance things. We called it PCT, performance client tracing. Very similar to what I'm working on right now, which is OpenTelemetry's tracing API, creating spans.

(00:22):
With that data, the ability on the SDK to measure this, you can start looking at your different workflows and how long they took, and you can see a wide range depending on devices, depending on what you're actually trying to do.

William (00:44):
Coming to you from the colossal Cloud Gambit studio, this is your host, William. Today on the pod we have Hanson Ho, and you're in Western Canada, right? I'm in Vancouver, Canada. Awesome.

(01:07):
Have you ever heard of a sport called?

Hanson (01:08):
Uh, hockey? Is that something you've ever heard of? I might have, you know, seen it once or twice in my life. Uh, you know, it's pretty obscure, but, uh, yeah, yeah, a little bit. Awesome.

William (01:18):
Are you a Vancouver fan, by any chance?

Hanson (01:19):
I am, of course, naturally. A sad, sad Canucks fan, the, uh, franchise with, oh, 50-plus years of not winning the Stanley Cup, despite getting close twice in my lifetime. Um, but you know, what is sports if not for the elation of the victory and the struggle to get there?

(01:41):
It's the journey, not the process. Or not the destination, that's what I tell myself, right? So if we win it, it'll just be less good, because, you know, the anticipation will be less. That's what we all say to cope.

William (01:53):
Vancouver's got such a good young core now. Quinn Hughes is an awesome captain, an awesome defenseman, and that Oilers series last year was crazy.

Hanson (02:06):
We played out of our skis, despite not having our starting goaltender. You know, Thatcher Demko is excellent, excellent. We threw in a guy, Silovs, who had played, oh God, 10 games, 15 games in the regular season, ever. We did pretty okay, despite playing, you know, the best hockey player ever in terms of physical ability and skill.

William (02:26):
We did a good accounting of ourselves. Yeah, he was a skating highlight reel in that series. Some of the stuff he was doing was wild. I could actually talk about this all day. I better stop myself now before we get too lost. But yeah, really glad to have you on. You have such an interesting background and such an interesting field that you

(02:49):
work in. It's a field that I've actually talked to a few folks on the pod about, OpenTelemetry, and your background. You've worked for some pretty big companies that folks out there may have heard of: Salesforce, Twitter, among others. Do you want to start us off and just kind of go through your

(03:12):
background and sort of how you got into mobile app development specifically? It doesn't have to be super detailed, just a brief overview.

Hanson (03:21):
Sure. I think 2015 is when I first got into mobile development. At that time, Twitter was hiring furiously, especially to improve its services on the mobile clients, especially for folks who are in emerging markets, with devices that are of low quality and networks that are, shall we say, iffy at best.

(03:41):
The team that I was working on was tasked with improving the experience of folks like that. So that was the first time I started working on Android, and the first time I started working on, you know, anything with a lot of network involved, and also device quality.

(04:01):
I'd used an iPhone previously, but then hopped onto the Android ecosystem and found it fascinating, just because of the diversity. The problems that we have to solve are a lot more challenging, just because of the number of manufacturers of devices and the quality of some of those devices.

(04:22):
I mean, they're meant to be affordable, and you have to do certain things to make devices affordable, especially back in 2015. So I learned how to optimize performance and do network experimentation at lower levels. What are the impacts if we change the concurrent requests to four, if we change the window size?

(04:46):
Doing things like that that mobile software developers aren't naturally inclined to do, tweaking things under the hood and then seeing what the differences are. And from that point on, I found a passion for stability and performance on Android, and it let me become somewhat of an expert within the company about performance and things like that.

(05:08):
And one thing I got especially interested in is how do you measure performance and have it relate to actual impact on users. So, you know, P50 changing. What does that actually mean to users?

(05:28):
And to develop a more nuanced understanding of performance, we had to experiment and we had to gather data. And, you know, things happened at Twitter, and then I became, you know, a free agent, and I hooked up with Embrace, whose, you know, purpose is to help folks build better mobile apps.

(05:49):
So adopting OpenTelemetry as part of the SDK, and using that as a vendor-neutral, agnostic transport and format for telemetry, is an excellent fit. And where I am now is trying to help improve the SDK as

(06:14):
well as OpenTelemetry and mobile telemetry in general, and standardize it. It's an area where, you know, when folks use an application these days, it tends to be on a mobile phone, whether it's an app or a website. And unfortunately, mobile devices aren't the most trustworthy.

(06:34):
You know, the things I talked about previously, with emerging markets and Android devices, still apply. Not everybody's got a fast phone, and now the problem is just a lot worse, because we have a lot more Android versions and we have higher expectations as well. So how do we ensure that performance is up to speed? Well, you've got to measure, and you've got to do it in such a way

(06:56):
that you're aware of the nuances of the mobile environment. We don't know when things happen unless the device tells us, and sometimes devices just don't tell us, because they've gone offline, or the app process gets killed by the OS because something in the background is using too much memory. So it becomes a really fascinating challenge to not

(07:20):
only capture the data and send the data, but do it as economically as possible. So, yeah, I take it as a never-ending improvement of how we can best do this.

William (07:37):
Yeah, it's funny bringing up the Android
ecosystem.
I was talking to an Applemobile developer, not on the pod
, but just somebody that I knowand he was.
I was.
I was kind of giving him.
So he has iphone and I have agoogle pixel and he just he
always just throws something inthere like we could be going out

(07:57):
getting something to eat, itdoesn't matter, he will make.
He like spares no moment tojust give me flack for having an
Android and I.
So I give it back to him alittle bit.
But he was sort of venting aboutsort of what you said.
He's like I could never work inthis.
You know this, the most complexecosystem in the world, like
Android, when Apple, we justhave this many OS versions, we

(08:18):
have this many phone models andthat's it.
Everything's locked down.
It's very consistent.
But you know, somebody couldfork.
You know android and do thisand that, and then there's, like
you know, a gazillion differentdevices everywhere, you know.
And he he was talking about howI guess the his from his view

(08:38):
like how would you even begin ifyou were going to build a new
app?
How do you even build astrategy and and think through
and profile how you're going tobuild a new app, how do you even
build a strategy and and thinkthrough and profile how you're
going to go through the, youknow, app development process,
how you're going to beginputting things together?
Um, what, what do you thinkabout that?

Hanson (08:54):
Uh, strategically, from the beginning, um, thinking through the Android way of doing things. So the thing with Android is, the platform itself has improved quite a bit since I started working on it: the OS itself, the API on top of it, and also the Android platform and SDKs that Google offers, in

(09:15):
terms of what you need to do in order to just have an app up and running. So higher-level frameworks for UI, like Compose. You have various methodologies and patterns to create apps that aren't as free-for-all as before. You have libraries that are either official, supported by

(09:39):
Google, or de facto official, like OkHttp, where literally, not literally, but almost everyone uses it. So the fragmentation has gone down and the standardization has gone up, and the things you need to do to make your app good and usable for most cases are, you know, a lot easier than they were,

(10:00):
you know, nine years ago. The problem is that the long tail is quite long. We have mobile devices that were released nine, ten years ago that are still in use, using Android OSs that are still supported. So Embrace supports back to Android 5, which was released,

(10:21):
I want to say, oh God, I'll get this wrong, 2014, 2015, something like that, maybe 2016. No, probably 2015 at the earliest. But you have APIs that are different. You have built-in libraries that behave differently. TLS 1.3 isn't even supported on older Android versions.

(10:44):
So the app has to install additional libraries in order to get the proper networking code, in order for it to talk to modern servers that have rejected insecure protocols. So you have these degenerate use cases that become difficult

(11:08):
to support. And if you're just starting off, don't worry about all that. Just have a higher version of Android to support, and your APIs are a lot smoother. You're dropping 20% of your total addressable market, but you can get started a lot easier. So my suggestion is, don't make it work for everybody first.

(11:28):
Just make it work for most people, and Android does a pretty good job of letting you work for most people. Now, big companies, companies with large user bases, especially user bases that are perhaps not super savvy, they're just a regular big-box retailer that caters to, you know, folks of all ages and all, you know, technical, you know, astuteness levels.

(11:52):
Well, they might have a phone that their grandkid gave to them from eight years ago, and they still want to use, you know, the app, or your app. And if you don't support, you know, the particular version that's older, well, they're not going to be able to use it. So it is important to kind of be aware, after you've got things working for 80%, to improve your experience for, you know, the

(12:18):
other 20% that's out there. And the faster you make your app, the more of those older OSs, older devices, worse-performing devices become usable to you. So you effectively increase your total addressable market. You can have these older phones be able to use your app.

(12:38):
So I would progressively add, as you can. Mobile is unfortunately an unending pit of problems and crashes and interesting bugs that happen. So you can't ever fix everything. You just have to kind of prioritize and triage. It's not about not dropping any balls. It's about dropping the smallest balls and the least

(12:59):
important balls.

William (13:00):
I love that. That's so good, and it's so true. A good, honest answer, that's great. Um, well, I don't want to get ahead of myself. I was about to jump into telemetry and optimization, but I guess kind of going back, you know, just for the audience. So, um, you're an Android architect.

(13:23):
What, uh, let's say, what language do software engineers typically use when building a mobile app for Android? Is it like C++ or something else? What's sort of the standard?

Hanson (13:35):
Kotlin in 2024 is the standard.
There are mobile developers outthere who do Android, who don't
even know Java, which would beunthinkable 10 years ago
Impossible, actually 10 yearsago.
But these days Kotlin is, youknow, for your iOS, other iOS

(13:56):
folks you know.
Maybe they're familiar withSwift versus Objective-C.
It is similar to, I guess, theKotlin versus Java difference.
You can do native libraries inC++ as well.
A lot of graphics, a lot ofheavy intensive work is done
typically in the native layerlike that.

(14:17):
But that's not most of theexperiences of Android
developers.
It's just usually Kotlin usingCompose to write their apps.

William (14:27):
William (14:27):
Gotcha, awesome. And so, okay, you begin to build an application, and of course, kind of like what you were talking about earlier, you can't really understand, quantify, or calculate much unless you measure it. And when it comes to

(14:48):
measuring things, that means, you know, testing. Testing is like the hallmark of amazing development in general. Can you speak to some of the, you know, thought processes and practices that you might use as you begin to think through, profile, and optimize? Like, what are you thinking about when you're writing tests

(15:10):
toward the beginning, and how are you thinking about measuring things, just from, like, a first-principles point of view?

Hanson (15:18):
So first of all, you have to know your app works. And, you know, as much as unit tests and integration tests are useful, they capture only the scenarios that you can test for, for the devices you can test for, for the cases that you think are most important. First of all, having automated unit tests and integration tests

(15:41):
is table stakes. If you don't have that, your app is not going to stay high quality forever, even if you have the best developers in the world. So, you know, the first thing is to make sure the things you check in don't cause a regression, and on platforms that have different APIs, as Android has, it's easy to cause one.

(16:01):
So locking that down is important, you know. Testing, you know, clearly, for edge cases, that's important too. But beyond that, having telemetry of what your apps are doing in production is equally important. So traditionally for mobile apps, people track things like crashes. You know, it's always crashes, crashes, crashes, crashes,

(16:22):
crashes, crashes. What's my crash rate? What's my crash-free rate, all that stuff.
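For readers who haven't worked with these numbers, the crash-free rate Hanson mentions is just the share of sessions (or users) that ended without a crash. A minimal sketch, with illustrative names and figures rather than any vendor's API:

```java
// Sketch: computing a crash-free session rate, the headline mobile
// stability metric. Names and numbers are illustrative only.
public class CrashFreeRate {
    /** Fraction of sessions that did not crash, e.g. 0.9975 = 99.75%. */
    public static double crashFreeRate(long totalSessions, long crashedSessions) {
        if (totalSessions <= 0) {
            throw new IllegalArgumentException("totalSessions must be positive");
        }
        return 1.0 - ((double) crashedSessions / totalSessions);
    }

    public static void main(String[] args) {
        // 100,000 sessions, 250 of them crashed -> 99.75% crash-free
        System.out.println(crashFreeRate(100_000, 250)); // 0.9975
    }
}
```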
And the reason folks do that is because that's the easiest thing to track. Crashes are a discrete event. You have SDKs like Crashlytics that will basically do this for you: capture what's bad, tell you when bad things happen, and then

(16:44):
you have a sorted list of, oh, these things are the bad things that happen, and you burn through that list. And for a long time, it's been about how do we reduce the number of crashes and distinct instances of crashes. So people have rollout procedures in terms of dogfood, beta, and then you do a percentage rollout, 1% for 24

(17:06):
hours, and you ramp it up to 100% if nothing bad happens, at each stage checking to see if there are new crashes, and if there are old crashes that have gone higher because something has changed. And when there are, you search for the right team to own it, and then you fix it and you patch it and you re-release. Typically, that's the workflow of mobile stability, and that

(17:32):
works well for the most part, until you realize that not all bad things happening on mobile apps result in a crash. Sometimes things could just be slow. Sometimes things just don't happen the way you think they would in terms of the amount of time it takes.
So we've developed other metrics for it. On Android there's something called ANRs, Application Not Responding, which is basically

(17:53):
screen freezes. iOS has something similar. You know, we also have, like, jank measurements, you know, frame drops. Basically, when you scroll a list, you see janky UI from frames dropping. Well, that's considered your main thread being too busy to produce the new frames

(18:13):
for the UI. Those are okay. Again, people track that because that's what Google gives you. You can go to the Google Play console and see some of this information. App startup is also something that folks track. But bad performance could happen at any other stage. And how do you actually detect that?

(18:34):
Well, this is where tracing comes in, creating spans. I mean, I think backend developers are very familiar with the notion of spans and traces. You break a workflow down into effectively a tree of workflows. The top one represents the entire workflow, and then you have child spans that represent sub-workflows. If you have a network request, for instance, there's the

(18:56):
construction of the request, the sending of the request, waiting for the server to come back with the data, deserialization, and then persisting the response in the format you want.
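For mobile folks who haven't used tracing, the workflow-as-a-tree idea can be made concrete with a hand-rolled sketch, loosely modeled on what a tracing API like OpenTelemetry's gives you. All class and method names here are invented for illustration; a real SDK also handles clocks, context propagation, and export:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of a trace as a tree of timed spans. The root span
// is the whole workflow; children are its sub-workflows.
public class MiniSpan {
    final String name;
    final List<MiniSpan> children = new ArrayList<>();
    long startNanos;
    long endNanos;

    MiniSpan(String name) {
        this.name = name;
        this.startNanos = System.nanoTime();
    }

    MiniSpan startChild(String childName) {
        MiniSpan child = new MiniSpan(childName);
        children.add(child);
        return child;
    }

    void end() {
        this.endNanos = System.nanoTime();
    }

    long durationNanos() {
        return endNanos - startNanos;
    }

    public static void main(String[] args) {
        // The network-request breakdown from the episode as a span tree:
        MiniSpan request = new MiniSpan("image-load");
        MiniSpan build = request.startChild("build-request");
        build.end();
        MiniSpan send = request.startChild("send-and-await-response");
        send.end();
        MiniSpan deserialize = request.startChild("deserialize");
        deserialize.end();
        MiniSpan persist = request.startChild("persist-response");
        persist.end();
        request.end();
        System.out.println(request.name + " had " + request.children.size() + " child spans");
    }
}
```

Slicing durations by device model or OS version then becomes a matter of attaching those attributes to each exported span.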
You break that workflow down, and there is no really good platform, on both platforms, iOS and Android, to actually give you

(19:20):
an SDK, and also the data exported in a way that you can actually slice and dice, to measure your performance, arbitrary performance. Say you click a button: how long does it take for an image to load? There is no easy way to do it. So at Twitter, we had to build an entire protocol based on, you

(19:45):
know, kind of the infrastructure that we have, to measure these client performance things. We called it PCT, performance client tracing. Very similar to what I'm working on right now, which is OpenTelemetry's tracing API, creating spans. With that data, the ability on the SDK to measure this, you can

(20:10):
start looking at your different workflows and how long they actually took, and you can see a wide range of times depending on devices, depending on what you're actually trying to do.
You'd be surprised. Well, you may probably not be surprised, but app startup could differ by 10x depending on what device you're using. So, you know, if you're using a newer phone, having app startup

(20:32):
take more than a second might seem ridiculous. If you're using, you know, a Moto X from 2015, it's de rigueur for it to take eight and a half seconds. People are used to, you know, staring at their phone and waiting, and it's okay, because ultimately they're used to the experience when they start up. So, going back to what you were talking about before, what do we

(20:54):
measure in production to know that things are actually, you know, working properly for folks? Duration is great, fast and slow is great, but whether it succeeded is the most important thing. Because typically, when you track telemetry for performance, it's, well, when I succeed, I'll record how long it took.

(21:18):
Well, what if the user gave up and just closed the app? What if the app crashed in the middle? Well, your server doesn't know that. It doesn't even know that your client has initiated something. So before you track duration, track the fact that something has happened. That's the most important thing. Did the thing that the user is trying to do actually happen? And then you can build performance into it afterwards.
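The ordering Hanson describes, record the attempt first and attach duration only on success, can be sketched as a toy tracker. These names are invented, not Embrace's or OpenTelemetry's API; a real SDK would persist the attempt so it survives a crash or process kill:

```java
import java.util.HashMap;
import java.util.Map;

// Toy sketch of "track the attempt before you track the duration."
// Attempts minus completions surfaces the failures a success-only,
// server-side timer would never see: crashes, abandonment, OS kills.
public class WorkflowTracker {
    final Map<String, Integer> attempted = new HashMap<>();
    final Map<String, Integer> completed = new HashMap<>();

    void onStart(String workflow) {
        attempted.merge(workflow, 1, Integer::sum);
    }

    void onSuccess(String workflow, long durationMillis) {
        completed.merge(workflow, 1, Integer::sum);
        // duration would be recorded here, but only for successes
    }

    /** Workflows started but never finished. */
    int silentFailures(String workflow) {
        return attempted.getOrDefault(workflow, 0)
             - completed.getOrDefault(workflow, 0);
    }

    public static void main(String[] args) {
        WorkflowTracker tracker = new WorkflowTracker();
        tracker.onStart("checkout");
        tracker.onStart("checkout");
        tracker.onSuccess("checkout", 840); // one finished, one didn't
        System.out.println(tracker.silentFailures("checkout")); // 1
    }
}
```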

William (21:39):
Such a good answer and that just made me think about a
lot of other things.
I'm trying to think of how toframe this question.
So I work for a softwarecompany that basically builds
software on the cloud providersto connect different clouds,
connect on-premises sites,things like that.
We're microserviced out, we'vegot, you know, all these backend

(21:59):
APIs that you know we're really, you know, modern as far as,
like, software development'sconcerned.
But one of the things we'verecently been working on and you
know we're, you know, by thetime this is going to go out, we
would have already released it,but we have a Zero trust
network access or ztna, thatwe're building and one of the

(22:19):
things that was really importantto us.
For obvious reasons it wouldprobably take too long to go
deep, deep into.
But um, separating, basicallyseparating, like control plane
from data plane type mechanisms,um, where the actual traffic
and things are passing and doingthings.
So I guess what I'm getting atis like based on, you know,

(22:43):
technology today there's so manydifferent ways to do things and
one of the things that's reallyimportant with gathering,
telemetry, with anything to dowith where networking is like a
dependency, really is doing theright thing in the right place,
so does that?
Is that something that you'rethinking through a lot like?
Okay, are we going to do thisat the end point?

(23:05):
How are we going to do it whenwe're going to shoot stuff back?
How we're going?
You know all those differentthings.
What is the challenge there?

Hanson (23:12):
So, coming from a mobile developer background into observability, and OpenTelemetry in general, the one big difference is the operating environment. So in the backend, when you have telemetry that's being recorded, you're fairly sure that you're not going to lose

(23:34):
data, or at least not lose it in a significant way without you knowing it. You can also know that your execution environment is fairly well controlled. You're not going to have clusters that suddenly stop providing enough RAM for you to do things, or suddenly say, I'm not going to schedule your thread now because I'm busy doing something else, because GC is happening and I need to take two

(23:56):
seconds to pause everything. So switching to a mobile environment, where execution is at the whims of the OS, at the whims of the user, at the whims of even the device itself, proves challenging. So not only do you have to know that everything around you, the

(24:26):
APIs, what you're saving things to, what you're sending things out to, is hostile and may fail at any time. You also have to know that when it fails, you have to retry, and not just blindly retry, because there are circumstances where retrying is not good, because your device is on an airplane. Why retry your network connection when you know you're not going to have it? So it's building out these assurances at each gateway, of

(24:46):
capture, of persistence, of sending, of validating that you've sent data, and having knowledge that things are completed to the next step. The handoff is warm, and not just dropped off with a, yeah, see ya, I'm done. So having knowledge of your own, I guess, stability is important,

(25:09):
and also doing that with as little effect on the app as possible is equally important.
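The "assurances at each gateway" idea, capture, persist, send, confirm, and only retry when a retry can possibly succeed, might look roughly like this. All names are invented for illustration; a real SDK would also persist the queue to disk and back off between attempts:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Sketch of a telemetry delivery queue that only attempts sends when
// the device claims to be online, and keeps payloads queued until
// delivery is confirmed, so nothing is silently dropped.
public class TelemetryQueue {
    private final Deque<String> pending = new ArrayDeque<>();
    private final Supplier<Boolean> isOnline;  // e.g. a connectivity check
    private final Predicate<String> trySend;   // true once the server acked

    public TelemetryQueue(Supplier<Boolean> isOnline, Predicate<String> trySend) {
        this.isOnline = isOnline;
        this.trySend = trySend;
    }

    public void capture(String payload) {
        pending.addLast(payload); // gateway 1: captured and queued
    }

    /** Drain the queue; stop early when offline or a send fails. */
    public void flush() {
        while (!pending.isEmpty()) {
            if (!isOnline.get()) {
                return; // airplane mode: don't retry blindly, keep queued
            }
            String payload = pending.peekFirst();
            if (trySend.test(payload)) {
                pending.removeFirst(); // gateway 2: delivery confirmed
            } else {
                return; // leave queued for the next flush
            }
        }
    }

    public int pendingCount() {
        return pending.size();
    }

    public static void main(String[] args) {
        // Offline device: nothing is lost, nothing is retried.
        TelemetryQueue offline = new TelemetryQueue(() -> false, p -> true);
        offline.capture("span-1");
        offline.flush();
        System.out.println(offline.pendingCount()); // 1

        // Online device: payloads drain once the server acks.
        TelemetryQueue online = new TelemetryQueue(() -> true, p -> true);
        online.capture("span-1");
        online.flush();
        System.out.println(online.pendingCount()); // 0
    }
}
```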
You mentioned where you do stuff. Metrics are a very important thing to have, aggregation of durations. So OpenTelemetry typically does metrics aggregation where,

(25:33):
I guess, the reporting happens on the client side and the aggregation happens on the server side. So you report metrics, you can do some summation of metrics, and then have it reported, with some metadata, to the server, and it'll do all the work at the collector level. That's useful because you're expecting a high output, or high

(25:57):
throughput, on the client, so you don't want it to do any sort of aggregation on the client side, or any heavy aggregation. So OpenTelemetry for the most part says aggregation is done on the server side, on the other end of the telemetry emitter. Well, on a mobile client, we're not doing a ton of repetitive things, and if there's aggregation to be done, it's usually across

(26:19):
multiple instances of the app, different launches. So, say I want to measure my, sorry, not network, but my startup performance. That happens once every time the app starts up. So if you're aggregating locally, it's not that much data and it doesn't take that long,

(26:39):
and in fact, it would be much easier if you did this locally than if you did it on the server side. So for that, unfortunately, OpenTelemetry metrics don't work super well if you have high-cardinality dimensions. So we have to kind of work around it a little bit in terms of what we do.
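Client-side summation for a once-per-launch measurement like startup time can be as cheap as keeping a count, sum, and max locally and exporting one small summary instead of a point per launch. A sketch with invented names; OpenTelemetry's own metrics SDK structures this differently:

```java
// Sketch of cheap local aggregation for a once-per-launch measurement
// like app startup. Export one compact summary rather than raw points.
public class StartupAggregator {
    private long count;
    private long sumMillis;
    private long maxMillis;

    public void record(long startupMillis) {
        count++;
        sumMillis += startupMillis;
        maxMillis = Math.max(maxMillis, startupMillis);
    }

    public String exportSummary() {
        long avg = count == 0 ? 0 : sumMillis / count;
        return "startups=" + count + " avgMs=" + avg + " maxMs=" + maxMillis;
    }

    public static void main(String[] args) {
        StartupAggregator agg = new StartupAggregator();
        agg.record(900);   // a fast phone
        agg.record(8_500); // the 2015 Moto X case from the episode
        agg.record(1_100);
        System.out.println(agg.exportSummary());
        // startups=3 avgMs=3500 maxMs=8500
    }
}
```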
So changing from a backend execution environment to a

(26:59):
frontend execution environment requires some rethinking of these basics. Simply exporting the data as soon as the telemetry is recorded, assuming that it'll almost always get to the other side: you can't ever assume that. In fact, you could even start that request and have it fail in the middle of it. Even how we update and swap data on the client side,

(27:22):
we have to be very careful. We don't want to blow up existing payloads that are perfectly good just because we have a better one. So managing each of these key steps is super important for mobile developers to be aware of when using OpenTelemetry, which is why you would tend to want to use an SDK that has it

(27:43):
built in, so you don't have to do things like that by yourself, or do it in an ad hoc way. Have you thought about when folks background and foreground very quickly and, you know, sessions get created? Or when things are terminated by the OS, because you're in the background now, and Android, once your app is in the background,

(28:06):
can kill your app at any time without telling you. So are you saving data in a way that, you know, you don't do so much as to drain battery, but you also don't lose data because you're not caching things? So it's a tricky trade-off, certainly, to balance.
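That battery-versus-data-loss trade-off can be sketched as a buffer that writes to storage only when it fills up or when the app is backgrounded, so a surprise process kill loses at most one buffer's worth of events. Toy code with invented names; a real SDK would write to files or a database from a lifecycle callback:

```java
import java.util.ArrayList;
import java.util.List;

// Toy sketch of the battery-vs-data-loss trade-off: buffer telemetry
// in memory, and flush to "disk" only when the buffer fills or the
// app goes to the background (when the OS may kill it without warning).
public class EventBuffer {
    private final int maxBuffered;
    private final List<String> buffer = new ArrayList<>();
    private final List<String> disk = new ArrayList<>(); // stands in for storage

    public EventBuffer(int maxBuffered) {
        this.maxBuffered = maxBuffered;
    }

    public void record(String event) {
        buffer.add(event);
        if (buffer.size() >= maxBuffered) {
            flush(); // cap worst-case loss at maxBuffered events
        }
    }

    /** Called from the app's "went to background" lifecycle callback. */
    public void onBackground() {
        flush(); // the OS may kill the process at any time now
    }

    private void flush() {
        disk.addAll(buffer);
        buffer.clear();
    }

    public int persistedCount() {
        return disk.size();
    }

    public static void main(String[] args) {
        EventBuffer events = new EventBuffer(10);
        events.record("tap");
        events.record("scroll");
        events.onBackground(); // flush before a possible process kill
        System.out.println(events.persistedCount()); // 2
    }
}
```

A larger buffer means fewer writes (less battery) but more potential loss; a smaller one inverts the trade, which is exactly the balance described above.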

William (28:27):
Yeah.
So I guess getting into thereally open telemetry at this
point, you said a few thingsthere that I thought were really
um, interesting and I guessbefore I get, before we go
deeper, I guess you you sort ofuh inclined on, like not
reinventing the wheel, um, as itwere, not you know, repeating

(28:49):
yourself and and so like I guessmy question is like where does
the so?
Open telemetry is a reallyawesome project.
I follow the chats and Slackand things at the CNCF.
It's just a really bubblingcommunity, a lot of good work
going on.
But is is this how you thinkabout OpenTelemetry as a product

(29:11):
?
Is it like a baseline in whicheverybody's sort of starting out
at the same place so we're nothaving to go back and redo these
, the core things?
We want to keep the core andthe foundational things steady
with a bigger community owned bya foundation.
We want those things to alwaysbe a level playing field and
then we're coming on top of thaton that foundation and we're

(29:32):
building our products.
Is that sort of how that'slooked at?

Hanson (29:35):
Yeah, the last thing the software world needs is another
standard on that foundation andwe're building our products.
Is that sort of how that'slooked at?
Yeah, the last thing thesoftware world needs is is
another standard, but we do haveto have one, and open telemetry
is the one.
It may not be perfect.
There may be things that thatthat are not suitable for mobile
and things like that, but youknow, you go.
You go to the war with the army.

(29:56):
You have a less aggressivemetaphor.
You use the computer.
You have to code the thing.
You know what I mean.
Opencellometry is very good.
It is definitely good enough.
It also has a very passionatecommunity looking to make things
better.
So when I step in with thesemobile problems, it's not like

(30:17):
oh, this is not how we do thingsin open telemetry.
It is, please tell me more.
What can the protocol do tohelp you achieve what you need
to achieve?
So building on top of that issomething that we want to not
only do, but to help betterstandardize some of the
telemetry that's on the mobileand have it be the lingua franca

(30:38):
of observability and be able toconnect mobile data with
backend data.
I mean, sres have tons and tonsof data maybe too much that are
logged in mobile telemetry,like spans and logs, to be able
to join the context with thedevice that triggered a

(30:58):
particular API call, thattriggered all your distributed
tracing.
That's pretty useful,especially if we provide context
that is not easily derived fromsimply the backend metrics or
the backend metadata.
Everything about the client,everything about the payload
that gets sent, like, certainlyfor telemetry persons, you're

(31:19):
not going to crack open thepayload and say what does it
include, you know, and thenbasically add all this context,
um on the client, well, it's allthere.
All we have to do is not all wehave to do, but we can annotate
it.
And and so you know that whatgenerated a particular uh uh
trace that seems anomalous hasthese characteristics.

(31:39):
It's generated from this iOS version. It's generated with this payload that contains data from certain things. One interesting story back in the Twitter days that kind of illustrates this: we found an issue with a certain

(32:01):
crash on a certain device type, and we're like, what is happening here? This device is no different than anything else. And we were able to basically use the telemetry to say, oh, this is all happening in Japan, this is all happening in a particular time window. So we knew the time window part. So that was, like, how can clients do this?

(32:22):
We found out it was Japan, we found out it was a particular device model, and we found out it was because that device model had a pre-existing version of an app installed that had these weird characteristics. So we were able to trace that back to the origin of the bug, simply because we had all this metadata that you certainly

(32:43):
couldn't have found with just backend data, even though the backend did tell us, oh yeah, our SLO for this particular thing has been violated. But we needed the client data in order to actually find the context, and this is what OpenTelemetry gives us: additional context for the backend issues.

William (33:02):
That's awesome and you kind of bring up a good so
differentiating between backendand mobile observability and
sort of where they solvedifferent problems at different
times.
They're definitely different.
And something that'sinteresting is you really don't
hear I hear a ton about back endobservability all the time, all
the time, every, everything,linkedin, just everywhere.

(33:23):
I rarely and maybe this isgoing to change, maybe it's on
the swing and it's going tochange but I rarely hear about
mobile observability.
Is that just because it'semerging and it's transforming
now, or does it just not get thelove that it needs?
What do you think about that?

Hanson (33:43):
I think mobile developers, up until now, have had enough issues to handle without looking for new issues. So I was talking about crashes before, and ANRs; those keep mobile developers busy, on top of adding features that are requested, supporting new platforms, new app versions, new OS versions, new SDK versions, things like that.

(34:05):
So it hasn't been until now that there are companies looking to do better, and most of the time they're the bigger ones. If you look at Twitter, Facebook, Netflix, they're going to have proprietary systems to measure workflows, to measure how long things take, but you need teams of massive sizes in order to have people who are specialized in mobile performance.
I think as platforms settle and things get better, people are going to understand that performance is important and they want to measure it. And it's even more important when your backend SRE tells me, oh, you're violating SLOs, but it's not really sensitive to

(34:49):
customer conversion. Well, why? Well, it's because you're missing a whole chunk of your workflow. Before your backend SREs can even detect that there's a problem, your client, your mobile app, has to make that request, and that's not a given. So not having data about what's happening on the clients and on

(35:13):
the apps means you're basically erasing a whole set of problems. What if my network requests never made it off the device because there's congestion, a thundering herd at startup? I'm making 20 network requests at the same time, and you never get to the important one because you haven't prioritized it, or you could have deferred a bunch of this stuff. You don't know until you know what's happening on the client.
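Hanson's thundering-herd example, twenty requests fired at startup with the important one starved, is at heart a scheduling problem. A minimal sketch of ordering startup requests by priority so critical calls go out first; the request names and priority values are hypothetical:

```python
import heapq

def order_startup_requests(requests):
    """Return request names in issue order, lowest priority number
    first, so critical calls are made before deferrable ones instead
    of all competing at once in a thundering herd."""
    heap = [(prio, name) for name, prio in requests.items()]
    heapq.heapify(heap)
    order = []
    while heap:
        _, name = heapq.heappop(heap)
        order.append(name)
    return order

# Hypothetical startup requests: priority 0 is most critical.
requests = {
    "feature_flags": 0,    # gates what UI to render; must go first
    "home_timeline": 1,    # the first content the user sees
    "ads_prefetch": 5,     # deferrable
    "analytics_flush": 9,  # can wait until the app is idle
}

order = order_startup_requests(requests)
```

The point of client-side telemetry is that without it, you never learn that the deferrable requests were crowding out the critical one.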

(35:33):
And as you don't know until you know what's happening on the client, and as platforms and things like OpenTelemetry get more standardized, it becomes easier to do. Because, you

(35:55):
know, as I said, on mobile there's no way to do this without custom SDK code and custom backend code to process, visualize, and give you meaningful information from the data. Until that process is easier, until it's easier for me to just go sign up for a service and have this data appear in a dashboard, have version-to-version diffs, and have alerts fire when things go badly, you're not going to have anybody talking about it, because no one's using it, because the barriers to entry are too high.

(36:16):
But folks like us and folks like the OpenTelemetry community are beginning to break this down. So I'm hoping that if I talk to you in a year or two years, measuring mobile performance is going to be a lot more common, because right now, other than app startup, you're not going to

(36:36):
have a lot of people talk about how long things took to do on a mobile app.

William (36:41):
Yeah, those are all great points. And something that popped into my head earlier, I just have to ask, I'm curious: say you're building new software for a startup, you're building something fresh, and of course you have all these great ideas and you can't do it all at once.

(37:01):
Usually you have an MVP and you're starting off slow. You have something that you're taking to market, and you're slowly iterating and slowly adding on; you're modular, maybe you have multiple micros or whatever being worked on in tandem. But observability is a really important thing, because you can't understand what you don't

(37:22):
measure, and you want to understand what's wrong with performance and all these different things. But as far as the development lifecycle process, when do you really start becoming concerned with and embracing observability? Is that really early on, like when you're writing your initial tests and stuff, or

(37:44):
is it something else? I know there's best practices; we all wish we could do things one way and that would be the perfect way, but we're talking reality here as well. When is that? Part of me thinks it's, okay, we look at visibility-type stuff only when we start running into problems, and that's a horrible way to do things.

(38:07):
But a lot of times it's just simply reality. What are your thoughts there?

Hanson (38:11):
Yeah, so observability is not just monitoring. Observability is a practice. You have to build your software with that in mind. Now, it's sometimes tricky to properly do the instrumentation, especially when you have features to write, unless it's something you can just drop in and set and forget

(38:35):
it. So with OpenTelemetry, with a lot of back-end services, you can set it and forget it. You include the package, you turn on the tracing with configuration, by environment variables or some YAML file or something like that, and your commonly used library will emit telemetry.
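The set-and-forget pattern Hanson describes usually means the instrumentation reads standard OpenTelemetry environment variables rather than requiring code changes; variable names like `OTEL_SERVICE_NAME`, `OTEL_TRACES_EXPORTER`, and `OTEL_EXPORTER_OTLP_ENDPOINT` come from the OpenTelemetry specification. A toy sketch of that configuration-by-environment idea, not the real SDK:

```python
import os

def tracing_config(env=os.environ):
    """Build a tracing configuration purely from environment
    variables, so enabling telemetry needs no code changes."""
    exporter = env.get("OTEL_TRACES_EXPORTER", "otlp")
    return {
        "service_name": env.get("OTEL_SERVICE_NAME", "unknown_service"),
        "exporter": exporter,
        "endpoint": env.get("OTEL_EXPORTER_OTLP_ENDPOINT",
                            "http://localhost:4317"),
        # "none" is the conventional value for disabling an exporter.
        "enabled": exporter != "none",
    }

# Passing a dict stands in for the process environment.
cfg = tracing_config({"OTEL_SERVICE_NAME": "checkout"})
```

The appeal is that ops can turn tracing on, point it at a collector, or disable it per environment without touching application code.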

(38:57):
Now you have to set things up on the other side to receive the telemetry and process it, and you also have to have a dashboard, and maybe a vendor, to process that and send you alerts. So the good thing is, on the back end there's a very well-developed ecosystem for that. You drop in OpenTelemetry, you sign up for Grafana, and then

(39:21):
you get your metrics, you get your traces, and all that stuff. Fantastic. We need something analogous to that in mobile. So, to plug Embrace: we effectively are a solution like that. Drop in our SDK and we'll record all the relevant data for you as telemetry.

(39:42):
And you can use our SDK to create custom traces for your workflows, and you can come to our dashboard to see the data. Data gets exported as OpenTelemetry directly from the client to your server, to your collectors, so you can actually parse the data yourself.
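Creating a custom trace for a workflow, as described here, boils down to opening a span, doing the work, and recording the duration and attributes. This sketch hand-rolls that timing rather than using the actual Embrace or OpenTelemetry APIs, so treat the names as illustrative only:

```python
import time
from contextlib import contextmanager

spans = []  # stand-in for an exporter that would ship spans off-device

@contextmanager
def span(name, **attributes):
    """Minimal span: record name, attributes, and wall-clock duration.
    This is the core of what a tracing SDK does for a custom workflow."""
    start = time.monotonic()
    try:
        yield
    finally:
        spans.append({
            "name": name,
            "attributes": attributes,
            "duration_ms": (time.monotonic() - start) * 1000,
        })

# Hypothetical workflow: measure how long loading the timeline takes.
with span("load_timeline", network="wifi"):
    time.sleep(0.01)  # placeholder for real work
```

In a real SDK the span would also carry a trace ID and parent-child links so you can see which sub-steps dominate the workflow's duration.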
You can also use the OpenTelEntry Android SDK.

(40:03):
It does something similar, but nothing is there on the other end to capture the data, process it, and visualize it. You have to buy Grafana or something like that. So it's not a one-stop shop, but it certainly works. And in fact, with Embrace, you can use our SDK without being an Embrace customer. It's open source. You just go online (I'm sure we can make the link available

(40:27):
in the show notes or something like that), drop the Embrace SDK into your app, and start seeing the OpenTelemetry data getting forwarded to your site.
So making it easy is going to help a lot. Basically, making it so that all you have to do is think

(40:47):
about needing observability, needing telemetry, and just hook up an SDK or two in order to get this data, that will make things a lot easier. Just like people do with crash monitoring: drop in Crashlytics, drop in whatever it is, and you'll get your data. And people

(41:07):
will do that, if it's as easy as you say it is and if it doesn't impact your app negatively. Sometimes SDKs do too much to gather telemetry and actually reduce the performance of your app, so make sure that whatever SDK you drop in doesn't do that. Then at worst you have data that you're not looking at,

(41:29):
but at best you have data that you can look at to diagnose problems when users see them, or to catch problems before your users notice.

William (41:42):
That's such a good point, making it easier on the developers. It kind of reminds me of security, which is such a hot-button item everywhere right now. Security is broad: every layer of every application, every service, every piece of infrastructure. Security is

(42:03):
everywhere. But some of the time, doing security the right way is so hard, it's painstakingly difficult, and that's one of the things I'm seeing in the security space right now. Security engineering is trying to make it easier for folks to adopt these things. Exactly what you're saying, but

(42:25):
applicable to security infrastructure and those different areas. And it's hard, because if something you're trying to do is just massively painful and you practically need to hire a whole team to manage it, that sort of throws a wrench in your bicycle tires and you go flying. So that's a great call out and a great point there.

Hanson (42:47):
You want security by default. You don't want to have to think about security after you release your app; it's like, oh, how do I make things secure? You want that built right in from the beginning. Similarly, you want observability by default. You don't want to have to think about what telemetry you need afterwards. You want to drop it in and have it do the important things for you.

(43:09):
So you can add on top of that, but the basics you get for free, or for little to no effort on your part, and that's very important for a developer whose deadlines are coming fast and furiously.

William (43:24):
Yeah, that's such a good point, because right now we live in a day where there's a lot of software as a service, and if you don't have security built in, or you're really slim on the security side, people notice pretty quick. Whereas back in the day, when you were building stuff in data centers, things were a little bit different and you could kind of skimp on some things, maybe. But now everything's sort of front and

(43:46):
center, and there is competition that can eat your lunch, that does have a lot of those things built in. So, really good points. This is awesome. I love what you said earlier, that if we come back and talk in a year, we can evaluate where things are and what has changed. I would love to take you up on that, if you would be willing,

(44:08):
maybe revisiting this, you know, this time next year, for instance?

Hanson (44:38):
For sure, I mean, if Embrace does its job. But OpenTelemetry as a standard is relatively new, and even in the time that I've been involved with it, a year or so, it's grown leaps and bounds. So hopefully in a year, people won't think about OpenTelemetry without thinking mobile, or think about mobile telemetry without OpenTelemetry.

William (45:01):
Yeah, I guess it is that recent. It's pretty awesome seeing companies like Embrace, and other companies that are using it, adopt something that is so new. It really shows the value of community and foundations, the work that the CNCF is doing, and folks coming together and actually wanting to. Because it's a downstream and an upstream effect, really, because

(45:23):
if you don't have some of these foundations really hammered out, of course it impacts the companies that are building stuff, but it also impacts the users that are consuming these mobile apps, and they're going to have more problems. So this is a really good example of the community, the vendor space, and some different areas sort of getting their act together and coming together and

(45:43):
doing something good, you know, for the...

Hanson (45:45):
You know, the greater good, I guess. I think the entire industry is tired of de facto standards. Internet Explorer was a de facto standard. Things are, oh, 95% of people are using it, it's the de facto standard. No, no. We want actual standards that are vendor neutral, where you're not locked in. OpenTelemetry is about vendor neutrality, and it's funny,

(46:06):
it's coming from a vendor saying that. But we totally believe in your data being your data, and we record things in a vendor-neutral way, so that you could eventually take this and not have to rebuild your infrastructure. We want you to use us because we provide service on top of that that is good. But the SDK itself: you should be able to use it without having

(46:28):
to worry about being locked into Embrace, because without a healthy ecosystem where we push each other, you're going to atrophy. You're going to have these vendors, or a particular vendor, being very large and exerting their market power on pricing or whatever. No, no. We want this vendor neutral, portable, extensible, and

(46:50):
OpenTelemetry is a cornerstone, a key part of that strategy, so we want to be part of the community. We don't want to own mobile telemetry. It's not about that.

William (47:02):
I love that.
Yeah, that gives me hope fortechnology in the future.
You know that kind of attitudeand that kind of approach.
It's great, absolutely love it.
So where can the audience?
Are you active on any of thesocial platforms?
Where can the audience find you?

Hanson (47:17):
My blog is hansenwtf, and you can link to all my socials there. I still use Twitter for sports things, basically. I'm mostly on Threads, where I'm just putting whatever ridiculous things I want to post out there. But yeah, check out the Embrace GitHub. We have an iOS SDK, Android SDK, and React Native SDK in order to do

(47:42):
this stuff. But yeah, I'm on socials. Hanson Ho is a very Googleable name. I'm not the architect from Singapore; I'm from Vancouver, Canada. If you Google Hanson Ho and Vancouver, it's pretty idiosyncratic.

William (47:58):
Right on. Yeah, I'll definitely link these in the show notes so folks can find you easier and follow if they want to. And I just want to say I really appreciate the time. This has been a fascinating conversation. This is such an interesting and emerging area, and I do look forward to it. I'll hold you to it: let's have this conversation in a year. Let's see where we're at.