All Episodes

August 1, 2025 45 mins
In episode 209 of R Weekly Highlights we learn ways you can pinpoint just what is slowing down your R code, a novel framing for testing your next plumber API, and the adventures in recreating a NY Times chart entirely with ggplot2.
Episode Links
Supplement Resources
Supporting the show
Music credits powered by OCRemix
Mark as Played
Transcript

Episode Transcript

Available transcripts are automatically generated. Complete accuracy is not guaranteed.
(00:02):
Hello, friends. We are back with episode 209
of the R Weekly Highlights podcast, and this is the mostly weekly show where we talk about the latest happenings and highlights that have been shared
on this week's R Weekly issue.
My name is Eric Nantz, and I'm glad you're joining us from wherever you are around the world and being patient with us as always as yours truly has a hectic schedule these days as summer starts to wind down.

(00:26):
But as always, I am joined by my awesome co-host who never has a dull moment ever, Mike Thomas. Mike, how are you doing today?
Doing well, Eric. We just rolled out a new
shiny apps portal
for a bunch of our clients. So we are shiny everywhere right now, which is very exciting.
I'm always here for it, and I saw your your exciting press release earlier this week. So really congratulations

(00:50):
to you and your
your very,
talented team. I'm gonna be seeing you obviously at posit::conf next month, and hopefully maybe see some of your teammates again too. We'll see. But, nonetheless,
really excited to to talk shop once again when we get there. I can't wait.
Yes. I've already started the speaker coaching session, so it's that time. Gotta get that talk squared away, so I got some fun stuff to share too. But, we got some fun stuff to share today.

(01:18):
Speaking of sharing,
yours truly will be a little vulnerable here for a second. Usually, as part of my,
release process for each episode, I try to
schedule the social media posts and, it was late at night.
Usually, when you do coding late at night or everything's late at night, something will happen.
Yours truly somehow set the scheduled time in the future to next year because I had 2026

(01:44):
as the scheduled year. So, Mike, you definitely correctly said, where's the post?
I never even bothered to check, then realized,
oh, jeez. They're stuck in the ether. So if you saw a note about the last episode a week later,
the problem was not in the machines.
It was what's sitting in the chair here. So you're welcome.

(02:04):
I thought it was gonna be a lost episode that, you know, would come out way later. But, nope. We caught it. No big deal. No harm. No foul. It happens. Yep. Yep. And sometimes you need that extra hour of sleep for you to send those out. So we probably will not have that happen this time around. You should see the notes the next day after I finish my fancy editing. So let's get the show on the road here. Our curator this week is Jonathan Carroll. And as always, he had tremendous help from our fellow R Weekly team members and contributors like you all around the world with your great pull requests and other suggestions.

(02:39):
And we've covered this topic a bit before, but a lot of times when I develop either packages or applications
at the day job,
there is always the need, and I do mean the need for speed as a certain actor said in one of his racing movies. And sometimes that
becomes obvious when, say, you've maybe built that great shiny app and then it's doing some crunching behind the scenes before you get that plot shown or that nice data table shown.

(03:05):
You might be nice and at least put, like, a loading spinner in front of it, but then you may have a customer that says,
what on earth is taking so long? Then you're thinking, okay. What is it really?
Is it the business logic? Is it the UI element? What exactly is it? And a lot of times,
at least in my practice, it does become part of the business logic problem, which means you're back to regular R code

(03:30):
to try to figure out where this issue lies.
So in light of that, our first highlight we're covering today is some really great notes
from Kelly Bodwin, who has been a frequent contributor to highlights in the past.
And she has this great new blog post on, I believe, one of her newer blogs, actually, called particles.
Great name there. And the article is entitled speed testing at three different levels that you can take

(03:55):
with your R coding.
And this was inspired by a recent
workshop that
she jointly gave with Tyson Barrett
at USCOTS,
and
this is inspired by a lot of material from there. And so to motivate the problem,

(04:16):
she makes
a little bit of a pipeline here of doing some various dplyr operations
with
an inflated
New York City flights data set. In fact, she basically bind_rows it, you know, multiple times to get a much bigger set.
And, spoiler alert, I mean, she used
a little LLM to get this example going quickly, and that's, you know, very relatable.

(04:40):
Group by processing,
summarization,
doing a little additional variable generation for
different lengths or transformations
and doing some filtering afterwards,
and some sorting. This is all very
relatable
data pipelines.
And so that might take a while because it is a larger dataset. So

(05:02):
before you can really do a lot of testing that's recommended here or, benchmarking if you will,
it is helpful to put all this in a function for convenience.
So she ends up
doing that with an appropriate name, a do_the_thing function. And I love these kinds of names because sometimes, in my mood, I might name a function very random words as well.

(05:23):
First level is
basically the time itself. That's often where we start to see there might be an issue, so that might be your first pass, if you will, of just how long is something taking.
There is a recommended approach, but if you've been using R for a while, you may be aware of the Sys.time function, where it literally just gives you the actual time when you run that function.

(05:48):
And you could save that as a variable,
run your long running function,
run that Sys.time again, but then do arithmetic and subtract what you recorded as a start time.
That is fine-ish, but
she correctly points out
you're getting an object back called a difftime object. And it's not really user friendly for what she wants to do, which is more arithmetic
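The Sys.time pattern being described might look like the following minimal sketch; the sleepy do_the_thing placeholder is hypothetical and stands in for the real workload, which in the post is a dplyr pipeline:

```r
# Placeholder for the real long-running workload (hypothetical)
do_the_thing <- function() Sys.sleep(0.5)

start <- Sys.time()
do_the_thing()
elapsed <- Sys.time() - start

class(elapsed)                        # "difftime" -- awkward for math
as.numeric(elapsed, units = "secs")   # coerce before doing arithmetic
```

The coercion at the end is the extra step the difftime object forces on you before you can do further arithmetic.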

(06:13):
around these operations. So there are some better functions
to use in her opinion,
which are both in base R, called proc.time
and system.time. So if you just swap out Sys.time for proc.time,
you'll now get an actual proc_time object, which is a named vector,

(06:34):
which is great because you also get the elapsed time right off the bat after you do your
arithmetic of the end time minus the start time. So that's a little more relatable.
And, again, these are human-readable seconds that are outlined here.
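Both base R options being described can be sketched like this, again with a hypothetical placeholder workload:

```r
do_the_thing <- function() Sys.sleep(0.5)   # placeholder workload

# proc.time(): snapshot before and after, then subtract end - start
start  <- proc.time()
do_the_thing()
timing <- proc.time() - start
timing["elapsed"]    # wall-clock seconds as a plain named number

# system.time(): the same idea wrapped into a single call
st <- system.time(do_the_thing())
st["elapsed"]
```

The "elapsed" entry of the named vector is the wall-clock time; "user" and "sys" break down the CPU time.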
And then you may be familiar, in the R community, with some of the wrappers of this proc.time stuff: the tictoc package and the tic and toc functions

(07:01):
to be exact. So you can
basically flag it with tic, run your function, then run another function called toc, and immediately you'll get the time elapsed right off the bat in the console. This is great for more ad hoc
interactive
types of explorations at the time.
And it might lead to more, you know, more investigation depending on the complexity of the function. But you can do some pretty interesting things with the result of that object too, if you're daring to do so,
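A small sketch of the tic/toc flow, assuming the tictoc package is installed and using a sleep as a stand-in workload:

```r
library(tictoc)

tic("do the thing")
Sys.sleep(0.5)            # placeholder workload
toc()                     # prints something like "do the thing: 0.5 sec elapsed"

# toc() also returns its measurements, so you can compute with them
tic()
Sys.sleep(0.2)
res <- toc(quiet = TRUE)
res$toc - res$tic         # elapsed seconds as a plain number
```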

(07:31):
but that may not be quite enough.
So you may have found the slowdown. Now you wanna know, well,
what can you really do to interrogate this, and maybe look at alternative solutions,
alternative
code.
That's where benchmarking comes in. There are a few different ways you can do this too.

(07:53):
First of which is, first of all, having that alternative function to have something to compare against.
And because Kelly's been involved with a lot of the data.table package's,
might say, reinvigoration,
or at least trying to bring that more to the mainstream
of the R community, she makes a handy function called do_the_thing_dt where she basically takes the

(08:13):
dtplyr package,
converts that existing data set to a dt object with lazy_dt, does the same dplyr operations, and then, you know, returns out. So that's a nice little convenient thing if you wanna still keep your dplyr syntax,
but take advantage of data.table functions.
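A hypothetical reconstruction of that idea, using mtcars instead of the inflated flights data: the same dplyr verbs, routed through data.table via lazy_dt().

```r
library(dplyr)
library(dtplyr)

# Hypothetical stand-in for the post's do_the_thing_dt(): keep the
# dplyr syntax, let data.table do the work underneath
do_the_thing_dt <- function(df) {
  df |>
    lazy_dt() |>
    group_by(cyl) |>
    summarise(mean_mpg = mean(mpg), n = n()) |>
    filter(n > 5) |>
    arrange(desc(mean_mpg)) |>
    as_tibble()   # collect back into a regular tibble at the end
}

do_the_thing_dt(mtcars)
```

The as_tibble() call at the end is what triggers the actual data.table computation; everything before it is lazy.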
So with that, there is a package called microbenchmark

(08:37):
where you can run
iterations
of these same functions multiple times because if you run the function once,
depending on what your system is actually doing, you might get different results the next time you run it. Right? Because maybe your browser is going crazy on you. Maybe you've got another, you know, data-hungry
process to do and stuff. So having

(08:59):
having microbenchmark
do a set, like, a controlled experiment, if you will, running each of these things, say, a 100 times,
you might get then the average, you know, the mean or the median of these times and resources
used. So that's great. Now keep in mind, it's gonna run the code a lot of times, so you might have to sit back and do something else while it's crunching through that.
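The controlled-experiment idea can be sketched like this, with two toy expressions standing in for the dplyr and data.table versions:

```r
library(microbenchmark)

x <- runif(1e5)

# Each expression is run 100 times; the summary reports the
# min / median / mean / max across all iterations
mb <- microbenchmark(
  manual  = sum(x) / length(x),
  builtin = mean(x),
  times   = 100
)
mb
```

Because each expression is repeated many times, one-off hiccups from the rest of your system wash out of the median.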

(09:22):
And that can show, in this case, that the data.table version of her function here
reduced the evaluation
time quite a bit. So that was pretty neat.
But you may wanna see, is that really gonna generalize to even bigger datasets?
So you might wanna make different versions of your inputs. And then that's where you might wanna opt into the bench package,

(09:46):
which I think is authored by Jim Hester, formerly at Posit, now I believe at another company.
And this is where you can use the mark function from bench
with
this set of functions, but also
maybe a different input or different inputs throughout this process.
And there is,

(10:07):
again,
a way to do this with the mark function, and you'll get
similar elapsed times as you would from the other
microbenchmark function.
But this only runs something once when you do that mark function. It might be just helpful for
prototyping, but the reason she wanted to do this is you get the memory

(10:27):
results, not just the time results.
So that might lead to, oh, wait a second. The DT version is using roughly half the memory
as the default dplyr version.
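The memory angle is the key difference, and a minimal bench::mark sketch (again with toy expressions in place of the real pipelines) shows it:

```r
library(bench)

x <- runif(1e5)

# bench::mark() checks both expressions return the same value, then
# reports timing *and* memory allocation for each
res <- bench::mark(
  manual  = sum(x) / length(x),
  builtin = mean(x)
)
res[, c("expression", "median", "mem_alloc")]
```

The mem_alloc column is what lets you spot, say, one approach allocating roughly half the memory of the other.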
So is that going to generalize
to larger datasets? And that's where the press function comes in,
where you might, in this case, have these different scenarios

(10:51):
of different sizes of the data
and still use the mark function under the hood to do it, but then you're basically making these different versions of the flights dataset.
And then that's where
she could see that as the data got larger, the advantage of data.table
versus dplyr wasn't really growing at the same rate as the dataset got larger.
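A bench::press sketch of that scenario grid, with a generated vector standing in for the inflated flights data:

```r
library(bench)

# press() re-runs mark() across a grid of parameters -- here a
# stand-in for inflating the flights data to several sizes
res <- bench::press(
  n = c(1e3, 1e4, 1e5),
  {
    x <- runif(n)
    bench::mark(
      manual  = sum(x) / length(x),
      builtin = mean(x)
    )
  }
)
res[, c("expression", "n", "median")]
```

Each row of the grid becomes its own mark() run, so you can see whether an advantage holds up, shrinks, or grows as the input scales.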

(11:14):
So
it's not quite as fast on the larger datasets as it was on the slightly smaller set. So that's where
looking at the bench package gives you some additional insights and little ways to control your experiments even more.
Last but certainly not least, and probably the most complicated of these, is when you wanna really figure out, in a fine-tuned way,

(11:38):
where these potential bottlenecks are actually happening, instead of just the bigger wrapper function.
Where is this actually happening? And that's where profiling comes in.
Something I need to do more of, but also become more comfortable with the workflow here.
You got a few options here too. You got the profvis package,

(11:58):
offered by Winston Chang over at Posit. This has been great for both traditional r code as well as shiny code for that matter. And you can feed in, in this case,
kind of the body of this function that Kelly's made here, and you get the nice kind of flame graph that's interactive, and you can start to see
that most of the time being spent

(12:20):
was on the filter statement. That often is a bottleneck in traditional
dplyr pipelines.
But it's great that you can at least explore this interactively even in the blog post itself, toggle between the flame graph
and the actual data that shows, like, the steps in a collapsible list.
Really nice to see profvis being used here.
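The basic profvis workflow can be sketched like this; the aggregation and filtering code inside is a hypothetical workload, not the post's actual pipeline:

```r
library(profvis)

# Wrap the code to profile; printing the result opens an interactive
# flame graph (in RStudio's viewer or a browser)
p <- profvis({
  df  <- data.frame(x = rnorm(2e6), g = sample(letters, 2e6, TRUE))
  agg <- tapply(df$x, df$g, mean)
  sub <- df[df$x > 0, ]   # filtering step -- often the hot spot
})
p
```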

(12:44):
You can go even lower level than this with the Rprof
function in base R.
In order to do that, you still need to save your results out
when you run Rprof,
and then
you'll have to, you know, create
a helper function of sorts or use the summaryRprof

(13:06):
function to do that, as she has in the blog post, where you can see
roughly similar output as profvis. But that may be helpful if you wanna stick to
that native Rprof structure.
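The save-then-summarise flow with base R's profiler looks roughly like this, again with a made-up workload:

```r
# Base-R profiling: write sampling data to a file, then summarise it
out <- tempfile(fileext = ".out")

Rprof(out)
for (i in 1:20) {
  df  <- data.frame(x = rnorm(2e5), g = sample(letters, 2e5, TRUE))
  agg <- tapply(df$x, df$g, mean)
}
Rprof(NULL)    # stop profiling

prof <- summaryRprof(out)
head(prof$by.self)   # time attributed to each function itself
```

The by.self table credits time to the function actually executing, while by.total rolls time up to callers, which is usually the first thing you want to compare.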
There are some additional packages in this space and, in fact, proffer,
offered by my colleague, Will Landau, is a great
package in this space. So check that out if you're interested. He saw some shortcomings in profvis, so he thought he wanted to take it to another level with proffer. And if you know Will well enough, you know that when he gets his hands on something, he will make a nice interface out of it. Kelly concludes the post with some other additional thoughts about

(13:43):
maybe in the future,
profiling can be easier with maybe a pipeline kind of syntax.
She shows some kind of pseudo code that she does to illustrate this,
in a more manual way.
Maybe someone will take on that challenge. Who knows?
But I hope this post
illustrates to all of you that you've got different levels, if you will, of doing this kind of benchmarking or speed testing exercise.

(14:10):
Sometimes just the time elapsed might be good enough. Other times, you gotta go a little further, and that's when you gotta put your, you know, design of experiments type hat on with the bench package and microbenchmark,
and then really start to get lower level if you need to with the profiling tools.
They can be quite rabbit holes. So you gotta, you know, right size this to what you need for your given project. There may be cases where you just can't do anything anymore and you gotta, you know, communicate that to your users.

(14:37):
But you might see some significant gains from how that data is being stored or the types of processing you're doing. Things like data.table are definitely a big help, but there may be more you can do. So many things like DuckDB and duckplyr have my attention to make data operations even faster than potentially this. So,
again, great if you're new to this space and, again, some good nuggets to dive deep further if you want to, you know, go down the rabbit hole, as I said, in each of these different

(15:06):
levels. So as Kelly always does, she teaches me something new every time she writes a great post here. So, Mike, what did you learn from all this?
Yes. Kelly Bodwin, former Shiny in production workshop attendee,
asked great questions, asked me questions I couldn't answer,
because I think she's smarter than
me, which doesn't take much. But the blog mentions that ChatGPT wrote the data wrangling pipeline code, and I can't help but wonder, did Kelly do some vibe coding here?

(15:36):
I guess that's sort of the big thing these days. I have not. Was that the case for you? Yep. Yep. But yep. Throw that out there. I am really guilty of using Sys.time, which, as Kelly points out, is a lot more fragile than some of the other options that we have, even just in base R, like proc.time and system.time. Or, you know, that tictoc package is one that's been around for a while. It just, you know, feels like one of those tidy packages that just makes the output a lot neater and more user friendly

(16:03):
for us when when we're trying to do these types of exercises.
And there's some great use cases, and I never really thought about all of the nuances where you may want to employ, you know, profiling, benchmarking,
you know,
comparisons of the time it might take to run your pipeline.
But as she mentions and you mentioned Eric, if you're looking to do experimentation and run your code a ton of times,

(16:27):
because we know it won't take the exact same amount of time each time. Right? It could just be millisecond
differences, or it could be longer depending on what you're trying to do.
You can see a distribution of those run times with that microbenchmark package, which can really be your friend. I think by default, it runs your code like a 100 times and shows you summary statistics on how long it took across that distribution. And if you wanna compare trying to do the same thing a couple of different ways, you know, one example I could think of is if you have a dplyr pipeline and a data.table pipeline that are trying to accomplish the same thing, you can use the bench package, which can be your friend in those situations. And

(17:07):
there's always something, for me, just kind of oddly satisfying when I see these clever namespaced functions,
like bench::mark or bench::press. There's a few other packages out there that, you know, I think are clever in how they name both the package and the functions within that package, so that the package

(17:28):
and the namespaced version of the function
is, I don't know, just one of those things that makes me smile. And, you know, speaking of that twenty twenty three Shiny in production workshop that your favorite podcast host taught,
profvis,
as you mentioned, Eric provides some really powerful visuals and graphics
about the time it took to run each part of your pipeline. And it takes some education to figure out how to interpret the output of profvis.

(17:55):
Once you do figure that out, it's really powerful.
But as you mentioned, I wonder if we could make this prettier
or more interpretable in the future. If there's a way to kind of clean up what profvis returns and just make it, I don't know, feel like it's in the year 2025 in a way. I feel like that's gotta be possible.
I'm kind of making my own call to action to maybe step into that package and see if I can submit a pull request to clean it up. But that's probably some serious code and probably a little bit above my pay grade, but we'll see. Just throwing that out there into the world.

(18:29):
And it may not be important to you in a lot of
situations that you're in to get your code down from maybe, like, a two minute run time to one minute.
But if you have,
for instance, like a cron job running every day on cloud compute, it can actually save you a lot of money in the long run.

(18:51):
And it can just be, you know, these things add up over time. So taking
the necessary steps and just being thoughtful
about
how long your code is running for,
where the bottlenecks are, and just trying to write the most, you know, concise but legible code. That's something that we really try to practice

(19:11):
day in and day out on my team.
And it can be really important in the long run if you look at the big picture. But great blog post from Kelly, and it's a great reminder on a subject that I don't think we talk about enough.
Yeah. And a lot of times, maybe in a more interactive data analysis, you're certainly not really thinking of these issues. This is when you get to those ETL kind of processes

(19:34):
or anything that involves heavy business logic and your interfaces, whether it's packages
or apps, or definitely a combination of both in my cases.
A lot of times, the biggest wins are when you find that even after all this, you still can't really do a whole lot
to minimize
the execution
time. If you do have an opportunity

(19:54):
to somehow cache these things or run them on a schedule and then say your app consumes them,
you know, on the on the fly, but that that heavy lifting's already been done, and the app is simply importing
them. This will be
free consulting for those out there that may be struggling with this:
packages like pins.

(20:15):
If you can do your data ETL stuff and then put it as a pin as your summary or transform set, have your app consume that instead of the app doing all that.
Oh goodness. The time gains can be immense, and that was a recent project I'm
That's been our biggest win. Just cache all those results.
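The cache-it-with-pins idea can be sketched like this; board_temp() and the mtcars summary are stand-ins for a real Connect or S3 board and a real ETL job:

```r
library(pins)

# board_temp() is a throwaway local board; in production you'd point
# at board_connect(), board_s3(), or similar
board <- board_temp()

# Scheduled ETL job: do the heavy crunching once, then pin the result
summary_df <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
pin_write(board, summary_df, "mpg_summary", type = "rds")

# Inside the app: just read the cached result, no crunching on the fly
cached <- pin_read(board, "mpg_summary")
cached
```

The app never touches the raw data; it only reads the pinned summary, which is where the big load-time wins come from.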

(20:38):
The app itself didn't need to do any of that stuff on the fly anyway.
And now I run a cron job that right now is every night, or every morning, I should say. But I've got the capability to run that even more frequently if I need to with cloud architecture. So
it's,
little things like that, you kinda learn on the fly, but even, you know, before you get to that point, if you can speed things up, these are some great tips to do so.

(21:11):
And as I just mentioned, Mike, you know, trying to do things outside your app is a really helpful technique. And a lot of times being able to separate out
the way your package or your app is calling an analysis, you know,
requirement or analysis function
that gives you more control on, you know, the rest of that process can be really helpful too.

(21:33):
You and I have both been knee deep in our recent projects and either building
or consuming or frankly, both building and consuming
custom APIs,
application programming interfaces. We haven't heard that term before
in our various utilities or innovation capabilities we've been building here.
And in the world of the R community,

(21:56):
a great way to build your own API has been the plumber package.
That's had many years of history now, but, boy, it's been a lifesaver for me when I wanted to democratize,
if you will, the use of an analytical
pipeline or resource.
So many could call it from R, could call it from Python, could call it from anything that can do a curl request.

(22:18):
Plumber gives you that opportunity.
So you may be building this great API, but then it comes to a stage where, if you gotta put this in production, they're gonna ask you,
where's the test?
What do you wanna do for testing?
You could go down a lot of different directions for this, but this next highlight talks on some really great points for you to consider,

(22:40):
especially as you structure,
you know, your testing paradigm and some common pitfalls that you can avoid here.
And, returning to the highlights once again, it comes to us from Jakub Sobolewski
over at Appsilon,
who's been, as you may have heard in previous episodes, really active in the space of testing
on the shiny side as well as on the general r side. And this blog post he has here

(23:03):
is called Testing your Plumber APIs from R itself.
And so there is basically a
first set of advice in this blog post that
I think can relate to us on many levels.
It's a given that your API probably is gonna do some kind of business logic type function,

(23:24):
whether it's data processing, whether it's munging, whether it's just making new variables or whatnot.
And that doesn't necessarily
need to be tested
in the same way as the API interface
itself.
So his first piece of advice is called the separation of concerns
for that business logic stuff.

(23:45):
Put that in its own function
and use what's familiar to many, those in the r community
of the testthat package.
Use that framework
that's been well established.
That doesn't have anything to do with the API interface at that point.
Get all that stuff ironed out first in the more traditional testing paradigm.
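That separation-of-concerns advice might look like the following hypothetical sketch: the business logic is a plain function, and a plain testthat unit test exercises it with no API running anywhere.

```r
library(testthat)

# Hypothetical business logic: a plain R function, no plumber involved
summarise_scores <- function(scores) {
  stopifnot(is.numeric(scores), length(scores) > 0)
  list(mean = mean(scores), max = max(scores))
}

# A plain testthat unit test -- no running API required
test_that("summarise_scores returns the right shape and values", {
  out <- summarise_scores(c(1, 2, 3))
  expect_type(out, "list")
  expect_equal(out$mean, 2)
  expect_equal(out$max, 3)
})
```

The plumber endpoint would then just call summarise_scores(), so its own tests never need to re-verify this arithmetic.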

(24:06):
Same thing applies to Shiny itself for that matter. Get your business logic in functions,
test that with testthat, and your life will be a lot easier.
That gives you the ability to tailor the API
kind of testing, which he calls API contract testing.
Not so much focused on the exact response you get from the API, because that's gonna be handled by the business logic stuff. This is more about

(24:31):
what he calls the shape of the response.
Is it the type of structure, say, a list structure? Maybe it's a data frame structure.
And so having these two different layer approaches,
you got the business logic layer and now this API contract layer,
to really compartmentalize
these two perspectives.

(24:53):
So you may be thinking, yeah, I can handle the testthat framework now. Well, how do I get started with this kind of API contract
testing layer?
The first step is just getting the API running itself. As with many things in R, by default, things run in the foreground when you run, like, a long running function. Heck, when we run a Shiny app interactively, that's in a foreground process most of the time. So you wanna run this API, this Plumber API,

(25:21):
process in the background.
And that's where, this is where I started to learn some interesting use cases.
The mirai package, authored by Charlie Gao, who is now a solutions engineer at Posit.
This package
is great for launching a plumber API
as a very straightforward background process.

(25:41):
I remember in the past and the plumber documentation, they would say, okay. Spin up another r process,
run this plumber process on the fly manually,
go back to your foreground process, and then run your, like, curl calls or whatever else to
interactively spot check the API you're serving.
So this is a great way, with mirai, to just get that thing running in the background and then get on with your
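A minimal sketch of that background-process idea with mirai; the plumber.R file in the comment is hypothetical, so a placeholder task stands in for the actual API here:

```r
library(mirai)

daemons(1)   # spin up one background R process

# In the real pattern, you'd launch the API in the background, e.g.:
#   m <- mirai(plumber::pr_run(plumber::pr("plumber.R"), port = 8000))
# For a self-contained illustration, run a placeholder task instead:
m <- mirai({
  Sys.sleep(1)
  "API would be serving here"
})

# The foreground session stays free; collect the value when you need it
result <- call_mirai(m)$data
result

daemons(0)   # tear the background process down when finished
```

With the real pr_run() call, the daemon stays blocked serving requests while your interactive session runs the test suite against it.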

(26:05):
actual testing paradigm.
Next, he recommends a new pattern or at least new to me the way he articulates it
for the the way you test things. So you first got the arrange part of this,
getting your API as a background process,
get your test data ready to go,
and then setting up whatever you need from a configuration or authentication standpoint.

(26:29):
That's your setup time, if you will.
Then once you got all that boilerplate set up, the background process, the authentication,
now it's time to actually do the thing as I say, and make the actual request to your API
using a package like httr2, "hitter two," however you wanna pronounce it.
This is great. The act phase is only just
(26:52):
running the action that you wanna test with the required parameters in your API.
Last but not least, you've got the assert phase. And this is where you're gonna look at
making sure that
you're getting the expected types of results.
And again, maybe not the result itself, but the shape of it, the type of response codes, the status codes you might say. Is it a 200? Is it a 400? Is it a 500?

(27:16):
Is it, and again, the structure,
making sure it's fitting the structure, not necessarily the in-depth
content
of the response itself.
So that, in and of itself, that approach really resonates with me.
And then also looking at,
how you structure all this, he's got a recommended naming pattern of, like, test dash API

(27:40):
dash
endpoint,
you know, whatever that endpoint ends up being
for the individual endpoints, but keeping those in isolation from each other instead of doing one script that does all the endpoints
at once. So it's a great way to focus yourself well on that particular endpoint
and then running the, you know, associated, maybe business logic tests alongside that, but in that layered approach.
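The arrange-act-assert shape of one of those endpoint tests might look like this hypothetical sketch; it uses httr2's canned-response helpers in place of the live API, and the /health endpoint is made up:

```r
library(httr2)

# arrange: a canned response stands in for the API that would normally
# be running as a background process (hypothetical /health endpoint)
resp <- with_mocked_responses(
  list(response_json(status_code = 200, body = list(status = "ok"))),
  {
    # act: make the one request under test
    request("http://127.0.0.1:8000") |>
      req_url_path("health") |>
      req_perform()
  }
)

# assert: check the status code and the *shape* of the body,
# not its exact content
stopifnot(resp_status(resp) == 200)
body <- resp_body_json(resp)
stopifnot(is.list(body), "status" %in% names(body))
```

Against a real background API you would drop the mocking wrapper and keep the act and assert phases exactly as they are.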

(28:03):
And that can make debugging easier. It can make setup easier and, you know, a lot of things like that.
And then, you know, he's got some other, helpful information about, again, the idea of testing the shape of the response,
not the content itself.
You may not necessarily care what, like, a slot called ID or name actually contains, but what are the attributes of that? What is the logic in that? Use the business logic tests for the actual values.

(28:32):
And then lastly, he concludes of some pitfalls to avoid,
that can be difficult to avoid if you're new to this. Trust me. I know.
One is making sure you don't necessarily duplicate the business logic type of tests in your API tests. Like, those should be
kinda self contained, their own function, their own tests, and you just simply call them

(28:53):
as needed.
And then,
really, hopefully, you're not so much testing the implementation
detail itself.
It's more about the API endpoint. They it should not matter
necessarily if you're using httr2 to do it or if you're using a straight system curl command to do it. That should not matter. An API call's an API call one way or another.

(29:17):
And then hopefully this is probably the hardest to avoid if you're in an organization.
Hopefully you don't have to depend on an external service to actually conduct the API testing.
But if you do use it only when you absolutely need to and maybe take advantage of things like mocking
or other helper packages
to kinda cache what that

(29:40):
setup, you know,
external process looks like,
and then use that mock function in your testing instead of having to do that
repeated call to that external service to make that happen. So
lots of great advice. And then also in the supplements, I will put in the in the show notes here. He's got a handy set of examples and what he calls the r test gallery,

(30:02):
which I
never knew about. I think he's had this for a while. I just wasn't aware of it. And you can go straight to examples
for the plumber API tests, and it opens this little editor
right under the blog post where you can kind of look at this almost like an iframe thing, I guess, through GitHub or something. And you can literally see in this hypothetical example what he's doing to illustrate the concepts in this blog post. So it's a really convenient way to kind of see how all this relates to each other.
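The mocking advice from a moment ago can also be sketched with testthat's binding-mocking helpers; the exchange-rate helper here is entirely hypothetical:

```r
library(testthat)

# Hypothetical helper that would normally hit an external service
fetch_rate <- function(currency) {
  stop("the real implementation would call an external API")
}

convert_to_usd <- function(amount, currency) {
  amount * fetch_rate(currency)
}

# Swap in a canned response for the duration of the expression only
result <- with_mocked_bindings(
  convert_to_usd(100, "EUR"),
  fetch_rate = function(currency) 1.25
)
result   # 125, with no external call made
```

The external service is touched zero times, and the canned rate keeps the test deterministic and fast.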

(30:32):
I've been building a couple custom APIs at the day job, so I've learned a lot through this and I've been using some of the best practices already.
But certainly, I've learned a thing or two about this recommended pattern of how to use
this arrange, act, and assert pattern
when we think about constructing these API level tests. So

(30:52):
great blog post here by Jacob as always, and he's quickly establishing himself
as a real key thought leader in this space. I'm gonna be reading this definitely again after this show. So
hopefully you learned a thing or two, Mike, from this. I know you've been knee deep in API development lately. What did you think of this? I did, but Jakub always delivers, seems like, a whole lot of blog posts lately with really interesting content,

(31:18):
particularly around
testing. And I couldn't agree more with the beginning of this blog post. You've got to separate your business logic from the service itself, and I'd preach this whether it's a Plumber API
or a Shiny app. I think it makes sense to do so in pretty much all cases.
One thing that I was reminded of as we started reading through this blog post is

(31:39):
there's a plumber2 potentially coming.
That is a rewrite of plumber. Right?
Yes. That is coming. I mean, Thomas Lin Pedersen sent a note about that a couple months ago. Yeah. Yeah. So I see that out there. It definitely has not hit CRAN yet. It has no tags yet. I hadn't taken a look at when the most recent commit was made. Oh, last week. So okay. Looks like it's still under active development, and I wonder if some of these concepts that Jakub's talking about in his blog post

(32:09):
may be
introduced as features in the plumber2 package as well. We will have to see. I haven't looked under the hood of the plumber2 code enough yet, but just some food for thought out there. And I love the arrange, act, assert pattern. Jakub mentions,
you know, writing unit tests for status codes,
can absolutely be your friend.

(32:30):
In my experience, it can also be your enemy if you're too generic.
Yes. Absolutely.
Just a little word to the wise out there for all those 300-, 400-, and 500-level
status codes that could come back.
One flavor of a 400 status code could be quite different from another flavor of a 400 status code. If you've been there, you know what I mean.
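One way to keep those flavors distinguishable is to assert on the response body as well as the status code. Here is a hedged sketch, in the arrange, act, assert style, using a hypothetical error-to-response mapper (not something from Jacob's post) of the kind a plumber error handler might contain:

```r
library(testthat)

# A tiny, hypothetical error-to-response mapper so each failure mode
# keeps a distinct status AND body instead of one generic 400.
error_response <- function(e) {
  if (inherits(e, "validation_error")) {
    list(status = 400L, error = paste("invalid input:", conditionMessage(e)))
  } else if (inherits(e, "auth_error")) {
    list(status = 401L, error = "authentication required")
  } else {
    list(status = 500L, error = "internal server error")
  }
}

test_that("two flavors of client error stay distinguishable", {
  # Arrange: two different failure conditions
  bad_input <- simpleError("scores must be numeric")
  class(bad_input) <- c("validation_error", class(bad_input))
  no_auth <- simpleError("missing token")
  class(no_auth) <- c("auth_error", class(no_auth))

  # Act
  r1 <- error_response(bad_input)
  r2 <- error_response(no_auth)

  # Assert on the status AND the body, not just "some 4xx came back"
  expect_equal(r1$status, 400L)
  expect_match(r1$error, "numeric")
  expect_equal(r2$status, 401L)
})
```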

(32:51):
And also, in my experience, the withr package can really be your friend in unit testing for
creating sort of an isolated environment that you can throw away when the unit testing is done, an environment that's separate from the development environment that you're using for your package, to create these really interesting temporary

(33:13):
environments. I think we've talked about a lot of use cases in the past where that may help. And plumber or mirai may already sort of offer that same functionality,
or they may import withr in those packages. I wouldn't be surprised at all. I don't have a whole lot of experience with mirai.
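As a small sketch of that throwaway-environment idea with withr (the environment variable name here is a made-up example): the working directory and the variable are both undone automatically when the block exits, so tests stay isolated from your real development setup.

```r
library(withr)

# Evaluate code inside a temporary working directory with a temporary
# environment variable set; both are cleaned up automatically on exit.
result <- with_tempdir({
  with_envvar(c(API_LOG_LEVEL = "debug"), {
    writeLines("fake fixture", "fixture.txt")
    list(
      log_level = Sys.getenv("API_LOG_LEVEL"),
      fixture   = readLines("fixture.txt")
    )
  })
})

result$log_level               # "debug" inside the block...
Sys.getenv("API_LOG_LEVEL")    # ...but unset again afterwards
```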
I should. So, I know that it's great for

(33:35):
spinning up multiple R processes, which can be great for parallel processing or concurrent requests. We don't have any APIs right now for our clients or internally that get hit really hard.
So
we haven't found the need to allow for sort of concurrent processing yet, but
if everything goes well, we will one day. So,

(33:58):
and nanonext seems very interesting for,
that exact topic. Right? Concurrent request handling.
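As a minimal sketch of what mirai offers (the computation here is just a toy illustration): daemons are persistent background R processes, and each `mirai()` call is evaluated on one of them while the main session stays free.

```r
library(mirai)

# Spin up two background R processes ("daemons") so work can be
# evaluated concurrently instead of queueing on the main session.
daemons(2)

# Each mirai() call is dispatched to a background process; the main
# session gets an unresolved handle back immediately.
m1 <- mirai(sum(x), x = 1:100)
m2 <- mirai(Sys.getpid())

# Indexing with [] blocks until the result is ready.
res1 <- m1[]
res2 <- m2[]

res1                    # 5050
res2 != Sys.getpid()    # TRUE: it ran in a different process

daemons(0)  # shut the workers down
```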
And this all reminds me of
Jacqueline and Heather Nolis' talk. I think it was called, you know, "We Put R in Production."
And they literally had, like,
a neural network API, I think,
that they had stood up with R

(34:22):
that was getting hit by the customer service team at T Mobile,
like,
a million times a month or something like that. It was an obscene number. I couldn't believe it. Yeah. And it was working great. So maybe I'll try to add those slides to the show notes because Yeah.
I think they had a website that was literally just called Put R in Prod.

(34:44):
And we'll we'll try to link to that website, but it was one that when all the Language Wars stuff comes out about R slowness
once in a while, I always just fire off a link
to those slides in that website to prove that that is all bogus. But great blog post by Jacob. Lot of,
I think really interesting nuggets to take away

(35:05):
for your API testing
use cases.
Yep. Loved every bit of this. And like I said, I've learned a lot along the way. And one thing I'll plug in this space, which I think can be a great
accompaniment to this, is that
there's been a new release of a package in this space by Scott Chamberlain called vcr.

(35:26):
This is a great package that lets you record and replay the various HTTP requests
that you might make
if you're building
an R package that interfaces with an API.
This is a great way to test that. So I definitely recommend that approach, maybe to augment with it. It's got a nice,

(35:46):
straightforward way to augment testthat with this functionality. So lots of great tools
in this space,
and great thanks to Jacob for continuing to pump the knowledge into our veins about doing this in an innovative way.
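As a rough sketch of the vcr workflow (the endpoint URL and cassette name below are hypothetical, and this assumes a recent vcr release with httr2 support alongside httr and crul):

```r
library(vcr)      # record & replay HTTP interactions
library(testthat)
library(httr2)    # assumption: a vcr release with httr2 support

vcr_configure(dir = "tests/fixtures")  # where cassette files are written

test_that("the GET request is replayed from a cassette", {
  use_cassette("example-get", {
    # hypothetical endpoint, for illustration only
    resp <- req_perform(request("https://api.example.com/status"))
    expect_equal(resp_status(resp), 200)
  })
})
# First run: the real request is made and recorded to
# tests/fixtures/example-get.yml. Subsequent runs replay it from disk,
# so the test is fast and works offline.
```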

(36:15):
Alright. And then lastly, the round out the highlights today. Admittedly, this one's gonna be a little more difficult to talk about because it's actually a video tutorial,
but it's a space that we often touch on in this show about,
the clever uses of visualization and art to replicate
or perhaps even make better a lot of these fancy visualizations

(36:36):
you might see in, in, you know, in the in the reporting space.
And in this case, this high this last highlight
comes to us from, again, a video tutorial,
created by Spencer Schien,
who is a data and policy strategist
at the City Forward Collective over in Wisconsin.
And he saw

(36:56):
a bar chart in the New York Times online blog.
Be warned if you do try to link to it or click the New York Times link, you might get nagged by making an account. So if you get that just like I did when I was preparing for this, we have a link in the show notes to the GitHub repo that Spencer created, for this code demonstration

(37:18):
that has the actual image in there for the plot.
But,
admittedly, the plot itself, the subject matter, is a bit political, and I've had a hard enough week already, so I'm not gonna get political on you all. There are hundreds of other podcasts to listen to if you wanna get in on that train, but it is a bar chart about
the new bill that's been proposed
by the president

(37:41):
called the big beautiful bill,
and the way it could potentially
impact different,
income brackets
in the next, ten years or so. So
with that, it's a pretty straightforward
bar chart with each income
threshold as a separate bar,
and the bar goes down below the x axis if it's a negative threshold

(38:04):
and goes up if it's a positive threshold or positive change, I should say,
in the resources. So,
in the video tutorial, I'll just kind of give a quick take on what Spencer did and some of the things I've learned from it.
There was a hyperlink to the source data in the article, but, actually, the data is in the chart itself. It's annotated above each bar. So he made a simple tibble,

(38:28):
with each income bracket
and the average
change in the income threshold, so to speak. He just made a tibble out of it. You know, brute forced it, but, hey, it works.
And then after that:
the finished product on the GitHub repo looks great, but he didn't start there.
He started with a very basic ggplot,

(38:49):
very bare bones, no updates to theming, no updates to anything. So he illustrates
what he changes along the way to start getting it to replicate the New York Times visualization.
He first,
again, made the data generation a bit easier,
starts to think about, you know, getting the default theme out,

(39:11):
changing the colors of the bars using,
of all things, the developer console in the browser on the article
to pinpoint where the colors are actually used in that
online visualization,
which I'm sure it was using
some WebGL
or web framework to do it. But the CSS is all there. So you grab the colors,

(39:31):
start to update the theming of the code itself.
There are a lot of interesting things. He was using Positron for the first time, so there are little gotchas of the editor that he showed and
fast forwarded parts that he felt were a little off, so to speak. But
it's a great
very much in the style of live coding.

(39:53):
He made this tutorial,
so I can relate to it. I've done a few of those in the past, so definitely fun to see somebody
flex their visualization knowledge.
But honestly,
it's native
ggplot2 with the scales package. That's it. Like, it did not take a huge
effort
other than using some custom fonts

(40:14):
to get this ggplot2 version of the chart to look like the New York Times article. So if nothing else,
it's a great illustration: in just about a half-hour screencast,
he replicated this chart
all with what ggplot2 offers. So lots of clever use of the theming,
the text representations,
and all that. So check out the show notes. You'll get a link to the code. But, yeah, have a look at the video if you wanna see how an expert does this in action. So pretty fun stuff there.
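To give a flavor of the recipe, here is a rough sketch in that spirit. Note the data values and hex colors below are made-up placeholders, not the NYT figures or the colors Spencer pulled from the page's CSS:

```r
library(ggplot2)
library(scales)  # label_dollar() for currency-formatted labels
library(tibble)

# Placeholder data in the same shape as the chart: one row per income
# bracket, with a positive or negative average change (NOT the real values).
df <- tibble(
  bracket = c("<$25k", "$25-50k", "$50-100k", "$100-200k", ">$200k"),
  change  = c(-1500, -700, 300, 1200, 4000)
)
df$bracket <- factor(df$bracket, levels = df$bracket)

p <- ggplot(df, aes(bracket, change, fill = change > 0)) +
  geom_col(width = 0.7) +
  geom_hline(yintercept = 0, linewidth = 0.3) +
  # annotate the value above positive bars and below negative ones
  geom_text(aes(label = label_dollar()(change),
                vjust = ifelse(change > 0, -0.5, 1.5)),
            size = 3.2) +
  # hex codes lifted from the page's CSS would go here; placeholders below
  scale_fill_manual(values = c(`TRUE` = "#2c7fb8", `FALSE` = "#de2d26"),
                    guide = "none") +
  labs(title = "Average change by income bracket", x = NULL, y = NULL) +
  theme_minimal() +
  theme(panel.grid.major.x = element_blank(),
        axis.text.y = element_blank())

p
```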

(40:45):
I feel like data visualization
blogs and videos and things like that,
they
always
offer something that you hadn't thought about before. Right? Like, there's sort of an unlimited
amount of creativity
that can go into
the concept of data visualization. And this is, you know, a really cool exercise to go from scratch, something that he saw in the New York Times, to trying to build it yourself in Positron.

(41:10):
And as you mentioned,
love the live coding idea.
Eric, that's a road that you have gone down many times. I have never live coded before. I am way too scared, but I think it's incredibly
beneficial to
people who watch live coding exercises because they can see the little hiccups that you hit
and how you thought about,

(41:32):
debugging them, right, on the fly. And sometimes
someone else being able to just visualize that thought process can be more helpful than, you know, following this set of instructions
on how to get around it.
So absolutely. We are all human beings, my friend. No matter what you may think when you see the finished product, we are all humans, and how we get there is not always a smooth experience.

(41:55):
Kudos to Spencer. Absolutely.
You know, one really
sort of interesting thing about this
particular chart that lends itself to
telling the story of how flexible ggplot2 is and how customizable it is, is that the output plot here has data labels on both sides of the bars.

(42:16):
It has data labels above the positive
bars, and for the negative bars, it has data labels below the bars that represent the value, right, the height of each of those bars. But it also has these sort of annotation
data labels that,
I believe, represent, like, the quartile or something like that of each particular bar, and those sit on the opposite side of the value data label. So for the negative bars, it's on top, and for the positive bars, it's underneath. So you would think that that's a whole lot of data labels going on on the same chart here, but it's actually really clean and

(42:56):
really informative.
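That two-sided labeling trick can be sketched by flipping `vjust` with the sign of each value. Toy data here, not the chart's real numbers or annotations:

```r
library(ggplot2)

# Toy stand-in data (not the chart's real values or annotations)
df <- data.frame(group = c("A", "B", "C"),
                 value = c(2.5, -1.2, 3.8),
                 note  = c("note 1", "note 2", "note 3"))

p <- ggplot(df, aes(group, value)) +
  geom_col() +
  # value label on the outside end of each bar
  geom_text(aes(label = value,
                vjust = ifelse(value >= 0, -0.4, 1.4))) +
  # annotation label pinned to the baseline, on the opposite side
  geom_text(aes(y = 0, label = note,
                vjust = ifelse(value >= 0, 1.4, -0.4)),
            size = 3, colour = "grey40")

p
```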
I thought that it was pretty incredible that he was even even able to pull that off. And
if you take a look at the code, I think the final piece of code is, like, 38 lines of code to pull off this
New York Times bar chart with all of that. It's amazingly
concise. Yeah. It's just amazing what you can do with ggplot2.

(43:16):
I think that's sort of the long story short here. Amazing what you can do with ggplot2.
Love
the concept of bringing in
the exploration
of,
sort of the browser
dev tools. Right? Being able to take a look at the New York Times website itself and grabbing, you know, those exact CSS

(43:36):
color elements,
off of the page and incorporating them in the ggplot2 code. Really handy trick for some of the data viz folks out there that might be a takeaway. And I'm sure there's plenty of other things
in this particular video here that you could gain,
a lot of knowledge from. So hats off to Spencer.
Well, I couldn't have said it better myself. So, again, lots more

(43:59):
in this week's R Weekly issue. And, again, check out that screencast. Again, it's a very entertaining
style that Spencer has, and I can relate to
it on many levels.
But, yeah,
our we better get out of here soon. Our day jobs are calling us once again. But, again,
if you wanna get in touch with us,
we have various ways of doing that. First of all, we appreciate your input on the R Weekly issue itself. File a new pull request for that great new package, resource, or blog post you found,

(44:28):
I have a few, and I might be next up for curating, so I can use all the help I can get. So that would be great.
You can also get in touch with us with the contact page in this episode show notes. You can also find us on the social medias.
I am on, in fact, you know what? Check the show notes. You hear me say it enough. You know where to find us. Just look at your podcast player. We're all there. So with that, we are gonna close up shop here on R Weekly Highlights episode 209,

(44:54):
and we'll be back with, hopefully, episode 210 of R Weekly Highlights. And again, you never know where the summer goes, hopefully next week.