Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
(00:02):
Hello, friends. We are back with episode 192 of the R Weekly Highlights podcast. This is the weekly show where we talk about the awesome highlights and additional resources that are shared in this week's R Weekly issue.
My name is Eric Nantz, and I'm happy you joined us from wherever you are around the world. Hopefully, you're staying warm depending on where you are in the world, because it is frigid over here in my humble abode. But I'm warming up with this recording, and, of course, keeping me all warm and fuzzy in terms of, you know, cohosting
(00:30):
is my awesome cohost, Mike Thomas. Mike, how are you doing today? Doing well, Eric. Yeah. Thankfully, these highlights are hot,
because in Connecticut, it is just as cold as I'm sure it is in Michigan,
right now. It's pretty out, though, here. We got some nice snow.
Yeah. That's true. It hasn't all melted yet here. And when the kids see the sun, they're like, I don't want the snow to melt. And I'm like, it's not gonna melt at 0 degrees, buddy. It's not. No. Not here anyway. So
(00:56):
but as you said, we got some fun, hot topics to talk about in the highlights this week. And, of course, this is a community project. Right? So we've got
our curator of the week. This time it was Ryo Nakagawara,
who is one of our OGs in the curator space of R Weekly.
And as always, he had tremendous help from our fellow R Weekly team members and contributors like all of you around the world with your pull requests
(01:20):
and suggestions.
And we lead off with a visualization-style
package that definitely has a lot of utility in terms of the scope of it, especially in my industry.
And we're gonna dive into it here.
And this post is actually coming to us from the R Posts blog, and it's a guest post
by Pau
(01:42):
Satorra. Hopefully, I said that right. He is a biostatistician.
And in this post, he talks about
introducing
a package that he's created for the R community,
now on CRAN,
called flowchart.
The name should make it pretty intuitive of what it does, and it helps you create flowcharts in R.
Now you may be thinking, from this show and other, you know, other presentations
(02:06):
or resources,
that there are a lot of different ways of creating flowcharts. Right? Especially in the realm of HTML-style outputs.
I know myself and Mike, I believe, have been using
frameworks like Mermaid.js within our Quarto or R Markdown documents. So there's definitely ways of creating flowcharts there.
(02:27):
I also was a heavy user of the DiagrammeR
package from long ago.
That was helping me out quite a bit with creating
maybe not necessarily flowcharts, but definitely things like decision tree outputs and, you know, choose your own adventure kind of layouts.
But what flowchart
brings differently than the rest of those
(02:48):
is, in essence, a very tidy interface to make all this happen.
So let's dive into this a little bit from the post. So first, we will have, of course, a link to the package in the episode show notes here.
But as I mentioned, it is on CRAN, so it's all just an install.packages("flowchart")
away.
And it actually,
(03:09):
you know, requires you to bring your own data. So for the case study in this example, there is a built-in dataset
from a publicly
available clinical trial set of results called safo,
and it is actually about the journey of patients
throughout the life cycle of a trial. When I say journey, I'm thinking in terms of
(03:32):
what is the result of their status, and this could be that they are randomized to the trial, i.e. they get one of the treatment assignments,
or they discontinue after the randomization
for various reasons, or they end up completing it. There are many other nuances in this, but this isn't a clinical trial podcast, so I'll stop there. But the data is built into the package.
(03:55):
And, basically, in order to register
this data to be available for a flowchart,
you start off with feeding this dataset
into a function called as_fc.
And this is basically gonna turn your data frame into a list object with 2 components here.
One of which is the original data going into it, and you may be wondering what does the data format look like. In the case of this example,
(04:22):
each row is what looks to be a patient with a unique ID,
and then the columns are the different kind of flags in terms of where they're at
in the trial and what the status is. So, again, there is a vignette that describes this data set in more detail, but it's basically a bunch of yes or no type variables
(04:45):
for what happened to that patient in the trial, whether they met the inclusion criteria,
whether they had chronic heart failure or whatnot. Again, you can take a look at the data in the episode show notes.
So once you feed this into as_fc,
now you may be wondering, what do we do with this? Well, you can just simply draw
(05:08):
a very bare bones flowchart of one cell with one function called fc_draw,
where if you feed in that original
data or that original object,
you're just gonna get a box
with
an optional label of your choosing if you want. And this time, it has, like, all 925
records in one box saying that these are all the patients inside.
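To make that concrete, here is a minimal sketch of that first step based on the post. The label text is just an illustrative choice of mine, not necessarily what the post uses:

# install.packages("flowchart")
library(flowchart)

safo |>
  as_fc(label = "Patients assessed for eligibility") |>   # register the data as a flowchart object
  fc_draw()                                                # draw a single box holding all 925 patients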
(05:33):
Well, that's boring. Right? Well, let's start actually having some flow in this flowchart. Right? So
that's where the tidy interface kinda comes in here where you can feed in
this dataset.
Again, make it a flowchart object with as_fc, and then pipe that further
to an fc_filter
call. And this is where you can perform what looks to be like dplyr-style manipulation
(05:57):
with its filter statement.
And in this case, in this first example,
we want a simple
filter to determine if the patients were randomized or not.
Now there is no column for whether they are randomized or not,
but there is a grouping column, which in essence acts like that because it determines what treatment group they were assigned to.
(06:19):
If that column is missing, it means they weren't randomized. So in the case of this example,
the filter checks whether that group variable is not missing, so it's an exclamation point, is.na(group),
and then you can give it a label.
And then you can also show
the patients who did
not meet that filter. And that's gonna be automatically labeled in a box called Excluded.
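In code, that step looks roughly like the sketch below. The show_exc argument name is my reading of the package reference, so treat it as an assumption and double-check the docs:

safo |>
  as_fc(label = "Patients assessed for eligibility") |>
  fc_filter(!is.na(group), label = "Randomized", show_exc = TRUE) |>  # keep randomized patients, show an Excluded box
  fc_draw()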
(06:43):
Then when you draw that, you get that original box of the 925
patients, but then there are 2 arrows
going away. One
arrow goes to the right, and it has this excluded box. And then the arrow going down has another box that
the author labels as randomized,
which has now 215
patients. So,
(07:04):
obviously, not many patients made it to that randomization step, but this is
a very similar format to what we do in a lot of our clinical trial reports, where we get to what's called the disposition section,
which shows the flow of the patients that meet certain criteria
and who end up actually completing the trial.
So this looks quite familiar to me,
(07:26):
but you can do a lot more than just that single filter. Right? You can also,
at that next step where it shows those randomized patients,
you can now split that
into different boxes as well. You might call them parallel boxes,
and you can use a function called fc_split.
You give it the variable that determines the grouping of that split. In this case, it is simply group,
(06:19):
and that's now gonna partition that randomized group into 2 boxes
for the 2 different treatment groups. Again, pretty straightforward, pretty neat tidy interface here,
and you can do even more manipulations
with that fc_filter applied to those, you know, middle boxes that we just created for the treatment groups. And you can read the example more in the post here, but, again, it's really just using the fc_filter function.
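Putting the whole chain together, the tidy pipeline described here looks something like this sketch (again, the labels are mine and show_exc is an assumed argument name):

safo |>
  as_fc(label = "Patients assessed for eligibility") |>
  fc_filter(!is.na(group), label = "Randomized", show_exc = TRUE) |>
  fc_split(group) |>   # one parallel box per treatment group
  fc_draw()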
(08:18):
And then you'll see these boxes kind of in parallel chains, or,
you might say, trails going down, but the boxes are all parallel next to each other for the equivalent
kind of steps.
So in essence, the flowchart looks pretty darn polished already.
And again, with a tidy interface,
(08:38):
I think this is a great package to put in your toolbox if you just want something quick to the point with a familiar
tidyverse kind of piping syntax,
and you could feed this into whatever document you choose. I could see this going into a Quarto document or R Markdown,
you know, whatever have you, whether it's HTML
or PDF format.
(09:00):
It looks like it's gonna output these flowcharts as image files, perhaps, although I haven't tested that myself.
But it is definitely an interesting paradigm if you want it,
if you know the data going into it is fairly straightforward, which it is in this example,
and you may not necessarily need the additional customization that you get with frameworks like Mermaid.js and whatnot or DiagrammeR or some of these other packages
(09:28):
in the past. So, again, you might find a great use case for this. As usual, I like having choice in the way I construct these flowcharts.
There could be cases where this might not quite fit your needs. If you have more, you know, customized kind of directions of the flow, maybe things kind of feed back to an above step. In that case, maybe Mermaid.js is a better fit for you. But like I said, for this kind of flowchart where it's a pretty predefined start
(09:58):
and stopping points, or you might say finishes, and kinda the trail of where this flowchart goes, I think Pau's package could be a great fit for your toolbox.
I agree, Eric. Yeah. I make flowcharts
literally every day.
They're the way that I communicate with both my team and our clients
about sort of the end-to-end process that we're going to undergo to get them to the solution,
(10:23):
because that's that's how we bridge the gap. And there's you know, a few years ago,
there weren't a lot of great flowcharting tools that integrated well with version control. You know, we used some like Lucidchart
and Visio,
but you really had to, like, export those as PDFs or maybe host them somewhere that folks could go take a look at, but not scriptable, not easily
(10:47):
versionable.
And nowadays, you know, as you said, there are better options, Mermaid.js being one of them, the DiagrammeR package
being another one. But I'm really impressed with this flowchart package here. You know, it's really easy and simple syntax
to get started, very tidy friendly for developing these flowcharts.
And when you look at it on the surface, especially in some of these examples, a lot of these functions are really just taking one argument.
(11:14):
And there, you know, doesn't appear to be a lot of customization,
but that's actually because there are a ton of other arguments that have default parameters that can be changed if you want to. You know, originally, I thought that this was a package that, you know, was just very simple syntax, you know, made a lot of decisions for you. And with the default parameters, it does, but you still have a lot of control
(11:37):
over all sorts of different types of things, like the direction of the line; within that fc_filter
function, whether or not you wanna kick those filtered observations
out to a flowchart node on the right or on the left;
you know, font styling, font size, font color, things like that;
rounding for the number of digits that are gonna get displayed;
(11:58):
the background color of the node itself. So if you actually take a look at, you know, the reference,
the reference page of the package's pkgdown site, which is what I'm taking a look at, and click into some of these functions,
you'll realize that there are a ton of arguments behind the scenes that are almost ggplot-like in terms of the amount of control that you have over each element,
(12:21):
in your flowchart. So I'm pretty impressed with some of the new
features
and, you know, some of the interesting
functions that they have in here that I'm not sure I've seen anywhere else,
like fc_merge
and fc_stack, which allow you to actually combine 2 different flowcharts either horizontally
or vertically. I thought that's pretty interesting. Maybe it could help in your workflows depending on how you're sort of modularizing
(12:47):
your code. So really impressed.
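As a rough sketch of those combining functions, and assuming from the reference page that they take a list of flowchart objects, something like this should work:

fc1 <- safo |> as_fc() |> fc_filter(!is.na(group), label = "Randomized")
fc2 <- safo |> as_fc() |> fc_split(group)

list(fc1, fc2) |> fc_merge() |> fc_draw()   # combine side by side, horizontally
list(fc1, fc2) |> fc_stack() |> fc_draw()   # combine vertically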
Honestly, just on sort of a side note, I really like
the hex logo as well. I think it's really cool. And
I'm excited to start to play around with flowchart as well because I had not come across it until today. So great way to start off the highlights this week.
Yeah. I can see a lot of convenience here. A lot of ways to just get that chart done, like I said, for pretty straightforward datasets. And, yeah, I'm gonna
(13:14):
show this to a couple of colleagues here as we're thinking about ways of using R in more of the
document generation space, especially for these more,
I'll try to be polite here, rigid sets of documents that we have to do in my industry. We're slowly trying to feed R into these things. And in fact,
there is a section that we often have in what's called our analysis data reviewer's guide where we talk about kind of the flow of how the programming works, where you go from dataset
(13:43):
to program and then to output.
Perhaps flowchart could be useful in that too. So I've got something to share with some colleagues, I think, later on today. So, yeah. Credit to Pau for sharing this package with us, and, yeah, choice is good as they say. Pray for the day when our industry
finally lets us do dynamic documentation.
You can't have everything, Mike.
(14:18):
Alright. And our next highlight here, we're gonna shift gears quite a bit because we're gonna get really in the weeds technically here, but with a pretty fundamental issue that I think has affected
quite a few package authors in recent months and maybe even the recent year, year and a half,
having to do with best practices
(14:38):
and recommendations
for authoring packages that have to do with more
than just, you know, new R code in the package itself.
In particular, we're gonna talk about what you wanna do when you extend a package with another language,
namely the C language, in your next R package,
(14:58):
and some of the learnings that have been shared from a very influential package in this space.
So we are talking about the latest blog post on the data.table community blog, which has been featured quite a bit in last year's highlights.
This post comes to us from Ivan Krylov,
(15:19):
and he leads off with the tagline about using
non-API
entry points in data.table.
Now it's amazing. In 2025, I think when most people think of APIs, they're thinking of those web APIs. Right? No. No. No. We're not talking about that here. API actually is a historical term in software development. We are talking about ways you can interface with the language
(15:42):
in different constructs or different perspectives.
And in particular, we are talking about the API that the R language itself exposes
to package authors
via its integrations
with the C language.
So really getting, you know, setting the stage here:
since the beginning of R itself, there has been, you know, the canonical reference if you wanna build something on top of R, which ideally is a package,
(16:12):
or perhaps you're even gonna contribute to the language itself,
and that is the Writing R Extensions
reference. This can be found directly on the R project homepage,
and this is what the CRAN maintainers are using as reference for any new package that's coming
into the R ecosystem.
Certainly, there are a lot of automated checks in place to make sure a lot of the principles in the extensions manual are met. But in particular, why we're talking about this in this post is that within this manual,
(16:43):
there are, in essence,
entry points that are defined by the R maintainers
to interface with the C API of R itself.
And in particular, there are 4 categories that you'll find in this manual.
First is literally called API, and you can think of these
as the ones that are documented,
(17:03):
they're declared for use,
and they will only be changed if the R, you know, maintainers end up deprecating that particular
API call.
Then you get to the next 3.
There is the public designation,
where these are exposed
for use, you know, by R package developers,
(17:25):
although they aren't really documented
and they could change without you knowing it. So you could think of this as like
maybe in a package you have a function that is technically there, but you don't export it to the user with user-facing documentation.
But like any package in R,
you can look at or use any function in the package with the namespace,
(17:49):
you know, prefix,
then three colons,
and the function name.
So again, some call that off-label use. Your terminology may be different.
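As a rough R-level analogy (this is not the C API itself, just the same exported-versus-internal idea at the R level), you can compare what a package formally exports against everything that actually lives in its namespace:

# Everything the package formally exports -- roughly the "API"-style surface
head(getNamespaceExports("stats"))

# Everything in the namespace, exported or not; the extras are the undocumented
# objects you could still reach with stats:::, the "public"-style surface
head(ls(getNamespace("stats")))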
There is another category called private.
These are used for building R, and, yes, they are exported,
but they're not declared in the header files of R itself.
(18:12):
And they say point blank:
do not use these in any package. Do not. No. None at all.
And then you get to hidden. This one really is peculiar to me.
They are entry points to the API
that are
sometimes possible to use, but they're not exported. But I think it kinda goes by the name of it. You probably don't wanna touch those.
(18:35):
So,
historically, there's been
no consternation
from the R maintainers
or the R package authors
that the entry points designated as API are
all good. Right? Should be able to use those.
However, there has been a bit of discourse around the use of the public ones
(18:57):
because they're not documented.
They're not forbidden
by R CMD check,
and they've been there for a while.
However, there has been a little bit of, you know, modification to the language itself
where, for some of these,
to be able to use them, there may have been what we might call escape hatches put in
(19:20):
by, you know, having
a define called USE_RINTERNALS
that was used by package authors in the past to kind of get around maybe some potential issues.
Well,
that, what you might call an escape hatch or loophole, was kind of closed
in recent versions of R.
And then
(19:41):
the number of non-API
blessed calls
grew a little bit between R versions.
And, also, another,
you know, discussion on the R development list
is where does the framework
or the header of the library called ALTREP fit into this, which got a lot of great press in recent years in the R community
(20:06):
about being a more optimized way of operating on vectors. And, in fact, I had the pleasure of speaking with Gabe Becker numerous times, who was influential in getting ALTREP into the language itself,
although it was certainly labeled experimental
in those times.
So fast forward a little bit,
(20:27):
but there's been a little confusion into, like, which of these API calls, you know, are really
ready for package authors and whatnot.
Luke Tierney on the R team,
he's actually worked to try programmatically
describing
these exported headers, these exported symbols,
(20:49):
and to be able
to, you know, give a little more clarity into what package authors can use.
And he's, you know, found 2 additional categories as a result of this.
Experimental,
which I think sounds a little more intuitive. These are header, you know, pointers that are there. They're in the early stages,
(21:09):
so there might be some caution in using them because they could change in a newer version.
So be prepared to adapt, basically.
And then there's one called embedding,
and this is meant for those who wanna create what are called new front ends to the language itself.
But for now, they're keeping it separate. There isn't a lot of traction on whether to use those or not.
(21:32):
And then
now
R CMD check has been beefed up a little bit
to make sure that it is checking for any calls by that package
that are using these
non-API entry points, i.e. those that moved from the API designation to some of these other ones.
(21:53):
And it looks like data.table was on the receiving end of some of these checks in recent upgrades.
And so the next part of this post dives into,
as a result of these checks, what the data.table authors are doing
to be compliant with kind of this reorganization
of the C, you know, API entry points
(22:14):
that data.table has been relying on for years and years.
Again, some of these escape hatches are being patched up, and I've actually seen
discussion on Mastodon,
you know, in the R group from people like coolbutuseless, Mike FC you might know him as, about some of the adventures he's had with trying to use C
API entry points in some of those packages he's been dealing with, and R CMD check issues and whatnot, but it looks like data.table
(22:40):
has been looking at this quite a bit.
So I'm not gonna read all these verbatim because there are a lot of corrections that are being made in data.table to use some of these more
either updated
or newer
C API
endpoints or entry points.
There are some that are quite interesting where they've got
(23:01):
solutions in place
and they link to every pull request
that fixes these issues in each of these sections.
Some of which is looking at,
you know, comparison of calls and pairlists, as I call it, and which entry point they were using in the past, what entry point they're using now,
(23:21):
looking at how strings can be composed as C arrays,
refactoring
certain reference counts,
dealing with encoding in string variables,
growing vectors so that it doesn't destroy your memory. There are some new entry points for that as well that you can read about.
And then it gets pretty interesting
(23:42):
because there is more
and especially getting back into the ALTREP
framework.
Apparently, there is some,
you might say, confusion
about where ALTREP fits in all this and which parts of ALTREP
should be exposed
(24:03):
in a way that a package author is not gonna get dinged
in R CMD check.
There is a lot of narrative in this, and this actually does
speak to how you grow vectors and do some other type checking.
So I thought ALTREP was kind of all ready to go. I'm not saying it's not ready to go, but apparently there is some refactoring that needs to be done,
(24:26):
starting with R 4.3,
in terms of how you grow these vectors
with the ALTREP framework.
So this post talks about the common methods in ALTREP
and other common, you know, interactions with this and the C libraries.
And there will have to be some refactoring in data.table
(24:47):
to use some of these newer recommendations
for ALTREP. And like I said, growing these vectors,
growing these table sizes,
doing things like fast matching of strings.
And this is the one section
where
things are not fixed yet. There is a lot of refactoring that needs to be done by the data.table authors to comply
(25:08):
with some of these new entry points
and some of these newer, you know, recommended approaches of using ALTREP.
And there is even more
going on here with some other attribute setting
dealing with missing values
where they are very transparent. They're not sure how to fix some of these yet in light of these new API calls or these API calls
(25:30):
being shifted.
Again, this is an extremely
technical deep dive into it.
I, for one, have never authored a package that deals with C, so I don't have a lot of firsthand experience with dealing with these checks. Although I've seen, again, some conversation about this
on social media and the R-devel channels and whatnot.
(25:50):
But if you ever wanna know how a very important,
large-scale
package like data.table and the authors of that package
are dealing with some of these newer approaches that the R team is recommending
for dealing with these API entry points,
boy, this post is for you. There is a lot to digest here. Again, I can't possibly do it justice
(26:13):
in this particular highlight,
But I think it's important to have things like this as a reference
so that it's not just so mysterious to you as a package author if you get dinged by an R CMD check about these API calls. I'm wondering
how another team would approach this. This is a very technical deep dive into how you can approach it. And as I said,
(26:36):
some of these are not fixed yet. There is obviously still time in between releases to get compliant with these
newer calls, so I'm sure data.table is gonna find a way.
But we're all humans after all. Right? It's not always a snap of a finger to get into these newer
ways of calling these entry points. So getting into the internals of data.table quite a bit, but more importantly,
(27:00):
also looking at how they're dealing with this new world, if you will, of using C within an R package in the R community.
Yeah. That's a lot. But, again,
really recommended reading if you find yourself in this space.
Yeah, Eric. This one is is very technical as you mentioned,
but I think it's it's great to have a really technical blog post like this. And it it may seem really niche, but I guarantee you it's going to help someone else out there who's probably going to run into the same situation with their
(27:31):
R package where they leveraged,
you know, this kind of API
interface
into sort of the underpinnings of the C code
behind R,
to accomplish something and realizing maybe now that, you know, CRAN is going to start to complain
about that. And, you know, as much as we might have mixed feelings about CRAN, the checks that they enforce can be stressful to us sometimes. Like, I did see a Bluesky post recently. I don't know what they're called, toots, tweets.
(28:02):
But somebody
had, you know, passed
6 checks, I guess, on the different types of operating systems that get checked on CRAN, and then the 7th was Windows, and it failed.
Like, that hasn't happened before. Right. My goodness. And, obviously, that's the worst feeling
in the world. But
if we really take the time to step back and think about how
(28:24):
open source software, and I guess most software in general, is just, you know, software stacked on top of one another over and over and over. And if we're going far enough down the R rabbit hole, right, it's C.
And not to throw stones, but it's a little scary to me that,
you know,
something like CRAN doesn't exist in other languages. You know, I'm thinking about the Python ecosystem, and I think it's pretty easy to submit a package to PyPI. And I don't know if they require you to have, you know, any unit tests at all. Not not that R necessarily requires you to have any unit tests, but at at least they're going to try to build your package, right, and let you know if, anything is is breaking. And it's, you know, as you make changes and updates to that package,
(29:09):
it'll rerun it
and, you know, rerun a lot of those tests, and those tests are getting updated for
things like this. Right? Newer versions of R and newer
guidelines and guardrails that we have to adhere to to make sure that your package has the best chance of working
on everyone's
computer. Right? And
(29:30):
I think that goes a long way to,
you know, at least provide some infrastructure that's going to appease,
you know, auditors.
You know, I don't think the SAS community is ever going to be happy with us, and they'll point to situations like this about why their software is more stable or better,
(29:51):
than open source. But I think you and I could talk for about 10 hours about why that's not the case.
You know, but it's it's really interesting, and I'm very appreciative
of blogs like this
that really take the time to walk through all the decision points, you know, sort of everything that was laid out in front of them and and what they were up against and and why they made the decisions that they did to try to, troubleshoot this particular
(30:14):
issue.
And, I'm also grateful to not have to understand
any of this. You know, I'm being a little facetious,
and I certainly understand that it's all C under the hood, but the folks that have really taken the time to
understand,
you know, the bridge between these two different languages to to build these
(30:35):
higher level, right, programming
interfaces,
for folks like us that that make it easier
to work with,
you know, it's it's incredible. You know, I think it's why the R language and the the Python language as well, you know, are
as popular as they are because the
the syntax and the APIs, not to use a buzzword here, that have been developed,
(30:59):
you know, make it very
user accessible
to a wide audience. And,
you know, one last note here. I guess it's pretty crazy to think about how old data.table is.
2006 was the first CRAN release.
The oldest version of dplyr released on CRAN, at least from what I can see on the pkgdown site,
is 2014.
(31:19):
So 8 years later,
still a decade old, but we're going on 2 decades of data.table. And it's definitely
been a package that was transformative
for the R community. So great to see it still still thriving, and, you know, the folks that work on that
project are are at the cutting edge, you know, of a lot of what's going on, in the open source data science ecosystem. So hats off to them and great blog post.
(31:45):
Yep. It stands the test of time, and it's an understatement to say the least that it has that history and it's been that influential
in this community.
And, again, not all of this was despair. Right? I mean, there were many of those points
that are mentioned early in the post where it was simply changing the name of an API header call or whatnot. And it was straightforward
(32:08):
in the documentation
which to change it to. And again,
credit in the post for having all the links to the various pull requests that fix these. So Ivan did a tremendous job of being transparent, of, like, showing the fix at a high level and then pointing to the actual code that does the fixing. I love that. I can't wait to dive into that a bit further.
(32:30):
But again, it calls out that, like anything in open source, it's not always a quick fix to everything. So I will be keeping an eye on what's happening with those ALTREP-style header
calls, where there are new wrappers that need to be made in this in-between world of the current version of R and an R version 4.5 or later, which is due out, I believe, this year. So, as usual, if anything, developing
(32:54):
a highly influential,
production-grade package
or app, you gotta think about backward compatibility. Right? So that's the journey they're on, and, yeah, we'll be very interested
to see where it goes. And in the cases where they don't know the best fix yet, I hope that the community can help them out too and that there will be
a transparent
(33:14):
dialogue for that. But
data.table's group of authors have been on the cutting edge for many, many years.
I'm so thankful that they got that recent grant to put resources like this blog together and their various presentations that they've had at the conferences.
So it's great to kinda get a lens into all the innovations they've been thinking about, you know, now in the public domain like we get to see here on our very humble little R Weekly project.
(33:59):
So we're not gonna talk about C again for this podcast. Enough of C, of course. We're gonna go back to some visualization,
with a very important type of visualization
in the world of health,
especially of a very important
organ in our bodies that we're relying on every single day for obvious reasons.
So it's one thing to talk about, you know, how your brain works. Right? But anytime we're trying to diagnose issues with our humble little organ inside our craniums, up in our skull,
(34:28):
you often turn to, you know, visualizations,
i.e. scans, of your brain tissue
to perhaps diagnose issues
or find ways that maybe a treatment is affecting certain parts of your brain, if you will.
Typically, this is done via MRI scans.
And just like anything,
the R community has stepped up for ways you can bring these visualizations
(34:52):
into R itself for further analysis.
And our last highlight for today is a great tutorial
on some of the issues and ways that you can import and
analyze these types of highly complex imaging data here.
This post is coming to us from
Jo Etzel, who is a staff scientist at the Cognitive
(35:14):
Control
and Psychopathology
Laboratory at Washington University in Saint Louis. That's a mouthful, but she definitely is a subject matter expert in this field from what I can tell here. And she has written multiple tutorials in the past. In fact, she's constructed these with knitr, which is a great way to use, again, reproducible
(35:34):
analysis for tutorials.
And she's addressing some of the points that she had talked about and working with 2 different types of quantities in these brain images.
One is the volume
and the other is the surface area, or surface, of the brain visualizations.
So first, she talks about the volumes of this.
(35:55):
And,
just like anything in the real world in physics,
we have the three-dimensional, you know, perspective here. Right? And when you get these MRI scans, you get
three-dimensional
coordinates of these if you feed this into some of the more standard software
to actually visualize the readings from these MRI scanners.
(36:18):
And you see some example images here looking at,
some off the shelf software
where you look at, on the right side, the three-dimensional
layout of the brain itself,
and then you get more of a 2-dimensional representation
via the different perspectives.
So all this data is readily available from these image formats
(36:40):
once you import it via this great package
called
RNifti,
R-N-I-F-T-I, if you wanna look that up after; we'll have the link in the show notes.
But there are,
you know, very handy ways to import that image file.
I believe these are actually
zipped archives of these images,
(37:02):
and you'll get a lot of different attributes of the different pixel dimensions, especially in the three-dimensional
space
which you can use to help visualize this and perform
additional processing.
So that can be very important if you're looking at different areas of the brain and trying to see the coordinates and the different
representations
(37:23):
of those. So this package can help you
figure out all those different orientations, all those sizes of those areas.
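Here is a small sketch of what that import step might look like, assuming you have a NIfTI file on disk; the file name below is just a placeholder:

# install.packages("RNifti")
library(RNifti)

img <- readNifti("brain.nii.gz")   # read the (possibly gzipped) NIfTI image into a 3-D array
dim(img)                           # voxel dimensions in each direction
pixdim(img)                        # voxel sizes, handy for mapping to real-world coordinates
image(img[, , 30])                 # quick base-graphics look at one slice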
And, again, off the shelf software that can be used to visualize this,
is readily available, but R itself,
again, gives you a nice way to plot this
in your R session as well.
(37:44):
But, again, it's not just the volume perspective. It's also
the surface perspective, and this is where you can do some really handy things
like looking within your brain at the cortex, kind of this almost winding pipe inside your brain in different regions,
to see maybe where some areas are getting a little more, you know, condensed. Maybe they're getting plugged. Maybe there's an anomaly in the
(38:10):
in the image there.
But these types of surface visualizations,
they require a different type of format for visualization.
It is called GIFTI.
I've never seen this in my day-to-day work,
but that
is helping
consolidate the image data into what's called pairs, kind of representing both the left and the right side
(38:32):
of the brain in those pairs. And she links again to some previous tutorials that she's authored
to import these files into R as well via a function called,
or a package, I should say, called gifti. Again, freely available. We'll have links to that in the show notes as well, where you can then
(38:53):
interrogate this,
surface imaging, you know, data
and be able to get different dimensional
representations
via, like, the locations,
the maybe triangle type dimensions.
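As a sketch of that surface import, and assuming readgii() is the reader the gifti package exposes (worth double-checking against its reference page), with the file name below being just a placeholder:

# install.packages("gifti")
library(gifti)

surf <- readgii("sub01.L.midthickness.surf.gii")   # read one hemisphere's surface file
str(surf$data)   # list of data arrays, e.g. vertex coordinates and triangle indices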
And again, you can plot these as well so you can get a visualization
of the different hemispheres of the brain, not unlike the hemispheres of a globe. Right? You have the left and the right, and then you can flip that around,
(39:20):
do different color
ranging depending on the intensity or the different areas
of these images. So you get kind of that heat map like structure
for the left and the right. Maybe some areas are having an issue, maybe more brightly colored
than others. And, again, you get the code right here in this post
for how you can define these regions
(39:42):
and define the different visualization
for how you can
distinguish those from the other areas and maybe more of the normal
representation.
So it is great in the world of bioinformatics,
in the world of other, you know, health data,
when we're working on treatments that are trying to help
deficiencies or maybe areas in the brain that are getting, you know, affected by diseases.
(40:06):
The one that comes to mind immediately is all the research that's being done in Alzheimer's disease where they're looking at things like the plaque, amount of plaque in the brain that's impacting tissue
as a hypothesis
to try to slow the cognitive decline
of patients as they're as they're dealing with that debilitating
disease.
But the first step, right, is to see what you got. So this great post by Jo, to look at the different packages
(40:31):
that you can import this data with and be able to quantify these different regions and maybe point those out via an additional
visualization.
It looks really top notch. So if you're in the space of visualizing these readings such as MRIs,
this is a wonderful post to kind of show you
what is possible here. And again, with links to really dive into it further with these great packages,
(40:55):
like I mentioned, RNifti,
as well as the gifti package. Yeah. Really great stuff here.
Yeah, Eric. And this is just super super cool, and it shows us just how fantastic the graphics capabilities
are in R.
And there were a few publications that were referenced in Jo's blog post that make me think about doing reproducible science,
(41:17):
and how just impactful this type of work is. And we can create these publication ready visualizations
programmatically
based upon the data.
And not only can we, but in my opinion, I think we have to. We must.
My only other takeaway here is that I need to see this somehow integrated with the rayrender package
(41:38):
for interactive
3D visualizations
of the brain and the different hemispheres.
So shout out to Tyler Morgan-Wall,
the author of the rayrender package. If you're listening, you know, no pressure, but it would be pretty cool. We don't nerd snipe on this show, do we? Never.
It's usually me putting the pressure on myself or you doing the same for yourself. So it's about time that we just start calling some other people out.
(42:04):
Alright. Well, if you wanna see more material like that and more, well, guess what? There is a lot more to this particular issue. As always, R Weekly is jam packed with additional packages,
great tutorials,
great resources.
We'll take a couple of minutes for our additional finds here.
And we were talking about those that are contributing via add-on packages to the R community in our data.table discussion.
(42:29):
Well, in terms of contributing to the language itself, we have covered a lot of great initiatives
to bring in
developers that are wanting to contribute to R itself in a friendly,
you know, open way, whether it's these meetups or these hackathon-type dev sessions with the R Forwards group and whatnot.
Well, another great resource that's being developed as we speak and really taking, you know, it to the next level
(42:55):
is what's called the CRAN Cookbook.
We'll have a link to this from the R Consortium
blog in the show notes, of course, and this is
meant to be a more user friendly yet technical,
you know, recipe-type book,
which is gonna help those new to the R language in terms of wanting to contribute to the language itself.
(43:18):
And it really is great for those that are dealing with issues submitting their packages to CRAN
and the different issues that they can come across.
They could be just about, you know, formatting your package's metadata with a DESCRIPTION file.
Could be about the
documentation
itself of your functions
and, of course, within the code itself. So I don't think it's gonna get into all the weeds of those C header issues that we talked about. But, nonetheless,
(43:45):
I think this is a great companion to have with, say, the R Packages reference authored by Hadley Wickham and Jenny Bryan
as you're thinking about, you know, getting that submission to CRAN and some of the things that might happen
that might blindside you if you're not careful,
but a great way and accessible way to look at how you might, you know, get around those issues and how to solve them in a way to get your package on CRAN. So
(44:11):
I know this effort has been in the works for quite a while. It's great to see this really maturing
and how it's being used by the CRAN team itself
and where they're going forward with it. So, yeah, credit to the team,
Jasmine Daly, Beni Altmann, and others involved with that project.
And, Mike, what did you find? Shout out Jasmine Daly, Shiny developer in Connecticut.
(44:34):
Heck yeah. Yeah. Yeah. Gotta love that.
A bunch of great stuff. You know, one
blog that I found, which was just sort of really nice to reflect on, was from Isabella Velásquez over on the Posit team. It's the 2024
Posit year in review. A little trip down memory lane of all that Posit worked on
in the last year. And, you know, a lot, obviously,
(44:57):
around
their R packages for interfacing with LLMs,
like elmer,
you know, Shiny Assistant,
shinychat, pal,
you know, as well as things out of the Quarto ecosystem,
including Quarto dashboards being a big one.
Obviously, all sorts of stuff coming out of the Python
ecosystem on both the R
(45:18):
and,
Python
or excuse me, out of the Shiny
world on both the R and Python
sides of the equation there. Some great advancements from tidymodels in survival analysis that were really impactful
to our team as well as a bunch of others across, you know, WebR.
I know that's one that, you know, impacted you quite a bit in 2024.
(45:41):
So it was just nice taking some time to do that reflection
on,
you know, all of the work and investment from Posit and the other folks that contributed
to projects that Posit maintains.
Shout out to myself with one small
contribution to httr2 in the latest release, just yesterday.
(46:01):
So thank you. It's, I think, 2 words in the function documentation,
the roxygen comments, but we'll take what we can get. I was on the list. So, thanks, Hadley, for including me among, I guess, 70 other folks who contributed to that latest release of httr2.
But it's it's cool to all collaborate together in the open, and I think that's all I'm trying to say here. And it was nice to to walk through a lot of these projects that have impacted
(46:27):
me
and my team,
you know, in 2024
and beyond. Yeah.
Excellent. And you're on the score sheet, as they say. They can't take that away from you. That is awesome stuff. Congratulations
on that. Yeah. It's amazing the breadth of contributions
in this space. And,
certainly, you know, AI was a focus for them with their awesome innovations of elmer and mall and the Shiny Assistant, which I'm really a big fan of now. I was one of the skeptics on that, so it's great to see them doing it and doing it responsibly. So credit to the team on that. But, no, they're not just resting on those innovations. As you said, the WebR stuff really
(47:06):
is jiving. It's really getting a lot of traction, and I can't wait to see where we take that
effort in 2025.
And when I say we, it's more like what George Stagg comes up with, and I'm just a very
shameless
consumer of it, but I love the stuff that he comes up with. So lots of great stuff here. There's never a dull moment in the Posit team here.
(47:28):
And never a dull moment in the rest of the issue, we say. Lots of great resources that Ryo has put together for us. But, of course, as I said, this is a community effort, and we cannot do this alone.
So one of the ways that we keep this project going is through your contributions.
You're just a pull request away from getting your name as a future contributor to R Weekly itself.
(47:48):
A great blog post, maybe a new package that you authored or you discovered, there's so many opportunities for it. Head to rweekly.org.
You're gonna find a link to get your pull request up there right in the top right corner.
We have a handy draft template for you to follow. Again, leveraging GitHub for the win on that. And our curator of the week will be glad to get it in for you. And, of course, we love hearing from you. And we did hear from one of our more devoted listeners about, apparently, I do not pronounce names well. And even though I practice it,
(48:19):
I got, called out for it. So, I'm gonna get it right this time.
Nicola Rennie. Sorry for butchering her name all these months in the previous highlights podcasts.
Thank you, Mike, for calling, not you, Mike, Mike Smith, for calling me out on that. I need to be honest with it. So
feedback warranted, and I may have to have a little cookie jar of, like, funding where I send a nickel every time I butcher her name in the future. Hopefully, never again. Nonetheless, okay. We love hearing from you, and the ways you can do that are through the contact page in the episode show notes as well as on social media.
(48:55):
I am @rpodcast at bluesky.social,
I believe, is how to call it. Again, this is still not natural to me yet. I'll get there. I'm also on Mastodon with @rpodcast at podcast.social,
and I'm on LinkedIn. Search my name, and you'll find me there.
And, Mike, hopefully, folks don't have a hard time butchering your name, so they're gonna find you.
(49:17):
You can find me, I think, primarily on Bluesky nowadays at mike-thomas
.bsky.social,
or on Mastodon,
@mike_thomas@fosstodon.org.
Or, probably even better on LinkedIn, if you search Ketchbrook Analytics, k e t c h b r o o k, you can see what I'm up to lately.
(49:42):
Awesome stuff. And a little quick shout out to, good friends of mine,
from the R community,
Jon Harmon and Yoni Sidi, because I've been using, in some of this R Weekly infrastructure I'm building,
some of the packages they've created to
interface with the Slack API of all things. So it's been pretty fun learning there. And, again, httr2 is involved in some of that as well. So it all comes full circle in this fancy schmancy calendar thing I'm making. So always learning all the time. So shout out to those 2 for making some really elegant packages to interface with an API of a framework that seemed really cryptic to me at the time. But now it's starting to demystify a little bit.
(50:23):
Alright. Well, we'll close up shop here for episode 192 of R Weekly Highlights,
and we'll be back with another episode of R Weekly Highlights
next week.