Issue 2024-W50 Highlights

Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
(00:02):
Hello, friends. We're back with episode 189 of the Our Weekly Highlights podcast. This is the weekly show where we talk about the terrific highlights that are shared on this week's Our Weekly Issue.
My name is Eric Nantz, and yes, we are now getting towards the middle of December. And it is the
season of giving in many, many were parts of the world. And, of course, over the weekend, my, kiddo was nice and gave to his entire family a nice little nasty cold that I'm just recovering from. Hopefully, the voice is up to snuff for this episode.

(00:35):
But, yes, I am here. But, luckily, I'm not here alone this time because fresh from his travels across the country is my awesome cohost, Mike Thomas. Mike, how are you doing today?
travels across the country is my awesome cohost, Mike Thomas. Mike, how are you doing today? travels across the country is my awesome cohost, Mike Thomas. Mike, how are you doing today?
you. And luckily, no Wicked Witch is, delaying your flights or anything, so that's terrific. Not this time. Thank goodness.

(01:01):
Yes. Yes. It is a busy travel season, so we're always, like, you know, consider ourselves lucky when things
go smoothly. And and, honestly, we are also very lucky that we have,
on my opinion, a fantastic issue. And I'm not just saying that because,
check check notes here. Oh, yeah. It was me curating it this time, but, no, this is awesome because I have all of you in the community to thank for your awesome resources. I was able

(01:26):
to merge into this issue, and and I dare say, I think this was a human smoother process than the last time I curated it. And we had some nice poll requests that I merged in and a lot of content that we can talk about. But we'll be focused on the highlights this this time around. But, of course, since this will be the last
issue I curate
in 2024, just wanted to do a dramatic pause there for a fact,

(01:50):
I definitely wanna thank the rest of the team throughout the year. It's been terrific fun to work with you all, and I'm looking forward to having more great great adventures of our weekly in 2025.
But without further ado,
let's get right to it.
And we are talking about now for our first highlight,
a very key piece in the world of the R ecosystem,

(02:12):
in particular with package repositories
that has had already a wonderful effect on many parts of the r community.
And with this recent news that we're about to share, we think there is even bigger things to come for this project.
What are we speaking about? This is the R Universe
project that has been led by R OpenSci

(02:35):
and their, chief engineer,
Yaron Ohmes, who
has shared with us and as long with the R Consortium blog
that the R Universe has been named
the newest top level project
under the R Consortium umbrella.
Your first question probably is, what does a top level project actually mean here, Overnon sounding really important?

(02:59):
Well, and technically speaking in the blog post, it is
a particular project that was now going to get
3 years
of funding from the ARC consortium,
and this is a recognition
of this being recognized
along with other projects that are currently
designated top level which include DBI,

(03:21):
the database, you know, common interface that many of the R Packages
involving with databases utilize,
as a key dependency.
The R Ladies initiative
and the R user group support program or the RUGS program helping to bring, you know, infrastructure
around having all sorts of R type meetups and community resources.

(03:44):
Our OpenSize R Universe is now joining
that effort.
And there's a great quote from the executive director of the R Consortium, Teri Christiani.
She says and I quote, the ability to find and evaluate high quality R packages is important to the R community,
and the R Consortium is pleased to support the R Universe project

(04:05):
with a long term commitment to help strengthen the foundation
of the r package ecosystem.
We are pleased to be working more closely together with our rOpenSci
on this effort. And there's also a great quote quote from the new executive director of rOpenSci, Noam Ross.
He also touches on, you know, the importance of our universe

(04:27):
and their excitement to work with the our consortium
to strengthen the infrastructure even more.
And we'll also have linked in the show notes. The the you might say the companion blog post to this from our open side directly,
from Noam Ross and Yaron who co authored
it. It highlights another point that I do want to emphasize

(04:49):
is that our universe,
yes, on its own is already bringing this revolution
of a, you know, a package repository
that's powered by a lot of DevOps principles,
a lot of automation,
and very very intricate infrastructure
to help you as a package author
give your users a way to have not just a source of a package available, but binaries that are compiled,

(05:15):
and more recently even WebAssembly compliant version of your package. That's a huge win right there.
But our universe is
now serving as the foundation of newer efforts
in the r package management ecosystems.
There is a new effort and I must stress it's early days,
but there is some excitement around the R multiuse

(05:38):
prod multiverse project, I should say. This is being headed by a team of people including Will Landau, the author of Targets,
and this is building upon
our universe's infrastructure on their automation, their compilation of binaries
to help bring a transparent
kind of governance
that combines

(05:58):
with our our universe's infrastructure
for another type of package repository.
Certainly, industries are paying a very close attention to this. It is early days,
but it just goes to show that the Our Universe platform
could be used for more things than just Our Universe itself. And that's what we're probably gonna see

(06:19):
in 2025
and beyond.
So I am thrilled to see this, you know, more, you can say, rigorous or just more robust backing of the project.
We know that times can be tough financially for certain vendors. So the fact that now,
our open side can count on this additional funding for this particular project,

(06:39):
I think it's gonna be a huge win for your own and and the rest of the our open side team as they make, our
our universe even more of a awesome platform
for a new and and older package authors alike in the our ecosystem.
Yeah. Eric,
one thing that I will say is a quick call to action as you get toward the end of the year and this isn't to to my organization's

(07:03):
own horn or anything like that but we donate every year as a charitable contribution
to the our consortium into our open side
if you work at a company
and I think that it would make sense and you use our there and it benefits your company, I think it would make a whole lot of sense to ask them to do the same thing. This is the season,

(07:25):
of a lot of corporate charitable donations, so it might be an easy yes and I think it's certainly
worth asking the question considering how much some of these projects have given back to a lot of us. Our universe is incredible. If we think about what we sort of had to do before our universe,
there was really no searchable

(07:45):
way to figure out what sort of packages
are on Korean. I mean, you can do it through, I think some of the API tools that'll give you a list of all of the packages on Korean right in your console.
And then you could do some filtering, but there's no interactive
GUI, very visual,
hyperlinks to all of the vignettes,

(08:06):
with a hyperlink to the GitHub repository,
with graphs that show when the most recent contributions were and who the contributors were it's just this fantastic visual medium
uh-uh to be able to take a look at all of the different our packages that are out there that at least have been picked up by the our universe project and

(08:27):
the scale of it still
blows me away.
This this whole infrastructure that your own and others has created
is it's mind blowing that you know the amount of automation that's involved here,
you know, the the way that packages are are being built as binaries
instead of users needing to, you know, install from from GitHub

(08:48):
and actually build that package themselves.
It's it's absolutely incredible.
And to think that there's a a multiverse project out there and more coming on top of this is really really exciting to hear. A question for you, does does multiverse mean multilingual
or are we still just talking r?
It is still focused on r,

(09:08):
but,
I actually don't have the full genesis on how the name came to be, but I think it is trying to
bridge, as I said earlier,
these, you know, concepts that can be derived from certain aspects of, say, CRAN
versus certain aspects of our universe, and we're blending them together in a way that kinda takes the best of both worlds, as I say. And, again, this is all very early days. You'll hear more about this in 2025,

(09:37):
but it is, you know, taking
the the best of additional frameworks here and and certainly our universe.
Without our universe, our concept, like, our multiverse is simply not possible because
can you imagine, Mike, you and I, even if we had a lot of funding spending this up on our own, like, the amount of engineering it takes to build this

(09:58):
in a way that now, like I said, can be built upon with its API infrastructure, with its
automation, whether it's powered by GitHub actions or other slick services,
just the amount of attention to detail that's been laid here. Yeah. We are we are very thankful that this exists at all.
Yeah.
I can't imagine how much the government would have paid a big four consultant

(10:24):
to put together a project like this. I think what your own has done, government dollars is probably
probably 1,000,000. But
Sort of looks that way when you look at the UI, doesn't it? Because everything is so polished and,
even looking at you know, you can have we we often say in the art community, we have these, you know, verses of packages such as, of course, the tidy verse. And in my industry, a a suite of packages called the pharmaverse.

(10:50):
I have an entry to the pharmaverse right on this page. And when I click it, I immediately see all the packages
inside that. It's very searchable,
very you know, like I say, you can put an API in front of this should you wish. There's so many different ways to get to the interesting parts of a package, including the documentation,
which is rendered on the spot in this platform and is absolutely

(11:14):
the the attention to detail cannot be understated. So I I expect there's gonna be more really big things coming
to this platform that, again, our universe itself will will be a front runner to this, and then other projects like multiverse are gonna hugely benefit from this too.
This is so stinking cool. I'm looking at the the pharma verse sort of landing page in the our universe under underneath our universe, and it's it's really

(11:42):
really cool that the visuals that we have access to just the amount of of automation and ease of use for,
working with our tooling and our packages out there and that the way that things are really beautifully organized and documented here is is fantastic. And I guess just to go back to to an earlier comment, and please don't fire me from the R weekly highlights

(12:04):
podcast, but if I could speak it into existence,
it would be pretty cool to have something like this that could service both R and Python
packages.
You know, if we think of we're doing a lot of work building
both types of packages for the same sort of function. If you think about what POSIT's doing with, like, both GT and great tables, it's pretty much the same functionality just serving both user bases

(12:29):
and we've been doing a lot of the the same lately,
and to have sort of one place where maybe the binaries are already built as well to install those things would be be pretty cool. That's why I was thinking multiverse might be multilingual, but but we'll see. Just throwing that out there.
Yeah. Who knows? We'll take it. And, admittedly, I may have stepped into it a bit on

(12:52):
on Mastodon a little bit when I couldn't resist taking the bait from, Bruno Rodriguez when he was talking about his, disdain
for managing Python dependencies
and educational projects. And I had a, you know, rather snarky comment about this is one of the reasons why I try to avoid Python. It's going through those nightmares.
But hey. You know what? They could learn a thing or 2 from our universe. I'm just saying. Because I don't say anything like this with Pypie or anything of that sort, but I think a lot of people,

(13:22):
you and me included,
I do a little share of Python on the side here and there,
would love to have this kind of curated resource,
for multiple languages. I I think who knows? Maybe you heard it here first. Maybe we'll look back on this a few years later, and we'll say, hey. We're the ones that spoken in existence. Who knows?

(13:42):
We can hope.
Well, Mike, as usual, you kind of have a crystal ball. We were talking about multi language type,

(14:04):
situations. Well, I think our next highlight is very much
about a new product, an IDE, that is very much trying to be a multilingual
data science,
IDE powered by the latest innovations
in software engineering,
and we are speaking about Posit's new IDE called positron,

(14:25):
which had its beta earlier this year. And then, of course, there's a huge focus in in quite a few talks at the recent positconf.
But that, of course,
posit has made the RStudio ID for many years, and it's got a lot of engineering behind it, a lot of commits, a lot of features behind that.

(14:45):
So you as the user who may have been using our studio for many years or even a few years, and you hear about positron, you may be thinking,
I wonder if it's time for me to take a serious look at this. Well, there have been a few,
few posts that have addressed this, but the latest post I think is, got some real nice insights here. This is coming to us from the jumping rivers blog, in particular, offer by Theo Roe. And the post is titled

(15:13):
positron versus our studio.
Is it time to switch?
Now, of course, we always throw the caveats. This is certainly a subjective
decision,
but I like what this post is doing. It's kind of laying the facts down on what's currently available
in between both environments and kinda comparing and contrasting
different aspects of what you, as an R programmer,

(15:37):
expect to see or have to utilize in IDs of the past.
So the first thing right off the bat we wanna mention is that Positron is not solely focused on r. It also has
support for Python
and Julia,
but also additional languages
that wanna come to play, so to speak. There are ways

(16:01):
through its APIs under the hood for a new language to talk to its engine, so to speak. And we'll talk a little bit about how it does it with the r language in a little bit,
But this has been built
unlike with RStudio, which, let's be frank here,
is predominantly an r based IDE

(16:22):
with a little bit of Python here and there with Reticulate, but you don't need that with Positron.
If you have Python available,
if you have R available,
Positron is gonna pick it right up. And there have been many, many people, you included Mike, who just mentioned you may be switching between the frameworks
for a given project.
In positron is just a toggle in the upper right corner away from switching from an R environment

(16:47):
and R interpreter
to a Python interpreter.
So that is already, I think, for those that are operating in multi language projects,
a huge win in the positron favor.
There are other things that you might have to get used to in positron that maybe
we're kind of bolted on to our studio later in the game, so to speak.

(17:09):
Such of one example is the command palette.
If you use IDEs like visual studio code before or Adam before that, you may be used to the command palette as a way to quickly bring up a well, it looks like a little search box. You You start typing a few keywords and it will auto complete
to a particular command based on what you're searching for, such as maybe

(17:31):
adding a new git commit or bringing up a new file, rendering a new app, or things like that.
RStudio
never came with a command palette until probably about a couple years ago or so.
Now there there may be, you know, those that might say it doesn't quite feel as native in RStudio as it does in Positron.
But in Positron, you kind of have to get used to it because that's the way to really unlock most

(17:57):
of the functionality in an IDE
of a Positron
is to interrogate that command palette to get to what you need to run or what you need to open and things like that. So there are ways you can bind additional shortcuts to that,
but it is something to get used to alongside the way that it handles settings.

(18:18):
Those are a bit different too.
Rstudio, there was a way to find them in, like, your in a config file in your home directory or somewhere else.
Positron is similar, but it's kind of agnostic
to,
you might say the language being used inside. There are plugins
or extensions based on languages
from time to time,

(18:39):
but it's basically a JSON file. You can get to it
either using the interactive
setting toggle in the editor
or just editing that JSON file directly. You get the kind of choose your own adventure with that. It would trip people up in the RStudio world sometimes to figure out exactly where that file is stored
and trying to edit it outside of, like, the GUI elements

(19:02):
of the settings. So that might be advantageous to you if you wanna really customize
your experience with Positron
but do it through, like, a file based way.
Other things to watch out for or there may be a benefit to you depending on your perspective
is that in positron,
you're not necessarily
gonna have to spin up what are called those r project files. So the file is starting in or ending in dotrproj.

(19:30):
Our studio used that extensively
for things like setting up a git repository
in a project, setting up a package,
other other uses like that.
In positron,
It's just the folder.
You can have what's called a workspace,
set of settings in that folder, which basically are a way to customize settings per folder if you wish,

(19:53):
but you're not gonna need that dot rproj file to tell positron
that you're working in an r project when you work with that.
There have been some people that really like that file, and there have been an equal number of people that really don't like that file in the repo. So that might be helpful to you if you've, you know, had some angst on that in the past that Positron

(20:13):
is kinda doing away with because, again,
they're building upon an open source clone of Visual Studio Code. They didn't they're they're piggybacking off of another effort with a lot of toying on top of it. Whereas our studio was built literally from the ground up to be a first class r based, data science editor.
A couple other things I'll mention before I turn it over to Mike here is that

(20:37):
the layout will look a bit different.
I've been used to the layout in Positron because it does have a lot of similarities to the Visual Studio Code layout
that I've been using a lot of my open source projects, but you will often see in a default layout in Positron,
the file pane is on the left,
and this file pane

(20:57):
has a lot more going for it than I think the one in RStudio does, and I don't think I'm upsetting people when I say that. There's a lot more you can do
in the file pane and positron
versus what you could do in the RStudio
file pane, such as even just expanding
folders with, like, the toggles to expand the nesting or whatnot.

(21:18):
Little things like this can add up for a bigger project. Trust trust me on that.
But then you also see on the left side your extensions in another tab,
your,
Git,
integrated Git console in the other tab.
There are people that like the the visual visual studio code or the positron

(21:38):
way of doing Git versus the way our studio does Git. Your results may vary depending on where you fall on that fence, but it's all right there. There isn't anything special you have to do for it. It'll pick up
Git right away.
But with that, there might be some things that aren't as intuitive in the beginning. You just have to play a bit a bit.

(21:58):
But with the extension ecosystem, you can supercharge your Git experience with extensions like Git Graph.
There are other ones called Git Lens that can really do some slick things for your command pallet and Git operations.
There's there's a lot going on here.
The other thing I'll come in on that I think is something you wanna look into

(22:19):
is the data explorer and positron, which is something the visual studio code does not have. This is something new
that posit created in this version of positron.
You get a much more richer experience for
when you're looking at your data frames
to sort the columns by multiple columns if you wish.
The filters are gonna be much more intuitive to work with because they're gonna be

(22:44):
across the top of your of your data frame
and instead of above the actual column itself. So that way you can navigate them much more quickly.
And it can handle larger data sets about, you know, causing your IDE they'll wait by 5 or 10 seconds although the the snapshot arose.
So there's a lot of engineering behind that I've heard from the previous talk, so it might be worth a look.

(23:08):
If you really find yourself using that data viewer in RStudio a lot, I think the positron data explorer is something you where you wanna
take a look
at. There are some things to be aware of that just aren't gonna feel as native right now,
such as the use of add ins that RStudio used to kind of give you that additional functionality in the editor without

(23:30):
building it into the editor itself.
I'm hearing there isn't like a direct one to one to use those just yet.
Although it makes me wonder because
with the our extension to visual studio code that I use for many years,
there were,
features that were added to extension a couple years ago
to leverage add ins in Visual Studio Code. So I know it's possible, but we'll have to see if positron adopts that or posit adopts that for positron

(23:59):
in the future.
So with that, any
additional functions that use that Rstudio API package
that was often used on the back end of Rstudio itself to kind of interrogate features of the ID, that's not going to work well either.
And Rmarkdown,
you can do Rmarkdown in positron but it won't feel quite as native

(24:21):
as it will in, the Rstudio ID.
But if you moved on to quartile that may not be an issue for you because quartile, however, has very first class support
in positron.
So the question might be have I switched?
I am not fully switched yet, and I'm not saying that because I'm using RStudio a lot. It's because I'm still using Visual Studio Code a lot because I have so many workflows that have built upon it. However,

(24:49):
now that
Bruno and company have figured out a way to get positron installed on Nix systems
yes. I have positron on my Nix system. So I am trying to use it. I'm trying to adapt my visual studio code workflows into that.
Mostly going well.
The thing I miss the most
is the dev container stuff. I live off the dev container feature that visual studio code has.

(25:15):
Unfortunately, there's no easy way to get that in the positron because that's a
Microsoft
specific extension.
It's not an open source extension. That's another thing to keep in mind.
There may be a few extensions in Visual Studio Code that do not work in Positron
because it's using the open source extension registry,
not the Microsoft specific one.

(25:38):
So with those caveats in mind, I think it has tremendous potential. It is still in beta.
So your results may vary depending on the project you like or your entire use of it. But the foundation is there
to carry out positron, to really carry out its vision, to be in a first class multilingual
environment. So I'll be watching it closely,

(25:59):
and I'll see where it breaks and share when it does.
I'm gonna continue to watch it too, Eric. I
have not made the jump yet, and I need to. I need to start exploring it a little bit more. I am very locked into Versus Code and and dev containers
and I guess I'm gonna blame you,
as opposed to myself.
I'm gonna deflect here and say that, you were the one that started me out down that journey that has,

(26:25):
locked me into that tooling for right now. I'm just kidding. It's
hugely hugely helpful. But,
the command palette,
you know, we should talk about the the fact that positron so closely
mimics Visual Studio Code. And I think for a huge
proportion of data scientists out there that are
more comfortable

(26:46):
in probably our studio
than in a, you know, more developer type environment
like Versus Code, Positron is a perfect bridge
between those two things, in my opinion.
I think it starts to bring in some of the best elements from something like a a v s code or, you know, like a full stack developer platform,

(27:09):
into
a an environment that RStudio
native users might be a little bit more familiar with.
You know, the command palette being one of them. I I know that there was a command palette that existed and it and exists now in our studio, as you mentioned, for the last couple of years, but,
it's

(27:29):
not quite as obvious, if you will, as the command palette that exists in in Positron. That's sort of, you know, a little bit more obviously put in front of you and drives a lot of the functionality,
of of the IDE itself. You know, another thing that I think is a benefit and one thing that I love about Versus Code as opposed to RStudio

(27:50):
to be able to sort of search all of the files in your projects,
there is a Find in Files button in RStudio
underneath edit and it it's handy. Works very very well. It takes a couple clicks,
you know, or if you know the keyboard shortcut, you'd get to it a little bit quicker. But there's a giant magnifying glass in the left hand sidebar in Positron and Versus Code that allows you to

(28:14):
immediately
do that. I think it's it's much quicker, and these differences are subtle because I I think the functionality still exists in both places,
but the UX is just a wee bit better,
in in Positron
than in RStudio that I I think, you know, that slight difference,
makes all the difference to some extent, if you will.

(28:37):
I think a really interesting thing is, you know, the lack of a need for dotrproj
files or our projects
you know this is again sort of moving folks away from
our specific workflows
into
you know slightly more developer specific workflows and understanding how to interact with working directories without

(29:00):
running set working directory if you can help it right
so I don't know how this impacts the the here are package that was developed by by our studio and now pause it I think it'll still work fine because it'll look at your working directory
as well and I think it'll create relative links to that, but I know that some of the functionality of the here package,

(29:21):
actually looked for that dot our project file if I'm not mistaken,
and sort of recursively search to be able to find that to figure out, you know where the the working directory was that needs to be set
in order in with respect to all the other files that you want to work with so I'm not sure how that plays into this whole positron

(29:43):
IDE
e in the event that you know you you have a lot of workflows that depend on here
and maybe it's a little less stroke straightforward to create a new our project although I do see in a screen shot here under the workspaces our project section
it's very faint but if my eyes aren't deceiving me there's actually a button that says new project

(30:07):
under our so maybe that takes care of that that for you I know that you know you may not necessarily need to do that, but if you want to do that and you have a lot of,
you know, workflows within your organization and your team
that leverage our projects and all of the different functionality that that comes along with that and you wanna continue to do so, it looks like the functionality is still there for you. So that's just something to something to watch out for as you move from RStudio to to Positron and sort of decide on which

(30:37):
pieces of of functionality in your current workflows you wanna
continue to to leverage in which you you may wanna change,
to evolve, you know, your team's practices, if it makes sense to do so. But I thought that this was a really nicely comprehensive
blog. I I do think that one of the strengths here that we're going to get is the vast, vast ecosystem,

(30:58):
of extensions within that open
VSX
repository or community of extensions, whatever you wanna call it. I know that there's a lot of RStudio add ins but I can pretty much definitively
promise you, that the number of extensions in the OpenVSX
ecosystem
probably is is quite quite larger than that.

(31:18):
One thing that
I
absolutely am envious of for the Positron
users who are already using it and and something that'll probably push me over the edge here to start using it is the data viewer. There is one in Versus Code. It leaves a lot to be desired.
Obviously, the put it nicely.
I know.
Obviously, the data viewer in RStudio

(31:40):
is is is great. You know, it's it's geared towards data scientists and exploratory
data analysis.
But this is
what we have in positron is is the Rstudio data viewer on steroids I would say you have you know column
level summaries
summary statistics
including missing values

(32:01):
in the left hand sidebar
while you're viewing
your
data, in the majority right side of the screen as well as the filters that have all been applied that can easily be, you know, added to or removed,
through a click of a button kind of along this nav bar at the top. I think it's a fantastic
UI.

(32:22):
I think it's really super powered data viewer compared to anything else that I've seen today.
I have seen some products like this that are just, just, you know, standalone
SaaS platform data viewers, if you will. But to have this in our IDE, in the same place that we're doing our development work is really really exciting.
So, you know, excellent job by jumping rivers. I think to summarize, you know, the trade offs here and the benefits that we have from the Positron IDE. And I'd encourage anyone that hasn't

(32:52):
had a chance to check it out yet,
self included,
to
to check it out as soon as you possibly can.
Yeah. I mean, certainly, it's becoming easier to install as I suffer even, you know, major geeks like me and others. Now we can install it on next. I have put it through the paces a little bit. I started theming a little bit. You're right about that extension ecosystem. There is something for everything. And one of my

(33:16):
favorite extensions I was using in my live streams way back when, turns out it's available in the open source one, they call it power mode, where when I type, I get these nice little, like, explosion
sparks happening
next to the words as I type just to give a little flair. And I was like, there's no way that one's on there. Oh, sure enough. It is. So I can replicate some of my live streaming experience in positron that I had for my shiny dev series stuff from a while back.

(33:41):
I will also have links to a couple additional posts, some of which have been covering highlights before.
One of those was some Athanasia Mowinkle, and she talked about her experience of Positron.
I hate to say it looks like that here package or project stuff isn't working as nicely in Positron as we're hoping for according to hers. So I'll have to see if that gets better

(34:02):
over time. And as well as a post by Andrew Huysse who also has used Positron a bit and gives his 2¢ on his favorite extensions and the customizations
that he's done to make it, you know, the experience more fit for his
workflow. That's a key right here. Right? Is that Positron is already
giving you a lot of nice, you know, all out of the box

(34:25):
configurations
but there's nothing that's locked so to speak. You can tailor that to whatever you see fit and with the power of the vscode
you know open source code I should say foundation
you can do all sorts of things. You can get your VIM key bindings. You can do all sorts of interesting
ways to make that your own experience.

(34:45):
So
I will admit at the day job, we can't really use,
positron yet because it's not part of the,
posit workbench enterprise product just yet. They're obviously gonna
posit's not gonna put that in until it's production ready, so I still have to wait a little bit for my work projects. But sorry for my open source stuff, I'm gonna give it a go and

(35:07):
report back on what what breaks and, hopefully, what works even better.
You know, Mike, we are in the you might say now we're starting to get into the doldrums of winter, but

(35:32):
our next highlight has a very interesting title to it because it speaks on a few different levels.
This is
talking about one of our keynotes
that was given at the recent R pharma 2024
conference
by
Opposits
CTO, Joe Chang himself,

(35:52):
on the new tooling that's coming to the our ecosystem
with the realm of artificial intelligence and interacting with large language models.
The talk was affectionately to titled
Summer is Coming
AI for R Shiny and Pharma.
So we have talked about some of the new tooling already in previous highlights of this show when we spoke very highly about the Elmer package,

(36:17):
which is a key focus of this talk, which is giving you
in the our ecosystem,
very, you know, robust compliant way to call different LLMs both
in, you know, third party services like chatgpt
or Claude
or our others, as well as self hosted versions of it,
as well as the accompanying package ShinyChat,

(36:40):
which gives you a way to bring that LOM kind of console experience
into your Shiny applications and build upon Elmer to do that.
This talk was a tour de force of a few different concepts, but I wanna set
a little bit of context here because,
first, I have shared on this show. I've been a skeptic as anybody about kind of how AI can be put in directions that shouldn't be put into. It can be almost nauseating

(37:10):
scene.
Some of the fluff that's put out there on cough cough LinkedIn about some of the weird uses of it.
But guess what? I wasn't alone in that skepticism.
Joe Chenk himself was very skeptical of this. It took him a while to warm up to this.
He had some epiphanies earlier in the year

(37:30):
and combined with, you know, getting to know
that the AI tooling as as, as you see it,
there's more than meets the eye to steal the frames from transformers
because you can build on top of these services. And that wasn't something that came obvious to him right away.
But this talk is first introducing

(37:51):
again, the aforementioned new tooling
in our, the Elmer package and the shiny chat package
with the demonstration
that was lifted from
deposit conf talk he gave where we have what's called the restaurant tipping Shiny app,
where instead of the app developer having to build a whole bunch of sliders,

(38:12):
select inputs,
toggles
to try and be proactive,
so to speak, on what the user wants to do to explore the restaurant data.
There's, like, a chatbot on the sidebar where the user can type in
a key question like, you know, what is the average tip rating for males in this year or whatever?

(38:33):
That is a prompt going to an LLM to translate that prompt into a SQL query
and update the Shiny app on the spot.
That was
an eye opener for me when I first saw that.
And then when Joe had mentioned that he was going to give this keynote at our pharma,

(38:54):
he, you know, had had a quick call with me to ask, you know, what what can we do to make this a little more
relatable to the life sciences folks because, yeah, we all love restaurant data. But this audience in particular, we can be pretty skeptical of things. Let's put it that way, and we we often have to be right. It's a very high regulated industry.
So I gave him a little seed of what if we take part of the shiny app that we did for this, our consortium submissions working group, where we sent

(39:24):
a traditional Shiny application with a few different summaries
to our health regulators
as a as a way to prove it out that we could we could send a Shiny app for a submission.
There's a portion in that app, I'll have a link to in the in the show notes,
where we have a survival type plot of time to event, so to speak.

(39:45):
And we had a couple sliders and toggles that were built by the teal package to explore that data going
into the plot. I thought,
why not have that chatbot
in this display?
So
Joe, to his credit, it only took him about a week to do this, but he spun up another demo for this presentation
to take that Kaplan Meier interactive visualization

(40:08):
built with ggplot
and put a chatbot into the left of it so that we could ask similar questions on different partitions of the data,
and the plot would update on the spot.
New sample sizes, new distribution curves, or survival curves.
Amazing.
This to me, to be, you know, putting my head my spin on this,

(40:29):
is a very intriguing feature when we start looking at data reviews and hopefully
finding ways to get to insights more quickly, but in a controlled way.
When I say controlled way, that's another part that Joe emphasizes here
is that the way this is all built is a very intelligent

(40:49):
yet diligently structured prompt
that's going to the chat server or the LOM
when the app is launched so it has a context set correctly.
Now correctly may be
a a strong word here because
nothing's ever absolutely perfect in this realm of LMs,
but it's trying to control the possibility

(41:11):
to its best extent
of the l m giving complete nonsense
to the result coming out and telling it to do a SQL type query
that is getting gonna be used to filter the data going into that plot,
kind of a translation layer on top of it.
The other key concept is that these packages

(41:31):
are leveraging
another functionality
that you might need if the LOM can't do everything on its own. That's a concept of tool calling.
Another eureka moment in my mind
where maybe in the example you gave in a talk, you ask an LOM what's the what's the current weather in California.

(41:52):
It may not be able to do that on its own because it kind of needs to have an interactive way to look up that weather at a given resource. So
Joe's example was giving it access to an r function
that calls an API for weather data.
Having the function be documented with a parameter like the city name or whatever have you,

(42:12):
the LOM calls that function
and then takes its result and it gives it back to the user. But it's kinda like that assistant
to the LOM to get the job done.
That was the overall moment to me is that we don't necessarily have to be limited
by just what the l m can do on its own.
We can augment it with other services, other ways that if you can code it up in an r function,

(42:38):
you might be able to use it as this tool paradigm
with what Elmer can do to call to these LOMs for you
on your behalf.
And then the last part of the talk was talking about some of the practical considerations,
and there are quite a few.
I think the the parts that show me that he is still grounded in this.

(42:59):
It's okay to say no, folks. If they've given you results that don't make sense,
it just may be time to move on to a different solution.
But putting in use cases where the answer is not always so black and white,
it may be more of a layer to get to a final answer
where you have a little more flexibility,
but also keeping a human in the loop,

(43:21):
which, again, in my industry, you better believe we better keep humans in the loop when we look at look at these results,
there's still a lot of productivity
gains to be had
if you can harness this the right way. But, again, a very
realistic talk. Again, he's excited about the tooling,
but he is being realistic too. This is not gonna solve all the world's problems. It's not gonna magically put our drugs on the market, like, half the time as it currently takes.

(43:50):
But I think this can greatly help certain aspects of development
such as the way we produce either applications
or produce
our tooling to interpret this data,
get us the insights more quickly.
There is a really robust q and a after the talk. I had the pleasure of moderating that, but then we'll also have linked another

(44:10):
dedicated talk from our APAC track
of our farmer where Daniel Beauvais
led a q and a with Joe Chang himself who actually called in later that night around midnight his time to join that call just because he was so passionate about connecting with the Asia Pacific colleagues on it, and there's some great q and a in that in that session too. So

(44:31):
am I still skeptical?
I won't lie. I'm still kind of skeptical of certain things.
But what Joe gave me in this talk
was a way to show that, like I said, there's more than meets the eye for how he can leverage these l o m's
and the tooling in front of them to craft a solution that I think is more fit for purpose
to your particular needs

(44:53):
and cut out all the noise you see in the various social media or other, tech spheres.
I'm I'm very aligned with you, Eric. I think that the way that Joe is approaching these concepts and the way that Pauseit, in general, is building this tooling out,
I think is
is fantastic, and it and it aligns with sort of what I would hope for. I was I was quite skeptical, and then, Eric, I know we were both at Positconf

(45:19):
this past year,
and I watched that keynote talk, that hour long talk from Melissa Van Bussel
on practical tips for using generative
AI in your data science workflows,
and
it changed things for me a little bit.
It was very applied.
It was very geared toward the audience.
And,

(45:39):
just to be honest, there were a lot of things that she had presented that I didn't know were possible.
And I thought that I knew everything that there was to know about about AI and I I thought that the cons outweigh the pros, but
that talk in particular
sort of brought the the pros up to to maybe as even with the cons, maybe if not if not more,

(46:01):
and maybe want to start to look into things a little bit more in this space. Try some things out and tune out, like you said, maybe some of the LinkedIn
narrative marketing hype BS that's that's out there right now. I don't seems like
AI agents
is is all I'm hearing about these days. I don't even know what an AI agent is. I don't really care to know either. Agentic or or whatever. But, yeah, another mind blowing thing. And, you know, hats off to you. Did a fantastic job moderating this talk.

(46:33):
It's
well, well, well worth your time, if you're in the R or data science space and and trying to make heads or tails of these l l m's. And I think that the tooling again that Joe and the the team have put together
for us is really really cool. I mean you can't watch it and and say that it's not super cool or super interesting. Whether or not you wanna leverage it is totally up to you and your use case. But some of the possibilities that we have here are really really cool. And thinking about

(47:03):
these large language models is maybe a step in the workflow,
and their ability to call, you know, another process like an an additional
API
is really really interesting
and work with them. I know that recently,
the open a I,
if it can't find the answer to your question and it's training data, I believe it can like execute a Google search,

(47:29):
or a web based search and then sort of fairly quickly execute that search,
process the results that are coming back,
you know, maybe just looking at the first few links and trying to crawl over them and and leverage those as the context that it's using to try to answer your question, which is
is pretty incredible.
I know that that, you know, just as a tangent here, while we're sort of still on the AI topic,

(47:52):
the SoRA model was released from OpenAI
yesterday,
I believe,
which was a long time coming and that is supposed to be text to video.
So, you know, check that out. I would recommend,
even if you're a skeptic and you you you're really really against this stuff, I think it's it's worth watching just to educate yourself and understand what the art of the possible is because the art of the possible is changing every day.

(48:17):
And we're trying to do a lot of thinking
at Catchbook about how we are going to integrate these
into the Shiny apps that we we develop for our clients in a way that that makes the most sense isn't going to just,
you know, involve our team going crazy
with all of this stuff to the point where we're just, you know, strapped for resources because everybody wants this. We're we're really trying to work hand in hand with our clients to figure out, you know, how and where it makes the most sense to

(48:47):
leverage this type of technology.
So the
videos like this and the tutorials,
that really take a practical
approach and hands on demonstration
about
how to go about injecting this functionality into your your Shiny apps,
are invaluable
for us. So a big thank you to you, Eric, and and Joe, for all of the time and effort that you've put into trying to do that for those of us on the ground.

(49:15):
Oh, I felt he he did all the the hard work. I'm just, like, are you are you kidding me? Is that even possible? I mean, it it is it is amazing to see
what we can do. And and, honestly, yeah, I I definitely had almost like a closed eye perspective on this. I just got
I got, you might say, perturbed too much by all the noise out there before really giving it a fair shake. But, like you said, we were sitting next to each other at Pazacom. That was step 1. And then step 2 was this talk because

(49:45):
now it wasn't just a quote, unquote, you know, fun toy example. Now it's like, okay. What can we do in life sciences that will open some eyes? And there are so many other areas we're pursuing too. It's not just, you know, quote, unquote, the interactive data
reviewing. There are many other realms of automation
that we wanna use to
make the mundane more

(50:07):
done more quickly
and hopefully find advancements
in training
these models or giving it the prompt to train itself kind of on the fly.
But that that there is another part of that talk that you all should see. Again, being realistic here. There's some areas that surprised them as well at POSIT,
such as when they try to use LOMs to help ingest their documentation,

(50:29):
their technical
documentation,
and then putting, like, maybe a bot in front of that. They had very mixed results on that, which kinda surprised them. Right? Because technical documentation,
that's literally the source right there. Right? You would think an LOM can ingest that and then immediately when they get a question about it be able to surface that more effectively. And I know others are looking into that as well. I thought about that area

(50:54):
for some of my internal documentation
because I write a great website on
using r and HPC environments.
It'd be great to have a little bot next to it that people can type their question on, and it's gonna use that doc to kinda help point them in the right direction without,
always emailing yours truly when something goes crazy. Not that I'm not that I don't like helping people, but, there's a there's a balance. There's a balance there. So I'm I'm intrigued to see where that goes for sure. Me too. Yep. No. Boundaries are important.

(51:25):
There are no boundaries, so to speak, when you see just the breadth of what's possible in this ecosystem these days, and I dare say they are with the issue.
Not to sound biased here, but I think we gotta build something for everybody here and all the the full gamut of
new packages, and there's a a good chunk of them in this issue. I'm I wasn't
I wasn't, I wasn't shy about putting all these great new packages in here as well as updated packages.

(51:49):
Some really interesting tutorials, so we'll take a couple of minutes for
our additional finds here. And
it's December.
If you like me like music and you're on social media, you're on. Lyc.
These the the Spotify
wrapped
post that often people are showing about their favorite
tracks that they've listened to in 2024.

(52:12):
It's always, entertaining thing to look at.
Well, Andrew Heiss, we mentioned him earlier,
kinda took matters in his own hands because, a, he doesn't listen to Spotify, and frankly, neither do I, but he listens to Apple Music.
So he has this great post called Apple Music wrapped with r, and he leveraged a way of exporting out

(52:33):
the metadata
associated with his listens from Apple Music and Itunes
because there's some somewhat interesting XML based files that you can correct from this.
Built some code via tidyverse type packages to process that XML data
with a little bit of intelligent,
like, time lapse, you know, summarizations.

(52:56):
And he was able to derive basically those key metrics that we often see in the Spotify rap and, you know, got some interesting results on what he's listening to and,
to the surprise of no one potentially. I don't know Andrew personally, but apparently, Taylor Swift is in his, his top, tracks to listen to, which I think many in the world would have that same thing.

(53:18):
This inspired me and not just on first Andrews as usual. Don't know how he finds the time to do all this and plus he
he,
wrangled some gnarly XML to make it happen.
But I've recently spun up some really intricate
self hosted version of my music listening.
I took a day during my break for Thanksgiving,

(53:42):
and I ended up ripping a whole bunch of my CDs that I bought when I was a teenager, music CDs,
onto my beefy little server here in the basement because I thought these CDs aren't gonna last forever. I might as well rip them up and put them on the MP Threes on my or FLAX, I should say, on my server.
But then I thought, well, it's great that I have them out of these files. That's no way to listen to it. There's gotta be a better way to have, like, a Spotify

(54:05):
like experience.
So I found this program called Navidrome,
which basically I can put in a Docker Compose and my Docker container,
serve up the MP threes from a directory,
finds the album art, finds the metadata
for the artists and,
you know, the track and whatnot.
And I can basically listen to my songs even in the web player,

(54:28):
which doesn't look like much to shout about,
but it's an API under the hood much like our universe has an API under the hood. And if you heard about a framework called subsonic,
if you have a Subsonic
compliant player,
you can basically tap into that service and put it on your phone, put it on your computer, whatever have you.
So that's great. But then I thought, well, it's not really keeping track of what I'm listening to.

(54:51):
Sure enough. It has a plug in for that too, combined with another project called Melaja.
Don't don't ask me how do you name these things, but it basically gives me a way to track
every time I play a song. But I keep that data in house folks. I ain't going to Spotify. They ain't going to,
what's it called last FM or anything.
So next year, I'm gonna speak this in existence. And Mike, you're my accountability buddy here. I'm gonna make a version of Spotify wrapped but completely self hosted with my geeky taste in music. So you heard it here first. So, Andrew, thanks a lot. You may have just nurse sniped me into another project.

(55:30):
How do you top that? I have no idea.
I don't know. And if for the folks that have listened long enough or or know us, there's nothing more we love than than music and nerding out. And when you can combine the 2 of those, it's bad news for everybody else. But that's a great that's a great find. I wanna call out a blog from, for any of the cinephiles out there. A blog from Mark h White the second who is a PhD, and it is about,

(55:58):
predicting best picture at the 2025
Academy
Awards.
As of he's updating this weekly with his predictions
on the win probability,
that he's seeing based upon,
the sort of critical reviews that are posted online.
And it looks like, the brutalist
is just ahead of Wicked,

(56:19):
in the ranking of
which,
movie is is most likely to win best picture at the 2025 Academy Awards. So really really neat little blog post. Nice little interactive,
visualization here at the top of it. And check-in each week on the blog to see who's winning.
That's awesome. Yeah. I know a lot of people like to do those predictions and

(56:41):
it's always, you know, somewhat fun, sometimes scary trying to predict what ends up being a very subjective
voting as the Oscars are or Academy Awards are. So, yeah, I'll be interested to see how that shakes out. And, like you said, very interactive, plotly visualization,
Plotly, another package. I should give thanks to Carson Seifert every time I see him for this because I use it in all my apps usually for shining,

(57:06):
but,
but there are other ones too. As we all know, we can't say Plotly without giving good kudos to echarts from our good friend, Jon Coon. So we're fair and balanced here on this podcast.
Fair and balanced, and I will extend an olive branch as well to, Kelly Baldwin
who wrote a fantastic article on her adventures with Advent of Code

(57:29):
using data dot table solutions.
That was awesome. I I'd love to see that see that. And,
I even saw this wasn't necessarily our specific on on Blue Sky,
a post about somebody
using DuckDV
to do the app and a code. Yes. DuckDV with SQL queries, and it's special.

(57:50):
I don't know how all of you are able to do this. I was actually talking to a few people earlier today about it. I'm
someday, I will do Advent of Code, but my goodness, I feel like I am so far behind. It's almost like imposter syndrome just thinking about it. So I love living vicariously through Kelly and offers as they do this.
I feel you as well. Someday, we'll get there, Eric. Yep. We can be accountability

(58:14):
buddies on that one too. But what we are accountable for, hopefully, is sharing, you know, what we find
so exciting about the our weekly project and this particular issue. But, of course, you can find this and all the other issues
at rweekly.org,
as well as how you can give back to the project. And the best way to give back is to share that great resource you found. Or maybe you created that great new package and you want the art community to know about it. We are a poll request away to use the GitHub language. You just find that little poll request, the Octocad icon in the upper right corner. You'll be taken directly to the template.

(58:52):
You can fill out the poll request right there. We got nice little template text and navigate with sections your resource should go in. But, again, that curator for that week will be able to merge that into the upcoming issue, and we love it when we get your contributions. It's always
a a smile to my face whenever I get the curation, and I don't see a 0 for poll request. This is one time I want the poll request. There's several times that you're dreading it. Not that I would know anything about that. Nonetheless,

(59:22):
other ways to get in touch is with us specifically.
We have a contact page that you can find in the episode show notes. You can send us a quick note there.
Also, you can send us a fun little boost in Podverse or Fountain or Cast O Matic if you're on a modern podcast app. We have details
for that in the show notes as well.
And we are on these social medias when we're not being drowned out by AI noise that you might see on various spheres. You can find me on Mastodon where I'm at our podcast at podcast index dot social.

(59:53):
I am now more recently on bluesky.
I am at our podcast
dotbsky.social.
That's a little addendum I should make. I have seen people put custom domains on that, and I need to figure out how they do that. I may be tempted to do that in the future. But, nonetheless,
that's where you can find me currently. And I'm also on LinkedIn. Just search my name, and you'll find me there, and I promise I won't send out garbage posts about AI on there.

(01:00:19):
But, Mike, where can the listeners find you? You can find me on mastodon@mike_thomas@phostodon.org.
You can also find me on blue sky atmikedashthomas
atbskor.bsky.social,
or on LinkedIn if you search Catch Brook Analytics, ketchb
r o o k. You can find out what I'm up to.

(01:00:42):
Excellent. Excellent. Always great to see what you're up to. And, you know, I I considered a badge of honor that I tune you in on to the death container round. I have no regrets about that in the least, buddy.
I am so grateful for it.
Yes. If I can get my day job and do more of that, but let's not let's say that on a positive note. This was a a great episode, I dare say, and,

(01:01:03):
we hope that you enjoy listening wherever you are. Again, we love to hear from you, especially as the year wraps up. It's always great to hear how your year has been in the our community and your journey with our end data science. We always love hearing your stories. That'll put a wrap on episode 189. That means we're 11 away from 200. And one way or another, we'll get there. But we will be back with another episode of our weekly highlights

(01:01:26):
next week.

All Episodes

Episode Transcript

Popular Podcasts

On Purpose with Jay Shetty

Stuff You Should Know

Dateline NBC

.css-15opob5{left:0;position:absolute;top:0.8rem;} All Episodes

.css-14f5ked{margin:0;word-break:break-word;display:-webkit-box;-webkit-box-orient:vertical;box-orient:vertical;-webkit-line-clamp:2;overflow:hidden;}Issue 2024-W50 Highlights

Episode Transcript

Popular Podcasts

.css-r6mb8g{margin:0;word-break:break-word;display:-webkit-box;-webkit-box-orient:vertical;box-orient:vertical;-webkit-line-clamp:1;overflow:hidden;}On Purpose with Jay Shetty

Stuff You Should Know

Dateline NBC

All Episodes

Issue 2024-W50 Highlights

On Purpose with Jay Shetty