Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
(00:02):
Hello, friends. Did you miss us? Oh, we missed you. We are back with episode 191
of the R Weekly Highlights podcast, our first episode
for the year 2025.
I was just telling my awesome co-host here, who I'll introduce shortly: it feels normal again. But, yes, my name is Eric Nantz, and I am delighted that you're joining us from wherever you are around the world. This is the weekly podcast,
(00:27):
unless we had an extended break like we just had, where we talk about the great happenings and resources that have been shared in this week's R Weekly issue.
And, yes, I was solo for a bit there at the end, but not this time. All is right with the world, because my awesome co-host, Mike Thomas, is back here joining me on the mic. So how are you doing today, my friend? I'm doing great, Eric. It feels good to be back. I apologize,
(00:52):
for leaving you solo there towards the end of 2024, but I am committed and determined,
in 2025 here,
to get us at least to our 200th episode. We are close to that milestone, and I'm confident we're gonna go well beyond that.
Yes, there are always hurdles in the podcasting world, but 200 is a nice one to achieve. So we will get there, and then the chips will fall where they may, so to speak. We will find out. But
(01:23):
but as always, it's always so much fun when I don't do this alone. So I'm glad you could join me today. And, also, the R Weekly project is possible because of the community,
and our curation team is back at it after some well-deserved rest.
And we got a fancy, updated calendar from yours truly to keep us honest with it. But this week's issue
(01:44):
was curated by Batool Almarzouq, and she had a whopper of resources to look at, because this issue built up over three weeks, I believe, since our last one. So she had a lot to pick from, with our awesome team voting as usual. But as always, she had tremendous help from our R Weekly team members and, of course,
the community like you around the world, with your pull requests
(02:06):
and suggestions.
It is 2025.
You know that this year, just like last year and parts of 2023,
there is gonna be a common theme in many of the advancements
in the tech sector. So we do have
some very interesting uses of AI to talk about, and you're talking to two co-hosts here who can be a bit skeptical on certain pieces of it. But I will say I learned something from each of the stories we're gonna talk about here. And the first one
(02:38):
is a very extensive and, frankly, enlightening analysis of conversation sentiment and summarization on the new Bluesky social platform that we've been talking highly about. In particular, this post comes from Stephen Turner, who has been featured on the highlights previously. He is the head of genomic strategy
(03:01):
at Colossal Biosciences,
and he is no stranger to investigating
AI models. In fact, some of his posts from last year concentrated
on using the open-source models, something I've been very curious about, particularly the Llama models coming from Meta.
And he does mention,
(03:22):
you know, they are getting better. There's been a lot of hype around them,
but he does mention in the post that they're not quite there yet
in matching the hype, which, again,
in this space you gotta be careful about. There's always a lot of hype going around this area.
But with the recent development as of last year of Hadley Wickham's ellmer package, which I've had a bit of time to try, and which we've covered on the highlights before,
(03:50):
it opens this new framework of being able to try not just open-source models, but also what Stephen calls the frontier models, or you might say the hosted models. We're talking about, of course, the likes of OpenAI's ChatGPT and others, including one that I don't have a lot of experience with: the Anthropic
(04:11):
Claude models, which do get talked about quite highly in the world of development.
So he thought, what a great use case this could be: he could compare how the models are doing,
particularly this Anthropic
Claude model,
against the Llama models. And he thought, well, Bluesky has now
(04:33):
got a lot of momentum,
with a lot of great conversations to be had about our favorite language, the R language, as well as other domain-specific science groups that are coming there.
So he thought, well, you know what? I am going to do an analysis of these Bluesky posts,
comparing how well these different models actually summarize
(04:55):
a large number of these posts over a given duration of, like, months or even a year.
So what's the tooling that's gonna be involved here?
Well, with the momentum Bluesky has had, there is a very awesome package, which we may have mentioned last year,
called
atrrr.
I don't know quite how to say it. It is authored by Johannes
(05:19):
Gruber,
and it basically talks to the API behind the scenes that Bluesky is using in their operations of how they federate all the posts.
And Stephen was able to grab 1,000 of these posts that have been tagged with the #rstats hashtag, which is again very familiar from,
(05:39):
you might call them, the glory days of Twitter, where we all used the #rstats hashtag to share our happenings and our questions and our finds and whatnot.
So, as always, there's a little bit of cleaning to do after you get those posts. He uses a little dplyr
to clean up the text of those posts and get it ready to be supplied to an AI model.
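A minimal sketch of that fetch-and-clean step, assuming atrrr's auth() and search_post() functions behave as documented; the column names and cleaning rules here are illustrative, not Stephen's exact code:

```r
library(atrrr)
library(dplyr)
library(stringr)

# Authenticate against the Bluesky API (handle is a placeholder)
auth("your-handle.bsky.social")

# Grab ~1,000 posts tagged #rstats
posts <- search_post("#rstats", limit = 1000L)

# Light cleaning before handing the text to a model:
# strip URLs, squish whitespace, drop empty posts
clean_posts <- posts |>
  mutate(text = str_squish(str_remove_all(text, "https?://\\S+"))) |>
  filter(text != "")
```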
(06:03):
And by the way, all the code that Stephen has produced here is shared via a GitHub gist, which we'll have linked in the show notes as well. It's very well structured; Stephen is definitely a veteran of robust R code, and he shows that in his code here.
So once the Bluesky posts are prepared, the text behind those,
(06:24):
then the next step is to feed that into the models.
So, again, he's gonna treat this like an experiment, which I would expect someone like Stephen to do,
where he's leveraging
four models:
a Claude model,
Llama 3.3,
Gemma 2, and Mistral. With ellmer, that comparison might look something like the sketch below.
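This is a hedged sketch rather than Stephen's actual code, using chat_claude() for the frontier model and chat_ollama() for a local open-source one (the model strings are placeholders, and clean_posts carries over from the earlier sketch):

```r
library(ellmer)

# One prompt, several models: paste the cleaned posts into a single request
prompt <- paste(
  "Summarize the main themes in these #rstats Bluesky posts:",
  paste(clean_posts$text, collapse = "\n")
)

claude <- chat_claude(model = "claude-3-5-sonnet-latest")  # hosted frontier model
llama  <- chat_ollama(model = "llama3.3")                  # local open-source model

summary_claude <- claude$chat(prompt)
summary_llama  <- llama$chat(prompt)
```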
(06:48):
So we'll have to look at the results of this as we get there, but he's going to run the experiment on each of these: how well does each one summarize? And
he's being transparent about this too. He will share
the results, the summarized results of these posts,
(07:08):
via a GitHub gist in markdown.
But he's not just manually copying all of those summaries, however many there are. No, he's using
the gistr package
authored by Scott Chamberlain,
previously from his rOpenSci days,
to automatically put them up as a gist. That's some pretty slick stuff, so I'll have to keep tabs on that for future reference; the idea is something like the sketch below.
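Something like this, assuming a GitHub token is already configured for gistr:

```r
library(gistr)

# Write the model's summary to a markdown file, then publish it as a gist
writeLines(summary_claude, "claude-summary.md")

gist_create(
  files       = "claude-summary.md",
  description = "Claude summary of #rstats Bluesky posts",
  public      = TRUE
)
```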
(07:30):
So,
the results.
This will either be surprising to you or not surprising to you.
I will admit I'm not too surprised, based on what I've heard.
He does say that the Claude model, the Sonnet model in particular, was
by leaps and bounds
much better at summarizing the content of these Bluesky posts
(07:52):
than the open-source versions.
Well, you know, when you run these hosted models, there's no such thing as a free lunch. Right? You have to buy the tokens, if you will, to leverage these API services, so you are gonna pay a little bit depending on how much you use them. Now,
he does say the cost of it, which, again, luckily in this case is not breaking the bank; it's basically pennies for this particular case. But let's say you scale this up at a company or whatnot: you've gotta watch this sort of thing. But
(08:25):
he does have an excerpt
of the summary produced by Sonnet in this post, and he's not wrong:
it is a very concise summary that looks to be pretty on target for the time period that he analyzed in these Bluesky posts.
And Mike and I were just talking about the useR! conference that's about to happen; it shows up as part of the summary here as being held at Duke University in 2024.
(08:52):
It talks about some of the notable themes,
such as visualization,
tutorials,
generative art, and mobile development.
I think we know where the mobile development one came from: that was when Colin Fay made a splash late last year with the mobile app powered by R and Shiny that's now available.
That's included in this summary, along with
(09:13):
points about community engagement,
the hashtags that were used in these posts, and notable authors.
I mean, that's top-notch stuff, as well as the top posts,
the top ten, if you will, that were authored here, such as Hadley's post about
Posit joining opensourcepledge.com,
(09:35):
as well as others from Danielle Navarro and more.
So you'll have to go to the gist that he made to look at the rest of the model summaries, the summaries from the other models.
He does give a preview that they are quite different from what Sonnet produced,
ranging from something that reads like a generic, high-level overview that anybody could have written,
(10:01):
to just listing summaries from each post individually instead of, like, the whole collection of them.
So,
unfortunately, this is consistent with what I'm hearing from others I trust in the community:
the open-source models are gaining traction, but there's still a gap in certain domains in terms of what they can analyze or summarize well and what they can produce.
(10:26):
There is some bonus material in this post, where there is another framework for running LLMs in R that I'm not as familiar with. It's called mall, from the mlverse:
an interesting way to get predictions
powered by large language models,
not just, like, summarizing what's there, but actually predicting content
(10:47):
as well. And he shows excerpts
based on what he did
to run predictions on
the top ten most liked posts from the past week,
and then also translate them into another language on top of it. So,
pretty slick stuff, I must say. I've never tried anything like that, but that's a good
way to do the translation and also do a sentiment analysis
(11:12):
on top of that, which,
spoiler alert,
shows a lot of positive vibes on Bluesky. Let's hope that continues, because I like to have a little positivity in my life.
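As a hedged sketch of mall's verb-style interface, with a local Ollama model as the backend (the model name and the exact output columns are my assumptions, not the post's code):

```r
library(mall)

# Point mall at a local open-source model served by Ollama
llm_use("ollama", "llama3.2")

clean_posts |>
  llm_sentiment(text) |>                      # adds a sentiment prediction column
  llm_translate(text, language = "spanish")   # adds a translated-text column
```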
Nonetheless,
this is another
applied-type problem that was tackled here by Stephen, and we're really starting to see where the trade-offs are
between the hosted models and the open-source models. It's certainly my hope that the open-source models do catch up, because
(11:39):
there are certain things I want to leverage AI for that I don't really feel comfortable
throwing over to a hosted provider, where who knows what they're gonna do with that information.
I want to be in control of that, and I'm starting some self-hosted things where I am in control. But, you know, as with anything in open source, innovation can take time, and the iterations are probably gonna need a little bit more runway
(12:03):
before they can be on par with these solutions. But Stephen does a great job, again,
summarizing his experiment very well. The code is well documented and well commented,
and, yeah, it's some food for thought for sure. But at least the positive vibes have been, you know, proven, if you will, in this experiment; they're there, and there's some great content that we're seeing here.
(12:27):
So, a mixed reaction for me in terms of the endgame of it, but, nonetheless,
I think it's a great analysis to kick off the highlights here.
Yeah. Absolutely, Eric. And I think you can probably hear the skepticism in both of our voices around this type of
content for a while now, but I think we'd also be remiss to not
(12:47):
admit that there are definitely some use cases here where there's value.
I've seen
people like Sharon Machlis on Bluesky talk a lot about how much better they feel Claude does at generating
accurate R code compared to ChatGPT. I have not really tried that out yet.
I've, you know, used ChatGPT.
(13:08):
To be honest with you, sort of where I've found the most value is
in generating code for other languages that aren't my first language,
where I have enough
programming experience to be able to
sniff out the issues, if that makes sense.
Absolutely.
Yeah. So that's where I'm finding the biggest value. But if I were to just try to blindly use it to generate R code, from my experience in the past, and I probably haven't tried in the last month or two, it wasn't
(13:40):
code that I wanted to use, or, you know, I could have written it just as fast as having the model generate it for us.
But I'm assuming that these leaders, you know, Claude and ChatGPT, are probably fairly equivalent, as you mentioned, in their ability to summarize information. There is this gap, right, between
the paid,
(14:01):
hosted models and the open-source ones that we have. But my hope is that the open-source models continue to plug along. Right? They're gonna lag a little bit, but hopefully,
you know, six or twelve months from now, they'll sort of catch up to where the Claudes and the ChatGPTs of the world are currently. So we'll see.
(14:23):
And I can't help but notice,
in Claude Sonnet's sort of summarization of all the Bluesky posts, and how Stephen's able to spit that out in the markdown, I think, that he puts in his blog: it looks a whole lot like the R Weekly homepage
in terms of the latest.
It's all the latest news from the R
ecosystem.
(14:44):
So I don't know if the R Weekly
team can use something like this to make their lives easier in terms of content curation on a week-to-week basis, or pull requests, or God knows what these days. But I don't know, it got my wheels turning, potentially.
The one small hallucination
that Stephen points out, which was pretty interesting, is with that useR! conference on August 8th through 10th,
(15:11):
where the post itself on Bluesky doesn't have a year.
So the model
stuck 2024
on there, but obviously... You're right. I didn't even catch that. Thanks for keeping me honest. Yeah. No problem.
Stephen has a little tiny footnote at the bottom of the blog post that points that out, and the year should actually be 2025,
(15:34):
obviously, because it's an upcoming
conference. But it's an interesting little gotcha. And,
you know, as I look into the sentiment analysis
that was done as well, which is driven by a large language model,
I'm trying to wrap my head around the utility there too, because, you know, the weird thing about these generative AI models,
(15:56):
as opposed to, you know, something that's more deterministic,
like your traditional machine learning model,
is that you could put the same inputs in and you may not get the same output every time. Exactly.
So if you're trying to leverage an LLM for sentiment analysis,
you know, how do you
deal with the idea that that analysis may not be reproducible?
(16:20):
Right? Because the sentiment output may be slightly different
the second time that you put the same inputs in. So, to me, I have a little bit of a hard time wrestling with that. I mean, I think for, you know, exploratory
work and just creative work, and for the purposes of this blog post, a one-time little sentiment analysis didn't have to be reproducible or anything like that. That's totally fine. But if you are, you know, considering
(16:50):
building more of a robust framework and process around doing sentiment analysis in your organization or something like that, I think that's probably one thing that you should consider when you're deciding whether to
use, you know, a machine learning approach versus
this generative AI sentiment analysis approach. But it was interesting.
(17:13):
Love the fact that most of the sentiment was very positive.
That is
at extreme odds with what I have seen on the other app, the old app, where the sentiment seems to be quite different. And I can't say enough about Bluesky and what's going on there, and how
empowering and exciting it is to be in that community, and how much it feels like the old world of community,
(17:37):
social media, data science Twitter.
Yep. And the fact that we can look at, you know, the sentiment and many other factors of this in many dynamic ways, thanks to the power of APIs,
the power of our imaginations,
really. So, yeah, I'm intrigued by where the community goes
(17:57):
in that space. And, yeah, you won't find me first in line
for that prediction approach,
like what you saw here, especially in my industry. That is, oh boy, a tough sell for us. So we will find out. Yeah. Predicting with LLMs... that
turned my stomach a little bit when you said that.
(18:28):
If you are in an industry, or, you know, at a large or midsize company,
at this point, I'm sure you have some kind of internal tooling that's been set up at your organization
and are surfacing some of the lower-hanging fruit, so to speak, of what AI can do in terms of integrating
with other platforms. And
(18:50):
I don't know how much common knowledge it is to the data science community as a whole, but I do, you know, in my spare time, of course, listen to other podcasts, and it's been very
apparent just how important
Microsoft has been in OpenAI's
life, if you will, with the funding that they receive.
And,
of course, as you would imagine,
(19:11):
Microsoft does have a lot of integrations with OpenAI.
I literally deal with this on a daily basis, where in my meeting calls at the day job, about five seconds after the call ends,
there's a summary generated by Copilot,
just right there.
It's happening, folks.
(19:31):
So if you are in this ecosystem and you're looking at ways to leverage AI models, and maybe integrate with other services in a specifically Microsoft-type setup,
our next highlight shows just what you can do in that space.
And this post comes to us
from Martin Chan, who is a data scientist at Microsoft, where,
(19:53):
as I said, if your organization
is invested in the Azure ecosystem,
and, yeah, Mike and I have had our adventures with Azure as well,
they do offer an OpenAI
integration.
Now, the goal in this demonstration post that Martin has put together:
the first part of the post is how you actually get set up for this, and there's no way I'm gonna narrate all that in audio. It's all very straightforward with the screenshots and whatnot.
But he wanted, you know, a practical way to demonstrate how all this works.
He was able to find a dataset
called the 100 Climbs dataset,
(20:34):
which is about different locations,
my guess is somewhere in Europe, if I had to guess,
covering the different attributes of these locations that would be relevant to a cyclist,
including a URL
associated with each location that you can boot up in your browser to read about it. One of them is called Cheddar Gorge, and I'm probably not saying that right. Nonetheless,
(21:03):
it's got some interesting information. And so what Martin wanted to do
is take this dataset,
take the URLs
associated with each of these locations, which, again, are just a field in the dataset,
and leverage the rvest package
to
extract
just the content of the description of each location,
(21:28):
using,
you know, a clever CSS div ID so you skip the fluff of the whole web page
and get just the content of that description,
and then
leveraging an OpenAI model, in this case GPT-4o
mini,
within the Azure implementation of OpenAI.
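The scraping step might look something like this sketch; the climbs data frame and the "#description" CSS id are hypothetical stand-ins for whatever the real dataset and pages use:

```r
library(rvest)

get_description <- function(url) {
  read_html(url) |>
    html_element("#description") |>  # target only the description div, skipping page fluff
    html_text2()
}

# climbs$url is the URL field from the dataset (hypothetical name)
descriptions <- vapply(climbs$url, get_description, character(1))
```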
(21:48):
But much like other posts that we saw from Anastasia,
I believe, last year, he kind of built this up himself as a way to learn how all this works under the hood,
leveraging the previous generation of the httr
package
to write a custom function to supply
a call to the API
(22:09):
using a prompt that he supplies as a function parameter,
and then returning the results back in JSON format and munging that for the rest of the analysis.
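A rough reconstruction of that kind of wrapper, not Martin's exact code; the endpoint, deployment name, and API version are placeholders for whatever your Azure OpenAI resource uses:

```r
library(httr)
library(jsonlite)

ask_azure <- function(prompt,
                      endpoint    = "https://my-resource.openai.azure.com",
                      deployment  = "gpt-4o-mini",
                      api_version = "2024-08-01-preview") {
  url <- sprintf(
    "%s/openai/deployments/%s/chat/completions?api-version=%s",
    endpoint, deployment, api_version
  )
  resp <- POST(
    url,
    add_headers(`api-key` = Sys.getenv("AZURE_OPENAI_KEY")),
    content_type_json(),
    body = toJSON(
      list(messages = list(list(role = "user", content = prompt))),
      auto_unbox = TRUE
    )
  )
  # Pull the model's reply out of the parsed JSON response
  content(resp)$choices[[1]]$message$content
}
```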
And so it's definitely a, you know, pretty straightforward look at how all this works.
But the results that he got from this, again, were quite promising, albeit he
(22:32):
didn't ask it to do a whole lot. It was a summarization
of that location, not too dissimilar to what we saw in Stephen's post earlier.
And in this case, with this location that I mentioned, the Cheddar Gorge location
in England,
it kind of had the top takeaways
of the article,
(22:53):
such as the climbing details, how popular the location is,
its significance,
you know, the experience of it, and any local attractions around it.
Again,
it does the job. Right?
But it was a neat demonstration of how you could leverage building your own functions to do this.
If you have to integrate with an AI model but it's not as convenient as, say, what ellmer brings to you, or if you are leveraging
(23:20):
one of these hosted platforms that may or may not be covered by ellmer, then what Martin has done here can be an interesting way to prove that out,
especially when, as in the context of this highlight,
you want to integrate this pipeline with others in that major ecosystem. So here we're talking about Azure, but I could see a very similar thing with AWS and what they do with model hosting that you might wanna integrate with, say,
(23:46):
object storage or databases or whatnot.
These are the kinds of things that I know are happening more routinely in many industries.
I'm still pretty new at it, and I'm not as in-depth as many others like Martin is here,
but it is interesting to see all the tech giants out there trying to put services in front of these models to, you know, make them, in their words, easier to run, while also integrating them with their other pipelines.
(24:12):
Again, just like we talked about in the last highlight, there's no such thing as a free lunch, so you will be charged for these API requests when you take these hosted-model approaches.
But I do know some of these platforms are also gonna give their customers a way to run more than just
an off-the-shelf model like ChatGPT or whatnot.
(24:32):
I know some of these vendors are also letting
the company bring open-source models to that platform,
where they just provide a front end to it. So that might be helpful in certain situations as well. That's not the topic here, but that's an area I'm looking at, I think, more closely this year when I look at how I integrate this into my daily work. Nonetheless,
(24:53):
it's a great summary here. The code is linked in the post, and,
all in all, if you're in the Azure ecosystem, you've got a lot of options available, and more power to you for what you can do there.
Yeah. Fantastic blog post,
walking through sort of this end-to-end process, you know, including
the custom call, if you will,
(25:14):
to the API via httr.
And it looks like he used the GPT-4o mini model from Azure's OpenAI integration. So
it's nice for us in the highlights to get to see a variety of these different types of LLMs. And one thing that he did, which I'm curious if ellmer allows for,
(25:36):
is
to not only provide, you know, what's called a user prompt,
but also a system prompt, which comes before you
enter your user prompt; you're instructing the model on what you want it to return and how you want it to behave. I think in the last example,
where he is, you know, asking for some information about the yellow spotted lizard,
(25:59):
he provides a system prompt that says,
"Describe the following animal in terms of its taxonomic rank, habitat, and diet," and then his user prompt is just "yellow spotted lizard."
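For what it's worth, I believe ellmer's chat constructors do accept a system prompt; a hedged sketch mirroring that lizard example might look like this:

```r
library(ellmer)

# The system prompt shapes how the model answers every user prompt
chat <- chat_openai(
  system_prompt = paste(
    "Describe the following animal in terms of its",
    "taxonomic rank, habitat, and diet."
  )
)

chat$chat("yellow spotted lizard")
```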
And the
structure of the output
is exactly what he had defined in his system prompt. So it's interesting how those two different things can work together. I know people do silly things with that system prompt, like, you know,
(26:25):
"Respond like you're Darth Vader," or something like that. Right?
Or, "Write me a poem about #rstats in the style of Snoop Dogg," or something crazy like that, in those system prompts. But I think they're potentially very
powerful in
helping you
somewhat fine-tune, and I know that's a loaded word here in this context, the output that you're going to get. And, you know, to
(26:52):
me, I think that the future here, in terms of applicability
at, you know, your organization,
and really
squeezing as much value out of these tools as possible,
has to be creating an easier path towards fine-tuning
or, you know, retrieval-augmented generation, which is still pretty heavy at this point,
(27:14):
because,
you know, you're still leveraging sort of this full LLM
file,
which has all sorts of context in it about things that you probably don't care about. Right? And you may have
a specific task that you're interested in
leveraging generative AI
to assist with. So I'm interested in projects like, and I'll just shout it out here, the Rasa project, which I think is a Python framework. It allows you to
(27:44):
sort of combine
rules-based
natural language processing and natural language generation,
you know, if-statements, if you will,
with
large language models, to be able to help build some guardrails
around, you know, the chat interface
system that you are providing
(28:04):
for
your end users.
So,
you know, my hope is that
as we take a look at sort of more of these general use cases for LLMs, we'll start to see
more targeted
use cases,
you know, that are maybe more deployable
on a larger
scope of problems at your organization, as opposed to, you know, a lot of creative use cases at this point. At Ketchbrook, we're
(28:30):
knee-deep right now in some R&D work, trying to build really the best text-to-SQL
setup that we possibly can on a specific database that we have and expose for our end users, so that we can leverage, you know, shinychat, essentially, which is the work that Joe Cheng and his team have done to let us build chat interfaces into our Shiny apps, interfaces that are tailored specifically and perform really accurately
(28:59):
against that database and write, you know, correct SQL. So we
found sort of more hurdles than we expected. I think in Joe's
demonstration at ShinyConf
last year, it seemed like it worked fantastically well, and I think that was a small dataset with really obvious column names, which sort of helped. And I think in a lot of real-world use cases, and I apologize if I'm going off on a tangent now, that's not necessarily
(29:30):
the case, and these things aren't as accurate
as you may expect on larger
types of context windows. So
that's stuff that we're working on.
I'm excited, hopefully, to see this whole entire space
start to,
you know,
move, I guess, towards easier paths for developers to fine-tune
(29:54):
these generative AI models.
Yeah. I see that as the future as well, where you want a targeted model, or a targeted setup of a model,
for a specific case, such that it is first in class in that specific case.
I remember Joe telling me that in the RAG piece of it, when they were trying to summarize
(30:15):
their Quarto documentation, and he shared this in the R/Pharma talk as well, it was completely off the mark compared to what they were expecting, which doesn't make a lot of sense when you think about it. You would think, just like we're talking about in these two highlights here, that if we have a given set of text, the models know how to summarize it quite well.
But it really depends on what's being fed into it and what the back-end model is actually doing. So part of me hopes that as these open-source models get more mature, there will be ways for us to kinda tweak
(30:44):
pieces of those and then make a different version of a model that's more tailored to, maybe, like I said, a data context. Maybe it's more tailored to a documentation
context,
or, you know, more of a development context.
So we're seeing a lot of these problems being attacked right now, and there's no real clear winner here yet. I think it's a lot of wait-and-see to find out where a lot of this shakes out. So
(31:09):
I'm curious where it goes as well.
We're really trying to do the low-hanging fruit currently, but
it can't stop there. And we're hearing all sorts of talk about different advancements
and the level of AI that's being prototyped right now.
Sam Altman has had some very,
(31:30):
provocative remarks about where that goes down the road, and whether it gets there this year. Call me skeptical. I'm not sure.
But I think the key part for us is to kinda slow down a little bit and do these more practical cases first, before we
try to build Skynet already. Not to be too Debbie Downer
(31:50):
on that. No, absolutely. And I think we've made a lot of progress on our text-to-SQL
work. I'm pretty excited about what we've been able to do in the last week or so, and I'm gonna promise a blog post around that, because I won't be the one writing it, so it'll actually happen
in the next couple of months.
Okay. Well,
(32:11):
well, will it be a human writer, or is AI gonna write it? No, I'm kidding. No. No. A human writer.
Okay, that's great. You've got the R Weekly videos on YouTube. Yes. That's good.
But agentic AI. Right? Agents are gonna solve all of our problems. That's what LinkedIn tells me, at least. Oh, yeah, LinkedIn will tell you everything. I just want a robot that washes my dishes. Is that too much to ask?
(32:43):
And for our last highlight today,
well, we're not gonna turn to an AI for this one, because there's very much a human element involved when we look at the best practices
of visualization.
And one area that you do have to address, one way or another,
is your effective use of colors in your visualizations.
(33:04):
And luckily, the R ecosystem,
R itself, has always been, in my mind, first in class in visualization
since practically day one, even with base R itself.
But if you're thinking about how to get your head around
some better ways to leverage colors, or how to make them more routine in your visualization
(33:25):
workflow?
Well, no stranger to the highlights, Nicola Rennie is back on the highlights for 2025
with a very comprehensive
overview
of how you can work with colors in R across many facets. So we'll tag-team this one. I will lead off. First of all, you know, you wanna use a color.
(33:46):
How do you actually define how to use it? Well,
there's always more than one way to do things in R. And one way that's been supported since day one, and the first way I learned:
R has built-in names for colors. Obviously, there are the standards like blue, yellow, and red,
and many more, and you can just specify those when you wanna define a palette for your visualization.
(34:08):
She has tomato, sky blue, and yellow too, and they show up right there.
Here's where it gets interesting. Maybe you wanna be a little more low-level with it. Maybe you wanna integrate it with other parts of your visualization
pipeline.
That's where probably my favorite of the approaches comes in: hex codes,
because I use those so much in my CSS
(34:31):
styling of Shiny apps.
It's kind of like that unit language that I'm used to now. I'll have a browser tab open
with one of these color pickers,
I type in the name of a color,
and then I get the hex code. I might get other values too, but then I can copy that and put it into my app as part of, like, a class or whatnot.
(34:53):
Hex codes are the way to go for that, and what's nice is that in recent versions of Visual Studio Code and Positron,
when you put a hex code in a character string with the pound sign in front, the editor shows you the color right next to it. I love that feature in my editor. So I've been a hex code,
(35:13):
you know, addict because of that. So I love that.
If you wanna get even more low-level,
RGB,
the red, green, and blue mixture
values, might be very helpful,
as well as things like HSV
and HCL, which, again, put more numbers on the different hues
or, you know, the different intensities.
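For instance, here is roughly the same color specified each of those ways in base R ("tomato" is #FF6347, i.e. 255/99/71 in RGB; the HSV mixture is approximate):

```r
col_name <- "tomato"                              # built-in color name
col_hex  <- "#FF6347"                             # hex code
col_rgb  <- rgb(255, 99, 71, maxColorValue = 255) # red/green/blue mixture
col_hsv  <- hsv(h = 9 / 360, s = 0.72, v = 1)     # hue/saturation/value, roughly the same

barplot(rep(1, 4), col = c(col_name, col_hex, col_rgb, col_hsv))
```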
(35:36):
Hex codes are kind of my bread and butter, but, again, depending on where you are in your visualization
journey,
one of these is probably gonna be what you end up using.
So once you know how to specify the colors,
now, what are the best ones to actually use?
This is where she gives a shout-out to one of my favorite visualization
(35:56):
packages, and that's paletteer
by Emil Hvitfeldt over at Posit.
There are, like, I think over a hundred of these specific palette packages in the R ecosystem, and
paletteer will let you use all of them and just choose which one you wanna use. So she's got an example here
(36:17):
looking at the MetBrewer palette package,
which is inspired by the Met Museum in New York,
the Tara palette, for example, and you can just call it with a plot palette function right away and be able to use it.
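Through paletteer, that might look like this sketch; "MetBrewer::Tara" assumes that's the registered palette name:

```r
library(paletteer)

pal <- paletteer_d("MetBrewer::Tara")   # pull a discrete palette by package::name
barplot(rep(1, length(pal)), col = pal) # quick visual check of the colors
```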
But you may find a set of color choices that you think look great,
(36:37):
keyword you.
But, like I often tell my kids, it's not always just about you. Right? Because there's a very important domain, Mike, that we really hope many more folks get into their workflow,
and that is accessibility.
And Nicola talks to us a lot about new ways that we can
think about accessibility when we choose these colors.
(36:59):
Yeah. I think accessibility
around
visualizations is something that we've talked quite a bit about on this podcast
before, and I think the reason being is that it's really important to consider, especially when you think about color palettes. Right? There are a lot of folks who are red-green color blind,
and it's very easy, with all of the tools that we have
(37:21):
at our disposal,
to
accommodate that. Right? And to avoid
using your traditional
red-is-bad, green-is-good color palette.
Even though that's what a lot of my clients want. Sorry.
Unfortunately,
we gotta break them out of that cycle. I try really hard, I promise. I get it. And there are a couple of great packages that Nicola shouts out here. There's one called the colorblindr
(37:46):
package, no "e" there at the end: b-l-i-n-d-r.
It has a function called cvd_grid(), which simulates how a plot that you've created may look to people with color blindness.
That, to me, is absolutely incredible.
I think that's awesome.
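A rough sketch of that workflow, noting that colorblindr is installed from GitHub rather than CRAN:

```r
library(ggplot2)
library(colorblindr)  # remotes::install_github("clauswilke/colorblindr")

p <- ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
  geom_point()

# Redraw the plot as it may appear under several forms of color-vision deficiency
cvd_grid(p)
```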
In terms of generating colors as well: once you have gone through and created
(38:12):
a couple of different palettes that are colorblind-friendly, or at least have a workflow to assess
the accessibility
of the color palette that you have selected so that you can do that iteration,
There's some additional tools and one that I thought was super, super cool because I've had a different workflow for doing this that is a little bit more manual,
(38:34):
and I'm excited to switch over. There is an R package I had no idea about called eyedroppeR,
with a capital R at the end. Have you ever seen this package, Eric?
Long ago, and I almost forgot about it, so I'm glad Nicola reminded me here. I had always leveraged, like, a Chrome extension
(38:54):
to be able to, you know, be on a web page
and click on a particular
pixel, essentially,
on that web page and get the hex code, right, that's behind that pixel,
representing the color that I want for brand styling.
And this eyedroppeR
package allows you to specify
(39:16):
a path to a particular image,
which I think could be a local image, and maybe a URL as well; I'm not a hundred percent sure. You can do both. Yeah, you can do both. That's great. It looks like
you can specify the number of colors that you want to extract
from that image,
and it'll create a color palette by, you know, picking out, I would imagine, the top
(39:39):
n colors, where n is the number you specify in that argument,
from that particular image, and give you those hex codes that you can leverage in your own color palette. And I think that's really, really incredible.
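A hedged sketch of that idea; the extract_pal() arguments follow my reading of the package README, and the image URL is a placeholder:

```r
library(eyedroppeR)

# Pull a 5-color palette straight out of an image
# (the returned object should include the hex codes, though the exact
# structure may differ from this sketch)
pal <- extract_pal(n = 5, img_path = "https://example.com/brand-artwork.png")
pal
```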
And as always, I love Nicola's blog posts. They're littered with code, they're really well articulated, and the visuals in here are beautiful. I think it's a master class in palette generation
(40:05):
and
in choosing colors,
you know, not only for accessibility,
but also for visual styling, and in creating
these workflows. I mean, she leads off the blog post by creating this small little custom function for
quickly visualizing a color palette, you know, right in the plot viewer in your IDE. So simple, something that I
(40:26):
should have done a long time ago, but I am excited to mooch that code from the blog post and leverage it in my workflow, because I think the utility here is fantastic.
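In that same spirit, here's my own minimal take on such a helper, not Nicola's exact code:

```r
# Draw a strip of colored blocks for a quick look at a palette
show_palette <- function(colors) {
  n <- length(colors)
  image(1:n, 1, as.matrix(1:n), col = colors,
        axes = FALSE, xlab = "", ylab = "")
}

show_palette(c("tomato", "skyblue", "#FFD700"))
```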
And maybe a
related shout-out that I think is probably appropriate to give right now is a recent project, I think from the Posit team,
(40:47):
where they're allowing you to create this brand.yml file,
which
can be used across all sorts of different deliverables
that you may have, including, you know, your Quarto or maybe even R Markdown reports,
your dashboards,
your flexdashboards, your Quarto dashboards, and your Shiny apps as well. And you can just continue to sort of reuse
(41:11):
this
same YAML file,
with all of your corporate brand styling, across your different deliverables. So I think that's really, really cool. It allows you to define colors,
logos, and
different fonts that you want to leverage.
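A hedged sketch of what such a file might contain; the field names follow my reading of the brand.yml documentation and may not match it exactly:

```yaml
# _brand.yml (illustrative only)
meta:
  name: Example Corp
logo: logo.png
color:
  primary: "#1B6CA8"
  secondary: "#FF6347"
typography:
  base: Open Sans
  headings: Roboto Slab
```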
So I think that's a nice shout-out at the intersection of this kind of topic here. But a fantastic blog post by Nicola, as always; I learn a lot when she writes something.
(41:40):
Yep. And thanks to her wisdom here, and many others in this visualization
space,
I feel like I don't necessarily need a paid,
extremely
overpriced
vendor to teach me about all this all the time. I can read this, I can play with it, and for my apps or for my given analysis,
(42:00):
I have enough tooling at my disposal to at least get, you know, a lot farther than I would be just on my own, for sure.
I am keeping an eye on brand.yml, like you said. They are still working on the Shiny for R support for that, but once that lands, yeah, I'm taking that and running with it. It does work with Quarto, as you said, so I'm thinking about that for my Quarto content. But, yeah, when you think about what style you wanna put on those brands, or maybe another type of analysis,
(42:29):
having knowledge about how palettes work and the best ways to choose and optimize them for accessibility,
yeah, this is must-reading for any visualization
analysis, in my humble opinion.
Just like R Weekly itself, which I think is must-reading for everybody in the R community too. But, again, I'm biased, and I'm proud of it. So,
Batool has put together an excellent issue here, and we'll give a couple of minutes here for additional finds.
(42:54):
Well, Mike was the, I guess, unwilling victim in the preshow of hearing me speak once again about my enthusiasm
for Nix these days, the
packaging framework that's a slightly different take than containers.
And thanks to Bruno Rodrigues and Philipp Baumann, we have the rix package, which lets us use Nix principles
(43:17):
in the R side of things, and they just had a new version released
as of about a week ago.
And this version has landed first support for a couple of really important things that I've been waiting for. One of which
is the ability, when you bootstrap these Nix environments for your R analysis,
to now specify
(43:38):
a date that you wanna associate that analysis with, or the packages for that analysis, I should say.
And then rix will take care of finding
the package versions that correspond to that date, not too dissimilar to what we may have seen from the MRAN project that Microsoft
ran many years ago, with a way to snapshot your packages based on a date. Now we kind of get that flavor in rix as well with this new parameter.
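The call might look something like this sketch; the date argument is the new parameter discussed here, and the rest follows my recollection of rix's interface:

```r
library(rix)

# Generate a default.nix pinned to a snapshot date, MRAN-style
rix(
  date         = "2025-01-06",
  r_pkgs       = c("dplyr", "ggplot2"),
  ide          = "other",
  project_path = ".",
  overwrite    = TRUE
)
```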
(44:06):
Also,
for the renv users out there, and I am one of them,
you now have a function to take that renv lock file
and convert it to a Nix default.nix expression,
which will, in different respects, either try to match the version of each package as exactly as possible or get close to it. There are trade-offs; it's not gonna be quite one-to-one,
(44:30):
but boy oh boy, this is a huge advancement for me as I think about legacy projects
and adapting them to a rix workflow. So I'm super excited.
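The conversion might look like this sketch, with argument names as my best recollection of the new renv2nix() docs:

```r
library(rix)

# Translate an renv lockfile into a Nix expression for the same project
renv2nix(
  renv_lock_path = "renv.lock",
  project_path   = ".",
  overwrite      = TRUE
)
```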
I've already tried it out multiple times. It's working great, and I'm now using it with Positron, no less,
to have a custom R environment
fronted by Positron.
(44:51):
I dictate the version of R and the versions of the packages;
Positron just grabs it, and I am off to the races.
I feel like a kid again, Mike. I can't get enough of it.
I love it. I love it. I'll be there someday. I'm not there yet, but I will be there someday, and you know it. I just lag a little bit behind you.
I found a blog post in the highlights as well, from Steven Paul Sanderson, and it is sort of in my world of finance,
(45:18):
which we do a lot of. And it is an analysis
of NVIDIA's recent earnings call
that leverages, I guess, data in this platform called
Dotata,
a knowledge platform which is essentially a database designed for investment professionals.
So if you're in this space, I think this is a really interesting
(45:40):
development: it looks like we're now able to access
financial statement and earnings call
content from a sort of database-like structure, instead of having to do all sorts of crazy
sentiment analysis and natural language processing to be able to make sense of this data. So I thought it was a really interesting
blog post that I wanted to shout out.
(46:01):
Love it. Yeah, I'm gonna be taking a look at that too. This machine I have right next to me has an NVIDIA card that's over five years old. I may be upgrading
soon, but I'm not sure how much I wanna break the bank on that, because graphics card prices are not exactly cheap these days. You can guess why. Yeah.
Not just gaming anymore, folks. Nonetheless, yeah, it's a fantastic issue that Batool has put together here.
(46:28):
But the R Weekly project is, again, driven by the community here, and one of the best ways to help out is to send that great new resource in via a pull request,
which is linked at the top of the page of each issue.
You can fill out a quick template, and the curator of that week will get it in for you. And, also, we love to hear from you. We did hear from a few of you, you know, over the break, and we'd love to keep that dialogue going. We have a contact page in the episode show notes that you can fill out, and you can also get in touch with us on social media.
(47:01):
I am on Bluesky, as I've been speaking highly about, at
@rpodcast.bsky.social.
It's still not natural for me to say out loud yet, but I'm getting there. I'm also on Mastodon at @rpodcast@podcastindex.social,
and I am sporadically on the X thing at @theRcast. And, Mike, where can listeners get a hold of you?
(47:22):
You can find me on Bluesky at @mike-thomas.bsky.social,
or on LinkedIn; if you look up Ketchbrook Analytics,
k-e-t-c-h-b-r-o-o-k,
you can take a look at what we're up to.
Awesome stuff. I love to see what you're up to on there. And, yeah, let's keep the positive vibes going for 2025. That's our plan, anyway.
(47:44):
So thank you so much for being patient with us while we took a break at the end of last year. We are back in the swing of things with R Weekly, and we're happy to share more great content on our road to episode 200, when we actually get there. So, again, thank you so much for joining us, and we will be back with another edition of R Weekly Highlights
(48:05):
next week.