Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Speaker 1 (00:00):
Hey Richard, Hey Carl, what do you know?
Speaker 2 (00:03):
Well, I know that our friend Michele Leroux Bustamante is
with us to tell us about something that's going on
adjacent to DEVintersection.
Speaker 1 (00:11):
What is it? It's Cybersecurity Intersection. Let's let Michele tell
that story.
Speaker 3 (00:16):
Hey Michelle, Hey Carl, Hey Richard, how are you.
Speaker 2 (00:21):
Tell us about cybersecurity Intersection?
Speaker 3 (00:23):
Well, so, Richard and I are partnering with the group
that does DEVintersection and Next Gen AI, and we
are putting on a new conference dedicated to one hundred
percent security-focused topics. And I mean, honestly, the lineup
of speakers is incredible. We have Paula Januszkiewicz, who's
(00:43):
here from Poland and does keynotes all over the world,
and is one of the top-rated RSA speakers and
Black Hat speakers. We're so lucky to have her. But
she's not only keynoting; she's got a workshop that teaches you
about protecting your environments against hackers and shows you
how to, you know, do attacks so that you can
(01:03):
prevent them. It's pretty cool, and there are sessions like that as well.
But we also have speakers from Microsoft. We have
speakers that specialize in, you know, secure coding practices,
Azure security, zero-trust architectures on Azure, and people
who do decision-maker tracks, so things around governance, policy,
and, you know, how to manage your
(01:26):
production operations and keep them secure. So it's an amazing group
of speakers. Really excited about it.
Speaker 2 (01:31):
And I think I can count myself among the group
of speakers there.
Speaker 3 (01:35):
Well, yes you can. That is great.
Speaker 2 (01:37):
Yeah, I'm doing a Securing Blazor Server Applications talk, and
also I think we're doing a Security This Week live
show there somewhere. That is correct.
Speaker 3 (01:48):
Yeah, we'll be recording Security This Week live. We're going
to have a great panel with some folks. The interesting
thing here is we don't really have a Microsoft and
dot net and Azure focused security conference yet, so that's
the reason we're putting this on as well. You know,
there are other security conferences, but they have a spread
of topics that maybe don't focus on the things you
(02:10):
do day to day. And you know, this overlaps with,
again, our community of folks that specialize in, again, dot net
and Azure, and yeah, they need to keep it secure too.
So, tons of talks.
Speaker 1 (02:23):
Cybersecurity Intersection is part of a trio of conferences we're doing:
DEVintersection alongside the Next Gen AI conference, all
in Orlando the week of October fifth through tenth. That's
workshops and the main conference. And you can get a
special registration code if you sign up through Cybersecurity Intersection
dot com.
Speaker 3 (02:42):
Yeah, so if you sign up at Cybersecurity Intersection dot com,
then you put in this code, Alliance Cyber three
hundred, and you'll get three hundred off the entry price.
So that's a special code that only works at Cybersecurity
Intersection dot com. And then you have access to all the conferences.
Speaker 2 (03:04):
Like Richard said. Wow, that's cool. Thanks, Michele. I'm looking
forward to it, and I'll see you there. Hey, get
down, rock and roll! It's Carl Franklin and Richard Campbell
(03:26):
for dot net Rocks.
Speaker 1 (03:28):
Hey, Richard, how you doing, Bud?
Speaker 2 (03:29):
I'm good, getting psyched up to go down to Orlando.
Speaker 1 (03:33):
Yeah, it's almost time. Back to a new DEVintersection
and Next Gen AI and the new cybersecurity conference side
by side. Yep, yep.
Speaker 2 (03:42):
Looking forward to doing a live Security This Week show
down there.
Speaker 1 (03:46):
That should be fun. And your crazy thing with Maddie.
Oh god, you're going to Aspire-ify dot net Rocks here.
Speaker 2 (03:54):
I have no idea what to expect. That could be
a horror show.
Speaker 1 (03:57):
This is, you know... you love a good, you know,
trapeze act, just going without a net.
Speaker 2 (04:03):
Absolutely, as long as I don't, you know, screw up
too badly, it should work out fun.
Speaker 1 (04:10):
You know, a good crash and burn is fun too, but it...
Speaker 2 (04:13):
Could be fun. Yeah, yeah, yeah, Okay, let's start with
nineteen seventy. That's the episode number. Oh yeah, and a
bunch of things happened in nineteen seventy.
Speaker 1 (04:23):
Where do you want to start?
Speaker 2 (04:24):
Well, the unhappy things, the Kent State shootings.
Speaker 1 (04:27):
Yeah, it's terrifying.
Speaker 2 (04:28):
On May fourth, National Guard troops killed four students during
a protest against the Vietnam War at Kent State University in Ohio,
leading to nationwide outrage and the song, what is it,
"Four Dead in Ohio"?
Speaker 1 (04:43):
Who's that?
Speaker 2 (04:44):
Neil Young, or Crosby, Stills, Nash and Young? I'm not sure.
The Nigerian Civil War: the conflict ended in January when the
Biafran forces surrendered after a thirty-two-month
struggle for independence. And the first Earth Day was observed
on April twenty-second. The Beatles broke up, and Let
(05:07):
It Be... McCartney said he was leaving the band on
April tenth. That was the end of that. And John
Lennon's "Instant Karma!": he wrote and recorded this hit song
in a single day, showcasing his prolific creativity. Diana Ross
and the Supremes gave their final concert in Las Vegas
on January fourteenth. Back to the bad stuff: the Tonghai earthquake.
(05:32):
A devastating earthquake struck Tonghai County, China on January fifth,
resulting in significant casualties, with estimates of up to fourteen thousand,
six hundred and twenty-one deaths. Yeah. And an avalanche
in someplace in France that I can't pronounce,
(05:52):
sorry about that, killed forty-two people, making it
one of the worst disasters in French skiing history. You
want to talk about the science? Yeah, science. Some things happened.
Well, I was...
Speaker 1 (06:06):
I mean, the space one's the obvious one. After having
Apollo nine, Apollo ten, Apollo eleven, and Apollo
twelve all in nineteen sixty-nine, there was only one
Apollo mission in nineteen seventy. That was Apollo thirteen.
It launched on April eleventh, and on April thirteenth they said,
we've had a problem.
Speaker 2 (06:24):
Here, Houston, we've got a problem. And what a great
movie too.
Speaker 1 (06:29):
Yeah, and you've seen the movie, a beautiful rendering of
more or less what happened. The HBO From the Earth to the
Moon series, if you ever get a chance to watch it,
does a version of Apollo thirteen, but from the
view of the people on the ground, so you only
ever hear the astronauts over the radio, which is how
it was, right? Sure. Here's the crazy thing to realize:
(06:51):
the explosion in the tank happens on
April thirteenth, the splashdown is April seventeenth. It was four days. Wow,
the whole thing's four days. I know, it feels like forever.
It's four days.
Speaker 2 (07:03):
Wow.
Speaker 1 (07:04):
But it was four days of: are these guys going
to make it? You know, like four days of sheer terror. Yeah,
it was. And of course, the lunar module Aquarius
was turned into a lifeboat, because of the power systems; the
little bit of battery that was left in the command
module was going to be needed for re-entry. So they
basically powered down the command module, and then used the
life support system, designed for two, to keep three alive over
(07:26):
four days, and they were able to get home, amazing,
and survive. It's a great story. And of course, the
next Apollo mission would be delayed while they dealt with
some of those issues, and in nineteen seventy-one you'll
get Apollo fourteen. Talk about that next week, apparently. Yeah.
On the computer side of things, Niklaus Wirth releases Pascal. Well,
he worked previously on the language ALGOL sixty, and
(07:47):
there's some derivations therein. He was trying to do a
combination of sort of procedural and algorithmic programming. Super popular
language, did some great things. But on the hardware side,
for me, the show stealer is maybe, you know,
Intel's most important product: the eleven-oh-three, the DRAM. Okay,
(08:10):
this is what Moore's law actually was about: making
RAM, right? Based on a bunch of other developments to
make a transistor-based memory, they were able to make
a silicon substrate for an eighteen-pin DIP can
with one K of RAM in it for sixty bucks.
Speaker 2 (08:31):
Wow, that seems cheap back then, and.
Speaker 1 (08:34):
One cent per bit, and it was small because then
they were largely using core magnet Ferris cores for memory.
So this was very compact and it was adopted immediately everywhere.
It's it's uptake. That's also the same year that the
first version of the IBM system three seventy comes out
with all semi conductor RAM, but that was not Intel's RAM.
(08:54):
But shortly after that, Intel's RAM just dominates the market
and sends Intel on its trajectory. Although nineteen seventy one
they'll make arguably and even more in product important product.
Tune in next week for nineteen seventy one, nineteen seventy one.
But yes, the eleven O three was there, you know,
definitive product. They were rammed digit you know, semiconductor ramming.
(09:18):
And that's what I got.
Speaker 2 (09:18):
All right, well, I guess we should carry on with
Better Know a Framework.
Speaker 1 (09:23):
Roll the crazy music.
Speaker 2 (09:32):
All right, man, what do you got? Again, I looked
for a trending repo on GitHub, and I found MCP
for Unity. Oh my. Yeah, you know Unity: create
games with Unity. It's a graphical tool that uses
C sharp and JavaScript for scripting, but it also does
all of the three-D stuff. So here's what it is:
(09:54):
proudly sponsored and maintained by Coplay, the best AI assistant
for Unity. There you go: create your Unity apps with
LLMs. MCP for Unity acts as a bridge,
allowing AI assistants like Claude and Cursor to interact directly with
your Unity editor via a local MCP (Model Context Protocol)
(10:16):
client. We've been talking about those. Use
your LLM tools to manage assets, control scenes, edit scripts,
and automate tasks within Unity.
Speaker 1 (10:27):
Pretty cool. Interesting. Yeah, a good show to actually walk
through the process of, you know, making a game
in Unity with the MCP, with LLMs in the role. Yep.
Speaker 2 (10:40):
Also, Code It With the AI dot com is up,
and the first episode is there, and we're basically using
Playwright with the code agent in Visual Studio
Code, nice, and using Claude Sonnet. And we basically,
(11:01):
in one prompt, told it to create user documentation for
Jeff Fritz's Copilot Dojo dot com website, and it
did a pretty good job. What we didn't show was
what's involved in setting up the Playwright MCP so that
(11:22):
the agent can use it. Oh yeah. And it turns
out that's pretty complex. You need Node.js and NPM
and all that stuff, and we're looking for a video
on how to do that, so look in the show
notes for that. Cool. But that's it for Better
Know a Framework. Who's talking to us? I have a comment
off of show nineteen sixty-nine. Yes, that's last week's
(11:45):
show with our friend James Montemagno.
Speaker 1 (11:48):
And we talked a little bit about the AI tooling
inside of Visual Studio Code and its relationship with Visual
Studio and so on. And our friend Richard Rukima, also
known as Coputer, has this comment. He says: I
think Richard nailed it. Do you like the code, or
do you like a solution? I consider my expertise working
with AI as a beginner, especially after listening to James,
(12:09):
but I felt that vibe of joy in getting things
done so fast. So do I like the if-then-else,
or do I like asking for a feature and reviewing
the result? I'm long past the joy of knowing how
to write procedural code. Yeah. An interesting aspect of this is:
is it the more experienced folks that are going to
embrace these tools faster? Because it's typically the more junior
(12:31):
people that tend to jump on the bandwagon of new things,
but I hear the same tone over and over again. Yep.
Certainly in terms of respectful interaction with AI: I don't
subscribe to the harsh language, as I feel it reveals character.
It's an interesting statement. He writes: it's in my character not to
be harsh, and to focus on respectful communication.
(12:51):
I don't think AI should be treated any different, not
for the benefit of the AI, but for the benefit of myself.
Speaker 2 (12:57):
Yeah, exactly. You're not going to feel good, you know,
using harsh language.
Speaker 1 (13:01):
Putting those mean words out there has as much impact
on you as it has on anything else. And believe me,
the software is not affected. That's the thing, right?
Speaker 2 (13:11):
The only thing left to be affected is you.
Speaker 1 (13:13):
Yeah, so be kind to yourself. It's not necessary, right? Hey, Richard,
I'm pretty sure you've got a copy of Music to Code
By already, but thanks so much for your comment. But
if you'd like a copy of Music to Code By, write
a comment on the website at dot net rocks dot
com or on the Facebooks. We publish every show there
and every comment there, and if we read your comment
on the show, we'll send you a copy of Music to Code By.
Speaker 2 (13:31):
Music to Code By is still going strong after all
these years: twenty-two tracks. You can get them in,
uh, WAV, FLAC, or MP3, and that's at Music
to Code By dot net. Okay, let's bring back our
friend Joseph Finney. Joseph is a mobile product owner and an
MVP by day, and he builds productivity apps for Windows
(13:54):
by night. When he's not programming, he's out running and
enjoying tasty coffee and beer in Milwaukee.
Speaker 1 (14:00):
Hey Joe. Hello. Welcome back, good to have you.
Speaker 4 (14:03):
Good to be back talking more about the hot topic
of the day, AI.
Speaker 1 (14:08):
With a vengeance. Yeah, but you've got a
cool angle on this. That's why I asked you
to come on. So what are you working on?
Speaker 4 (14:15):
Well, one of my most popular apps that I make
is Text Grab, which is pretty basic. It's also the
basis for the PowerToys Text Extractor, which is basically:
select a region on your screen where somebody has sent
you text that you can't actually select, and put it somewhere
where you want it. And it does some on-device,
local OCR. Pretty simple. And now, with these new models,
(14:42):
the OCR is getting better. But it does change compatibility
across devices. But it's pretty interesting what we can
do here now with these local models. Microsoft's making it
easier with some of their Windows AI APIs, and then
it just gets more and more complicated from there.
Speaker 2 (15:01):
Mm hmm. So I have an app that I'm running
right here that does a little OCR, and I'm using Tesseract
to read the text in a bitmap at a certain coordinate.
Is that sort of representing the state of
the art before AI got into the mix?
Speaker 4 (15:17):
Yeah, I would say it's similar. Tesseract was
the open-source project that Google took over, I think.
I think actually HP started it way back, and
then kind of Google took it over. Yeah, it's on GitHub.
There's a lot of models. It's very widely used and loved.
Text Grab does enable you to download Tesseract, and
(15:39):
then you can interact with it through the CLI. Well,
Text Grab will just interact with it directly, but there's
a little bit of setting up. You do have to
download it; it's another installer. It's through UB
Mannheim, I think, who does the installer. So there's definitely
some hoops you have to jump through to get it working.
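For readers curious what "interact with it through the CLI" looks like in practice: Tesseract's command line takes an input image, an output base name ("stdout" to print the text instead of writing a file), and a language pack. Here's a small Python sketch of that; the image path is hypothetical, and it assumes a `tesseract` binary (for example, from the UB Mannheim installer) is on your PATH.

```python
import subprocess

def tesseract_cmd(image_path, lang="eng"):
    # Build a Tesseract CLI invocation: input image, "stdout" to print
    # the recognized text instead of writing an output file, and a
    # language pack selected with -l.
    return ["tesseract", image_path, "stdout", "-l", lang]

def ocr(image_path, lang="eng"):
    # Run the command and capture the recognized text
    # (requires the tesseract binary on PATH).
    result = subprocess.run(tesseract_cmd(image_path, lang),
                            capture_output=True, text=True, check=True)
    return result.stdout

# The command built for a hypothetical screenshot:
print(tesseract_cmd("screenshot.png"))
# ['tesseract', 'screenshot.png', 'stdout', '-l', 'eng']
```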
Speaker 2 (15:54):
And there's a data set that goes along with it, right.
Speaker 4 (15:57):
Yeah, yeah. So, yeah, you have to download the languages.
There's a lot. One of the benefits of Tesseract
is that there's a lot of languages, and they have
packages for scripts, and they have packages for, like, handwriting,
and so it's really high quality. Originally, Text Grab was built
using the Windows ten OCR APIs, which are definitely older,
(16:19):
not as good, but they're very fast. So that was
kind of the nice thing there: they're built in, they're fast,
they're quick. For most stuff it worked pretty well. Tesseract
was a bump up, but again, you have
that complexity where you have to download the models locally.
But it's open source, it's available, it's free. And now
there's these Windows AI APIs that Microsoft has released. I
(16:40):
don't think we know exactly what those models are. I
don't think they've shared. I haven't learned what they are exactly.
Speaker 2 (16:46):
But what was the acronym that you used before we
started recording for this new.
Speaker 4 (16:52):
WinML: Windows Machine Learning.
Speaker 2 (16:56):
Okay, and this is new? Yeah, literally days old, and
we don't know anything about it.
Speaker 4 (17:00):
Well, the WinML stuff is kind of a middle
layer here. Okay. So I would say there's, like, three
general levels of intensity. If you are a local Windows
app developer and you want to get OCR, image, or language
models, all of that stuff, if you want to
do that in your app, I would say there's, like,
(17:22):
three different tiers of complexity that you can engage in.
And the first one is the new Windows AI APIs. And
these were released kind of around the time the Copilot
Plus PCs were released. Okay. And they've been rolling out. Yeah,
they've been rolling out slowly. They were experimental; you
had to be on the Insider preview to build with them.
(17:42):
To use them, you have to have a Copilot
Plus PC. So, you know, there's a higher bar kind
of on the consumer side, but that means it's easier
on the developer side. So basically, in the code,
when you're building, you just have to check: does this
device support these APIs? If so, do it. Very simple,
(18:02):
and, like, that's it. You don't have to manage models,
you don't have to manage memory or downloading, and you
don't have to worry about shipping, you know, a five-gig
model with your app. They're already on the device.
If the device supports it, then you can kind of
light up those features, turn on those buttons, show that capability,
and boom, it's there.
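The check-then-light-up pattern Joe describes is simple enough to sketch. This is a hedged, language-agnostic illustration in Python, not the actual Windows AI API (which is exposed to .NET apps); the function names here are made up for the example.

```python
def device_supports_ai():
    # Stand-in for a real capability check, e.g. asking whether this
    # is a Copilot Plus PC with the on-device models available.
    # Hard-coded False here to model an ordinary PC.
    return False

def build_menu():
    # Always ship the baseline features; only "light up" the AI
    # feature when the device supports it. No model download, no
    # fallback path to maintain in the app itself.
    items = ["Open", "Save"]
    if device_supports_ai():
        items.append("Summarize with AI")
    return items

print(build_menu())  # ['Open', 'Save'] on a device without the APIs
```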
Speaker 2 (18:21):
Kelly, my wife, bought a new Copilot Plus PC. She didn't,
of course, know it. We went to Best Buy together,
you know, and she picked it out. But the first
thing I did is immediately turn off all this stuff
that's going to get in the way. The thing that
takes screenshots all the time, I can't remember the name
of it now. Recall. Recall, that's it. It was turned
(18:44):
off by default. So that's good. That's good. I did
not want that on.
Speaker 1 (18:49):
It's a really powerful tool. People love it, you know,
because the bottom line is you can
ask the machine, hey, where did I see such-and-such,
and it'll find it for you. Yeah.
Speaker 2 (18:58):
I just don't have that kind of problem. Like, I
know where I saw stuff, and I keep good notes.
It's your machine. Yeah, she didn't want
Speaker 4 (19:05):
it. So yeah, I also don't use it. Like, I
have AI features in... well, "AI," I should say. I
know on this show Richard has talked a lot about how
you have these big, amorphous buckets of AI, and then
as soon as you start explaining it and giving a
more clear, straightforward name to it, it stops really being AI.
And that's kind of where the OCR and LLMs and
(19:28):
image segmentation and image detection are. So those are all under
this umbrella of AI, and it can be a little, I don't know...
Speaker 1 (19:38):
You left out the impolite part, Joe, which is: so,
for me, the term artificial intelligence means something that doesn't work. Yeah,
there you go. Because as soon as it does work,
it gets a new name.
Speaker 4 (19:49):
Software, right, right. It's a module. Yeah, well, I
should say, the using namespace in dot net
is AI, but then after that there's always dot Text,
dot Imaging, the image recognition. So there's a bunch of
APIs after the namespace that actually
point to the real APIs, the real functionality of what
(20:10):
you're actually trying to do. And I don't think you
can easily turn all of that off, I would say.
So there's a lot of experiences that are built on
top of this technology that's already in these Copilot Plus PCs,
and you could turn those experiences off. You know, they're
not going to run by default. But Microsoft does a
pretty good job of managing bringing down the model, keeping
(20:33):
it up to date, and making it really easy for
developers to interact with, which is kind of what you want, right?
You want something really simple and easy. It's a super complex problem,
but you can just say, you know: send this block
of text, summarize it, and then get it back.
Speaker 2 (20:46):
So in case anyone hasn't figured it out by now,
the Copilot plus PC has a local LLM built into it.
Speaker 1 (20:53):
Yep.
Speaker 2 (20:53):
And you know, this is the kind of thing that
you might think of if you were going to use
Ollama, right, and download models and, you know, train it,
run it on a laptop or something like that, a
gaming PC or something.
Speaker 4 (21:09):
Yeah, there's that's just kind of where I said, there's
like these different layers of the complexity and the easiest, simplest,
like lowest level, easiest for any developer out there to
integrate into their Windows app. Any Windows app by the way,
so WPF or when UI or wind forms you can
or MAUI, you can do them all. It does have
(21:29):
to have identity, some sort of identity because SB there's
Microsoft doesn't want to just open up these APIs to
any random raw ex But if you want to do
some more maybe more niche stuff, maybe a little bit
more complicated stuff, or you want to use this specific model,
you can kind of use what I would call like
the next step of complexity here, and that's win mL
(21:53):
and that's there's a little bit of a middle layer
there where you can go download your own on X
models and run those and it makes it easy. There's
like a basically a standardized interface and you say, run
this model. You don't have to necessarily optimize it for
the specific hardware and it can run CPU, GPU and
PU and it's an easy way. But again, there you
(22:15):
have to manage the model. So if you want that,
if you need that in your application, maybe you have
it specifically fine tuned for your application, or you have
a model that isn't in the box, or I don't
know if there are other legal or.
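A concrete way to picture the "runs on CPU, GPU, and NPU" point: the runtime picks the most capable execution provider the device offers and falls back toward the CPU. This Python sketch only illustrates that selection idea; the real WinML and ONNX Runtime APIs look different, and these names are invented for the example.

```python
# Preferred execution providers, most capable first.
PREFERENCE = ["NPU", "GPU", "CPU"]

def pick_provider(available):
    # Walk the preference list and return the first provider this
    # device actually offers; CPU is normally the last resort.
    for provider in PREFERENCE:
        if provider in available:
            return provider
    raise RuntimeError("no execution provider available")

print(pick_provider({"CPU", "GPU"}))  # GPU on a machine without an NPU
print(pick_provider({"CPU"}))         # CPU as the last resort
```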
Speaker 1 (22:31):
Hey, I'm just appreciating you're talking about something other than
the LLM, because it's just overwhelming right now.
So, you know, clearly there's a bunch of other models
out there and all of this infrastructure, and I'm including
links to ONNX and things. Like, if you haven't looked here,
there's lots of good work being done for specific tasks.
Speaker 4 (22:48):
Yeah. And I think immediately people can kind of get
annoyed by: oh, LLM, why do I need an LLM
in my app? I don't need AI. And it definitely has
become synonymous, like, AI and LLM. Yeah. But there are
so many. If you go to Hugging Face and you
look at all the different categories... I mean, OCR, image segmentation,
image detection, object detection. Huggy Face? Oh yeah, Hugging Face,
(23:12):
Hugging Face, Hugging Face. Yeah. This is, I think,
Facebook is kind of backing it. And it's a big
repository for models, so you can access models, you can
download models. And if you're thinking...
Speaker 1 (23:27):
Before the insanity of LLMs, we had good
tooling around just building machine learning models for object detection and
recognizers and OCR, all these good things, right? Like,
there was so much going on before ChatGPT
showed up and just overwhelmed the message.
Speaker 2 (23:45):
Wow, hugging face looks awesome.
Speaker 4 (23:47):
Yeah, it is a huge, kind of big repository
of models online where you can go download them. But
if you're a normal person who's just curious and says,
I kind of want to try some of these out,
it's not as easy. You can't just download them and
then run them. They are not programs, they're models, so
(24:08):
you need to interface with them somehow. And there is
actually a way, if you are inclined: you can download
an app from Microsoft called the AI Dev Gallery app.
And what this is, it's kind of a playground for
people who are curious about models, and different models, and
how this all works. It's open source on GitHub, it's
(24:28):
in the Microsoft Store, and it is a really low
barrier to entry if you are interested in trying some
of these models out on your own device.
Speaker 2 (24:36):
Wow.
Speaker 4 (24:36):
So you can download models from Hugging Face. You can
run them. They're very limited, basic samples, so don't expect
anything grandiose, or chaining them together. But it's a great
way to play with those Hugging Face models.
Speaker 2 (24:48):
Very cool.
Speaker 1 (24:48):
Did you ever play with Kaggle? Because we've talked about
this on the show ages ago. Like, there is
another playground for practicing your ML skills.
Speaker 4 (24:58):
I've never tried it. Is it a website or
a technology?
Speaker 1 (25:01):
They actually run competitions, you know. The sort of
famous one for them was: predict how many people
survived the Titanic sinking. There are a bunch of different
models, or different competitions, and some of them have a
lot of money in them, because, you know,
organizations encourage folks to mature a model for a particular problem space
(25:25):
that they can then use elsewhere. There were things like
aneurysm detection and even sports prediction. So, just again, a
reminder that there's things other than LLMs.
Speaker 4 (25:39):
Right. And I would say that is, like, the farthest,
the highest tier of integrating AI models into your app,
your local Windows app: making your own models, training
your own models from scratch. So you can do that.
I mean, you can ship models and integrate them directly in.
It's, again, way more integration work, but it's way more
(26:01):
fine-tuned. So if you have a specific application where
you need a model that can do very niche things
or very specific data sets, it's possible. It's doable, and
there's ways to do it. You should check it out.
One of the nice things about this current age of
programming is a lot of these big, popular apps are
open source, so you can just see how it's done.
(26:23):
And, obviously, read the license, but a lot of
this stuff is available, to see how other people are
integrating these AI models.
Speaker 2 (26:31):
Guys, I know we've talked about DeepSeek a bit on
this show, and Joe's nodding his head, so he knows
about it. And this was the model that came out
of China that uses a lot less resources and is
therefore cheaper to run than, you know, ChatGPT was,
and everybody was like, oh my god, OpenAI is
(26:52):
going down. And it didn't. And then there were concerns
about, you know, if I use DeepSeek, am I
sharing data with, you know, the country of China, and
is it safe, and all of these things. But you
can also, I think, correct me if I'm wrong,
(27:13):
download the model and run it locally, like with Ollama.
Is that true?
Speaker 1 (27:18):
Yeah?
Speaker 4 (27:19):
That... So one of the nice things about DeepSeek is
how small it is. But they also have NPU-optimized
models, which you can go download, and there's also an
extension for VS Code.
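For anyone who wants to try what Carl describes, Ollama serves a local HTTP API on port 11434 once it's running. This Python sketch builds and sends a generate request; the `deepseek-r1` model tag is an assumption (check `ollama list` for what you've actually pulled), and nothing here is DeepSeek-specific.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(prompt, model="deepseek-r1"):
    # Ollama's /api/generate takes a model tag and a prompt;
    # stream=False asks for one JSON response instead of chunks.
    return {"model": model, "prompt": prompt, "stream": False}

def generate(prompt, model="deepseek-r1"):
    # POST to the local Ollama server (it must already be running)
    # and return the generated text from the response body.
    data = json.dumps(build_request(prompt, model)).encode()
    req = urllib.request.Request(OLLAMA_URL, data=data,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]

print(build_request("Why is the sky blue?"))
```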
Speaker 2 (27:33):
Wait, wait, go back to the... is it NPU?
What is that?
Speaker 4 (27:38):
That's the neural processing unit. So you kind of have
your CPU, your GPU, and your NPU.
Speaker 4 (27:44):
And this was the core, the chip, the part of the CPU in
these ARM devices that really made it easy to run
these models locally and efficiently.
Speaker 1 (27:56):
Okay. Part of the requirement for a Copilot Plus PC is
that it has an NPU of at least, what is it,
forty TOPS, or trillion operations per second.
Speaker 2 (28:04):
So if you have a Copilot Plus PC, you can
download DeepSeek and use it. Even if you don't,
you're probably going to get good results.
Speaker 4 (28:13):
Yeah, you don't have to have an NPU, but a
lot of these models... So Microsoft makes an SLM called
Phi, Phi Silica, and this model, they've been releasing three,
three-point-five, they just released four. It's optimized for
the CPU and the GPU and not the NPU right now,
(28:34):
at least the models that they've released. And there are
models out there that you can get that are optimized
for the NPU. So if you do have a device
that is an on-device or low-power scenario, and you
want more of an optimized model, you can find them
and run them. And you can also do that in
VS Code. There's an extension called AI Toolkit for Visual
(28:54):
Studio Code, and that's another kind of playground-esque place,
but you can also do the model refinement and fine-tuning
in there. So there's a lot of ways that
you can experiment with these models without really being a pro.
So if you're just curious, and you have a lot
of hard drive space... that is the one thing that
(29:14):
I'll say: I recently upgraded my Surface hard drive from
a five-twelve to a two-terabyte, because these
models are big, and if you want accurate ones, they're
very large.
Speaker 2 (29:26):
I just saw... Richard probably knows about this, but there
are now twenty-two-terabyte SSD drives. Yeah. For, like,
around five hundred bucks. Can you wrap your mind around that?
Speaker 1 (29:38):
It's a lot of storage.
Speaker 2 (29:39):
Oh my goodness. Like, Joe's shaking
his head, like, what?
Speaker 4 (29:44):
One drive, twenty-two terabytes?
Speaker 2 (29:46):
Twenty two terabyte SSD five hundred bucks?
Speaker 4 (29:49):
That's not a typo?
Speaker 2 (29:51):
No, there's a couple of different brands.
Speaker 4 (29:53):
That's amazing.
Speaker 1 (29:54):
Yeah, ridiculous. Yeah, it really is. Actually, I should say,
I don't think they're SSDs. I think they're spinning drives.
Oh, really? Twenty-two terabytes? Yeah, SSDs, the solid-state ones,
aren't that big yet.
Speaker 2 (30:05):
Okay?
Speaker 1 (30:06):
Still, twenty-two terabytes is madness. Like, that's just
a lot of storage.
Speaker 4 (30:11):
Yeah, it really is. And the AI Toolkit in VS
Code does allow you to interact with these LLMs through
the web, and so GitHub will host some of these models,
other providers will host them, and so you can kind
of do comparisons. So there's Foundry Local, and that's
(30:31):
what Microsoft has branded it. You know, I've called it,
I think, the second tier, kind of where you have
WinML and you have your local models and you're
doing that work. So you have your local models, and
you can compare those to cloud-hosted models and test
them, because, again, it's software: you have to be
able to test it. So it is hard too with
(30:52):
these how do you compare them? Like, which one's good,
which one's bad? Is it good enough? Is it good
enough in our use cases? And it can be tedious
to test manually. But there are a lot of tools
out there to experiment, get started, and if anybody's curious,
I definitely you should check out the aidev gallery for sure.
That is a lot of fun to play around with
those different models and for a little bit more advanced scenarios,
(31:16):
what more language focused. The AI toolkit in vs code
is another really fun I'm looking at deep seak here
right now. You can download it on your device and
run it.
Speaker 2 (31:27):
Wow, it seems like a pretty good place to take
a break. So we'll be right back after these very
important messages.
Speaker 1 (31:34):
Stay tuned.
Speaker 2 (31:36):
You know, dot net six has officially reached the end
of support and now is the time to upgrade. Dot
Net eight is well supported on AWS. Learn more at
aws dot Amazon dot com, slash dot net.
Speaker 1 (31:53):
And we're back. It's dot net Rocks! I'm Richard Campbell,
that's Carl Franklin. We're talking a bit to our friend Joe
about working with local models, and also the non-LLM
stuff. Just sort of a good reminder: there's been
all kinds of cool stuff going on in the ML
space that didn't necessarily have to do with language per se.
But you know, you've hinted at this a couple of
(32:15):
times in the first half. It's like, if you want
to own the model... you know, there's a lot of
models available to download from Hugging Face and all these
other places. Why would you want to own a model?
Because it sounds like a lot of work. It's like
owning a framework.
Speaker 4 (32:31):
Yeah, yeah, it is. Like, don't trust somebody who says
they can write their own language and write their own
IDE. You're like, oh.
Speaker 1 (32:38):
Their own garbage collector, you know, their own crypto library.
Like these are all scary things to me. So when
someone says I'll just make our own model, I'm like,
why do we need to do that?
Speaker 4 (32:48):
Well, if you're in the industry, if you have insane
amounts of data and a niche in a specific industry,
it might be worth it for you to look into
doing this. And if you have a hard time processing
large amounts of data to get insights and actions out
of it, which is kind of the idea here, right,
(33:09):
where you have an entire language that you have to
train these models on, or you have an entire data
set of images with boxes drawn around the dogs or
dog breeds or very specific things like that. If that's
what you need to do, and it's something where it's not
available or it's not good enough, there's really no other
way around it than to build your own model today.
(33:29):
But it really is that data.
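The "labeled data in, model out" shape Joe describes can be illustrated with a deliberately tiny example. This is a toy nearest-centroid classifier, nothing like the machinery a real niche model needs, and the breed measurements are invented, but it shows how owning labeled data is the raw material for training:

```python
# A toy sketch of "building your own model" from a data set you own: a
# nearest-centroid classifier. The (height, weight) breed data is made up.

from collections import defaultdict
import math

def train(samples):
    """samples: list of (feature_vector, label). Returns per-label centroids."""
    sums, counts = defaultdict(lambda: None), defaultdict(int)
    for features, label in samples:
        if sums[label] is None:
            sums[label] = list(features)
        else:
            sums[label] = [a + b for a, b in zip(sums[label], features)]
        counts[label] += 1
    return {lbl: [v / counts[lbl] for v in vec] for lbl, vec in sums.items()}

def predict(centroids, features):
    """Return the label whose centroid is nearest to the features."""
    def dist(lbl):
        return math.sqrt(sum((a - b) ** 2
                             for a, b in zip(centroids[lbl], features)))
    return min(centroids, key=dist)

if __name__ == "__main__":
    # Hypothetical (height_cm, weight_kg) per breed.
    data = [((20, 4), "chihuahua"), ((22, 5), "chihuahua"),
            ((70, 40), "mastiff"), ((75, 50), "mastiff")]
    model = train(data)
    print(predict(model, (21, 4.5)))  # → chihuahua
```

Real image models replace the centroid math with deep networks, but the dependency is the same: without the labeled data set, there is nothing to train.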
Speaker 1 (33:33):
I mean, that being said, this is all sort
of a non-deterministic thing. Like, you're never going to get
one hundred percent out of a machine learning model.
Speaker 4 (33:41):
It's probabilistic, right, absolutely, even and maybe especially
the image detection ones, and a lot of times they'll
give you back a number, a fraction of confidence. And
I think maybe this is why they don't get as
much play: they're not as exciting for individuals to use.
It's like, you could take a picture of your cat
and then your phone will draw a box around it
(34:03):
and say that's a cat. Yep, that's a cat. So
I think it's a lot less interesting. The language ones
just kind of capture people's imagination and there's a lot
more back and forth. But when you really think about
building an application, like, what are you doing? Maybe
you're playing around with your Raspberry Pi as
a security system for your house, and you want to
add a vision system and you want to do box
(34:25):
detection and you have hours and hours and hours and
hours of security footage. Or maybe you have a specific
niche application where you're trying to, you know, detect a
particular squirrel who's giving you trouble. It's a fun, you know,
it's a fun experiment and you.
Speaker 1 (34:38):
Can do a bear or a bear.
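The "fraction of confidence" that detection models return is exactly what an application filters on. A minimal sketch, with an assumed result shape (the `label`/`confidence`/`box` field names are illustrative, not any particular library's API):

```python
# Detection models typically return a confidence score with every box.
# Thresholding those results is the application's first decision.

def filter_detections(detections, min_confidence=0.8):
    """Keep only detections at or above the confidence threshold."""
    return [d for d in detections if d["confidence"] >= min_confidence]

if __name__ == "__main__":
    raw = [
        {"label": "cat", "confidence": 0.97, "box": (10, 10, 80, 60)},
        {"label": "squirrel", "confidence": 0.41, "box": (5, 5, 20, 20)},
        {"label": "bear", "confidence": 0.88, "box": (0, 0, 200, 150)},
    ]
    kept = filter_detections(raw)
    print([d["label"] for d in kept])  # → ['cat', 'bear']
```

Tuning that threshold is where the security-camera use case lives: too low and every shadow is a bear, too high and the troublesome squirrel walks right past.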
Speaker 2 (34:40):
Joe, do you have a Twirl a Squirrel bird feeder?
Speaker 1 (34:44):
No?
Speaker 2 (34:44):
I do not. You've got to see this. Check YouTube for
Twirl a Squirrel. It's basically, it goes between, you
know, what you hang the bird feeder on and the
bird feeder, so it's got a hook on either side.
It detects weight, and so when there's a squirrel on it,
it just starts spinning and the squirrels go flying. It's
(35:04):
hilarious, the Twirl a Squirrel.
Speaker 4 (35:06):
Yeah, you could build an AI-powered Twirl a Squirrel.
Speaker 1 (35:10):
There you go, there you go. I don't think that's necessary.
I am thinking about animal recognition in this particular part of
the world, where, you know, the one that would be
tricky, that I would really challenge myself with, would be whale
detection, because, you know, you don't have a
lot of time to pick up on the fact that
there's whale blow, like they're going by, and it could
be orcas and it could be humpbacks, and it could
(35:30):
be grays, and it could be porpoises, and it could
be dolphins. Like, you have a lot of
stuff going on. You have to be on the surface?
We hear, no, no, we hear them, like we hear
whale blow before we see the whale, because it travels,
like when they exhale it's loud.
Speaker 2 (35:46):
Well, you could identify a whale by the sounds it's
making too.
Speaker 1 (35:49):
Yeah, I wonder. Yeah. Speaking of, it still seems nuts
to build your own model. Like, that just seems like
a thing I don't want to own.
Speaker 4 (35:56):
Yeah, it's definitely the research side of things. And
I know people have been saying for a long time
that data is the new oil, right? This is the
new black gold. Do you have the data? Do
you have the databases? Is it structured, is it consistent,
is it clean? Is it real? Is it good? And
if you have all that, I think we have a very
(36:19):
small number of people who can say, yes, we have
that, right, and we don't have to spend all that
time cleaning the data. Which is such a challenge, where
you have so much noise in the data today
if you're trying to train a model. Yeah.
Speaker 2 (36:31):
If I was going to use a local LLM,
I would want it to understand C sharp, JavaScript, Blazor,
you know, and CSS. And I don't know how
realistic that is. Like, I know that the current models,
like Claude Sonnet, and you know, even ChatGPT,
understand it. But for lack of a better word, sorry, Richard,
(36:53):
didn't mean to offend you there, they're programmed, you know,
they're trained against it. But what does it take
to do that locally, to train the models to train well,
or to get a model that understands, you know, programmer
speak, languages and stuff they do?
Speaker 4 (37:10):
Yeah, local models will and they can write code. I
think part of the challenge that you'll see if you
start using them is speed. So the response speed of
a local model is going to be much slower actually
than a cloud hosted one because your computer cannot compete
with a server with a rack of GPUs. Yeah, well
maybe yours, Carl, not mine.
Speaker 2 (37:31):
Oh, I don't know. I don't think so. But you know,
I think if I had a great Copilot plus PC,
you know, with a lot of RAM and a lot
of storage, and I just set it over in a
closet somewhere, I could probably use that.
Speaker 4 (37:47):
Yeah, you should try it.
Speaker 1 (37:48):
Yeah.
Speaker 4 (37:48):
Another challenge is going to be context, which is how
big of a context window the model can actually hold,
and the provider, there's all of that, there's a lot
of infrastructure in between the model and actually getting stuff out.
So speed and context, I would say, are going to be
your biggest risks. Where you don't necessarily just want it
to give you new greenfield CSS, you want it to
(38:09):
give you new CSS in the right spot in your code base.
Speaker 1 (38:14):
And now, a much harder question.
Speaker 2 (38:15):
I want it to remember everything we've said. Like, I want
as big a context as I can possibly get. So
is that just a measure of more RAM, or is
it that the more context you have, the slower it's
going to be to come up with a new answer?
Speaker 4 (38:31):
Yeah, that's a good question. I would love to hear from
an expert who actually knows more about context and how
that differs from the training data and how it differs
from fine tuning, because in my experience with local AI,
you have a pretty narrow context window. You could
basically feed it, hey, here's everything I know, and you
feed that with the prompt, yeah, and you say, okay,
(38:53):
now do this and then give it back to me.
But you're not feeding it documents.
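The "here's everything I know, now do this" pattern amounts to packing as much context as fits and appending the instruction. A minimal sketch of that packing, with an assumed character budget standing in for a real token budget:

```python
# Pack context chunks (most recent first) into a fixed budget, then append
# the instruction. A real system would budget tokens, not characters.

def build_prompt(context_chunks, instruction, max_chars=2000):
    """Keep the newest chunks that fit, oldest dropped first."""
    budget = max_chars - len(instruction) - 1
    kept = []
    for chunk in reversed(context_chunks):  # newest chunks win
        if len(chunk) + 1 > budget:
            break
        kept.insert(0, chunk)
        budget -= len(chunk) + 1
    return "\n".join(kept + [instruction])

if __name__ == "__main__":
    history = ["old note " * 50,
               "recent decision: use Blazor",
               "latest bug report"]
    prompt = build_prompt(history, "Now summarize the project status.",
                          max_chars=120)
    print(prompt)
```

With a tight budget the oldest material silently falls off the front, which is exactly the "it forgot what we said an hour ago" experience of a narrow context window.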
Speaker 1 (38:57):
The thing that's made a difference for me has been
the video card and the amount of memory in the
video card. Like, playing with FramePack and a couple
of other models, I'm running a 5080
with sixteen gigs of VRAM, and that has made
a huge difference for running bigger models. Now, I'm not
talking about building models, but actually executing a more complex workload.
(39:18):
And if you have got the money to spend, because
they're thousands of dollars, like those top-end RTX cards,
now you can get ninety-six gigs in them. Jeez,
it's a ten thousand dollar card. But you know, that
seems to be the thing that makes the most difference
for a lot of these kinds of tools when you
want to handle a lot of context.
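Why VRAM is the gating factor is mostly arithmetic: weights-only memory is roughly parameter count times bits per weight, divided by eight, before you even add activations and the KV cache. A quick sizing sketch (the 120B figure is just an example model size):

```python
# Rough weights-only memory for a model: params * bits_per_weight / 8.
# Activations and KV cache add more on top, so treat this as a floor.

def weights_gb(params_billion: float, bits_per_weight: int) -> float:
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9  # decimal gigabytes

if __name__ == "__main__":
    # A hypothetical 120-billion-parameter model at common quantization levels.
    for bits in (16, 8, 4):
        print(f"{bits}-bit: ~{weights_gb(120, bits):.0f} GB")  # 240, 120, 60
```

That is why quantization matters so much for local use: the same model drops from 240 GB at 16-bit to 60 GB at 4-bit, which starts to be reachable with big-VRAM cards plus system RAM.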
Speaker 2 (39:35):
What about an NPU? Is that gonna do it? Less
than or more than a ten thousand dollar video card?
Speaker 1 (39:40):
No, because there's just no... You know, they talk about
how that Copilot Plus PC has forty TOPS. I don't know
what that means. Yeah, that's trillions of operations per second.
It's the measure of its compute power for neural nets. Okay,
my 5080 has thirteen hundred TOPS. I see. So,
when you look at what Nvidia is selling to the data
centers and things, it's their giant GPUs like that, with
(40:01):
huge amounts of super fast memory, for at-scale processing.
Speaker 4 (40:05):
Yeah, the NPU, I think, was more of a play
for continuous operation in the background and on
mobile devices, where battery and power consumption is a much
bigger concern for individuals. Where they're thinking, well, I don't
want this GPU chugging away in the background. Can I
get something good enough? And that's
(40:25):
kind of where that minimum bar is, something that doesn't absolutely
consume my battery life. You know, you open your computer
up and it's like, hey, I was working in the
background seeing if anything was happening.
Speaker 1 (40:35):
No, thank you. Yeah. Yeah. And there's been an argument
now that you can jack up a PC enough with
a couple of those big GPUs and run
a mid-size LLM on it. So, you know, certainly
I've had conversations with folks where it's like, I am
not prepared to send any of this data to the cloud.
What can I do one hundred percent local? Yeah.
Speaker 4 (40:56):
Another thing that you do have to consider, if you're
going to get into building those apps, especially
local apps, is the idea of multimodal. Yeah, these models,
these local models, at least the Windows AI APIs, are not multimodal,
so you will have to.
Speaker 2 (41:11):
In other words, you can't talk to them and write
to them? Exactly. Is that what you mean?
Speaker 1 (41:16):
Right?
Speaker 4 (41:16):
So you're going to have to build that. I mean,
you could, but you're going to have to put a
speech recognition model in front of the LLM, or an
object detection model, plus an OCR model, plus... you know,
you have to maybe chain these models together, and then
you can get that multimodal experience where you can drop images in,
you can put PDFs in. But you have to be
(41:36):
able to read the PDF. These LLMs don't read
PDFs by default locally. You do have to get them
into a text format. So if you're thinking about how
you can apply this in your work, and I know
for a lot of enterprises, a lot of companies, a lot
of their data is not in raw text format, so
you do have to get it there.
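The chaining Joe describes is just function composition: a converter stage in front of the text model. In this sketch the stages are trivial stand-ins (a real pipeline would call an actual PDF extractor and a local LLM), but the plumbing shape is the same:

```python
# A sketch of chaining converter models in front of a text-only local LLM.
# Both stage functions are stand-ins for real extraction/model calls.

def pdf_to_text(pdf_bytes: bytes) -> str:
    """Stand-in for a real PDF text extractor."""
    return pdf_bytes.decode("utf-8", errors="ignore")

def summarize(text: str) -> str:
    """Stand-in for a local LLM call: keep just the first sentence."""
    return text.split(".")[0] + "."

def pipeline(stages, payload):
    """Run the payload through each stage in order."""
    for stage in stages:
        payload = stage(payload)
    return payload

if __name__ == "__main__":
    fake_pdf = b"Quarterly revenue rose. Many other details follow."
    print(pipeline([pdf_to_text, summarize], fake_pdf))
```

Swap `pdf_to_text` for speech recognition or OCR and you get the "multimodal experience" out of single-modality parts, which is the assembly work Richard and Joe are pointing at.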
Speaker 1 (41:56):
Yeah, but there's an MCP for PDFs. So you know,
glue these bits together.
Speaker 4 (42:02):
Right, yeap, but you will have to do the gluing.
Some assembly required.
Speaker 1 (42:05):
This is the job, right, Like, this is not just
an app you run, but we are assembling parts to
try and get to a place where a model could
be built.
Speaker 2 (42:15):
So if you were going to build a local LLM
app yourself, Joe, using some existing technology, would you first
reach for DeepSeek, or would you go for just the
stuff that Microsoft is exposing in Windows?
Speaker 4 (42:31):
Yeah, I'd just reach for the stuff
Microsoft is exposing in Windows and their Phi model. It's pretty good,
it's pretty robust, and I would say it's a nice
middle ground there for building on top of and
fine tuning. I don't have enough time to be building
(42:51):
all these applications and learning the APIs and learning the
political history of where all these models come from. So
the benefit of Microsoft as a software
provider is it's the one throat to choke, right? This
is the one person you go to. They provide a
lot of the tooling, they provide a lot of the models.
(43:12):
Is it the best in the world, the
absolute best? No. But when you're doing a lot of
different stuff, sometimes you just have to have some heuristics
and simplify the decision making. There's an infinite
number of decisions that you have to make when you're
picking all of these. So starting just with the built-in
tools, the built-in APIs, is a great, easy
(43:32):
way to get started. And if they don't work for you,
then you can start asking other questions and making decisions. And yeah,
I would say start with the built-in stuff,
definitely, at first.
Speaker 1 (43:44):
Okay, yeah. I knew I'd read this;
I just looked it up again. GPT-OSS is a version
of GPT-3 that can be run locally on a
machine with sixty-four gigs of RAM and a 5090
with twenty-five gigs of VRAM. So that's
roughly a six or seven thousand dollar PC, somewhere in that neighborhood,
(44:07):
depending on how much you pay for the video card,
since video card prices can be all over the place. But that's
running, you know, GPT-3, which is what the original
GitHub Copilot was built on. Like, that's a pretty torquey,
pretty good little LLM, one hundred and twenty billion parameters.
Like, it's not GPT.
Speaker 2 (44:23):
Four, but.
Speaker 1 (44:25):
Especially in a narrow-scope application, like a known set
of code, that's pretty robust, man. You could do a
lot with that.
Speaker 4 (44:33):
Yeah, you could do a lot with that. And also
you have to consider the big question of why would
you build local, ever? You know, why do it at all?
Obviously privacy is a concern for a lot of people,
why you would do this stuff locally on your
own computer. If you have network concerns, if you don't
have reliable or high quality or high speed internet, then
obviously this is the only solution for you. But then
(44:56):
also there's the cost concern and the cost question of, yeah,
you don't necessarily want to make some code that runs
out and is running all these LLMs, and then you
come back to a bill for, you know, thousands or
tens of thousands of dollars because your credits went crazy, right?
But when you have it local... Again, try it. There's so
many cool tools: the AI Dev Gallery, the AI Toolkit, and
(45:20):
then there's the APIs available already today. There's so many
ways to get started and try and see. You know,
what is your application, what could it be? Try it
out, because you might not have to sign up or get
an API key at all. You could do all this
stuff locally. And then if you want to do batch
processing of, again, your own data, maybe you want to
kind of use these models to put the data into
(45:42):
a particular shape or clean it or work through it,
but you don't want to pay tokens to do all
that work. Well, do it locally, do it overnight. Build
an app, your own app, not something you ship necessarily,
but do it locally, you know, process that data locally,
and then go from there. Maybe you're going to build
your model, but first you have to get all the
data in the right shape.
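The overnight-batch idea is a plain loop over your records: run each one through a local clean-up step and stream results to disk so an interrupted run doesn't lose work. Here `clean_record` is a trivial stand-in for whatever a local model would do to each record:

```python
# A sketch of local batch processing: clean each record and append it to a
# JSON Lines file. clean_record stands in for a local-model call.

import json
import os
import tempfile

def clean_record(record: dict) -> dict:
    """Toy clean-up: trim and lowercase every string field."""
    return {k: (v.strip().lower() if isinstance(v, str) else v)
            for k, v in record.items()}

def batch_process(records, out_path) -> int:
    """Write one cleaned record per line; return how many were processed."""
    done = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for rec in records:
            out.write(json.dumps(clean_record(rec)) + "\n")
            done += 1
    return done

if __name__ == "__main__":
    rows = [{"name": "  ALICE  ", "id": 1}, {"name": "Bob ", "id": 2}]
    path = os.path.join(tempfile.gettempdir(), "cleaned.jsonl")
    print(batch_process(rows, path))  # → 2
```

Because the model runs locally, the only marginal cost of reprocessing a million rows overnight is electricity, not tokens, which is the whole economic point Joe is making.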
Speaker 1 (46:02):
Right, and you're trading time for money, right?
That's essentially the game you're playing here. It's like, okay, if
I run it in the cloud, it's going to cost
me more, but I get it done in less time. Or
I'm restricted to my own hardware, so it may take longer.
And then you start, you know, doing the economics. So,
just looking up the high end. Yeah, the ninety-six
gig Nvidia RTX Pro 6000 Blackwell, that's the
(46:26):
big box: twelve thousand.
Speaker 2 (46:28):
Well, you know, it's not only the money, but as
Joe said, the security and the privacy. That may trump
any kind of money, and, you know, that may
be the requirement.
Speaker 1 (46:39):
Sorry, that was Canadian dollars. Just nine thousand American.
Speaker 4 (46:42):
Ah, well that totally changes.
Speaker 1 (46:45):
Yeah, everything's different now. You just saved me
three grand. But again, if I'm playing that game
of the cost benefit, like, what am I spending on
tokens at that scale? True. And I really get the
sense that as this sort of bubble starts to burst
(47:05):
and people need to make money, like, tokens ain't getting
cheaper.
Speaker 4 (47:09):
Yeah. I have been using Claude and Codex and Copilot.
There's definitely times where I have three computers running and
I'm just kind of telling them to keep
going. They're checking and building. But it's never going
to be cheaper than it is now. Like, this is
(47:30):
the cheapest it's going to be. They're trying to get
as many users as possible, but that floor has to rise.
I mean, I know Anthropic was having some issues a
couple of weeks ago with limits and quality, and Codex,
I think, had something a month or so ago with
limits. And again, if you're relying on these cloud services,
not only are you relying on them to stay up
(47:51):
and your connection to them to stay live, but you're
also relying on the model and the pricing and the
availability, and on it making sense as a business for them
to stay up. Because it might make sense today. I was.
Speaker 1 (48:03):
Talking to some folks abroad that are big, like, running
five, six, seven simultaneous instances, because they're working that fast
with tuned models, reaching these things. And they said that
over July Fourth everything got dramatically faster, like they got
a ton of work done over July Fourth because Americans weren't
working. Like, these cloud infrastructures are stressed to
(48:24):
the limit and slowing performance as it is, right?
And the proof we've had is, like, when the stress isn't
as high, things are better.
So there is this interesting argument about at what point
does this make more sense to be local versus remote.
And this is going to be a shared resource too,
like, these big boxes don't have to be per dev.
They could be shared out, again with potential performance issues.
(48:47):
But of course, I'm such a hardware geek, like,
I'd love to build out a rack of this stuff.
Speaker 2 (48:52):
It would be fun, wouldn't it.
Speaker 1 (48:53):
It would be and you know, and then now I've
got the heat and power problems right.
Speaker 4 (49:00):
You get to live it firsthand. Well, to your point about shared resources,
that is one of the nice things about Windows ML,
which just released the execution provider support Microsoft announced, making it
easier for local devs to integrate models. If you
have an application and you need a model, do you
(49:20):
download it? And then every single one of your applications
is downloading a five-gig LLM? Yeah, obviously that becomes
untenable very quickly, unless you have that twenty-two terabyte
drive in your computer. So, yeah, yeah, more than one.
It does allow you to share models across applications, Richard,
so you can have one machine install.
Speaker 2 (49:41):
Richard, you were right. I thought they were SSDs.
Speaker 1 (49:44):
They're not. They're HDDs. There are a few SSDs
over eight terabytes, but mostly the line seems
to be eight. By the way, the RTX 6000 is
six hundred watts each.
Speaker 2 (49:56):
That's why I have solar panels.
Speaker 1 (49:58):
Yeah, that's it, you know. Oh boy. I'm just
thinking about how much... remember, in the end, this
is moving electrons around and generating heat. Like, you just
made rocks make heat. That's six hundred watts; you're
gonna feel it. You don't want to sit in the
room with that thing running, man. No, it's going to
be crazy. But it is an interesting point of view,
as we're still going through this, to say, what are
(50:20):
we going to shift local? What are we going to
run remote? Like, what's feasible and what makes sense for
folks here? And I think, you know, not everything has
to be cloud, and not everybody wants it there.
Speaker 4 (50:30):
Right. And I think you just have to be, you
know, wide. I'm not saying to get super deep on
all of this stuff, but the tools for you to
get your feet wet are available. And when your CTO,
or probably more likely your CFO, comes to you
and says, hey, we can't afford this bill anymore, your
critical application can't use this LLM, you have to stop,
(50:53):
or you have to change something, because either somebody's prices
went up or the business model changed.
Speaker 1 (50:58):
Yeah, what are you going to do?
Speaker 4 (50:58):
What are you going to reach for? And getting your
feet wet in some of these local models is a
great way to have an answer, or have some sort
of solution, or see if that solution will work.
Speaker 1 (51:08):
Now you're swapping OpEx for CapEx, and, you know,
using CFO speak, like, we have two ways to solve
this problem: we spend month over month on it, or
we make a capital investment and spend less. You know,
let's do the math. You know, if you want to
talk to a CFO, bring a spreadsheet.
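That CFO spreadsheet fits in a few lines: months until a one-time hardware buy beats the recurring cloud bill. Every dollar figure below is an illustrative assumption, not a quote:

```python
# OpEx vs CapEx in one function: months until local hardware pays for itself
# versus a recurring cloud bill. All figures are illustrative assumptions.

import math

def breakeven_months(hardware_cost, monthly_cloud_cost,
                     monthly_power_cost=0.0):
    """Months after which buying hardware is cheaper than renting cloud."""
    monthly_savings = monthly_cloud_cost - monthly_power_cost
    if monthly_savings <= 0:
        return None  # cloud stays cheaper indefinitely
    return math.ceil(hardware_cost / monthly_savings)

if __name__ == "__main__":
    # Hypothetical: a $9,000 GPU box vs a $1,200/month token bill,
    # with $150/month extra power for the box.
    print(breakeven_months(9000, 1200, 150))  # → 9
```

The interesting part is the sensitivity: at a $100/month token bill the hardware never pays off, at $1,200/month it pays off within a year, which is exactly the month-over-month versus capital-investment math.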
Speaker 4 (51:24):
Yeah, exactly. And as we've said, as you've said,
stuff is changing so fast. So if you get super deep,
if you start training your own model, then tomorrow
somebody comes out with a model that just makes all
that effort useless. This is, again, like the
sweet spot, right? Isn't this where the Windows developer has
(51:45):
kind of always loved to live? Where they're like, yeah, yeah,
we're not like hardware level, we're not doing machine code,
but then we're also not bleeding edge, like the
best of the best. It's like, okay, we're in the
middle here, where we've got models, we've got it local.
It's efficient, it's a good balance.
Speaker 1 (52:01):
Yeah. Well, and I'm going to call back to Kaggle
again, because one of the other ways you can get
a model built is to put out a bounty on
Kaggle in a competition, to have someone build it for you, effectively.
There you go. So you've got the data set, but
you don't want to actually do the construction. You can
host a competition and define your problem space and provide
(52:22):
the sample data, and a bunch of people compete
to deliver you the best model. It's a weird world, man.
Like, if you want to go deep into ML,
there's so many interesting things to be done here. Hmm.
Speaker 2 (52:33):
I had the weird meta thought that you could get
a model to build your model instead of you know,
farming it out for a bounty.
Speaker 1 (52:42):
Well, you're not wrong to interact with an LLM to
start constructing a plan around how a model would get built,
because, you know, in the end, they are a
pretty clever search tool for best practices.
Speaker 4 (52:53):
Yeah, search and tokenization is a really nice thing that
you can do with your local LLM: crunching some
of your data, your text, tokenizing it to make it easier
to search, having that more natural language available for your users.
It's a really hard thing to code, but if you
have local LLMs, they can help you build that.
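The tokenize-and-search idea reduces to an inverted index: map each token to the documents containing it, and a query becomes a set intersection. A local LLM could generate or expand the query terms; the index itself is plain code:

```python
# A sketch of search over your own documents via an inverted index.
# Queries become cheap set intersections over token -> doc-id sets.

import re
from collections import defaultdict

def tokenize(text: str) -> list[str]:
    """Lowercase alphanumeric tokens; a stand-in for smarter tokenization."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs: dict) -> dict:
    """docs: {doc_id: text}. Returns {token: {doc_ids containing it}}."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index

def search(index, query: str) -> set:
    """Doc ids containing every query token."""
    token_sets = [index.get(t, set()) for t in tokenize(query)]
    return set.intersection(*token_sets) if token_sets else set()

if __name__ == "__main__":
    docs = {
        "a": "Local models run on your own hardware.",
        "b": "Cloud models run on rented hardware.",
        "c": "Local power costs money.",
    }
    idx = build_index(docs)
    print(sorted(search(idx, "local hardware")))  # → ['a']
```

Everything stays on your machine, which is the appeal: the data never leaves, and the model only helps with the language side of the problem.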
Speaker 1 (53:13):
Why not. Yeah, that's cool.
Speaker 2 (53:15):
Anything else on your mind that you want to touch
on before we call it a show?
Speaker 4 (53:20):
Not really, I mean we touched on a lot here. Yeah,
we just try it.
Speaker 1 (53:23):
We went we went on a ride today friend again. Yeah,
but this is the kind of deep.
Speaker 2 (53:28):
Dive into local LLMs and local AI that I really
wanted to get to. So I'm very, very happy we talked.
Thank you, Joe.
Speaker 4 (53:36):
Yeah, happy to be here.
Speaker 2 (53:37):
All right, and we'll.
Speaker 5 (53:38):
Talk to you next time on dot net rocks.
Speaker 2 (54:03):
Dot net rocks is brought to you by Franklins dot net
and produced by PWOP Studios, a full service audio, video
and post production facility located physically in New London, Connecticut,
and of course in the cloud online at pwop dot com.
Visit our website at d O T N E t
r o c k S dot com for RSS feeds, downloads,
(54:25):
mobile apps, comments, and access to the full archives going
back to show number one, recorded in September two thousand
and two. And make sure you check out our sponsors.
They keep us in business. Now go write some code,
See you next time.
Speaker 4 (54:40):
You got jas.