All Episodes

March 11, 2025 103 mins

In this insightful episode, the conversation explores the growing world of open-source large language models (LLMs) and their transformative potential when deployed locally.
Guest Stefano Demiliani joins Kris Ruyeras and Brad Prendergast to break down the technical challenges and rewards of running models like DeepSeek on local hardware, from navigating hefty resource demands to leveraging techniques like quantization and distillation for efficiency.
The discussion dives into practical business applications, such as inventory management, autonomous AI agents, forecasting, and even image recognition, all powered by offline models to prioritize data security and cost control.
Listeners will discover how integrating these customizable, secure solutions with tools like Microsoft Dynamics 365 Business Central can streamline operations and unlock new efficiencies. From setup essentials to the collaborative future of AI agents, this episode offers a clear-eyed look at how local AI is reshaping business innovation with privacy, precision, and purpose.


#MSDyn365BC #BusinessCentral #BC #DynamicsCorner

Follow Kris and Brad for more content:
https://matalino.io/bio
https://bprendergast.bio.link/


Episode Transcript

Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Speaker 1 (00:00):
Welcome everyone to another exciting episode of
Dynamics Corner.
Local LLMs, DeepSeek, Phi-4.
I'm your co-host, Chris.

Speaker 2 (00:11):
And this is Brad.
This episode was recorded on February 20th, 2025.
Chris, chris, chris.
Local language models.
Local large language models.

Speaker 1 (00:21):
Is that what that?

Speaker 2 (00:21):
means.
Yes.
This was another mind-blowing conversation.
In this conversation, we learned about large language models, running large language models locally, what all of these models are, and how we can communicate with these models,

(00:42):
with Business Central. With us today, we had the opportunity to learn about many things AI with Stefano Demiliani.
Good morning, good afternoon.

Speaker 1 (01:01):
Good morning for me.
Good morning for me. Good night.

Speaker 4 (01:04):
Good morning for me, good night, good afternoon for
you.

Speaker 1 (01:11):
It feels like nighttime here, but it's early morning. It always feels like nighttime here.

Speaker 4 (01:17):
I always forget the time zone.
Yes, you are early morning.

Speaker 2 (01:24):
Well, you are six hours ahead of me, okay, and then nine hours ahead of Chris, okay, so perfect.
So, yeah, it is, it's perfect.
It's perfect for me because it's not nighttime, it's perfect for you because it's late.
Yeah, exactly, it's perfect for Chris because it's very early,

(01:44):
so it's perfect for everybody.
It's normal that I have an uploading message on top? Yeah, yes, yes, yes. It collects the local audio and video, so that we have some high-quality files to put together to make

(02:07):
you sound amazing.
But you already sound amazing.

Speaker 4 (02:11):
No, not too much. You are amazing with your podcast.

Speaker 2 (02:17):
Yeah, thank you. We're only amazing because of individuals like you.
And what's the greeting in Italy? It's not ciao, it's, how do you say? You know, usually we'll say good morning.

Speaker 4 (02:31):
Hello, we say ciao or no.
We usually use ciao, it's the standard.
Buongiorno.

Speaker 2 (02:41):
Buongiorno.

Speaker 4 (02:42):
Buongiorno is another way, ciao is more informal.

Speaker 2 (02:49):
Ok, and then when you say bye, do you say arrivederci

Speaker 4 (02:52):
or ciao again? Arrivederci, exactly. You speak Italian perfectly. I'm ready to go to Italy.

Speaker 2 (03:02):
I'm ready to go to Italy.
Still haven't made it over to Europe.

Speaker 4 (03:06):
It's a struggle, but hopefully this year I'll be able to make one of the appearances over there, one of the conferences out there. It's always a challenge. One of the next European conferences? Yes, yes, there's several coming up.

Speaker 2 (03:22):
It's a matter of trying to find the one that
works out best logistically.
Yeah, I agree.

Speaker 4 (03:27):
It's not always easy to balance every event that's out there. So balancing events, work, family and so on is not easy.

Speaker 2 (03:40):
No, it's not easy. In Europe, we spoke about it before, I think casually: Europe is like the United States. In the sense that, oh, excuse me, the United States in itself is like Europe, where you have the United States as a large continent, or a large country, excuse me, and it has many states. In Europe now,

(04:00):
probably one more, if you join Canada.
Don't even get me started on that, I don't want Canada.
They can keep Canada.
Let's give Canada to somebody else.
But we travel in the United States across states, like Europeans travel across countries.

(04:24):
So when there's European conferences, it's a little bit easier for you to move around.
I understand.
Also, for you to come over to the United States, it's a little difficult because, you understand, in essence it's a day of travel somewhere, and then you have to attend a conference or do something, then a day of travel back.
So you don't usually do something like that without trying to get additional time. Yeah, it's easier for you,

(04:47):
though, because you're East Coast.

Speaker 1 (04:49):
If you're flying East Coast to Europe, it's a much shorter flight. Like for me,

Speaker 4 (04:52):
I have to cross the country, yeah, and then go the other way. I remember when I was in the US, uh, some years ago, from Los Angeles moving to New York, it was about, if I remember, four or five hours of flight, something like that?

Speaker 2 (05:11):
Yeah, some of the flights, like you said.
Yeah, this is about five to six hours, depending on where on the East Coast you go. So that is just itself going from one side to the other.
It's a little challenging, Chris. It becomes, which airport do you go to?
Yeah, and Europe is fortunate that they have a great rail system, because you can go from country to country easily.

(05:32):
And.

Speaker 4 (05:32):
I often forget that, so I see some of these events.

Speaker 2 (05:35):
I was talking with someone. They were recommending, if I wanted to go to one of the events, fly to this airport.
You could probably get a direct flight.
Then you can take a train easily for a few hours to get to the destination, which was much shorter, when I looked at it, than flying, oh yeah, for sure, to an airport and having the connections. Yeah, they do have good.

Speaker 1 (05:57):
You do have good transportation.
Ours is like a Greyhound bus, but that takes like forever to get around.

Speaker 2 (06:05):
I wish, I do wish we had a better transit system.
Some of the cities have great transit systems.
Boston has a subway and they have some rail, exactly.
And then New York, it used to be a good system, but now, from my understanding, it's a disaster.
You avoid it.
There's ways that you can get around, uh, but if you want to

(06:26):
go from Boston to Florida, for example, you can take a train, but the train will take you a day, so it's challenging. It's challenging, but thank you for taking the time to speak with us.
I've been looking forward to speaking with you about a topic that is interesting to most people these days, even more so,

(06:50):
I think, from the development point of view.
But before we jump into it, can you tell everyone a little bit about yourself?

Speaker 4 (06:57):
A little bit about myself.
My name is Stefano.
I'm working mainly in the Business Central area and in the Azure area, so these are the topics that I cover.
In my company, I am responsible for all the development team inside my group.
My group is called Lodestar, and we are quite a large group in
(07:19):
Italy and I have theresponsibility of managing the
development part of businesscentral area and the Azure area,
so serverless applications andso on.
Recently, as you can imagine,we have also started working on
the AI staff, and so I'mcurrently also leading it at the

(07:44):
moment a small team, but I hope that we will grow, that is involved in providing AI solutions to the customers.
I have a long history in the Business Central area, previously NAV.
I started in NAV in version 2.1, Navision, when it was
(08:08):
Navision 2.1.
Then it was acquired by Microsoft, and so on.
So I followed the whole roadmap of this product, and now we are here.
We are in the cloud.
So there was lots of evolution in the product, lots of steps,

(08:29):
and there really was.

Speaker 2 (08:33):
One day we'll have to sit down with a few people that have been working with it as long as you have, and just talk about the evolution of the product from where it was, back with the classic client, with the native database that they had, then when they added SQL, then when they added the RoleTailored client, you know, continue through the progression of the

(08:54):
evolution of both the product and the language.
And I said it before, originally they had three versions, if you recall.
They had the financials version, the distribution version and the manufacturing version.
So depending on which customer type you were, you would get a specific version of

Speaker 4 (09:11):
Navision. And that was, Navision has had a lot of evolutions over the years.
I remember we started with the classic client and the native database, so this was extremely fast, so very, very great on

(09:32):
that, with a lot of limitations, probably, when going to big customers.
And, unfortunately, we started, my first Navision project in my life was with a very big customer, because we decided to move to Navision all the

(09:52):
healthcare system that we have.
Historically, in my company we have a healthcare-dedicated sector, and we have a solution, previously a handmade solution based on an Oracle database.
Some years, two or three years before the introduction of the

(10:15):
euro, we decided to move this solution to Navision classic database, because it was the only thing possible. This solution had, if I remember, four or five hundred users, and it was a very

(10:38):
big solution.
And then we moved to SQL Server.
When we moved to SQL Server from classic, there were a lot of problems, conversion of data and something like that, but the solution is still live.
And the curious part of that is that we are in 2025, with Business Central online and so on, but we

(10:59):
still have customers today that are using the old Navision, converted to NAV 2009.
We have live customers still today, and also big customers, that are still on the platform.
We are trying to convince them.
Wow.

Speaker 2 (11:14):
Is that it?
I know of a customer as well that's using NAV 2009, and I think they have close to 400 users, and they haven't decided to make a move.

Speaker 4 (11:24):
The curious part of that, what sometimes makes me crazy, is that in my everyday job at the office, maybe during the day, I need to switch from VS Code, the AL language and so on, to the old classic client and NAV 2009, to fix something or to add something.
So also today we need to switch between totally different worlds.

Speaker 2 (11:46):
Wow, it is interesting to see the difference.
And, as you had mentioned, you get used to working with AL and VS Code, and all the tools that you have within VS Code, all the things that were added, and you go back to 2009, you see what we really had to do to write code. Even when they

(12:06):
added the separation for functions, it was a big deal for me that they had the gray bar where you could separate between the functions, which was even a fine reference.
It was good.
Also, I didn't get a chance to speak with you in person.
I know we've communicated in text and writing, but congratulations on the book that you and Julio put out.
It's a great book.
I did pick it up.

Speaker 4 (12:23):
I have it.
Yeah, we have worked quite a lot on that.

Speaker 2 (12:28):
So we hope that I can only imagine.
I can only imagine.

Speaker 4 (12:31):
We received a lot of positive feedback from the community. Very useful.

Speaker 2 (12:35):
It's very useful.
It is on my shelf.
I have it right behind me.
Yes, yes. I have it as well.
So, uh, thank you for doing that and creating that, and congratulations on putting together something so informative for users.
But now let's jump into this LLM stuff.
Yeah, because you have been doing some things that I don't
(12:59):
know, if I can say I understandor don't understand, but anytime
I see something that you post,you're always doing something
new with local language, largelanguage models, but you're also
doing a lot locally, exactly.
I see so you're installing andsetting up AI or language models
on your computer.

Speaker 1 (13:21):
Yes, your local machine.

Speaker 4 (13:22):
Wow, exactly.
I think that everyone that is following technology information today, on socials or on the internet or something like that, everywhere you read about AI. AI is a topic that is absolutely exploding, and there are.

Speaker 2 (13:44):
I don't think you can go five minutes without hearing it. Exactly.
I really don't, except when you're sleeping, I think, and even then maybe.
If you're listening to the news, if you're having a conversation with someone at work, if you're reading something online, I think you can't go five minutes, unless, like you had mentioned, Chris, you're sleeping or you're just sitting by yourself in the woods somewhere, without

(14:05):
hearing AI. Exactly.

Speaker 4 (14:06):
And I totally agree.
And the history of this stuff that I'm doing today is, I think that the majority of us knows the big AI vendors, like OpenAI, Microsoft, Google, something like that. So these are

(14:29):
now also Twitter, or X, sorry, not Twitter, X: Grok. They recently released Grok 3, which is extremely powerful.
So the concept that we embraced some years ago is that we started providing AI solutions by using standard AI models.
So Azure OpenAI was our first choice, and this was absolutely

(14:58):
easy to do.
Just go on Azure, set up a model, deploy a model, and then you can use your model in Business Central or in different applications you want.
We have some problems on that, on some scenarios, and the problem is that sometimes it is not easy

(15:24):
to provide and convince customers that an AI solution is something that can be a winning choice for them.
So you need to demonstrate something. And some customers also are not so prone to leave their data accessible to the internet,

(15:47):
or maybe they have some devices, particular devices.
We have, for example, scenarios in manufacturing where they cannot access the internet, or don't want to access the internet for different reasons, or cannot access a browser, for example.
This was another limitation: no browser as the way to interact.

(16:09):
And so, for that reason, this was one of the reasons that turned on the light for me to start exploring something different.
And the second reason was that there are a lot of

(16:29):
scenarios, at least in my experience, lots of scenarios, where AI can be useful, but for these scenarios the full power of a giant LLM is absolutely not needed.
For example, why do I need to pay for, I don't know, GPT-4o, when I

(16:50):
only need small stuff, or I only need to do function calling or something like that?
Sometimes AI, for a big company, can be costly for probably nothing, and it's absolutely not always true that choosing the best-performing

(17:15):
LLM gives an advantage to the final customer.
So, with these reasons, I started exploring a new world, that is, the open-source LLMs, because it's probably a world that is not so widespread everywhere.

(17:38):
But the AI world is also full of open-source LLMs, and these open-source LLMs are also provided by big vendors: Microsoft is providing open-source LLMs, Google is providing open-source LLMs, Meta with Llama, and more.
So DeepSeek is also provided as an open LLM.

(18:01):
These LLMs are, in many scenarios, absolutely powerful, can be executed offline, and sometimes can give the same result to the customers as using one of the full versions

(18:26):
that you have available in OpenAI or Azure OpenAI or X or something like that, giving absolutely the same results, but without going to the internet, totally private, and so on.
So that's why I started exploring this world.

Speaker 2 (18:45):
My mind is full of questions.
So you're working with open-source LLMs to run AI locally, the language models locally, versus running them online.
I have several questions with that.
One we'll get to: how do you set all that up? But we'll talk about that after.
How do you determine the differences between the models

(19:09):
that you choose to use? And you had mentioned some of the big names that we hear of, outside of the open-source ones: with Microsoft, with Google, with Meta and now xAI. How do you know which model to use, or what's the difference between the models?
Because I see, like, GPT-4o, Grok 3, Grok 2, Claude Sonnet

(19:36):
3.5.
I see all these different language models, and how do you know what the difference is between them?
Or is it just all the same, and it's a different name based upon who creates it?
Are they created equal?

Speaker 4 (19:50):
No, if I can try to share a screen, if possible, so
that we can.

Speaker 1 (19:56):
Yes, that would be wonderful.

Speaker 4 (19:58):
We can talk probably now.

Speaker 2 (20:03):
Very cool Excellent.

Speaker 1 (20:06):
I'm excited about this.
I'm excited.

Speaker 2 (20:08):
There's some cool stuff on your screen with graphs
moving.

Speaker 1 (20:10):
And you're a Mac user.

Speaker 4 (20:15):
But now it's working.
Sorry for the problem, but I don't know why. No one will know, so we can see your screen.

Speaker 2 (20:26):
You have a window open with some graphs and some
things moving.
Yes, what?

Speaker 4 (20:31):
What I will first show is this window.
So, Hugging Face.
Hugging Face is probably one of the

(20:51):
main portals and platforms where open-source LLMs are distributed from all the different vendors. And so every vendor that wants to distribute an AI model today in the open-

(21:12):
source world releases it on Hugging Face, and on Hugging Face you can see, if you click on Models, you can see that there are tons of models deployed here.
Some are completely open-source models, and not

(21:33):
very well-known models, as you can see, a lot of names that are not so famous.
But there are models that instead are extremely famous, and they also have their counterpart that is not open source and is released as a paid service, like, for example,

(21:56):
probably one of the most famous today is DeepSeek.
DeepSeek is a very powerful model.
DeepSeek, as the full DeepSeek model, is a big model with 671

(22:17):
billion parameters. So it's a very extremely large model that, in order to be executed locally, requires more than 400 gigabytes of RAM.
Wow.

Speaker 2 (22:32):
So you need 400 gig of RAM to run this locally.
Wow.

Speaker 4 (22:38):
That was one of my questions.

Speaker 2 (22:40):
The hardware requirements.
Well, you have a large model that is run online, such as DeepSeek and the ones that we had mentioned.
That was the first question I had: if you want to run these locally, what are the requirements to run them locally? Because I don't know of many people that have a 400-gig-of-RAM computer.

Speaker 4 (23:10):
It's something that you cannot execute on a local machine. But here, for open-source models, there's an important concept to understand, that is called quantization.
So quantization, in simple terms, is a technique that an

(23:34):
LLM vendor can use to reduce the computational and memory cost requirements of a model.
To explain that in simple terms, it's like starting from a full-power LLM.
So an LLM that is provided by the vendor cannot be executed

(23:58):
offline, because it requires a data center in order to be executed.
These models pass through a process that reduces the precision of the model, so it can reduce the floating-point representation required by that model.
So it's something like compressing that model, and

(24:23):
creating from that model a smaller model with the same capacity but with less precision.
That's the idea.
So you start from a giant, and you can detach smaller children of that giant with a bit

(24:51):
less precision.
But smaller precision doesn't mean precision in terms of responses or in terms of capacity.
It's something like reducing the neural network inside that model.
So if you can see here, for example, without going into the mathematical concept, because quantization is honestly a

(25:18):
mathematical concept, if you can see here, this is the full DeepSeek model: 671 billion parameters.
This model cannot be executed offline unless you have a

(25:39):
cluster of machines, because it requires not less than 400 gigabytes of RAM and GPUs in order to be executed.
So I cannot execute it offline, and probably you cannot execute it offline on your machines, and probably also many of them,

(25:59):
Unless you got a data center there, Brad, somewhere.
It's under my desk.
This is why these models are provided as services from the cloud. So you can activate a subscription to DeepSeek, or deploy DeepSeek today also on Azure.
It's available on Azure AI Foundry.
You can deploy the full DeepSeek and use it as a service.

(26:23):
But here you can also see that there are available the distilled models, and those distilled models are a reduced version of DeepSeek, in this case.
So, models that have passed through a process called quantization and through a second process, in this case,

(26:46):
in the case of DeepSeek, called distillation.
And distillation, as you can see here, is another technique that is used in open-source AI.
So distillation is a machine-learning technique that

(27:10):
involves transferring knowledge from a large model to a smaller one, in order to create a model that has the same features and knowledge of the big one, but in a medium size.
In this case, DeepSeek transfers it to a smaller model.

(27:32):
So in this case you can see that here DeepSeek is providing several distillations of DeepSeek. So it's coming from these models.
These are the base models that are used; DeepSeek has trained these models in order to have new models called with these names. Ah.

Speaker 1 (27:56):
It's a voluntary model.

Speaker 2 (27:59):
So with this process, just to take it back: in the cloud, they have a model that has billions of parameters, as you had mentioned.
They go through a distillation process and they reduce it, so that it can run locally on a reasonable machine.
Exactly. You said that the precision is off. Is there a

(28:23):
difference in the results?
What's the difference with them reducing it versus running it in the cloud?
Is it speed in response?
Is it accuracy?
I don't even want to use the word accuracy.

Speaker 4 (28:35):
The main difference that you can experience in some scenarios is probably accuracy, because the full model obviously has more parameters. So accuracy is sometimes, at least, not always, but for some tasks, accuracy is probably better.

(29:00):
If you have followed some of the posts that I have done, I've done, for example, some tests on auto-generating complex JavaScript scripts for creating animations or something like that. And for these tasks, for example, probably the full

(29:21):
model is more accurate. With the distilled model, so the local model, it is a bit less accurate, and you need to tune up the prompt more in order to have the same result.
But here, for example, for interaction with Business Central, for example, or for creating agents or

(29:44):
something like that, these models are absolutely comparable to the online model, with the advantage that you don't pay anything and you can deploy it offline with a reasonable amount of RAM.
It depends on the number of parameters that the model has.

(30:07):
So this number that appears here is the number of parameters that this model has.
So, for example, this is 70 billion parameters; this is 32 billion parameters.
This, for example, is the model that I used and I'm still using for my tests with DeepSeek.

Speaker 2 (30:29):
Which model of DeepSeek are you using for your tests?

Speaker 4 (30:32):
32 billion parameters.
It's a distillation of DeepSeek using 32 billion parameters, and this works absolutely great.

Speaker 1 (30:44):
But how do you tell which, like, if you look at the 32 billion parameters? You're running it, clearly, on a MacBook.

Speaker 4 (30:52):
Yes.

Speaker 1 (30:53):
And how do you know if your MacBook will handle that?

Speaker 4 (31:03):
To know if the local machine can handle that, from the number of parameters there's a calculation that gives you a rough estimate of the gigabytes of RAM that you need in order to run the model.

(31:23):
A very, very rough number is: if you multiply this number by 1.5, for example, that is usually a generous estimate of the number of gigabytes that you need to run it.

Speaker 2 (31:47):
So you multiply the number of parameters by 1.5, or which number? Exactly, this gives you about the number of gigabytes that these?

Speaker 4 (31:54):
1.2 to 1.5; 1.5 if you want to stay on the large side.
This is the number of gigabytes required to efficiently run this model locally.
So, for example, this requires at least 40 gigabytes of

(32:15):
RAM to run locally.

Speaker 1 (32:19):
Okay. Oh, wow. Okay.

Speaker 4 (32:21):
If you have strict requirements or, I don't know, if you have, for example, a local machine with 16 GB of RAM, probably this is the model to use.

Speaker 2 (32:32):
So if you have 32 billion parameters, you multiply that by 1.5 roughly, again 1.2 to 1.5, so that's where you get the 40.
So it's not 32 billion times 1.5, it's 32.
So it's the number of billions, okay, to be clear.

Speaker 4 (32:49):
There's a more precise number, a more precise calculation, that compares not only the big parameter count but also other sets of parameters.
But in my experience, when I have to quickly evaluate if I can use a model online or offline or not, taking

(33:12):
into consideration the resources that I have, I use this estimate.
So, multiplying by 1.2, something like that, or 1.5 if I want to stay on the large side, tells me if this model is able to run on my machine or not. 16 gigs, 16, 17 gigs for that one.
It can also be run on iPhones.

(33:33):
It did take me to a whole different world here.
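The rule of thumb described above, billions of parameters multiplied by roughly 1.2 to 1.5 to get gigabytes of RAM, is easy to capture as a helper. A minimal sketch (the factors are exactly the ones quoted in the conversation; it is a heuristic, not a guarantee):

```python
def estimated_ram_gb(params_billions: float, factor: float = 1.5) -> float:
    """Heuristic from the conversation: RAM in GB = billions of params x 1.2-1.5."""
    return params_billions * factor

def fits_in(params_billions: float, available_ram_gb: float) -> bool:
    """Conservative check using the generous 1.5x factor."""
    return estimated_ram_gb(params_billions, 1.5) <= available_ram_gb

# A 32B distilled model lands around 38-48 GB by this estimate,
# so it fits a 64 GB workstation but not a 16 GB laptop.
print(estimated_ram_gb(32, 1.2), estimated_ram_gb(32, 1.5))  # 38.4 48.0
print(fits_in(32, 64), fits_in(32, 16))                      # True False
```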

Speaker 2 (33:38):
So you can run this on the phone.
But I just want to take it back up a notch, because my mind has this whole list of questions. Amazing.
So we have a large language model that's in the cloud, that went through a distillation process to now run locally, where there's different models, or mini models, I guess you

(34:00):
could say, or distilled models, that have different parameters, where, you had mentioned, in some cases what you may lose is some accuracy.
In some cases, not always, not always.
Now, I hear about language models being trained constantly with information on the internet, or trained by different sources

(34:22):
, with this being run locally.
Does it have all of that information, and what happens if the model gets updated?
Is that the whole point of having different model versions, that each has a different set of data, or a different set of parameters?
Let's just say we index the internet for a website.
So let's just say we index Microsoft Learn today and have a

(34:46):
model that's focused on Microsoft Learn.
They constantly add documents.
I now have a local copy of DeepSeek that used that Learn source.
How do I get updated information?

Speaker 4 (35:03):
Exactly.
The main limitation of the local LLMs is that they are only periodically refreshed, so it's local.
When you have downloaded a local LLM, like, for example, here on my machine I have this set of local LLMs, some from

(35:26):
Microsoft, some from Llama and DeepSeek.
Let me try to do this.
Local LLMs are downloaded with the knowledge of when the vendor released that model. So, for example, you can ask the latest update date

(35:53):
, and sometimes they respond giving you that.
For example, Phi-4 is not a recent model.
It has knowledge, public knowledge of

(36:13):
facts, internet facts, until this date, probably.
Now, I have not updated it yet.
Probably if I download a new update, you can.
It's something like Docker, the technology: you can download the model, it creates a local model, then you can pull again in order to see if there are

(36:34):
updates of that model.
So when I use this model, this is, for example, one of the most powerful small language models, in my opinion, that can run locally: Phi-4 from Microsoft.
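The Docker-like download-and-pull workflow described here matches local runners such as Ollama (an assumption on my part; the episode does not name the specific tool). With Ollama installed, the cycle looks roughly like this:

```shell
# Download a model once; it is cached locally, much like a Docker image.
ollama pull phi4

# Run it fully offline against the local cache.
ollama run phi4 "In one sentence, what does quantization do to an LLM?"

# Pull again later to pick up a refreshed build of the model,
# if the vendor has published one.
ollama pull phi4

# See which models are cached on this machine.
ollama list
```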

Speaker 2 (36:48):
Which model is that again? Did you say Phi-4?

Speaker 4 (36:52):
Microsoft Phi-4 is good. Microsoft Phi-4?
Yes, it's this model here.
It's one of the best models from Microsoft, in my opinion, that can run fully offline.
So, in terms, probably the main limitation of open-

(37:14):
source and local language models is if you intend to use them as a model that knows the internet.
So these can probably be the scenarios where they have the main limitations, because they are created and deployed in

(37:38):
a particular way.
They know the knowledge until that particular date, and then you can download updates.
But honestly, this is absolutely not my scenario.
So my scenario is not having a ChatGPT offline that works perfectly, because here it fails only if I ask about

(38:01):
internet facts.

Speaker 1 (38:02):
So if I ask who is the USA president, I don't know if it's able to. So you're saying that when you download these small LLMs, running locally, do they not have access to the internet at all, or can you tell them to have access to the internet?

Speaker 4 (38:24):
Usually, the model that runs offline by default has no access to the internet.
You can enable access to the internet, but by default it has no access.
Because it's trained with the knowledge from when the vendor released it.

Speaker 1 (38:44):
At the time it was published. Got it, exactly.

Speaker 4 (38:45):
So, for example, yeah, if I ask DeepSeek who is the USA president, it's giving me that, as of my last update, Joe Biden is the president, because it's not

Speaker 1 (38:59):
Right, October 2023 is the last.

Speaker 4 (39:01):
an online model. So the question is, if you want to have a reliable ChatGPT, probably an offline model can sometimes fail, because you need to be sure that it was updated with the latest data coming from the internet.

Speaker 2 (39:25):
So that's a good point that you make.
It is a matter of how you're going to use it, or what you need to use the model for that you're running locally.
Right, I want to get into this, and I hope that you publish it someday: how do I install this?
But can you train it with your own data as well, on a local

(39:47):
model?
Yes. So if I was an organization that, for security reasons, had policies for my employees, or I had other documents that I wanted to put into the AI, so that the members of our team could use the AI to find something simple.

(40:09):
So we may have an employee handbook that has the policies for taking time off for holidays, where an employee could just type to the model: what are the holidays we have?

Speaker 4 (40:23):
Exactly.
Here is exactly the point where these models are, in my opinion, interesting. These models are not extremely interesting if you just want to have a ChatGPT offline; but there are scenarios where they are extremely interesting.

(40:45):
For example, if I need to ask something for coding, they can give me an answer without going to the internet, so I can also use it on an airplane or wherever I want. Also from an iPhone, for example, I can use these models.
But the second scenario is with company data, and here is where

(41:13):
I've spent my last months, and we also have live projects using these models, because you can use them fully locally, without paying anything and without internet access, for doing business work.
So, for example, at least in my case, I don't have customers

(41:38):
that ask me to provide an AI solution for going to the internet and asking everything they want, because there are Copilots, or there's ChatGPT, for that. All customers that ask us for AI solutions want AI solutions that work with their business.

(41:59):
So they want AI solutions that are able to talk with Business Central, AI solutions that are able to talk with their documents, or reason over data coming from their corporate systems, something like that. So these are the AI solutions that are useful for that

(42:19):
customer.
So a business solution, not a general chat. An offline model is great for that, because you can use function calling; you can use every feature that you have in one of the online models like GPT-4o or

(42:39):
something like that. So, for example, this model, that is very small, can be executed even on 60-gigabyte machines. It has the same power as GPT-4o in terms of function calling, agent creation and manipulation, things like that.

(42:59):
And this can work completely offline, and I can show you some examples. A very, very stupid example, but just to show you something; let me move this here. For example, I don't want to go into much detail on the code

(43:22):
, so take this only as an example, but let me reduce this. So here, for example, I have a very stupid piece of code that uses a local model. This is my local model running locally. It uses DeepSeek, so the version of DeepSeek that I previously

(43:45):
mentioned, in my local environment. And in this example, imagine that I am a Business Central company and I want to have the possibility to pass my data to this model, in order to be able

(44:09):
to have an AI solution where Ican ask something about my data.
If you want to do that using online models, for example, staying in the Microsoft family, you need to deploy, I don't know, GPT-4o for having the LLM, and then you need a

(44:32):
vector database and an embedding model like text-embedding-ada or something like that, because you need to convert data coming from Business Central into something that the model can understand, and for doing that you also need a vector database.

(44:53):
Microsoft has Azure AI Search for that, and this costs a lot. So this solution can cost no less than $400 per month, minimum, to have a full RAG solution working with Business Central

(45:18):
data and an online model.
The same result can also be achieved totally offline, and this is a very quick, stupid example. So here I have my model running locally, so it runs on my

(45:38):
machine. The model is DeepSeek in this case, but you can use any of the available models. I use DeepSeek here in this example because DeepSeek is a reasoning model. Now, one of the latest trends in AI is reasoning models. Reasoning models are models that, before giving you the

(45:59):
final response, perform a long reasoning process. They can explain all the steps that they use for reasoning, and then they can give you the result. And here I also use embeddings, because I want to pass data, and this is, for example, one of the available

(46:23):
open-source embedding models. I use this because it's the smallest.
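The embedding idea just described, turn each record into a vector, then find the records closest to a question, can be sketched without any model at all. The vectors below are made-up stand-ins for real embedding output, so only the mechanics are illustrative:

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Made-up "embeddings" for a few Business Central summaries; a real
# embedding model would produce vectors with hundreds of dimensions.
records = {
    "Contoso digital services sales": [0.9, 0.1, 0.0],
    "Adatum hardware sales":          [0.1, 0.9, 0.0],
    "Cronus freight costs":           [0.0, 0.2, 0.9],
}

def retrieve(query_vector, k=1):
    # Rank stored records by similarity to the query, return the top k.
    ranked = sorted(records.items(),
                    key=lambda kv: cosine_similarity(query_vector, kv[1]),
                    reverse=True)
    return [name for name, _ in ranked[:k]]

# A query vector close to the digital-services record wins.
print(retrieve([0.8, 0.2, 0.1]))  # ['Contoso digital services sales']
```

A vector database like Azure AI Search does exactly this ranking at scale; locally, with a few thousand records, a plain loop like this is often enough.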

Speaker 2 (46:34):
So you have a local language model, DeepSeek, installed. You want to train it on your Business Central data, all local, so it doesn't go out to the internet. So you also now need to create or install another model. What was that model you called, to process or to hold your data?

Speaker 4 (46:49):
The embedding model is this one; I use this, but you can use a different one. The embedding model is used to work with your data within the language model that you're using. Without going through all the steps: I'm personally a big fan of this tool, this SDK called

(47:17):
Microsoft Semantic Kernel. Microsoft Semantic Kernel is an SDK from Microsoft that permits you to create AI solutions that are independent of the model, plus many other features. One of the main features is that it abstracts the creation of your AI

(47:38):
solution from the model that you use.
So, with this tool, here I'm creating my service, and in this service I'm passing data. Here I put stupid data, but imagine that I pass data from

(48:02):
the sales coming from Business Central. Yeah, I simply passed my data, just to provide you an example, as a list of data.
So the concept is that, to the memory of my AI model, I need to pass all the data that it needs to know, and this data

(48:23):
can be the content of Business Central tables, or summarizations of the Business Central tables, I don't know. Here, just to provide a very easy example, I passed a set of data. So, for example, the sum of the amount of the sales for a customer in a month, and then, for each customer, the same amount

(48:47):
for each of the product categories that I'm using. So the model now knows that it has a total amount, and a total amount per item category. So there's data for each customer.
Imagine that this can be your raw Business Central table, or

(49:12):
what you want. So you could pick the data that you want to load: the customer table, vendor table, customer ledger, whichever specific things you want your model to know. Any specific thing that I want my model to know, exactly. That's the idea, and then I can start asking DeepSeek. So here, for example, I run this.

(49:34):
It will not be extremely quick, because here I use the biggest model I can use for that; a smaller distillation is also fine, so the 16-billion-parameter one is also okay.

(49:54):
But here my model has memorized all this data, and now DeepSeek is reasoning. I don't love to watch the reasoning part of DeepSeek, because it's long. There's also a way to avoid DeepSeek's reasoning. But yeah, I've asked in the program.

(50:16):
I've asked the model to give me the sales amount for digital services in 2025. So the model needs to go to each customer, retrieve the sales amount for that particular category, and produce the output. And here, you can see the reasoning.

(50:38):
So my model is responsible for that. First of all, sorry, I forgot to mention: I opened this for that reason. When you run that, and I can rerun it again, you can see: when you run a local model, you will see that your GPU is going to the max, because the local model uses the GPU first.

Speaker 1 (51:03):
Oh, okay, then memory .

Speaker 4 (51:06):
So I will relaunch the process later. You will see that my GPU will go to the top, because any LLM uses the GPU at max in order to perform reasoning, calculations and so on. Then, when the GPU is not available, it uses RAM and CPU, but first

(51:30):
of all it's the GPU that is used.
But now you can see that my model has responded. So DeepSeek has done that reasoning: okay, I need to figure out the total sales amount for digital services, blah, blah, blah. It's explaining all this mental reasoning. So first it looked at Contoso, and it retrieved that in these two

(51:51):
months Contoso has done that amount for digital services, then Adatum, only one month, then Cronus, and so on. Then it gives you all the explanation.
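The flow just shown, summarized sales figures loaded into the model's memory and then a question answered over them, boils down to prompt construction. The sketch below only builds an Ollama-style chat request body; the model tag and the figures are assumptions for illustration, and nothing is actually sent:

```python
import json

# Summarized Business Central data, like the per-customer monthly
# sales totals described above (all figures are invented).
sales_facts = [
    "Contoso: digital services, January 2025, 1200 EUR",
    "Contoso: digital services, February 2025, 800 EUR",
    "Adatum: digital services, January 2025, 500 EUR",
]

def build_chat_request(question, facts, model="deepseek-r1:14b"):
    # Build the JSON body for an Ollama-style /api/chat call; the
    # model tag is an assumption, use whatever tag you pulled.
    system = "Answer using only these facts:\n" + "\n".join(facts)
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": question},
        ],
        "stream": False,
    }

body = build_chat_request(
    "What is the total sales amount for digital services in 2025?",
    sales_facts)
print(json.dumps(body)[:60])
```

Posting this body to the local service is all the "chat with my data" part amounts to; the grounding lives entirely in the system message.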

Speaker 2 (52:04):
Okay, now I need to sum, blah, blah, blah, and the total result is this. So basically you can see what it's doing to come up with the number. When you loaded the data, you only have to load that data one time, correct? Yes, one time. So you don't have to do it for each query or each question or each prompt.

Speaker 3 (52:25):
You can. So if we had a Business Central database.

Speaker 2 (52:29):
we could, in essence, in your example, load the sales every day. We could export, or import, however you phrase it, the sales information into our language model. So now it has up-to-date sales data, so anytime we run this it will have the most accurate

(52:50):
information.
Exactly. Oh, that's excellent.

Speaker 4 (52:53):
And as a data store, a data store for these embeddings, you can have different options. For example, Microsoft has now released support for embeddings also in SQL Server and Azure SQL, and Azure SQL is absolutely a good choice in terms of money if you want to

(53:18):
use the online version too, because having embeddings in Azure AI Search is very costly, while Azure SQL is absolutely cheaper than that. But here, just to show: I have asked a question to my LLM running locally about a set of data that I have loaded,

(53:45):
and it has done reasoning and provided me a result. So this can be useful if you want, for example, to have a service that is able to analyze your Business Central data and give you the answer according to the user's question.
Speaker 2 (54:04):
I can't wait to play with this. I'm calling you later and we're going to set this up on my machine.

Speaker 4 (54:15):
Just to show what I forgot to mention before: if I do that again, you will see. So during the process

(54:51):
of reasoning, the resource consumption of your local machine is increasing.
So imagine a data center. What happens? I read that the consumption of energy in US data centers, if I remember, is 13% of the energy in the US.

(55:12):
All the power that we have in the data centers. So, the main...

Speaker 2 (55:22):
What is that tool you're running that shows the graph of the usage, the GPU?

Speaker 4 (55:34):
Yeah, it's called mactop, this tool. Let me open it in the browser. There are different ones; I use this one. It's open source, simply resource monitoring for Mac.

(55:55):
Yes, it's a resource monitor. It's quite useful.

Speaker 2 (55:56):
So you're using an open source resource monitoring
tool for a Mac.

Speaker 4 (55:59):
Yes, it's open source, absolutely. That's good.

Speaker 2 (56:01):
This is excellent.

Speaker 4 (56:02):
You can easily install with this.

Speaker 2 (56:06):
So we install our language model. We can, I use the word export, but we can send the data to the language model from our Business Central environment, or anything else, any other data that we want to send to it. The model will learn the data, train on the data, and you can

(56:28):
ask, in this case, DeepSeek a prompt. It will show you the reasoning, I like that, so you can see exactly what it's doing to come up with the calculation. And now we have the result. So now we're doing this completely offline. So those that have questions about security, about data being transmitted to the cloud somewhere, or about teaching a model where somebody else could potentially get the data: we

(56:50):
eliminated that, because this doesn't go out to the internet.
Now that we have that language model installed locally, can we use it with Business Central itself? Business Central, in the newer versions, has Copilot, where we can prompt or ask questions and it will do things. Is there a way that we could use our local model within

(57:10):
Business

Speaker 4 (57:10):
Central to get that information?
Local models, in my opinion, are suitable for certain types of scenarios.
First of all, as you can see from here, every local model is available as a

(57:32):
local service. It runs as a service on your local machine, or a machine in your local network, and you can use it with the same APIs as the online models. So if I use DeepSeek offline, it's exactly like using DeepSeek

(57:53):
online. If I use Phi-4, one of the Microsoft offline models, it's the same as using GPT-4o online, in terms of API calls and so on. Obviously, a local model is local because it runs in your local network. So for Business Central Online to call a local model directly,

(58:18):
you would need to expose it to Business Central Online. Honestly, you can do that, maybe not directly, but with a middle layer in between that is able to connect to it from Business Central. You call something like an Azure Function, and then the Azure

(58:39):
Function can call your local service. This is absolutely possible; the Azure Function can be exposed in a virtual network, in order to have the security of the messages. So Business Central can call a local model, but you need something in order to expose the

(59:00):
local service to Business Central, if you want to have something like a local Copilot inside Business Central using a local model.
Honestly, the scenarios that I use at the moment in real projects are the opposite. So it's a local model that needs to interact with Business

(59:24):
Central. The scenario is: I am a company that has Business Central online, but for my AI solutions I want AI solutions that run offline. So my AI solution is offline, but it needs to interact with Business Central in some ways.

(59:44):
So, for example, in real projects that we have done with customers: there's a customer that is working in the manufacturing industry. In the production departments they

(01:00:04):
cannot use a browser, for different reasons, and they want to have the possibility to have a chat that is able to work with Business Central data. So an example is: I am at the production machine and I want to

(01:00:25):
know where this item is used in my production orders. I can directly open my console and type: where is this item used in my production orders? And then the local model calls Business Central and can give the response.

(01:00:46):
That's helpful.

Speaker 1 (01:00:48):
So they don't have to go to Business Central, right? They just ask locally. Exactly, or...

Speaker 4 (01:00:52):
Something like: can you set the inventory of this item to five pieces? Can you move the starting date of this production order to tomorrow? And we have a solution for that, fully running locally, that permits you to interact with your production orders, manufacturing, inventory movements, things like that,

(01:01:16):
fully offline.
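The interaction pattern described here is function calling: the model emits a tool name plus arguments, and local code performs the action against Business Central. A minimal sketch, in which the tool names and the in-memory "API" are hypothetical stand-ins for real Business Central calls:

```python
# Function-calling sketch: the model picks a tool and arguments, and
# our code dispatches it. The tool names and the in-memory data below
# are hypothetical stand-ins for real Business Central API calls.

inventory = {"CHAIR-01": 12}
production_orders = {"CHAIR-01": ["PO-1001", "PO-1007"]}

def where_used(item_no):
    # Stand-in for querying production orders in Business Central.
    return production_orders.get(item_no, [])

def set_inventory(item_no, qty):
    # Stand-in for an update call to Business Central.
    inventory[item_no] = qty
    return qty

TOOLS = {"where_used": where_used, "set_inventory": set_inventory}

def dispatch(tool_call):
    # tool_call mimics the JSON a model produces when function calling.
    return TOOLS[tool_call["name"]](**tool_call["arguments"])

# "Where is this item used in my production orders?"
print(dispatch({"name": "where_used",
                "arguments": {"item_no": "CHAIR-01"}}))
# "Can you set the inventory of this item to five pieces?"
dispatch({"name": "set_inventory",
          "arguments": {"item_no": "CHAIR-01", "qty": 5}})
print(inventory["CHAIR-01"])  # 5
```

With a capable local model, the only change is that the tool_call dictionary comes from the model's structured output instead of being written by hand.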

Speaker 2 (01:01:18):
So the language model that you are talking about, or
what you have set up, is notonly learning on the Business
Central data but it'sinteracting with Business
Central to where it's updatinginformation in Business Central.

Speaker 4 (01:01:34):
Exactly, all locally. Another example that I have here on my machine, that I can maybe quickly show, is one that we have recently deployed in a solution. One second.

(01:01:57):
Is this it? One second, I need to open the window.
Okay, this is, for example: we have some scenarios where we need to do image recognition and

(01:02:23):
, for example, we have a customer that asked us for the possibility to recognize if a warehouse is something like this. No, it's not needed.

(01:02:46):
Yes, something like this.
So, pictures of the warehouse taken from a camera, and they want to know if the warehouse is over a certain fill level, in order to block the possibility to do pickings, or

(01:03:09):
to do put-aways on those locations. So what happens in this scenario is that there are some cameras in this warehouse that take a picture of these locations every X minutes. They store the camera images into a folder, in this case, and

(01:03:30):
then here we have a local model, Llama Vision. In this case, Llama Vision is a powerful local, open-source model that is able to do image recognition, OCR, things like that. Offline too, right? Offline.

Speaker 2 (01:03:44):
Oh, I want to set something up to analyze all of mine. I have like 60,000 photos that I've taken over the course of my life. I wonder if I could use the language model to organize them for me. Yes, absolutely. Oh.

Speaker 4 (01:03:59):
It's possible.

Speaker 2 (01:04:00):
yes, I'm emailing you .

Speaker 4 (01:04:01):
We're setting up a date. I'll send you some wine. Oh yeah, for example, if I launch this application, it starts analyzing. I'm hoping I have not changed the parameters. So it starts analyzing the photos. Can you see here? The GPU is going to the max, because

(01:04:23):
images are being processed.
And what happens here is that, yeah, it's analyzing my warehouse images. The prompt that I have under that is: analyze this image, try to recognize the fill level of this warehouse, give me a JSON response. This is the JSON response with a fill level, and we store that

(01:04:46):
fill level inside the bins in Business Central, on the location in Business Central. So the model first analyzes the image locally and then calls Business Central in order to update a field that we have on the location card.
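That pipeline, prompt the vision model for a JSON fill level, then push the value to Business Central, can be sketched as follows. The model tag, prompt wording, and JSON schema are assumptions; the request is only constructed, and the response string is a canned example rather than real model output:

```python
import base64
import json

def build_vision_request(image_bytes, model="llama3.2-vision"):
    # Ollama-style body for an image prompt; the model tag and the
    # requested JSON schema are assumptions for illustration.
    prompt = ('Analyze this image and recognize the fill level of this '
              'warehouse. Answer only as JSON: {"fill_level": 0-100}')
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }

def parse_fill_level(response_text):
    # Extract the structured answer the prompt asked for.
    return int(json.loads(response_text)["fill_level"])

body = build_vision_request(b"...fake image bytes...")

# Canned response standing in for real model output.
level = parse_fill_level('{"fill_level": 83}')
blocked = level > 80  # e.g. block put-aways when the bin is over 80%
print(level, blocked)  # 83 True
```

The `blocked` flag is what would then be written back to the location card field that the Business Central extension checks.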

Speaker 1 (01:05:07):
Wow, so it makes recommendations for you.

Speaker 4 (01:05:12):
Exactly. This is a local service, or a local agent, that is able to periodically analyze the images coming from the warehouse and store the data in Business Central. In Business Central we have an extension that blocks the

(01:05:33):
possibility to do put-aways in certain locations that are filled over a certain level. So this is handled by an agent running automatically that checks those camera images every time, analyzes the images, and blocks. Another example related to that, which we have not yet deployed; the previous one is

(01:05:55):
deployed in a live environment. This one, that we are testing at the moment, relates to object counting.
So we have customers that dothat.

Speaker 1 (01:06:16):
Oh, and it counts that.

Speaker 4 (01:06:17):
So we have customers that sell apples, and each apple
must be placed into a box.

Speaker 1 (01:06:28):
And how many apples can you fit in the box Exactly?

Speaker 4 (01:06:31):
And this box contains apples, and we are testing a local agent that scans every box of apples and returns the content. So it takes an image, a picture? Exactly, it takes the pictures here.

(01:06:52):
Now you can see this working: it starts analyzing each image and gives me the count of the number of apples that there are in this image, in a JSON format that I

(01:07:15):
can use in order to do actions.

Speaker 1 (01:07:18):
Wow, that's so cool.
This is amazing.
So your own local.

Speaker 2 (01:07:21):
So, as you can see, here I have a description and a count. Well, the impressive thing here, besides it being local and not having to use a cloud service, which may have cost, or if you're working with sensitive data: these are just additional practical uses of AI within a

(01:07:44):
business, or even a Business Central implementation. You can easily see, in your scenario where you're counting apples, where you may have had an individual count those before, now you can use AI to count those. Or even managing your warehouse without sending someone out to see; now AI can analyze your warehouse and tell you.

Speaker 4 (01:08:08):
It's an autonomous agent that can work whenever you want.

Speaker 2 (01:08:17):
I'm sold. This to me has opened up a lot of thought. Even, geez, in my house. I could use this in my house to do stuff.

Speaker 1 (01:08:29):
Quick question on the inventory: can you use these mini LLMs to do maybe even forecasting?

Speaker 4 (01:08:39):
Yes, you absolutely can. There are LLMs that are good at that. DeepSeek, for example, is good at that.

Speaker 1 (01:08:51):
So you can have your own local LLMs.

Speaker 4 (01:08:54):
Obviously, as in the previous example, the LLM needs to have the knowledge. So if you pass, for example, your, I don't know, purchase orders, sales orders, or item ledger entries.

Speaker 1 (01:09:11):
Item ledger entries, something like that.

Speaker 4 (01:09:14):
If you pass that to the model, the model is able to reason on that; it can analyze your trends and give you the response. That's amazing, because you know how many times people want to do that. It absolutely works.
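In the episode the LLM itself does the reasoning over the ledger entries; as a plain illustration of the underlying trend idea, here is a least-squares line fitted over made-up monthly totals and extrapolated one month ahead:

```python
# Least-squares trend over monthly totals (numbers are invented); in
# the episode the LLM does this kind of reasoning itself from the
# item ledger entries it was given.
monthly_sales = [100, 110, 125, 130, 145, 150]  # six months of totals

def linear_forecast(series, steps_ahead=1):
    # Fit y = a + b*x by ordinary least squares, then extrapolate.
    n = len(series)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(series) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, series))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    return a + b * (n - 1 + steps_ahead)

print(round(linear_forecast(monthly_sales), 1))  # 162.7
```

An LLM asked the same question would explain its steps in prose, but the arithmetic it has to perform is essentially this.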

Speaker 2 (01:09:31):
There are so many practical uses of this with the different models. I'm speechless, in a sense. I can see so many different uses of it, because now we can interact with Business Central data bi-directionally. So you're getting information in a JSON format that you can send back to update Business Central, but you

(01:09:54):
can also teach it on the data. Yeah, and it's all local, so it's secure.
Yeah, and it's all local, soit's secure.

Speaker 4 (01:09:58):
It's fully local.

Speaker 1 (01:10:01):
It's more conversational now, versus just looking at numbers and trying to figure out, okay, this is what it's recommending. Now, I'm thinking ahead a little bit here, you can use this tool to make the recommendation and forecast, and perhaps you can send that information

(01:10:24):
back to Business Central based on the results. Exactly. That's crazy.

Speaker 4 (01:10:30):
Yeah, the power of answering: forcing the model to not just answer in text but answer in a structured format. In the prompt you can tell the model: I always want the response in this particular format. It is powerful, because you can then extract data from the response and do actions, like in

(01:10:54):
this example, where we update the location card and things like that.
I have, for example, here another example that I'm currently testing. In our company, and I think for you it's the same, we have a lot of customers, Business

(01:11:18):
Central online customers, deployed on different tenants, and sometimes when we update an app, one of the apps that we have on AppSource, and we have quite a large set of apps in AppSource, we would like to update those apps for the online

(01:11:41):
customers immediately, because maybe we have a fix or something like that. Sometimes this requires manual work: the standard is going to each of the tenants in the admin center and updating the app. Otherwise, you can use APIs for that, in order to spread the

(01:12:02):
apps everywhere, but APIs are something that, in the end, are not for everyone. Our consultants, for example, are not used to using automation APIs or something like that in order to update the

(01:12:23):
apps.
So here we are testing an agent for that, an AI agent. There's a set of AI agents that are able to talk with our consultants, asking what they want to do and performing actions

(01:12:48):
. So, just very quickly to show, because it's a prototype at the moment: we have different agents, a team of agents working together. In the team of agents there's what I call here a customer support agent.

(01:13:09):
That is the agent that is responsible for talking with my consultant. There's a manager that is responsible for deciding if an action can be done, and there's what I call a PowerShell developer, the agent that is responsible for doing the actions.
So, just to show you something here, if I run this agent, okay,

(01:13:34):
I have here a customer support agent that is talking to me. Okay: hello, blah, blah, blah. May I kindly ask if you have a Business Central app that you would like to update today? If so, please provide the app ID and the tenant ID. If you would like to update all apps, so update all apps in a

(01:13:54):
given tenant, please only provide the tenant ID. Okay, yeah, I can write. Let me.

Speaker 2 (01:14:03):
So you designed this agent and you told it to create the prompt, or the question, for the consultant to answer? Exactly.

Speaker 4 (01:14:13):
Here is the agent. I've made my prompt, later I will show you, which is simply a question to ask: politely ask what you want to do. I've given the instruction that if the consultant wants to update an app, he needs to provide the app ID and the tenant ID.

(01:14:33):
If he wants to update all the apps in the tenant, he needs to provide the tenant ID and not the app ID. And you have different agents within the solution working together.

Speaker 1 (01:14:46):
Yeah, wow. So that goes back to the conversation we were having, Brad. Remember how it's different agents doing specific tasks? And this is a perfect example, where it's calling all the different agents, saying: you need to work together to do this specific task. Well, you have an agent manager, right? That is so amazing.

Speaker 2 (01:15:06):
So you have agents that have specific functions,
and then you have an agent thatmanages the agents and uses the
specific agents.

Speaker 4 (01:15:15):
Yeah, it's exactly like this. So, for example, if I ask: update app, this one, and I forgot to insert the tenant ID, the manager asks the customer support agent to tell the

(01:15:39):
customer that the tenant ID must be provided. And then the customer support agent asks me: okay, thank you for providing the ID. In order to proceed with updates, could you please provide the tenant ID? That's so crazy. And I put another GUID, for example. Let me copy another.

Speaker 1 (01:16:00):
GUID.
I'm so excited about this.
This is a perfect showing ofhow agents work together.

Speaker 4 (01:16:06):
Okay, now I provided this. The manager analyzes it: okay, everything is provided. Now the PowerShell executor is called, and there's a third agent that updates the app. Here a call to the admin center APIs is done via function calling,

(01:16:27):
passing the tenant ID and the app ID. So there are three agents that work together in order to complete a task: the customer support agent is responsible for asking what I need to do, the manager is responsible for involving each agent according to

(01:16:47):
the task, and the PowerShell executor, for me, is the agent that performs it. This is a perfect illustration.

Speaker 1 (01:16:54):
A perfect illustration of what the future is going to be, with different agents doing specific tasks.

Speaker 2 (01:17:01):
This is amazing.
I mean we've gone through.

Speaker 4 (01:17:05):
Without a valid GUID, the manager, as you can see, that is the agent, the model in this case, is able to recognize that this is not a valid GUID. So the manager says: okay, customer support, talk to the customer and tell him that the GUID is not correct. And here

(01:17:32):
the customer support agent says to me: please ensure that both are valid GUIDs. So here is an example of interaction between agents, and this can be useful, for example, to provide a user interface for consultants in order to update apps on tenants.
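The three-agent flow, customer support gathering input, a manager validating it, and an executor performing the update, can be caricatured in a few lines. The agent names mirror the episode; the validation and the "admin center" call are simplified stand-ins:

```python
import uuid

# Toy version of the three-agent flow: "customer support" gathers
# input, a "manager" validates it, an "executor" performs the update.
# The names mirror the episode; the logic is a simplified stand-in.

def manager_validate(app_id, tenant_id):
    # The manager checks that both values are valid GUIDs.
    for value in (app_id, tenant_id):
        try:
            uuid.UUID(value)
        except ValueError:
            return f"Please ensure that both values are valid GUIDs ({value!r} is not)."
    return "ok"

def executor_update(app_id, tenant_id):
    # Stand-in for the PowerShell agent calling the admin center APIs.
    return f"updated app {app_id} on tenant {tenant_id}"

def handle_request(app_id, tenant_id):
    verdict = manager_validate(app_id, tenant_id)
    if verdict != "ok":
        return verdict  # relayed back by the customer support agent
    return executor_update(app_id, tenant_id)

good = str(uuid.uuid4())
print(handle_request(good, "not-a-guid"))  # validation message
print(handle_request(good, good))          # update performed
```

In the real prototype each of these roles is an LLM-driven agent rather than a function, but the hand-offs between them follow the same shape.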

Speaker 2 (01:17:54):
This is mind-numbing to me. I can see so many different practical uses of this. So let's take it back: if somebody wanted to work with this, let's just take a sequence of steps. Which, I keep telling you, I'm calling you later and we're going to set this up on one of my machines. I use a Mac, I use Parallels, so I'll create a VM and we'll set all this

(01:18:16):
stuff up. What are the steps that someone has to go through? So the first thing is they have to determine which model they want to use, correct? Exactly.

Speaker 4 (01:18:26):
The first is to determine which model you want to use, based on your scenario and starting point. But first of all, let me go a step back, in my opinion.

(01:18:47):
The first thing is: if you want to run a local model, first select the platform to host your local model. There are different platforms to host local models, some more complex, some less complex. I honestly suggest using Ollama.

(01:19:15):
Ollama is a great platform for hosting local models. You simply download Ollama for Windows, for Linux, for macOS, and when you have Ollama downloaded, Ollama has a set of models here, the same models that I previously showed,

(01:19:41):
divided by vendor. If a model is unique, like Microsoft Phi-4, there's only this model to download: simply write "ollama pull phi4" and it downloads the model locally. If you have a more complex model, like DeepSeek, you can

(01:20:07):
download one of the available distillations of the model, starting from this, the big DeepSeek, the biggest available in Ollama that can be run locally, down to the smallest. I've previously used

(01:20:32):
this one, so for DeepSeek, simply run this, and your model is ready to be executed on your local machine and available as a local service.
If you don't want to use Ollama, there's LM Studio, another available tool for running local models.

(01:20:56):
LM Studio is much more user-friendly, because Ollama has no user interface; Ollama runs as a service, like I have here. LM Studio instead has a user interface where you can chat with the model; it's somewhat more user-friendly. Otherwise, there are other tools, like this one.

(01:21:28):
This, llama.cpp, is another tool available to run local models. I don't remember where the repo is... yes, here it is. You can download this tool and easily run a model with a simple command

(01:21:49):
, using this command with -m and the name of your model, or the URL of your model that you can download from here; the URL appears here. Or you can launch a server.

(01:22:10):
Honestly, all my samples use Ollama. It's easy and it's powerful.
When you have the platform, youcan then decide the model Our
model to use.
Obviously, it depends on yourneeds.
Sometimes you need a lot ofpower, like DeepSeq is able to,

(01:22:37):
for example, doing reasoning.
So if you have something likeneeds to do advanced reasoning
like, for example, I have tocreate a forecasting application
probably DeepSeq is betterbecause it can do complex
reasoning.
More parameters.

(01:22:58):
Exactly Parameters.
Yes, the model of theparameters depends obviously on
your local resources.
So download accordingly to yourlocal resources.
So if I have, for example, Idon't know 60 GB of machines,
probably here is my limit Icannot download these because

(01:23:26):
otherwise it will be too slow to get a response.
But these are absolutely tests that you can do.
So you can download the model and try.
If it's too slow, go to the smallest version.
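A rough way to turn "try it and see if it's too slow" into a back-of-the-envelope check: weight memory is roughly parameters times bytes per weight, so a 7B model quantized to 4 bits needs about 3.5 GB for the weights alone. The 1.2 overhead multiplier below is an assumed fudge factor for the KV cache and runtime, not a measured value.

```python
def approx_memory_gb(params_billions: float, bits_per_weight: int) -> float:
    """Very rough RAM estimate for running a quantized model locally."""
    weight_gb = params_billions * bits_per_weight / 8  # weights alone
    return round(weight_gb * 1.2, 1)                   # assumed ~20% overhead

# A 7B model at 4-bit quantization: ~4.2 GB -> fine on most laptops.
# A 70B model at 4-bit quantization: ~42 GB -> needs a big machine.
```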
In my personal experience, DeepSeek is a great model for

(01:23:49):
advanced reasoning.
So if you require advanced reasoning, general text questions or code generation, DeepSeek is good.
In the open-source family, my favorite models in

(01:24:14):
absolute are these.
Llama 3.3, for me, is one of the models that is able to satisfy every need I have today, especially when working with Business Central.
It is able to perform function calling; it is able to do, honestly, quite everything.
It's not a reasoning model, so if you require complex reasoning, DeepSeek is better,

(01:24:38):
but for every other task, Llama 3.3 is great.
Otherwise, my preferred choice is Phi-4 from Microsoft.
That is another great open-source model, honestly quite comparable to the results that you have with GPT-4o in many fields.
And these are also listed here in this order

(01:25:03):
because they are the most downloaded.

Speaker 2 (01:25:05):
Open-source models. Okay, so we take a platform, we take a model, we install it and we're up and running, basically.

Speaker 4 (01:25:16):
You are up and running.
Obviously your model is up and running.
You can use your model like a local chat, like I've done here.
So here I have all my local models, and I can select one and start using the local model as a chat.

Speaker 2 (01:25:34):
What are you using to connect to?
Which application are you using to connect to your models?

Speaker 4 (01:25:39):
This is another open source application called...

Speaker 1 (01:25:45):
This? MSTY?

Speaker 4 (01:25:48):
if you want to have a user interface.
Otherwise, via command line, you can.
Every model offers a command line interface to interact with.
So when you download the model, the model starts, and then from the command line you can start typing and the model answers.
If you want something more user-friendly, a local user

(01:26:13):
interface is required.
I use this one.
This is an open source user interface that is able to work with local models.

Speaker 2 (01:26:27):
What is the name of it, again?
Misty, Misty.

Speaker 1 (01:26:32):
M-S-T-Y.

Speaker 4 (01:26:37):
M-S-T-Y, exactly.
So you can download it for the platform you want, and it automatically discovers whether you have downloaded local models.
All your local models are available here, and you can also add online providers.

(01:26:58):
So if you have, I don't know, an account with OpenAI, an account with DeepSeek online and so on, you can also use those models from here.

Speaker 1 (01:27:08):
So then you have a local desktop application.

Speaker 4 (01:27:13):
Exactly.
I always use this because it's useful for testing.
For example, if you want to test a prompt or something like that, it is nice for testing.

Speaker 2 (01:27:24):
Then instead, for creating applications, applications are created in code, in my case.
So we now have an interface to the model via a command prompt or via a tool, and then, as far as sending our data to it, does that vary from model to model, and how do we do that?

Speaker 4 (01:27:49):
Sending data to the model is not related to the model.
For sending data to the model, you have essentially two ways.
First of all, you can use the REST APIs exposed by the model itself when you download the model.

(01:28:11):
The model is available, as I previously showed, as a service, a local service, so you can use REST APIs to talk with the model in these possible scenarios.
But in this case you need to know the format of the REST APIs of each model.
Usually they're quite the same, but you need to know the format

(01:28:34):
for these models.
It's always explained if you go on Hugging Face. Hugging Face is the main portal for open source models, and each model has the explanation of its APIs.
I honestly never do that.
That's why I previously showed you my examples here.

(01:28:57):
In the examples that I show, I always use abstraction tools like, for example, Semantic Kernel, which I'm using here.
Semantic Kernel is an abstraction tool, so with Semantic Kernel, I don't need to take care of knowing the REST

(01:29:18):
API that I need to use with GPT-4o, with DeepSeek, with OpenAI or something like that, because it does that for me.
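For readers who want the shape of this idea in code: Semantic Kernel's Python API changes between versions, so instead of guessing at its signatures, here is a hand-rolled sketch of the same abstraction, where the application talks to one interface and the provider behind it is a swappable configuration detail. All names are illustrative; none of them are Semantic Kernel's.

```python
from typing import Callable, Dict

# A provider is just "prompt in, completion out"; real backends would call
# Ollama's REST API, Azure OpenAI, DeepSeek, and so on. These are stubs.
ChatFn = Callable[[str], str]

PROVIDERS: Dict[str, ChatFn] = {
    "ollama": lambda prompt: f"[local model] {prompt}",
    "openai": lambda prompt: f"[cloud model] {prompt}",
}

class ChatKernel:
    """Toy abstraction layer: the app never sees the provider's REST format."""
    def __init__(self, provider: str) -> None:
        self._chat = PROVIDERS[provider]

    def complete(self, prompt: str) -> str:
        return self._chat(prompt)

# Swapping providers is one string change; the calling code never changes.
app = ChatKernel("ollama")
reply = app.complete("Summarize today's orders")
```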

Speaker 2 (01:29:32):
So you downloaded Semantic Kernel and you installed Semantic Kernel, and that interfaces with your local model.

Speaker 4 (01:29:36):
Exactly. When you are creating advanced solutions and you don't want to rely on REST APIs, this is a recommended approach, because the solution can be easily swapped between different providers.
And honestly, when I

(01:29:59):
create an AI solution or an AI agent or something like that, I would like to be able to use different providers.
Also, in the previous example that I showed, where I showed a solution where three agents work together, I would like to have that freedom.
Previously in my solution there were three agents:

(01:30:22):
the manager, the customer support and the PowerShell executor.
In that solution, I can say that the PowerShell executor uses GPT-4o, while the customer support only uses GPT-3.5 because it costs less.

(01:30:42):
So I can also spread the models across agents.
And so creating AI solutions that are platform agnostic is sometimes great, because your same solution can be executed with

(01:31:02):
one platform, and if, a month later, I want to change the platform, I can change that easily.
For example, one example was DeepSeek online.
DeepSeek online, when it was released, was the cheapest model in history. DeepSeek costs really quite nothing, and

(01:31:31):
it's very powerful.
Compared, for example, to Microsoft's GPT-4o: GPT-4o costs 10,000 times more for each call compared to DeepSeek.
We have some solutions, deployed many months ago when DeepSeek was not available, that simply by changing the parameters and

(01:31:57):
changing that parameter to DeepSeek, work without changing anything.
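The per-agent model assignment described above (an expensive model only for the agent that needs it, and a later swap to DeepSeek as a one-parameter change) can be sketched as a plain configuration table. The agent roles and the swap idea come from the conversation; the exact model identifiers are illustrative.

```python
# One model per agent: pay for heavy reasoning only where it is needed.
AGENT_MODELS = {
    "manager": "gpt-4o",
    "customer_support": "gpt-3.5-turbo",   # cheaper model for routine replies
    "powershell_executor": "gpt-4o",
}

def model_for(agent: str) -> str:
    """Look up which model a given agent should call."""
    return AGENT_MODELS[agent]

# Months later, moving an agent to a cheaper provider is one config change:
AGENT_MODELS["customer_support"] = "deepseek-chat"
```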

Speaker 2 (01:32:01):
So that's the key then.
It's to choose the platform, install the model, install Semantic Kernel, and then you're on your way.
And then, as you had just mentioned, you can use totally cross-platform applications.
So you're not tied to a model.
At that point, Semantic Kernel will communicate with the

(01:32:22):
model.
You just tell it which model to use.
So in your case, as you had mentioned, you had started with one model.
A new model was released and, simply by changing which model to use, your application was still functional using the new model.

Speaker 4 (01:32:39):
Exactly, this is great.
It's now platform agnostic, and that helps you too.

Speaker 2 (01:32:45):
And my favorite part is that this is all running on a Mac.

Speaker 4 (01:32:49):
And this runs on a Mac.
Obviously I love Windows, but honestly, for AI stuff, the Macs have something more.

Speaker 2 (01:33:02):
The Macs.
Listen, I like Windows too, don't get me wrong, but the Macs always have something more.
And I'm thankful that we can communicate with Business Central with VS Code.

Speaker 4 (01:33:11):
Especially for AI stuff, the Mac has lots more power compared to Windows.

Speaker 1 (01:33:19):
I'm glad you said that.
Yeah, can you repeat that again?
We're best friends.

Speaker 2 (01:33:26):
Well, Stefano, you have blown my mind.
This is amazing.

Speaker 1 (01:33:30):
I just downloaded MSTY, by the way, just so that I can interact with this stuff.

Speaker 2 (01:33:35):
I'm just telling you, I hope you're not going to bed soon, because I'm going to send you a text with a question asking for all these links, and a meeting.
So just give me a few minutes.
This is amazing.
You've covered so much, and you've inspired, I know, me and, I'm sure, anyone listening to see how you could utilize a

(01:33:59):
local language model, or AI in that sense.
There are lots of scenarios where this fits.
It fits everywhere.
Just your scenarios of the warehouse, the apples, the agents.

Speaker 1 (01:34:15):
The vision.

Speaker 2 (01:34:16):
It's just to show that, in those examples that you've given us, you've crossed many areas of how you could use AI to gain efficiency within an implementation, and I think it's wonderful.
It's amazing, and I'm sort of speechless because my mind is thinking of all the applications of this.

Speaker 4 (01:34:40):
Yes, I think it fits especially when talking about the term that is a hot topic today: agentic AI features.
Because, especially in the Business Central world, the most common features are: I click an action in Business Central, and this

(01:35:03):
action calls an LLM and does something.
So it's user-centric.
Here we are moving a step beyond that.
Local LLMs, in my opinion, are extremely powerful when you want to create agents that work with Business Central.

(01:35:23):
So I am a company, and inside my company I want to have offline agents that do actions or make decisions or something like that, also with Business Central data, or do actions inside Business Central data, like in this example.
A stupid example, but I think it can give the idea.
So these are local applications running autonomously, and also

(01:35:52):
maybe in teams. Not Teams the application, but teams in terms of groups.
Organization teams, yeah.
Exactly, you can have multiple agents that work together in order to achieve a task.
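The "team of agents" idea can be sketched as a pipeline where each agent's output feeds the next. The three roles echo the earlier manager / customer support / PowerShell executor example, while the stub functions stand in for actual local model calls.

```python
from typing import Callable, List

Agent = Callable[[str], str]  # each agent transforms the task text

def manager(task: str) -> str:
    return f"plan({task})"      # stub: would ask a local LLM for a plan

def customer_support(task: str) -> str:
    return f"reply({task})"     # stub: would draft the customer answer

def executor(task: str) -> str:
    return f"executed({task})"  # stub: would run the resulting script

def run_team(agents: List[Agent], task: str) -> str:
    """Chain the agents: each one works on the previous agent's output."""
    for agent in agents:
        task = agent(task)
    return task

result = run_team([manager, customer_support, executor], "refund order 1001")
```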

Speaker 1 (01:36:07):
I think it eases the minds of organizations or businesses where they may be afraid of using LLMs online and they want to maintain their data within their organization.
This, right here, is a game changer: seeing a good example of the use of local LLMs.

Speaker 2 (01:36:26):
It's not only the security concerns of sharing sensitive data, and I use the word sensitive meaning anything that someone feels they don't want to share with someone else.
It doesn't have to be sensitive in the sense of identification; it could be that I don't want to share my sales, for example.
But it's also a way to help control your cost.
You have a fixed cost, in a sense, because you have the machine or the

(01:36:51):
hardware to run it.
Yes, if you have the right hardware, but that's a fixed cost in a sense, outside of the electricity to power that hardware.
Whereas with some of these other models, depending upon how much you use them, your cost could fluctuate or vary.
This gives you a fixed cost, and you have control of the data.
I think I don't even know what to say

(01:37:15):
anymore.
My mind is full of all of this, and now I have a greater appreciation for all of the things that you've been sharing and posting about running large language models locally.
See, it's not just locally on your machine.

Speaker 1 (01:37:33):
You can, I mean, you could technically have this on Azure.
It just means it's offline, right?
Right, so you could put it in a virtual machine.
It just means that you don't need to give it access to the online world.
Yes, absolutely. Well, Mr. Stefano.

Speaker 2 (01:37:52):
Thank you. I was sold.

Speaker 4 (01:37:55):
He had me at hello, as they say.
I know that for you it is evening, so thank you for having me here. No, this, this is great.

Speaker 2 (01:38:04):
Thank you very much for taking the time to speak with us.
This was wonderful.
You shared so much information to help break down what running a large language model locally entails, and also extremely valuable scenarios on how it can be applied.
If anyone would like to contact you, has additional questions

(01:38:29):
or may want to learn more about large language models, or see some of the other great things that you've done, what is the best way to contact you?

Speaker 4 (01:38:39):
I'm always available on LinkedIn, on X or on Bluesky.
You can reach me directly on those socials, which is probably the best, or directly from the contact form on my website.
That is the best way to reach me directly.
As many of you know,

(01:39:02):
I'm always available to answer there, so feel free to contact me if you have follow-ups.

Speaker 2 (01:39:11):
Excellent, excellent.
And I definitely would like to have you back on again to follow up on this, because, seeing all the great things that you've been doing, I can only imagine where you'll be in a few months.
So we'll have to see if we can get you on later in the year to see where you have taken some of this.
Are you going to Directions in North America?

Speaker 4 (01:39:31):
Unfortunately, I will skip Directions North America this year.
I will be at Dynamics Minds, and we are organizing the Business Central Day event in Italy with Directions.
We have a lot of work in order to be able to do this event, and

(01:39:58):
it is extremely near to Directions NA.
My initial plan was to go with Duilio to do the session that we did at Directions NA about large customers.
This was a very appreciated session, and we would like to

(01:40:20):
repeat that session at Directions NA.
But when we started the organization of Directions Italy, unfortunately we were forced to have a fixed date by Microsoft Italy, because they give us the headquarters for the event, and so

(01:40:44):
it's extremely near to Directions NA, and for me it's not possible to be away from my company for that long.

Speaker 2 (01:40:54):
We understand that.
We ourselves run into the challenges of which conferences, of which events to attend, because there are many and, as we talked about, there are some travel considerations as well.

Speaker 4 (01:41:05):
The problem is that sometimes these events are really near each other, so you have to make a major choice.
Sometimes my company is flexible enough to permit me to go away for events for about a week, two weeks. When you're doing

(01:41:27):
all this great stuff.

Speaker 2 (01:41:27):
I can see that.
That's okay.
We'll have pizza with Duilio again.

Speaker 4 (01:41:30):
We will be in the US for sure this year.
It's a promise that I've made with Duilio.
If not Directions NA, maybe the other Directions US event or something like that, but we will do it.

Speaker 2 (01:41:46):
Well, there's Days of Knowledge, we'll do it.
And then Summit is in October.
The call for speakers opened up for that, and that's in October.

Speaker 4 (01:41:54):
We are planning to go to one of those.

Speaker 1 (01:41:58):
I'll be looking forward to seeing you in person.

Speaker 2 (01:42:01):
Yeah, yeah.
That would be excellent.
Well, sir, thank you very much.

Speaker 4 (01:42:04):
We appreciate you taking the time with us, and I look forward to speaking with you soon.
And, as always, thank you for your great podcast and the great initiative you are doing.
Thank you, thank you very much.

Speaker 2 (01:42:14):
We appreciate you.
Thank you very much, sir.

Speaker 1 (01:42:16):
Thank you, Stefano.
All right, ciao, ciao, bye-bye.

Speaker 2 (01:42:19):
Ciao, bye-bye.
Thank you, Chris, for your time for another episode of In the Dynamics Corner Chair, and thank you to our guests for participating.

Speaker 1 (01:42:29):
Thank you, Brad, for your time.
It is a wonderful episode of Dynamics Corner Chair.
I would also like to thank our guests for joining us.
Thank you to all of our listeners tuning in as well.
You can find Brad at dvlprlife.com, that is D-V-L-P-R-L-I-F-E dot com, and you can interact with him via

(01:42:52):
Twitter, D-V-L-P-R-L-I-F-E.
You can also find me at matalino.io, M-A-T-A-L-I-N-O dot I-O, and my Twitter handle is Matalino16.
And you can see those links down below in the show notes.
Again, thank you everyone.

(01:43:13):
Thank you and take care.