Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
(00:00):
Adapt your processes to work with a virtual being.
I don't like chatbots, actually, because that is just an interface. Think about a virtual coworker, because it's much more interesting when you can give it anything. And invest in the tools.
Welcome to episode 46 of Tool Use, the weekly conversation
about AI tools and strategies brought to you by an edit.
I'm Mike Byrd, and today we're talking about practical AI for
(00:20):
businesses. What strategies really work for
robust, fair and effective AI systems?
This week we're joined by Wolfram Ravenwolf, the AI evangelist at ellamind, a ThursdAI regular, and an AI engineer. Wolfram, welcome to Tool Use.
Thank you. Great to be here.
I'm loving it. Would you mind giving us a little bit of your background, how you got into AI?
So, a funny anecdote: my first computer, which I got at the age
(00:41):
of 10 in 1988, was a Commodore C64, and I named it Eliza, after the mother of all chatbots, because as a child I was already totally fascinated with AI. I even built a basic BASIC implementation of Eliza on this computer. And so my heart has been beating for AI for a long time. But professionally, I went another
(01:04):
route and just went into IT. I have 30 years of Linux experience and worked as a system and network administrator for over 20 years. The last couple of years as a DevOps engineer, Kubernetes stack and all this stuff.
And yeah, then the ChatGPT moment happened, and I was
(01:24):
already back on track for AI by then, because in summer 2022, when Stable Diffusion got launched, that was what really excited me, when image generation got great. And then ChatGPT came and I wanted to have it on my own systems and use it myself, and then LLaMA was leaked, the original Meta LLaMA. And that is when I started building an AI system to work on this and experiment with it.
(01:48):
And I posted all my experiments and evaluations and everything I did on Reddit. And that was back in the day when, I think, Eric Hartford was making his uncensored models and TheBloke was quantizing them, and I was testing them. And that was less than three years ago. It is amazing how fast time is progressing, how fast time moves in AI land.
(02:11):
Amazing. Yeah.
And I recently joined ellamind, just in April, as an AI evangelist and engineer, and we are working on an eval platform that will be released soon. And so, yeah, evaluation, that has been my backstory, what I've been doing all the time, because to run AI, you want to run the best AI you can,
(02:31):
especially if you have resource constraints, as everybody does in the end. And so it was always very important to me to test these models, to see how they work and how to optimize them. So I tested models, I tested, you know, inference settings, I tested prompt templates, everything that affects the model. And nowadays you have whole systems and you have multi
(02:53):
modality and all that stuff, which makes testing and evaluating even more complicated, but all the more relevant and necessary.
It's been such a crazy time, and the ability to have this open source community come together and share their experimentation. That's one of the things that got you on my radar: just being able to take these different things and share them with the community so we can
(03:13):
all learn together. One area where I've found there are hits and misses is the excessive demos that come out in the AI space. So when you're working with companies, what are some of the common misconceptions that people have about what AI is actually capable of versus what people think it can do? Yeah, that's very interesting.
You have so many different levels of understanding of AI. The management often just reads about AI and it's the new
(03:37):
big thing, so we have to have it. In the company where I worked before, where I was doing DevOps engineering, our founder was saying: OK, AI is the next big thing, we need it. And he sent everybody, the whole company, on a two-day workshop. That must have been summer 2023, so still relatively early.
(04:00):
And he sent everybody there and had the workshop organized around how we can use AI anywhere in the company. So the mindset was already great, and I loved it, because I was already on that trip anyway. And so I then transitioned into a newly formed AI department, where I was working as the AI engineer and
(04:20):
set up inference systems and ran the models locally and in the cloud. We also hired a data engineer, and we worked together on this, with one manager on our team, so we could really put it into the company.
Yeah, yeah. Until the company got bought by another company, and with the restructuring and all
(04:42):
that, it was slowing me down a bit, and that's why I decided I wanted to switch it up and go work for a dynamic startup with a team of engineers and push evaluations further that way. So, yeah, you have all these people: you have, yeah, the normal users in a way
(05:03):
that use everything, you have the developers that are experimenting with AI, you have the management, you have marketing, which wants to use some of these tools, like image generation is big, and all the video generation that is now getting even better. And yeah, I think everybody can
(05:23):
and should use AI, just because it is the most important technology that humanity has ever conceived.
I think fire has changed everything, electricity has changed everything, and now AI is changing everything as well. And I'm convinced of that. So, even when people come back... I had somebody come back from maternity leave, and I asked her,
(05:46):
OK, how are you using AI? She said, I don't use AI at all, I just have a little baby and I don't have time for this. And I said, OK, but if the baby is keeping you up at night, or is teething, anything like that, you can ask the AI about that and it can help you. It doesn't have to be just a business tool; it can be
(06:06):
your helper everywhere. So people should get used to it. Like with computers: it's not just a business machine; home computer, personal computer, gaming device, everything. You get to use the technology, you love the technology, and then it can help you in your professional life.
I fully agree. In my personal life, I've been using AI in different ways, and it's just unlocked a
(06:26):
few different things that have been immensely helpful, that I didn't really foresee as a possibility, and I've been deep in this for a while now. For those businesses that are AI curious, just starting to get into it, do you have any general low-hanging fruit, something they can quickly adopt or quickly run an experiment on, so they can start seeing that value right off the bat?
I think you need to take a bit of money into your hands, because
(06:47):
it is an important tool, and good tools cost money. And if you start with the free tools, yeah, it's nice to get a look, but it will not convince anyone. This has gotten much better since OpenAI made GPT-4o the default model for everybody. That was a big improvement, of course. But before that, when
(07:09):
you had ChatGPT 3.5, for example, on the free tier, you'd tell people AI is great, they'd check out a free model, and they'd say, oh, it's not working as well as it's supposed to. So if you want to show somebody, you don't really have to go to, for example, the ChatGPT Pro tier; the Plus level for just 20 bucks will do. Get
(07:32):
an account and give it to your, yeah, your workers, your employees, and let them work and experiment with it and use it. And I would highly recommend to
hire somebody to show them the ropes.
Or if you don't have anyone: like with all technology, when computers came into the companies, you had some tech
(07:53):
guys who were interested in it, doing it as a hobby, and they could help everybody. So you need somebody who has a bit of knowledge of it, and the curiosity. And I think you should be able to find someone if you have a good company, because then they can show others how to use the AI, what to consider, what to do, and which AI is good.
If you just give somebody a ChatGPT account and they don't
(08:15):
know where to select o3, for example, and what it is good for and what it's not, there's a learning curve. And what I'm pretty sad about is that the AI usually doesn't even know about it. So you can't just use any AI and ask it which of the models available in the same interface is good for this or that. That is usually not in the prompt.
(08:37):
And yeah, that is a bit of a downer, actually: why doesn't the AI help me decide which AI is good? Or why don't I have just one interface that routes to the right AI depending on that? That would be a big unlock, I guess, making it much easier for people.
But yeah: get the best AI you can, get somebody to show you how to use it, and then use it, practice with it, learn it, and
(09:01):
don't blame people if stuff goes wrong, because we all know that AI is not perfect, but people aren't either. That is a very important thing to consider. People are not perfect. And if it matters, you have to have a four-eyes principle at least, or checks; that also applies to AI. And tell people that they are responsible for what the AI does. So if it matters, they have
(09:23):
to double-check it. And it is important to learn where you can trust the AI and where you can't. But the same would be true with an intern.
I like that example, that you compare AI to a smart intern who knows a lot, but doesn't know your company or doesn't know what you are working on. So you have to be explicit to teach it. And in a way, everybody has to
(09:44):
be a manager that way and teach the AI, show it what to do and how to do it. And yeah, adapt your processes to work with such a virtual being in your company. A virtual coworker. I don't like chatbots, actually, because that is just an interface, you type and stuff.
(10:06):
Think about a virtual coworker, because it's much more interesting when you can give it anything. And invest in the tools. Have some tools where you can use AI without much friction. It has to be easy to use.
That is one of the things I built myself. At the company, I wrote a tool so you could just press a hotkey and send whatever you have selected to the AI, to
(10:27):
have it translate it, or check it for correctness, or write an email response, and stuff like that. So it's rather easy to do.
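(For readers who want to try the idea: a minimal sketch of such a hotkey-to-AI helper, assuming Python with the pynput, pyperclip, and openai packages and any OpenAI-compatible endpoint. The hotkey, URL, and model name here are placeholders, not Wolfram's actual tool.)

```python
# Hypothetical sketch of a "hotkey to AI" helper: copy text, press a hotkey,
# get a corrected version back on the clipboard.
# Assumes: pip install pynput pyperclip openai, and an OpenAI-compatible
# server (cloud or local, e.g. llama.cpp / LM Studio) at BASE_URL.
import pyperclip
from pynput import keyboard
from openai import OpenAI

BASE_URL = "http://localhost:8080/v1"  # assumption: any OpenAI-compatible endpoint
client = OpenAI(base_url=BASE_URL, api_key="none")  # local servers ignore the key

def fix_selection() -> None:
    """Send the current clipboard text to the model and copy the result back."""
    text = pyperclip.paste()
    resp = client.chat.completions.create(
        model="local-model",  # placeholder model name
        messages=[
            {"role": "system", "content": "Proofread and correct the text. Reply with the corrected text only."},
            {"role": "user", "content": text},
        ],
    )
    pyperclip.copy(resp.choices[0].message.content)  # corrected text back to clipboard

# Copy your selection (Ctrl/Cmd+C), then press Ctrl+Alt+A to run it through the AI.
with keyboard.GlobalHotKeys({"<ctrl>+<alt>+a": fix_selection}) as hotkeys:
    hotkeys.join()
```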
And then I taught the people to use it too; not just give it out, but teach people, refresh them, and make sure they have it updated. So you need somebody to really do the evangelism in the company and make sure that
(10:48):
the tool is used, and see that the people get better with the tool. That is very important. And the tool must be easy, it must be powerful, and you have to have somebody responsible for making sure that it is getting used.
So many good points in there.
I fully agree with a lot of it. The idea of having a coach or a teacher come in and just give
(11:08):
you an idea of what's possible; an environment of encouraging curiosity, where, if you're about to do a task, you ask: how can I fit AI into it? Can I just have a little back and forth with it? Can I actually find a tool that accomplishes it? It's all really good stuff. When a company is trying to select an LLM, say they've started using different tools and they've
(11:28):
realized there's something that satisfies most of the cases, but they want to build something in-house that's perfectly custom to them, but they don't know which LLM to go with, whether it's OpenAI, Anthropic, any of the big ones, any of the local ones. Do you have a process or a general idea that people should follow in order to help evaluate these models before they get into the specific stuff? Like, when would I choose model A over model B?
Most businesses actually want to have
(11:49):
a local AI, so they can use it for data-privacy-relevant stuff and not send it out to the Internet. I'm here in Germany, and in Europe you have very strict data regulations, of course. So most companies can't even use OpenAI, for example, unless it's a specific Azure version and they have special
(12:10):
contracts and stuff like that. It's very obvious that they need something local, and of course they are asking the next question: do I do it in-house or do I rent a server or a service? There are some services, Mistral for example, that you can rent in the EU, and then you would meet the regulations that you work with. But if you want to do it
internally, I would definitely recommend to first get started with
(12:33):
the biggest models that you can run on your system. Of course, it's always resource-constrained. But yeah, follow the news, follow the people like me who report about it. And in the end, right now I would say, if you are in Germany or in Europe, you have the languages issue. So you don't have much
(12:53):
choice, actually, because the Chinese models are very smart, but in the European languages, and I specifically tested German, they are not as strong. So you have Gemma 27B, or smaller if you need to, but the biggest Gemma you can run locally that is available is 27B. So that would be my first choice, or one of the Mistral models,
(13:13):
which are very strong. I would test these.
And you always have to make your own benchmarks, because in the end nobody knows what you are doing with the AI. And that is what I always recommend to people: if you have something the AI is not doing, something that's not working, write it down. That is a great benchmark, because if it's not working, you can test it with another AI to
(13:34):
see if it works there. So you can, over time, collect a couple of these cases and have some personalized benchmarks that you can use to evaluate whether an AI model is finally able to solve the issues you have.
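(A minimal sketch of that personal-benchmark idea, assuming failed prompts are collected as one JSON object per line; the file format and the simple keyword check are illustrative choices, not a prescribed standard.)

```python
# Re-run your collected failure cases against a model and count how many
# are now solved. Each line of the file: {"prompt": ..., "must_contain": ...}
import json
from openai import OpenAI

client = OpenAI()  # or point base_url at a local OpenAI-compatible server

def run_personal_benchmark(cases_path: str, model: str) -> None:
    passed = total = 0
    with open(cases_path, encoding="utf-8") as f:
        for line in f:
            case = json.loads(line)
            total += 1
            resp = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": case["prompt"]}],
            )
            answer = resp.choices[0].message.content
            if case["must_contain"].lower() in answer.lower():
                passed += 1
            else:
                print(f"STILL FAILING: {case['prompt'][:60]}...")
    print(f"{model}: {passed}/{total} personal cases solved")

run_personal_benchmark("failed_cases.jsonl", "gpt-4o")  # placeholder paths/names
```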
And it's amazing to see that progress. I remember when Qwen's QwQ came out and I could
(13:54):
finally do stuff locally that was impossible to do before. The reasoning was so great. And I was very amazed when it finally did these tasks that I previously had to refer to online AI for. That was a big unlock. And yeah, as a business, ask your employees to collect these special cases where, you know, it would really
(14:17):
help the business if you had this solved. And then, if it can be solved, and if it's not specifically protected or private or anything, do use an online AI, because unfortunately they are still better than the open source AI right now. Or if you are using DeepSeek R1, OK, that is so big.
(14:39):
Yeah, I don't think most businesses will start with that. But of course you can build up to it. But yeah, I would always take the best AI that is currently available, no matter where it runs, to test and see if it actually does what you want to do. And then, if it works, you know AI can currently do it. And then you can go down in size, to local models or to quantized models, and see if it can still
(15:03):
be done. And so you can tune in on which model you want to use. And yeah, like I said, I would start with Gemma 27B and see if that works.
Yeah, I like Command R and Command A, for example. They have been around a long time, so everyone runs them locally, but you need a commercial license for them. Yeah, check out Qwen too. I love Qwen.
(15:25):
Maybe even build a pipeline where you have the smart Chinese model do the task, and then have a model that may not be as smart but can write your language better just summarize and convert it so it's more presentable; see the sketch below. There are multi-model systems you can set up to really start leveraging the strengths of each of them.
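(A rough sketch of that two-stage pipeline, assuming an OpenAI-compatible server; the model names are placeholders for whatever your own benchmarks favor.)

```python
# Two-stage pipeline: a strong reasoning model solves the task, then a model
# that writes the target language better rewrites the result.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")

def solve_then_polish(task: str) -> str:
    # Stage 1: let the strongest available model do the actual work.
    draft = client.chat.completions.create(
        model="strong-reasoning-model",  # placeholder, e.g. a Qwen-class model
        messages=[{"role": "user", "content": task}],
    ).choices[0].message.content

    # Stage 2: have a model with better target-language output rewrite it.
    polished = client.chat.completions.create(
        model="strong-german-model",  # placeholder, e.g. a Gemma/Mistral-class model
        messages=[
            {"role": "system", "content": "Rewrite the following answer in clear, natural German. Keep the content unchanged."},
            {"role": "user", "content": draft},
        ],
    ).choices[0].message.content
    return polished

print(solve_then_polish("Summarize the pros and cons of hosting an LLM in-house."))
```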
One thing you touched on, which I think is brilliant, is having
(15:46):
everyone in the company store what doesn't work and then have that as an eval set; start having local benchmarks so you can start building up these capabilities or testing different LLMs in a system. Do you have any approaches or strategies to help make sure this is either automated or just easier to do? And how would you store this information when starting to collect these failure cases that
(16:06):
we want to have succeed? So usually a company has some
kind of knowledge management, like Notion or a wiki or just documents on your system. So just put it somewhere; that is the easiest path, and it's pretty easy to copy and paste. And usually you take it out of somewhere and then you give it to the AI, so you already have it stored somewhere.
(16:26):
Maybe, if you are writing it up on the spot when the AI fails, then before you save it, it would be cool to have some AI guy in your company look at it. When I was doing that at the previous company, where I was, yeah, the AI guy with an interest in it, the people always called me, and so I looked at the query and
(16:48):
could help them and tell them: OK, this can never work, because of this or that; maybe the context window is too small, or you haven't given enough information. That is important, because you want good test cases. But if somebody like that is not available, just write it down. Somebody can later go through the cases and pick the best ones.
(17:08):
So just write it down somewhere. And if you have some, yeah, power users in your company or something, you can think about making a proxy or something that is actually saving these exchanges (there's a sketch of that below), or you set up something local, like Open WebUI for instance, where you have the ability to rate answers and put them in the database internally. Yeah, many options, but you
(17:30):
definitely should collect them, because AI gets better all the time, and having some good personal evaluations is very important to find out if it is working for your use case. Because a big MMLU-Pro score doesn't help you if the model doesn't speak German very well and doesn't do your task, if that is what you need.
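(A rough sketch of the logging-proxy idea mentioned above, assuming FastAPI and httpx in front of a non-streaming OpenAI-compatible backend; the URLs and file path are placeholders.)

```python
# Forward chat requests to an OpenAI-compatible backend and append every
# exchange to a JSONL file that can later become your eval set.
# Non-streaming only, for simplicity.
import json
import httpx
from fastapi import FastAPI, Request

app = FastAPI()
BACKEND = "http://localhost:8080/v1/chat/completions"  # placeholder backend
LOG_PATH = "ai_exchanges.jsonl"

@app.post("/v1/chat/completions")
async def proxy(request: Request):
    payload = await request.json()
    async with httpx.AsyncClient(timeout=120) as http:
        resp = await http.post(BACKEND, json=payload)
    data = resp.json()
    # Log request and response together; failed cases can be picked out later.
    with open(LOG_PATH, "a", encoding="utf-8") as f:
        f.write(json.dumps({"request": payload, "response": data}) + "\n")
    return data
```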
(17:50):
And just speaking of MMLU and the other major benchmarks, do you even find any value in them anymore, or is it the local setup, benchmarks, and evals that matter much more?
I find a lot of value, and actually I'm running the MMLU-Pro benchmark myself with the models, because the big thing with these benchmarks, if they are widely available, which is a negative as well, is that at
(18:11):
least most of the models that get released get scores for these benchmarks. And locally, you are usually running a quantized model, because you don't have the resources for the full FP16 or 32-bit version. And then you can run the available MMLU-Pro benchmark locally with your quants and your prompts and
(18:32):
stuff. Well, prompts is not the right word, but your settings, for example. And then you can see how the score changes between what they published and what you get on your own system.
And I've been doing very specific benchmarks. For example, when Qwen was released, or the QwQ model as well, I was taking various quants
(18:53):
and running the same benchmark, so I can see if the 4-bit version is much, much worse than the 8-bit version, for example. And that helps a lot to find the proper quant. I often found issues with the model files themselves, where, for example, an 8-bit version or a different format like GGUF was doing much worse than an EXL2, for instance.
(19:16):
That can be caused by a wrong tokenizer config in the model, or just a failed quant, or stuff like that. So yeah, the benchmarks help you find these things out.
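(A sketch of that quant-comparison workflow, assuming each quant is served by its own local OpenAI-compatible server, e.g. separate llama.cpp instances; the ports, quant tags, and toy question are illustrative.)

```python
# Run the same question set against several quantized builds of one model
# and compare accuracy, to spot a bad or broken quant.
from openai import OpenAI

QUESTIONS = [  # in practice: your local MMLU-Pro set or personal cases
    {"prompt": "What is 17 * 23? Answer with the number only.", "answer": "391"},
]
QUANTS = {  # one local server per quant
    "Q4_K_M": "http://localhost:8081/v1",
    "Q8_0": "http://localhost:8082/v1",
}

for quant, url in QUANTS.items():
    client = OpenAI(base_url=url, api_key="none")
    correct = 0
    for q in QUESTIONS:
        reply = client.chat.completions.create(
            model="local",  # the server decides which weights it serves
            messages=[{"role": "user", "content": q["prompt"]}],
            temperature=0,  # keep scoring as deterministic as possible
        ).choices[0].message.content
        correct += q["answer"] in reply
    print(f"{quant}: {correct}/{len(QUESTIONS)} correct")
```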
And that is also a tip I have to give: when you download a model from Hugging Face, for example, a new model was released, you get the GGUF and off you
(19:39):
go. You should watch it, especially if it's new, because sometimes there are these issues, and they get fixed, and then there's an updated version, and you may not even notice. So you may be running a worse version. And that is really bad if you just test it that way and then say, oh, the model is bad; I don't know what everybody else is doing, because on my system it
(20:01):
doesn't work. But maybe you have a version that has a problem. So updating is actually also important for the models, and sometimes they are optimized, like the Unsloth team is doing, which I respect a lot: they look at the model and find some little issues, or the quants that were made were not optimal, because, yeah, it's a science in itself. And it's very important to make
(20:24):
sure that the model you are using is good. It doesn't help if the original model is great but the version you have has issues and works badly; that is also very important.
And I'm glad you brought up the
Unsloth guys, because they publish everything openly, and it's really cool to see the process of diving in. As someone who's more in the application layer rather than the training and ML layer, I still find it so interesting
(20:47):
just being able to see how you can really dissect these things to optimize performance. I wanted to add to that, because
that also shows that there are many, many factors besides just the model weights, because you also have the prompt format that you are using, the actual prompt. Of course everybody knows the prompt, but the format itself is also very important. And if you are using the chat completion endpoint, usually the
(21:11):
inference engine does the formatting for you, if the template that is part of the model is correct, and there have been issues with that as well. And if you are using just a completion endpoint, you have to do your own formatting, which locally is still sometimes used. So that is also very important.
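(To illustrate the formatting point: with a raw completion endpoint you must apply the model's chat template yourself. Hugging Face transformers exposes this via apply_chat_template; the model name below is just an example.)

```python
# Render the exact prompt format the model was trained on before sending it
# to a raw completion endpoint; skipping this can silently degrade quality.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("google/gemma-2-27b-it")  # example model
messages = [{"role": "user", "content": "Hallo! Wie geht's?"}]

prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)  # includes the special tokens the model expects
```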
I did a lot of template work before. For example, SillyTavern is actually one of the front ends where I did a lot of the
(21:32):
templates, and KoboldCpp, for example; I was using both, front end and back end. So I was making sure that the templates were right. That is one thing. And of course the sampling settings have a major effect. Yeah, everybody knows temperature, but there are so many others: repetition penalty, top-k, and
(21:53):
all the tail sampling stuff. Most I don't use, but you always have to check which default settings the inference engine or the model provides.
Especially if you are using, for example, Ollama: they have those baked into the model files, and you should make sure that they work, like the context. If you don't have the right context length, then your big model, your good model, is not
(22:16):
as good, because it will be forgetting stuff. And there are always, of course, resources: the bigger the context window, the more resources you'll need. So you have to see where it fits. And that is also something everybody has to do, to optimize the inference engine and make sure that the settings fit their use case. That's very important. If you are using it and importing big documents and your
(22:37):
context window is too small, it can't work. And that is not the model's fault or the software's fault. So that is also something you have to evaluate.
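(A small sketch of checking and overriding those baked-in defaults through Ollama's documented REST API; the context size and temperature shown are example values only.)

```python
# Override Ollama's per-model defaults for one request via the `options`
# field of the /api/generate endpoint.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "gemma2:27b",
        "prompt": "Summarize this document: ...",
        "stream": False,
        "options": {
            "num_ctx": 16384,     # raise the context window for big documents
            "temperature": 0.2,   # pin the sampler instead of trusting defaults
        },
    },
)
print(resp.json()["response"])
```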
User error is a real thing. You'd mentioned Ollama; I'm a fan of Ollama. I also use Jan for my local inference. Do you have any tools that you use for local inference, whether it's for testing or for daily use, that you'd recommend people check out?
Oh, I started with, let me think, Oobabooga.
(23:00):
Actually, Oobabooga's web UI was one of the first; KoboldCpp; Ollama at work, I used that. And yeah, of course there is llama.cpp, the granddaddy, and now it has its own UI and especially its own server where you can use it as an API; that was a big unlock as well. ExLlama is very fast if you can
(23:23):
put it all on GPU. So my personal AI workstation has 48 GB of VRAM. It's my desktop PC, but I got two RTX 3090s into it, and I know that a lot of companies are also using two 4090s or 5090s, mostly two cards, and in
(23:43):
that range, unless you go really big with an H100 or something like that. At the company I was at before, we also got one system with two... no, it was just one card, but it only had 48 GB as well.
So basically, you should use whatever works best for you. Of course Ollama is great, but I wouldn't use it for production;
(24:08):
it's more for testing, I would say. It's the same as with Docker and Kubernetes: I would use Docker for easy deployment and development, and Kubernetes if you want to go full scale. So vLLM, for example, that is really fast and performant; others I have no experience with. But there are so many options, as you see, and you should check out which one works best for the
(24:31):
situation you are in. Do you want to use AI internally, for your team? Then you don't need inference that is as big and parallel, depending on the company size, of course, versus when you want to provide it to your customers and at scale and stuff. So yeah, I don't have much personal recommendation. Right now,
(24:51):
I'm on the Mac using LM Studio, because it's so easy to use. Use whatever floats your boat, I guess; I'd say they should all work very well. And no matter what you do: evaluate, evaluate, evaluate. You always want to benchmark the system, see which is faster, and that can depend on the model as
(25:12):
well, for example with multimodality. That is also an issue where you don't have it available on all the systems, unfortunately, so you may be forced to use a specific inference engine that supports the multimodality for the models you are using, although the support is getting
better now.
One avenue I'd like to explore a little bit is when people are having these local models. I also use a MacBook, so I can
(25:33):
just run it on my machine, you know, the smaller models, but it does the trick. But when you get to the small business scale, or even medium business, and they have the privacy concerns, so they don't want to use OpenAI, but they're unsure of how to deploy it, or where they should have their local AI, whether they want to set up a server somewhere in order to run it. Do you have any recommendations on how people can explore the hardware side of
(25:55):
things, whether it's owning or even renting servers, and at what stages they might want to explore more before overcommitting, if they don't really know what their solution is yet?
That is a bit like with servers in general. I think if you are a very small company, you may be better served with cloud services, because you don't have a dedicated team to maintain the systems.
(26:16):
If you are a medium-sized company, you probably have an IT department and a lot of servers locally, so it's just getting one more and putting it somewhere in a server room; it's not a big deal. Of course, you should have some people maintaining it who know about the LLM part, or the AI part in general as well: which inference engine, like we just discussed, to use, and to keep it updated, and the models and
(26:39):
stuff like that. You should have somebody knowledgeable, of course, because yeah, it doesn't help if you don't have anyone to keep these systems updated. For example, at the company where I was before, when I left, I asked a colleague: do you have some new models now that have been made available, like the version 4 recently? No, they are still on 3.5, I think, or maybe 3.7 even, but
(27:03):
not any new one. And you need somebody who takes care of these systems, of course.
And yeah, graphics cards are hard to get and getting more expensive, unfortunately; we all want more. Or you could go the other route and say: OK, I'm going for something like DeepSeek, and I want to do the MoE models,
(27:24):
mixture-of-experts models, where you can put them in RAM and still have good performance if you have a very fast CPU. That is another route. Or maybe do both; of course, it costs more that way. But yeah, you want to have options. Of course, that is one thing. And yeah, buying a
(27:45):
system and having someone maintain it is, of course, a permanent cost that you have to carry. Whereas if you use a cloud service, like Azure AI or one of the others, or the specific providers; you can use Hugging Face, for example, and if you are using them, they take care of the maintenance. So that may make sense. You just have to check your regulations to see if you can do that,
(28:06):
or if you need a local system. Mistral also just started with theirs, where they want to build big compute and offer it. So there are options, and there will be more options.
I think, as always, you may want to start with some service where you know what it costs and you can cancel it at
(28:28):
any time and switch to another service. If you get a big CPU system and now the models need more GPU, then yeah, you are locked in a bit and you have to upgrade it. And if you go the other way, you invested in a big GPU system and now the MoE models are where it is at, then you may be locked in there. So as a business, you'll probably
(28:49):
want to be flexible. If a couple thousand bucks is not the problem, then you can just get a local machine and make sure that everything is only running
locally. What I did at the previous company was to just have two systems, two instances of Open WebUI. There were two systems: one was external, where I had all the external models integrated, like ChatGPT, Claude,
(29:12):
whatever we had as well, with the big Llama, for example, stuff running externally, and it was all red. The interface was red, so people knew: oh, careful, this is going outside. And then we had a local system which was all green, and it was all local, so people knew: OK, I can put anything in here, it is staying on my local system
(29:33):
here, yeah. So I would recommend to start with anything that is good, and then decide if you want it in-house; then build a team and build a server.
Good advice, and I really like the idea of having a very visual UI showing what's internal and external, so you can know when it's safe to put in anything, your business data, versus just general queries.
(29:53):
You see, I actually had to tell people that they should use the external system by default and only go to the internal system if it is sensitive data. Because the system we had, with 48 GB of VRAM, only ran quantized local models, so it wasn't full quality. And even the big ones didn't compare yet to what you could get from OpenAI or
(30:16):
Anthropic. We're getting there bit by bit.
Do you think of any other hidden expenses or general costs for businesses who want to host their own AI?
I think there is an issue if you, yeah, if you only offer internal AI, and it is the best you have, and the people are not allowed to use external AI, but the models you are serving are not working
(30:41):
for the use case. Then you may have a cost that you don't even see, because it's basically lost opportunity. Or, yeah, you don't get the full benefit AI could give you if you were going to a better AI.
And I love open source, I am a big open source advocate, but we have to be realistic, especially in a business context.
(31:01):
It's not just ideology. And yeah, right now it is still the case that the closed AI is more advanced. But I think the gap is closing all the time; especially the Chinese are doing a great job at that. So: I use local AI, I use external
(31:21):
AI; I think most will use both and see where each works. But just going internal and forbidding your employees to use external AI may be limiting. And that is a cost that you won't even see. In the same way, it's probably an even bigger loss if you don't use AI at all, or to outright ban it; that would be a bad idea.
(31:43):
Yeah, in this day and age it no longer makes sense to be in denial that this is the future. One benefit that I really like with hosting your own model, or leveraging the open source ones, is the ability to fine-tune on your own data. Do you have a general perspective on at what point people should explore fine-tuning? Or should they just, you know, stick with hosted, stick with local, and just run it stock?
(32:05):
Interestingly, personally, I have not fine-tuned a model. I've been thinking about it, but it's not something I have been doing. I have been doing more prompt engineering and getting my way that way; even the big commercial models are doing what I want. But yeah, if you want to fine-tune, of course you need somebody to do it, and you need some resources. It doesn't have to be very expensive, though.
(32:27):
And I know from what I see around me, and our evaluation software confirms it, that you can get a lot better performance out of a small model if you fine-tune it for your specific use case. So I'm not saying don't fine-tune. If you have such a use case, it is definitely worth it to test it, but you need investment, in the sense that you need somebody to do it:
(32:51):
you give them the time to do it, or some money, and then it can work. So yeah, if you don't have somebody dedicated to doing it, you may hire somebody just for this job, contract it out, or you just try it with a prompt.
And the models are getting better, and I think you get a lot
(33:12):
with good prompting, of course. And the thing with a fine-tuned model is: you are fine-tuning, and then you have the model, and now a better model comes out, so you have to fine-tune again. Your prompts you can take with you. You have to adapt them, of course, but you can take them very easily. And that is also the reason why I didn't fine-tune a model for myself: because I always want to
(33:32):
use whatever is available, test it, and use the best, and I would constantly be retuning the models. And yeah, for my use cases, all the use cases I have, I get there with prompting. So I would say the easiest way is to work on the prompt. But if that doesn't bring you where you need to go, and you
(33:53):
don't have a prompt engineer or anyone who has the expertise, get somebody to help with that. And you should have an evaluation system that tells you if it's getting better or not, because otherwise you would be flying blind, and you don't want that. And yeah, so I would start with a prompt, because it's the easiest way and it's the cheapest. But if it is not getting you where you want to go, consider fine-tuning.
(34:14):
Max out prompting until you can't prompt anymore, and then start exploring these other options. One thing I'd like to pivot to: you've built yourself a personal assistant, and I think a lot of the future of our AI software is going to be very individual, very personalized. What advice would you give to someone who has an idea for a little tool or assistant that they want to build, and they just
(34:37):
don't know where to begin? They go to ChatGPT, "help me build X," but then what do you think they should do?
Yeah, that is the best start you can actually make. If you have an idea, talk to the AI about it. Of course, you could talk to people too; don't forget that, keep talking to people. But you may not have somebody around you with the same expertise, or it may be an area that you don't know much
(35:00):
about yourself. So ask your AI. Of course, you then have to be very careful, because we all know AI can tell you something that sounds so plausible, so logical, and is still false. So you always have to double-check. But the good, the better models are doing agentic stuff: they are searching the web, they're giving you sources. So that is getting better. And yeah, that would be the start:
(35:22):
take the AI you like and that you have been working with, and ask it. Always use the best AI. In this case, I would definitely use o3, for example, instead of 4o, because it is doing its thinking and checking and tool use and stuff like that. It helped me fix my workstation.
My workstation was totally broken. I just went to ChatGPT on my other system, and it went through
(35:45):
everything, from debugging to finding out that it was a broken power supply. It wrote the mail to the shop where I got the system from, so I could get a new power supply. And then I took a picture of all the cables and it told me which cable goes where, which ones I needed and which I didn't. I showed it a picture: what cable is that? Because I'm not a hardware guy, I'm all software guy.
(36:06):
So I showed it everything, I took pictures, and it said: OK, you have to unscrew this, and then you can pull this out, and so on. It gave me exactly how to do it. It even looked in the manual of my motherboard and zoomed in, you know, the o3 stuff, it can zoom. And it even took the pictures that were relevant and cropped them and showed me just what I needed to see. It was amazing, absolutely
(36:28):
amazing, and it helped me totally fix the system.
So if you want to build something, just do the same. Of course, if you need to program, I would recommend an editor like, for example, Cursor or Windsurf, or use Claude Code, for instance. Because if you want to build something, that would work better than if you
(36:51):
are copying and pasting stuff in the web UI, although even o3 can now give you downloads and create stuff for you. And yeah, it's amazing what the models are doing and can do, and it will get even better. And I think now is the best time: if you have an idea, you should be able to realize that idea. The time has never been better, because you just need to be smart enough and have some
(37:15):
motivation to get started, and then you can quickly learn
all these things. Maybe this is a point where I can tell you another anecdote, about something that changed my mind about AI. When I was using AI at work at the previous company, I was always of the opinion that you need to be smarter than the AI to make good use of it. Because if you aren't, you don't even know if the AI is telling
(37:36):
you the right thing, and you can't direct it. But when I moved to the AI team, I left a vacancy on the administration team, so we needed a Linux administrator. And yeah, we had a hard time finding someone who was good enough for the position, so in the end we decided to convert a Windows
(37:58):
admin. And yeah, so I showed him everything. And he had AI; that was the big unlock, because he was smart enough and knew computers, but he had no idea about Linux. But he could ask the AI, I'm saying, and even if he didn't understand
it, he could have it explained. And it showed me that you don't have to be an expert in the topic where you are using the AI
(38:21):
to make good use of it, if you have the right mindset to work with it. Don't just tell it what you want to do, but also let it explain how it's doing it and why it's doing what it's doing, and so on. He learned Linux that way. And it was much, much faster than if I had had to do everything by myself.
(38:41):
I was so proud of him. And he was happy, showing the stuff he had learned on his own through the AI. That was a good thing. I had another colleague, who, yeah, should have done the job before, but we had to let him go, because what he was doing was saying "do this" and just letting it rip. And when I was watching, he didn't know what the
(39:02):
AI was writing. It was a simple task, it wrote a lot of code, and he just said: I don't know what it's doing, just do it. And I was saying: no, that's not the way you can do it. It was a test system, but do not do this on production or anything like that. And that is a mindset thing, how you work with it. Let it explain; ask it again. My colleague always said we
(39:25):
should have an automated "Are you sure?" sent whenever the AI says something. So yeah, you have to learn to
use the tool. And a silly example, but I've been using just the ChatGPT app around my yard: take a picture of a flower or a plant, like, hey, what is this? And it'll give me an answer, very believable. And then I just do a quick Google search after, just to confirm the plant. And then I look at the photos,
(39:46):
and it got three out of four. So 75% accuracy is decent, but it was so confident about the one it got wrong that, yeah, you've got to be skeptical. Even with generated writing or communications: it sounds like AI, and people are getting better at being able to identify it. So take the output as an input, not as, like, the sole source of truth.
Yes, exactly.
It's still the human that is using the tool. If you have some agents that are totally independent, maybe
(40:09):
robots and such, then we can talk about that. But until then, we have a tool, we have a user, and the user is responsible for what they are doing with the tool. And yep, the tool can be the best tool: you can have the best hammer and you can still hit your finger and hurt yourself. Or you want to get a screw into the
(40:30):
wall; then it's not the hammer that's wrong, you took the wrong tool.
Yeah, that's a great analogy.
AI affinity, I call it, like you have computer affinity. I saw, when computers got into the companies, you were looking for people who were able to use them. And yeah, some older people were not,
(40:51):
and some other older people were doing it easily. So it is a mindset thing; it doesn't have to do with age or anything. It is just: are you open to the technology, and are you willing to learn it? No matter how old you are. That is the thing that we need with AI now.
And you had mentioned, when people are building their projects, use o3, whether it's
(41:12):
the ChatGPT app or just going to their website, and use an IDE like Cursor or Windsurf or something like that. Do you have any other tools that you'd recommend people check out if they're interested in building, or even just trying to leverage AI in their work?
I'm a tools guy, I have to say. I want AI to work for me everywhere I go, so I want to have it everywhere. I got a Google Pixel phone, for example, just because it has great AI integration. I don't even have to unlock it;
(41:35):
I can just say the code word and the AI reacts, and all that. And I have all the AI tools on it as well. And now that I'm on a Mac, I also got tools like Raycast. For example, if you have Raycast, it costs about the same as a ChatGPT subscription, but you get ChatGPT, you get Anthropic's Claude, you get all the
(41:59):
other stuff. Even Grok is in there, and Llama and a lot of things. And then take the time to learn it, to set it up. Like I told you, I wrote this tool where you can just press a keyboard hotkey to send anything to the AI; I implemented that with Raycast as well. I have the caps lock key as my
(42:19):
hyper key, which is for all the things, because caps lock, you usually don't need it; I don't. So for me it's just working as a multi-function AI key: I just press it, and it's an edit key. And I have on my mouse, for example, a button; when I press it, I can just draw a rectangle on the screen and it goes straight to an AI chat, and I can ask about it.
(42:41):
I have a key I can press so I can talk to the computer and it gets transcribed; that is Highlight, actually. So if we are talking tools: Raycast and Highlight. And there's Codetypist, which runs a small local model and gives you autocomplete in any text window, everywhere you are typing. And these are little things that,
(43:04):
if you combine them, give you an AI powerhouse that you can use for anything. And I still do go to ChatGPT, for example, so I can use o3 with all the integrations that they have there. Perplexity was a big thing I recommended to people, because that was at the time when search was not yet standard in OpenAI and ChatGPT.
(43:25):
A lot of people asked stuff, and the model didn't search the web, it hallucinated. And that was a bad time, so I told people to go to Perplexity and do it that way. I don't want to make too much advertisement, but I think if a tool is good, it's always a good thing to talk about it. But yeah, I'm not sure if I'll keep Perplexity; if that browser is any good, I will, but otherwise I use it
(43:47):
much less now. I use o3 a lot, especially after they raised the rate limits; I like it a lot. Claude is sometimes a bit too abstract, or how do you say it?
It can be a bit boring. But actually, I have my personal assistant: my Amy is a big prompt I have
(44:07):
that I use everywhere. I use it with local AI, and I use the same prompt with ChatGPT; I have it in Perplexity; I have it everywhere I can have it, even in Gemini as a Gem. So I always interact with the AI through the lens of that sassy assistant, which makes it more personable. So personality for AI is very important, I think, because when
(44:30):
we are using a tool all the time, it is great if that tool is fun to use. And when I'm having an issue with the computer and I get a sassy response that makes me laugh, then it's not as bad as it would be with just boring old assistant stuff.
That's such a great point, where
these are tools that we can really shape and mold into
(44:54):
whatever preference we have for interacting with them. We don't have to just choose between light mode and dark mode; you can actually set personality and tone. One thing your answer reminded me of: I have Hammerspoon set up locally, which is a tool for interfacing with macOS. And by taking Hammerspoon's API and feeding it into ChatGPT, I can get scripts for accomplishing different things, set them to different hotkeys, and
(45:16):
all of a sudden my laptop got that much more powerful, just because I had a little idea, an inspiration, for window management or for transcription, and it was able to do that. Obsidian, my note-taking app: you can generate plugins for it very easily now. So all you have to do is be willing to experiment and play with it, and then you can get pretty much anything you want out of these systems.
And the systems will help you, because when I was sitting at
(45:37):
the Mac (everybody here is on a Mac), and I have been using Windows for, let me think, 30 years now, and I totally customized it to make it work the way I want, with AutoHotkey and other stuff and tools. Now, the thing is, when I was on the Mac, I asked the AI. I told it: hey, I'm on a Mac now and I
(45:58):
feel lost. I want this, I want that. And like you just mentioned, Hammerspoon, that was also a tool the AI recommended. And another tool that it recommended was Karabiner. And yeah, it even wrote the scripts. I just told it: OK, I want my caps lock key to be a paste key and my shift key to be a copy key. Because if I just press caps lock, I just want my clipboard
(46:19):
to be inserted, and if I press shift without anything else, I want to have it copied, because I copy and paste a lot. So I have these two keys; I don't have to press two keys together. And it wrote the script that did everything. And yeah, it's great. You tell it what you want, it tells you which tools to use, and it will write the scripts for you or give you instructions on how to use them. For example, I wanted to
(46:39):
cut a video and do something, so it told me to get CapCut, and I got it. And I said: OK, I'm here, what can I do now? And it said: OK, now you have to do this and that, and you want to do this. It explains the tools to you. If a tool is popular and well known, it can help you use it. That is so amazing.
It's so much fun. All right, well, last question for me; this has been great. If someone's working at a
(47:01):
company and they want to help promote the use of AI, or try to get more AI adoption, do you have any advice for how they can talk about it, bring it up, and just try to help accelerate the company moving forward?
Yeah, it depends on your manager or
your founder or boss. If they are a numbers person, then you just look up some studies that
(47:22):
show how much efficiency gain you get from the AI, how much money you can save, and how many other companies are using it. That is one thing. But otherwise, it's always good to show an example. If you can show them how you managed to do things that would have taken you much longer otherwise, if you can just show them,
(47:44):
yeah, that would be the best thing. I think showing is more than telling. So, yeah.
And tell them that you are into AI; I think a lot of people may not even know. When I talked to my boss, he didn't even know that I was doing all these things in my spare time. And then he said: oh, I have to get you on an AI team, we have to do this. If I hadn't talked to him about it, he wouldn't have known.
(48:04):
I would have just done it in my spare time, and nobody would know. So it's important to talk about it, to show off what it does, and yeah, show it. I think it's so good that you should be able to find some use cases where you can really show that it has a big advantage.
Fully agree. Wolfram, I've really
(48:27):
enjoyed this conversation; your passion just comes right through. Before we let you go, is there anything you want the audience to know?
Yeah, stay tuned, because I have not been posting a lot of evaluations recently, but at ellamind we are working on a really solid evaluation platform, and yeah, there will be stuff to come, and as soon as I can talk
(48:48):
more about it, I will definitely do so.
Very excited for that. All right, well, thank you for coming on, and we'll talk to you soon.
Thank you, and thanks for having me. It was a pleasure talking to you. Keep going with the great show.