
July 24, 2023 10 mins


I discuss my experience testing prompts across different AI systems — Google Bard, OpenAI GPT-4 and GPT-3.5, Anthropic Claude 2, Meta Llama 2, and Jasper — to generate location-specific content. Most of this is based on the last 18 months of building out prompts, now tested on models released over the last 4-6 weeks.


Google Bard

  • Released major update on July 13, 2023
  • Prompt strategy: Long paragraphs, numbered tasks, multiple iterations
  • Couldn't produce high quality content without heavy editing
  • Issues following instructions, needing reminders


OpenAI GPT-4

  • Works well with conversational, transcribed prompt
  • Able to follow directions and produce high quality content
  • Zero-shot prompting works; no examples needed


OpenAI GPT-3.5

  • Uses revised GPT-4 prompt plus follow up to enforce formatting
  • Gets content production-ready after second prompt
  • Quality close to GPT-4 with additional data/content provided


Anthropic Claude 2

  • No API access, using text interface
  • Required revising prompt structure significantly
  • XML tagging of data types improves context
  • Built-in prompt diagnosis/suggestions helpful
  • Single prompt can produce high quality output


Meta Llama 2

  • Free to use commercially if you have the hardware
  • Expected behavior similar to GPT-3.5
  • GPT-4 prompt worked well
  • Quality closer to GPT-3.5 but better privacy
  • Could refine with prompt chaining
  • Issues following instructions precisely


Jasper API

  • Access useful for building AI tools
  • Long prompt length capability
  • Appears to use GPT-4 or variant
  • Zero shot performs as well as GPT-4
  • Able to produce high quality content easily


Conclusion

  • GPT-4 and Jasper produce quality results most easily
  • Pleasantly surprised by Claude 2 quality and formatting of prompt
  • Llama 2 needs refinement to reach GPT-4 level
  • Curious about prompt strategies working across models

Full show notes: https://opinionatedseo.com/2023/07/ai-prompting/


Episode Transcript

Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Phil (00:00):
Welcome to the Opinionated SEO. I wanted to talk a little bit about some of the things that I've been working on over the last 18 months, but really more focused in the last few months and especially the last few weeks. I've been spending a lot of time working with some of the new large language models and how to best utilize them.

(00:22):
And the nuances for each as it comes to prompting. So I've been using the following: Google Bard, OpenAI GPT-4 and GPT-3.5, Anthropic Claude 2, Llama 2, and Jasper. So I'll go into a little bit about each one, maybe some

(00:42):
overview, what I think is kind of exciting about them, what my prompt strategy has been, and what my response quality has been. So let's talk about what my overall task has been, and I'm going to keep it a little bit general, but the client wanted overview content for specific pages that take into account a unique location for a product or service.

(01:04):
This means combining location information with product or service content to create something useful in the format of content for the end user. So let's start with Google Bard. Google had a major release update on July 13th, 2023. There's a lot more available. There are a lot more languages.

(01:26):
They have really enhanced a lot of the coding side, but I wanted to give it a shot and see how it compared to how I looked at it when the beta first opened. So let's start with my prompt strategy. After a lot of testing, the prompt seemed to work best when I utilized long-form paragraphs, numbered tasks, and refining techniques using

(01:49):
multiple prompt iterations. On response quality: I was not able to get a production-ready response that I would feel comfortable putting on a website without serious editing. When I say production-ready, I mean content that I can copy and paste onto the website and feel confident that end users seeing that would feel like it was well written

(02:11):
and that it was helpful according to Google. Bard could not follow instructions, and I found that it needed me to remind it of at least four different requirements that it kept missing. It would say, "You're right, I apologize for the mistake." That was very common when I asked it to verify the instructions were carried out. Bard does not do a good job of giving good prompt advice

(02:34):
either, so it was a lot of experimentation, and I found that it just didn't really take to a lot of the refinements. The next is OpenAI GPT-4. So, my prompt strategy: my original prompt for this task was actually created for GPT-3.5, and I've since adapted it for GPT-4. GPT-4 chat, especially since mid-June, works very well when I

(02:58):
give it a conversational request, almost as if a conversation with a content writer was transcribed. I'm able to get away with zero-shot prompting, and it follows all of the directions I give it. From a response quality standpoint, I can get a production-ready piece of content in a single response. I can run a list of commands that ensure it's followed my

(03:19):
request exactly as a follow-up, but oftentimes it doesn't need to make any changes. Let's talk about OpenAI's GPT-3.5. My prompt strategy for this uses my revised GPT-4 prompt, and once it finishes, I have a follow-up prompt which forces specific requests that the response just typically

(03:40):
lacks. This actually includes specific formatting, and not replacing some text with variables that my CMS will use to pull real-time data. So it requires two prompts chained together in order to get a quality response. So let's talk about the quality. My response can be production quality only after running my

(04:00):
secondary prompt to clean up and reinforce specific rules and formatting. The quality of the content is very close to GPT-4. I do provide it with a large amount of background data and content that it utilizes, and so it's really synthesizing that, and I feel like that's why it can get to that quality.
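As a rough illustration, the two-prompt chain described here can be sketched as plain prompt-building functions. The wording, the service details, and the {{CITY}}/{{PHONE}} placeholders are hypothetical stand-ins, not the actual client prompts:

```python
def primary_prompt(location: str, service: str, background: str) -> str:
    """First prompt in the chain: a conversational content request
    (location, service, and background are illustrative inputs)."""
    return (
        f"You're a content writer. Using the background below, write an "
        f"overview of our {service} for customers in {location}. "
        f"Keep the tone helpful and factual.\n\n"
        f"Background:\n{background}"
    )

def followup_prompt() -> str:
    """Second prompt: enforce the formatting the first response typically
    lacks (placeholder names here are hypothetical)."""
    return (
        "Revise your previous response: wrap each paragraph in <p> tags, "
        "and keep the placeholders {{CITY}} and {{PHONE}} exactly as written "
        "so the CMS can substitute real-time data."
    )
```

In practice, each string would be sent as a user message in the same chat session, so the second prompt operates on the model's first response.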
The next is Anthropic Claude 2.

(04:22):
I don't have access to the API, so I'm only using the text interface. This is one of my favorite conversational large language models, and they have some good documentation on ways to present data in the prompt to help with context and giving it data to utilize. So, my prompt strategy: Claude 2 had me actually completely revise my existing prompt.

(04:43):
Even though it can handle a hundred thousand tokens, I found that it required much more strict structuring of the prompt compared to OpenAI. That's not necessarily a negative thing. It just made me tag things in a certain way. So I followed their documentation: I utilized XML tagging of data types and surrounded contextual and additional data in specific XML tags that I was able to

(05:07):
reference in other parts of the prompt. I also found that including an example, like a one-shot, helped solidify the format, and it didn't overly utilize the phrasing. Often, GPT-4 will use a sentence structure so similar that the two pieces of content feel just too similar; it will borrow too many of the words from my one-shot example.
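A minimal sketch of the XML-tagged prompt structure described above — the tag names here (location, product, instructions) are assumptions for illustration, not the tags actually used:

```python
def xml_tag(name: str, content: str) -> str:
    """Wrap a block of data in a named XML tag so other parts of the
    prompt can reference it by tag."""
    return f"<{name}>\n{content.strip()}\n</{name}>"

def build_claude_prompt(location_data: str, product_data: str, instructions: str) -> str:
    """Assemble tagged data blocks plus instructions that refer to them
    by tag name (hypothetical structure, following Anthropic's general
    XML-tagging guidance)."""
    return "\n\n".join([
        xml_tag("location", location_data),
        xml_tag("product", product_data),
        xml_tag("instructions", instructions),
        "Using the data in <location> and <product>, follow <instructions>.",
    ])
```

The point of the tags is that the closing instruction can point at each data block unambiguously, which is the stricter structuring the episode describes.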

(05:30):
Claude 2 has the best prompt diagnosis and suggestions built into the large language model. I was able to fine-tune the prompt using the system itself, allowing me to get it to a single-prompt response. So let's talk about that quality. I was able to get a production-ready response with a single prompt. The quality of the output was as good as, or in some cases better

(05:53):
than, GPT-4. However, I don't have API access, so it's not as easy for me to test across more content types. Typically, I'm going to run this 10, 15, 20 times and look at the nuances and differences between them. Without having the API, it's a lot harder for me to include all of those different variations.

(06:14):
The next is Llama 2. It's free. It's legal for commercial use. Okay, did I mention it's free? Well, actually, it's free if you have the hardware to run it. My system, though it's very capable, doesn't quite meet that requirement, and I wanted to use at least the middle model. So I was able to spin up an AWS endpoint with four NVIDIA Tesla

(06:34):
T4 GPUs, which was able to run the 13B model. I'm most excited about this model, as it would allow for completely local text generation with complete control over the model, hardware, and privacy. So let's talk about this prompt strategy. I didn't really work a lot with Llama v1, as it was really a research-only version.

(06:55):
There were some leaked versions or derivatives, but I did go in expecting it to work similarly to GPT-3.5, and I feel like it does. I am limited in tokens due to some hardware limitations and just some general settings, but I found that my GPT-4 prompt worked pretty well within the Llama 2 LLM. So let's talk about that quality.

(07:17):
I would put the quality closer to what I normally get from GPT-3.5 before I run that second refinement prompt. With the positive privacy aspects, I feel like there's a lot of opportunity to do prompt chaining to get the response refined. It had some issues following the instructions. It really tended to exaggerate what I wanted done. For example,

(07:40):
I wanted it to put in some HTML code, so P tags around each paragraph, but then it decided to also create headings and heading tags. That's not something that I wanted, and it added that in place. There would be a lot of things that I would have to tell it not to do, and it tended to just kind of keep building as opposed to just sticking straight to the script.
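One way to handle those unwanted heading tags is a light post-processing pass that strips them while keeping the requested P tags — a hypothetical cleanup step, not part of the actual workflow described:

```python
import re

def strip_headings(html: str) -> str:
    """Remove <h1>-<h6> elements the model added without being asked,
    leaving the requested <p>-wrapped paragraphs intact."""
    return re.sub(
        r"<h[1-6][^>]*>.*?</h[1-6]>\s*",
        "",
        html,
        flags=re.DOTALL | re.IGNORECASE,
    )
```

A regex pass like this is only reasonable for the simple, flat HTML the prompt asks for; nested or messy markup would call for a real HTML parser instead.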
Overall, I did like the way it worded the content, though it

(08:03):
did feel a bit too salesy and overly positive compared to using the same tone requirements as I used in other models. Now, I did test the chat version of the large language model and had less success with the non-chat versions. In fact, I couldn't get them to really come back with anything that I felt was cohesive. They were going to require some model training

(08:25):
that I just haven't put the time into yet. I do think that with some minor refinements, this could be as good as GPT-4, but it's going to require a lot more work to get there. It could come at an overall lower cost, though, with a fully private, locally run large language model. The last one is Jasper's API.

(08:47):
Now, having access to this has been really great, as I've been building out AI tools that need multiple models, and I really like their command endpoint through their API. It allows for up to 6,000 characters in my prompt, and it really lets me push the size. So, talking about that strategy: the prompt's focus has been nearly identical to that of GPT-4.
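Given that 6,000-character cap, a small guard before each request can catch over-long prompts early. This is a sketch; only the limit figure comes from the episode, and the function itself is illustrative rather than anything Jasper provides:

```python
MAX_PROMPT_CHARS = 6000  # command endpoint limit mentioned in the episode

def check_prompt(prompt: str) -> str:
    """Raise if the prompt exceeds the endpoint's character limit,
    rather than risk a rejected or truncated request."""
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError(
            f"Prompt is {len(prompt)} chars; limit is {MAX_PROMPT_CHARS}"
        )
    return prompt
```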

(09:09):
I actually suspect it's using GPT-4 or some kind of variation of it as part of the API. Sections of the content with context and instructions, along with format requests, really make up this prompt, and I found that zero-shot worked just as well as GPT-4. So, talking about response quality: I was able to get production-quality responses without

(09:30):
needing to make any adjustments. So far, it looks like Jasper and GPT-4 are fairly easy to get quality results from. I was pleasantly surprised by Anthropic Claude 2, and I like the formatting they've trained their model on. I'm hoping to get access to their API so I can really put it to the test. Llama 2 wasn't bad, but it couldn't quite get me production

(09:53):
quality content, so I'll have to look into training the model to align closer to what I'm looking to get as a response. I'm curious how many of you have been creating a prompt library aligned with different LLMs, and if you've found a prompt style that maybe works between all of them. I will say this was 100% written by hand; no AI wrote any portion

(10:15):
of this content, but does that make this a better article? I would love to hear your thoughts on that too. Thanks for taking the time to listen. This is Phil, the Opinionated SEO, and I guess AI guy at this point, because that seems to be a lot of what we're all doing. Have a great day, and talk to you guys again soon.