Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
(00:00):
Prompting is the secret skill that taps into AI's real
capabilities, transforming large language models from flashy
demos into engines of real world productivity.
Today on CXO Talk number 883, we unpack how prompting works, what
it is, why it matters, and how to get it right.
(00:23):
I'm Michael Krigsman, and with me is Nate B Jones, a widely
respected AI expert whose sharp insights and no nonsense advice
have earned him nearly 300,000 TikTok followers.
We need to talk a lot about prompting for two reasons.
One, human language is fairly vague.
(00:45):
That's why we invented computer languages back when we were
programming computers in the first place, because it's much more
precise. And now we're using effectively
natural language to program computers again, and that's
challenging. The second reason is that even
though these models are very intelligent in certain respects,
they are not incredibly reliable yet at inferring your intent.
(01:10):
If you are not precise about what you mean or want, they
don't do that reliably. They guess, and they might guess
right, and they might guess wrong.
And so both because we have to get clear with our language and
because models don't yet infer with tremendous precision,
prompting is what bridges that gap.
(01:30):
So when we are prompting, we're programming the AI.
This is really going to take you back, but like in the old days
in the 60s with punch card computing, you would literally
bring your little punch card and put it into the
computer and you would run it and you would see in 20 or 30
minutes whether you got that right or not, or maybe longer
(01:51):
than that if it was a big program.
We're doing exactly the same thing with natural language now.
We're handing the prompt to an inference model, maybe O3 Pro.
It does take that long, 20 or 30 minutes.
And we're going to come back and we're going to see if our little
natural language program did anything.
It's fascinating how time is a circle in that regard, where
we're back to where we started. So the logic of prompting is
(02:15):
effectively the logic of software development.
Is that a correct way to say it? You could say it as like the
marriage of software development and business intent.
So in a sense, software development has been predicated
primarily on building interfaces that allow business operations
to be conducted, business logic to be encoded, etcetera.
(02:37):
But now, because these models have the ability to sort of
bring intelligence to bear, you're not just asking it to do
one specific thing. You're not spending your time engineering a
specific interface. Instead, you're asking the model
to think with you. And so it's this weird mix of
the principles of engineering with the business clarity of
(02:59):
intent that has always characterized a very strong
executive brief, for example. As the models get better, what
does that do to prompting? Does it make prompting easier or
more difficult? On the one hand, you don't have
to do some of the stage management that you had to do in
(03:23):
2022 and 2023 anymore. You'll recall when ChatGPT first
came out, the prompting guides were like, OK, tell it to
pretend it's the best editor in the world.
Tell it to pretend this or that. And then it began to sort of
turn into chain of thought prompting.
Tell it to think step by step, do this and do this and do this
and do this. That stage management is
(03:44):
thankfully no longer really necessary.
You can instantiate the model by saying you are in a particular
space, like you're in a consultative strategist space,
you're in a CFO space, whatever. You can say that, but you don't
have to like put in the adjectives and hope and pray
that the model understands what you mean.
You can just say this is where we are.
(04:04):
You don't have to specify chain of thought anymore.
The frontier models know to use chain of thought when they need
to. And so in that regard, prompting
has gotten simpler. On the other hand, the
importance of specifying what you're looking for, what success
criteria looks like, what the constraints are, that's only
gotten more important because these models are much more
(04:27):
powerful. And so before, like if it was a
very simple ask that you had for a smaller model, you could go
back and forth a few times and figure out what you wanted and
it was fine. But if you give something to a
frontier model and like it's running for 6 minutes, 8
minutes, 10 minutes, 20 minutes, it comes back and you just did
(04:47):
not clearly specify the scope, you're going to be frustrated
because you wasted all of that compute.
And so in that sense, some of the stage
management and scripting that you're used to, you don't have
to do anymore, but the importance of specifying the
work very clearly has grown. Like you have to really take
(05:08):
that seriously now. So you really need to match the
prompt to the model. A lot of the art of it is in
figuring out what is this subject, what is my intent, what
is the right model for that? And once I have all of that
figured out, now how do I craft a prompt and then bring in the
context the model needs so it can do a good job for me?
(05:30):
For example, with OpenAI, a number of the models allow you
to either include deep reasoning or research or not.
And actually other companies as well, other LLMs, same
thing. So give us some examples of
this. This is one of the things where
(05:51):
model makers have not done a great job at the product surface
of explaining what their models do.
For example, deep research is really a very narrow web agent
that is trained as a research assistant to go out to look
across the entire browsable web, it doesn't yet look behind
paywalls, and to come back with a consolidated view.
(06:15):
And they train it specifically on citation.
So it's good at citations. It lists what it knows and why.
OpenAI pioneered this with Deep Research, but Deep Research is
now available on Perplexity. It's available on Claude.
It's available with Google. Lots of others have picked this
up because it turns out that reasoning across the web is a
lot of what we do. And so there's just inherent
(06:37):
value in report generation. But people don't realize that
all you're getting with Deep Research, if it's ChatGPT, is the
O3 model specifically tuned for web search.
And that is different from whatever else you've been
talking about with whatever model you've been talking about
(07:00):
in ChatGPT previously. So if you've been having a
conversation with 4o for a bit and then you turn on deep
research, it's not that 4o suddenly picks up a cape and
becomes a superhero and turns into deep research.
It's that you are invoking a separate agentic tool, getting a
separate prompt in, starting a new flow, and then that report
(07:21):
is going to come back and you're going to be able to continue the
chat. And I think that a lot of people
don't think about it that way. And it's become even more
confusing in the last week because O3 Pro on the surface
looks very, very similar. It's got a long thinking time.
You give it a prompt, it goes away, it comes back.
And so people have asked me, didthey just release a clone of
(07:42):
Deep Research and rename it? And the answer is no.
The answer is that O3 Pro is a generalizable model with a lot
of different tool calls under the surface.
But precisely because it's under the surface, it's difficult to
know that staring at the chat window when it takes a similar
amount of time and comes back. And so I think that some of what
(08:04):
I do is just try and convey the nuances of these models and how
understanding them with a little bit of a fingertip feel can
shape the way we prompt.
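To make that distinction concrete, here is a minimal Python sketch of the two different invocations being described: an ordinary chat turn versus a separate research agent. The model ids and the web-search tool spec are assumptions for illustration, not confirmed product details.

```python
# Sketch: a normal chat call versus a separate Deep Research invocation.
# Model ids and the tool spec are assumptions; check current vendor docs.
from openai import OpenAI

client = OpenAI()

# Ordinary conversational call to a general-purpose model.
chat = client.responses.create(
    model="gpt-4o",  # assumed model id
    input="Summarize our discussion of observability vendors in three bullets.",
)
print(chat.output_text)

# "Turning on" deep research really invokes a different, research-tuned
# agent with web-search tooling; the chat model doesn't gain new powers.
report = client.responses.create(
    model="o3-deep-research",  # assumed id for the research-tuned agent
    tools=[{"type": "web_search_preview"}],  # assumed tool spec
    input="Produce a cited overview of the observability market in 2025.",
)
print(report.output_text)
```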
Again, there's a level of confusion here.
I mean, I use so many different
models every single day and I am on an ongoing basis having to
(08:29):
kind of experiment. You know, it's like this whole
domain is very immature because the models are changing and the
models give indeterminate results in any case.
And that means you keep having to adjust your prompts on an
ongoing basis. It's really a waste of
time. I think if it was a waste of
time, we wouldn't be seeing the kind of tremendous uptake we see
(08:53):
on groundswell usage with these models.
One of the biggest challenges with IT and security this year
is shadow IT where people are finding these models so useful
for the work that they do that they are using them even outside
traditional IT security practices.
And in that sense I share your frustration.
(09:17):
I find that like when I am not getting what I want,
there's nothing more frustrating than sort of pounding my head on
the wall and trying to figure out what the model needs to hear
from me so that it can give me what I want.
But net, net, if I look across my overall productivity for the
day, for the week, I am so much more productive now, even with
(09:40):
all of that factored in than I was two years ago.
And it's because I'm learning enough about how to work with
these models that I'm able to get a tremendous amount of value
back. And I think a lot of people are
having that experience. And maybe I shouldn't say it's a
waste of time, although I do think it's a waste of time, but
let's just say that there's a lot of overhead that seems like
(10:03):
it shouldn't be there. That's a really fair call.
That's basically a complaint, and not that it makes any difference
at all, because that's the nature of the maturity of these
models as products at this point in time.
What I am curious to see answered by the model makers in
the next probably 18 months is the extent to which prompting
(10:28):
remains a durable skill set that provides tremendous alpha to
people who know how to use it well versus the extent to which
it commoditizes. Not necessarily because everyone
learns the same amount, but because models get very, very
good at inferring intent across a range of prompts for the same
subject, and people are widely divergent on what they think
(10:51):
will happen. My own view is I'm trying to
take seriously the fact that I expected initially prompting to
be a very one-off 2022-2023 edge, and that's not been the
case. It's been stronger and stronger
over time instead. So I tend to lean toward the
idea that at least for the intermediate term, prompting is
(11:11):
going to continue to have a tremendous amount of value
because that's what we've seen as a trend so far.
There are people who think that if we can get to a level of
generalizability with these models, we will suddenly unlock
a tipping point and we will find a way to infer very reliably
where we haven't before. And that might be.
And if that's the case, then suddenly prompting will become
(11:32):
less painful and less needed somewhere in the next 18 months.
Subscribe to the CXO Talk newsletter so you can be part of
our community. We have amazing shows coming up.
What makes a good prompt? Do you have any practical
advice? Number one, be really clear
(11:52):
about the outcome that you are looking for and about how the
model can know that it's done. I think a lot of people will be
fairly loose about specifying the outcome or they'll be loose
about the goal. They'll be very, very loose or
non existent about how the modelcan know that it's finished
adequately. And the more you can specify and
be clear about what you're looking for and what good looks
(12:14):
like, the better off you're going to be for the rest of the
prompt. Number two: you want the model to
needs to do that job, and you would prefer it to not have any
extra context that it doesn't need.
A lot of what we call hallucinations are effectively
(12:36):
models reasoning outside your desired context window.
And so if you can be more clean and clear about this is what I
want you to focus on in a web search or here's some documents
I want you to review. I want you to keep your thinking
focused around this particular, you know, set of meeting
transcripts or whatever it is. It will really help the model to
(12:56):
be confident that it's doing the right job and able to deliver a
reasoned result that closely matches the kind of work you
were looking for. And so the context piece is
another one. And then the third is really
making sure that you understand the constraints and
(13:18):
guardrails that you want to put around it.
So if you have an outcome or goal, if you have
context, you feed it. You then need to make sure that
the model knows don't do this. Where do I not go?
And I find that that is often one that people either barely
put in or tend to avoid because we tend to be thinking in a
positive stance of like, hey, this is what I want done.
(13:40):
Let me just give the task and go.
And maybe this is because we're anthropomorphizing models.
Anthropomorphizing models. We don't tend to regard a senior
colleague as someone who needs a tremendous number of warnings
and constraints for a task. We just say, hey, go tackle
this. I'm sure you'll do a great job.
Come back and let me think about what you get.
(14:01):
These models need those constraints still.
Even if they in many ways are very senior in their thinking,
they still need helpful constraints so that they know
where the guardrails are in the space.
And they don't start to reason off the rails into a direction
that isn't helpful. Because at the end of the day,
what they're really trying to dois just infer from your
(14:22):
utterance what they think you mean.
Figure out where in latent space they can go and get a reasonable
pattern match. Do some searching across the
web. In the case of an inference
model, do a lot of that iteratively so they can figure
out what's best and then put together something.
And so they do need those guardrails to constrain.
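Pulled together, those three ingredients, a goal with success criteria, scoped context, and explicit constraints, can be sketched as a simple template. This is an illustrative structure in Python, not a vendor-prescribed format; the section labels and sample values are assumptions.

```python
# Illustrative prompt anatomy: goal, success criteria, context, constraints.
# The section labels and wording are assumptions, not a required format.

def build_prompt(goal: str, success: str, context: str, constraints: str) -> str:
    """Front-load everything into one clean prompt."""
    return (
        f"Goal:\n{goal}\n\n"
        f"How you'll know you're done:\n{success}\n\n"
        f"Context to stay inside (do not reason beyond it):\n{context}\n\n"
        f"Constraints and guardrails (do not do these):\n{constraints}"
    )

prompt = build_prompt(
    goal="Draft a one-page brief recommending whether we renew the vendor contract.",
    success="A clear renew/don't-renew call, three supporting reasons, under 500 words.",
    context="Use only the attached contract summary and the Q3 spend report.",
    constraints="Do not speculate about pricing we haven't been quoted; no tables.",
)
print(prompt)
```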
Models are people too.
(14:44):
Just as you can't expect your friend or your spouse to read
your mind, how can we expect models to anticipate every
possibility that's out there and map it to what happens to be in
your mind at this given time, what you want when you write
(15:05):
this prompt? And that's the need to be
explicit. And that's why I say models are
people too. We as humans are very, very good
at retaining long context from multiple conversations with our
colleagues and extracting what's really important out of that
and getting to clear points of discussion.
Like I can talk with a software development manager about a
(15:27):
project that's been going for six months.
We can have a really meaningful discussion about the sticking
point decisions we've made in the past.
What we need to change. That is how humans have done
work for a long time. We iterate over time;
effectively, the prompt evolves through conversation over time.
It's shared work together with the model.
We can't have the same iterative conversation.
(15:49):
We actually have to front load all of that thinking and give it
to it in a really clean prompt so we can get a really clean
answer. And I think part of what's hard
for us about prompting is we're conversational people.
We like to chat just like you and I are chatting.
We make meaning that way, but the model needs us to sort of
compress that semantic meaning into a, like a really clean
(16:10):
initial prompt that will help itto work effectively.
I think this is a very important point that you're making:
there is the need, as you said, to compress the context
into a digestible set of words and chunks that the model can
(16:32):
then use to execute the explicit task.
Going back to an earlier comment you made: in effect,
you are programming the model and driving the
conversation through that programming, essentially.
Right, because we humans effectively, collectively derive
intent and collectively reach decisions through conversation.
(16:53):
But the model needs you to be the one that provides the
intent, that provides the driving force.
There's a higher expectation of human agency in prompting.
Let's jump to some questions. If you're watching on Twitter,
pop your questions into Twitter using the hashtag CXO Talk.
If you're watching on LinkedIn, pop your questions into the
(17:14):
LinkedIn chat. And so this first question on
Twitter X goes to Arsalan Khan, who says it seems like asking
specific questions in your industry to the AI would deter
fake experts. But how would an end client know
the difference between real experts using AI versus just
(17:39):
good or fake experts or salespeople using AI?
And if I can restate that question in terms of prompting,
if somebody is really, really good at prompting, can't they
present the appearance of being an expert where it's almost
impossible to tell them apart from somebody that has the PhD
(18:02):
in whatever the subject might be?
That is more true than many of us would like to admit.
I think it's part of why there are so many, many consultants
springing up, so many tools springing up.
The industry has a need for authenticity, but AI by its
nature is enabling many people to claim expertise that they
(18:24):
don't genuinely have. And just like there's
not a silver bullet solution for detecting text in student essays
and saying who wrote which bit of text,
there's also not a silver bullet solution for detecting
expertise. I find in practice, what tends
(18:46):
to be most helpful at distinguishing a true expert
from the sort of AI generated straw man expert is acknowledge
the source material. It's probably going to be very
good because AI helped prepare it, it's very thorough,
etcetera. Make sure you understand it and
then ask a question that's designed to push them off balance,
(19:09):
push them out of the comfort zone, and a true expert will be able
to adjust and have an interesting and thoughtful
perspective and not get too frustrated or flustered.
And someone who's depending heavily on the prompting is
often going to struggle because they won't be able to actually
have that flexible intelligence across the domain that
characterizes true expertise. So you're saying that there is a
(19:31):
level of depth that the model doesn't, or that a person who's
simply relying on the model doesn't have.
Is that another way of saying it? I'll give you an example.
So I was doing an article on prompting an O3 Pro versus O3,
right, Because O3 Pro came out this week and I asked it to
prepare a road map because I'm very familiar with Rd. maps that
(19:54):
came up through product management.
I've seen more of them than I would care to admit.
And I asked for that because I knew I could judge it.
I knew I had the expertise to assess it.
And I was talking with someone afterward and I was saying O3
Pro did a much better job on the road map.
And they were like, well, how did you know?
(20:14):
Aren't road maps subjective? And I immediately pulled up 3 or
4 reasons why road maps are not subjective.
Why? It's actually a craft you can
understand, and 9 out of 10 experts will agree with you that
a particular road map is better than another because it's a
proactive stance. It takes into account all of the
strategic advantages the company has.
It thoroughly understands the marketplace.
(20:34):
I could just go on and on. It's all at the top of my head.
And so having that expertise helps you to assess the true
quality of model response. And in a sense, what we're
seeing here is that these models are getting to a level of
intelligence where their very best work takes an expert to
truly understand and appreciate. We have some really interesting
(20:54):
questions that are coming in on LinkedIn right now.
And Greg Walters is responding to the point you made
earlier, Nate, where you were
describing the need for a compressed, highly efficient
prompt. And Greg says, isn't the
magic in prompt iteration? Instead of having one
(21:17):
compressed, highly efficient, and explicit task or
prompt, shouldn't we be collectively
prompting? It depends on the kind of task
that you're looking for. So this gets back to the
relationship between prompting and model selection.
(21:40):
For certain kinds of models, they're more suitable to
iterative thinking and iterative brainstorming.
We haven't really talked about the relationship between model
and interface, but I find if I'm using advanced voice mode, it's
just a very different experience for my brain because I'm talking
(22:01):
instead of typing and I am much looser and it's much more
conversational. It is in a sense much more
iterative and I keep it that way on purpose.
But if I'm working with a long inference time model, and it's
not just that ChatGPT has a monopoly on those.
Opus 4 is a great example from Claude.
(22:22):
I want to be clear in what I'm looking for because frankly, it
is expensive to iterate when the cycle times take that long.
And so I pick the problem and I pick the model and that guides
me to a prompting style. We have a really interesting
question from Wayne Anderson on LinkedIn and he says this.
(22:43):
How would you address the fear that leaders using large
language models could inhibit and erode decision making and
critical thinking? When does effective prompting
help and when do you think leaders should avoid using AI?
It's kind of like asking, do you want your doctor to avoid using
(23:05):
AI? If we have studies that show
that medical reasoning is something that these models are
very good at, I would love my doctor to use AI as long as my
doctor understands how to use it well.
And so in that sense, my response is I want leaders to
be using AI all the time. I just want them to understand
the limitations of these models and where they need to think
(23:27):
beyond the edges. And so really, I think it's more
precise to say these are extraordinary models.
In some places, they are advancing the far edges of human
thought and research. We have AI-developed drugs in the
pipeline, but they're narrow. They have like particular ways
in which we can prompt them that generate extraordinarily
(23:49):
effective results. And the strength of a good
leader is not being only narrow; it's that T-shaped leader
where you have that breadth of experience as well.
And so what I would look for a great leader to do with AI is to
know when he or she needs to go to AI for a deep, precise,
thoughtful perspective on something and then to bring that
(24:09):
generalized experience of the business to bear to say this is
how I would contextualize that and understand it for my broader
problem set. But let me just go back to the
comment that Wayne Anderson made.
When I write certain things, I'll write something and I'll
ask ChatGPT or whatever the model is.
(24:33):
What do you think? And it will make suggestions and
this canvas feature of ChatGPT, I guess it's not so new
anymore, makes it really easy to like drill down to very small
segments. It produces good results.
But in the back of my mind, I'm thinking to myself, it's giving
(24:54):
me kind of the least common denominator, mass market
generalized solution.
Not necessarily. That might be your prompting.
And I think that's what's interesting about these models
is that you, you are correct that if you're not intentional
about how you frame the model's position in latent space, it
(25:20):
will default towards something that's more highly probable,
which we often translate as the least common denominator.
If you are intentional though, and you want to lean in and say
I don't want a mid answer, I don't want a common answer.
I want a really creative answer. I want a really thoughtful
answer. I want an answer that you
(25:40):
haven't heard or seen elsewhere. Models are perfectly capable of
going that far and thinking more creatively, thinking more
substantively, but they don't do it by default because the way
they're trained is to be helpful for as much of the population as
possible. And so in a sense, our own
population distribution shapes the way the model makers are
(26:02):
tuning these models for general helpfulness.
And so it's up to us if we want something more on the far side
of the distribution to push for it.
We're drifting from prompting here.
Oh, you do that with prompting.
Is prompting going to create the next, you know, set of Bach
inventions? No, I don't think so.
(26:23):
And I think especially in the creative arts, like I would say
that humans have tried. I'm actually a huge fan of Bach,
of the cello suites. I love them.
I listen to them almost every weekend.
And people have tried to expand, to invent after Bach, even
through the 20th century. And in my view, no one has done
(26:46):
for the cello what Bach has done for the cello.
And so no, I don't believe that we are in any danger of a
machine coming along and doing a better job than Bach at the
cello suites. Let's jump over to Twitter from
Chris Peterson, who says tokens and time measures are all very
well, but doesn't every round trip of prompting eat up more
(27:09):
electricity and water for cooling, thus making some of the
numbers from OpenAI and others highly misleading?
No, or rather yes, it does, and yes, it matters in aggregate, and yes we
should talk about power use in aggregate.
I think it's an appropriate conversation to have.
But individual prompt usage by people doesn't
(27:31):
compare to some of the other things we do day-to-day that use
energy and water. So taking a hot bath is much,
much more expensive in water than any kind of ChatGPT prompt
you're going to run. Watching an hour of football on
the big screen is much, much more expensive in electricity;
I think it runs up to a couple of hundred ChatGPT prompts.
(27:52):
And so does it matter in aggregate?
Yes, because suddenly a billion of us are using this.
It's important, we should talk about it.
Not saying that we don't have relevant conversations to have,
but I think the idea that an individual prompt is
fantastically expensive is incorrect when we actually
factor in the energy usage of a day-to-day life.
(28:16):
Let's jump over to another question.
This is from Chris Chablonsky on LinkedIn, who says: do you have
any tips for using gen AI as a data analyst to process
a large data set and generate visualizations?
I'm not sure if it's the right tool for the job.
I've sort of talked about this a little bit with folks who are
(28:39):
managing large data sets. And what I find AI
extraordinarily good at is handling data sets that don't
have clean numeric data, right?
If you have clean numeric data, we have fantastic tools for
that, and they may include machine learning or they may be
(29:00):
just traditional SQL, but we're very, very good at
handling that efficiently with compute.
I don't know why we would switch that out and ask a large
language model to do that when the language model wasn't even
designed primarily to be numbers driven.
(29:20):
They use Python and other tools to handle numbers now, and
that's great. But if you're talking about a
truly large data set, we have tools that handle those data
sets and visualizations really effectively.
And what I find people using in practice when they're looking at
large data sets and AI is they're using AI to help them
(29:40):
craft SQL statements. They're using AI to help them
think through the data schema that they want to set up.
Sometimes they're using AI to help them prototype
visualizations that they will want to get to quickly.
Claude is great for that. And all of those are sort of, by
the way, uses of AI that help you to use that data more
(30:01):
effectively. But that's different from the
traditional assumption that you can just sort of type the query
in and you will magically get a better answer than you would get
with really efficient SQL. It's a really, really good point
that you've got to have an understanding of the particular
tool that you're using and what will be the most effective use
(30:24):
of that tool. And as you said, prompts are
great if you have a body of dataand you're trying to figure out
what have I got here and how canI present it?
And is there something that I'm missing?
Prototyping, as you say. But there are tools out there
that are designed for, you know, millions of records and that do
(30:46):
it really well. I don't think you'd want to put
millions of records into ChatGPT.
It's not really designed for that, if you think about what we mean when we
talk about context and prompts. It's designed to look across the
overall picture. And oftentimes with data, we
don't just want an overall picture.
We want precision. And that's something that our
existing tools do very, very well.
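As a rough sketch of the workflow described here: the model drafts the SQL and a real database engine executes it. The model id and database file are assumptions, and in practice you would review generated SQL before running it.

```python
# Sketch: let the model draft SQL against a known schema, then run the SQL
# with a proper database engine instead of pasting raw rows into a chat.
import sqlite3
from openai import OpenAI

client = OpenAI()

schema = "CREATE TABLE orders (id INTEGER, region TEXT, amount REAL, placed_at TEXT);"
question = "Total order amount per region for 2024, highest first."

resp = client.responses.create(
    model="gpt-4o",  # assumed model id
    input=(
        f"Given this SQLite schema:\n{schema}\n"
        f"Write a single SQL query (SQL only, no prose) that answers: {question}"
    ),
)
sql = resp.output_text.strip()
print(sql)  # review the generated SQL before trusting it

conn = sqlite3.connect("sales.db")  # hypothetical database file
for row in conn.execute(sql):       # the database, not the LLM, crunches the rows
    print(row)
```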
(31:08):
Let's get into the structure, the nature of large language
models, how they think, "think" in quotes, and operate, and what
that means for prompts. Maybe take us down that path a
little bit. It's probably worth calling out
that a lot of the difference in how prompting has evolved is
(31:29):
being driven by this movement from large language models that
are what I would call vanilla, so it's just coming back
with a response based on weights and vector space developed
through pre-training data, which is what we had into 2024.
And then the newer version, which is inference time,
(31:49):
computes models where they have that same underlying
architecture, but at the time you press enter and send in your
query, they are running threads in the background trying to
figure out what the correct response is.
And there's different ways of doing that.
Sometimes it's a combination of expert models in the background
that are sort of coming up with answers and deciding amongst
(32:10):
themselves. Sometimes it's running the same
query multiple times in parallel in the background trying to find
the most common answer. Regardless of the underlying
architecture, the effect of having more time to run cycles
on your query is tremendous. It's a night and day
difference in terms of the intelligence that the model is
(32:31):
able to respond with. And so that is a lot of what has
shaped different prompting. A lot of the reason we don't
have to give chain of thought instructions anymore is because
the models already have a way of deeply processing the queries we
give them when they are inference models and they don't
(32:51):
need our help to do so anymore. And so when I say you don't need
chain of thought, but you want to be clear on your goals, I'm
basically saying try and write a prompt that understands that it
is going to be running multiple parallel streams of thought in
the background or multiple parallel streams of tokens in
the background. Constrain it.
(33:13):
Like if you have 10 that are going to run, you don't know
it's 10, but let's pretend it's 10 for
simplicity's sake. Make sure all 10 are focused
what you care about because you want to constrain the scope of
the query so that it's actually focused on where you want to go
with a conversation. And so that's why I emphasize so
much. Set a goal, make sure the model
knows what good looks like. Make sure you set guardrails,
(33:34):
etcetera, etcetera. Describe to us what you mean by
a chain of thought prompt. It's where you said I want you
to answer my query to a traditional model using pre
training data weights and it would come back and answer.
But you wanted the token stream to go through a particular
sequence. And so it's going to go through.
(33:56):
And from a transformer perspective, like the
Transformers there, it's basically using your query, it's
matching it in vector space. Once it vectorizes it with what
it has for weights and pre-training data, it's coming back.
And you're basically saying, let me give you a deeper query with
a lot of things I want you to think about and do.
So start with, this is who you are.
(34:16):
You're an expert on marketing. Second, I want you to think very
deliberately about this campaignthat I want to launch.
Third, these are the steps of thinking I want you to go
through. First, develop a plan.
Second, critique your plan. Third, understand the
consequences of the plan in the market and you can kind of go
through. And that's like chain of
thought, right? When you do that, you're
(34:38):
basically being very particular about the places in vector space
that you want the model to go and hit when it's generating the
response. And because models read like
humans do, they read top down, when the model hits that point,
it's going to be effectively sequentially reasoning back to
(34:58):
you because of the way you programmed it.
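What that staged prompting looked like in practice can be reconstructed as a single string. The wording below is illustrative, pieced together from the description just given, not a quoted prompt:

```python
# A reconstruction of the staged chain-of-thought prompt described above;
# the exact wording is illustrative, not a quoted prompt.
chain_of_thought_prompt = """You are an expert on marketing.

Think very deliberately about the campaign I want to launch.

Go through these steps of thinking in order:
1. Develop a plan for the campaign.
2. Critique your own plan.
3. Work through the consequences of the plan in the market.

Reason step by step before giving your final recommendation."""
print(chain_of_thought_prompt)
```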
And so this gets back earlier inour conversation, Michael, when
we talked about this idea of natural language programming,
that we are effectively programming the model, that was
sort of what we were doing. And all we're saying now is we
still have to program the model. We don't have to program it
quite that way anymore. How do we program it today?
(35:20):
Today when we program the model, we want to be focused more on
outcomes and goals. And in the past it was focused
more on process. And so today if I'm looking for
a report, like I digested a, you know, 130-, 140-page economic
report this morning from, I think it was the World
(35:43):
Economic Forum, something like that.
I wanted the model to understand what I wanted out of
the report and the goal of the summary.
I didn't just want a vanilla summary.
And so my focus was on making sure it knew the angle I
wanted on the report. And I trusted it to know how to
read, digest, summarize, think through all the things I would
(36:04):
have had to specify earlier. In that case, again, the
context, giving it the background and the goals becomes
the key focus of the prompt as opposed to telling the LLM how
to do its job. That's right, we have another
question from Arsalan Khan on Twitter.
(36:26):
Arsalan says, to prompt or not to prompt?
When is it appropriate and when is it just a rabbit hole for
your confirmation bias? I think one of the biggest
differences in the way people use models right now is people
who are focused with their models can use the model as a
mirror that focuses on a particular subject really
(36:46):
effectively. And there are people who are
less focused and the mirror becomes a scatterer for them.
It scatters their thinking. They become more confused as
they use it. And I've seen both.
What I find interesting about the critical thinking piece:
imagine the mirror, and typically it faces you, right?
Then it becomes a reflection of yourself.
You're absolutely right. There's no critical thinking
(37:07):
there. It's just coming back with
confirmation. But if you're smart, you can
turn the mirror away from yourself and you can focus it on
something else and you can come back with a disconfirming or
divergent opinion. And so I will frequently ask the
model to fight with me. I will ask it to disagree.
I will ask it to come up with a steel man argument because I
think it's much more interesting and my thinking gets sharpened
(37:31):
when I do that. Actually I do the same thing.
I very often will say to the model, be very critical.
Don't worry about hurting my feelings, be sharp.
That's right, like an iron-sharpens-iron vibe is what I
like to go for. Makes sense.
What's the best way to, again, craft that prompt?
(37:53):
We were trying to accomplish something.
Shall I show my screen? Would it be helpful to just kind
of take a peek at a prompt I wrote?
Sure, let's do that. All right.
This is a real prompt that I wrote and this is an example of
me picking something where I feel pretty good about sort of
my overall ability to assess quality of response.
(38:14):
But I don't have a direct answerto this question.
And this was part of a Substack article that I was writing to
test O3 Pro. So this is for O3 Pro.
You can see it up there and I'm asking it to step through this
analysis with me. So think you're a senior product
leader brought in to design a 12-month AI adoption road map for a
real firm. First, I could have given the
(38:37):
model the choice of firm and I tried a separate prompt where I
gave it that option. That was very interesting.
In the end, I wanted something with a company that I was
familiar with since I was working on it for testing
purposes. So I used Datadog.
I ask it to do some very specific information gathering.
So build the source corpus. I want publicly available
information, I want 10-Ks, I want job postings, I want SEC
(39:00):
FINRA guidance. And then I want 3 responses.
And I actually specify the word count output and I specify what
I want there, right? There's a strategy memo first,
there's a tech stack overview, and there's a regulatory
constraints piece. And so what's interesting
is, by using the word internal, I am suggesting to the model that
(39:23):
the model can craft these inside the chain of thought that it's
running behind the scenes without me having to see it.
And then Step 2, produce. Now I'm starting to ask for
output. I'm starting to ask the model to
come back with one document with an executive summary, a month by
month road map, a KPI per quarter, anticipated failure
(39:46):
modes and mitigations, and an advisor briefing.
And then I'm giving it styling that I want.
So I want it to be really brutally honest.
This is an example of not looking for confirmatory
thinking. I do not want tables, I want
just bullets if need be, and I would like
to get a sense of what shaped your recommendations, right, I
(40:07):
want to know where you got some of this thinking from.
And then I give it a limit at the top.
It can't be more than 7500 words.
So it ran, it thought about it, it was a 6-7 minute, basically a
7 minute run. So it chooses Datadog, which I
specified. It does a little bit of basic research on Datadog.
(40:27):
It builds the source corpus, so it gives you a sense of what's
in the box there. Market context, Datadog's edge.
It's starting to adopt the persona.
So it's saying where we lag and talking about sort of other
updates in the competitive space, getting into growth
goals. And what I love here is that it
(40:48):
actually called out, like, the statement by the CEO, by
Olivier, around what they're looking for and why.
And it's taking that into account the
way, frankly, a good road map builder should.
It's looking at client mix. And this is a situation where
it's done its own research to come up with that assessment and
(41:10):
given sort of a very rough assessment of that.
It's looking at the AI aspirations it can find from
each of the different C-Suite members.
It's looking at strategic gaps to close.
This is all just in preparation and hasn't even really started
the assignment yet. It's just kind of thinking it
through. It's now going into the current
(41:30):
stack in great detail, looking at vendor contracts, security
posture. You can see where we're going
here. Eventually it's actually going
to get to what it wants to say. And it's actually a very cogent
thesis. It talks about how you sort of
dominate the data exhaust space and what that means.
(41:51):
And then it starts to get into the road map piece.
But my point here, like we could go through this, but we don't
have time. The point is basically,
because I structure the prompt carefully, I got exactly what I
was looking for back. What if you're not trying to get
a research report, but you're trying to do, say, small
research? Find out the answer to some set
(42:12):
of questions, for example. I did one.
I don't know if I have it handy or not.
I will see if it's there. Maybe emerging trends in investing.
Yeah, I think it's this one. This is a much shorter report.
See that? That's the whole thing right
there like that. It feels short comparatively and
it was a very short ask. Please analyze this economic
(42:33):
report, and I'm really interested in, again, I'm trying to push it.
I want to understand emerging trends and I want to understand
areas not commonly discussed, right?
I'm looking for it to sort of push beyond, but it's not a very
long prompt per se. And then it just jumps right in.
It reads all 138 pages. It gives me a snapshot.
(42:53):
It talks about commodities and where they're at globally,
mispricing issues driven by LNG, and it basically goes through
themes it's seen. And then at the end, it nets it
out: this is the big picture assessment for the next 6
months. This is the macro assessment and
this is how you start to lean in.
(43:14):
And what's interesting is this is much more specific.
I ran the same prompt with O3 and O3 Pro and it was
interesting to see the relationship between the two
because O3 focused on this sort of bifurcation piece and O3
Pro had a slightly different perspective.
And I know we have like 8 minutes so we probably don't
have time to get into it, but I thought it was fascinating to
run a short prompt on both and see the differences.
(43:36):
We have a question from LinkedIn, and I was going to ask
something very similar. This is from Laura Finlayson, and
she says, with a prompt like this, which model will do the
best at retaining the prompt information for future use?
She built and refined her job application prompt in Gemini,
(44:01):
but it seems to forget some of the deliverables each time she
goes back with a new job description.
Claude allows her to save a project, but she doesn't love
its writing. And so we need to talk about
which model is the better model to use and how do you choose.
(44:23):
And we only have a few minutes left, but this, you know, we
could go on forever here. So what do we do?
There's two things going on there.
The first is memory. And ChatGPT really has a killer
feature edge with memory right now because they do have, it's
not perfect, but they have a memory feature that enables the
(44:43):
models to start to actually have a living context of information
about other chats you've had inside the same model surface,
right? So if it's in ChatGPT, it
doesn't matter if you're talking with O3 or 4o or whatever.
There's going to be a loose understanding of recent
conversations you've had along with some specific facts that
the model has remembered about you that you can actually audit
(45:05):
and check in the settings section.
That turns out to be very useful for problems like this where you
want it to do a repetitive task and you want it to have a sense
that it's done the task before. Even so, I still find I want to
be precise about each of the assets I need it to process if I
need that. And that is one of the reasons
(45:26):
why I do tend to favor long prompts that I will keep in a
Notion page or keep elsewhere that I can just copy and paste
in as needed because I don't want it to forget anything.
I don't want to go to that trouble of writing out that
prompt again. I just want it to remember every
single thing and do it again.
(45:46):
for you, that these were going to be flexible deep memory
models that just would remember that you did exactly like this
and never forget that step. We're just not there yet.
And so prompts are part of how we bridge that gap.
One of the problems that I have is I like to try prompts on
different models to see the results and compare the results.
(46:08):
I think it leads to a lot
becomes a a real burden and an obstacle because I've interacted
for 20 minutes with Model A and now I want to go to Model B and
I've got to start all over again.
(46:29):
There is a way to make that slightly less painful.
So what I like to do is if I want to transition, I like to
ask the model I've been chatting with:
Could you please give me a very detailed summary of our
conversation so far and make sure it's as clean and clear as
you possibly can make it? And then it will do that and it
(46:51):
will give me a great summary of the conversation.
I then pull that summary into a new model conversation I'm
starting and say, here's where we're at right now.
I would love to continue this conversation with you.
And this is what I'm looking for.
And it's still a little bit painful, but it's less
painful than it would be otherwise.
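That handoff trick can be scripted. Here is a minimal sketch using two vendors' standard Python clients; the model ids and placeholder conversation are assumptions for illustration.

```python
# Sketch of the handoff: ask model A for a detailed summary of the thread,
# then seed a fresh conversation with model B. Model ids are assumptions.
from openai import OpenAI
import anthropic

openai_client = OpenAI()
claude = anthropic.Anthropic()

history = [
    {"role": "user", "content": "...20 minutes of accumulated conversation..."},
]

# 1. Ask the current model for a clean, detailed summary of the thread so far.
summary = openai_client.chat.completions.create(
    model="gpt-4o",  # assumed model id
    messages=history + [{
        "role": "user",
        "content": "Please give me a very detailed, clean summary of our conversation so far.",
    }],
).choices[0].message.content

# 2. Start a fresh conversation elsewhere, seeded with that summary.
reply = claude.messages.create(
    model="claude-opus-4-20250514",  # assumed model id
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": f"Here's where we're at right now:\n{summary}\n"
                   "I'd love to continue this conversation with you.",
    }],
)
print(reply.content[0].text)
```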
Which model or which company do you gravitate towards the most?
(47:14):
You have access to everything. What do you use the most?
The memory feature is one of the most powerful product features I
can remember on ChatGPT because I find that the fact that it
remembers something about me drives a recursive behavioral
loop for me. I'm very aware of it, right?
(47:34):
Like I've worked in product for a long time, I know what
they're doing, but it still works because I find that having
a model that remembers me a bit is super, super helpful.
And so ChatGPT drives a lot of
O3 is a daily driver for me. I pick it up by default, but
that doesn't stop me from going other places.
Like when I am working on a complex piece of writing, I will
(47:58):
use Perplexity. I will use both of the new
Claude models, Opus 4 and Sonnet 4. I will sometimes go to Gemini
2.5 Pro. And so I almost look at those as
like additional pieces that I want to go to for specific
things. Sonnet 4 is great for writing.
I love it. Opus 4, I love the way it does
(48:18):
like very thoughtfully considered reading and research.
There's something qualitative about it that's very strong
there. And so even though I end up in
ChatGPT a lot, that doesn't stop me from reaching my fingers into
the rest of the ecosystem and grabbing what's useful.
I also really like ChatGPT in general.
(48:40):
They make it pretty easy. And that memory feature, you
know, you start with your prompt and you feel like it
has, so bizarre to say this, it has like this intuition of
what makes sense for you. Right.
And that sense of being recognized, I think is very
(49:02):
powerful from a product experience perspective and
people respond to it. Marketers talk about
personalization and usually personalization is well, we've
watched their shopping cart and in the past they've bought XYZ
product and so we'll recommend the next product that they'll
really, really like or the next movie or whatever.
(49:22):
But we're talking here about a level of subtlety with
personalization that's like light years beyond the typical
marketing personalization that we know of.
It really is and it's going to be super interesting in the next
6 months or 18 months to see how the product platform evolves
for ChatGPT as they build on this memory feature and they add
(49:45):
more, you know, new models, etcetera.
My sense is, especially as they lean into the partnership
with Shopify, they're going to lean more into commerce.
There are going to be opportunities for
personalization with commerce that we've never seen before.
But we'll just have to see how that evolves.
Should I pay the $200 a month for ChatGPT Pro?
(50:07):
I know you do, but should I? Is it worth it or should I just
pay the 20 bucks a month that I pay right now?
That depends on the kind of user that you are.
And so I have seen the article that came out, I think it was
this week that basically said ChatGPT has done such a
phenomenal job pushing value down the chain to the free tier.
(50:28):
Why would we pay at all? Because it's so impressive.
And I think for a lot of average daily use, that is the correct
assessment. For me, it's not even
just that I want it so I can test it and show it to people.
It's that I want to have no token limits and no usage limits
(50:49):
on the smartest models out there because I find myself doing
better with my own brain if I have the smartest thinking
partner possible. And so we've talked a lot around
the edges of what sort of thinking and intelligence means.
That's probably a conversation for another day, but from an
economics perspective, if I have a thinking partner like that and
(51:11):
I push the edges like that and it helps me make one or two
better decisions in a given month, the ROI is off the
charts. At $200 a month, it's very, very
easy to do that math. And so I think it depends on
what you're looking for it to achieve.
I want the best possible results from the thinking, because if I'm
spending time... Pay the 200.
(51:32):
You'll get O3 Pro, and it's qualitatively better in a way
that you will notice.
like the insights it has stick in my head and I'm like chewing
them over in a way that I haven't had with other models,
which is super interesting. Well,
I'll have to try it. Well, with that, we are out of
time. A huge thank you to Nate Jones.
(51:56):
Thank you so much for taking your time to be with us today.
It's so valuable for us when you're here.
It was such a delight. I enjoyed it.
A tremendous thank you for having me, Michael.
I'm glad we got to talk about prompting.
I felt like we could have gone on for hours because it's such
an interesting topic, but I think we really got to a lot of
cool stuff over the course of this 60 minutes together.
(52:16):
And thank you to everybody who watched.
Now, before you go, subscribe to the CXO Talk newsletter so you
can be part of our community. We have amazing shows coming up.
We have the chief technology officer of AMD coming up, and we
have all kinds of amazing people.
(52:36):
So go to cxotalk.com, subscribe to the newsletter, and we'll see
you again next time. Thanks so much everybody and
hope you have a great day.