Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Speaker 1 (00:00):
Okay, let's unpack this. What if you could take the
raw power behind tools like ChatGPT and mold it
to your exact needs? We're talking about going beyond just,
you know, chatting with an AI to actually building intelligent
applications and specialized assistants.
Speaker 2 (00:14):
Exactly. Today, our deep dive is all about the OpenAI
API, focusing on how it empowers, well, anyone really
to create custom AI solutions. Our main source for this
is Henry Habib's OpenAI API Cookbook, which Packt Publishing
put out in March twenty twenty-four.
Speaker 1 (00:32):
And Henry Habib, he knows his stuff, over a decade
in AI and productivity, right? And he's a big believer
in this citizen developer idea, that you don't need to
be, like, a hardcore coder to build amazing things. He's
also the guy behind the Intelligent Worker newsletter.
Speaker 2 (00:45):
That's right. And Sam McKay, the CEO of Enterprise DNA,
he actually calls the book an essential guide for knowledge
workers eager to harness the power of OpenAI and
ChatGPT to build intelligent applications and solutions. High praise.
Speaker 1 (00:58):
So your mission for this deep dive, listener, is simple:
get a shortcut to really understanding how to use the OpenAI API.
We're focusing on the practical stuff, those real aha moments.
You've heard of AI and ChatGPT. They're everywhere, constantly talked about.
But what's cool here is how actionable they are. We're
going to show you how to turn ideas into reality.
Let's start with the basics. Why the API matters. It's
(01:21):
more than just the chat box you see online. I mean,
ChatGPT's growth was just insane, wasn't it? One hundred
million users in two months. That's faster than TikTok, which
took what, nine months? It really brought natural language processing,
NLP, to the masses.
Speaker 2 (01:35):
Absolutely, and the API takes that democratization way further.
It's a genuine paradigm shift. It means anyone can generate
really human-like text from simple prompts. You don't need
a PhD in machine learning anymore. It's not just for
the big players like Typeface or Jasper AI building on
top of it. It's for you, integrating that power into
your own stuff.
Speaker 1 (01:53):
And the OpenAI Playground is kind of the perfect
place to start messing around. Yeah, like a sandbox. It's
got three main parts: the system message, the chat log,
and then the parameters. The system message is where you
tell the AI who it should be, like, you are
an assistant that creates marketing slogans. Simple as that, and it shapes
its whole persona, right?
Speaker 2 (02:13):
And it's fascinating because the model isn't understanding like we
do, no thoughts, no feelings. Think of it more like
super advanced autocomplete. It predicts the next word based on
patterns from tons of text data. So you put examples
in the chat log. Say you give it 'a company that makes
ice cream,' and then the reply, 'Sham! The ice cream
that never melts!' You're guiding those predictions. You're kind of
(02:34):
training it right there to follow patterns, like starting with
'Sham' and ending with an exclamation mark.
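For anyone following along in code, here is a minimal sketch of how that Playground setup, the system message plus an example pair in the chat log, could be expressed as a messages list for the chat API; the slogan wording and the final prompt are illustrative, not taken from the book:

```python
# Hypothetical messages list mirroring the Playground setup described above
messages = [
    # System message: sets the persona
    {"role": "system", "content": "You are an assistant that creates marketing slogans."},
    # Example pair in the chat log, guiding the pattern the model should follow
    {"role": "user", "content": "A company that makes ice cream"},
    {"role": "assistant", "content": "Sham! The ice cream that never melts!"},
    # The new request; the model will tend to complete it in the same style
    {"role": "user", "content": "A company that makes running shoes"},
]
```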
Speaker 1 (02:39):
That makes sense, guiding the probabilities. Okay, so once
you've got your prompts working well in the playground, you
move on to making real API requests, maybe using something
like Postman. And this is where it gets really powerful,
because you're not just watching it work, you're controlling it
with code, programmatically. And for an API request, there are
like four main things you need, right? First is the
(02:59):
endpoint. That's the URL, the address you're sending the request to,
like https://api.openai.com/v1/chat/completions.
Speaker 2 (03:08):
Exactly. Then there's the header. Think of this as containing important metadata.
It tells OpenAI what you're sending, usually Content-Type:
application/json, because JSON is just a standard way
for systems to swap structured data. And critically, it says
who you are, with Authorization: Bearer and your API key.
That's your secret handshake with OpenAI.
Speaker 1 (03:27):
Okay, So header is like the envelope details, and the
body is what's inside the envelope.
Speaker 2 (03:31):
Correct. The body is a JSON object. It holds the
specifics, like which model you want to use, and the
messages, that's your system message and chat log content. And
finally you get the response back from OpenAI. That's
also JSON, containing the AI's output, its choices, and usage
data like how many tokens you used.
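Putting those four pieces together, a rough sketch of the same request in Python, assuming the requests library; the model and prompt are placeholders:

```python
import requests

API_KEY = "sk-..."  # your secret key; keep it out of client-side code

# Header: metadata plus the "secret handshake"
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {API_KEY}",
}

# Body: the model plus the messages (system message and chat log)
body = {
    "model": "gpt-3.5-turbo",
    "messages": [
        {"role": "system", "content": "You are an assistant that creates marketing slogans."},
        {"role": "user", "content": "A company that makes ice cream"},
    ],
}

# Endpoint: the address the request goes to
response = requests.post("https://api.openai.com/v1/chat/completions",
                         headers=headers, json=body)

data = response.json()                           # the response is also JSON
print(data["choices"][0]["message"]["content"])  # the AI's output
print(data["usage"])                             # token usage
```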
Speaker 1 (03:48):
Cool. But okay, let's break out of just text. The
OpenAI API can do more than just words, can't it?
Multimodal stuff?
Speaker 2 (03:55):
Oh, absolutely. Beyond text, you've got image generation with DALL-E. The
newer versions, DALL-E 2 and 3, use this technique called diffusion.
You can kind of picture it like starting with TV
static and slowly clearing it up until an image appears.
It's pretty neat. But the key with images, unlike text maybe,
is you have to be super specific in your prompts.
Just saying 'a dog' gets you, well, a random dog,
(04:16):
but 'a brown, furry, medium-sized corgi dog on a
green grass field, profile view,' that gets you much closer
to what you actually want. It raises an interesting point.
Text generation can infer context sometimes, but image generation
needs precise descriptive language. Ambiguity is your enemy here.
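A minimal sketch of an image request like that, assuming the official openai Python SDK; the size setting is an assumption:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

result = client.images.generate(
    model="dall-e-3",
    prompt="A brown, furry, medium-sized corgi dog on a green grass field, profile view",
    n=1,                # DALL-E 3 generates one image per request
    size="1024x1024",   # assumed size
)
print(result.data[0].url)  # URL of the generated image
```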
Speaker 1 (04:32):
Good point, you need to be crystal clear. And it does
audio too? Transcription?
Speaker 2 (04:35):
Yeah, the audio endpoint uses the Whisper model for that.
It transcribes audio files.
Speaker 1 (04:40):
Ah, and technically for file uploads you need to use
form data instead of JSON in the request.
Speaker 2 (04:45):
Right, exactly. JSON is great for text data,
but form data is built for sending files, kind of
like attaching something to an email. It handles lots of
formats, .mp3, .mp4, .mpeg, .wav,
.webm, quite a few.
Speaker 1 (04:59):
So you could transcribe a meeting maybe easily.
Speaker 2 (05:02):
And the real magic starts when you chain these things together.
Imagine a voice assistant. Voice comes in, Whisper transcribes it,
the chat API figures out a response, maybe DALL-E even generates a
relevant image.
Speaker 1 (05:12):
Okay, that's starting to sound really powerful. Now, let's talk
about fine tuning the dials and knobs as you called
them in the book.
Speaker 2 (05:19):
The parameters, right. The parameters let you control the AI's behavior,
and the model parameter is probably the biggest one. Usually
you're choosing between GPT-3.5 and GPT-4.
GPT-3.5 has, what, one hundred and seventy-five
billion parameters. GPT-4 is estimated to be way larger,
maybe over one hundred trillion parameters across a bunch of
models working together. More parameters generally means the model is
(05:40):
better at capturing subtle patterns and understanding complex instructions. So
GPT-4 tends to be more reliable, better with nuance.
It actually scores higher on things like standardized tests, AP Calculus.
Speaker 1 (05:50):
The LSAT. Wow. And you can see that difference in
the outputs, can't you? Like that example in the book
asking for a sentence about Mars with six five-letter words.
GPT-3.5 messes up the word count, right? It
gives 'Our Mars trip felt vast, new, cold...'
Speaker 2 (06:04):
'Hard, grand.' Grand isn't five letters.
Speaker 1 (06:06):
Exactly If GPT four gets it, Mars Red World, Brave Crew,
Deep Space finds life. Perfect for the cigarette question how
many chemicals? How many harmful? How many cause cancer? Just
the numbers. GPT three point five gives you a paragraph.
GPT four just answers two hundred and fifty sixty concise
even logic puzzles. GPT four tends to reason more accurately
than three point five, and GPT four has a bigger
(06:28):
memory too. The context win much bigger.
Speaker 2 (06:30):
Like GPT-4 32K can handle around thirty-two
thousand tokens, maybe twenty-four thousand words. GPT-3.5
maxes out around four thousand tokens, about
three thousand words. Big difference if you're feeding it long documents.
Speaker 1 (06:43):
Okay, but there's a catch, isn't there? Cost?
Speaker 2 (06:45):
Huge catch. GPT-4 can be twenty to forty times
more expensive per token than GPT-3.5. It's significant.
So the practical advice for you is: always start with
GPT-3.5. If it does the job, great, you
save a lot of money. Only upgrade to GPT-4
if you absolutely need that extra reasoning power or the
larger context window.
Speaker 1 (07:04):
That's a massive cost difference. Why is it so much
more? Just the size?
Speaker 2 (07:08):
Primarily, Yeah, it's a much bigger, more complex model. Just
takes way more computing power to run each request. Think
supercomputer versus calculator.
Speaker 1 (07:15):
Gotcha. Okay, another parameter, n, that controls how many
answers you get back, right?
Speaker 2 (07:20):
n sets the number of responses. It can be any whole
number for chat, but max ten for images. Super useful
for brainstorming slogans, getting different options, or for checking consistency,
maybe A/B testing outputs.
Speaker 1 (07:29):
And the interesting thing you mentioned is the cost isn't
linear, like n equals three isn't three times the price?
Speaker 2 (07:34):
No, it's often much less, maybe sixty percent more, not
two hundred percent more, which tells you something cool. The
AI isn't just running the request three times separately. It's
likely batching the computation somehow, finding efficiencies. It's an optimization
hint. Clever.
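A quick sketch of asking for several candidates in one request, again assuming the openai SDK; the prompt is illustrative:

```python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Write a slogan for a coffee brand."}],
    n=3,  # three candidate completions from a single request
)

for i, choice in enumerate(response.choices, start=1):
    print(f"Option {i}: {choice.message.content}")
```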
Speaker 1 (07:49):
Okay, what about temperature? That one sounds a bit abstract.
Controls creativity.
Speaker 2 (07:54):
Yeah, temperature basically controls the randomness, or let's say creativity,
of the output. It goes from zero point zero to two
point zero. Think of it like tuning a radio. Low
temperature, maybe zero point zero to zero point eight, is like a sharp,
clear signal, very focused, consistent, factual responses. Good for things
like code generation and data analysis where you want deterministic
Speaker 1 (08:12):
output. And higher temperature, more static, more like an eclectic
mix station?
Speaker 2 (08:17):
Yeah, higher temps, say one point two to two point zero,
make the AI take more risks with word choices. It
flattens the probability curve for the next word, so you
get more diverse, unexpected sometimes more creative results. Great for brainstorming,
writing stories, generating slogans.
Speaker 1 (08:30):
So for general use like a chatbot, maybe somewhere in
the middle, zero point eight to one point two?
Speaker 2 (08:35):
Exactly. Balancing making sense with being interesting.
Speaker 1 (08:37):
So the advice is start around one point zero and
tweak it by like zero point two increments.
Speaker 2 (08:43):
That's a good practical approach. Yeah, see what works for
your specific need.
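To feel the difference, you could run one prompt at a few temperatures; a small sketch under the same SDK assumption:

```python
from openai import OpenAI

client = OpenAI()

for temperature in (0.2, 1.0, 1.8):  # focused, balanced, adventurous
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Suggest a name for a travel blog."}],
        temperature=temperature,
    )
    print(temperature, "->", response.choices[0].message.content)
```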
Speaker 1 (08:46):
Okay, makes sense. Now let's shift gears to building real applications.
Usually you don't just have your app talk directly to
OpenAI, right? There's often a back-end layer in between.
Speaker 2 (08:57):
That's right. The typical flow is: front end, what the
user sees, talks to your back end, and your back
end talks to the OpenAI API. This back-end layer
is crucial. First, security: it keeps your precious API key
safe, hidden from the user's browser. Second, control: you can
process the input before it goes to OpenAI or
(09:18):
clean up the output after it comes back. Plus it
lets you integrate other services, handle logins, all that stuff.
Speaker 1 (09:23):
And for that back end, serverless options like Google Cloud
Functions are pretty popular.
Speaker 2 (09:28):
Very popular, yeah, because you don't have to manage servers.
It just scales automatically. You write your code, upload it,
and Google handles the rest. You set up an HTTP
trigger so it could be called like a web address.
Allow unauthenticated calls maybe for testing, but be careful in
production and define your entry point function.
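A bare-bones sketch of such an entry point, assuming the functions-framework style used for Python Cloud Functions; the function name is arbitrary:

```python
import functions_framework


@functions_framework.http
def entry_point(request):
    # Google invokes this function for every HTTP request to the trigger URL
    payload = request.get_json(silent=True) or {}
    name = payload.get("name", "world")
    return {"message": f"Hello, {name}"}
```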
Speaker 1 (09:46):
And then for the front end, the user interface, you
can use no-code tools like Bubble, so anyone can
build the app part.
Speaker 2 (09:54):
Exactly. Bubble lets you visually design your web app and connect
buttons and inputs directly to your back-end cloud function.
It's incredibly empowering.
Speaker 1 (10:02):
Let's walk through an example, like that email reply app
from the book. You could do it in ChatGPT, sure,
but building it yourself really teaches you the whole process.
So you start in the playground testing prompts, get the
Python code, then you put that logic into a Google
Cloud Function, that's your back end. It takes the email
text as input, adds your API key secretly. You'd tell
it to use, say, GPT-4, maybe a higher temperature
(10:23):
like one point four for more creative replies, set n to three
to get three options, maybe cap the max tokens so the replies stay short.
Speaker 2 (10:28):
Right, and then you'd use Postman to test that cloud
function directly, make sure it actually returns three email replies
in the format you expect. Once that's working, you jump
into Bubble. You build the input box for the original email,
a button to generate replies, and maybe three text boxes to
display choice one, choice two, choice three. Use Bubble's API
Connector to link the button press to your cloud function
(10:49):
URL and display the returned choices. And really, understanding this
whole playground, cloud function, Postman, Bubble flow, that's the fundamental pattern.
Master this and you can build pretty much any intelligent app.
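Pulling those settings together, the back end might look roughly like this sketch; the system prompt, the token cap, and the name email_replies are illustrative assumptions, not the book's exact code:

```python
import functions_framework
from openai import OpenAI

client = OpenAI()  # the API key stays on the server, hidden from the browser


@functions_framework.http
def email_replies(request):
    email_text = (request.get_json(silent=True) or {}).get("email", "")

    response = client.chat.completions.create(
        model="gpt-4",
        temperature=1.4,   # leans creative, as discussed above
        n=3,               # three alternative replies for the user to pick from
        max_tokens=300,    # illustrative cap to keep each reply short
        messages=[
            {"role": "system", "content": "You write polite, concise replies to emails."},
            {"role": "user", "content": email_text},
        ],
    )

    # Bubble reads this JSON and fills choice one, two, and three
    return {"choices": [c.message.content for c in response.choices]}
```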
Speaker 1 (11:01):
That's a great point. It's the core loop. What's a
common sticking point when people first try this? Getting the
data flow right?
Speaker 2 (11:08):
Often, yeah. Getting the JSON right in the requests and responses,
making sure API keys are correct and secure, little syntax things.
Postman really helps debug that before you even touch the front end.
Speaker 1 (11:18):
Okay, so that's a solid foundation. But let's get to
something really cool, something you can't just do in the
standard ChatGPT interface easily. The multimodal travel itinerary app.
That sounds awesome.
Speaker 2 (11:30):
It really shows the power of orchestrating multiple API calls.
The idea: the user types a city, gets back a detailed one-day
plan, morning, afternoon, evening activities, and three AI-generated images
matching those activities.
Speaker 1 (11:43):
Wow, okay, how does that work behind the scenes in
the cloud function.
Speaker 2 (11:47):
So first, because this involves multiple calls, including image generation,
which can be slow, you need to increase the cloud
function's timeout limit maybe to three hundred seconds five minutes,
just to be safe.
Speaker 1 (11:57):
Good practical tip.
Speaker 2 (11:59):
Then call number one uses the chat API, GPT-4 specifically.
It takes the city name. Crucially, you give it a
detailed chat log with examples, what the book calls few-shot
prompting. You show examples for Rome, Lisbon, et cetera,
formatted exactly how you want: morning activity, afternoon activity,
evening activity. This forces GPT-4 to follow that
structure precisely. It stores the resulting itinerary text.
Speaker 1 (12:21):
Got it. So the structure comes from good prompting and examples.
How do the images get generated?
Speaker 2 (12:25):
That's call number two, also the chat API, but this time
using GPT-3.5 Turbo 1106.
Its only job is to take the itinerary text from
call one and create three short descriptive prompts suitable for DALL-E.
Like, if the itinerary mentioned the Colosseum, Vatican, and Trevi Fountain,
it might output 'Colosseum in Rome,' 'Vatican City interior,' 'Trevi
(12:48):
Fountain at night.' Just the prompts, separated by a pipe symbol.
Speaker 1 (12:51):
Ah. And you use GPT-3.5 here because
it's cheaper and the task is simple. It doesn't need
GPT-4's nuance.
Speaker 2 (12:58):
Exactly. The user never sees this intermediate prompt output, only the
final images, so 3.5 is perfectly adequate and
much more cost effective for this specific step.
Speaker 1 (13:06):
Smart resource use, nice optimization. Okay, so now you have
the itinerary text and three image prompts.
Speaker 2 (13:10):
Right. So call number three hits the images API using
DALL-E 3. Your code loops through the three prompts from
call two, making a separate API call for each one
to generate an image. It collects the URLs of the three
generated images. Finally, the cloud function bundles everything
up and returns a single JSON response containing the itinerary
text and the URLs for the morning image, afternoon image, and
(13:31):
evening image.
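Stripped of the few-shot examples, the three chained calls might be sketched like this; the system prompts, model strings, and the helper name build_itinerary are assumptions for illustration, not the book's exact code:

```python
from openai import OpenAI

client = OpenAI()


def build_itinerary(city: str) -> dict:
    # Call 1: GPT-4 writes the one-day plan (few-shot examples omitted here)
    itinerary = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "Create a one-day itinerary with Morning, Afternoon and Evening activities."},
            {"role": "user", "content": city},
        ],
    ).choices[0].message.content

    # Call 2: cheaper GPT-3.5 Turbo condenses it into three short, pipe-separated image prompts
    prompts_text = client.chat.completions.create(
        model="gpt-3.5-turbo-1106",
        messages=[
            {"role": "system", "content": "Return three short image prompts, one per activity, separated by |."},
            {"role": "user", "content": itinerary},
        ],
    ).choices[0].message.content
    image_prompts = [p.strip() for p in prompts_text.split("|")][:3]

    # Call 3: one DALL-E image per prompt, collecting the URLs
    image_urls = [
        client.images.generate(model="dall-e-3", prompt=p, n=1, size="1024x1024").data[0].url
        for p in image_prompts
    ]

    # Everything the front end needs, bundled into one JSON-friendly response
    return {"itinerary": itinerary, "image_urls": image_urls}
```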
Speaker 1 (13:31):
And then in Bubble you just connect those pieces: an input
for the city, a button, a big text area for the itinerary,
and three image elements. You map the JSON fields from
the cloud function response directly to those elements. That's really slick,
combining text and custom images on the fly like that.
Speaker 2 (13:47):
Very cool. Okay, let's switch tracks slightly. Building knowledge assistants,
this is huge. Standard ChatGPT is great, but its
knowledge is kind of frozen in time, right, and it
can sometimes just make stuff up, hallucinate. You can't easily
tell it to only use this specific document.
Speaker 1 (14:03):
Precisely. That's where building your own assistant comes in, using the
API combined with your specific trusted knowledge source. A basic
way to do this, covered in the book, is PDF analysis.
Your app takes a PDF link and a question. The
cloud function fetches the PDF, uses a library like PyPDF2
to scrape all the text out of it. Then
it stuffs that entire text into the prompt along with
the user's question, and sends it off to GPT-4.
(14:26):
So it just crams the whole PDF into the context window every single time?
Speaker 2 (14:26):
Yeah, it can be inefficient. It works, but yeah, limitations. It only
gets text, no images from the PDF. It struggles with
really huge documents, and the biggest issue is that context
window limit. If your PDF has more words than the
model can handle, like those three thousand words for GPT-3.5
(14:48):
or twenty-four thousand for GPT-4
32K, it just won't work properly.
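A rough sketch of that brute-force approach, assuming requests plus PyPDF2; the system prompt and function name are illustrative:

```python
import io

import requests
from PyPDF2 import PdfReader
from openai import OpenAI

client = OpenAI()


def answer_from_pdf(pdf_url: str, question: str) -> str:
    # Fetch the PDF and scrape out all of its text
    pdf_bytes = requests.get(pdf_url).content
    reader = PdfReader(io.BytesIO(pdf_bytes))
    full_text = "\n".join(page.extract_text() or "" for page in reader.pages)

    # Stuff the entire document into the prompt; fails if it exceeds the context window
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "Answer using only the document provided."},
            {"role": "user", "content": f"Document:\n{full_text}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content
```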
Speaker 1 (14:53):
Right. But there's a better way now, isn't there, with the
newer Assistants API?
Speaker 2 (14:58):
Oh yeah, the Assistants API, specifically with its built-in
knowledge retrieval tool, is a total game changer for this.
Speaker 1 (15:03):
What makes it so different?
Speaker 2 (15:05):
It's incredibly smart. When you upload your documents, PDFs, Word docs, etc.,
to an assistant with retrieval enabled, OpenAI automatically handles
the hard parts. It breaks the documents into manageable chunks,
creates embeddings for each chunk, those unique numerical fingerprints we
talked about, and stores them efficiently. Then when you ask
a question, it uses vector search to instantly find only
(15:26):
the most relevant chunks of text from your documents related
to your question.
Speaker 1 (15:30):
So it doesn't read the whole document every time, it
just finds the relevant paragraphs.
Speaker 2 (15:33):
Exactly, which means there's effectively no context window limit for
your knowledge base. You can upload massive files or hundreds
of documents and the assistant intelligently retrieves only the necessary
snippets to answer the question. Incredibly efficient.
Speaker 1 (15:49):
That sounds amazing. How do you set that up? Still start in the playground?
Speaker 2 (15:51):
Yep, the playground is great for
creating the assistant itself. You give it a name, US
Constitution Expert, instructions, answer questions based only on the provided
constitution document, and you choose a model like GPT-4 1106
Preview, which is good for this. Then the
crucial step: you toggle on the retrieval tool, and then
you upload your knowledge file, like a PDF of the
(16:13):
US Constitution. Once it's created, you grab the unique assistant ID.
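The same assistant can also be created in code; a sketch assuming the Assistants API as it existed around the book's release, when the tool was still named retrieval (newer SDK versions call it file_search and attach files differently):

```python
from openai import OpenAI

client = OpenAI()

# Upload the knowledge file for use by assistants
constitution_file = client.files.create(
    file=open("us_constitution.pdf", "rb"),
    purpose="assistants",
)

# Create the assistant with the retrieval tool toggled on
assistant = client.beta.assistants.create(
    name="US Constitution Expert",
    instructions="Answer questions based only on the provided constitution document.",
    model="gpt-4-1106-preview",
    tools=[{"type": "retrieval"}],
    file_ids=[constitution_file.id],
)

print(assistant.id)  # the unique assistant ID the cloud function will use
```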
Speaker 1 (16:17):
Okay, assistant created, knowledge uploaded. Then the cloud function code
uses this assistant ID.
Speaker 2 (16:22):
Correct. The Python code for your cloud function becomes a
bit different using the Assistants API. First, you create a thread.
Think of a thread as a single conversation session. Then you
add the user's question as a message to that thread. Next,
you tell the assistant to run on that thread, providing
the assistant ID and the thread ID. Now here's a
(16:43):
key detail for the book's code: you need to wait
a bit. The assistant needs time to process, search the
knowledge, and formulate the answer, so you might add a
time.sleep or similar pause. After the pause, you
retrieve the list of messages from the thread, and the
assistant's answer will be the newest message.
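That thread, message, run, pause, retrieve sequence might look roughly like this, again assuming the era's Assistants API; the assistant ID placeholder is illustrative and a status-polling loop stands in for a fixed time.sleep:

```python
import time

from openai import OpenAI

client = OpenAI()
ASSISTANT_ID = "asst_..."  # the ID grabbed from the Playground step above

# One conversation session
thread = client.beta.threads.create()

# Add the user's question to the thread
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="How many senators are there?",
)

# Ask the assistant to run on that thread
run = client.beta.threads.runs.create(thread_id=thread.id, assistant_id=ASSISTANT_ID)

# The assistant needs time to search the knowledge and write an answer
while run.status not in ("completed", "failed", "cancelled", "expired"):
    time.sleep(2)
    run = client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)

# The newest message in the thread is the assistant's answer
messages = client.beta.threads.messages.list(thread_id=thread.id)
print(messages.data[0].content[0].text.value)
```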
Speaker 1 (17:01):
Okay, that pause is important. And the Bubble front end
for this? Probably simpler.
Speaker 2 (17:07):
Much simpler for this use case, yeah. Just an input
box for the user's question, a button, and a text box
to display the answer returned by the cloud function.
Speaker 1 (17:14):
And the result is you can ask specific questions like
how many senators are there, or what's the age requirement
for a senator, and it pulls the answer directly from
that Constitution PDF you uploaded.
Speaker 2 (17:25):
Exactly. It grounds the AI in your specific source material. It's
incredibly powerful for legal teams, medical info, company knowledge bases,
educational tools, anywhere you need reliable answers from a defined
set of information.
Speaker 1 (17:39):
Wow, we've covered a lot, from just understanding the API
basics to playing in the playground, making direct calls, adding
images and audio, then building actual apps with back ends
and front ends, optimizing costs, and finally creating these powerful knowledge
assistants tied to specific documents. You've really gone from just
using ChatGPT to understanding how to build with its
(18:01):
underlying power. You're equipped now to actually create things.
Speaker 2 (18:05):
Yeah, and it brings to mind something Paul Siegel, a
tech entrepreneur, wrote in the foreword to Henry's book. He said, essentially,
I strongly encourage you to use this knowledge to create
your next successful app or business, or simply to enrich
your thinking about how to innovate. Dream on it, then
fashion your dreams into a reality with the tools you've
gained here. I think that sums it up nicely.
Speaker 1 (18:24):
Great final thoughts. So the message is clear: don't just
use AI, build with it. Go experiment, see what you
can create.