
July 18, 2024 47 mins

In this week’s episode of the Generative AI Meetup Podcast, hosts Mark and Shashank dive into the latest advancements in generative AI. They kick off with a detailed exploration of Mistral Nemo, a new AI model that has set a benchmark with its unprecedented 128K context window. The discussion then shifts to an intriguing development at Microsoft, where a specialized LLM is being designed for optimizing spreadsheet functions. Join us to understand how these innovations are shaping the future of AI and why they matter to developers and businesses alike. Whether you’re a seasoned AI enthusiast or just curious about the technology shaping our future, this episode is packed with insights you won’t want to miss.


Episode Transcript

Available transcripts are automatically generated. Complete accuracy is not guaranteed.
(00:00):
And hello everybody, welcome to the podcast.

(00:05):
So for those that are just tuning in, I'm Mark and I'm with my co-host Shashank.
We have a weekly podcast where we talk about everything generative AI and sometimes we kind of deviate from AI.
So we call it the Gen AI Meetup Podcast,

(00:29):
but in a certain sense, it's kind of like the general AI podcast because we just get to talk about whatever.
But we run a meetup event in Silicon Valley, just some general housekeeping before we get started.
There are some bigger events that we have coming up.

(00:49):
So Thursday, July 25th, 2024 at 6 p.m., we have a meetup in Palo Alto,
which is going to be an exciting one where we will be building intelligent AI agents.
So Shashank will put the details of that in the podcast description.

(01:09):
So if you're going to be in the area at the time, we'd love to see you there.
I think you'll be able to learn quite a bit about AI agents.
And if you want to know more about AI agents and kind of get primed beforehand,
Shashank and I, we talk a little bit about AI agents in the last podcast.
So feel free to listen to that one.

(01:30):
And then on August 20, hold on, let me get the date.
August 22nd, we're going to do another, larger event in Mountain View at the Databricks office.
They're going to be talking about their new programming/prompting framework.

(01:55):
So it's going to be a framework that can be used to build agents and/or program large language models,
called DSPy.
It's a pretty cool project.
It's all open source.
And they're going to have four people kind of talk about how they built it,
how they use it, and how we can use it to fine-tune models.

(02:17):
So I think it'll be a very exciting talk.
The last time we did an event at Databricks, it was absolutely awesome.
They talked about how they built their GPT-4 class model, which is all open source.
When we went there, I felt like when I left, I kind of had like the building blocks that I needed

(02:38):
to build my own GPT-4 class model, which was fantastic.
So I hope to see you all there.
But yeah, we do meetups once a week every Thursday.
But yeah, there's not a ton in the news.
We got like a few topics that we're going to discuss.
First one is Mistral.

(02:59):
Mistral AI.
They came out with a brand new model today, which is very exciting.
Shashank, do you maybe want to describe what the new model is?
Yeah, I was just taking a look at it today.
They call it Mistral Nemo.
It is the best state-of-the-art model of its size compared to all the other smaller models,

(03:25):
around 10 billion parameters or smaller.
The cool thing about this, apart from the fact that it aces the benchmarks in a variety of
metrics, is that it has a ridiculously long context window compared to all the other models
of its size.
So, for reference, all the open source models from Google, Facebook, Microsoft, they have

(03:50):
like an 8K, 8,000-token, context window, which is roughly 8,000 words or so.
This one has 128K context window, which is massive.
More than 10X, the size of the other competitors.

(04:11):
They haven't talked too much about how much training they did, how big the dataset was, or
how much money they spent trying to do this.
But they partnered with NVIDIA, so I would assume they have a lot of resources behind them
right now, at least for this particular model.
Like you were saying before we started the podcast, for a company this size, it's really impressive.

(04:37):
They're a tiny French startup with, I would say, dozens of employees, as opposed to like
OpenAI, Google, Microsoft, Facebook, which have large, large teams.
So, yeah, really cool, tiny model, state-of-the-art.

(04:59):
I think it is open source too, so you can probably download it and run it locally.
It is open source.
Yeah, I found it.
It is hosted on Hugging Face.
Awesome.
So, I tried to run it.
I mean, I didn't actually try that hard.
It's slightly bigger than the other ones.
Yeah, exactly.
I wasn't able to run it super easily on my computer, but also I spent like two minutes trying,

(05:23):
so I just didn't have a lot of time to play around with it.
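For anyone who wants to try it themselves, here's a minimal sketch of loading it with the Hugging Face transformers library. The exact model id is an assumption (check Mistral's page on Hugging Face), and a model of this size generally needs a capable GPU or a lot of RAM, plus the accelerate package for device_map.

```python
# Hedged sketch: load the model from Hugging Face and generate a short completion.
# The repo id below is an assumption; substitute whatever Mistral actually published.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-Nemo-Instruct-2407"  # assumed model id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Explain why a 128K context window is useful, in two sentences."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=80)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```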
Yeah, it's very exciting because I remember when we were talking about Llama 3 8B,
which is Meta's, like, slash Facebook's new small model, it was around ChatGPT

(05:46):
level quality.
So like GPT-3.5 quality, which, I think, when did they come out with that, maybe a year ago?
Something like that.
And this, so Llama 3 8B has about GPT-3.5 level quality, which is, you know, the ChatGPT that
took the world by storm.
This one, it seems like based off the benchmarks, it is slightly better or we can even say about

(06:11):
equivalent to Llama 3 8B.
So this Mistral model is probably around ChatGPT level quality, but the big difference
is the context window, because ChatGPT I think only had like a 4,000 token context window,
something like that.
It was either 2,000 or 4,000, I can't remember.
But this is 128,000, so I mean, like 30X, 60X, I mean, that's crazy.

(06:36):
It's like wild.
And the fact that you could probably run this on your computer, probably like a high end
MacBook Pro, something like that, you could probably run it on.
That's fantastic.
So that is really democratizing what these models can do.
So you won't need to hit a server, you won't need to pay for an API.

(06:57):
This is 100% free and I think it'll open up like a lot of interesting use cases because
it's bringing these large language models literally just for everybody, for the masses.
So I don't know, Shashank, we were kind of talking about this before on like large models versus

(07:21):
smaller models and kind of where their place is.
Like do you have like any ideas of where we would maybe see some like use cases for these
smaller models and like when we'd want to use it and like what types of use cases we'd
want to use it for over something like the state-of-the-art models.
Yeah, that is a good question because I think most of the time I default to using the best

(07:50):
models from either ChatGPT, or OpenAI I mean.
So that's GPT-4o, or Google's Gemini Advanced, or, if I'm going through Perplexity, I have access
to the Claude model.
So most of the time I default to using these large models but if I have some side project

(08:14):
that I'm trying to automate and do some research then I have some agents that run repetitively
trying to crunch some data, do some tedious tasks that don't really require that kind of
processing and that kind of intelligence.
So for those use cases I try to use the smaller models to both for the speed, the efficiency

(08:38):
and also cost, because you don't want to be spending a bunch of money if an agent runs wild.
If you leave it running for a couple hours, or for a whole day, and it racks
up a huge bill with OpenAI's larger models.
You don't want that.
But I think agents is definitely a popular use case for using smaller more targeted models

(09:07):
that are better at certain domains, and also to, like, fine-tune.
I think it's much easier to fine-tune these smaller models than larger models with less
training data, because they have less memory to kind of replace, and it's a lot easier to
tailor one of these smaller models for a specific use case.

(09:30):
And yeah, I mean you have a lot more freedom.
You're able to run it locally, control it more as opposed to these black boxes which are
these larger models from these larger companies.
You just don't have that much control and ownership and customization options.
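To make that trade-off concrete, here's a hypothetical sketch of routing agent steps between a small local model and a large API model. The model names and the ask() helper are placeholders, not real APIs, and the routing rule is deliberately naive.

```python
# Hypothetical sketch: send cheap, repetitive agent steps to a small model and
# reserve the expensive frontier model for hard, open-ended steps.
SMALL_MODEL = "local-8b-instruct"   # placeholder for a Llama-3-8B / Mistral-Nemo-class model
LARGE_MODEL = "frontier-api-model"  # placeholder for GPT-4o / Claude / Gemini via an API

def ask(model: str, prompt: str) -> str:
    # Stand-in for whatever client you actually use (OpenAI SDK, llama.cpp, etc.).
    return f"[{model}] would answer: {prompt[:40]}..."

def pick_model(task: str) -> str:
    """Very naive router: long or open-ended tasks go to the big model."""
    hard_keywords = ("plan", "design", "prove", "strategy")
    if len(task) > 2000 or any(word in task.lower() for word in hard_keywords):
        return LARGE_MODEL
    return SMALL_MODEL

def run_agent_step(task: str) -> str:
    return ask(pick_model(task), task)

print(run_agent_step("Extract the totals from this CSV row: 12,8,40"))
print(run_agent_step("Design a research plan for comparing tokenizers"))
```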
Yeah, yeah, that's very true.
Funny story, not even really related to AI, but talking about, like, just rising costs.

(09:57):
I remember I think I had just graduated from college so I'm dating myself now but I think
it was around 2016 maybe.
I was working on a side project and for some reason I think I had maybe like published some
API keys on GitHub or something.
It was probably that.
It was probably the publishing API keys on GitHub.

(10:19):
But did someone find it?
Somebody found it very quickly and spun up, I think it was, a bunch of Bitcoin miners,
and they just made, like, the highest-end EC2 instances that you can possibly
have.
So it just spun up a bunch of virtual servers; it was on Amazon, and I was like 22

(10:41):
or 23 at the time and I had no money, and I was looking at the bill and I was like,
what the heck, it was like $40,000.
Oh my god.
And I was like what is going on?
They just let you spend that much?
I don't know.
I have no idea.

(11:02):
Luckily I talked to the fine folks at Amazon and they refunded all of the money for it.
Which was good.
Were you charged that amount?
No.
It didn't, like, go to my credit card, but it's monthly billing, so I think
it's at the end of the month that you're supposed to pay.
So you saw a charge on your credit card for 40-something thousand?

(11:24):
No no no.
No, because I logged into my Amazon Web Services billing.
It was, I can't remember,
it was like thousands of dollars, and it kept on just going up and up, and it turns
out that like pro tip.
Amazon has lots of data centers all over the world, and in the AWS, like, Amazon Web Services portal,

(11:50):
you typically only interact with one data center, one region, at a time.
I hate that.
It's so complicated to figure out what resources you have.
So it's like, you know, you might be in Virginia, you might be in Ohio, you might be
in Japan, you might be in Seattle, whatever.
And I shut down all the servers for one data center, but then I didn't shut
them all down in all the other ones.

(12:11):
And I couldn't figure out why it was still like increasing.
So anyways, after a couple days, I cleaned it all up.
Amazon was like really great about it.
Amazon fantastic company.
Highly recommend them for all your data center needs.
They you know helped out like a poor college student.
Yeah.
So yeah I don't know that just reminding me of that.
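As a practical aside, here's a sketch of how you could catch this today: the boto3 snippet below (assuming AWS credentials are configured) loops over every region and prints the instances that are still running, since the console only shows one region at a time.

```python
# Hedged sketch: list running EC2 instances in every AWS region with boto3,
# so nothing keeps billing quietly in a region you never look at.
import boto3

regions = [
    r["RegionName"]
    for r in boto3.client("ec2", region_name="us-east-1").describe_regions()["Regions"]
]
for region in regions:
    ec2 = boto3.client("ec2", region_name=region)
    resp = ec2.describe_instances(
        Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
    )
    for reservation in resp["Reservations"]:
        for instance in reservation["Instances"]:
            print(region, instance["InstanceId"], instance["InstanceType"])
```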

(12:33):
So nothing to do with generative AI, but yeah, I remember doing something like that with
Amazon, not publishing my API keys, but having some instances running in some other geolocation,
and I forgot to turn them off.
And this was one of my previous startups before I joined Google and we shut down the startup
but the servers were still running.

(12:54):
And I was being charged a couple thousand dollars.
I think it was roughly 10k or less.
And I was like this is crazy.
I haven't done anything.
These resources are just running idly but it's it's racking up a massive bill.
And I was like can you guys you know do something about this?
Like I haven't used anything.
I'm not even sure what I'm paying for.

(13:14):
And they actually were really nice and waived that bill too.
But trying to bring this topic back to gen AI.
I think these larger models have gotten really good at finding these zero-day vulnerabilities
in your source code, or other people's source code, to try to prevent these things from happening.

(13:37):
So if you're working on a project on GitHub or elsewhere, even having an LLM, like, peer review
your code before you publish, submit, whatever, seems like a good idea to find issues before
they happen.
Yeah, yeah, that's right.
Although I don't know if it would necessarily catch your published API keys.

(14:00):
Now, just a public service announcement: never, ever, ever write any API keys in your code, ever.
Store them as environment variables.
But yeah, don't write them directly in your code.
I learned that the hard way, because sometimes you just accidentally publish it. And
make sure that however you're storing those environment variables, they're also not being uploaded to your

(14:22):
source code.
Because oftentimes they're like stored as a plain text file on your local environment and
you need to sometimes ignore that file from being included in your Git repository.
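Here's a minimal sketch of that advice in Python: read the key from the environment instead of the source file, and if you keep keys in a local .env file (for example with the python-dotenv package), make sure .env is listed in .gitignore. The variable name is just an example.

```python
# Sketch: keep secrets out of source code. The variable name is illustrative.
import os

api_key = os.environ.get("OPENAI_API_KEY")
if api_key is None:
    raise RuntimeError(
        "Set OPENAI_API_KEY as an environment variable (or in a git-ignored .env file), "
        "never hardcode it in the source."
    )
```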
Yes, yes, definitely good advice.
And actually now that I remember I don't know if it was 40k or was 15k.

(14:44):
Anyway, I remember it was over 10, I don't know, it felt like infinite money because I didn't
have that much.
So all I know is I didn't have enough money to pay it.
So thank you, thank you very much.
Anyways, speaking of AI stuff, Andrej Karpathy.

(15:04):
Yeah, so Andrej Karpathy, I think we talked in a previous episode about how he was leaving
OpenAI.
And we finally found out what he's going to be doing.
So he was doing so much stuff, he published some open-source courses where he was teaching
people how to build stuff.
But he's coming out with a new company called Eureka Labs.

(15:31):
So Eureka Labs is really exciting.
All we have is a landing page.
But if it's from Andrej Karpathy, I think you can know for a fact that it's
probably going to be pretty legit.
So it will be basically using AI to help teach you things.

(15:52):
So he said, imagine you're trying to learn physics and you have Richard Feynman helping you
every step of the way.
I would imagine like maybe you could have like Albert Einstein trying to teach you relativity
and sitting down with you.
But instead of, like, Albert Einstein, who isn't alive anymore, or Richard Feynman, who also died
a long time ago,

(16:13):
You could have an LLM trained on some of the greatest teachers and thinkers of our time,
which may just help level up the world's knowledge.
So that's really exciting.
We don't know much yet.
There isn't a lot of info, but incredibly exciting.
So we'll be keeping you updated.

(16:35):
So as we learn more, we'll tell you about it.
Yeah, it seems really cool.
He seems like the perfect person to approach this problem.
Not only is he an expert in AI, having spearheaded the efforts both at OpenAI back in the day and
also recently, and in between those two stints working at Tesla and furthering their AI

(16:58):
efforts.
Before that, he led a bunch of AI courses at Stanford.
I think he created the first deep learning class for computer vision models.
And his first prototype for this new company, Eureka Labs, is to build the world's best AI

(17:21):
course, LLM 101.
I feel like, with his background in teaching and AI, he's going to approach this in a really cool
way.
And this is incredibly exciting.
When I was younger, in school,
I was like, you know, learning a tedious subject with a boring teacher is just sucking the fun and

(17:45):
enthusiasm out of me.
And it really, you really need some enthusiasm from your teacher who is guiding you, who is
showing you this whole new world and building up your curiosity, building up your imagination
to see the beauty of a subject of math, of chemistry, whatever it is.

(18:11):
And yeah, like he mentioned, if you could have Feynman teach you physics, oh my god.
I remember watching the Feynman lectures and he's just so charismatic, so passionate about
what he's doing that he just rubs off on you.
Even if you don't know anything about physics, listening to his lectures is just magical.
Yeah, Richard Feynman is like a really cool guy.

(18:33):
I read this book, or, well, I didn't read it.
I am terrible at, like, actually physically reading a book, but I listened to the audiobook because
I think I was on a plane.
And you know, when you have it where your body is busy but your mind is free, where you're
cooking, you're walking, you're running or something like that, driving.

(18:55):
It's not the time where you could actually sit down and read a book.
I love audiobooks.
Yeah.
My attention span, I've lost some of it.
I think it's the whole world right now.
You don't have to justify it.
Yeah.
But anyways, audiobooks is fantastic.
So there was a book about Richard Feynman called, I think it was called Surely You're Joking,
Mr. Feynman!

(19:16):
Or whatever it's called.
And it was just like full of different anecdotes, little stories about Richard Feynman.
It seems like he was a master, like, safe cracker, where he would go and then try to, like,
just break into people's safes.
And it's a lot of fun.
I have a lock pick set here too.

(19:36):
Do you?
Yeah.
A lot of fun.
I didn't know that.
Have you ever been able to pick a lock before?
A bunch of locks, yeah.
Really?
Sometimes I got locked out, or some friends lost their keys, and I just went there with the
lock pick set and jiggled it open.
Man, it sounds like a really just useful life skill.
It is very useful.
Ethically, of course.

(19:57):
Sure.
Yeah.
We don't condone just like picking any random lock.
Not in a public podcast.
No.
No, that would not be good.
No, definitely not.
We don't do that.
A hundred percent not.
But yeah, I mean, it seems like a useful skill.
I mean, like, you do, like, accidentally lock your keys in your car.

(20:17):
I don't know, like, you lock that with like your gym lock or something like that.
Or even if you just forget the combination on your master lock.
Yeah.
Because then like, that's a hassle.
So anyways, like I would love to get a lock picking course by Richard Feynman.
I think that'd be pretty cool.
I don't know if he's the best teacher for lock picking, but maybe for something else.
I don't know.
Or like, he also worked on the Man, the Manhattan Project, right?

(20:42):
Yes.
Yeah, with like the first nuclear bomb.
And he also came up with like a bunch of physics stuff.
Yeah.
I'm ready.
I am so ready.
Okay.
All right.
Nice.

(21:03):
Next topic.
Let me just pull up the notes here.
Oh, yes.
Microsoft.
Microsoft is working on LLMs for spreadsheets.
Oh, that's cool.
Yeah.
Yes, useful.
So not like a huge news story, but I feel like it's little improvements like this, which

(21:27):
are just going to like really improve our quality of life.
Because right now, like I've tried to play around with making spreadsheet formulas with
an LLM.
So one thing that I was trying to do a couple months ago actually was during tax season.
I wanted to try to calculate like what I would expect my taxes to be.
So I said like, okay, based off of the marginal tax brackets and based off of my income,

(21:50):
like how much money am I expected to pay in taxes?
And I tried to make that spreadsheet with an LLM because I didn't want to do it by hand.
It couldn't figure it out.
It was really struggling to do that.
What did you give it?
Just the raw data?
Did you give it like the schema of the data and have it write some functions to transform

(22:13):
that data?
So, what I did when I was giving the LLM the data is, I took...
So, I know that we have listeners from all over the world, not just the United States.
In the United States, we have what is known as, I think it's called, a progressive
tax system.

(22:33):
So basically as you make more money, the next dollar is like taxed more.
So I think it's like the first like, I'm just gonna make up some numbers.
It's like the first zero to like $10,000 that you make are taxed zero.
You don't pay any tax on that at all.
And then like let's say the next $10,000 or from like if you make $10,000 to $20,000, that next

(23:01):
$10,000 might be taxed let's say 5%.
So if you made $20,000, you took home $20,000, but then let's
say you're only getting taxed 5%, and not on the whole $20,000.
Just 5% of that second $10,000.
So that means you keep the first 10,000.

(23:23):
That's yours free and clear.
That's zero percent.
And then the next 10,000, 5% of that is, what is that?
500 bucks or something?
Yeah, so 500 bucks.
So then you're in the 5% tax bracket, but then you're only paying like 500 bucks tax on the
$20,000 that you had.

(23:44):
So and then like it just steadily increases and then like once you make over, I forget it's
like 200,000 something, then like your tax, like everything over that is like 40 or 50%.
It's like, I think it's 37%.
Anyways, it's pretty high.
So what I did is I took those, those rates.
So like from zero to 10, 10 to 30, 30 to 70, 70 to 120, whatever.

(24:11):
And I took all of those tax brackets and I said like, hey, ChatGPT, why don't you make
me a spreadsheet that can take an arbitrary amount of income and then calculate what the tax
rate is.
What I wanted to do is I wanted to find out what the difference is between filing
married, or sorry, filing single versus filing jointly.

(24:34):
So whether it made any difference at all, it turns out that it didn't, it didn't make any
difference, but I was curious if it would make a difference.
So I tried to have it make that spreadsheet for me and it struggled, it struggled hard
to do it.
I haven't tried recently, maybe like with the new models, it'd be able to do that, but

(24:55):
I'm excited to see if this new LLM from Microsoft for specifically working in spreadsheets would
be able to do such a task.
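For reference, the marginal-bracket math being described is the kind of thing you'd want the spreadsheet (or the LLM) to produce. Here's a minimal Python sketch with made-up bracket numbers, not real IRS figures: each bracket only taxes the slice of income that falls inside it.

```python
# Minimal sketch of progressive (marginal) tax, with illustrative brackets only.
BRACKETS = [            # (upper bound of bracket, rate)
    (10_000, 0.00),
    (20_000, 0.05),
    (50_000, 0.12),
    (float("inf"), 0.30),
]

def tax_owed(income: float) -> float:
    owed, lower = 0.0, 0.0
    for upper, rate in BRACKETS:
        if income <= lower:
            break
        taxable = min(income, upper) - lower  # only the slice inside this bracket
        owed += taxable * rate
        lower = upper
    return owed

print(tax_owed(20_000))  # 0% on the first 10k, 5% on the next 10k -> 500.0
```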
Also what I wanted to be able to do is I wanted to be able to take like, so you know, I track
like my finances.
So like, you know, we could sometimes see like what type of transactions were made like

(25:17):
take the credit card statement, put it in a spreadsheet, and analyze it.
I wanted it to do all the analysis for me.
So I think that'd be like really cool.
So it could help you figure out, like make more informed financial decisions based off
of like all the data that you have.
Because right now, like a lot of that is just like a manual data wrangling because it's
all from a bunch of different places, a real pain to work with.

(25:40):
So if Microsoft can figure that out, that would be fantastic.
It would really improve the quality of life.
Yeah, that sounds incredible because I think most of the startups today are running on
spreadsheets, not any fancy databases, but just like regular old spreadsheets that are
being wrangled by PMs or even like the CEO of small companies.

(26:04):
I thought that Microsoft would be the one, or Microsoft would have done this way back
when, like last year, when they announced Copilot for their workspace suite of tools.
But I think they were just doing the bare minimum of plugging in ChatGPT and having it see
the entire spreadsheet, which is what you were doing, I assume.

(26:26):
And I don't think it can understand structured data, especially like numbers and correlations
between different tables, etc. in a meaningful way.
So looking at the blog post, it seems like they have like restructured the way they're

(26:46):
trying to build this LLM.
They are adopting a new tokenizer to try to encode this information that is in the spreadsheet
and multiple tables and columns, try to compress that information in some meaningful way that
the LLM can understand.

(27:08):
And they're also using, you know, in regular LLMs, we use chain-of-thought prompting.
Here they're using a chain-of-spreadsheet prompting. Maybe you can explain chain-of-thought prompting
and also a tokenizer.
I feel like lots of buzzwords, lots of jargon.

(27:29):
So yeah, first maybe we'll talk about what a token is.
Yeah, let's start with chain of thought.
I think it's a little simpler to understand.
So when solving a math problem, for example, you multiply two numbers.
How do you do that?
You don't just magically have the answer for, like, a multi-digit number, like a four-digit number

(27:53):
times a two-digit number, like 1,275 times 56.
It's like you're not just going to magically boom.
Here's the answer.
You break it down.
You're like, okay, take the six, multiply it by the last digit, carry over the extra number,
add it to the one to the left.
So reasoning about a problem in multiple steps and collecting all those intermediate results

(28:22):
and combining that with the next step and reasoning to come up with a cohesive answer
is like a chain of thought method.
So that usually results in a much more thoughtful answer than having one single prompt.
So I don't know, what would be a good example for a chain of thought reasoning step?

(28:46):
That's tough.
I don't know.
Okay, I'm looking around and there's, I have, like, this handle broken on my cutting board
or something.
Tell the LLM, okay, how would I fix this?
Think about your steps, assess the issue, write down the materials required to solve this

(29:11):
problem and give me like a comprehensive step by step.
And then they'll be like, okay, fine, let's look at the problem.
This is what's broken.
Okay, how do I fix it?
Maybe this is what the materials that I could use.
Thinking about this in multiple steps is chain of thought prompting.
So it sounds almost like the, like the Socratic method or something like that where they,

(29:34):
like just like formal logic.
It's like, okay, like if this is true, then this is true, that means this is true.
It's like, I think it's like the classic example is like, oh, if all dogs are brown and
I have a dog, therefore my dog is brown.
Not that like, that's actually correct, but like, you know, it would be like, you know,

(29:57):
if those, if those statements are true, then the Socratic method would lead you to the
correct answer.
Exactly.
And I think chain of thought prompting may employ, you know, strategies from formal logic
and Socratic methods, etc.
But it's just a general strategy to think in multiple steps before coming up with an
answer.
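To make the cutting-board example concrete, here's a sketch contrasting a direct prompt with a chain-of-thought style prompt. The ask_llm() helper is a placeholder for whichever model client you actually use.

```python
# Sketch: a direct prompt vs. a chain-of-thought prompt for the same question.
def ask_llm(prompt: str) -> str:
    # Placeholder; swap in a real client (OpenAI, a local model, etc.).
    return f"(model response to: {prompt[:50]}...)"

question = "The handle broke off my cutting board. How do I repair it?"

direct_prompt = question

chain_of_thought_prompt = (
    f"{question}\n"
    "Think step by step:\n"
    "1. Assess exactly what is broken and why.\n"
    "2. List the materials and tools required.\n"
    "3. Give a numbered repair procedure.\n"
    "4. End with a one-sentence summary of the fix."
)

print(ask_llm(direct_prompt))
print(ask_llm(chain_of_thought_prompt))
```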

(30:18):
Yeah.
Very, very cool.
And then you also brought up tokenizer.
Yeah.
So, I think this was an issue to address the fact that computers don't think in natural
language.
They think in zeros and ones, they look at numbers, they do mathematical operations, matrix

(30:40):
multiplications, linear algebra, etc.
And the researchers were trying to figure out how do we get these machines to understand
human language, real world images, video, etc.
So they came up with a way to compress, just sticking with natural language, compress
natural language into tokens that represent the language.

(31:05):
So they started, I forget the, you know, complete breakdown, but it is a way to decompose
words into smaller components that make sense by themselves.
And compress all of this information of natural language to fit it into some vector of

(31:32):
numbers that computers can do operations on.
Yeah.
I always kind of thought of a tokenizer as, like, just a program that can take some
training data and then basically break it up into meaningful chunks.
So it could be words, but it just like, like specific chunks that have meaning.

(31:58):
So like a specific word may not be enough because, there's also things like capitalization,
punctuation, like if I have a number, yeah, numbers, punctuation marks, proper names,
stuff like that.
Also, it would be like potentially like, what is it?

(32:18):
Like homonyms, like two words that, or is it homonyms?
No, because I was thinking of two words that sound the same but are spelled differently, but
what I mean is just, like, the same spelling, different meanings.
Yeah, that's what I meant.
So that's what I meant.
That would be same spelling, different meaning, homonyms.

(32:39):
Okay.
So I did get it right.
Anyway, so I think that, like, you know, it sounds like context is important.
So a tokenizer will try to figure out, oftentimes, what the meaningful chunks
are and how they relate to each other.
So anyways, now that we have chain of thought and tokenizer, just as a quick example.

(33:00):
So if we were to break down the phrase, "I love dancing," period.
So "I" would be a token.
"Love" is small enough.
It would be a token.
The period at the very end would probably be a token by itself.
"Dancing" would probably be broken up into "dance" and "ing", because those two can be shuffled

(33:25):
around and used in different contexts by themselves.
Because, like, "ing" is a very common present-continuous ending.
It's kind of like a suffix, a part of the word, that can be applied to a bunch of different
words.
Like "surfing."
Exactly.
So that by itself can be a token that can be identified in a bunch of different words.
And it still retains the same meaning.

(33:46):
It's like the present continuous form of some word.
And dance on the other hand is also, you know, self-contained.
It has its own meaning.
Yeah.
Yeah.
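If you want to see how a real tokenizer splits text, here's a sketch using OpenAI's open-source tiktoken library (pip install tiktoken). Note that the actual splits depend on the encoding, so "dancing" may or may not break exactly at "danc"/"ing" in practice.

```python
# Sketch: inspect how a production tokenizer splits "I love dancing."
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # the encoding used by many OpenAI models
token_ids = enc.encode("I love dancing.")
pieces = [enc.decode([token_id]) for token_id in token_ids]
print(token_ids)  # a short list of integer ids
print(pieces)     # the text chunk each id maps back to
```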
So anyways, with that being said, now that we talked about chain of thought, also fun fact,
the Mistral Nemo, with an amazing context window and great multilingual performance, also

(34:11):
built a new tokenizer.
I think that was partly the reason why they achieved such great performance.
Because I feel like we're reaching diminishing returns in terms of what these new companies
are able to do at a certain scale of model size.
And this model, you know, it is better than all the other models.
But it's just a few percentage points better at all these different benchmarks.

(34:33):
So to get significantly more performance, they're going to have to rethink their entire
approach from the ground up.
And that includes the tokenizer, not to mention everything else in the stack, including,
you know, what kind of GPU architecture you have and what model you're using, maybe making
modifications to the transformer architecture itself, and so on.

(34:56):
Yeah.
You know, that's a really good point.
So I was going to bring us back to the spreadsheet.
But I feel like this is a really good digression.
So let's go along this digression a little bit.
So I think that's, I remember we were having a conversation actually last week at the meetup
where we were talking about tokenizers.

(35:17):
And apparently the tokenizer is actually one of the hardest things to build in your LLM.
Yeah.
I think everyone is just using OpenAI's tokenizer.
Because it's easy and it's proven to work at scale.
Not everyone wants to go through the trial and error process of building a new tokenizer.

(35:39):
Yeah.
So it seems like tokenizers are almost like a combination of both art and science,
to try to figure out, you know, how you're going to break up the tokens and how
you're going to chunk the data.
So exciting.
I think it was, like, Andrej Karpathy who said that a tokenizer is one of the most important parts
of making, like, a state-of-the-art LLM.

(36:00):
And then also, it was like the first step.
Yeah, that's right.
And then the step before the first step would be getting the training data.
Well, sure.
I mean, there's lots of little steps.
I mean, like you also need a computer to run it on.
You need to be born in the first place.
Yeah, you know, it's like the, I think it's like the famous joke.
It's like, in order to create an apple pie,

(36:21):
you must first invent the universe.
Right, you know, exactly.
And Richard Feynman said that?
I don't know.
I feel like he's someone who would have said that.
It was Russell who said that?
I think it's Richard Feynman, yeah.
Well, now I was thinking our good mutual friend Russell, who said that, or Carl, Carl
Sagan, one of those people.
Yeah.
Somebody said that.
Anyways.

(36:42):
But I was listening to an interview a while ago, when GPT-4 first came
out; it was a Lex Fridman interview with Sam Altman.
And they said that there wasn't any, like, major architectural difference between, like,
ChatGPT and GPT-4, even though GPT-4 is way better.
It was just a lot of little improvements that created something significantly better.

(37:04):
So, potentially, things like a tokenizer would be like a little mini improvement
that you made that can make the model way better.
So exciting.
So, bringing it back to spreadsheets.
Spreadsheets.
Now that we've defined all our terms.
So yeah, what does it even mean to build a tokenizer that can understand, you know, tabular

(37:29):
data? That is interesting.
I feel like they probably need to retain some relation between the things in the same
column, you know, relation between the column headers maybe because you know, if you have
a bunch of numbers that are related to, you know, expenses versus a different set of numbers

(37:52):
related to income, you kind of want to maintain that relationship.
Yeah, yeah.
And when I read the blog post, apparently they had to strip out a lot of relevant data
in order to do this.
So one thing they mentioned is oftentimes like the color of a cell will be meaningful.

(38:12):
Like maybe like, oh, like this cell becomes like red because it's like bad or like this one
is green because it's good or maybe like we're like losing money.
We're winning money.
Whatever it is.
And apparently they stripped all of that color information out because it just became too
unwieldy to work with.

(38:33):
So I don't know what else they stripped out, but we'll see like actually how useful this
is.
And my bet is that if you have just, like, plain vanilla data that isn't too complex,
maybe it'll work well, but like I would assume as the complexity grows, it may just get
worse and worse dealing with it.
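As a rough illustration of the encoding problem (not Microsoft's actual SpreadsheetLLM method), here's a hedged sketch of the general idea: serialize a small table into compact text that preserves the headers and row/column relationships, while formatting like cell color simply gets dropped.

```python
# Hedged sketch: flatten a small spreadsheet into a compact, header-preserving
# text form that can be pasted into an LLM prompt. Cell colors and other
# formatting are simply lost in this kind of encoding.
import pandas as pd

df = pd.DataFrame({
    "Month": ["Jan", "Feb", "Mar"],
    "Income": [5200, 5100, 5300],
    "Expenses": [4100, 4700, 3900],
})

lines = [" | ".join(df.columns)]
lines += [" | ".join(str(value) for value in row) for row in df.itertuples(index=False)]
table_as_text = "\n".join(lines)

prompt = (
    "Here is a small spreadsheet:\n"
    f"{table_as_text}\n"
    "Which month had the highest savings (income minus expenses)?"
)
print(prompt)
```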
Yeah, I wonder how this compares to some of the LLMs that they've come out with to

(39:01):
help people write SQL queries more easily, because that's somewhat similar.
If you're trying to do some business intelligence tasks to get some insights about your customer
data or something, there are, I think Databricks actually has some tools to help you write

(39:22):
SQL queries with the LLM, with the help of the LLM, you tell it what you want to do,
what kind of insights you want to get.
And then boom, it gives you this query, you run it and it crunches the numbers and gives
you the data.
Yeah, you know, I don't know.
It seems to me like it's a little different because I feel like it's a little more complicated.
I think the Excel one is more complicated.

(39:43):
Yeah, because I think the thing with SQL queries is, you don't actually care that
much about the data, but maybe the schema.
You care a lot about the schema.
I mean, I think you'd care a little bit about the data.
No, I don't think you do.
Well, you'd probably care a little bit about it.
Like so, for example, if I have like a date, maybe I want like something like after like

(40:05):
a certain date.
So like I might care about that or like maybe I want something within like a certain range
of numbers.
So you care a little bit about the data, but I think that, like, the data isn't
in the context window at all.
Right.
Exactly.
Like the schema is sufficient because like if I have a date, then like I know that like,

(40:26):
okay, I can like pick a date like after this date before this date, maybe after this
date within this date range.
Or anything for, like, a string, like if I have a name, maybe the name starts with the letter
A, whatever it is, right?
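That's the point a quick sketch makes clear: for SQL generation, the prompt usually only needs the schema text, never the rows themselves. The generate_sql() helper and the table definition below are hypothetical placeholders.

```python
# Sketch: prompt an LLM for SQL using only the schema, not the data.
SCHEMA = """
CREATE TABLE transactions (
    id INTEGER PRIMARY KEY,
    name TEXT,
    amount REAL,
    made_on DATE
);
"""

def generate_sql(schema: str, request: str) -> str:
    # Placeholder for a real LLM call; note that it only ever sees the schema.
    prompt = f"Given this schema:\n{schema}\nWrite a SQL query that: {request}"
    return f"(model would answer the prompt: {prompt[:60]}...)"

print(generate_sql(SCHEMA, "totals spending per month for dates after 2024-01-01"))
```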
But when you have spreadsheets, like spreadsheets can do like so much more.
I mean like you can go and then embed like graphs into your spreadsheet, right?

(40:51):
Like I can put like images into there.
I can color it.
You can't color your SQL data.
I could add another value that says color.
Well, I mean, sure.
Like you can go and then have cells refer to other cells and do, like, a formula on that

(41:12):
cell, which like updates another cell, which updates another cell, you can't do that in
SQL.
I mean like you could write a program that like interacts with it, but like spreadsheets are
way more powerful than SQL, like and significantly more complex.
Also like the data that you can store is like huge.
I mean like people run like entire businesses on just spreadsheets.

(41:36):
I mean spreadsheets are like, I think spreadsheets like, I would say spreadsheets are a superset
of SQL databases.
Yeah.
They're probably heavily powered by these same databases that we're bashing on, but they
are a superset.
Yeah.
But like I think spreadsheets are fantastic.
Like fantastic.

(41:57):
It's amazing, right?
And oftentimes enough for most people as opposed to going to an actual database or like a programming
language, right?
Because like, you know, maybe like with like if I have like a Python program, I can make
something even more powerful than, like, a spreadsheet, but then I'd have to do all
this programming and formal logic and get a computer science degree and, like, think

(42:21):
about all this stuff.
It's just like, it's too much.
But with like a spreadsheet, you could take like an hour course or just like kind of, you
know, mess around with it and then like build some like really awesome useful tools.
Like I could track my spending with a spreadsheet.
I could do my taxes with a spreadsheet.
I could like, I don't know like a, like keep track of my health data.

(42:46):
There's so much stuff that you can go and do with just like a spreadsheet.
Anybody can do it.
The barrier to entry is really low.
I can store it as a file.
I can send it around.
I can upload to the cloud.
Like it's, it's fantastic.
Like spreadsheets, and specifically Microsoft Excel and Google Sheets, are,
I think, some of just the best tools that regular people can use to just

(43:11):
like, interact with their data.
Yeah.
I wonder how this would, this new Microsoft spreadsheet LLM, would perform compared to, let's
say, GPT-5, which is, you know, an order of magnitude better than today's state-of-the-art
LLMs.
Which will be out soon.
We don't know that.
You know, let's say the next version.

(43:33):
Sure.
Whatever that's called, whatever that's released.
Because that wouldn't, maybe I don't know, what do you think?
So the question is, would this surpass, like, the newest model?
Yeah.
I think it might, just because this seems like it's purpose built for this particular type

(43:58):
of data.
And I found that when I have, so like one thing that I was doing is I was trying to keep track
of all my investments.
So I put all of the investments into a spreadsheet and I said like, hey, ChatGPT, can you analyze
this?
And it said like, oh, you have...
And I was like, hey, like what do you think I should do?

(44:19):
With my investments? And it said, oh, I think you should sell your Nvidia stock.
And I was like, I don't own Nvidia stock.
So it's like, and it's like, oh, yeah, you're right.
You don't own Nvidia stock.
But like you should sell your TSMC.
I was like, I don't own TSMC.
It's like, I don't know what you're talking about here.
So like it gets confused.

(44:40):
So like I don't know if like scaling it up will solve fundamental issues.
But that seems like it's fundamentally broken.
So I think that scaling up the model won't make it like better.
It'll just maybe make it like a better idiot, not sure.
So I think that like you might need to have like some sort of like purpose built architecture

(45:08):
that will help work with spreadsheets directly.
Because however, GPT-4 is parsing the spreadsheets, it doesn't do so very well.
Yeah, according to this paper, they beat GPT-4's vanilla analysis by like 25% or so.

(45:29):
So it seems pretty significantly better.
But we'll see how the next generation of ChatGPT, you know, if it just swallows all the
other competitors who are trying to focus on these niche use cases.
Well, I mean, you would think that Microsoft may have some insight into what OpenAI is building.

(45:51):
So maybe they would think that like it's not going to be obsolete.
You would think, I mean, it seems like they have like some sort of financial interest in
OpenAI.
And I think even within the same company, especially large organizations, corporations, there's
competing efforts to try to solve the same problems.

(46:14):
Yeah, yeah, that's true.
I mean, why do it once when you can do it twice?
It's better to...
Anyways, that is also another strategy that we use to get the best answer out of LLMs,
where we have them just come up with an answer for the same question multiple times and pick

(46:35):
the best answer.
I mean, that does work.
Yeah, actually.
Much better performance.
Yeah.
Yeah.
Or even just give the same question to multiple LLMs and then have them kind of compare
with each other.
So like, hey, ChatGPT.
Mixture...
Not mixture of experts, but like a combination of experts.
So, ChatGPT, Claude Opus told me this.

(46:59):
Like, this seems like a good answer.
What do you have to say?
Can you give me a better answer than what Claude gave me?
Or find loopholes in this answer.
And then, yeah, that would work.
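Here's a hypothetical sketch of that cross-checking idea: ask one model, then hand its answer to a second model to critique and improve. The model names and the ask() helper are placeholders, not real client code.

```python
# Hypothetical sketch: ask one model, then have a second model critique and improve it.
def ask(model: str, prompt: str) -> str:
    # Placeholder; swap in real API clients for the models you actually use.
    return f"[{model}] answer to: {prompt[:40]}..."

def cross_check(question: str) -> str:
    first_answer = ask("claude-opus", question)
    critique_prompt = (
        f"Question: {question}\n"
        f"Another assistant answered:\n{first_answer}\n"
        "Find loopholes or mistakes in that answer, then give a better one if you can."
    )
    return ask("gpt-4o", critique_prompt)

print(cross_check("What's a sensible way to categorize a year of credit card transactions?"))
```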
So anyways, we are about out of time.
I had no idea we'd just be talking with so much love about spreadsheets.

(47:20):
I feel we probably lost half the audience, like after the first 10 minutes.
But if you're still with us, you're a true member, a true OG.
So we thank you for listening to us opine about spreadsheets.
Is opine the right word? I think so.
Anyways, exciting stuff.
So anyways, thanks so much for listening and we will catch you in the next one.