
March 1, 2024 • 65 mins

In this episode, we are joined by Chris Van Pelt, co-founder of Weights & Biases and Figure Eight/CrowdFlower. Chris has played a pivotal role in the development of MLOps platforms and has dedicated the last two decades to refining ML workflows and making machine learning more accessible.

Throughout the conversation, Chris provides valuable insights into the current state of the industry. He emphasizes the significance of Weights & Biases as a powerful developer tool, empowering ML engineers to navigate through the complexities of experimentation, data visualization, and model improvement. His candid reflections on the challenges in evaluating ML models and addressing the gap between AI hype and reality offer a profound understanding of the field's intricacies.

Drawing from his entrepreneurial experience co-founding two machine learning companies, Chris leaves us with lessons in resilience, innovation, and a deep appreciation for the human dimension within the tech landscape. As a Weights & Biases user of five years who has watched both the tool and the company grow, I found it a genuine honor to host Chris on the show.

References and Resources

https://wandb.ai/

https://www.youtube.com/c/WeightsBiases

https://x.com/weights_biases

https://www.linkedin.com/company/wandb/

https://twitter.com/vanpelt

Resources to learn more about Learning from Machine Learning


Episode Transcript

Available transcripts are automatically generated. Complete accuracy is not guaranteed.
(00:00):
These models are going to get better.

(00:02):
They're going to do more amazing things.
It's an exciting time for us to be in.
But as these models get generally better,
this problem of like, all right, well, when it fails,
knowing how it fails and doing everything we can
to like inform the user and protect against it
is going to become even bigger
because we're going to start trusting these things more.

(00:22):
How did the best machine learning practitioners
get involved in the field?
What challenges have they faced?
What has helped them flourish?
Let's ask them.
Welcome to Learning from Machine Learning.
I'm your host, Seth Levine.
Hello and welcome to Learning from Machine Learning.

(00:44):
On this episode, we have a very special guest,
Chris Van Pelt, the co-founder of Weights and Biases,
the co-founder of CrowdFlower and Figure Eight,
and somebody who's dedicated his career to optimizing
ML workflows and teaching ML practitioners,
making machine learning more accessible to all.
Chris, it is an absolute pleasure to have you on the show.

(01:06):
It's a pleasure to be here.
Why don't you start us off with what attracted you
to machine learning?
Yeah, this was quite a while ago.
But I remember all the way back in college,
studying computer science in the early 2000s,

(01:30):
talking about machine learning.
But in my college years, it wasn't something
that I immersed myself that deeply into.
It wasn't until a little later, early in my career,
I moved to the Bay Area in 2006 to work
at a startup called PowerSet.
And PowerSet was a startup that was really ahead of its time.

(01:53):
And that was where I first got immersed
in the world of machine learning.
So at PowerSet, we were oddly enough
doing a lot of natural language processing, which
is a hot topic these days.
But we were using a very different approach,
a rules-based, heuristic approach to language modeling.

(02:15):
And we had licensed technology from Xerox PARC
and brought on a lot of these very learned professionals,
PhDs in the field, tackling very hard problems
around language understanding and how
we could apply that to search and make a better search product.

(02:36):
It was at that company I also met Lucas Biewald, who I've now
founded two companies with.
And that is what really launched my career in AI and ML.
My co-founder, Lucas, actually studied machine learning
and had been working with models throughout college

(02:56):
and in his career.
And I'm the full-stack web developer
that landed in this hot and exciting space
that has had the blessing of being
able to create tools for who I consider some of the most
impactful and interesting engineers out there building
the next generation of products and solutions

(03:19):
on top of this stuff.
So exciting times for sure.
I'm glad I landed at that startup in the early 2000s.
Awesome.
How would you say that your background as a full-stack
engineer sort of prepared you for the machine learning world?
Well, I mean, I think the core thing, what I consider to be

(03:39):
like most important when you're an engineer building
products is thinking about the end user experience.
Like how is the world going to interact with this thing?
And I think the same is true and often a lot trickier
with machine learning models.
Like the second you introduce one of these models,
you suddenly have this thing that's
like right some percentage of the time.

(04:03):
And by design, it's going to be wrong.
So thinking about how end users are going to experience that
or ways in which you could potentially make the end user
experience better when they need to get involved and kind
of handle those cases where the model is wrong,

(04:24):
I think has hopefully made me a better engineer and developer
when it comes to actually bringing these machine learning
models into the real world.
Nice.
Yeah, as a machine learning practitioner,
I get to use my favorite quote like once a week,
all models are wrong, some are useful.

(04:47):
George Box, he was a statistician.
I don't know if he was necessarily talking
about machine learning, but it's still a fun one to get to say.
I'm glad I got to just say it also.
Moving forward a little bit to weights and biases,
which is just an absolutely incredible tool.
I've been using it for a better part of like five years
for every part of my machine learning life cycle

(05:11):
for my projects.
I use it for a bunch of personal projects.
I'm now using it in industry.
Why don't you tell us in your own words as a co-founder,
what is weights and biases?
Yeah, you bet.
So our mission at weights and biases
is to build the world's best developer tools for machine
learning engineers.
So we're really interested in building really good tools.

(05:36):
I've always been a fan of tools.
To have a good tool, to have the right tool for the job
in the real world is there's nothing better.
I don't do a lot of handy work, but going to Home Depot
and looking at the different actual tools
is quite exhilarating for me.
I've always enjoyed that.

(05:56):
So weights and biases is building tools
for machine learning engineers.
So the kinds of tools that a machine learning engineer needs,
it was pretty obvious in the early days.
And as we've grown, it's become more nuanced.
There's like little pockets of the problem space

(06:18):
that we're always kind of going, hey,
is there a better way to do this?
How can we create a better tool?
One cluster of problems is like, well, you
need to keep track of what data you're training on.
When you're modeling, the data is always king.
So we created a number of tools to just have
a solid understanding of data lineage, data versioning,
being able to dive in and visualize, understand the data.

(06:41):
And then there's a lot of experimentation
in machine learning.
So when you're training a machine learning model,
it's not just the source code.
As a traditional software developer, it's like,
all right, I've got GitHub.
I know what the truth is, and I have CI/CD running.
It's going to be the data, and it's
going to be some hyperparameters, some command line arguments
that you passed into the program that you're running.

(07:03):
And then ultimately, the weights and the biases
that you're creating when you've trained a model.
So Weights and Biases is an end-to-end MLOps platform
that helps engineers keep track of all of these things
and can then serve as a system of record for their day-to-day
development and understanding of how these models are
performing and how they can make them perform better.
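For readers who haven't used the tool, here is a minimal sketch of that kind of bookkeeping with the wandb Python client. The project name and metrics are hypothetical, and it assumes you have installed wandb and run `wandb login`; it is an illustration, not code from the episode.

```python
import math
import wandb

# Hypothetical project; the config captures hyperparameters alongside the run.
run = wandb.init(
    project="demo-classifier",
    config={"learning_rate": 1e-3, "batch_size": 32, "epochs": 5},
)

for epoch in range(run.config.epochs):
    # Stand-in for a real training loop: fabricate a decaying loss and rising accuracy.
    train_loss = math.exp(-0.5 * epoch)
    val_accuracy = 1.0 - 0.5 * train_loss
    wandb.log({"epoch": epoch, "train/loss": train_loss, "val/accuracy": val_accuracy})

# Each run records config, metrics, and environment, forming the system of record
# Chris describes; model files can additionally be versioned as artifacts.
run.finish()
```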

(07:25):
Very cool.
Yeah, the amazing thing for me is I've
gotten to see how weights and biases has expanded over time
and my usage of it also.
I started out using it really just to keep track of things,
keep track of experiment results.
I think seeing loss curves was very illuminating for me.
I guess I had seen it in fast AI,

(07:47):
but there was something about seeing multiple runs all
in one place, which was really nice.
Toying around with sweeps and creating reports, all of it.
Tables is incredible.
Haven't gotten into Weave yet, but I'm looking forward to it.
And ML prompts also, which is really nice.

(08:07):
And speaking of, yeah, nothing better than a good tool, right?
I mean, the right tool for the job.
It's amazing how seamless it can be.
And also, when you don't have the right tool,
how frustrating it can be when you're
trying to do something. Don't try to hammer in something
with the wrong tool. You get a hammer, right?

(08:30):
From your perspective, how have the goals of weights
and biases changed since the onset?
Yeah, in the beginning, right?
This is like 2016.
This is a time, I'm sure, much of your audience
could remember, or maybe they were in the space at this point.

(08:53):
But TensorFlow was really the main player
in the framework space.
PyTorch really wasn't a thing.
Computer vision was the use case that everyone
was talking about and excited about.
This was a time when self-driving cars was the primary topic

(09:14):
around AI or applying ML.
The core problem that we set out to solve in the early days
was just keeping track of your experiments.
So state of the art at that time for just keeping
track of your modeling effort was like a Google spreadsheet
or an Excel spreadsheet.
So that was a pretty low bar.

(09:37):
And we just set out to make a tool that
was really easy to keep track of the experiments
that you're doing.
Like originally in the very beginning,
we didn't think putting a whole bunch of charts
into the product was necessarily needed.
Like the main problem we were solving
was just keep track of the actual experiments
and maybe what the final loss value was

(09:58):
or the final accuracy value was.
And then as we added more rich visualization features,
we saw users love it, so we really doubled down there.
The ways in which we've expanded: we
finally convinced ourselves
we had product market fit, that we had created something that
was useful, when we got teams like OpenAI to actually use

(10:21):
the product for work that they were doing
or a research institute on a lot of the robotics
and autonomous vehicle work.
So then it became like, all right, well,
what other problems are there?
And this was literally just going out
talking to our customers or users
and hearing where their pain points were.

(10:41):
So the sweeps offering inside of Weights and Biases
where we make it really easy to run a hyperparameter search,
initially it wasn't obvious.
It was kind of like, well, there's
good tools on the market that do that.
We don't think we're going to magically come out
and have the greatest hyperparameter algorithm that's
going to save everybody money.
It was just like, let's just make it as easy and as pleasant

(11:04):
as possible to run a hyperparameter search.
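As a rough illustration of that "easy and pleasant" goal (not taken from the episode), a W&B sweep is typically a small config plus an agent that repeatedly calls a training function. The project name and toy objective below are invented.

```python
import wandb

def train():
    # The sweep controller injects the chosen hyperparameters into the run config.
    run = wandb.init()
    lr = run.config.learning_rate
    batch_size = run.config.batch_size
    # Toy stand-in for validation accuracy so the script runs end to end.
    val_accuracy = 1.0 / (1.0 + abs(lr - 0.01) * 100) + 0.001 * batch_size
    wandb.log({"val/accuracy": val_accuracy})

sweep_config = {
    "method": "bayes",  # also "grid" or "random"
    "metric": {"name": "val/accuracy", "goal": "maximize"},
    "parameters": {
        "learning_rate": {"distribution": "log_uniform_values", "min": 1e-4, "max": 1e-1},
        "batch_size": {"values": [16, 32, 64]},
    },
}

sweep_id = wandb.sweep(sweep_config, project="sweep-demo")  # hypothetical project
wandb.agent(sweep_id, function=train, count=10)             # run 10 trials locally
```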
And since then, features like our model registry. Reports
was an interesting set of features
that came out of the reality of like, all right, well,
everyone's report or the end result
is very different.
The kinds of things you want to understand and know about

(11:25):
when you're doing computer vision, very different
than if you're making a financial prediction model
or some arbitrary classifier.
So we built this very flexible platform
to actually communicate these graphs and charts and results
around to customers.
And the product continues to evolve,
I think most recently.

(11:46):
The move from what, when we started and for the past five
years, was always like, OK, you build a model.
Maybe you take ResNet or some existing base model,
but you're going to fine tune it and you're
going to do all of this stuff in-house.
Now it's often, we'll just call out to,
like, OpenAI's API or some other API,

(12:08):
and the kinds of problems and things you need to be concerned
about are different.
But they're similar in a lot of ways as well, right?
These are all machine systems that
have this probabilistic nature that are going to be wrong.
How do we evaluate and how do we try to make the user
experience as good as possible across it?

(12:30):
Yeah, absolutely.
Yeah, there's a certain flexibility
that's really nice with weights and biases
that you can use it for many different use cases.
Speaking of creating tools and sometimes you
have the intended use for tools, what's
a really unique use of weights and biases

(12:51):
that I guess when you were creating it,
you never really thought that it would be used for it?
Yeah, I mean, the weights and biases platform itself,
it's pretty versatile in terms of the core.
As you're building a product, you're like, all right,
well, what are the atoms of this thing?

(13:12):
And I remember we built this feature a few years ago
where we let people completely define their own visualization.
So we built it on top of Vega. There's
Altair, which is the Python framework that works with this
visualization framework.
Under the hood, it's all like D3, which

(13:33):
is a very cool technology.
But we wired up Vega such that users
could define their own custom visualizations.
And they could wire that up to any of the atoms
in the weights and biases API, these units of data.
And one of our engineers actually wired things up

(13:54):
and defined a custom visualization that was actually
like a role playing game, which I thought was awesome.
A complete misuse of both the core Vega spec
and the underlying data model, but a very cool demo,
nonetheless.
I think in the actual use cases of weights and biases,

(14:17):
I've been able to see some very cool use cases of machine
learning over the years.
One of my favorite examples is technology around agriculture.
So putting computer vision models
onto big tractors and combines and reducing
the amount of pesticides or chemicals

(14:38):
that need to be applied to control weeds in a field
has a massive impact on the environment.
It's really cool tech.
Like I went and saw one of the tractors,
and they have little NVIDIA boxes
like on the combine doing it.
And also, it's not the first place your mind goes,
where you're like, how could we use AI or ML

(14:59):
to make some impact in the world?
But yeah, the work we've done with John Deere and Blue
River around that has been really cool to see.
Very cool.
In terms of all of the things that
have been accomplished from weights and biases,
I'm sure that you guys have a nice roadmap ahead.
What are some of the things that you're most excited about

(15:20):
for the future for weights and biases?
Yeah, so I think the most exciting thing
is this next generation of tooling
for really the next generation of AI and ML engineers.
What's happened in the last year, year and a half
with the explosion of chat GPT, and now every data science

(15:42):
conference you go to is definitely
going to have the words like LLM or Gen AI
somewhere on a poster.
It's been just wild to see the whole industry
shift to this excitement around these large models.
The team is working on, all right,

(16:03):
well, what does a product look like where you're not
necessarily doing a lot of modeling in-house.
You're leveraging these tools, doing more prompt engineering,
doing more like the retrieval augmented generation space,
kind of hooking these tools together
and with agents and these more general purpose uses of LLM.

(16:27):
It's like, what would the world's best tooling
look like for that new world?
That's what the team's been working on over the last year.
And we're excited to finally release that in the next couple
of months here and continue to iterate on it.

(16:50):
As we found with our existing product,
it's like we make a swing, we try to make something as good
as we think it can be, and then through actually having people
use it and solve problems, we can iterate and make
it great and delightful.
So that's really the area we're focusing a lot on.

(17:10):
I think one of the big shifts there
is that from the start of the company,
we're selling a product to machine learning engineers.
These are people that understand the underlying math.
They understand probabilities and what
that means from an operational standpoint.
In this new world, we have a lot of just traditional software

(17:31):
developers that are now consuming these APIs
and building products on top of them.
So one of the challenges is, how do we
convey these core ideas to this new audience in a way that
enables them to build better products.

(17:53):
There's a lot that's new. It's tricky.
There's going to potentially be bias.
You need to really think carefully about, OK,
when this thing fails, how's it going to fail?
You want to fail in a way that's least disruptive
to the end user.
So being able to build tools for this space,
it's really exciting.
And there's hundreds of other companies doing the same thing.

(18:15):
So we've got a lot of work to do, and we need to do it quickly.
But it's an exciting space to be in.
Yeah, that's definitely one of the most challenging things
with machine learning versus, like, say,
traditional programming.
If it doesn't work for traditional programming,
you just get an error, right?
I mean, usually, most of the time,
unless it's something really weird.
But machine learning, you'll get an answer.

(18:37):
But it won't be right.
And with an API call, you will generate text,
you will generate some image, but will it
be useful for what you're actually trying to do?
And understanding the, I guess, the responsibility
that people have when they're creating things like that,

(18:59):
it's a real transition.
And it's tempting.
You can make a cool demo today.
Like, it's been so fun as an engineer having access
to this technology and to delight myself when I make something
and I'm like, whoa, it did that.
I can't believe it did it.
But that demo, where you then kind of script it
and you're showing your friends this cool thing you made,

(19:23):
it does not account for all of the weird edge cases
and things you haven't thought about in ways in which
another user is going to interact with this thing.
And if you just throw that out there,
you're not even going to know really if it's working or not.
The closest proxy you'll have is like,
are people sharing it and more people using it.
But even thinking about, all right, well,

(19:45):
how do I get user feedback?
I mean, this goes back to that first job I had here
in the valley, getting into machine learning.
When it's a search engine, how do you
know if your machine learning algorithm that returns
results is any good?
Well, a good proxy is like, are people clicking on the results?

(20:08):
But it's a subtle gnarly problem.
And you need to really think about it
and have rigorous ways to evaluate and understand
if you're getting better or worse, because you're
going to have to change the prompt.
You're going to upgrade the model.
You're going to change things about your product.
And you need a way to actually measure,
is this thing good or bad without just sending it out

(20:30):
to your users and making them kind of yell, hey, what the heck?
This sucks.
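One lightweight way to get that kind of measurement before users complain is a frozen set of prompts with simple programmatic checks, rerun whenever the prompt or the model version changes. This sketch uses made-up test cases and a stand-in model call; it is an assumption about workflow, not a tool discussed in the episode.

```python
from typing import Callable

# Frozen evaluation cases: (prompt, check on the output). Both are illustrative.
EVAL_CASES = [
    ("Summarize: the meeting moved to 3pm Friday.",
     lambda out: "3pm" in out and "Friday" in out),
    ("Extract the email address from: contact help@example.com for support.",
     lambda out: "help@example.com" in out),
]

def call_model(prompt: str) -> str:
    # Stand-in for whatever API or local model you actually call;
    # it echoes the prompt so this sketch runs without credentials.
    return prompt

def pass_rate(model: Callable[[str], str]) -> float:
    passed = sum(1 for prompt, check in EVAL_CASES if check(model(prompt)))
    return passed / len(EVAL_CASES)

# Compare this number across prompt templates or model versions before shipping.
print(f"pass rate: {pass_rate(call_model):.0%}")
```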
Yeah, for sure.
And speaking of the ability to create demos,
maybe I'm not sure if it's over said or anything,
but something I've been finding myself saying,
it's easy to create a demo.
It's hard to create something for production.
And it's even harder to create something at scale.

(20:50):
Something can work a dozen times.
But is it going to work a thousand times?
How's it going to work a million times?
How's it going to work when there's multiple users
at the same time?
How's it going to work on all of these edge cases?
And I think that what we're seeing
is that especially with this generative AI,
you can't even test all of these things.
You can't even fully check it for prompt injection,

(21:12):
let's say, because until it's out there
and people are starting to use it for these unintended uses,
that's when you start to see all these crazy things come out.
But it's already kind of too late,
because it's in production.
Someone is using it.
They have, it's already exposed.
It's already out.
We're seeing lots of things like that happen

(21:33):
where people are putting out generative chatbots
for their customer service.
And it's just like, that's a terrible idea
to do that fully, to just be fully relying on that.
And there's obviously other examples too.
But yeah, speaking of evaluation,
it's really hard.
How do you know if your product is working correctly?

(21:55):
So yeah, something like search, it's very difficult, right?
You might want, you could quickly get results,
but are they the right results?
Recommendation engines, right?
You can quickly get results,
but are they the right results?
I think evaluation will always remain a problem,
especially because I think people put too much weight

(22:18):
on benchmarks as well.
I don't know what your feeling is on that.
What do you think about that one?
The benchmarks are very generic, right?
So then, someone will make an announcement and say,
hey, we're better than GPT-4 in like MMLU or,
I'm not even sure if that's one of the correct acronyms
of the 30 core tests that people are throwing out there.

(22:42):
And there's not, those are important.
It's good to have some general set of benchmarks
for different things that we're testing,
but they're very general and they're never gonna tell you
how good is this thing gonna be for my specific use case.
You're the only one who can answer that question.

(23:03):
And it could be hard to answer it.
So like going back to the search engine ranking algorithm,
well, how do you do that?
Well, it turns out you hire a bunch of people
who are trained often with a big manual
on here's how we define relevance,
which is already a pretty fuzzy subject,

(23:25):
like how relevant is something to a given user's query.
And then you have them label the data.
You look at a whole bunch of queries and results
and you have them on a scale of like one to four
or one to five, say how relevant a given result is
for a query and even then you're like, okay, well,
you have to, the query could be ambiguous.
It's hard to understand what a user's intent is

(23:47):
when they query.
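Once you have those graded relevance labels, one standard way to score a ranker against them is NDCG. The labels and model scores below are fabricated, and this is just one possible metric, sketched with scikit-learn:

```python
import numpy as np
from sklearn.metrics import ndcg_score

# One row per query, one column per candidate result.
# Human judgments on a 0-4 relevance scale (4 = highly relevant); values are made up.
true_relevance = np.array([
    [4, 2, 1, 0],
    [3, 0, 3, 1],
])
# Scores the ranking model assigned to the same results.
model_scores = np.array([
    [0.9, 0.7, 0.2, 0.1],
    [0.4, 0.8, 0.3, 0.2],
])

# NDCG rewards putting highly relevant results near the top; 1.0 is a perfect ordering.
print("NDCG@4:", ndcg_score(true_relevance, model_scores, k=4))
```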
These problems are very similar in the chat space
or having a user ask for something.
And then when the ultimate result comes out,
you have to, you need some way to measure,
well, okay, did this satisfy the user's question?
We see it in chat GPT itself.

(24:07):
We can give a little thumbs up or thumbs down.
Most people probably don't interact in that way.
When a user does, it's a really strong signal.
Right, so you should probably incorporate that data
back into your process and use it to make the model better.
But yeah, I mean, the good news is companies

(24:29):
have been working on this problem for 20 years.
The bad news is every individual has like a slightly
different definition of good for whatever they're doing.
So there isn't just this magic, I can buy this product
and it's gonna like solve this problem for me.
What you need, you need like really good tools to help you

(24:51):
ask the question and solve the problem,
which is why we built weights and biases
and hope we can really help a lot of people
put this rigorous process in place to be able to build
a robust data science machine learning function.
Yeah, absolutely.
One of the things that weights and biases has helped me,
it's like you try to get this one metric, right?

(25:14):
Like you try to get like, oh, okay,
F1 score is above 0.8, right?
But it doesn't really matter that much.
Sometimes it's about how it's performing
on different segments of your data.
And I found that tables has helped me a lot.
I've been able to look at different probability distributions
for different classes and also just to see where there's,
to see where there's errors and to sort of segment the data

(25:37):
and then see, okay, in this particular type of conversation
that I'm analyzing, you know, this is what I wanna be looking
for, okay, I need to, these are like,
it helps me with error analysis basically
and to zoom in on those problems.
Because often, yeah, it's not just about one accuracy metric
or one particular thing, you have to sort of have this ability

(25:59):
to zoom in and zoom out.
And that's one thing like weights and biases
has really helped me with.
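A hedged sketch of that kind of per-example logging with `wandb.Table` follows; the project name, columns, and predictions are all invented for illustration.

```python
import wandb

run = wandb.init(project="error-analysis-demo")  # hypothetical project

table = wandb.Table(columns=["text", "label", "prediction", "confidence"])
examples = [  # fabricated model outputs for illustration
    ("please refund my last charge", "billing", "billing", 0.92),
    ("the app crashes on startup", "technical", "billing", 0.55),
    ("cancel my subscription", "cancellation", "cancellation", 0.88),
]
for text, label, prediction, confidence in examples:
    table.add_data(text, label, prediction, confidence)

# In the UI the table can be filtered (e.g. prediction != label) and grouped by class
# to slice errors by segment instead of relying on one aggregate metric.
run.log({"eval_samples": table})
run.finish()
```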
Yeah, I mean, this idea is like, you know,
a confusion matrix of like, I'm making a model to predict
like whether or not you have COVID.
Like if it, like false positives versus false negative,
it's like different for the use case.
Like I would, if I tell someone they have COVID

(26:22):
and they don't actually have it, probably not as bad
as me telling someone they don't have COVID
when they actually have it.
Right.
So how do you wanna optimize your model for these cases?
What can you do to really prevent that?
Like the case you don't want, right?
These broad like F1 score 80%, yeah, it means nothing.

(26:42):
How many times am I gonna be like lying to my user
about this thing that's really important?
Right.
Yeah.
It's whenever the cost of errors aren't equal
and it's always that case, right?
Cause the cost of errors are never the same.
So therefore the metric can't just be this overall metric

(27:04):
where you're treating true positives and false positives
or whatever, you know, true negatives and whatever.
All of your combinations in your confusion matrix,
each box matters differently
and you have to be able to somehow incorporate that.
And the only way you can really do that
is by, you know, segmenting it.
Especially when you're iterating, like, you know,

(27:24):
maybe I moved F1 score from 80% to 90%.
That's a no-brainer, of course, let's ship that model.
Well, wait, like look at those cases.
Did the cases get better or worse?
Cause maybe overall you got better,
but now you're like way worse on the false positive
or whatever case.
And that's really important to know.
Right.
Or you'll just get better at the majority class

(27:45):
and then you won't even ever detect the rare class
and you'll think, oh, okay, yeah, my model's better.
I know people just wanna know is this,
is model A better than model B,
but there's always some trade off.
It's never, very rarely do you ever get it
like across the board that one thing is better,
you know, categorically better than another model.
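To make that trade-off concrete, here is a small illustration (toy data, arbitrary cost values, not from the episode) of how a model can win on F1 yet lose once the confusion-matrix cells are weighted by how much each kind of error actually hurts:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score

y_true  = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])  # 1 = positive case
model_a = np.array([1, 1, 1, 0, 0, 0, 0, 0, 0, 0])  # one false negative
model_b = np.array([1, 1, 1, 1, 1, 1, 0, 0, 0, 0])  # two false positives

# Cost per cell, rows = truth, cols = prediction; a missed positive is 10x worse
# than a false alarm in this made-up scenario.
costs = np.array([
    [0, 1],   # true negative, false positive
    [10, 0],  # false negative, true positive
])

for name, preds in [("model A", model_a), ("model B", model_b)]:
    cm = confusion_matrix(y_true, preds, labels=[0, 1])
    print(name, "F1 =", round(f1_score(y_true, preds), 3),
          "| expected cost =", int((cm * costs).sum()))
# Model A scores higher on F1 but carries a much higher expected cost here.
```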

(28:07):
You know, I mean, like these models are gonna get better.
They're gonna do more amazing things.
It's an exciting time for us to be in.
But as these models get generally better,
this problem of like, all right, well, when it fails,
knowing how it fails and doing everything we can
to like inform the user and protect against it,
it's gonna become even bigger.

(28:28):
Cause we're gonna start trusting these things more.
Like I bet we'll never get rid of hallucination
because by definition of the way these things work,
there's some weird corner case or something weird
with the data that's gonna like be really bad.
It's very important to understand that
and do what we can to prevent users
from having a bad experience because of it.

(28:52):
Yeah, 100%.
Yeah, I know, I always find it so funny
like companies say we have eliminated hallucinations.
If a company has said that, then don't trust that company
because they don't know what they're talking about.
It's like eliminating bias.
It's like, no, you have not eliminated bias.
You can try to minimize it, but you cannot eliminate it.

(29:13):
And if you think that you have,
then you didn't really fully think through your problem.
Yeah.
So just like looking at this space and, you know,
obviously like the last year and a half
has been this hype cycle, right?
But you've been in this industry, you know,
since like 2007, were there any other like big

(29:36):
revolutionary like step function things like this
that really created such hype?
Have you ever seen something like this
like chatGPT has created?
Not to this level.
I mean, this is astronomical hype and it like continues.
I kind of thought like, all right, people will chill.

(29:56):
But there's still like every conference I go to,
every company I talk to, they're, you know,
deploying a lot of resources to figure out
how generative AI is gonna change how they function,
how the world functions.
So this is definitely unlike anything I've ever experienced.
The closest is maybe the, yeah,

(30:22):
the hype around autonomous vehicles.
Really like when we first started weights and biases,
it was clear that, okay, deep learning was really
starting to work.
Like the things that the demos I was seeing,
how good these models were getting at just taking in pixels
and spitting out like what everything in that image was

(30:43):
or putting bounding boxes around important objects was,
I remember seeing examples of it being like, wow,
I did not think we'd be able to do this
when we were able to do it.
Right.
And I think you saw, you know, a ton of money go into
a ton of different companies trying to make
self-driving cars and predictions of having a self-driving car

(31:05):
before, you know, well before we actually were able to have it.
But, you know, now I'm going around streets of San Francisco
and seeing the Waymo cars drive by without someone in them
or taking rides in them, which is trippy.
Like you've been in it.
It's here.
It's a little bit longer than any of us had hoped,

(31:26):
but it's here.
You've taken one?
I think, yeah, yeah, a couple of times.
It's cool.
Creepy?
It's very cool.
Yeah, definitely a little creepy.
And I've seen it's gotten into, I love, like, riding in Waymos
because you, like, see some situation and you'll be like,
I want to get like a bag of popcorn and be like,

(31:46):
what's it going to do here?
We've got like construction cones.
Homeless person doing something crazy.
Let's like see.
I've always been pleasantly surprised.
Right, yeah.
Yeah, I don't know.
It's creepy.
Are there steering wheels or there's no steering wheel?
Yeah, there's a steering wheel.

(32:07):
You can even sit in the driver's seat.
Apparently you have to keep your hands off of the,
I haven't done that.
Yeah.
I get in usually in the back seat or something
and I'll take like a video because I'm still, you know,
when you see the wheel turning and it's going.
Yeah, it's pretty cool.
I guess it works.
It needs to stay within a certain area though, right?

(32:28):
It can't go outside of a certain area.
Is that how it is?
It takes some weird routes.
Oh, okay.
Like it's definitely like its route planner
is not just like Google Maps.
Yeah.
But yeah, I don't know how they license it with the city
or if there's certain like no go zones.
But they also like the tech on those things is nuts.

(32:50):
That is not a cheap vehicle to operate
and there's lots of lidars and all these things
that Elon doesn't like.
But you know, it turns out it makes the problem
a lot more doable.
But yeah.
Take in whatever sensors you need to take in
to get that done.
You don't have to have it be some all knowing

(33:10):
omniscient sort of model.
It can take in multiple sensors. Yeah, that's cool.
I need to look into it even more.
I don't know if I would take it or not.
I guess eventually that'll become commonplace.
You do it enough, you'll be exposed to it.
You'll be, you'll stop taking, you know, stop taking videos.
Come on, you know, it's exciting man.

(33:31):
It's, you should take it.
Yeah.
I'll come to San Francisco.
I'll get you a ride in one.
I appreciate it.
I would take a ride with you in a driverless car.
I would do it.
Very cool.
So with all of this hype and everything
that's happening in, you know, let's say natural language

(33:53):
processing, but really just like the machine learning world,
how do you view the gap between the hype and the reality?
So like what the promise is of all of this stuff
and then like where we actually are?
Yeah.
Well, like I said, I'm surprised that the, like,
where we're still like peak hype from what I can see.

(34:16):
So, you know, we're going to reach,
we're going to hit the trough of disillusionment
at some point.
This is the, you know, the Gartner hype cycle.
I think, you know, a big gap, like this space moves so fast.
You know, Weights and Biases has been around five years.

(34:37):
The amount of change, you know, the transformer architecture,
for instance, like wasn't a thing until 2017.
And now that's basically the most popular architecture used
in everything from the self-driving cars
to these language models.
And, you know, I'm sure there'll be another architecture

(34:58):
or changes to this architecture that
proved to be even more fruitful.
So, the, yeah, well, I think the speed is jarring.
And then when you get these big enterprise companies
figuring out how to use this new thing, they're slow.

(35:19):
Like they're still, you know, very much being cautious
and figuring it out.
And, you know, we're just sitting,
we're waiting for the number of transistors
that NVIDIA can pack into their GPUs to go up, which it will.
And then these models will get better.
And I saw, there was like an interview with Sam Altman,

(35:43):
saying a lot of people think, like, oh, we'll get this,
like, AGI or even the couple weeks after chat GPT blew up,
everyone was like, oh, my god, this is going to, like,
change everything now.
It takes time.
It is the actual process of finding the killer use cases
for this and making it a core part of what you're doing.

(36:05):
It will take time.
I think, well, you look at, like, Y
Combinator and the startups coming out of that now,
like, the majority are somehow connected to this space.
What was the original question?

(36:26):
What are the challenges going to be?
Yeah.
No, the gap between the hype and the reality.
Yeah, I mean, I think this is self-serving.
One of the big gaps is just better tooling,
like, having visibility into how these things are performing
and actually operationalizing it.

(36:49):
You know, I think that's the thing that's happened is,
you can use like GPT-4.
It does these amazing things.
But it's slow, and it's expensive at scale.
So then people are, all right, well, yeah,
we'll take Llama 2 and fine-tune it, well, now you
need to have a robust like MLOps process and practice
to iterate on that model and understand its shortcomings
and prevent all of these safety-related issues.

(37:13):
So I think the gap now is that, yeah, there
aren't a lot of push-button-managed solutions
out there.
Often, the use cases of these things
are so specialized and unique that you kind of need
to build out some internal expertise.
And everyone's just kind of figuring that out now.
So I guess I'd expect all of this to get better.

(37:37):
But yeah, I guess I can't offer a win as soon as possible.
It's definitely what we're working on.
But it's clear this is not going anywhere.
And there's a ton of potential.
Like, I'm delighted by just like chat GPT on a daily basis
and thinking of ideas for how this could be applied

(37:58):
to different processes within organizations.
Yeah, 100%.
It's a really good brainstorm partner.
You could give it some ideas.
It could really, really helps out.
And you can have a nice little back and forth.
It generates very interesting ideas.
And then you were touching upon another interesting thing,
which was like the hardware that's involved with these systems.

(38:23):
And obviously, there's NVIDIA, which is a huge player.
And then Google has their TPUs.
And then there's this new thing like LPU.
It's very interesting to think that now there's hardware
that's going to be designed specifically for these use cases.
So yeah, it'll be interesting to see
can we get whoever, those companies,

(38:45):
get the latency down to a point where
you can actually make an API call, let's say.
I guess there'll still be some challenges there,
no matter what, as long as there's an API call involved.
But if you're doing it locally, you also
made another really good point.

(39:05):
I think people tend to, it's like a new idea,
like a minimum viable product.
They'll use chat GPT to get a really good version of something.
Then thinking, oh, then when we scale,
we'll substitute it for Mistral or Llama or some other model.
But it's not that simple.
It's not really as simple as a plug and play.

(39:31):
Yeah, so I guess along the same vein,
what's an important question that you believe
remains unanswered in machine learning?
We've been in the space long enough to see what's happening
here.
We played with GPT-2, we played with GPT-3.

(39:51):
We thought these were cool.
We were telling our friends and family about it
and having them try it.
Right.
It wasn't until the really instruction fine-tuned and
ChatGPT-style stuff came out where it was like,
whoa, this is really cool.
But also, the models had gotten better at that point.

(40:16):
So you just plot that stuff out on a graph.
Like the year a thing was made and how good it was.
Like the main limiting factor is the speed and cost
of the chips running these things.
And all indications are they get better

(40:37):
if we're able to throw more computing power at them.
So it's a waiting game.
We're just waiting, essentially, for Moore's law,
which happens to be an exponentially increasing
phenomenon for these models to get better.

(40:57):
So the question to me is, all right, well, when does that
just mean we get AGI?
I mean, this is a big question for open AI.
Can we just continue to scale this thing up?
And we have a model that's generally, however we
want to define generally more capable than humanity.

(41:21):
That's a big unanswered question for me.
It's something I think about a lot.
I think what's been really interesting in terms
of unanswered or what I think will probably
be some of the most interesting stuff
in the next couple of years.
Is all the multimodal work that's happening.
So Gemini released their million token context length,

(41:44):
which means now we can just throw videos in there.
And the stuff you can do with video is pretty cool.
Just in my own personal usage of chat GPT,
the image stuff has been amazing.
Like I can take a picture of something
I need transcribed or translated, or I
want it to count calories in my refrigerator.

(42:06):
Like it's very cool what you can do just by adding imagery.
And then if we throw audio and video, the use cases,
and then if we make it faster to get input and output
into that thing, the use cases are boundless.
So I think that's a long winded way of saying

(42:26):
the main problem here is just like more compute that's cheaper.
And this is why NVIDIA stock is going to the moon.
Through the roof.
Yeah, absolutely.
And it's like I saw it too.
It's like I knew it was going to happen.
Should have gotten deeper into that.
Anyway, speaking of AGI, I think everyone

(42:55):
has a different definition for it.
Like slightly, I think.
Do you feel like you have a good definition for AGI?
Or so?
No, I don't have a good definition.
Well, I want it to solve real science.
Like solve some hairy problems that our best scientists
can't solve.
Then it's like, all right.

(43:17):
Right.
It's achievement unlocked.
It can do it.
So that's like what?
Some unsolved math problems, some new protein thing.
Well, yeah.
People, they recently had a model like solve a proof
that none of us could solve.
So maybe it's here.
Yeah, but if you look into it, they had it do it like 1,000

(43:39):
times.
And then they had mathematicians review it.
And they found like, oh, OK.
A handful of times this actually worked.
I think that.
I don't know.
That's what I was reading about.
But yes, it's possible.
It's possible.
Now, I think it's really, yeah.
I mean, a lot of people way smarter than me.
I've spent a lot of time trying to define this.
So I'm not going to even attempt it.

(44:00):
But it's one of those things where you probably
know it when you see it.
I don't know.
I think it's going to be remarkable and scary.
But it seems like, I'll also say,

(44:21):
there's a long history of the machine learning
world kind of over-promising and under-delivering
when it comes to this stuff.
So I would not be surprised if it takes us longer
than the next generation of GPT here.
But I do think there's a reasonable likelihood

(44:44):
that in my lifetime, I get to see this, which is awesome.
Scary.
But I mean, like, wow.
Like, I managed to be put on this Earth during a time
when this evolved ape created this other thing that somehow
surpassed it.
But it's just a very special time to be alive

(45:09):
and to have the privilege to be a part of the space
and kind of see it happen is pretty remarkable.
Yeah, absolutely.
It's like the most exciting time to be in machine learning.
Changing gears a tiny bit.
So you've been involved in two successful machine learning

(45:31):
companies.
What does it take to sort of take part
in something like entrepreneurship in a field
like machine learning where there's so much uncertainty?
What are some of the lessons that you've learned?
Well, I think lesson number one, you
have to love what you're doing.

(45:54):
And specifically with a startup, it's like, well,
you need to love the people that you're selling software to,
the people you're solving problems for.
And for me, machine learning, the intelligence,

(46:14):
the thoughtfulness, the kinds of problems
that can be solved with it just made it something
that I could get very passionate about and put
a ton of energy into.
There's a lot of no one cares, especially in the beginning.
Like you're building this thing.
You think it's cool.
You care a lot.
You go out.
You share it with people.
And most people really do not care.

(46:40):
So you need to have grit to push through that,
to stay positive, to continue putting one foot in front
of the other every day.
I think others have given that advice just around persistence
and being able to keep trying.

(47:05):
But yeah, I guess for me, it's just like the main thing
is you can go to a conference with your users
and be energized.
That would be the main piece of advice.
Because if you don't have that, it's
going to be really hard to keep going when you haven't necessarily
found that product market fit or success in the space.

(47:29):
Right.
How did you know when you hit product market fit?
Is it a feeling?
Or was there something that clicked where you had it,
or was just about having a certain number of users,
certain value that users were getting?
I feel like that's something that's very hard.
Like a lot of startups struggle with understanding,

(47:50):
like have I reached product market fit?
Yeah.
Well, there's like first, just getting users.
So that's big.
But there's a lot of things you could do on the internet,
especially if you have millions of VC dollars that give you
a bunch of users that aren't necessarily

(48:12):
ones that will stick around or be all that valuable.
Right.
And Lucas and I have always approached entrepreneurship
like as a small business that really
needs to earn every dollar and just make it work.
So early on for us, it was those initial conversations

(48:37):
with your very first customers where you're going to go,
all right, we want to charge you for this software.
You've got to come up with a price.
It's kind of a harrowing process.
But then to see customers actually say, yes,
we want to pay you this.
This is valuable.
And seeing them continue to engage with the product.

(48:59):
And it was probably like after a year of having paying
customers and seeing that they actually renewed.
All right, well, there's clearly something here.
But even after getting those first couple of customers,
it's like, we spent a lot of time with them.
We held their hands a ton.
Is this scalable?
Are we going to be able to find broader market fit here?

(49:22):
There's a lot of doubt in those early days.
Right.
Yeah.
Yeah, so I guess it's not just about users.
If you're creating a software product that anyone can use.
Because you can do anything to get users, as anyone
that's seen Silicon Valley knows.

(49:44):
Have you watched Silicon Valley?
Mm-hmm.
Yeah.
But it's not just about getting users.
It's about retention.
And actually have them continue to use it.
And being able to continue to see how they're using it.
And yeah, pricing is always very tricky.
Because it can't just be like, however much they're willing

(50:06):
to pay, you actually have to equate that value to something.
So yeah, that must be very tricky.
Any other lessons from that?
Well, in the beginning, though, it is kind of an exercise of like,
how much do you want to pay?
Right.
I mean, you're trying to price this product that
has no precedent in the market.

(50:27):
Yeah, it's wild.
But it is.
You're kind of pulling numbers out of a hat.
Right.
I see the other piece on users.
Like an example, with both Weights and Biases and CrowdFlower/Figure Eight,
we engaged a lot with the academic community.
And you're not monetizing that community.

(50:50):
There's like no, like you might be
able to get a university to pay a little bit for the software.
But the amount of work and pain you're
going to have to go through to get that done is a lot and not
worth it.
And then you might be able to get a handful of the academics
to pay for the software.

(51:10):
But the dollars are going to be really small.
And they have pretty tight budgets
and don't generally want to pay for software.
But we always would invest in that community
because we knew that if you're doing this work in academia,

(51:31):
eventually you're going to get a job in industry.
And you'll want to use the tools that
help you do your best work in academia
and hopefully bring us along.
But the end goal of the business is always
to close those larger deals with the various enterprises.
So you've got to be really smart about how you do that.

(51:53):
And there is some tension between,
all right, let's give as much of this away for free,
while also being able to monetize for industry.
Right, because the value that you get from people using
your software, figuring out what breaks, what doesn't break,
what people are getting value from, that's invaluable.

(52:14):
But you also don't want to just, you can't just give it away
forever.
At some point, it's a business.
There's a certain bottom line that you have to start collecting
some sort of fee.
But it's very interesting.
You mentioned in the beginning you
were doing things that were more almost consultative.
So when you were small, you were doing things

(52:34):
that didn't necessarily scale.
But did you know at the time that that was the case
and that in hopes that one day you'd be able to reach a point
where it would?
I mean, in the beginning, it's just
like you're trying to get anyone who will engage
to continue engaging.
So that was priceless.
Like, yes, the founders will drive down

(52:56):
to Mountain View every week to meet with the team at Toyota.
That's invaluable.
Now, we can't keep doing that forever.
But it was right for us to do it.
And I think of it less as like consultative
is that's something as an entrepreneur you always

(53:17):
need to be really careful with.
Because you don't want to make a consulting company that's
building bespoke things for different people
where there isn't a central platform or service that
can have the benefits of scale across many, many, many
different customers.
So we were working very closely and addressing

(53:39):
specific problems that they were having,
but always stepping back and saying, like, hey,
is this generally useful?
Will this also be something that someone working
in this other space could benefit from when
deciding whether or not we actually productized it
and put it into the product?
At my previous company, CrowdFlower/Figure Eight,
that was helping customers generate labeled data sets

(54:02):
for their machine learning model efforts.
That would often turn into actual consulting,
which was really hard.
Like, we're using our own software
on behalf of the customer, or we're
going deep into their specific use
cases and helping them design.
And that makes for a very different business dynamic

(54:22):
than just selling a software license.
Yeah, I think that's because getting annotated data
is so much harder than people think it is.
Because it's not just like, oh, get good data.
Like what you were saying earlier, like, what does good mean?
Right, you need to create a set of annotation instructions.
You need to create the tooling around it.

(54:43):
And you actually have to know somehow
if you're collecting it.
And then often, this task, it won't even be objective.
There'll be some subjective nature to it,
and there'll be this low inter-annotator agreement.
So how do you even measure if you're getting good data?
So I'm sure there were so many challenges there.
But yet, such an important problem,

(55:04):
such an important thing to try to solve.
And way ahead of the game.
Like that was back in 2007, 2008.
I mean, thinking about the data-centric movement that's
taken place over the last few years,
like you knew that a long time ago.
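On the inter-annotator agreement point above, a common way to quantify it is Cohen's kappa. The labels below are invented, and this is just one standard choice of agreement statistic, sketched with scikit-learn:

```python
from sklearn.metrics import cohen_kappa_score

# Two annotators grading the same ten items on a 1-4 relevance scale (labels are made up).
annotator_1 = [4, 3, 3, 1, 2, 4, 1, 2, 3, 4]
annotator_2 = [4, 3, 2, 1, 2, 4, 2, 2, 3, 3]

# Quadratic weighting penalizes big disagreements (4 vs 1) more than near-misses (3 vs 2),
# which suits ordinal scales; values near 0 mean agreement is barely better than chance.
kappa = cohen_kappa_score(annotator_1, annotator_2, weights="quadratic")
print(f"weighted kappa: {kappa:.2f}")
```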

(55:24):
Yeah, we were definitely too early to market
with that first company.
But we learned a ton and got to work
with a ton of really impressive machine learning teams
over the years.
I wouldn't take it back.
That's good.
Yeah, I mean, you get to learn about some of the problems.
I see how well Scale has done, which started 10 years

(55:48):
after we started and think like, oh, if we had just
timed our go-to-market a little differently,
but now they're awesome.
They've executed it amazingly.
Yeah, the timing of things, there is a certain,
I never like using the term luck,
but there is a certain luck to timing,
especially for entrepreneurship.

(56:09):
You have to be excited in developing this thing
at the right time when other people are,
where some amount of people are ready for it at least.
You need to have some customer base.
I think that when it comes to creating
SaaS and a tech company, you have a team filled

(56:30):
with forward thinkers.
And that's not necessarily who the buyer is at a company.
It might not necessarily be the most forward thinker.
They might be a little bit more on the conservative side,
not willing to take certain risks.
And then you have to try to show them value,
which can be really tough.

(56:52):
Yeah, just thinking about some things
and the challenges of entrepreneurship.
But also, that's what makes it fun.
And then you combine it with machine learning.
It makes it even more fun.
There you go.
Yeah.
So in your career, well, first off,
you've had some of the best titles, I have to say.

(57:13):
Chief Awesome Officer at one point.
Just your name or your initials at another point.
Pretty cool.
Any other really cool ones?
Yeah, I think Chief Awesome Officer, I just put on LinkedIn
for fun.
Oh, OK.
But CAO, it's got a nice ring to it.

(57:34):
It does have a nice ring to it.
CVP, that's my personal favorite because it's my initials.
It could also be corporate vice president.
Yeah.
Yeah, titles are there.
It's the title.
I suppose I like having a C title, but it doesn't matter.
My title's co-founder, really, at the end of the day.

(57:57):
And that's one of the things I love most about the job
is that I'll get kind of brought into anything at any time
and can be really versatile and just try to solve problems
pragmatically.
Yeah, that's what I was going to say.
It's just about solving problems so they

(58:20):
get to bring you in to solve problems.
Fixer.
You're the fixer.
The closer, the fixer, both of them, I'll give you this one.
What's one piece of advice that you would give yourself
or you wish you received 20 years ago, 15 years ago?
All right, this is great.

(58:41):
Yeah, yeah, yeah, yeah.
Find a hobby.
OK.
I think this is something I had, like other friends had told me,
like, yeah, I should do this.
Especially as an entrepreneur, it's always just like,
oh, there's not a lot of time.
My hobby is this project.
And I've definitely found that there's

(59:04):
only so far that that goes before you're just kind of burned out
and now you're worse off than if you had just spent
your 10, 20 hours of free time last week doing something else
that you're interested in or excited about.

(59:25):
So any hobbies that you want to share?
Have something exciting, interesting on the side.
The sad part is I still don't have a great hobby.
So you wish that somebody gave you that advice, I guess.
Yeah, exactly.
That's like legit advice, yeah.

(59:45):
I mean, the regulars, I enjoy reading.
I enjoy long walks, traveling.
But I don't think any of those quite qualify as a hobby.
I'm thinking I should go to the clay studio
and throw some clay or go weld some metal together or something.
But yeah.
I was going to say something maybe in the art realm.

(01:00:08):
Yeah.
OK, the final and the juiciest of questions.
What has a career in machine learning and entrepreneurship
taught you about life?
Oh, man.
Well, I'd say the entrepreneurship part has taught me

(01:00:29):
that there's the business, there's this idea, the customer.
All of these things we think about when
we think of the kinds of problems you're
going to have to deal with within a company.
The thing that I never thought about that much,

(01:00:50):
but is actually what I found to be the most important,
is the people within the company that you're creating.
You're hiring a bunch of folks to work on a problem.
But each of those individuals is another person

(01:01:11):
with their own problems, own stuff going on.
And the only way the organization is going to be effective
is if the people within it feel respected and treated
as humans with dignity.
And there's not some magic formula.
But it's about what you do, such that everyone in your company

(01:01:33):
will now be seen as their full and true self.
But I think it's something important, especially
as an entrepreneur, as a leader in the company to think about
and to try to engage with as many people in the organization
as human beings as possible.
That's definitely a lesson.

(01:01:54):
I think the other piece that I've
learned in doing this over the years
is that I could still find that joy, that happiness of imagining
how to solve a problem and going out and solving it.
That being a creator, that's one of the aspects

(01:02:16):
of entrepreneurship that I love the most.
And it's been just amazing, even over the last six months,
to go and experiment with these new language models
and see what kind of side projects and tools
that I can create.
And I still have that same joy I had as a teenager when I made
my first website, and that's been awesome.

(01:02:41):
And just continuing to learn and to build.
That's awesome.
I love it.
I love it.
For people that are interested in learning more about you
or some of the work that you're doing at Weights and Biases,
where would you direct them?
Well, wandb.ai has information about the product itself

(01:03:04):
and the company.
There's also really cool links to different, what we call,
reports in the Weights and Biases platform, which
can be bits of research or analysis
or leveraging some of the new large language model stuff
we were talking about today.
That's really good content.
We have a YouTube channel, and we're on Twitter, LinkedIn.

(01:03:26):
Those are our primary social media outlets.
I'm on Twitter at VanPelt.
Ping me.
Hit me up.
It's been a pleasure.
Yes, it's been absolutely fantastic.
I really appreciate you giving me the time.
Thank you so much for the incredible work
that you're doing at Weights and Biases.

(01:03:47):
Thanks for letting me pick your brain for a little bit.
You bet.
This is fun.
Thanks for having me.
Thank you for tuning in to Learning from Machine Learning.
On this episode, we delved into the experiences
of Chris Van Pelt, co-founder of Weights and Biases,
gaining valuable insights into the current landscape
of the industry.

(01:04:08):
Chris explained the pivotal role of Weights and Biases
as a powerful developer tool, enabling ML engineers
to navigate through the complexities of experimentation,
data visualization, and model improvement.
His candid reflections on the challenges
in evaluating ML models and addressing the gap between AI
hype and reality offered a profound understanding

(01:04:30):
of the field's intricacies.
Drawing from his entrepreneurial experiences,
co-founding two machine learning companies,
Chris leaves us with lessons in resilience, innovation,
and a deep appreciation for the human dimension
within the tech landscape.
Don't forget to subscribe and share this episode
with your friends and colleagues.

(01:04:50):
Until next time, keep on learning.