
March 26, 2023 67 mins

This episode we welcome Sebastian Raschka, Lead AI Educator at Lightning and author of Machine Learning with PyTorch and Scikit-Learn, to discuss the best ways to learn machine learning, his open source work, how to use ChatGPT, AGI, responsible AI, and much more. Sebastian is a fountain of knowledge, and it was a pleasure to get his insights on this fast-moving industry. Learning from Machine Learning is a podcast that explores more than just algorithms and data: life lessons from the experts. Resources to learn more about Sebastian Raschka and his work:

https://sebastianraschka.com/

https://lightning.ai/

Machine Learning with PyTorch and Scikit-Learn

Machine Learning Q and AI

Resources to learn more about Learning from Machine Learning and the host: https://www.linkedin.com/company/learning-from-machine-learning

https://www.linkedin.com/in/sethplevine/

https://medium.com/@levine.seth.p

Twitter

References from Episode

https://scikit-learn.org/stable/

http://rasbt.github.io/mlxtend/

https://github.com/BioPandas/biopandas

Understanding and Coding the Self-Attention Mechanism of Large Language Models From Scratch

Andrew Ng - https://www.andrewng.org/

Andrej Karpathy - https://karpathy.ai/

Paige Bailey - https://github.com/dynamicwebpaige

Contents

01:15 - Career Background

05:18 - Industry vs. Academia

08:18 - First Project in ML

15:04 - Open Source Projects Involvement

20:00 - Machine Learning: Q&AI

24:18 - ChatGPT as Brainstorm Assistant

25:38 - Hype vs. Reality

27:55 - AGI

31:00 - Use Cases for Generative Models

34:01 - Should the Goal Be to Replicate Human Intelligence?

39:18 - Delegating Tasks Using LLMs

42:26 - ML Models Are Overconfident on Out-of-Distribution Data

44:54 - Responsible AI and ML

45:59 - Complexity of ML Systems

47:26 - Trend for ML Practitioners to move to AI Ethics

49:27 - What advice would you give to someone just starting out?

52:20 - Advice that you’ve received that has helped you

54:08 - Andrew Ng Advice

55:20 - Exercise of Implementing Algorithms from Scratch

59:00 - Who else has influenced you?

01:01:18 - Production and Real-World Applications - Don’t reinvent the wheel

01:03:00 - What has a career in ML taught you about life?

Episode Transcript

Available transcripts are automatically generated. Complete accuracy is not guaranteed.
(00:00):
How did the best machine learning practitioners get involved in the field?

(00:08):
What challenges have they faced?
What has helped them flourish?
Let's ask them.
Welcome to Learning from Machine Learning.
I'm your host, Seth Levine.
Welcome to Learning from Machine Learning.
In this episode, it's incredible to have Sebastian Raschka here, Lead AI Educator at Lightning,

(00:31):
former statistics professor at the University of Wisconsin, author of the book Python Machine
Learning and of Machine Learning with PyTorch and Scikit-Learn.
And overall, just an amazing force making AI and deep learning more accessible and teaching
people how to use AI and deep learning at scale.
Welcome.
Yeah, thank you for the kind invitation to be here.

(00:55):
Very exciting, Seth, to have me here on your podcast.
I think it's a relatively new podcast.
I'm especially honored to be one of the first people on this podcast.
So I hope we will have a lot of fun.
Hopefully a lot of stuff to talk about because we work both in machine learning and have
a lot of overlapping interests.
It's awesome to have you here to get things kicked off.

(01:19):
Do you want to give us a little bit of a career background, your journey?
How did you get to where you are today in the machine learning field?
Yeah, that's...
How far do you want me to go back?
So maybe starting with...
As far back as you want.
Yeah.
So I think how I basically started was during my undergrad, I got into statistics, R, Python

(01:44):
programming eventually.
And I've always been a tinkerer.
I must say I always liked the coding more than the math.
Nonetheless, I always was somewhere in between the two.
So I was never really, let's say, a software engineer, but I was also never really a mathematician,
if that makes sense.
So I was more like an applied researcher or scientist.

(02:07):
And yeah, my background is essentially during my PhD, I worked on computational biology
problems where it was usually centered around some prediction task.
Let's say virtual screening, where we were interested in finding small molecules that
inhibit some biological response, let's say related to diseases or other types of biological,

(02:28):
let's say, systems.
In the same way, we were also modeling protein structures and these types of things.
And yeah, we had to do a lot of coding, coming up with rules to classify things.
And there was this class when I was in grad school; it is more than 10 years ago now, I think.
It was called Statistical Pattern Recognition.
And my advisor back then, she recommended taking that course because, well, it was

(02:53):
something where you can maybe automate this prediction type of problem that we had instead
of hand coding things, going through things.
And I must say I wrote a lot of brute force for loops in Python to optimize things using
very simple optimization libraries.
And that was kind of like eye opening.
So that course was mostly focused on Bayesian methods, let's say Bayes optimal classifiers

(03:18):
and then Naive Bayes to make that more feasible and these types of things.
But that kind of introduced me, I would say, to machine learning, like the concept of...
I mean, it was more statistical learning, but the concept of learning from data, essentially.
And then I took another class, data mining.
And that was also the time where Andrew Ng's class was launched on Coursera, the machine
learning class.

(03:39):
And I got totally hooked.
It was, I mean, in two ways, revolutionary.
At first, working with data, letting computers learn from data automatically, that was super
fascinating.
At the same time, Coursera as an online learning platform was also super cool as a student.
Like, wow, I can do this at home.
I mean, I like going to classes in person, but this was just also very revolutionary

(04:02):
where you had everything at home.
You could take the class whenever you wanted.
And it was just addicting to take that class.
Andrew Ng was such a good teacher.
I got really hooked.
And yeah, from there, eventually, I joined the statistics department at UW-Madison in
2018, where I focused on machine learning and deep learning research.
And then in 2022, I joined Lightning AI.

(04:24):
I liked my time as an assistant professor, but things change in machine learning where
the problems become more challenging and bigger.
If you are, let's say, a small team, it's more challenging to keep up with, let's say,
technology and resources.
And like I mentioned before, I'm not, let's say, the typical mathematician type of person.
So I like computing.

(04:44):
So I was looking for an opportunity where, let's say, I have a team of people and infrastructure
to work on different types of problems.
And also, like how to extend my educational, let's say, passion from just in classroom
teaching to also maybe developing an online course, which is what I'm, for example, among
other things doing right now.
So yeah, I joined Lightning AI, long story short.

(05:07):
And yeah, since then, I've been really happily working there.
I like my time at UW-Madison as well.
But yeah, you can't do everything all at once, I guess.
So yeah.
Yeah, no, that's a great journey.
I too was captivated by Andrew Ng's Coursera class, as I think a lot of people in machine

(05:29):
learning.
So having the experience being a professor and now working in industry, how would you
compare working in academia compared to industry?
Yeah.
So I wouldn't say one is necessarily better than the other; they're just very different.
I think in academia, what I especially liked, I mean, was this academic thing in the

(05:57):
air where you have freedom to do whatever you want, and it's very exciting to be in
academia in that sense.
You get to design your own research projects.
But with that, there are also a lot of responsibilities.
So you have to write grants, you have to make sure you are becoming basically a manager,
you are managing your small lab, you have research students, you have to make sure that
your research students get paid, you have to then reapply for grants and these types

(06:21):
of things.
Which is, if things go well, it's very satisfying, but I must say as a person who likes doing
things, I would like to focus more on the research and let's say also the teaching,
rather than let's say writing grants and these types of things.
So in that sense, it's very different.
You have these responsibilities where you have to do a little bit here and a little

(06:43):
bit there.
So you're getting drawn into different directions, which I would say is not a bad thing.
It's just depending on your personality, whether you prefer that or just to focus on one thing
and doing one thing well.
I must say, I really like doing research, but one thing I didn't like was a bit, let's
say the reviewing system.

(07:03):
I think something everyone complains about peer reviewing.
There's a lot of work to do if you are a peer reviewer.
You get a lot of papers for conferences to review.
But then also as an author, it can sometimes be a little bit demotivating because reviewers
are, I would say, sometimes very critical and not always in a constructive way.
So sometimes you get these almost mean or hostile comments.

(07:27):
And this was something where I was like, I don't know if I want to do that for the rest
of my life.
Same with grant reviews, where sometimes, for no apparent reason, because someone
misunderstood your report, you get very mean responses.
And I was like, maybe let me focus more on the good things, building things, teaching

(07:48):
and less on these types of things.
In industry, I mean, there are, of course, other trade-offs.
But I would say what changed for me is that I basically get to focus more on certain things
without having to worry about other things.
I'm not like a manager, basically.
So I like to build things and I like also to teach people.

(08:11):
So I'm glad that I found something where I can focus more on that.
That's great.
Speaking of building things and tinkering, do you remember one of your first projects
in machine learning and what attracted you initially?
Yeah, my first project in machine learning.

(08:31):
I think besides that, I think one was maybe a fun one.
That was back then when I took this data mining class that I mentioned that was a side project
because we had to come up with a class project for that class.
And by the way, that is also something I took inspiration from, from that class.
I also always emphasized in my courses to include little class projects.

(08:55):
It's always something that students found very exciting.
And back then, so there were two things I was working on.
As a student, I was working on fantasy sports predictions.
Back then, I was a big soccer fan.
And there was a website where it was called, I forgot the website, but it was a daily fantasy
sports where you basically assembled a team of players and they got scores based on how

(09:22):
well they performed in the Premier League games on the weekend.
And so it was basically a constraint optimization problem where you had certain budget and you
wanted to basically maximize the score, predicting what the best players
were based on the budget, basically.
So you couldn't, and there were also other constraints like the formation.

(09:42):
You couldn't have 10 strikers.
You could only have, I think, maximum three strikers.
So it was very interesting.
And based on that, I built machine learning classifiers with scikit-learn, very simple
ones to basically predict what the promising players were.
And that was very interesting as an exercise because that's how I taught myself Pandas,
the data frame library.

(10:08):
And I tried to automate as much as possible.
So I was also trying to do some simple NLP, going through news articles, basically predicting
the sentiment and extracting names from players who are injured and these types of things.
It was very challenging, but it was a very good exercise to learn data processing and
implementing simple things.
So that was maybe one of my first projects, not related to my PhD at all.

(10:31):
It was more like a side project.
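
To make that kind of lineup problem concrete, here is a minimal sketch of the constrained selection step; the players, positions, prices, and predicted points are all made up, and a real system would pair a proper solver (for example, integer programming) with a trained model producing the point predictions:

```python
# Toy version of the fantasy lineup problem: maximize predicted points
# under a budget cap and a formation constraint (at most 3 strikers).
# All player data below is hypothetical.
from itertools import combinations

players = [
    # (name, position, price, predicted_points)
    ("A", "striker", 9.0, 8.1), ("B", "striker", 7.5, 6.9),
    ("C", "striker", 6.0, 5.2), ("D", "striker", 5.5, 5.0),
    ("E", "midfielder", 8.0, 7.4), ("F", "midfielder", 6.5, 6.1),
    ("G", "defender", 5.0, 4.3), ("H", "defender", 4.5, 4.0),
]
BUDGET, MAX_STRIKERS, SQUAD_SIZE = 34.0, 3, 5

best_team, best_points = None, -1.0
# Brute force is fine for a toy example; an ILP solver scales much better.
for team in combinations(players, SQUAD_SIZE):
    cost = sum(p[2] for p in team)
    strikers = sum(p[1] == "striker" for p in team)
    if cost <= BUDGET and strikers <= MAX_STRIKERS:
        points = sum(p[3] for p in team)
        if points > best_points:
            best_team, best_points = team, points

print([p[0] for p in best_team], round(best_points, 1))
```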
And also I built something called, I think it was called Music Mood.
I called it Music Mood, which was for this class project where it was about predicting
the mood of music in terms of, is this a positive, negative song?
And originally it was the Happy Rock Song project where we had also the genre.

(10:54):
So it was genre and the mood.
And yeah, I turned this into an open source project.
I think I shared, I built a simple website with Flask where people could enter the movie,
sorry, the music lyrics and then get a predicted label, whether it's positive or negative.
And yeah, that was a nice little project because it was also almost like an end-to-end project

(11:15):
where we had to collect our own data.
So it was with two other classmates.
We collected our own data, cleaned the data, built the classifiers and then built that
website also on top of that.
So it was kind of like a pretty comprehensive project.
The machine learning was pretty simple with scikit-learn.
I think we used the random forest classifier, but yeah, a lot of fun, a good exercise, I

(11:36):
think.
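
For readers who want a feel for what a project like Music Mood might look like, here is a hedged, minimal sketch along those lines: bag-of-words features plus a random forest, with made-up lyric snippets standing in for the real scraped dataset:

```python
# Minimal lyrics mood classifier sketch (the data here is invented).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline

lyrics = [
    "sunshine dancing all night long",
    "tears falling in the cold rain",
    "we celebrate and sing together",
    "lonely heart broken and blue",
]
labels = ["positive", "negative", "positive", "negative"]

model = make_pipeline(CountVectorizer(), RandomForestClassifier(random_state=0))
model.fit(lyrics, labels)
print(model.predict(["happy days are here again"]))
```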
Yeah, that's awesome.
I think the best way to get involved is just to find something that you're interested in,
create a project, find some data.
You learn a lot of the skills doing it that way, solving problems that you're interested
in.
If I may ask you before we go to...

(11:57):
Sorry.
If I may ask, what was your first machine learning project, if you can remember, like
on the spot?
It's a really good question.
Well, one of the first ones that I worked on was a...
Basically it was a computer vision project where we wanted to use face recognition or

(12:19):
face detection actually to control a media player.
So if you looked away, the media player would stop.
If you looked at it, then the media player would play.
And then we started to get into different hand recognition.
So if you put your hand up like this, then it would stop.
Like doing it like that, it would raise the volume.

(12:42):
So it was really interesting.
I got to learn about all of the different algorithms that are used to do face detection.
And I learned so much about computer vision.
For me, the amazing part of that was just...
I've always had a really strong background in math.

(13:02):
So being able to take images and converting them into numbers was just kind of mind boggling.
And then you can do a lot of things with them.
But now that you mentioned that, where I think this type of system still lives is if you
use, for example, an iPhone.
And I think they encode or they hide the text messages until you look at them for privacy

(13:23):
reasons.
So I think they're only visible when you look at them.
It's kind of reminded me of your system, basically, where it's basically all the time detecting
whether your face is pointing towards the camera if you're looking.
And I think the next level is if it's you who's looking into the camera versus someone else,
basically.
Very interesting, yeah.
Right.

(13:43):
Yeah, it was cool.
It was also really interesting to see when it worked and when it didn't work.
We trained it on perfect conditions, right?
The lighting was perfect and things like that.
And then as soon as things got dimmer, it was much harder to detect faces, obviously.
There were different types of people.

(14:07):
So yeah, we ended up creating our own training data set and it ended up being a lot of fun.
I think that's the best way to get involved.
Yeah, just to find something that you're really interested in.
We didn't need to do the recognition of our fingers and hands for that project, but we

(14:30):
were just so interested in it that we decided to take it one step further.
I find that to be the most rewarding when you're doing it, not just for a class or for
a grade.
You actually are very interested in the project that you're working on.
Super cool.
Yeah.
Yeah.
You mentioned sports and fantasy sports.

(14:53):
That's something that I've been very interested in in the past.
And then music also is one of my interests too.
So it's awesome to hear that you worked on projects in those areas.
Speaking of those sorts of projects, are there any other open source projects that you've
been a contributor for?
I would say back then I was using a lot of scikit-learn and I also contributed a lot

(15:17):
to scikit-learn.
In the recent years, maybe not as much because I got busier with other things.
But yeah, back then we had the Ensemble voting classifier, the feature selection, the sequential
feature selection, and some other things where I got to contribute.
And that was a lot of fun.
Besides that, I built my own little hobby library called mlxtend, which is I think

(15:43):
used by a lot of people now because it has this frequent pattern mining submodule
that a lot of people at companies use.
I always see on the discussion board a lot of companies, they have some proprietary data
set about some customer item sets data stuff where they have some questions.
And I think it's very widely used not for machine learning, although it has machine

(16:04):
learning capabilities, mostly for the frequent pattern mining.
But yeah, this was a library essentially because I built a lot of stuff that I needed for my
work like little, let's say functions here and there for normalizing things and also
some other classifiers and so forth, where I just thought, okay, instead of just hiding
them on my computer, I can make them a little bit more general and then I can share them

(16:26):
with the world and then others might find them useful basically.
And yeah, I just grew that library over the years just adding and adding to it.
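
As a quick illustration of the frequent pattern mining submodule he mentions, here is a small example with a made-up transaction dataset; `apriori` and `association_rules` are actual mlxtend functions, but the thresholds and data are arbitrary:

```python
# Frequent pattern mining with mlxtend on invented shopping-basket data.
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

transactions = [
    ["milk", "bread", "butter"],
    ["bread", "butter"],
    ["milk", "bread"],
    ["milk", "butter"],
]

te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(transactions).transform(transactions),
                      columns=te.columns_)

frequent = apriori(onehot, min_support=0.5, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.6)
print(rules[["antecedents", "consequents", "support", "confidence"]])
```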
And the other major one I would say was BioPandas where in computational biology, we work with
these protein structure files and also small molecule structure files.
And we were building back then a virtual screening library where we were making predictions on

(16:49):
millions of molecules.
And for that, you had to parse these molecules in a way that you could process them.
And there were a lot of libraries out there that did something like that.
They basically had some API where they read in these molecule files and then you access
the objects in Python, let's say with a custom API and so forth, which is fine.

(17:14):
But it is like, yeah, you have to learn that.
It's like a specific library and you have to learn how do you get the number of carbon
atoms?
How do you get the position, the coordinates of that atom?
And it is, I think, yeah, it is a bit steep in terms of the learning curve.
And I thought, okay, why make that so complicated?
If we just had a way we can load that protein structure file into a Pandas data frame, I

(17:38):
can just use everything that's already there in Pandas.
I don't have to reinvent the function to compute, let's say, the center of mass using
the coordinates.
I can use all the functions, standard deviations, mean everything that is in Pandas and to make
that more convenient.
So it's essentially a library where you can convert protein structure files into a Pandas

(17:59):
data frame and then you can do machine learning, you can do statistics, everything on top of
that without having to relearn, let's say, a custom API.
It's basically all in a Pandas data frame.
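
A minimal sketch of that workflow, using the example structure from the BioPandas documentation (fetching requires network access):

```python
# Load a protein structure into a plain pandas DataFrame with BioPandas,
# then use ordinary pandas operations instead of a custom API.
from biopandas.pdb import PandasPdb

ppdb = PandasPdb().fetch_pdb("3eiy")  # example PDB ID from the BioPandas docs
atoms = ppdb.df["ATOM"]               # a regular pandas DataFrame

# Coordinate summary and atom counting with standard pandas calls:
print(atoms[["x_coord", "y_coord", "z_coord"]].mean())
print((atoms["element_symbol"] == "C").sum(), "carbon atoms")
```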
And other than that, I would say, yeah, these were my main libraries where I contributed
to or that I built basically from scratch back then.
But then over the years, I did a lot of open source stuff, but not necessarily libraries.

(18:24):
What I did more was education, I would say, like writing blog posts, explaining things,
PyTorch- and scikit-learn-related tutorials or things like, hey, let's implement a principal
component analysis from scratch or let's implement a self-attention mechanism from scratch and
like writing the code, but not necessarily as a library because I think there are already

(18:46):
a lot of efficient implementations out there.
So it doesn't really make sense to reinvent the wheel, but it's more about like, let's
peel back a few layers, make a very simple implementation of that so that people can
read them because that's one thing.
Deep learning libraries are becoming more powerful if we look at PyTorch, for example,

(19:06):
but they are also becoming much, much, much harder to read.
And so if I would ask you to take a look at the convolution operation in PyTorch, I wouldn't
even know where to look in PyTorch to start with.
It's like, I mean, for good reason because they implemented it very efficiently and then
there's CUDA on top of that and stuff like that.
But as a user, if I want to customize or even understand things, it's very hard to look

(19:29):
at the code.
So in that case, I think there's value in peeling back the layers, making a simple implementation
for educational purposes to understand how things work.
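
In that spirit, here is a compact from-scratch sketch of scaled dot-product self-attention, the kind of peeled-back implementation his blog post walks through; the dimensions are arbitrary and the projection matrices are random rather than learned:

```python
# Scaled dot-product self-attention in a few lines of PyTorch.
import torch

torch.manual_seed(0)
seq_len, d_in, d_k, d_v = 6, 16, 24, 28

x = torch.randn(seq_len, d_in)           # token embeddings
W_q = torch.randn(d_in, d_k)             # query/key/value projections
W_k = torch.randn(d_in, d_k)             # (random here; learned in practice)
W_v = torch.randn(d_in, d_v)

Q, K, V = x @ W_q, x @ W_k, x @ W_v
scores = Q @ K.T / d_k ** 0.5            # (seq_len, seq_len) similarity scores
weights = torch.softmax(scores, dim=-1)  # each row sums to 1
context = weights @ V                    # (seq_len, d_v) context vectors
print(context.shape)                     # torch.Size([6, 28])
```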
So that's something I have also liked doing in recent years, which is why I maybe didn't
contribute so much to the core libraries.
I was more like focusing on the coding for education, essentially.

(19:52):
Right.
Yeah, no, that makes a lot of sense.
I appreciate a lot of the writing that you've done.
I really enjoy your blog.
I think you have a newsletter that I'm following now, too.
I'm looking forward to your new book that's coming out.
Q&AI?
What's the title?

(20:14):
Q&AI, so I can maybe say a few words about that.
So it is essentially, it started because what I do is when I read or learn things I have
for myself, I have flashcards.
Basically I write down questions and answers for myself.
So just, I mean, usually when you write them down, that process helps you learn these things.

(20:39):
And maybe you rarely have to go back to your flashcards because it's not about the memorization
necessarily.
It's more about making the questions.
But then also it kind of feels good when you feel like you have read a paper or a book
and then you made these questions for future use so you know you have them written down
somewhere.
And just in case you forget, they are there as flashcards in my software so I can look

(20:59):
them up.
And people on the internet, they ask me sometimes to share these flashcards.
And what I did is I thought, okay, why not?
But let me polish them up a little bit because when I write things for myself, they're usually
not that nice.
They are also, I mean, containing grammar errors or typos.
And I was like, hmm, let me polish them, make them a little bit more clear so that someone

(21:20):
else can read them.
And in that process, these notes became longer and longer.
So they became like fully-fledged answers.
Some of them like, I don't know, I just was in the mood of writing.
And then some of them were like four or five pages long.
And yeah, so one question would be, for example, what's the difference between an embedding,

(21:41):
a latent space, and things like that, essentially, or when are fully connected layers and convolution
layers equivalent?
And all types of questions.
What is the difference between self-attention and the traditional attention mechanism in
RNNs?
What are the multiple GPU training paradigms, like tensor parallelism, data parallelism,

(22:02):
and so forth?
And the answers, they tended to become longer and longer and longer.
And I was like, okay, instead of just, I mean, these are not flashcards anymore.
These are basically book chapters.
So I thought, okay, I could just basically turn that into a book.
And yeah, it's basically machine learning Q&A because it's like a Q&A, it's a question

(22:22):
and an answer.
But then also it was interesting because ChatGPT was out now, so an AI could do the answers.
And as a little gimmick, I thought, because it just came out, why don't I include also
the answers by ChatGPT?
So I have my own answer followed by the ChatGPT answer and a short discussion.
And then readers can tell or can, let's say, judge for themselves which answer is appropriate

(22:49):
or not.
So one thing, of course, ChatGPT cannot create figures and these types of things.
So it's kind of a little bit unfair, but I must say for my comparison, what was very
interesting is that when I wrote the answer, I had at least a very long answer.
ChatGPT's answer was way, way shorter.
Sometimes, yeah, I would say if you have 10 items, three items are wrong.

(23:15):
ChatGPT's answers sometimes contain factually incorrect things.
It's easy for a domain expert to weed them out.
However, what was nice about ChatGPT is it sometimes came up with things I didn't think
about when I, for example, asked about what are some ways we can deal or can improve or
reduce, let's say, overfitting?
What are some techniques for reducing overfitting?

(23:36):
I had quite a long list, explained everything, then asked ChatGPT; it had some, let's say,
wrong answers, but some of them I didn't even think about.
And so that was nice.
It's essentially creating false positives, but it's also having these true positives,
let's say, that you missed.
So it's in a sense, actually pretty good for brainstorming, I would say.

(23:56):
It's actually a pretty good writing companion.
You still have to know a bit about the field because these errors, if I wouldn't know about,
let's say, machine learning, it could be dangerous because it would give me wrong information.
But if you look for inspiration, I do think it's a valuable tool, essentially.
Yeah, definitely.

(24:17):
I was about to say that I use ChatGPT as a brainstorm assistant.
It can help you with drafts.
It can help you write outlines and things like that.
But yeah, there is that danger.
You're a machine learning expert reading about it and you're able to quickly pick out, say,

(24:41):
whatever, 20%, 30% of this information might not be factually correct.
And it does become dangerous when there's someone looking at it and looking at it as
an authority, seeing the output and thinking that it's probably going to be correct.
Yeah, so talking to someone in NLP and machine learning, we brought up ChatGPT.

(25:05):
It took us a little bit, but I guess we could dive into it now.
Yeah, there's no way to avoid it nowadays.
No, can't avoid it.
I know you've been in the field and you've seen the progression.
It seems as if it's like this overnight success, going to a million users in a couple of days.

(25:30):
But obviously, this has been years in the making.
Where I want to start off with is how do you view the gap between the hype of something
like ChatGPT and the generative models now and the reality of AI?
Yeah, so it's interesting.

(25:51):
I would say ChatGPT did a good job in terms of closing the gap, because honestly, I must
say it works pretty well.
And it is impressive.
I don't know how far it scales in terms of would we...
I mean, we can always improve things, but I don't know what, let's say, the rate is

(26:14):
of how we can make it better, I guess, related to the hype.
I think there's a lot of...it's like the same with self-driving cars, I guess, where five
years ago they already had pretty impressive demos.
I haven't seen, to be honest...
I mean, the thing that they don't show you is what they have right now that is not released
yet.
But I do think it's usually the last few percent that are crucial.

(26:39):
I think with self-driving cars, we have been...it's just a number, I don't know for sure, but
I would say we have been there for like 95% now.
Like five years ago, it was almost, let's say, 95% there, almost, let's say, ready.
Now five years later, we are maybe there at 97% or 98%.
But can we get the last two remaining percentage points to really nail it, basically to have

(27:04):
them on the roads reliably and so forth?
And that is hard to say with the large language models as well.
I think we can reduce the factually incorrect information, make them more useful and so
forth.
I just don't know how much work it takes to get just a few more percent better performance.
We will see with the next generation, let's say the GPT-4 models and so forth, if they

(27:29):
apply then also the reinforcement learning with human feedback in the loop on top of
it, if it's substantially better, like the same like from GPT-2 to GPT-3.
Maybe it's the same from 3 to 4 where we get, again, mind blown.
But yeah, that is one thing.
The other thing is I think people are chasing, like hype-wise, they see ChatGPT and they

(27:50):
are chasing AGI, like artificial general intelligence.
Yeah, that is an interesting question.
I think no one knows how far we are from AGI.
With ChatGPT, I think there's a lot more hype around AGI.
It appears closer than before, of course, because we have these models.

(28:11):
There are people though who say, okay, this is the totally wrong approach.
We need something completely different if we want to get AGI.
No one knows what that approach looks like.
So it's really hard to say.
That's the thing.
If something hasn't been there before or it doesn't even exist, it's hard to predict
when it will exist.
It's really hard basically to make any reliable statement about that, I would say.

(28:36):
The thing though, what I always find interesting is do we need AGI?
More like a philosophical question.
I think AGI is useful as a motivation.
I think it motivates a lot of people to work on AI, to make that progress.
I think without AGI, we wouldn't have maybe things like, I don't know, like what was it

(28:59):
called, AlphaGo, where they basically beat the best player at Go.
Maybe chess, even back then chess.
How is that useful?
I would say maybe AlphaGo and chess engines are not useful, but I think it ultimately
led to AlphaFold, the first version for protein structure prediction, and then AlphaFold2,

(29:20):
which now uses large language models.
In that case, I think without large language models and without the desire maybe to develop
AGI, we wouldn't have all the, let's say, very useful things in the natural sciences.
My question is do we need AGI or do we really just need good models for special purposes?

(29:44):
For example, if I want to, I mean there was a paper the other day, accurate weather prediction
with deep learning, like more accurate than the best physics-based simulations that run
on supercomputers with a smaller, let's say more, not smaller, but with a more energy
efficient neural network and more accurate.
Maybe that is sufficient.
Maybe we don't need an AGI that can also predict the weather.

(30:07):
Maybe it's better to just focus on improving that weather prediction engine and separately
improving the protein structure prediction model AlphaFold.
Maybe we don't need to chase something that can do all the things at once.
However, I do think AGI is useful as a motivator to find better algorithms.
So in terms of hype, I think I'm personally, I don't see the purpose of AGI.

(30:33):
Maybe I'm too short-sighted here.
I would say what would we do with AGI besides what people say about replacing humans?
I don't know how that really benefits compared to special purpose applications of machine
learning.
Yeah.

(30:53):
Right.
I mean, you brought up so many interesting points.
I don't even know where to go next.
Let's talk about the use cases for generative models.
So you were mentioning basically, which I love this point, where we're able to get these
models up to a certain level of performance.

(31:15):
Say you can get a model to 90% or 95%, but it's that last 5% that's so hard.
The closer you're getting to that 100%, it's even harder.
It makes me think about when you're training a machine learning model, any model, say even
a text classifier and you have your F1 score at, say, 0.85, how much work can you really

(31:42):
do to get it that much higher?
But I wanted to take a step back and I wanted to talk about basically generative models.
I think there's a lower threshold.
So error can be okay depending on your use case.
So if you're using it for something like just to make a draft, it doesn't need to be 100%

(32:04):
correct, because if you're making marketing content, let's say, that could be the product.
I'm seeing now that Wix is offering a complete generative approach, using generative models to create
your whole website.
That's amazing.
That solves the cold start problem.
It gives you so many options you can build off of it.

(32:28):
But then there's the other part.
There's predictive models where you're, say, you're categorizing something and you need
it to be very close to 100% correct.
Depending on your use case.
And then you bring up AGI, artificial general intelligence.

(32:52):
I think everybody thinks about it a little bit differently.
Everybody has a different sense of it.
Everyone has a different definition.
Are we trying to replicate humans?
Are we trying to replicate human intelligence?
If that's the case, then I personally don't think that large language models are the way
to go.

(33:13):
There are certain things that I think about like from GPT-2 to GPT-3, one thing that's
very interesting is when you, by orders of magnitude, add all these parameters.
There are these emergent capabilities, which is really interesting.
I think in one of them, you're reading so much of the English language, so you're going

(33:34):
to learn how to make grammatically correct sentences, and then you're going to learn
different relationships between things.
All of that stuff is amazing, but there's more to it, I think, than that.
Just being able to predict the next word.
The reinforcement and human in the loop piece of it is definitely going to, as you were

(33:57):
saying, minimize the amount of factually incorrect responses.
What do you think?
Do you think that our goal should be to try to replicate human intelligence, or do you
think we should be specializing in certain systems or certain use cases?

(34:19):
I personally think, for the sake of developing more efficient learning algorithms or alternative
learning algorithms, I do think it makes sense to get inspired by, let's say, replicating
human intelligence.
But I would say if it doesn't work, that's fine too.

(34:39):
The classic example is really airplanes or submarines, where airplanes are inspired by
birds.
It's like, hey, birds can fly, they have wings, can we build something similar?
Turns out the airplane is very different, it doesn't flap the wings, but it gets the
job done.
In the case, we don't need to mimic how birds fly.
In the same sense, we probably don't have to mimic how, let's say, humans learn and

(35:04):
think, although I do think it would help understanding that because there might be more inspiration
that we can use for these models.
One thing is also related to that, ensemble methods are, so building an ensemble of different
methods is usually something to improve, how you can, let's say, make more robust and accurate
predictions.

(35:26):
Ensemble methods usually work best if you have an ensemble of different methods, if
there's no correlation in terms of how they work, so they are not redundant basically.
That is also one argument why it makes sense to maybe approach the problem from different
angles to produce totally different systems that we can then combine.
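
As a small sketch of that idea, here is an ensemble of deliberately dissimilar scikit-learn classifiers combined with `VotingClassifier` (similar in spirit to the ensemble voting classifier mentioned earlier); the dataset and model choices are just for illustration:

```python
# Combine three different model families so their errors are less correlated.
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(random_state=0)),
        ("nb", GaussianNB()),
    ],
    voting="soft",  # average the predicted class probabilities
)
print(cross_val_score(ensemble, X, y, cv=5).mean())
```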
I think that's also interesting from the perspective of how people try to implement large language

(35:52):
models as part of a search engine because I feel like, yeah, we don't, so it's kind
of related to artificial general intelligence, where maybe we don't need one
system that solves it all, because, for example, with ChatGPT, it can do math, some of
the emergent capabilities that you mentioned, but it's not useful for simple math, like
if you say multiply 13 by 117 or something like that, it's maybe not useful to use Chat

(36:20):
GPT for that.
We have a calculator that can do that accurately, that doesn't need to be trained, there are
simple rules.
Yeah, so in that case, what we need is more like identification of what we need to get
the job done.
Maybe having, like Siri, what Siri is doing is it's parsing the language.
I mean, besides the fact that it doesn't work well, but let's say it would work better in

(36:42):
parsing your input.
What it does, it's rerouting your input to the appropriate application on your phone.
I think if you set a timer, it will use the timer app on your phone or if you do a calculation,
it will use the calculator app.
So it's not trying to do everything itself, it's trying to delegate.
And I think with AI, I think that's the same thing.

(37:03):
If we ask it to maybe compose text, the AI itself might be the best way to do that.
If we want factual information, maybe sometimes just extracting information from an existing
Wikipedia page might be more efficient than having itself answering that.
So I'm not saying it's not necessary to use an LLM, but the LLM here would be more efficient

(37:28):
at going to that website and summarizing the text rather than rewriting the text, basically,
if you are looking for an answer.
I think that is one thing we could focus on, on how to basically delegate more efficiently
and building an AI that, let's say, delegates rather than tries to solve everything, in

(37:48):
my opinion.
And also to your point, the AI doesn't even have to be correct all the time when creating
text as long as we use it as a template, basically, not as the end product.
So I think ChatGPT, the main use for me, how I use it, is to help me write texts, but
I'm filling in the blanks.

(38:09):
I'm not like, if I want a text about something, I usually write the text myself first, then
I say, hey, ChatGPT, rewrite this, and I see if I like it more or less.
I take certain sentences, and then I even tweak them afterwards.
I'm not really literally copy and pasting anything or in the same way with information.
So there was another LLM, I think it was called Galaxy something, or Galactica, I think, Galactica,

(38:34):
where they had an AI or LLM that was writing research papers.
I think there was this misconception that you let it write the whole research paper.
I see it more as something that writes the template for a research paper.
It's more like, I would say, a sophisticated template builder.
I think it would have been better if it wouldn't fill in numbers or any factual information.

(38:55):
It would leave blanks so that it's more clear to a human, like, hey, you have to fill in
the numbers and the details, and they're not provided by the machine learning AI system,
basically.
So I think having these models, it's about using them responsibly, essentially.
Yeah.
You bring up so many interesting points.

(39:17):
To talk about the different tasks that you want to complete, I see a future where, yeah,
depending on what prompt, basically, you are asking, you could use something that's rule
based or it could pull up the correct tool.
The sibling or predecessor of ChatGPT, InstructGPT, sort of was going into that, how you can take

(39:45):
an initial prompt and then have some follow ups.
That's what's really nice about ChatGPT as well, that you can sort of take the output
and you can say, make it longer, make it shorter.
I saw another recent paper, Toolformer, basically showing some examples of how to use tools,

(40:06):
you can basically combine the power of large language models and using third party tools.
I think it's this ability to sort of find that hybrid approach.
When a rule is the right approach and when should you be using more advanced systems,
which is kind of like always a question.

(40:27):
Can you make it simpler?
It's like this saying, if you have a hammer, everything looks like a nail.
I think this is right now a little bit true with ChatGPT because we just have fun with
it.
Let me see if it can do this and that, but it doesn't mean we should be using it for
everything.

(40:48):
The question, basically the next level, would be when to use AI and when
not to use AI.
Right now we are using AI for a lot of things because it's exciting and we want to see how
far we can push it until it breaks or doesn't work.
Sometimes we have nonsensical applications of AI because of that, like training a calculator,

(41:08):
a neural network that can do calculations; that doesn't really make sense.
But there are examples where I think reinforcement learning found a more efficient matrix multiplication
algorithm, more like the algorithm itself finding that.
That makes sense, something you as a human wouldn't think about, but we wouldn't let it do the
matrix multiplication itself because it's not deterministic in a sense.

(41:30):
You don't know if it's going to be correct or not depending on your inputs.
There are definite rules that we can use, so why make it, let's say, approximate when
we can have it accurate?
Yeah.
I think that that's something in the machine learning field that's really such an interesting
area that deserves more research.

(41:55):
Machine learning models are going to make predictions.
There are systems where it might not, where it doesn't have high enough confidence to make a prediction.
But when it makes a prediction, it's usually binary.
It's usually just: it made a prediction.
This is what it thinks the answer is, but it doesn't give you that confidence level.
You know how when you're talking with a human, you can kind of tell how confident someone

(42:18):
is when they're saying something, by the way they're saying it, or they might validate it.
They might say, oh, I think I heard about this.
That's lost when you're talking about chat GPT.
On top of that, one thing is also there's a whole branch of research on that neural
networks are typically overconfident on out of distribution data.

(42:41):
What happens is if you have data that is slightly different from your training data or let's
say out of the distribution, the network will, if you program it to give a confidence score
as part of the output, this score for the data where it's especially wrong is usually
overconfident.
It's overestimating its confidence, which makes it even more dangerous.

(43:05):
Even the confidence score, let's say, it's misleading if it's a tricky problem, which
is kind of ironic or paradoxical.
It's kind of an interesting research problem.
There are methods that try to address that, but it's not out of the box.
It's a lot of extra effort.
It's an ongoing research field.
Like you said, even if we had the confidence scores, it would be hard to use them or trust

(43:31):
them.
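
A tiny synthetic demonstration of that overconfidence: a classifier trained on two clusters will happily assign near-100% probability to a point nowhere near its training data:

```python
# Overconfidence on out-of-distribution inputs, in miniature.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (100, 2)), rng.normal(2, 1, (100, 2))])
y = np.array([0] * 100 + [1] * 100)

clf = LogisticRegression().fit(X, y)

ood_point = np.array([[100.0, 100.0]])  # nothing like the training data
print(clf.predict_proba(ood_point))     # ~[[0., 1.]]: extremely confident
```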
But also you bring up a good point.
So ChatGPT doesn't give us any confidence about anything, but then there's also, I mean,
an even better example, I think where it's more clear is this classifier they developed
to classify whether text is written by an AI or a human where they have different labels

(43:53):
like likely or not likely generated by an AI or something like that.
And yeah, it's just a label.
So you trust it or not.
And for example, when I put text from Shakespeare's Macbeth in there, it predicted it was likely
generated by AI.
It's just a label.
And well, what do you do with that?
It's like totally wrong, because Shakespeare was around way before AI was a thing.

(44:20):
So there's another approach called GPTZero, where the researcher who developed that
just gives you a score.
It's only the perplexity score.
And then you as a human, you have to compare it and think about it, which is maybe a better
approach than just giving a label.
But yeah, you bring up a good point.
We just take it for granted or we just take a score and yeah, we use it.

(44:42):
And it's maybe out of convenience because that's the simplest user interface.
But with things like machine learning, yeah, it is depending on application, tricky.
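
For the curious, here is a hedged sketch of the perplexity idea behind GPTZero-style scoring: run text through a language model and report perplexity, leaving the interpretation (lower often means more "predictable", possibly machine-generated text) to the human. It requires the `torch` and `transformers` packages, and the choice of GPT-2 here is arbitrary:

```python
# Score text by language-model perplexity rather than a hard label.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def perplexity(text: str) -> float:
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels=input_ids the model returns the mean cross-entropy loss.
        loss = model(ids, labels=ids).loss
    return float(torch.exp(loss))

print(perplexity("To be, or not to be, that is the question."))
print(perplexity("The cat sat on the mat."))
```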
Yeah.
And I think that's definitely a problem that machine learning practitioners should try
to address, but it's extremely difficult, right?

(45:03):
Especially as humans, we're trying to interpret these very complex machine learning, deep
learning models and something that's out of distribution and it's trying to make a prediction
on it and you get a prediction and the prediction is high confidence.
And it's like, that doesn't even, it's like, why?

(45:24):
It doesn't even make sense.
It's a little scary because sometimes like, so take an active learning system where you're
going to label samples that have low confidence.
And then like those high confidence ones are just going to slip through.
Yeah, in that case, it would be achieving totally the opposite of what you want because

(45:45):
yeah, it will give you the high confidence for the ones that you actually need to label
because they are so different.
Yeah, it's essentially antagonistic or adversarial.
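
To make that failure mode concrete, here is a minimal uncertainty-sampling sketch: the active learner queries the least-confident pool samples, so an overconfident out-of-distribution point would never be selected for labeling. The data is synthetic:

```python
# Uncertainty sampling: query the pool samples the model is least sure about.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X_train = rng.normal(0, 1, (50, 2))
y_train = (X_train[:, 0] > 0).astype(int)
X_pool = rng.normal(0, 3, (200, 2))  # unlabeled pool, wider than the training data

clf = LogisticRegression().fit(X_train, y_train)
confidence = clf.predict_proba(X_pool).max(axis=1)

query_idx = np.argsort(confidence)[:5]  # five least-confident samples to label
print(confidence[query_idx])
# An OOD point scored with (misplaced) high confidence lands at the *end*
# of this ranking and silently slips through.
```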
Yeah.
Yeah.
I mean, it makes you think about just how many moving parts there are with machine learning

(46:08):
and just trying to understand, it's so important to understand every aspect of it.
It's not just the algorithm.
It's not just the newest language model.
Sometimes it's like common sense things, understanding the data, understanding the output, why are
you making this?
How is it going to be used?

(46:29):
Those sorts of things.
And I want to say we are complaining here.
Oh yes.
I'm sorry, I just wanted to say we are complaining about this here, that machine learning
systems make these mistakes and we don't get the scores and we don't interpret them.
It's something to think about, I wanted to say, but it is challenging.
It is not that, I would say; people who are working on this are trying their best.

(46:55):
They put a lot of effort into improving that and getting the best out of it as possible.
It is just such a hard problem that I think it needs more time and work.
We are trying to do the best we can or most researchers are doing the best they can when
they release the products.
It's just such a hard problem.
So I would say there's no one to blame about that.

(47:16):
It's just how hard this problem is.
Yeah, of course.
I didn't mean to say it in that sense.
There's an interesting trend that I've found actually with machine learning practitioners
after they work in the field for a certain amount of time, many then shift their focus

(47:36):
into AI ethics, which is exactly trying to address these types of problems, which I find
that to be very, very interesting.
The more I work in this field, the more you have to think about those things.
Yeah, that's actually a good point because I think it makes a lot of sense to start with

(47:59):
machine learning and then go into AI ethics because then you basically get exposed to
all the problems that exist.
But you also notice that it's maybe not so trivial because I think it's easy to say,
well, this is not good and this is a problem.
Fixing it is the more difficult problem really.
And I think experiencing the maybe frustration around machine learning, that's a good way

(48:22):
to also be prepared for what's possible and whatnot and what could we do.
And I think it is frustrating sometimes to work with machine learning systems because
we train these classifiers and then we see exactly, okay, this gets this input wrong,
but we don't know why this particular input.
We can maybe include more training examples of this particular input.

(48:44):
We improve the system.
It doesn't get this one wrong anymore, but then it gets something else wrong instead.
And it's really like you're trying to fix one thing, the other thing breaks.
And it is very, very challenging.
Yeah.
Yeah.
Yeah, it's very, very tricky problems.
And it's nice to have the chance to discuss this with somebody that's dealt with these

(49:10):
problems.
And yeah, it makes sense after you are applying machine learning and understanding maybe some
of the pitfalls to then transition into some more of like the AI ethics sorts of questions.
To change things up, not really though, but in the spirit of learning from machine learning,

(49:32):
let's zoom back to someone who's just starting out in the field.
What advice would you give to someone that's just starting out in machine learning?
I would say, yeah, that's tricky.
I don't want to give anyone wrong advice, but I would say machine learning is a big
field.

(49:52):
I think even like what we just covered, there are so many moving parts that are involved.
And I mean, even zooming back, we have predictions, we have generative models, we have computer
vision, we have natural language processing and all kinds of different fields.
And then for each field, we have different approaches for generative modeling.
We have, let's say just for images, we have autoencoders, diffusion models, generative

(50:16):
adversarial networks and so forth.
And they're all kind of like almost fundamentally different in terms of how they work.
And it can be very, very, very overwhelming, I think, when you start out.
So I would say, honestly, I would start with the book or a course and just work through
that with, I would say, almost with blinders on, not getting distracted by other, let's

(50:41):
say, resources at that point, just working through that.
Because I think that happens to me all the time, I get distracted by something else,
I look it up and then it's like a rabbit hole.
And then you feel like, wow, there's so much to learn.
And then you get frustrated and overwhelmed because it's like, oh, the day only has
24 hours, I can't possibly ever learn it all.

(51:01):
So I think really doing one thing at a time, like step by step, it's a marathon, not a
sprint, I would say.
So I think, I would say, yeah, take it slowly, enjoy it, make sure you have fun, try not
to do all at once.
And maybe also finding a balance between trying things out or maybe implementing some ideas

(51:25):
in a project after reading about them.
And then going back to reading about more things, trying them out.
So having a balance between soaking up knowledge also and trying out things you learned about.
Yeah, I think that's really good advice.
It's interesting, when someone asks me, oh, how can I learn about machine learning, there's

(51:49):
no shortage of resources out there, right?
There's no shortage of new material coming out, but it's sort of like hacking through
the weeds and staying on a path to get yourself to a point where you can understand a certain
level of the basics.
You don't need to know every paper that's coming out daily, right?

(52:13):
It's not necessary.
It's much more important to understand the basics.
So you're setting yourself up for a future of success basically.
In a similar vein, if you have anything, what's one piece of advice that you've received that
has helped you along your machine learning journey?

(52:34):
That's a good question.
Top of my head, I wouldn't have a good, let's say, piece of advice that someone gave particularly
to me.
But I would say going back to the Andrew Ng class that we talked about in the beginning,
I think something Andrew Ng always said in his classes was, if you don't understand this

(52:55):
part, don't worry about it.
And I think that's a good saying.
Maybe if we don't understand a certain thing, maybe let's not worry about it just yet.
Some things are more important than others.
Also when we specialize, I think letting go of some things to make room for other things.
For me, I worked on some more mathematical papers where we proved theorems and so forth,

(53:23):
like the ordinal regression papers we worked on, which was fun.
But I, for example, I know that I'm not that good at proving theorems because I'm more
like a person who enjoys coding.
And for proving theorems, you have to sometimes sit there for days or weeks and stare at it
until you get some inspiration.

(53:43):
And this is not for me.
I think that's okay.
I would say not getting frustrated, I guess, saying, okay, this is not for me, recognizing
that, focusing on my other strengths.
And that would be something like, don't worry about it.
Oh, sorry, I almost knocked off this thing here.
Let's say, what Andrew Ng said, not worry about it.

(54:04):
That is like something I think that kind of relieved me.
It's a small thing that's really nice because when Andrew Ng was going through, say, a proof
for something or showing all the mathematics behind gradient descent or changing the weights

(54:25):
or back propagation and things like that, you don't need to know every single detail
right then and there.
You might not ever really need to know every detail, but understanding the...
Getting an intuition.
And that's what Andrew Ng always used to say, gaining that intuition and getting that gut
feeling and things like that.
That's what's going to help you along the way.

(54:48):
Yeah, that is a good point.
Other than Andrew Ng...
Oh, yes.
So I wanted to say exactly what you said.
I wanted to say what you brought up a very good point is, yeah, you should, of course,
make sure you understand the bigger picture and intuition in a certain way.
But the details are sometimes implementation details, I would say.

(55:08):
But like you said, yeah, recognizing when it's time to focus on the big picture and
when it's time to dive in and really making sure you don't have to dive into everything
basically.
Also, very good exercises to implement things from scratch, like reading about, let's say,
decision trees and then implementing decision trees from scratch.

(55:30):
For example, that's one homework I usually give where students have to code a CART decision
tree or a C4.5 tree from scratch, which is a good learning exercise.
But I wouldn't say do that for every algorithm, because if you do that, yeah, you would get
stuck.
You would never really move forward because it takes a lot of time.
It takes weeks to do that.
And life is also, in a way, short if you spend your whole time re-implementing old algorithms.

(55:57):
Yeah, that's also not a good way of spending time, I think.
It's like being selective, I think, also focusing on the big picture, sometimes diving in, but
not diving into the details of everything.
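
For readers curious what such a from-scratch exercise might look like, here is a deliberately naive CART-style sketch: Gini impurity, greedy binary splits, no pruning; educational only, not production code:

```python
import numpy as np

def gini(y):
    _, counts = np.unique(y, return_counts=True)
    p = counts / len(y)
    return 1.0 - np.sum(p ** 2)

def best_split(X, y):
    best = (None, None, np.inf)  # (feature index, threshold, weighted impurity)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            left = X[:, j] <= t
            if left.all() or (~left).all():
                continue
            score = (left.sum() * gini(y[left])
                     + (~left).sum() * gini(y[~left])) / len(y)
            if score < best[2]:
                best = (j, t, score)
    return best

def build(X, y, depth=0, max_depth=3):
    if depth == max_depth or len(np.unique(y)) == 1:
        return int(np.bincount(y).argmax())       # leaf: majority class
    j, t, _ = best_split(X, y)
    if j is None:                                 # no valid split found
        return int(np.bincount(y).argmax())
    left = X[:, j] <= t
    return (j, t,
            build(X[left], y[left], depth + 1, max_depth),
            build(X[~left], y[~left], depth + 1, max_depth))

def predict_one(node, x):
    while isinstance(node, tuple):                # walk down to a leaf
        j, t, l, r = node
        node = l if x[j] <= t else r
    return node

X = np.array([[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]])
y = np.array([0, 0, 0, 1, 1, 1])
tree = build(X, y)
print([predict_one(tree, x) for x in X])          # [0, 0, 0, 1, 1, 1]
```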
Right.
Yeah, one of my professors, during my master's, had us go by hand, step by step, through

(56:18):
back propagation for neural networks.
That sounds fun.
You're beating your head against the wall, and it's very frustrating.
It's not like you ever need to do that.
But there's something about even just doing it once that you do just kind of gain a better
sense of it.
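
A once-by-hand version of that exercise might look like the sketch below: one hidden layer, sigmoid activations, squared-error loss, every gradient written out via the chain rule (XOR is the classic toy target; exact convergence depends on the seed):

```python
import numpy as np

rng = np.random.default_rng(42)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])        # XOR targets

W1, b1 = rng.normal(0, 1, (2, 4)), np.zeros((1, 4))
W2, b2 = rng.normal(0, 1, (4, 1)), np.zeros((1, 1))
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for _ in range(10000):
    # forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # backward pass: chain rule written out by hand for squared error
    d_out = (out - y) * out * (1 - out)       # dL/d(output pre-activation)
    d_h = (d_out @ W2.T) * h * (1 - h)        # dL/d(hidden pre-activation)
    W2 -= h.T @ d_out;  b2 -= d_out.sum(axis=0, keepdims=True)
    W1 -= X.T @ d_h;    b1 -= d_h.sum(axis=0, keepdims=True)

print(out.round(2))  # should drift toward [[0], [1], [1], [0]] for most seeds
```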

(56:39):
Yeah, at first, the details aren't that important.
Future, when you're in industry and you're trying to get a model into production, I mean,
sometimes things are so abstracted that you don't necessarily need to, which could be
a good thing or a bad thing, right?
Because it's fine if there's no problems, but it quickly becomes a bad thing when you

(57:02):
start to run into some issues and you don't really understand what's going on with your
model.
But yeah, I mean, at first, it's much more important to get the, in broad strokes, just
sort of get a handle of what's going on, building up that foundation so you can understand everything.

(57:24):
You can't learn recurrent neural networks without understanding what a decision tree
is, right?
It's just like, it's not, you just, you can't, there's certain things, it just wouldn't,
it wouldn't make sense.
Like, you should start with logistic regression.
You know, just do it, right?
Good advice.
I would say always start with, even if you know more sophisticated techniques, if we

(57:49):
go back to what we talked about with large language models, even if it makes more sense,
even for a classification problem to fine-tune a large language model for that, I would start,
like you said, with a simple logistic regression classifier, maybe a bag-of-words model, to just
get a baseline, like something where you are confident it's very simple and it works.
Let's say using scikit-learn before trying the more complicated things.

(58:10):
It's not only that we avoid the complicated things because the simple
ones are efficient.
It's more about also even checking our solutions.
Like if our fine-tuned model, our, let's say, BERT LLM performs worse than the logistic
regression classifier, maybe we have a bug in our code.
Maybe we didn't process the input correctly, tokenize it correctly.
It's usually a good idea, I think, to really start simple and then incrementally

(58:35):
get more complicated, improving, let's say, by adding things, instead of starting complicated
and then trying to debug the complicated solution to find out where the error is, essentially.
Right.
Even in the worst-case scenario, if you use a very simple model, you've at least got something.
You just have a baseline, right?
Just a sanity baseline to work off of.
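In scikit-learn terms, that baseline-first workflow might look something like this sketch, where the tiny texts and labels lists are placeholders for whatever classification dataset is at hand:

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline

    # Toy placeholder data; substitute your own texts and labels
    texts = ["great movie", "loved it", "what a film",
             "terrible plot", "awful acting", "so boring"]
    labels = [1, 1, 1, 0, 0, 0]

    X_train, X_test, y_train, y_test = train_test_split(
        texts, labels, test_size=1/3, random_state=42, stratify=labels)

    # Bag-of-words + logistic regression: simple, fast, and hard to get wrong
    baseline = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))
    baseline.fit(X_train, y_train)
    print("baseline accuracy:", accuracy_score(y_test, baseline.predict(X_test)))

    # If a fine-tuned LLM can't beat this number, suspect a bug in the
    # pipeline (tokenization, preprocessing, labels) before blaming the model.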

(59:00):
So other than Andrew Ng, who we both obviously admire, who are some other people in the machine
learning field that you gain inspiration from or that have influenced you?
Good question.
I would say, because I also recently enjoyed some of the educational material by Andrej
Karpathy, what he always reminds me is that it's fun to code things up, and it's very contagious

(59:28):
if you see someone having fun coding things up.
So that's something I did very early on in my blog, where I implemented principal component
analysis from scratch, or linear discriminant analysis, and other things.
I always used to implement things from scratch, but over the years I have become more,
I would say, conceptual, because things got more complicated.

(59:49):
I was focusing more, let's say, on implementing an end-to-end system rather than doing
the step-by-step coding.
And his recent stuff reminded me of how much fun it actually is to do things from scratch.
So that's one inspiration, I would say.
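In the spirit of those early blog posts, a from-scratch PCA fits in a handful of NumPy lines; an illustrative sketch, not the original blog code:

    import numpy as np

    def pca(X, n_components=2):
        # Center the data, then project onto the top eigenvectors
        # of the covariance matrix (the principal components).
        X_centered = X - X.mean(axis=0)
        cov = np.cov(X_centered, rowvar=False)
        eigvals, eigvecs = np.linalg.eigh(cov)   # eigh: covariance is symmetric
        order = np.argsort(eigvals)[::-1]        # sort by descending variance
        components = eigvecs[:, order[:n_components]]
        return X_centered @ components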
Or other people: I would say maybe Paige Bailey, because she always has so much fun on, let's

(01:00:11):
say, social media.
It's also a reminder that, whatever you do, have fun.
Enjoy, share the joy.
That is, I think, also important to keep in mind: things are sometimes complicated
and, I don't know, work can be intense.
We want to get things done, but don't forget to maybe just stop and enjoy sometimes,

(01:00:32):
like sharing the successes and spreading some fun.
Yeah, definitely.
Speaking of doing things from scratch, now that I think of it, I was able to read your
recent blog post, Understanding and Coding the Self-Attention Mechanism of Large Language Models From Scratch.

(01:00:54):
And yeah, we were talking about some of that going into it, and understanding
the similarities between cross-attention and self-attention.
It's really interesting to go down to the more basic principles and to see things from
the code.
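For reference, the core mechanism that post walks through comes down to three learned projections and a softmax-weighted sum; a bare-bones NumPy sketch, not the code from the post itself:

    import numpy as np

    def self_attention(X, W_q, W_k, W_v):
        # X: (seq_len, d_in) token embeddings; W_q, W_k, W_v: (d_in, d_k)
        Q, K, V = X @ W_q, X @ W_k, X @ W_v
        scores = Q @ K.T / np.sqrt(K.shape[-1])         # scaled dot products
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
        return weights @ V                              # weighted sum of values

    rng = np.random.default_rng(0)
    X = rng.standard_normal((5, 8))                     # 5 tokens, 8-dim embeddings
    W_q, W_k, W_v = (rng.standard_normal((8, 4)) for _ in range(3))
    out = self_attention(X, W_q, W_k, W_v)              # shape: (5, 4)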
And, how do I say it?

(01:01:15):
In production, when you're deploying models, you don't want to reinvent the wheel, right?
Yeah, exactly.
Right.
You want battle-tested things.
But when you're trying to understand something conceptually, it's really nice to understand
it from scratch.
Excellent point.
And then to your second point.

(01:01:37):
Yeah.
Yeah, we really want to emphasize that: for real-world applications, don't
try to reinvent the wheel.
I think, yeah, that is a lot of work and also risky.
But like you said, it is good for learning.
It's especially good for learning.
Actually, one thing I like is that I sometimes build things both ways.

(01:01:59):
So when I want to implement something, I do the most naive implementation ever, where
I just use very plain, simple Python code and write some unit tests, because I know I
want this and this output.
And then I try to make it more efficient, to see if I can improve things.
That's what I usually do for things that don't exist yet.
But for things that exist, you can actually use what is already out there and then kind
But for things that exist, you can actually use what is already out there and then kind

(01:02:21):
of use that as a unit test, almost, and then try to make your implementation match it.
But yeah, like you said, maybe don't use from-scratch implementations if there's an existing
solution.
It's only for learning purposes, essentially.
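That use-the-library-as-a-unit-test idea can be as small as this; an illustrative sketch with a from-scratch softmax checked against SciPy's implementation as the reference:

    import numpy as np
    from scipy.special import softmax as reference_softmax

    def my_softmax(x):
        # Naive from-scratch version; subtract the max for numerical stability
        z = x - x.max(axis=-1, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=-1, keepdims=True)

    def test_matches_reference():
        # Treat the established library as the oracle for our own code
        x = np.random.default_rng(0).standard_normal((4, 10))
        assert np.allclose(my_softmax(x), reference_softmax(x, axis=-1))

    test_matches_reference()  # raises AssertionError if they disagree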
Yeah.
Definitely.
And then, yeah, to your second point from before, it's important to have fun, right?

(01:02:46):
To realize that learning can be, should be, enjoyable, and that expanding your knowledge
is so important.
So to conclude Learning from Machine Learning, the last real meaty question: what has a career
in machine learning taught you about life?

(01:03:10):
I would say, yeah, being patient, because there's so much out there.
You can't learn it all at once.
Take it one step at a time.
But, like what we just talked about, making sure we enjoy what we're doing.
But then also, what I think machine learning taught me, especially in the last couple of

(01:03:30):
years, is that things are changing quickly.
So in that sense, it's kind of counter to what we just said about taking things slowly.
But it's also: be open to change, be open to new experiences.
It could be anything, from job-related things to location-wise, where we live, what

(01:03:51):
our hobbies are.
And that is related to machine learning in the sense that there are so many new methods
coming out.
Things changed completely.
We were using GANs two years ago; now we're using diffusion models.
It's about being open to things and open to change.
And yeah, trying things out; and if we don't like something, we don't have to
use it.

(01:04:11):
It's the same with life, like trying new experiences, I think.
That's great.
Yeah, I think being patient when you need to be patient, but also just sort of accepting
that we are living in a very fast-moving world where things are changing.
So being open to change.
And like machine learning, everything gets better with time, with more training epochs,

(01:04:34):
essentially.
So maybe, hopefully, with life experiences and stuff like that, things usually get better
too, I hope.
So yeah.
Yeah.
Sebastian, it's been such a pleasure talking to you.
If there are some listeners out there who want to learn more about your work, where
could they go to reach out or to find out more about you?

(01:04:58):
I think my website would be the best place because there I have links to everything else.
So yeah, my website is essentially my firstname-lastname dot com, sebastianraschka.com.
It's maybe a little bit difficult to spell, in the sense that it's easier if you see
a link.

(01:05:18):
So it's firstname-lastname dot com.
I'll have it in the show notes.
Yeah, exactly.
Yeah.
Yeah, because it's a very long name.
Otherwise, I'm very active on social media.
On most of them basically, like Twitter, Mastodon, and LinkedIn.
So on most platforms, I'm r-a-s-b-t.
So that is actually, it's weird, because back then on Twitter, there was a character

(01:05:43):
limit, and the Twitter handle was cutting into that character limit.
So I tried to keep it as short as possible.
Five letters: it's basically the first two letters of my last name, r-a.
And then s-b-t as in Sebastian, so r-a-s-b-t.
So I'm that on GitHub, Twitter, and some other platforms.
So yeah, if you want to reach out on social media, I'm pretty much everywhere, maybe too

(01:06:08):
much.
But I must say, that is also one thing over the years.
I've been on social media for maybe 10 years.
And if you use it responsibly, you can learn a lot of things.
We always have good discussions where we discuss recent papers.
There's always someone who knows more than you do.
So it's always nice to have these comments where someone points something out, or follow

(01:06:30):
up material, or, hey, have you thought about this and that?
And yeah, I think, basically, if you use it responsibly, it can be a very effective
way of learning too.
Yeah, for sure.
So, Sebastian, yeah, it has been so nice chatting with you. You're like a fountain of knowledge.
I feel like there's so much that we could chat about more.

(01:06:53):
We could do a whole other episode, maybe sometime in the future.
But thank you so much for your time.
I really appreciate it.
Yeah, that was fun.
Thanks for having me on your podcast.
It was like a really fun hour to spend today.
So yeah, thanks for inviting me.
I had a lot of fun.
And yeah, anytime again.

(01:07:23):
Thank you for tuning in to this episode of Learning from Machine Learning.
I hope you enjoyed the insights and knowledge shared by Sebastian Raschka, a renowned author
and machine learning expert.
Don't forget to check out the show notes for links to Sebastian's work and resources discussed
in this episode.
If you enjoyed this episode, please leave a review and share with your friends and colleagues.

(01:07:45):
Until next time, keep on learning.