Exploring Literate Programming For Python Projects With nbdev

Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Unknown (00:13):
Hello, and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
When you're ready to launch your next app or want to try a project you hear about on the show, you'll need somewhere to deploy it. So take a look at our friends over at Linode.
With the launch of their managed Kubernetes platform, it's easy to get started with the next generation of deployment and scaling powered by the battle tested Linode platform,

(00:34):
including simple pricing, NodeBalancers, 40 gigabit networking,
dedicated CPU and GPU instances, and worldwide data centers.
Go to pythonpodcast.com/linode,
that's L-I-N-O-D-E, today and get a $100 credit to try out a Kubernetes cluster of your own. And don't forget to thank them for their continued support of this show.

(00:56):
Your host as usual is Tobias Macey. And today, I'm interviewing Jeremy Howard and Hamel Husain about nbdev, a library for turning Jupyter Notebooks into Python libraries. So, Jeremy, can you start by introducing yourself?
Sure. I'm Jeremy Howard. I'm a founding researcher at fast.ai.
And Hamel, how about you?
I'm Hamel Husain. I'm a staff machine learning engineer at GitHub. I spend a lot of my time working on fast.ai with Jeremy.

(01:23):
Going back to you, Jeremy, do you remember how you first got introduced to Python?
I was a Perl programmer
largely. I started a company called Fastmail, which is an email provider, and I used Perl for that.
And I remember when Python started getting popular,
I was kinda not particularly interested in it because I thought that Perl was really great. But as I did more and more stuff with machine learning,

(01:47):
Python became a bigger and bigger part of my life, and now it's what I spend most of my time doing.
And, Hamel, do you remember how you first got introduced to Python?
I just started using Python. I was kind of a data analyst at some point, and I wanted to automate some things. And at that time, I started actually with
R and,
you know, I wanted

(02:07):
another programming language, and Python seemed like a good one at that time. And so I just kind of naturally drifted into that.
R has got a lot of cool libraries, but I never liked it as a language. I do much prefer
working in Python as a language. But there's a lot of libraries from R I miss.
Yeah. The R ecosystem is definitely
pretty attractive, and there's definitely a lot of stuff from there that has inspired things in Python because people working in Python wanted to be able to have all the nice tools that the R folks did. But I'll agree that the language

(02:36):
coming from somebody who's worked primarily in Python is definitely a bit foreign.
The R libraries just tend to be
more elegantly put together, though, I find. You know, Python libraries have a tendency
to get the job done, but they tend to have more clunky APIs, whereas the R ecosystem,
you know, that community seems to really care about the developer experience a lot, which I really like.

(02:59):
There's one thing I really miss about R, which is relevant to this conversation,
and that is the development environment that I used when I was working with RStudio and R Markdown,
you know, where you could write prose and code in the same sort of
context. I thought that was really nice, and I sort of missed that when I went to Python, you know, until I discovered nbdev, which we'll talk about more.

(03:22):
So going into nbdev,
can you give a bit of an overview about what the project is and the overall goals of it?
nbdev
comes from,
you know, my kind of decades of enthusiasm
for literate programming and exploratory programming
and
never quite finding the right tool for the job. The closest I got was

(03:44):
Mathematica,
which
I always really enjoyed working in, but I always found
nearly impossible to deploy
and very difficult to kind of
get good performance from.
But the actual idea of being able to mix
any kind of outputs you want, whether they be animations
or hypertext or whatever,

(04:05):
along with code
and kind of hierarchically structured documents
and
have your kind of coding and documentation all in one place. I always found that just quite brilliant.
So, obviously, when Jupyter Notebooks came along, I got very excited about
that, but I had the same problem, which is, you know, how do you deploy these things? Like, it's a great kind of scientific journal

(04:30):
kind of environment,
but
I was trying to build
artifacts that other people could use easily.
So nbdev is something which brings the worlds of
software development of Python libraries and notebooks together so that you can use notebooks. A single notebook will create your tests,
your documentation,

(04:50):
and your actual Python module
all in one place. And I just love working with that. I find it dramatically more enjoyable and more productive.
In terms of some of the storyline behind it and how you got it off the ground and how it got to where it is today, I'm wondering if there are any interesting anecdotes that you can share.

(05:11):
Well, it all came out of really the development of the fastai library, which is one of the most popular
libraries for doing deep learning. And,
originally,
the fastai library kind of came out of
creating courses. So fast.ai has, I think, pretty much the world's most popular courses for learning deep learning,

(05:32):
and they're all done in Jupyter Notebooks. It's such a great way to teach and such a great way to learn.
And
we were also doing a lot of research
in Jupyter Notebooks, finding better ways to train better models.
And often, the course would include a whole lot of stuff about, like, oh, here's some research we just did. Let's learn about the research together and see how it's done and understand the kind

(05:55):
of motivations behind it and so forth.
So
we would very often be creating
new algorithms or implementing algorithms that were in papers but didn't have code.
And so we really needed to find ways to
make that code available easily to everybody. So that was really where it started,
was
taking fast.ai

(06:17):
research
and educational
materials and turning them into libraries.
But, really, the long-term goal for fast.ai is to make it so that anybody can do deep learning
without needing to do much
education.
So
the fastai library
has kind of become a focal point of

(06:39):
that work.
And so
it's just been a very natural progression
of using
notebooks to do research and to build educational materials and to build libraries.
And it's been really wonderful to see
how many other people have
found that same approach to development works for them, and nbdev is now getting

(07:00):
really popular, which is great to see.
And I know that there are a lot of different types of people,
and
there are a number of different sort of verticals and industries where people are using notebooks.
I'm wondering if you can talk about who the sort of target audience is for the nbdev project and how that influences

(07:20):
the features and design of the project.
I think nbdev works well
for
really any kind of project that you want to do. I mean, it's not, you know, limited to data science projects at all. In fact, we've been using nbdev for a number of different types of projects,
including
various utilities, DevOps tools,

(07:43):
APIs,
API clients,
and lots of things. And so it's really a general software engineering tool.
I know that some people
ask, you know, when
might you not want to use nbdev or
when might that be challenging? You know, I've tried to introduce nbdev to a lot of people and into a lot of projects, and it is a new way of developing software.

(08:07):
And so you have
to kind of look at what your colleagues are using, and what your colleagues are willing to use. And you have to kind of assess like whether or not it's worth it
to transition a project to nbdev
or whether your colleagues will be willing to give it a go and write software in nbdev. And you kinda have to have an open mind to try this new type of software environment.

(08:31):
So that's the main consideration I see in deciding,
you know, when to use nbdev for a project.
One nice thing about nbdev is you don't have to set out deciding to use it.
You know, you can just start hacking something together in a notebook, you know, which is often what I wanna do just to explore a new API or to explore an idea or explore an algorithm.

(08:54):
Maybe you don't even have a sense yet that that exploration is gonna lead anywhere useful.
And then I find that once I get to a point where I think, oh, this is actually turning out quite nicely,
then it's very
easy to then kind of nbdev-ify that notebook.
You just add, you know, like,
one comment to each cell that you actually wanna export.

(09:14):
And the nice thing is for anybody who kinda cares about code productivity and code quality
or project quality, you end up very quickly in a very nice place because once you decide, okay,
I do wanna make this something that other people can use,
if you've used notebooks and then you nbdev-ify it, you now have a really nice high quality documentation site for free. You have

(09:38):
parallelized tests for free. You have a pip and a conda installer for free. You have a README generated for you for free. So, like, all those kind of things that make a project
complete,
you know, and helpful to developers
and reliable and maintainable. They're all done for you

(09:58):
automatically,
which otherwise, I found, before I was using nbdev,
just seemed like a huge learning curve and lots of things to maintain, and I'd have, like,
10 different places that I had the same information
and dozens of different tools trying to work together. So it makes it, like, really easy to go from an experiment you're hacking around with to a really high quality

(10:22):
complete library.
Yeah. It's definitely easy to, like you said, start off with something. And then before you know it, realize that you have something that's actually full blown and you want to be able to use it in more places. And I'm wondering if you can talk to
the primary challenges that you see of using a notebook itself as a means of
building and collaborating on projects, particularly with other people, because notebooks are definitely

(10:48):
useful
for exploratory programming, as you said, and they can be very useful for sharing the results of your work from a documentation
and display perspective. But
in terms of the
collaborative or team oriented aspects of it, I know that there are some shortcomings. I'm wondering if you could just talk to the challenges that you've come across in that regard.

(11:09):
If you're not using nbdev and you're just using notebooks, there's a lot of shortcomings.
Take something very simple, which is
nbdev doesn't play... sorry, notebooks don't play nicely with version control kind of out of the box.
So you end up with
the diff markers that Git will add in to the file, which make it not JSON anymore, which means the notebook can't read it anymore, and that's gonna be a real mess. You end up with a lot of conflicts

(11:34):
because, like, metadata can change in the cells that creates, like, dozens of conflicts even in cells you haven't changed.
So
with nbdev, it has its own
diffing and merging
tool,
which actually
ends up really nice because it does it at a cell level. It knows to ignore metadata.
It ignores differences in outputs.

(11:56):
So we actually end up with quite a nice
Git integration.
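To make the cell-level idea concrete, here is a minimal sketch. It is not nbdev's actual diffing or merging tool; it only assumes the standard notebook JSON layout (a top-level `cells` list whose entries carry `cell_type` and `source`), and the function names are hypothetical. It compares two versions of a notebook while ignoring metadata, outputs, and execution counts.

```python
import json


def normalized_source(cell):
    """Notebook JSON stores source as a string or a list of lines; return one string."""
    src = cell.get("source", "")
    return "".join(src) if isinstance(src, list) else src


def cells_for_diff(notebook_json):
    """Keep only cell type and source, dropping metadata, outputs, execution counts."""
    nb = json.loads(notebook_json)
    return [
        {"cell_type": cell.get("cell_type"), "source": normalized_source(cell)}
        for cell in nb.get("cells", [])
    ]


def changed_cells(old_json, new_json):
    """Indices of cells that differ by type or source (naive positional comparison)."""
    old, new = cells_for_diff(old_json), cells_for_diff(new_json)
    length = max(len(old), len(new))
    return [
        i for i in range(length)
        if i >= len(old) or i >= len(new) or old[i] != new[i]
    ]
```

Re-running a notebook changes execution counts and outputs but not the source, so under this comparison a plain re-run reports no changed cells, while an edited cell is reported by its index.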
One of the things I really like about working with notebooks is there's a
really nice web based tool you can use, ReviewNB,
which all your code reviews and PRs can go through. And when I'm doing a PR, it's really nice. So I'm not just looking at source code. I'm also looking at the documentation, the outputs, the hypertext,

(12:20):
you know,
in a web page. So I can see, has somebody made a PR that has
reduced the clarity of the
image augmentations in the fastai library, for example. So normally with the plain diff,
I'd never be able to see that. I'd only see the code. But when you're working with notebooks in this way,
suddenly, it becomes

(12:41):
really, really nice that you actually get to see how it changed the outputs.
So that'd be one example.
Another example of a challenge is simply that
code that is in one notebook can't be kind of imported into and used in another notebook. So, again, for collaboration, that's
a nightmare, and not just for collaboration, but for yourself.
You know, you kind of end up putting everything into one notebook or copying and pasting.

(13:05):
So, again, with nbdev,
that all gets handled for you. They get turned into libraries so that you can import code from one into another just like a plain Python library.
Another problem with collaboration is, like, viewing notebooks.
There's some quite nice notebook viewers on
the web, and GitHub has a basic notebook viewer, but they're not as

(13:26):
not nearly as nice to work with as properly
indexed
documentation
with proper hyperlinks and tables of contents and search and so forth.
So, again, nbdev will add that for you.
So, yeah, all the kind of limitations
of working with notebooks, of which there are many,
suddenly actually become

(13:47):
features when you add nbdev on top of it.
I just wanna add to this. And your question was, you know, what challenges does nbdev
present to, like, collaboration?
And there's, like, a little bit of fixed cost for a contributor to learn nbdev. But in my experience,
once people
do that little bit of learning about nbdev, collaboration actually becomes a lot easier

(14:12):
because nbdev
promotes
a very nice workflow for software engineering and promotes best practices.
So nbdev
really encourages you to write documentation
and tests
because you do it in the same context. You write your code, your documentation, your prose, and your tests all together.

(14:33):
And
so when someone is trying to contribute to your project, and I've experienced this many times
at work, you know, that person is forced to explain the code that they're adding. And oftentimes
in that process, we realize, hey. Like, we're not able to really explain that code or that code is too complicated. It ends up being naturally refactored because you're writing docs and tests at the same time in the same context.

(14:56):
And you're really looking at code as
your documentation as a first class citizen
and writing code so that it can be presented to other people and understood. And so I found that that really helps with collaboration. It kind of naturally works out. I find I'm doing less back and forth with people.
Yeah. I mean, that's a good point. I find as an open source maintainer,

(15:19):
the PRs I receive are higher quality in nbdev projects because when somebody's adding code, they're in the middle of the tests, the documentation. So,
you know, it's pretty rare for somebody to
misunderstand the context of why their code is there because they're, like, literally in the middle of documentation about it as they write their code. Pretty rare that they wouldn't have tests because, again, they're kind of adding code in amongst all the tests.

(15:45):
So, yeah, I do find I get higher quality PRs with nbdev projects.
When I first started out with nbdev, I jumped into this project
called fastcore.
It's a fairly advanced Python library
built by Jeremy. And I thought there's no way I'm gonna understand this. This is, like, basically, like, magic.
But it's because of nbdev. nbdev allowed me to read the documentation and code together

(16:10):
and play with it in a very nice
interactive environment that I was able to catch on really fast,
much faster than any other project of similar complexity.
And the nice thing is those explorations that you did, Hamel, became part of the documentation. Because quite a lot of those explorations,
you made part of a PR to say, like, here's how this thing works. So I thought that was kind of cool, that your exploring in the notebook became

(16:36):
explorations that other people could then learn from.
Yeah. Definitely. It was really gratifying. You know, the learning also paid off. Anytime I would read code, I would say, hey, let me just add a little bit to the documentation here. Let me add another test if it's not clear. So that's what really got me hooked. I really saw the power of nbdev.
Because what frustrates me as a user

(16:57):
when I use any Python library
is lack of documentation.
I think documentation is really underrated.
And so that's something that, you know, nbdev
really promotes.
It allows you to just write it in a very natural way.
There are a number of other projects that work to complement the overall ecosystem of working with Jupyter Notebooks.

(17:18):
You know, there's the JupyterLab project to make it a little bit more like an IDE.
There are a number of different plugins to Jupyter itself.
And then there's also the overall ecosystem of other
notebook environments
beyond just Jupyter. And I'm wondering if you can just talk
to how nbdev compares to or complements some of those other tools either within or outside of the Jupyter ecosystem.

(17:41):
Yeah. JupyterLab
is an exciting
development
of Jupyter Notebooks.
The most recent version, version 3, that just came out a week or two ago,
includes an integrated graphical debugger,
which is a really cool step. The nice thing is that nbdev works fine with whatever
Jupyter Notebooks

(18:02):
host or Jupyter Notebooks server you're using. So nbdev works just as well with the classic notebooks
as with nteract, as with JupyterLab, or whatever you prefer.
So it's great to see how
the notebook community is
rapidly iterating and improving.
You know, other cool stuff happening in the notebooks world includes stuff like Voilà. Voilà is a system that lets you create graphical web applications

(18:28):
entirely in
Jupyter.
And JupyterLab
even now has a
beta version of
a drag and drop GUI builder that will create a Voilà app from a notebook for you. And, again, all this stuff integrates really well with nbdev because once you've got things working the way you want, nbdev will then let you turn that into a library that anybody can pip install or conda install, with continuous integration and tests and documentation.

(18:57):
Digging a bit more into nbdev itself, can you talk to how it's implemented and the feature set that it provides, and how the overall design and goals of it have evolved since you first began working on it?
There's a lot of features in nbdev.
Something that Jeremy just mentioned is continuous integration, which is really exciting. So
a lot of people don't understand continuous integration or find it very difficult. I mean, certainly, I,

(19:23):
when I
first learned about continuous integration, I thought it was pretty difficult
to get my hands around.
And so nbdev runs CI for
you out of the box without any intervention from the user.
nbdev allows you to write tests in notebooks in a very natural way. You don't have to learn a special API.

(19:44):
Like, for example,
if you wanna use pytest,
you don't have to learn pytest. You can just
write tests, like, with assert statements.
The nbdev machinery will find those and execute those as tests automatically,
and then it'll also run them in CI. So when you write your code and you push it, let's say, to GitHub,

(20:05):
it will run in
GitHub Actions for you and execute those tests and let you know whether or not all your tests are passing. So that's pretty advanced, you know, production level
best practices,
stuff that gets
done for you automatically.
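As an illustration of what "tests with assert statements" can look like, here is a minimal sketch of the kind of code a notebook might hold: a function followed by plain asserts in the cells after it. The `slugify` function is a hypothetical example, not something from nbdev or fastai, and no test framework is involved.

```python
def slugify(title):
    """Lowercase a title and join its words with hyphens."""
    return "-".join(title.lower().split())


# In a notebook these would sit in cells right after the definition.
# Each assert doubles as a usage example and as a test when the cells are run.
assert slugify("Hello World") == "hello-world"
assert slugify("  nbdev  makes   docs ") == "nbdev-makes-docs"
assert slugify("") == ""
```

If any assert fails, running the cells raises an `AssertionError`, which is all a runner needs in order to report the notebook as failing.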
And to get to that point, you literally just type one command. So there are various

(20:26):
command line tools installed with nbdev, and one of them is nbdev_new.
And that will create a project for you. And one of the things that's created in that project
is a GitHub Actions
continuous integration
runner. Now if you don't use GitHub, you use something else, you would obviously need to modify that a little bit to work with your CI,

(20:47):
but it's pretty straightforward to do that. And then you'll see that as soon as you
push,
you'll actually get an email saying,
oh, your continuous integration is currently failing. So you actually set it up so that it, like, shows you how to write and pass your first test. So, like, out of the box, you're actually being told about the fact that that continuous integration is there. It's set up for you, and it shows you how to get your first test passing.

(21:14):
Another really central feature of nbdev,
perhaps one of the most central ones, is how the documentation gets built.
So you don't have to know anything about HTML, CSS, web hosting, anything like that. You don't have to know Sphinx. I don't know Sphinx myself. You don't have to learn any kind of special

(21:34):
presentational
API thing.
Notebooks get rendered into documentation for
you and get hosted for you
on GitHub Pages.
So, you know, you don't really have to do anything. And the documentation
has a lot of nice touches to it that are added in for you automatically. So one of my favorite features of the documentation

(21:56):
is if you surround
a name of a module in backticks
either from your library or the Python standard library or other things, nbdev will automatically
introspect that and find the link
to the source
code and will create a link for that.
And not just modules, but also functions and also classes, pretty much any kind of symbol.
Yeah. Definitely. And, you know, it'll create a table of contents.

(22:21):
It will automatically
kind of expand documentation for you if you have, you know, docstrings.
It's very robust, so you can hide cells,
show cells, hide output, show output. You can have collapsible cells.
So it's really easy to use. It's very customizable.
That's another feature that is super exciting for me. All of these things

(22:45):
happen
from these simple command line tools I mentioned.
So one of the nice things about this is, you know, you can work in
whatever
environment you like, you know, because they're just tools that you run at the terminal. You can integrate them into any
scripts or processes or
whatever, and they'll integrate well with any other extensions that you're using and so forth. So a big part of the design of nbdev has been to ensure that it's

(23:13):
very flexible
and doesn't lock you into
any particular details about the tools that you're using
other than that you're writing stuff in notebooks.
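The backtick auto-linking mentioned a moment ago can be sketched roughly as follows. This is an assumption-laden illustration, not nbdev's implementation: the real tool is described as introspecting symbols, whereas this sketch uses a hand-written lookup table (`DOC_INDEX`, with one made-up URL) and a regular expression. `link_backticks` is a hypothetical name.

```python
import re

# Hypothetical symbol index: name -> documentation URL.
# The example.com entry is invented for illustration.
DOC_INDEX = {
    "load_spec": "https://example.com/docs/core.html#load_spec",
    "json.loads": "https://docs.python.org/3/library/json.html#json.loads",
}


def link_backticks(text, index=DOC_INDEX):
    """Turn `name` into a Markdown link when the name is found in the index."""
    def replace(match):
        name = match.group(1)
        url = index.get(name)
        # Unknown names are left exactly as written.
        return f"[`{name}`]({url})" if url else match.group(0)

    return re.sub(r"`([^`\n]+)`", replace, text)
```

For example, `link_backticks("Call `load_spec` first")` wraps the known name in a link and leaves any backticked name that is not in the index untouched.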
And in terms of the workflow of somebody who's using nbdev within a notebook environment to build a project, can you just talk through some of the steps involved? I know you mentioned the

(23:34):
commenting on certain cells and how you're able to mark them as being used for particular purposes, whether it's the code or the documentation or tests, et cetera, hiding and showing.
And, also, for somebody who is working in Jupyter, at what point should they start thinking about whether they want to bring nbdev in and just the overall experience of building a project with it?

(23:56):
When I start a new project, I always start by typing nbdev_new,
regardless of whether I actually think this is gonna end up being something that I
export into a library and documentation with nbdev or not, just because that's gonna create the, you know, the basic structure that I need regardless.
And there are certain, like,
nice little things that are gonna be created there. Like, if I type make release, it'll upload things to PyPI and Anaconda for

(24:24):
me. If I build a library, it'll create a README for me. So I can kind of, like,
get a bunch of nice functionality even if I don't actually need nbdev for that much stuff for a particular project. So I'll start by typing nbdev_new.
Pretty much anything I do regardless whether I'm creating a server or a command line application
or a, you know, model training library for deep learning or whatever, I'll start

(24:48):
in a notebook because a notebook is basically a REPL.
But it's a REPL that is highly flexible and is not just text and is
not just line oriented.
So it's kind of this incredibly
flexible, powerful REPL.
And so then I'll generally start exploring. You know, I very rarely know
exactly what I want to build and exactly how to build it. You know, I'll often now have to learn about some API I haven't tried before

(25:14):
or
try and implement an algorithm
or whatever. So I'll start
exploring.
And often just to help myself explore, I'll write little bits of markdown
prose here and there to kinda say, like... so for example, recently, I played with the GitHub API.
It has a fairly new OpenAPI
specification.

(25:35):
And I've never used an OpenAPI specification directly before, so I started just, like,
loading in the JSON, finding out what keys were in it, and so forth. And as I did that, I was just adding little bits and pieces of markdown to kind of explain to myself
as I went along
what it was that I was doing.
And, yeah, at some point, I kind of thought like, oh, okay. Those steps I just did look like a pretty good way to,

(26:01):
you know, pull the list of methods out of an OpenAPI specification.
So I merge them into a cell, create a function, and then at the top of that cell, I'd write #export.
And so that now is gonna be the first thing in my library.
And then the markdown
that's around that, along with the docstring

(26:25):
and the signature, will become the documentation
for that. So I can just kind of gradually
build out from there.
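The export step just described can be sketched as below. This is a simplified illustration, not nbdev's own exporter: it assumes the standard notebook JSON layout and treats any code cell whose first line is `#export` as part of the library, and the helper names are hypothetical.

```python
import json

EXPORT_MARKER = "#export"


def cell_source(cell):
    """Notebook JSON stores source as a string or a list of lines; return one string."""
    src = cell.get("source", "")
    return "".join(src) if isinstance(src, list) else src


def exported_source(notebook_json):
    """Join the bodies of code cells whose first line is the export marker."""
    nb = json.loads(notebook_json)
    chunks = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # markdown cells feed the docs, not the module
        lines = cell_source(cell).splitlines()
        if lines and lines[0].replace(" ", "") == EXPORT_MARKER:
            chunks.append("\n".join(lines[1:]).strip("\n"))
    return "\n\n".join(chunks) + "\n" if chunks else ""
```

Cells without the marker, such as scratch exploration or demo calls, stay in the notebook and never reach the generated module text, which matches the idea of exploring freely and exporting only what turned out well.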
I'm interested in understanding the scalability of this solution as you work on projects that grow

(26:46):
the notebook to export to. Now you don't have to
have everything export to the same module. A notebook is very customizable. You can have different cells export to different modules,
but, you know, you can also have a notebook
export to a module. So it's not that different than writing code in a text editor with regards to organizing that code. You know, oftentimes, we'll have one notebook per,

(27:13):
like, a one-to-one mapping almost between notebooks and Python files. So it scales pretty well. There's no issues that I can
see where scaling per se is a concern.
I mean, the fastai library, for instance, is a pretty big and complex library with many dozens of modules.
But, yeah, because as Hamel said, really, most of the time, it's just a notebook maps to a module. It doesn't really look any different to

(27:41):
any other kind of
Python library you would build.
Another aspect of working with notebooks is the ability to do out of order execution where, particularly if you're exploring, you start with cell 1, and then you get down to cell 15, and then decide, oh, I need this value back in cell 4.
And so you might go bouncing between various cells in, you know, a semi random order, and then you want to be able to ensure that everything actually works from top to bottom. And I'm just wondering what that looks like in terms of your workflow when using nbdev to build an exportable module and just ensuring that you

(28:18):
aren't confusing
the functionality
of the code as it is displayed with the inherent internal state that's built up over the course of working within that notebook?
The ability to bounce around and
manipulate the state in a notebook is kind of a much misunderstood
feature of the environment, which is actually critical to

(28:39):
all kinds of explorations.
So, for example, in deep learning,
often, it's gonna take a few hours to
train a model,
and you don't wanna, like,
have that few hours retrained
every time you modify a cell. You know? So the ability to have state
and manipulate it is critical.
Or if you've downloaded, you know, some big JSON data structure

(29:02):
and you don't wanna be having to deal with, like, figuring out what things to serialize and then load back and find some way to optimize
things so that you can work interactively.
It's just like using
your shell, whether it be bash or zsh or whatever,
that your shell is
stateful. You know, your file system is stateful.

(29:23):
You create files, delete files, move files, and depending on the order of things, you know, it's not
fully reproducible unless you rerun those commands in the same order. So a notebook's really like that.
Now as you say,
once you've done that, since you're gonna want to turn this into
a library, into a module or a bunch of modules that other people can run,

(29:46):
they are gonna run it from top to bottom. So both Jupyter and nbdev
have things to make this convenient.
Both the continuous integration and the
integrated interactive tests that can happen at your terminal with nbdev
run things from top to bottom, and they run every cell from top to bottom. So that will let you know if anything's not working.
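A top-to-bottom check of this kind can be sketched in a few lines. This is not how nbdev or Jupyter run notebooks (they execute cells in a real kernel); it is a simplified stand-in that assumes plain Python cells with no IPython magics, runs them in order in a fresh namespace, and reports the first cell that fails. `run_top_to_bottom` is a hypothetical name.

```python
import json


def run_top_to_bottom(notebook_json):
    """Execute code cells in order in a clean namespace.

    Returns (ok, failing_cell_index, error_text).
    """
    nb = json.loads(notebook_json)
    namespace = {"__name__": "__main__"}  # fresh state, like a restarted kernel
    for index, cell in enumerate(nb.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        src = cell.get("source", "")
        code = "".join(src) if isinstance(src, list) else src
        try:
            exec(compile(code, f"<cell {index}>", "exec"), namespace)
        except Exception as err:  # includes AssertionError from failing tests
            return False, index, f"{type(err).__name__}: {err}"
    return True, None, None
```

A notebook that only worked because cells were run out of order, for instance one that uses a variable before the cell that defines it, fails here at the first offending cell, which is the situation the clean top-to-bottom run is meant to catch.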

(30:07):
And then Jupyter itself
lets you run every cell from top to bottom starting out with a clean state. And unfortunately, out of the box, it doesn't come with a key binding. So one of the first things I do when I set up a new machine, although nowadays it's all automated, and I always tell my students this, is put a key binding
on the restart and run all

(30:29):
command in Jupyter, because that's something that you wanna be running from time to time just to double check that everything's working smoothly.
We've talked a bit about this as far as the
experience and the change in perspective that comes from working in a notebook and using this literate environment
and how that influences your approach to software engineering. But I'm wondering if you can just talk through some of the more detailed aspects of

(30:56):
how you change your approach to writing software if you're in a text editor such as VS Code or Emacs or Vim versus working in Jupyter
and just how that changes the way you think about the project design and the approach to building the software.
When developing software in notebooks like this,

(31:16):
one thing that has changed for me is compared to a text editor where you might have a bunch of code and, you know, you have various functions,
and those functions may have entry points.
It's unclear, like, what the entry point to that function is or what code path leads to that function.
So debugging can sometimes be a little bit complicated. But when you develop code in notebooks, along with the documentation,

(31:41):
you're creating
a playground
where
you want to show everybody what is the entry point to that function, how to execute it, what are the dependencies.
You know, you kinda create this environment with the minimal dependencies required
to execute that function or method, and that is really powerful.
And you want to also be able to do that to specify your tests in a convenient way. That is one thing. Another thing is I try to simplify my code a lot. Because

(32:09):
when you're writing documentation,
if something is trying to do too many things,
you know, that can be really
painful
for you while you're, you know, trying to explain it. So it really forces you to write better code.
One of the things, actually: Hamel was talking about kind of having this playground to explore, and
there's a feature in nbdev, an optional feature you can turn on in the configuration, that will automatically add a

(32:33):
launch in Colab button at the top of every page of the documentation.
So Colab is a free online
Jupyter
environment.
And so this means that you can literally click a button
or your users can click a button in your documentation, and instantly, that documentation has been converted from
something you read

(32:54):
to something you interact with. And that's really great because
I love working with other people's nbdev
libraries because I can click that button, and then I can start actually experimenting
with the examples they have in their documentation.
I mean, overall,
you know, I've been coding for, gosh, many, many decades,

(33:15):
and I find that working in Jupyter Notebooks and nbdev, I am
some multiples
more productive
than I am
using
VS Code or Visual Studio or Vim or, you know, others. I've used a lot of different environments.
And I hear this a lot from other people as well. Quite a few people, you know, come on to our Discord chat and

(33:40):
say, my workplace,
you know, has not standardized on nbdev, and I have to use something else.
And, literally, we hear people talking about
sharing
stories of which companies
let you use nbdev, and people are, like, talking about quitting their jobs in order to go to another job where they can use nbdev. That's, like, the level of

(34:01):
love that people have for using this and frustration they have when they can't.
I think it's really counterintuitive
to people that there can be
a much better
development
environment and way to develop software because those tools haven't changed for so long.
And when you say that to someone, it's almost like a disbelief.
Like, what are you talking about? They look at you like you're a quack.

(34:24):
But it's only when you try it that these things become apparent to you and you realize, hey, I am a lot more productive. My code is more maintainable, and I'm spending a lot less time toiling away on these, you know, tasks I don't care about. And so, yeah, I think that's what we're
seeing.
And for somebody who has an existing project that has been written in just the quote unquote standard fashion of just flat files

(34:50):
that they're organizing into a hierarchical structure,
what is the process of converting that to use nbdev
and moving from the
previous approach of my documentation lives here, my code lives over here, my tests are in a different place, and merging them all back together in a more natural form?
There

(35:10):
are tools out there
which
will help do that for you. It's important to remember that a notebook is just a JSON file,
and in the JSON,
each cell basically is part of a JSON array,
and so then there's a dictionary with one attribute that says whether it's a code cell or a markdown cell and one attribute saying what the contents are. So it's actually trivially easy to

(35:34):
turn
a Python
module source code file back, you know, into a JSON file,
splitting
each function or class into a cell. And so there are tools that'll do that for you,
but that's only
the first part of the process because to actually take advantage of this properly,
you really wanna be thinking about the flow

(35:57):
of that notebook, in that somebody reading it is not just reading it. Hopefully, they're interacting with it. So I would kind of start with some automated tool to create a notebook that basically does the job
from the source code of the module.
And then I'd start, you know, looking at my tests and thinking, okay, well, which of these

(36:17):
are really quite descriptive of what this module's really doing? Can I turn those into kind of documentation
tests?
You know? And then what things in the documentation
can I kind of integrate with those and, you know, just gradually bring it together one piece at a time? You don't have to do it all at once.
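As a rough illustration of the notebook structure described above (a JSON array of cells, each a dictionary recording the cell type and its contents), here is a minimal sketch that splits a module's source into one code cell per top-level statement. The function name `module_to_notebook` is made up for this example, it uses only the Python standard library (Python 3.8+ for `end_lineno`), and it is not one of the conversion tools mentioned in the conversation.

```python
import ast
import json


def module_to_notebook(source: str) -> dict:
    """Build a notebook-shaped dict with one code cell per top-level statement."""
    lines = source.splitlines()
    cells = []
    for node in ast.parse(source).body:
        # ast line numbers are 1-indexed; start at the first decorator, if any.
        decorators = getattr(node, "decorator_list", [])
        first = min([node.lineno] + [d.lineno for d in decorators])
        cells.append({
            "cell_type": "code",       # a prose cell would say "markdown" here
            "metadata": {},
            "execution_count": None,
            "outputs": [],
            "source": "\n".join(lines[first - 1:node.end_lineno]),
        })
    # nbformat 4.4 layout: per-cell ids are not required at this minor version.
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


example = (
    "import math\n\n"
    "def area(r):\n    return math.pi * r ** 2\n\n"
    "class Circle:\n    pass\n"
)
notebook = module_to_notebook(example)
print(json.dumps(notebook, indent=1)[:200])
```

This sketch drops comments that sit between top-level statements and produces no markdown cells, which is the part the conversation says still needs to be done by hand: deciding on the flow of the notebook and weaving in the prose and tests.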
And another interesting point is how nbdev

(36:40):
integrates with the rest of the Python ecosystem.
I'm thinking in particular about things like dependency management, whether you wanna use pip or
Poetry
or
pip-tools
and
how it fits with things like linting
and just the overall
integration points that are available
for using nbdev for actually

(37:02):
building the project, but also taking advantage of the rest of the developer tooling that exists for people using Python and building Python projects?
The integration is pretty good. You mentioned pip, for example.
So
nbdev
automatically
generates
standard setuptools
setup packages. One of the nice things about it is that you have a single configuration

(37:27):
file that
your version number and description and so forth are in. So for something like your PyPI package, when it's uploaded,
that'll all be used. It'll automatically use your
index.ipynb notebook file to create the
description that will appear on PyPI.
You know, things like Poetry and stuff are not particularly

(37:49):
either here or there. You can use whatever
environment
you like.
Most of the developers of nbdev
generally use conda environments,
but you can do whatever you like there.
For
linting,
that's pretty orthogonal to nbdev. You can use whatever linter you like.
JupyterLab

(38:10):
has
extensions that let you plug into whatever linter
you prefer,
or you could do it as part of GitHub Actions, again, working on the JSON file.
It's not
opinionated at all about
what the rest of your environment should look like and what other tools you might use.
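To make the single-configuration-file idea concrete, here is a minimal sketch of reading package metadata from one INI-style file and turning it into the keyword arguments a setuptools `setup()` call would take. The key names (`lib_name`, `version`, `description`), the sample values, and the helper `setup_kwargs` are assumptions for illustration only, not a statement of nbdev's actual settings format or internals.

```python
import configparser

# Assumed example contents of a single project-wide settings file.
SETTINGS = """
[DEFAULT]
lib_name = mylib
version = 0.0.1
description = An example library developed in notebooks
"""


def setup_kwargs(ini_text: str) -> dict:
    """Map settings-file entries to setuptools-style keyword arguments."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    cfg = parser["DEFAULT"]
    return {
        "name": cfg["lib_name"],
        "version": cfg["version"],
        "description": cfg["description"],
    }


kwargs = setup_kwargs(SETTINGS)
print(kwargs)
# A setup.py could then call setuptools.setup(**kwargs).
```

Keeping the version number and description in one place means the package metadata and the documentation are generated from the same values rather than being edited separately.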

(38:30):
In terms of people who are using nbdev and building things with it, what are some of the most interesting or innovative or unexpected ways that you've seen it used?
There's a lot of cool things that I've seen. So one example that sticks out in my mind is what Jeremy was describing earlier about the
Python client for GitHub's API that uses the OpenAPI spec. You know, if you go through GitHub's documentation,

(38:53):
you have to click on 20 different pages to see all the endpoints.
But because he's generating things from the OpenAPI spec, there's, like, a one-pager of, like, all the endpoints,
and that's linked to all the various
things that you need to know about using that endpoint.
And that's integrated deeply into the documentation

(39:13):
itself
for the Python client also.
And so
when you try to use the Python client,
it's called ghapi,
and you call help on an endpoint, you get a link in the docs
that takes you to the GitHub
documentation
for the endpoint.
I've seen some really cool things people have done with documentation
to make the documentation richer

(39:35):
with regards to linking to other relevant sources automatically.
I think that's really cool.
In terms of your experience
of building and
working with nbdev, what are some of the most interesting or unexpected or challenging lessons that you've learned in that process?
I mean, the interesting lesson for me overall
is the power of

(39:57):
this type of software development.
It's fairly under the radar. People
don't know about it, but it certainly
is a really good insight into
how powerful this technique is for writing software.
And it also
gives you a window into, like, maybe how these tools could improve,

(40:17):
you know, in the future.
I guess one thing that's been a challenge,
that's been slightly surprising,
is that some people kind of push back against the very idea of using anything other than a standard text editor to create code.
I find that
very new programmers
and extremely experienced programmers

(40:40):
are very interested in nbdev
and really wanna try it out.
But
there's a group of people who are, like, kind
of at the 3 to 8 year
experience mark, or kind of intermediate-level programmers,
who
seem to find it almost threatening,
the idea that
people might wanna use something other than Emacs or Vim or VS Code or something, and people sometimes get quite emotional

(41:07):
about, like,
no. That's not how real software engineers write real code. And this kind of emotional response
from some people is not something I expected.
I'm actually not surprised by that
dynamic,
I suppose.
I find that with many tools,
there often is a resistance to change,

(41:27):
especially something like developer tools. So people have just gotten used to the idea that they've
taken developer tools for granted, and they haven't changed, and people are resistant to the idea that
it could be an order of magnitude better.
So
I was skeptical too.
When I first got into it, I said, okay. I mean, certainly,

(41:49):
you know, developer tools are kind of a staple.
They would have improved
themselves if they could have. Like, you know, how is it that much better? You know, I tried it and I was really surprised.
Another interesting
phenomenon
is, so there's this book called Working in Public: The Making
and Maintenance of Open Source Software by Nadia Eghbal. She's a former GitHub employee. She's studied

(42:12):
a lot of open source projects
and kind of the dynamics of them. One thing that she
documents in her book is
the deluge that maintainers face
in terms of low quality pull requests that they have to deal with.
And I've talked with Jeremy about this before. Like, we don't really see that across fast.ai. And fast.ai has tons of projects and is extremely popular on GitHub,

(42:38):
and it has a lot of activity.
And I think the reason for that is nbdev
kind of forces you to write high quality PRs.
And so I think it saved Jeremy's sanity
as a side effect,
which is really interesting from a maintainer's perspective
and the open source

(42:58):
economy perspective.
Yeah. And speaking as somebody who uses Emacs and has become very comfortable there, you know, the thought of editing in my browser
is painful, in that I've gotten so used to the keyboard commands,
but I'm also very attracted
by the possibility
of
weaving together the code and the documentation and the tests because it can be all too easy to be working in a text editor and write the main body of the functionality of the code and then say, okay. Well, I'll get back to the tests another time.

(43:29):
You know, you can have, you know, your tests open in one window or in one buffer and your code in another and bounce between them. But, you know, I'm definitely interested in experimenting with nbdev to see how it works. But I'm also curious
what level of support there is for people who are very comfortable in their text editing environment, but still want to be able to take advantage of what nbdev has to offer.

(43:51):
That describes me very much. I've been coding for many decades, and
as you can imagine, I'm in love with tooling
since I invest so much in tooling.
So I, yeah, I know every keyboard shortcut pretty much of every,
you know, piece of software I use. So I certainly

(44:12):
love to jump into
Vim and, you know, do some stuff with a quick macro
or some motion commands or whatever.
And, yeah, that's fine. You can do that with nbdev. You can edit the modules, the text files directly,
and sync back into the notebooks automatically.
I will say though that the more I use

(44:34):
nbdev and notebooks, the less I find myself
doing that.
I used to do it a little bit, but it was mainly kind of habit.
It's very nice to be able to
jump around to cells
rather than code and to kind of jump through hierarchies.
You can kind of, like,
hack together hierarchies in

(44:56):
Emacs and Vim and so forth, but I really like the true
hierarchical nature of notebooks that you can create actual headings and
stuff like that. So, yeah, you can certainly use your own editor if you want to, but I find I do it less and less.
For people who are interested
in experimenting with nbdev,

(45:18):
are there any problem domains or
integrations with existing libraries or workflows where you see nbdev as being the wrong choice or something that is incompatible
with the existing environment?
Hamel and I are both working on something that doesn't lend itself very well to nbdev, which is we're working on build tools.

(45:39):
So we're doing a lot of stuff with, you know, makefiles and
conda packages and automatic build systems
running on GitHub Actions.
And so there's basically almost no Python involved, and it's, yeah, it doesn't lend itself particularly well to nbdev. We were just saying to each other this morning that we wished it did, because we

(46:02):
aren't really enjoying being outside of the notebook environment.
This is something I'd like to improve actually because Jupyter can run kernels other than Python. There's a bash kernel, for example, which is kind of cool, and I've written some nice documentation using the bash kernel.
To my surprise,
I found
that nbdev and notebooks work very well for creating servers.

(46:23):
I didn't really expect that at first, but, actually, I found I could write servers with nbdev very nicely.
So, yeah, generally, I mean, I haven't found too much stuff that is largely Python based which
isn't suited to nbdev. I don't know if you have, Hamel, or thought of other things like that.
To be quite honest, at this point, nbdev is like crack to me. Like, I just

(46:45):
it's hard not to use it. It's very painful not to use it.
As you continue to work on the project, what are some of the plans that you have in store for the near to medium future?
We are doing a rewrite of nbdev at the moment. I tend to rewrite
my major pieces of software every year or 2, which I really like.

(47:06):
And
the new version's gonna be orders of magnitude faster.
We're also looking at replacing the Jekyll based
documentation
with Hugo based documentation.
Again, one of the reasons there is
performance: Hugo is really fast, which is very nice. We kind of love
working with tools that

(47:27):
are fast enough that things feel almost instant.
That definitely isn't the case with Jekyll. You know, one of the things I've been thinking about also is supporting
directly building
C-based extensions
by integrating Cython
with nbdev.
So those are some of the big things that we're hoping to implement in the coming months.

(47:47):
Are there any other aspects of the nbdev project or working in the notebook environment that we didn't discuss yet that you'd like to cover before we close out the show?
I think, you know, one thing that we may have not covered is fastcore. So, you know, fastcore is kind of an extension. You can think of it as an extension almost
to the Python programming language.

(48:08):
I mean, don't take those words literally, but, you know, it adds a lot
of functionality
that's easy to access. That's important for nbdev because there's a lot of utilities
in fastcore
that make
using
Python and nbdev a lot easier.
So for example,
let's say you have a really big class

(48:29):
that has tons of methods in it, and you wanna write prose
that surrounds your code. You might want to
define, like, a method in a different cell. You might not want this one giant cell for, let's say, you know, your class.
Well, fastcore gives you easy ways to kind of break up that class so you can just pull the methods out into a different cell. It's all tested and works with the integration testing in nbdev. So there are a lot of utilities that just make your life a lot easier. I would recommend checking that out. It's a very interesting library.
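The idea of defining a class in one cell and adding methods to it in later cells can be sketched in plain Python. fastcore provides a `patch` decorator for this; the `attach_to` helper below is a hypothetical stand-in written with only the standard library, so it shows the general technique rather than fastcore's implementation.

```python
def attach_to(cls):
    """Return a decorator that adds the decorated function to `cls` as a method."""
    def decorator(func):
        setattr(cls, func.__name__, func)
        return func
    return decorator


# "Cell 1": the class starts small, with prose around it in the notebook.
class Circle:
    def __init__(self, radius):
        self.radius = radius


# "Cell 2": a method added later, next to the text that explains it.
@attach_to(Circle)
def area(self):
    return 3.14159 * self.radius ** 2


# "Cell 3": a quick check that doubles as documentation.
assert abs(Circle(2).area() - 12.56636) < 1e-9
```

Because each method lives in its own cell, the explanation and a small test can sit directly beside it instead of being buried in one giant class definition.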

(49:01):
For anybody who wants to get in touch with either of you and follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. And with that, I'll move us into the picks.
And this week, I'm going to choose an audiobook I listened to with the family recently called Rivals, Frenemies Who Changed the World. It's just a really fun production of some
short pieces from history about different people who were

(49:24):
at one point friends and then ended up creating these historic rivalries and the impact that that's had on our modern world. So a lot of fun stories there and just fun production value to keep kids interested in learning some history. So definitely recommend checking that out. And so with that, I'll pass it to you, Jeremy. Do you have any picks this week?
My pick is the game of chess,

(49:44):
which I
always assumed was
really boring until
my 5 year old daughter started getting into it. And so we started playing a bit together, and I suddenly discovered it's actually
really deep and much more fun than I expected.
Yeah. I'll definitely second that one. And, Hamel, how about you? Any picks this week?
Actually, Jeremy recommended this book to me, which I've been reading with great

(50:08):
interest and surprise.
It's called Moonwalking with Einstein
by Joshua
Foer.
Before reading this book, I thought
not having a good memory was a sign of stupidity.
But, actually,
this book goes really deep, in great detail, about, like, how memory works,
common
misconceptions about memory,

(50:29):
how people that have good memory, what techniques
they often use, and what it means. So it's really fascinating.
Well, thank you both for taking the time today to join me and share the work that you've been doing with nbdev. It's definitely a very interesting project and one that I'll have to experiment with myself to try and understand the
benefits that it can provide to my own development. So thank you for the time and effort you've put into that, and I hope you enjoy the rest of your day.
Thank you very

(50:55):
much.
Thank you for listening. Don't forget to check out our other show, the Data Engineering
Podcast, at dataengineeringpodcast.com
for the latest on modern data management.
And visit the site at pythonpodcast.com
to subscribe to the show, sign up for the mailing list, and read the show notes.
And if you've learned something or tried out a project from the show, then tell us about it. Email host at podcastinit.com

(51:18):
with your story.
To help other people find the show, please leave a review on iTunes and tell your friends and coworkers.
