November 9, 2020 • 36 mins

Summary

The CPython implementation has grown and evolved significantly over the past ~25 years. In that time there have been many other projects to create compatible runtimes for your Python code. One of the challenges for these other projects is the lack of a fully documented specification of how and why everything works the way that it does. In the most recent Python language summit Mark Shannon proposed implementing a formal specification for CPython, and in this episode he shares his reasoning for why that would be helpful and what is involved in making it a reality.

Announcements

  • Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
  • When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
  • Do you want to get better at Python? Now is an excellent time to take an online course. Whether you’re just learning Python or you’re looking for deep dives on topics like APIs, memory management, async and await, and more, our friends at Talk Python Training have a top-notch course for you. If you’re just getting started, be sure to check out the Python for Absolute Beginners course. It’s like the first year of computer science that you never took compressed into 10 fun hours of Python coding and problem solving. Go to pythonpodcast.com/talkpython today and get 10% off the course that will help you find your next level. That’s pythonpodcast.com/talkpython, and don’t forget to thank them for supporting the show.
  • Your host as usual is Tobias Macey and today I’m interviewing Mark Shannon about his efforts to create a formal specification for the CPython interpreter

Interview

  • Introductions
  • How did you get introduced to Python?
  • Can you start by describing the current state of how the Python language and the CPython runtime are defined?
  • What is your motivation in advocating for a specification?
    • After ~25 years of the language, why is now the time to pursue this effort?
    • How does the history of the language and the scope of the ecosystem and community impact the effort required to make this a reality?
  • What is involved in creating the specification and where would it be located once complete?
    • What are some examples of languages that are formally specified?
  • What are the possible benefits of creating a specification for the CPython virtual machine?
    • What is the distinction between a specification for the VM as opposed to a specification for the language?
  • What are some potential downsides to having a (semi-)formal specification become part of the definition of the interpreter?
  • Can you describe the process of doing the work to create the specification?
  • How are you approaching the actual definition of the specification (e.g. prose vs programmatic)?
    • What are the tradeoffs of prose vs. an executable specification?

Episode Transcript

Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Unknown (00:13):
Hello, and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
When you're ready to launch your next app or want to try a project you hear about on the show, you'll need somewhere to deploy it. So take a look at our friends over at Linode.
With the launch of their managed Kubernetes platform, it's easy to get started with the next generation of deployment and scaling powered by the battle tested Linode platform,

(00:34):
including simple pricing, node balancers,
40 gigabit networking,
dedicated CPU and GPU instances, and worldwide data centers.
Go to pythonpodcast.com/linode,
that's L-I-N-O-D-E, today and get a $100 credit to try out a Kubernetes cluster of your own. And don't forget to thank them for their continued support of this show. Your host is Tobias Macey. And today, I'm interviewing Mark Shannon about his efforts to create a formal specification for the CPython interpreter. So Mark, can you start by introducing yourself?

(01:05):
Yeah. I'm Mark Shannon,
a Python core developer.
I've been using Python since 2005 ish.
I started using it when I was doing my master's degree on building a C compiler for a stack machine. And it's one of those things where you just need to do little tasks like generate tables of data and all sorts of things.
And,

(01:25):
you know, you just find that C or Java is just a pain.
And something like PHP is just even worse. And then I just came across Python, and it was just perfect. So I've been using it ever since.
And so can you start by describing the current state of how the Python language and the CPython runtime are defined?
If you're the PyPy people, effectively, you have to just

(01:47):
assume that CPython is the definition of the language and just try and be bug-for-bug compatible.
There is a sort of specification on, like, for users. So there's object model,
how the syntax works, etcetera,
Which is very useful if you're learning the language, well, probably not great for initially learning the language. But once you've learned it, sort of, you know, to look stuff up and, like, what's this supposed to do? But in terms of a detailed specification

(02:12):
for
continued CPython development
and development of other potential
virtual machines like PyPy
is sort of lacking.
And so you mentioned
how CPython is the reference implementation, and everything has to define itself
in terms of whatever CPython happens to be doing.
And what that is can change from version to version.

(02:35):
And
so I'm curious if you can discuss what the motivation is for actually trying to advocate for a more formalized
specification of the language
and
whether
the specification
is then tied to CPython itself or if it is a body
apart from that, and all implementations of Python should be trying to adhere to that? I think initially it would just be for CPython, but there's no reason why

(03:03):
we can't differentiate
what's sort of CPython specific
and what's more generally just Python.
The motivation really is
this sort of evolution
of the language
and the runtime.
If we talk about changing things, it's
you know, just looking at patches
of C code

(03:23):
or sort of informal discussions about what we're gonna do, whether, you know, how the language will change if you add a new feature.
It's very hard to see the sort of odd corner cases
or to work out the sort of long term ramifications
of things. So I'm gonna use PEP 380 as an example. So PEP 380,
the title is "Syntax for Delegating to a Subgenerator", but you probably know it as just yield from; it's just the yield from keyword.

(03:49):
Now that's a nice little feature.
It's pretty well defined in the PEP.
It's got a big long chunk of code, sort of equivalent sort of behavior.
But nonetheless
that produced quite a few odd corner case bugs and in fact it has taken quite a long time to knock out all these sort of obscure little bugs
to do with handling,

(04:10):
throw,
various other little corner cases. Most of which, almost no one would notice, but they sort of just crop up occasionally.
And those sort of bugs, I don't know if we would have got rid of them if we'd had a more formal spec, but I think it's likely that we would have more likely seen those upfront.
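For readers unfamiliar with the feature under discussion, here is a minimal sketch of `yield from` delegation, including the `throw` path that Mark mentions as a source of corner cases. The generator names `inner` and `outer` are made up for illustration and are not taken from PEP 380.

```python
def inner():
    """Subgenerator: receives sent values and thrown exceptions directly."""
    try:
        received = yield "inner: first"
        yield f"inner: got {received!r}"
    except ValueError as exc:
        yield f"inner: caught {exc}"
    return "inner result"


def outer():
    """Delegating generator: `yield from` forwards next/send/throw to inner()."""
    result = yield from inner()  # result is inner()'s return value
    yield f"outer: {result}"


# Normal path: values sent to outer() reach inner().
gen = outer()
print(next(gen))          # inner: first
print(gen.send("hello"))  # inner: got 'hello'
print(next(gen))          # outer: inner result

# Exception path: throw() on outer() is raised inside inner().
gen2 = outer()
next(gen2)
print(gen2.throw(ValueError("boom")))  # inner: caught boom
print(next(gen2))                      # outer: inner result
```

Even in this small case, the delegating generator has to forward three different kinds of resumption and then pick up the subgenerator's return value, which gives some sense of where obscure bugs can hide.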
Python itself has been around for on the order of 25 years at this point. I'm curious why you see now as being a good time to pursue the effort of formalizing the language and the runtime?

(04:36):
So I think it's basically now is a good time because
I can't do it 20 years ago or 10 years ago. It's the old saying, what's the best time to plant a tree? You know, it's 20 years ago, but if you can't do that, do it now. So I think it's a long term useful thing, and would be nice if we'd had it 5, 10 years ago, but we didn't. So let's do it now.
And so because the language does have all of this history accrued at this point, and there is this entire ecosystem that has grown up around it in terms of libraries and packages and other implementations of the runtime.

(05:08):
I'm curious how that overall scope and weight
impacts the overall effort required to actually bring a specification
to fruition?
If it were to be a complete,
pretty formal specification, that would make it a huge job.
But I still think
a specification of, like, the core language itself is worthwhile, even if we're a bit fuzzy about some of the interactions with the C API

(05:33):
or things of that sort of level
are vague. And the other thing is it doesn't have to
I mean, it can grow. If the spec's kind of almost there
for, say, a new language feature or
interaction with some important library,
then, you know, specifying changes to those things, it might be useful to add those things to specification as part of that.

(05:55):
It's just the nature of open source stuff is it's, you know, if there's something there that's almost there, people are motivated to sort of push it a little bit further to get what they want. So somebody has to, like, do enough of the work to get things going.
You brought up the topic of the specification
during the language summit in the lead up to what was supposed to be PyCon this year. I'm wondering if you can summarize some of the reactions of the other folks who are engaged in that conversation.

(06:23):
I yeah. I think general
skepticism. I think people need convincing there's any purpose in it. A few people sort of see value in it and were interested in helping. It's not obvious what the value is. I think
it's a useful way to talk about
changes to the implementation:
is the implementation doing the right thing?

(06:43):
If you're not dealing with actually implementing Python, it's probably not that much value.
And you mentioned, for instance, the folks who are building PyPy and the fact that they have to try and be
both feature and bug compatible with CPython to ensure that people who are writing programs to run on this other execution engine will function as intended.

(07:06):
And I'm wondering if you can just talk through some of the potential benefits of what the specification might provide for people who are working on the CPython virtual machine and people who are working on other implementations such as PyPy or RustPython or Jython.
Developing PyPy has found a lot of bugs in CPython
where experimenting, well, what happens if you do this? And CPython behavior is

(07:31):
somewhat odd.
Sometimes we kinda have to specify those as that sort of behavior for historical reasons. Other times it's been changed and got fixed.
So
I think then it's
then having a model to sort of say, well, this is how it should work. So, you know, here's what we thought the model was and here's what CPython does. Is the model wrong or is CPython wrong? As opposed to

(07:53):
you know, just having a bunch of test cases and saying this is what it should do. Obviously, the test cases should correspond to the model,
which is an interesting thing of like, well, how do you actually verify that a model matches the tests? But I think we're gonna come on to that later.
In terms of
the actual act of specifying a language, I'm wondering if you can give some context of other ecosystems and other run times that do have a more formalized specification and what you see as being the overall impact in the direction that that language has taken or the value that it provides to those ecosystems?

(08:28):
Python itself does have a reasonable sort of specification.
It's
not really sort of semantics, though, sort of operational or denotational semantics.
So the idea is to have a model of like an abstract machine upon which
the language runs. The idea is you specify the translation to that machine and the operation of the machine, that's sort of operational semantics.

(08:51):
Java has a virtual machine specification and a language specification,
which isn't quite the abstract machine specification. It's more detailed in parts about how it actually operates
in the physical world. But for C++ there is an abstract machine, in which
supposedly
your C++ program will translate to machine code, obviously, but it's as if it were translated onto this abstract machine and running on that. C++ is an extremely complicated language,

(09:19):
so it's unclear how well it maps.
ML, which is not machine learning, but ML, the programming language from the 1970s, did actually have a full formal specification.
But I'm not sure how much value that added because
I think one of the things about formal specifications
is
you can determine properties of a system from it.

(09:39):
So they're very useful if you have things like you want to specify a formal sort of state machine model or something, and you want to say things about properties of your system, like it won't deadlock or
something like that. Whereas I think for language specification, that's not really that useful. What you really want is just, is my implementation doing the right thing? Do we even agree on what it's supposed to do?

(10:01):
So I think
a slightly less formal specification is what we want there. So other languages here, I covered ML, Java, C++.
No doubt there are others, but I couldn't tell you off the top of my head.
One of the things that you mentioned there is the intent of a particular
operation, and that's always one of the hardest things to be able to understand as somebody coming into a code base who wasn't there at its creation because

(10:28):
that information can easily be lost because it just lives in somebody's head, you know, for the duration of them actually working on it, and they themselves will likely forget about it if they try to come back to the same piece of code.
And so given that, I'm wondering
what your thoughts are on the viability of being able to try to
reconstruct what the intent is of various behaviors in CPython, given that the people who implemented it either might not be part of the community anymore, or they might just not even remember what the purpose of that was at the time of creation.

(11:00):
I guess it's kinda hard to know what people were thinking when they wrote code. You only have the code to go by.
But it's reasonable to assume
that any weird peculiar corner cases, unless they're documented specifically,
are just
likely to be errors rather than deliberate choices if they
conflict with the sort of general
idea of things. And there's also the problem, of course, that code is written with two things in mind: to do the right thing, but also, because it's a programming language, to do it as quickly as possible.

(11:28):
So those two aims of, sort of, clarity and precision versus performance are often in conflict. And
often these corner cases are where things have been made faster and they're made faster incorrectly.
This can happen
fairly easily, unfortunately, especially because it's all written in C.
So having a specification where you say, right, this piece of syntax translates into this sequence of operations and these operations individually do these things,

(11:55):
with no interest in performance whatsoever, we just don't care. So we can make that entirely based around clarity.
And I think that helps a lot.
It's much easier to describe
things by saying, this pushes this thing onto a stack. Everyone knows what a stack is.
But if it's written in C, you have to worry about buffer overflows,

(12:15):
pointer arithmetic,
and all sorts of stuff. And the general intent often gets lost in the details.
That also brings up the point of the specification
being tied to the virtual machine itself of CPython
versus the language, which is defined by the syntax
for which you can use the grammar that's embedded in CPython to understand

(12:36):
what is the actual allowable structure, but that, again, doesn't necessarily convey the appropriate semantics. So I'm wondering if you can maybe just draw the line between
creating a specification for the CPython VM versus creating the specification
for Python the language.
The idea here is that we have an abstract machine rather than a virtual machine. So a virtual machine, despite the name, is a real machine; you can actually run code on it. I mean, the CPython virtual machine, Java virtual machine, whatever

(13:05):
virtual machine you choose will run programs, it will produce real output, it'll heat your computer up, hopefully do something useful while doing so. Whereas an abstract machine is really just a pen and paper exercise.
So it's described
in terms of, you know, you probably describe it in the way you would describe a program in terms of data structures and how things

(13:26):
change those data structures.
But they don't need to be restricted by
practical or finite machines.
You don't have to worry about what happens if someone keeps multiplying integers together until they run out of memory. You don't have to worry about does the stack overflow, you don't have to worry about any of that sort of stuff. There's nothing stopping you using

(13:46):
potentially
algorithms that wouldn't terminate
or crazily inefficient
as long as the meaning is clear
because
their actual run time isn't really an issue at all. So this is the idea of abstract versus virtual machine. Once you have an abstract machine, it's a much easier thing to define a language
on because you're not having to define all these little details that are largely irrelevant. You can just say things like memory

(14:13):
there is as much memory as you need, it's automatically managed. That's all you need to say. You don't say anything about how it works, any timeliness guarantees, or any of these things that you might need to worry about for a real machine.
In terms of the actual effort to define the specification, I'm wondering if you can just outline the approach to actually
committing that to paper and writing it down and what's involved in doing the research to

(14:38):
identify the areas that need to be specified.
And then once it is created, where that specification might live to ensure that it is accessible to people at the time that they need it.
I'm currently working on a Git repo. I'll put a link in the notes. Currently, what you wanna do is you wanna say, okay, here's a Python program,
what will it do? And I'll be able to sort of describe to you using various elements of these sort of semantics how it would work. There's various stages in translating a Python program into some sort of machine

(15:10):
description, whether it's an abstract machine or a virtual machine. And first we have to tokenize it, parse it, and so on.
The specification
of that I think is pretty well done because
the grammar itself is a fairly formal specification of the translation from source code
to AST. So picking up from the AST, if we can describe the translation

(15:32):
from AST into a series of abstract machine operations
and then describe the abstract machine operations. We have, at that point, a more or less
end to end description of how Python source code runs and what it does.
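As an illustration of the translation step described here, the following sketch uses the standard `ast` module to parse an expression and then maps the AST to a flat sequence of stack-machine operations. The operation names (`LOAD_NAME`, `PUSH_CONST`, `ADD`, `MULTIPLY`) and the `translate` function are hypothetical; they are not CPython bytecodes and not the operations in Mark's specification.

```python
import ast


def translate(node):
    """Translate a small expression AST into hypothetical abstract operations."""
    if isinstance(node, ast.Expression):
        return translate(node.body)
    if isinstance(node, ast.Constant):
        return [("PUSH_CONST", node.value)]
    if isinstance(node, ast.Name):
        return [("LOAD_NAME", node.id)]
    if isinstance(node, ast.BinOp):
        op_names = {ast.Add: "ADD", ast.Mult: "MULTIPLY"}
        # Operands are evaluated left to right, then the operator is applied.
        return (
            translate(node.left)
            + translate(node.right)
            + [(op_names[type(node.op)], None)]
        )
    raise NotImplementedError(type(node).__name__)


# Source code -> AST (defined by the grammar) -> sequence of operations.
tree = ast.parse("x + 2 * y", mode="eval")
ops = translate(tree)
for op in ops:
    print(op)
```

The translation is written for clarity only: it is a recursive walk with one rule per kind of syntax, which is the shape a semi-formal specification of this stage could plausibly take.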
So
there are basically two
parts to that. There's the translation, how does the AST map to a sequence of operations,

(15:57):
and
what do those operations do. So I guess there are three parts because there's what the operations do and what the operations operate on. There's a machine model,
which says there are some threads and each thread has a stack, and the stack's made up of frames, and the frames have local variables, etcetera, etcetera.
And then you define a start state,

(16:18):
and then you'll say, well, this
program translates to this sequence of operations,
and
you can specify each operation as here's what it does to the machine state,
and the machine state also defines what operation's gonna happen next. That sort of defines an
operational semantics.
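The machine model described above (threads, a stack of frames, local variables, a start state, and operations that transform the state) can be sketched in a few lines. This is a toy under stated assumptions: the class names, the four operations, and the single-frame `run` loop are all invented for illustration and omit calls, exceptions, and everything else a real specification would need.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    ops: list                                   # operations for this frame
    local_vars: dict                            # local variables
    values: list = field(default_factory=list)  # evaluation stack
    next_op: int = 0                            # which operation happens next


@dataclass
class Thread:
    frames: list                                # the thread's stack of frames


def step(thread):
    """Apply one operation: a transition from one machine state to the next."""
    frame = thread.frames[-1]
    name, arg = frame.ops[frame.next_op]
    frame.next_op += 1
    if name == "PUSH_CONST":
        frame.values.append(arg)
    elif name == "LOAD_LOCAL":
        frame.values.append(frame.local_vars[arg])
    elif name == "STORE_LOCAL":
        frame.local_vars[arg] = frame.values.pop()
    elif name == "ADD":
        right = frame.values.pop()
        left = frame.values.pop()
        frame.values.append(left + right)
    else:
        raise ValueError(f"unknown operation {name!r}")


def run(thread):
    """Step the top frame until it has no operations left (no calls in this toy)."""
    frame = thread.frames[-1]
    while frame.next_op < len(frame.ops):
        step(thread)
    return frame.local_vars


# Start state for the statement `total = x + 1` with x bound to 41.
program = [
    ("LOAD_LOCAL", "x"),
    ("PUSH_CONST", 1),
    ("ADD", None),
    ("STORE_LOCAL", "total"),
]
start = Thread(frames=[Frame(ops=program, local_vars={"x": 41})])
print(run(start))  # {'x': 41, 'total': 42}
```

Each branch of `step` says what one operation does to the machine state, and `next_op` is the part of the state that determines which operation runs next.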

(16:39):
So this is useful for someone implementing the virtual machine. In terms of actually trying to understand what's going on with your Python program, I would say it's next to useless because
it's heavily recursive
and it's quite complicated, so it's not really gonna be terribly enlightening.
However, it is a useful sort of basis for discussing
how a new programming language feature would work or how the current ones are supposed to work.

(17:03):
Because,
you know, each of those parts is reasonably well defined
even though combined they're quite complicated. But if you're focusing on a narrow part of the language,
then I think it should be comprehensible
in a way that the actual source code isn't because it's just too big and too complicated.
Particularly for people who are working on the CPython runtime,

(17:27):
if you're deep in the guts of the interpreter and you're trying to figure out how you wanna approach a particular problem,
is the formal specification something that might be embedded into the comments where there's a link to say, this is where the specification is for this particular section of the code so that it is collocated and easy to find? Or is it just something that will live independently as its own body of text and you just know that if you're trying to figure something out, you reference that to understand whether that area of the interpreter is covered in specification?

(17:59):
I think the first thing is to get something up to the state of actually being useful to the point where someone at some point will say, actually, this is a useful thing
to
describe this other thing, you know, a new feature or whatever.
Or just a bug.
That might be a likely one as well.
And at that point, maybe we consider moving it under the Python organization, but I'm not gonna suggest that now. It's far from ready for that, and

(18:27):
I think just having it sort of on a personal repo just makes it more accessible, and people can play with it, fork it, whatever.
In terms of
once we get to a state where this specification
is adopted, if that day does come, and then somebody wants to add some new capability to the language
where, right now, we have the PEP process to define what it is we're trying to do,

(18:52):
some examples of how this might be implemented, perhaps a reference implementation to look at. Would they then also need to
append to the
specification itself to incorporate that new information
at the time of submitting the PEP? Or is it something that you think might happen after it's been accepted, then they need to go in and update the specification?

(19:12):
Just curious what role this might have in terms of
additional work for
upgrading the language or adding new capabilities?
A PEP for a new feature for the language should be well specified. It should specify
the behavior of the new feature in detail.
So I would hope that
the sort of formal specification would actually make it easier to specify those because

(19:36):
a lot of new features
are described in terms of the equivalent existing Python code, but that only works if the language as it exists can support that feature, which isn't always the case.
It's very hard to describe, you know, generators in terms of Python without generators. I mean they add a fundamental

(19:57):
capability to the language.
But they could be described in terms of extending the formal specification because, for example, with generators,
you'd extend the machine model
and add a few operations for yield
and then the converse.
So
I think that it shouldn't make it harder to write a well specified PEP.

(20:20):
Hopefully, it would make it easier, and
it might also offer a framework
for those who aren't possibly familiar with specifying things reasonably formally
to build on because there's already sort of specifications of language features that they're familiar with.
In the future world where we have a formal specification
and it has been adopted as part of the Python language and community, what are some of the possible benefits that you see

(20:47):
coming out of that for people who are using or working on Python?
For users of Python, I would say probably
very little. I'd imagine the prose documentation
would be far more useful.
Those people writing the prose documentation, at least it gives them sort of a more formal thing to base their documentation

(21:08):
on.
Because if you're documenting a fairly small feature of Python, you wanna write that in prose that fits in with the other prose around the other language features that fit around it. Whereas a specification doesn't need to really
work that way. It just needs to be precise.
It doesn't need to be particularly readable or accessible.
But it's there as a reference for those people wanting to make more accessible documentation.

(21:33):
The Python language documentation is pretty good.
I don't think there's a great use for these more formal specs for most users of the language. I think it's implementers of the language that it really has value for.
And what are some of the potential downsides of having this specification?
It's more work than anything, isn't it? And errors are a problem. You know, if there are

(21:56):
errors in the specification,
then that can be misleading.
That's obviously bad.
Yeah. So I think it takes more work and like anything we do, it's likely to introduce errors. I mean, the only way to not have any errors in anything is not to do anything. Those are the obvious
problems.
It's less likely to have errors. I mean, for a new feature,

(22:19):
I would say specification is less likely to have errors than code.
For well tested
older features, the code tends to have had the bugs knocked out of it. It's hard to test a specification.
One possible way to test a specification is to write
an implementation
of the language based on the specification which has

(22:41):
the design of the abstract machine model. You could write it in Python or some other high level language; Python is the obvious candidate.
And performance just really isn't an issue in the slightest. So
it should be an interesting exercise to do that to cross
check the specification against
the language.
And that might be more useful from a sort of educational point of view.

(23:03):
But yeah, I think the basic problem is, yeah, it's more work and it has the potential for errors.
Another possible approach to formally specifying a given run time is to use an executable version of that with something like TLA+ or Alloy. And I'm wondering what your thoughts were on the potential value of that versus the amount of extra effort that's necessary to be able to actually get something like that correct.

(23:26):
Well, as I say, it's a lot of extra effort, but I think the main problem with it is accessibility.
It basically requires people to learn what is effectively another programming language.
And I'm not sure how useful it is because I think the point of these specs is that you can prove things about them. Now this is very useful for certain systems that should have certain properties.

(23:48):
For example, like deadlock,
various other properties you may want.
Your avionics software to have certain properties like not dropping your plane out of the sky,
but it's much harder to define the properties of a programming language that you want to prove.
I mean, it would be nice to prove things like it doesn't underflow
the stack or it doesn't leak memory or it doesn't crash in these various ways,

(24:12):
but
that's
really what you wanna prove of the actual implementations,
not the specification.
So if you can, like, run some sort of model checker or a verifier on your C code, then that would be great. But I don't think
these formal specification languages would be that useful. And I think also just mainly it just excludes people. I think

(24:34):
for individual operations in the sort of machine model, it should be sufficiently simple that the prose is unambiguous.
And
that way it's understandable.
And another effort that you are trying to take on right now is proposing some means of improving the overall execution speed of Python programs by speeding up the interpreter,

(24:58):
and your current goal is by a factor of
5. And I'm wondering how the efforts of formalizing
the CPython
runtime tie into your goals of improving the overall execution speed and some of the work that you have planned to make that a reality?
Can I first say that that's a very long term goal?
So before anyone gets too excited,

(25:19):
in terms of this helping, I'm not sure it does that much because
the speeding up CPython
has to
remain
compatible with CPython. It's not just the language semantics that we may or may not have defined. It's all the other features,
the C API,
certain behaviors

(25:40):
that are not necessarily part of the language, but are expected in terms of, say,
immediacy of garbage collection.
We have a reference counting
garbage collector, which has
pros and cons relative to the sort of tracing garbage collector that Node and the JVM use. But one of its pros is its very prompt reclamation
of large short-lived objects.

(26:02):
That's a feature of CPython we'd probably want to keep, or if we don't wanna keep it, you know, we'd wanna formally document the change, but that's independent of improving performance.
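The prompt reclamation Mark describes can be observed with a small experiment. This is a sketch, assuming CPython's reference counting; the `Buffer` class is a made-up stand-in for a large short-lived object, and on implementations with a tracing collector (PyPy, for example) the object may not be reclaimed at the point of the `del`.

```python
import sys
import weakref


class Buffer:
    """Stand-in for a large, short-lived object (hypothetical example class)."""

    def __init__(self, size):
        self.data = bytearray(size)


events = []
buf = Buffer(1_000_000)
# The weak reference's callback runs when the Buffer is reclaimed.
watcher = weakref.ref(buf, lambda ref: events.append("reclaimed"))

del buf  # drop the only strong reference
# With reference counting, reclamation has already happened by this line.
reclaimed_immediately = events == ["reclaimed"]

print(sys.implementation.name, "reclaimed immediately:", reclaimed_immediately)
```

This is exactly the kind of behavior that sits outside the language semantics proper but that existing programs may rely on, which is why it constrains any effort to speed up CPython.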
So
it might help in terms of sort of
discussing
why optimizations
are valid
or coming up with ideas for optimization because

(26:24):
it gives you a mental model of
how
Python is being executed that's independent from the C code, which is a very low level and quite bulky.
So it might give you a clearer way of saying, well, actually, if we represent
this abstract structure
instead of how we currently represent it,

(26:44):
in this other way, it would run faster.
So in that sense, it might be useful.
Although, to be honest, I think the way it's actually come about is more the other way around. In that I have been thinking about how to improve
performance of CPython, and that has led me into thinking, well, what are the fundamentals here?
And then having worked some of those out, thinking, well actually it's useful to document that. So that's kind of how the formal semantics actually came out of

(27:11):
my performance work, not the other way around.
And is this effort of formalizing
the specification
something that you're currently just doing by yourself? Or are there other people who are contributing their time and efforts to understanding some of the intent
of various aspects of the runtime and contributing to the specification itself?

(27:35):
It's currently just me. Informally, a few people have mentioned they might be interested. I've sort of wanted to get it to a stage where it
was at least covering enough of the language to sort of make sense for people to add to it.
So
for example, I'm not planning, obviously, on excluding long term

(27:56):
coroutines,
but in my immediate sort of trying to get some sort of specification for a chunk of the language, I've chosen to leave those out in order to get something coherent.
That's pretty much done and should be out by the time this podcast is released. There will be a GitHub thing. I mean, if you wish to collaborate, then issues and pull requests on GitHub are always welcome.

(28:19):
For documents that intend to try and create some formal specification,
there's usually a specific set of grammars or terminology that are used to try and avoid any ambiguous language.
I'm wondering if you have any sort of glossary or
defined means
of referring to various aspects of the runtime or

(28:42):
things such as with RFCs, where there are specific meanings around should or must, what your thoughts are on the level of detail and level of rigor that's necessary for this document?
To us,
not really so far. As opposed to the whole should, must thing,
I think that's sort of like for formal English.
Now obviously, this will be based on formal English, but the idea of having this sort of abstract machine

(29:07):
is that it does operate to form a sequence of operations, and that's
possibly a more mathematical
construction. So I think
we're generally into less ambiguous
language because of that.
That's not to say that there are no ambiguities in the specification I've written so far.
But, yeah, I think the
more
mathematical style

(29:28):
of English rather than the sort of legal style probably helps.
Maybe that's just my personal biases.
In your work of trying to
define the specification
and
determine the goals and potential benefits of that, the actual work of committing it to paper. I'm wondering what you have found to be some of the most interesting or unexpected or challenging aspects of that work.

(29:52):
To define
Python, you also need to define
what the objects do.
So Python itself has an object model. You know, if you add two things together,
the definition of it is fairly simple, but it doesn't actually say what anything does.
It calls the __add__ method, and if that returns NotImplemented, it calls the __radd__ method

(30:14):
on the right hand side with some
fiddling around with subtyping to make sure that that works properly.
But it doesn't actually tell you what anything does. It doesn't say what happens if you add two integers or add a floating point number to an int or anything like that.
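As a rough illustration of the dispatch just described, the following Python sketch approximates what happens for `lhs + rhs`. The names `binary_add` and `Meters` are hypothetical, and the sketch simplifies the real rules (it works only with the Python-level methods and says nothing about how CPython consults its C-level slots), so treat it as an approximation rather than the actual implementation.

```python
def binary_add(lhs, rhs):
    """Approximate the dispatch behind `lhs + rhs` (a simplified sketch)."""
    lhs_type, rhs_type = type(lhs), type(rhs)
    lhs_add = getattr(lhs_type, "__add__", None)
    rhs_radd = getattr(rhs_type, "__radd__", None)
    if rhs_type is lhs_type:
        rhs_radd = None  # same type on both sides: the reflected method is not tried

    candidates = []
    if lhs_add is not None:
        candidates.append((lhs_add, lhs, rhs))
    if rhs_radd is not None:
        reflected = (rhs_radd, rhs, lhs)
        # The "fiddling around with subtyping": a subclass on the right that
        # provides its own __radd__ gets to go first.
        overrides = rhs_radd is not getattr(lhs_type, "__radd__", None)
        if issubclass(rhs_type, lhs_type) and overrides:
            candidates.insert(0, reflected)
        else:
            candidates.append(reflected)

    for method, receiver, other in candidates:
        result = method(receiver, other)
        if result is not NotImplemented:
            return result
    raise TypeError(
        f"unsupported operand type(s) for +: "
        f"{lhs_type.__name__!r} and {rhs_type.__name__!r}"
    )


class Meters:
    """Hypothetical type that only knows how to add Meters and ints."""

    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        if isinstance(other, Meters):
            return Meters(self.value + other.value)
        return NotImplemented

    def __radd__(self, other):
        if isinstance(other, int):
            return Meters(other + self.value)
        return NotImplemented


# int.__add__ returns NotImplemented for a Meters operand,
# so the reflected Meters.__radd__ produces the result.
total = binary_add(1, Meters(2))
```

Note that the sketch only decides which method is called. What `int.__add__` or `float.__radd__` actually compute is exactly the part the dispatch rules leave unspecified.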
So there's that sort of object model sort of
part of the language, and I've really barely started on that.

(30:35):
But that also brings you into another key point of
this is why this is a sort of CPython semantics rather than just a Python semantics. What happens when you're interacting with like compiled
C code,
NumPy, things like that. Because these also
have all these operations. The reason for CPython is that extensions,
you know, extension classes and extension objects are first class

(30:58):
members of, you know, CPython.
There's no real difference between
a NumPy array and a built in list or integer.
There might be some subtle differences in terms of performance because integers and lists are just fundamental to the virtual machine. But in terms of their capabilities, there is no difference. So you need to define
how

(31:19):
the VM
transfers
information to these opaque blobs and what it gets back and what that means.
And there's also the other side of the interaction with,
let's say, C code. I mean, it doesn't have to be written in C, it could be Rust or whatever. But there's sort of machine level code. And there's also how this machine level code interacts with the abstract machine itself. How does it start? So like what are the start states of the abstract machine? So it's all very well saying it's got, you know, a stack.

(31:49):
So let's describe addition.
You evaluate the left hand side of the addition and you push that onto some stack. You evaluate the right hand side, you push that onto a stack. And then addition pops the two off and adds them together and pushes the result back on the stack.
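The evaluation order described here can be sketched as a toy stack machine in Python. This is a minimal illustration, not CPython's evaluation loop: the `evaluate` function and the `("add", left, right)` tuple encoding are hypothetical, and starting from an empty list is an assumption made only for the sketch.

```python
def evaluate(expr, stack=None):
    """Evaluate a literal or a nested ("add", left, right) tuple on an explicit stack."""
    if stack is None:
        stack = []  # assumed starting state: an empty stack

    if isinstance(expr, tuple) and expr[0] == "add":
        _, left, right = expr
        evaluate(left, stack)    # push the value of the left hand side
        evaluate(right, stack)   # push the value of the right hand side
        rhs = stack.pop()        # addition pops the two operands...
        lhs = stack.pop()
        stack.append(lhs + rhs)  # ...and pushes the result back
    else:
        stack.append(expr)       # a literal just pushes itself
    return stack


# 1 + (2 + 3) leaves a single value on the stack.
final_stack = evaluate(("add", 1, ("add", 2, 3)))
```

Even this toy version has to pick a starting state (`stack = []`) and decide who owns the stack, which are the kinds of choices a specification has to state explicitly.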
But what stack?
How did that stack get there in the first place? What's the starting state of the stack?
Is it like a stack per thread? And so on and so forth. So there's a lot of

(32:13):
how the abstract machine,
its internal states,
how that's all put together, what the starting state is, what the end state is.
You
also think of a Python program or any program as like how it operates. Usually you haven't put a lot of thought into how it sort of starts and finishes.
It's interesting because if you look at how a compiled program works, if you write a small C program,

(32:36):
there's a lot of code in a small C program, low-level machine code, and a lot of that has to do with loading the program up, setting up its state before it can actually do its job. And it's a similar thing with the abstract machine.
In your exploration of CPython,
either as a contributor or with your work of trying to define some of the semantics behind it, What are some of the more interesting or unexpected behaviors that you've uncovered or any sort of particular quirks that you thought are worth calling out?

(33:05):
I think with CPython, I think something interesting is the difference
or lack of difference between mappings and sequences.
So
conceptually there is a difference.
At the most abstract, a sequence is just a mapping of a contiguous range of integers to some set of values and a mapping is just a mapping of anything to anything.
But obviously they tend to be implemented quite differently internally because sequences are often implemented in sort of linear layout of memory so it's much more efficient.

(33:33):
So the CPython virtual machine sort of distinguishes between the two.
Python doesn't really distinguish between the two. If you have __getitem__,
you're a mapping or a sequence, depending on what you take, whether you just take a range of integers or any value.
So CPython
has different

(33:54):
implementations for the different things, sort of like different function pointers internally.
So this sort of thing occasionally leaks out a bit.
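At the Python level, the lack of a distinction can be seen with a single `__getitem__` that serves both roles. The `Lookup` class below is hypothetical and only meant as a sketch. The last line relies on Python's fallback iteration, which calls `__getitem__` with 0, 1, 2, and so on until `IndexError` is raised, treating the object as a sequence even though it also accepts non-integer keys. Whether this is the kind of leak meant above is an assumption.

```python
class Lookup:
    """Hypothetical class: one __getitem__ handling both integer and arbitrary keys."""

    def __init__(self, items, extras):
        self._items = list(items)
        self._extras = dict(extras)

    def __getitem__(self, key):
        if isinstance(key, int):
            return self._items[key]  # sequence-style access; IndexError past the end
        return self._extras[key]     # mapping-style access; KeyError if missing


lookup = Lookup(["a", "b", "c"], {"name": "example"})

first = lookup[0]        # used like a sequence
named = lookup["name"]   # used like a mapping
as_list = list(lookup)   # no __iter__ defined, so iteration falls back to __getitem__
```

In the C API the two roles are separate slots (`sq_item` in `PySequenceMethods` and `mp_subscript` in `PyMappingMethods`), which is where the "different function pointers" come in.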
So that's one
example.
And I'm curious to see how things will play out, whether it ends up being ultimately
accepted as part of the core of Python itself or if it continues to be a side project for you. So we'll definitely be keeping an eye on that. But for anybody who does want to get in touch with you or follow along with the work that you're doing or contribute to this effort, I'll have you add your preferred contact information to the show notes. And so with that, I'll move us into the picks. And this week, I'm going to choose American Gods, both the book and the TV series, where the book is by Neil Gaiman and the TV series, I think, is on Starz. Been watching the first two seasons that have come out so far, and I think they did a really good job of bringing the story to the screen. So I look forward to continuing that. It is not suitable for children, but it is a good story nonetheless. So definitely recommend that if you're looking for something to keep yourself entertained. And with that, I'll pass it to you, Mark. Do you have any picks this week?

(34:54):
Yeah. So the first one is a book called Roadside Picnic. It's actually a 1970s Russian sci fi book. It's a little unusual
for a sci fi book.
The sci fi element is sort of rather mysterious and
serves as a backdrop for a sort of interesting story.
The authors were writing in 1970s Russia, obviously, or the Soviet Union as it was then.

(35:18):
And there was some veiled criticism of the then Soviet Union, which was only possible in sci fi, which makes it interesting. But it's also an excellent book and well worth reading regardless of that background. And the other thing I want to recommend is something completely different. For anyone who's got a VR headset, and if you haven't you might wanna get one because we're basically not allowed outside for the next few months,
a game called In Death, which is much less seminal, but good fun.

(35:42):
Thank you very much for taking the time today to join me and discuss your work of trying to provide a specification
for CPython. It's definitely an interesting project and one that, as I said, I'll be following closely. So thank you for your time, and I hope you enjoy the rest of your day. Thank you. Goodbye.
Thank you for listening. Don't forget to check out our other show, the Data Engineering Podcast at dataengineeringpodcast.com

(36:06):
for the latest on modern data management.
And visit the site of pythonpodcast.com
to subscribe to the show, sign up for the mailing list, and read the show notes.
And if you've learned something or tried out a project from the show, then tell us about it. Email host at podcastinit.com
with your story.
To help other people find the show, please leave a review on iTunes and tell your friends and coworkers.