Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Unknown (00:13):
Hello, and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you're ready to launch your next app or want to try out a project you hear about on the show, you'll need somewhere to deploy it. So take a look at our friends over at Linode. With the launch of their managed Kubernetes
next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node
balancers,
(00:35):
40 gigabit networking,
dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode
today. That's L-I-N-O-D-E, and get a $60 credit to try out a Kubernetes cluster of your own. And don't forget to thank them for their continued support of this show. You listen to this show to learn and stay up to date with the ways that Python is being used, including the latest in machine learning and data analysis.
(01:00):
For more opportunities to stay up to date, gain new skills, and learn from your peers, there are a growing number of virtual events that you can attend from the comfort and safety of your own home. Go to pythonpodcast.com/conferences
to check out the upcoming events being offered by our partners and get registered today. Your host as usual is Tobias Macey. And today, I'm interviewing Karl Nelson about JPype, a language bridge that lets you use Java classes in your Python programs. So Karl, can you start by introducing yourself?
(01:26):
My name is Karl Nelson.
I have a doctorate in electrical engineering from the University of California.
I'm currently working as a senior scientist at Lawrence Livermore National Lab. I was one of the contributors to the GNOME project
and the author of GTK--.
I am known by my handle, Thrameos,
(01:47):
which happens to also be my League of Legends handle.
Yeah. It's always funny how the nom de plume in Internet communities ends up sticking with you for a long time.
Yes. It is.
And so do you remember how you first got introduced to Python?
Well, my introduction to Python is really part of a love hate relationship.
(02:07):
I find that the Python language doesn't really fit my programming style. So to be honest, I've been avoiding Python for many, many years. I was mainly a C++ programmer who used Perl for scripting
and MATLAB for prototyping.
But eventually,
I found that the overhead in terms of memory management
(02:28):
for C++ programming
was so much that as I drifted away from access to the hardware,
I decided that I should start working more in the Java side. So I converted all of my algorithm code that I develop at work
into Java, and I used
MATLAB primarily as a scripting language that would do both the prototyping
(02:52):
and calling the Java code as part of my test frameworks.
However, one day, my sponsor called up and objected to the licensing cost for MATLAB
that they had to, of course, have many seats of MATLAB to
service the software that I was delivering, and so I was asked to find an alternative. Most of the local users said that they wanted to use Python. So I basically pulled up and evaluated a whole bunch of bridge codes to try and make it so that we could use Python with
(03:26):
our Java production
libraries.
One of them looked reasonable, so I
just started diving in and trying
to alter anything where there were rough edges or problems
until I got rid of all of the major segfaults. So I kind of view myself sort of like the mechanic that doesn't know how to drive because my actual Python skills are very poor.
(03:55):
And so now you're the primary contributor to the JPype project, which has been around for quite some time, actually. And you mentioned that you had evaluated
a number of the other bridge options for
working between Python and Java. But I'm wondering if you can just start by giving a bit more of an overview of what the JPype project is and what motivated you to become such a regular contributor to it.
(04:18):
JPype is a Python module whose intent is to expose all Java packages
as Python modules.
It does this by using reflection through the Java Native Interface
to basically
find out whatever the capabilities
are of a Java class
and re-export them into the dictionary-based Python
(04:42):
world. The whole philosophy is to be able to directly cut and paste Java code
into the Python language and just make a few syntax changes in order to basically execute
the same code that would have been done within Java.
Its primary use is
for scientific and engineering codes,
where the code is already written in Java and you wish to exercise it within Python,
(05:06):
but it's found many other applications
in terms of use.
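A minimal sketch of what that "Java code with a few syntax changes" style can look like, assuming JPype is installed and a JVM can be located (for example through JAVA_HOME). The function name `java_list_demo` is made up for illustration; `startJVM`, `isJVMStarted`, and the `jpype.imports` import hook are JPype's own.

```python
def java_list_demo():
    """Build a java.util.ArrayList from Python and return its items as Python strings."""
    # Imported lazily so that merely defining this function does not require JPype.
    import jpype
    import jpype.imports  # enables "from java... import ..." once the JVM is running

    if not jpype.isJVMStarted():
        jpype.startJVM()  # with no arguments, JPype looks for a default JVM

    # A Java package imported as if it were a Python module.
    from java.util import ArrayList

    names = ArrayList()
    names.add("alpha")
    names.add("beta")

    # Java strings come back as Java objects; str() converts them for Python use.
    return [str(item) for item in names]
```

The body reads almost line for line like the Java it mirrors (`names.add("alpha")` and so on), which is the cut-and-paste philosophy described above.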
As far as the use case, you mentioned it's used heavily in the scientific community. I'm wondering if you can
dig a bit more into
what the benefits are of being able to write Java code within a Python context and be able to call between those 2 different ecosystems.
(05:31):
For me, this really comes as a development tool for Java.
The way that I start all of my programming tasks
is that I want to develop something that is gonna be used as production Java code that is gonna be shipped out to one of our users.
And so what I do is I take that idea,
(05:53):
and I use SciPy
and NumPy
basically to flesh out and develop a prototype.
Once I have that prototype, I then go and create a whole test bench
within Python in order to exercise all of the aspects of that Java code that I wish to develop.
I then take that code and then pull it back and create all of the classes and the framework within the Java system,
(06:19):
filling out the stubs using the ability to build proxies,
which call the portions of the Python code that I'm
eventually going to make use of.
Then once I have all of the pieces in, I can then use the exact same Python test bench that I originally started,
which really gets me to my production Java code.
(06:42):
I find this is a really great advantage in my development cycle
because although
Python is a good prototyping language, the half life
of Python code, I find to be
very, very short, especially if it's being worked in active development. And so getting the benefits of having both
a prototyping language and a strongly typed language
(07:05):
working side by side is a very valuable
tool. In the use case for JPype,
the calling direction is from the Python code into the Java. But does it also allow for embedding Python within the Java runtime so that you can go the other direction?
Yes. It does.
So, basically, any interface
(07:26):
within Java
can be implemented by Python code.
It currently doesn't support the capability
of actually extending a class
if it's already basically
implemented in Java, but you can take an interface and implement it in Python. And so I often use this for auditing. So
(07:47):
I wanna do a graph in the middle of my Java code, so I'll set up a Python framework
that calls the Java code. I'll put an audit
interface within
Java that says, call this
hook, and that hook then gets implemented
within the Python code.
It then stops right there, plots a graph using matplotlib,
(08:10):
and then I can basically audit and work with
Python as a development tool for Java.
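A hedged sketch of that hook pattern, assuming JPype is installed and a JVM is available. `java.util.function.Consumer` stands in for the audit interface (the real interface from the speaker's code is not shown in the episode), and `run_audit_hook_demo` and `AuditHook` are hypothetical names. `JImplements` and `JOverride` are JPype's decorators for implementing a Java interface in Python.

```python
def run_audit_hook_demo():
    """Let Java iterate a list and call back into a Python-implemented interface."""
    import jpype
    import jpype.imports
    from jpype import JImplements, JOverride

    if not jpype.isJVMStarted():
        jpype.startJVM()

    from java.util import ArrayList

    seen = []  # filled in by the Python side of the hook

    # Consumer is only a stand-in for a project-specific audit interface.
    @JImplements("java.util.function.Consumer")
    class AuditHook:
        @JOverride
        def accept(self, value):
            # A real hook might pause here and plot with matplotlib.
            seen.append(str(value))

    data = ArrayList()
    data.add("step-1")
    data.add("step-2")

    # Java drives the loop; each element is handed back to Python via accept().
    data.forEach(AuditHook())
    return seen
```

The Java side only ever sees an object implementing the interface, so the same production code can run with or without the Python hook attached.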
There are a number of other projects
that are aiming to be able to combine the capabilities of both Java and Python with the most notable being Jython, where it's actually an implementation of Python in the Java runtime.
(08:33):
Then there are also projects such as PyJNIus,
there's JPype, and there are a number of others.
And I'm curious what you have found to be the relative trade-offs between the different options and what it was about JPype that made you choose to dedicate your time and energy to it for your work.
(08:56):
I downloaded basically all of them and sort of worked out a table of which ones I liked and which ones I didn't. I had a number of requirements,
which is, first of all, my group is made up entirely of physicists
for whom
programming is not exactly the strongest suit.
They are really great at the concepts,
(09:18):
but not really good at the software architecture.
So I needed to find something that gave that native Python look and feel, while at the same time allowing them to exercise everything that they had
within that Java world.
So
given those constraints,
I also had to find something
that was forward looking because we're gonna be using software and developing it
(09:42):
not for just one or two years, but, you know, 10 or 15 years at a time.
And so I decided at the time that I really had to find something that would work with Python 3.
So of the options that I evaluated,
obviously, Jython is one of the top contenders.
It takes the approach of rewriting all of Python into Java, which has the really clear advantage
(10:07):
that if you want to embed Python in a Java program
and ship it as an applet or something like that, then it would be very strong and very capable.
Unfortunately, this also introduces a large number of pitfalls.
First of all,
the Jython approach of trying to pack Python, which is famous for chewing through objects
(10:29):
with almost no regard at all,
doesn't really work well in a virtual machine that only recycles objects with the global garbage collector.
This really causes a huge speed and performance hit
that forced it so that the average CPython developer never would consider
the Java Virtual Machine implementation
(10:50):
as being a good alternative.
Also, because they tried to implement purely in Java,
they also sacrificed the CPython
API, which is the greatest value
for the scientific computing like I'm using.
So
sacrificing the thing that plays the enormous role in the success of Python
(11:13):
is a huge negative, at least for my users.
So with a huge task and not enough programmers to really pull it off, Jython has already been really stuck behind the other alternatives.
So that left them with
basically Python 2 support only.
I do understand that they're working to reboot the whole project from scratch, and I'm really looking forward to seeing that.
(11:39):
As for PyJNIus,
it's primarily developed for the Android platform, and it appears to be a side project
from a much larger effort of porting Python onto the Android system.
I've worked with the guys in the past, and they are very dedicated.
But just answering all of the questions
(12:00):
on their support forums for how do you do stuff on Android
means that they don't have a lot of free energy
for getting down and delving into how do you
deal with all of the interactions between the Python and Java virtual machine.
So when I got started working with JPype and evaluating it,
I went and did a functional test of PyJNIus,
(12:24):
and I hit a number of segmentation faults.
And those sort of segmentation faults would, of course, cause
my physicist user group to quickly
give up and declare that this has problems.
So I decided that that really wasn't the best choice.
As far as PyJNIus,
(12:44):
it doesn't expose anything beyond classes, fields, and methods,
and it doesn't really have integration
into arrays and buffers, which are critical to being able to get high performance scientific code working.
The one alternative that you didn't mention is, of course, Py4J,
which is another good alternative to JPype.
(13:06):
Just like JPype, they provide the whole Java environment
in Python, but they do so using
sockets rather than going through the Java Native Interface.
This has the advantage, of course, that Py4J can
attach itself to Java and then
detach as needed, or it can be attached to multiple copies of the Java virtual machine if this is necessary.
(13:32):
Of course, the big disadvantage
of Py4J is that if you look at programming it, it's gonna look a lot like an RPC language
where you have to set up this bridge code known as a gateway,
pass instructions across it, and then get things back,
which again isn't going to provide that high level of integration that I was looking for in my user base.
(13:55):
JPype in particular has been around for quite a while. When I was looking into the project and preparing for the show, I noticed that the initial work dates all the way back to 2005 when it was on SourceForge, and it has since migrated onto GitHub. And
with you as the lead contributor, it actually has a fair amount of activity on it. I'm curious what it is about JPype that has allowed it to outlast so many other projects that are trying to achieve a similar goal.
(14:22):
As I view it, Java libraries being accessed from Python is sort of a niche area.
It's certainly very prevalent in the scientific and engineering world.
However, being a niche area, this kind of limits the number of open source developers that can do
this sort of thing, and only those people that really had to do serious work in both environments would even consider ending up as developers.
(14:47):
The other thing that is a huge barrier to getting into developing a bridge code is that it requires you to have a very high level of fluency in many different APIs
and programming schemes. So
although Python is one of my weakest languages,
there's a lot of things that have to go into
(15:08):
the C, the C++, the Java,
and understanding the JNI,
all the way down into that virtual machine that really provides a limiting factor
for getting into that sort of development.
So as near as I can tell, JPype is probably one of the first
out of the gate, but there have been numerous efforts made by scientific organizations to try and continue to address this need because
(15:36):
all of the previous ones have been pretty deficient.
For example,
there is one called JEP,
which was developed
basically to do the reverse where you're trying to be able to call Python
from within the Java environment.
But like all of these projects,
they start out with some initial amount of funding and initial amount of effort,
(15:58):
and, eventually, they just don't have enough programmers that can maintain interest.
Technologically,
trying to merge two virtual machines together,
without building a dedicated virtual machine
like GraalVM
is an enormous task.
And,
therefore, all of these other projects basically get to alpha quality,
(16:20):
where they're capable of doing
a reasonable amount of things, but there's so many rough edges where you're going to fall off into
memory leaks
or into segmentation faults.
And as both Python and Java are very active communities,
eventually, any project that is not being maintained at a pretty high level is just going to fall behind and become rather useless.
(16:46):
The key advantage of JPype
is that it is pretty darn small in scope.
The whole API is less than 20 primitive classes
and about the same number of derived
classes and support functions,
but it takes close to 1,500
different unit tests just to exercise all of the behaviors because you have so many
(17:11):
different behaviors that exist behind the scenes with the interactions with Java.
So
JPype being
limited in scope,
not really having the ability to connect and disconnect the JVM,
and already having an extensive testing framework, I had a lot of good material to build on. I would really like to thank the
(17:37):
constant pressure from my local users group
at Lawrence Livermore National Lab that has really pushed me to try and bring this code up to production quality
so that it can attract not just people in the scientific and engineering world,
but also people who would like to use JPype for other applications.
Digging more into how JPype itself is implemented, I'm wondering if you can talk through some of the ways that it has evolved and in particular, some of the updates that you've had to make to be able to bring it into
(18:10):
full support of Python 3 and more recent versions of Java?
The original author
basically laid out all of his thoughts in a blog post.
So he wanted to construct this really large framework,
which would be able to implement
both bridges from Java into Python, but also into Ruby.
(18:31):
So he abstracted the entire API,
unifying
Python, Ruby,
and JNI.
But this resulted in code that was very, very large
and exceptionally hard to debug.
So it implemented all of the CPython layer using capsules,
and then it limited the total communications
(18:53):
through those capsules
to only a very small number of calls
that are passing back and forth
between these interfaces below.
And since he said that he wasn't really interested in working
in a c programming language while implementing a Python module,
he did everything he could
(19:15):
on the Python side, which meant that all of these interactions were going through JNI, which is already not the fastest of interfaces,
while being implemented in native Python code. And so the result was really, really slow and nearly impossible for anyone to get under the hood.
So my main contribution
(19:35):
was not really working on the front end, but working solely on the back end.
So I took all of the code,
read through it, figured out what all of these different layers were. And
for the most part, I was actually a negative contributor,
meaning I ripped out more code than I added
(19:56):
for about the first 2 years that I was working on the project.
Once I had ripped out all of that code and got down to the fundamentals
of what was underneath,
I could figure out all of the pieces
and then
pull back all of that interface
and identify,
here's all of the speed
(20:17):
critical pieces that I could then move and create these primitives in CPython,
which then could be used to implement.
The most critical thing in doing all of that work was, of course, dropping the Python 2 interface.
So the way that this operates is every time that you encounter a class, you're going to need to generate a dynamic
(20:40):
Python class that's going to represent all of the things that are in that
Java class that you're trying to expose.
And so
what that's going to do is
force
you to get really deep into the Python object model.
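As a rough illustration of generating a Python class on the fly, the sketch below builds a class from a list of method names with the built-in `type()`. It is not JPype's implementation: `make_wrapper_class` is a hypothetical helper, the method list is hard-coded where a bridge would obtain it by reflection, and the generated methods only record the call instead of crossing into Java.

```python
def make_wrapper_class(java_name, method_names):
    """Create a Python class that mirrors a Java class described by name and methods."""

    def make_method(name):
        def method(self, *args):
            # A real bridge would dispatch through JNI here; this stub just
            # reports which Java method would have been invoked.
            return (java_name, name, args)

        method.__name__ = name
        return method

    namespace = {name: make_method(name) for name in method_names}
    namespace["__javaclass__"] = java_name  # remember which Java class this mirrors

    simple_name = java_name.rsplit(".", 1)[-1]
    # type(name, bases, namespace) is the dynamic equivalent of a class statement.
    return type(simple_name, (object,), namespace)


# Stand-in for what reflection would report about java.util.ArrayList.
ArrayListStub = make_wrapper_class("java.util.ArrayList", ["add", "size"])
stub = ArrayListStub()
```

Every Java class the program touches needs a class like this built at runtime, which is why the cost of each layer in the object model matters so much.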
The way that they were doing it with the old Capsule system is they implemented everything
(21:02):
in a way that both Python 2
and Python 3 could create these objects,
but it required three layers of
Python object modeling
just to be able to glue it all together.
And so
when we dropped that Python 2 support,
I was then able to go and
(21:23):
create all of the primitives that represent each of the Java objects
and improve the speed
of most operations by about a factor of 10 to 400,
depending on which operation was being performed.
That also allowed me to go and add things like the direct buffering support, which allows you to directly
(21:44):
interact between Java
and scientific libraries like NumPy.
And in terms of being able to keep the project up to date, how do the relative versions of the Python runtime and the Java runtime influence the amount of effort involved in keeping it up to date and what the compatibility matrix looks like?
(22:05):
From the perspective of the user, there is no real difference for any version of Python after 3.6
nor any Java version after Java 8.
Internally, though, it's a pretty massive challenge.
So
as I mentioned previously,
Python 2 was a huge stumbling block for JPype, and thus, I celebrated January 1, 2020
(22:30):
in a way that few other people would be able to understand.
I've got to get all the way down deep, deep into the guts of the CPython
system.
And often I discover down at that level, there is no real integration
for these internal private structures.
So I've got to basically do something
(22:51):
that is only done in the native CPython
API
in order to get my work done.
Let's take a typical example.
Python
lets you create an int type, and that int type has an API for creating
a new int.
When you're working in native Python, you can, of course, take and derive another class
(23:15):
from the type int and add something to it.
But when you get into
the CPython
API, you'll find that there's nothing like that
at all. So
when you want to create an int, what it does is it creates an int
of the original base type,
then it copies
(23:36):
that base type int into the derived type memory space, which is then going to take a whole bunch of additional time as well as creating additional objects,
which is gonna really hit that object
limitation.
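For reference, the native-Python half of that example is only a few lines. The class below is a hypothetical sketch, not JPype's actual type: `JIntSketch` derives from `int` and carries one extra attribute naming a Java type.

```python
class JIntSketch(int):
    """An int that also remembers which Java primitive it is meant to represent."""

    def __new__(cls, value, java_type="int"):
        # int is immutable, so the value must be supplied in __new__, not __init__.
        obj = super().__new__(cls, value)
        obj.java_type = java_type
        return obj


x = JIntSketch(41, "long")
y = x + 1  # arithmetic falls back to int.__add__, so the result is a plain int
```

Note that `y` loses the derived type, one small sign of how much a bridge has to handle itself to keep Java type information attached to values.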
And since I've gotta go down into the guts to create basically new types
for all of the Python primitives to represent each of the corresponding primitives in Java,
(24:01):
I've gotta go down there to the level of being able to, say, add a native stack frame
into Python that really doesn't have any API at all.
The same thing can happen in terms of Java.
So
I started with Java 7 when I took over the project,
but
(24:22):
since that time, Java 7, fortunately, has gone away
to where Java 8 became the mainstream, and it, of course, had really the best
at the time in terms of capabilities because they were introducing a lot of really important and vital functions.
But then they took a massive shift when they went over to the module system in Java 9.
(24:44):
And so you have to span both the current last
long term support, which is Java 8, and run out to Java 15.
So the way this works is because
a good portion of JPype is actually written in Java
is that, internally,
we have a Java 8 library,
(25:05):
which then calls
using reflection to ask, is this Java class that's available in the later version of Java available? And if so, grab it, load it, and then use its methods
only using their names.
And this is, of course, not going to create the bastion of clarity that I really like in the software that I'm developing.
(25:27):
In terms of the
impedance mismatch between the Java object model and the Python object model, what are some of the difficult edge cases that you've had to work around or
particular limitations that you just weren't able to overcome and you just have to call out to users of JPype that this particular operation isn't possible?
(25:48):
So there are two places where
there's a lot of difficulty.
So one of them, of course,
is the problem
of being able to extend objects
as well as starting and stopping the JVM.
When you start the JVM, it's basically going to completely marry
the Python virtual machine to the Java virtual machine,
(26:10):
and it's going to create a whole bunch of basically pointers that are gonna be accessed between the two.
And so if you ever tell it to say stop the Java Virtual Machine and you wanna continue using Python,
all of those pointers are gonna become stale.
This often leads to, you know, pitfalls
as far as the Java module failing to
(26:34):
or the Python
hitting things where they are no longer existing and having to go into error and fault handling routines.
Of course, the biggest difficulty with operating something where you've got these 2 virtual machines is gonna be memory management.
Both Python and Java independently
manage their memory spaces and are garbage collected languages,
(26:57):
and neither knows about the other. So
whenever you're working with a foreign language
within an interface,
you have to have some way of holding it alive for the purposes of garbage collection.
So both Python and Java use basically a reference counting system.
And so when you ask for an object to be held alive, you put a reference for it.
(27:23):
This kind of becomes a problem though when you have a Python object,
which is pointing to a Java object, which is in fact holding another Python object.
Because if you ever create something where it's a circular reference,
those 2 languages both have references
going forward, and you now have this irresolvable
reference loop.
(27:44):
You can, of course, do things like adding weak references,
which both languages support, but this doesn't actually hold things in memory, which can create other problems.
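The trade-off can be seen in plain Python with the standard `weakref` and `gc` modules. This sketch involves no Java at all; `Payload` and `Node` are made-up classes. It shows that a weak reference does not keep its target alive, and that a reference loop living entirely inside Python is still reclaimed by Python's own cycle collector.

```python
import gc
import weakref


class Payload:
    pass


class Node:
    def __init__(self):
        self.other = None


# 1. A weak reference does not hold the object in memory.
strong = Payload()
weak = weakref.ref(strong)
alive_before = weak() is strong
del strong            # drop the only strong reference
gc.collect()
alive_after = weak() is not None

# 2. A loop that lives entirely inside Python is found by the cycle collector.
a, b = Node(), Node()
a.other, b.other = b, a
probe = weakref.ref(a)
del a, b
gc.collect()
cycle_collected = probe() is None
```

When half of such a loop lives in the JVM, neither collector can see the whole cycle, which is the situation described above.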
And
so the ultimate limitation of JPype
will always be that
having two virtual machines
means that a user, through a series of fairly simple calls,
(28:06):
can create basically very bad memory loops.
For somebody who is actually using JPype to create a project, I'm wondering if you can just talk through the overall workflow and the software life cycle for that project,
particularly things like managing dependencies and packaging and distribution
across those two different language communities?
(28:29):
JPype itself is a Python installable module that's available through PyPI
or can be downloaded using distributions such as Anaconda.
The only requirement that it has is that there be a working copy of the JRE,
which is typically pointed to using the environment variable JAVA_HOME.
When distributing
(28:50):
JPype as a module
or a module that's using JPype, I typically recommend that people use something like Ivy or Maven
to pull in
the Java library
and include it in the Python package, which is going to then be shipped and installed
into the site packages as part of the startup.
(29:12):
There are some restrictions that are gonna, of course, happen, which is often people write what I refer to as a heavy wrapper
of a Java class.
That's where somebody goes and implements
a Python module, which just uses JPype as the back end, and they export
(29:32):
the interface
in their own native
Python format.
But this, of course, has the problem that you can only start one JVM,
and you can only start the JVM one time.
And there's some restrictions about how,
when you start the JVM,
that all of the jars
that have been loaded at that time
(29:53):
have special preferences.
So
to try and get around this restriction, I've been working on a custom class loader that should be able to make it so that you can actually load JARs
after a JVM has already been started.
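One way a wrapper module might cope with the start-once rule is to route every start through a single guard. The sketch below is an assumption about how such a guard could look, not a JPype feature: `ensure_jvm` and its injectable `is_started` and `start` parameters are hypothetical, while `jpype.isJVMStarted()` and `jpype.startJVM(classpath=...)` are the JPype calls used by default.

```python
def ensure_jvm(jars, is_started=None, start=None):
    """Start the JVM with the given jars unless one is already running.

    Returns True if this call started the JVM, False if it was already up.
    The two callables are injectable so the logic can be exercised without Java.
    """
    if is_started is None or start is None:
        import jpype  # only needed when the defaults are used

        is_started = is_started or jpype.isJVMStarted
        start = start or (lambda classpath: jpype.startJVM(classpath=classpath))

    if is_started():
        # Someone else already started the JVM; these jars were not on its
        # startup classpath, so the caller may need another loading strategy.
        return False

    start(list(jars))
    return True
```

A package that bundles its own jars would call `ensure_jvm([...])` at import time. When it returns False, the jars were not on the startup classpath, which is the gap the custom class loader mentioned above is meant to close.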
Aside from things like the memory loops and some of
the particular limitations
(30:14):
of how and when to start the JVM, are there any other potential pitfalls or sharp edges that users of JPype should be aware of?
No. I think that we've covered most of them on that front.
People trying
to start and stop the JVM and expecting objects to continue working past its lifespan is usually the
(30:36):
one that we see most often.
And as far as
the actual uses of JPype, I'm curious what you have found to be some of the most interesting or innovative or unexpected ways that it's been employed.
On that, I'm not really a great expert.
I do see that there's a lot of projects that use basically JDBC,
(30:56):
as well as, of course, a number of scientific codes and
engineering codes that make use of it.
Oddly, I do see a lot of work on East Asian chatbot clients.
Unfortunately, not being able to read East Asian languages, I've never really looked in to see what they're doing.
The one project that I'm most pleased to see is that a user took the JPype API,
(31:20):
and they built a stub generator that allows you to use Python IDEs
so that they can pull in a Java package and
expose the entire module
to the IDE.
It was really that sort of contribution that inspired me to basically write a parser for
Javadoc and turn it into Python docstrings
(31:43):
so that we could make
the concept of the lightweight wrapper
in which the Java package is actually considered to be
a Python module.
Another major area where Java is popular is in the Android ecosystem,
and I was noticing that JPype does also have capabilities for being able to interact with the Android ecosystem. So I'm wondering if you can
(32:09):
talk through some of the ways that that manifests
and what's involved in using JPype with Android to be able to take advantage of some of the capabilities of Python on that operating system.
JPype just started working on the Android system.
So I was working with the folks over in Kivy and
PyJNIus.
(32:29):
I will confess that after my initial evaluation of them about 3 years ago,
I completely forgot that PyJNIus existed.
And so I was really shocked when I was sitting there looking for
similar bridge codes to find out how they were doing three years on. I discovered that it was actually an active project and that it was actually a big competitor of mine because people were trying to use scientific codes using the PyJNIus
(32:58):
system.
So realizing that we're basically
two sets of developers
attacking the same
sort of platform,
I decided that it would be a good idea to meet up with them and see if we can come up with a way to get JPype to cover their needs.
So I worked out all the patterns,
(33:20):
basically, to use their building system
to create and demo
that JPype can work within the Kivy remote shell.
And then I turned all of that work over to the Kivy developers
to try and get it integrated back into their distribution.
It's currently waiting for actions on their side, which as I mentioned before, they have a very large project they're trying to maintain.
(33:46):
And so they haven't really gotten into that distribution so much.
But when it does get in place, it will be a drop in replacement
for the current
PyJNIus.
There are, of course, some minor differences in how the object conversion works
because
in JPype, whenever you hand something to a method, it will always return a Java type,
(34:13):
which basically duck types to the nearest thing in Python.
In PyJNIus,
they're going to
convert everything
on the incoming, meaning you have to pass in a Python array, and you get a Python array out the other side, which leads to a lot of conversion overhead.
But with JPype on the Android platform, I think we're gonna be able to do basically all the same things that we do in terms of the scientific and direct
(34:42):
memory and buffer transfers that we did in JPype
on the regular PC
and provide that same support on the Android.
Yeah. I know that PyJNIus has always been one of the sharp edges in the Kivy ecosystem
for being able to build projects for Android. So it's great that you've been able to help in that regard and take all the work that you've put into JPype and help to translate it for them to be able to take advantage of it and reduce some of the overall maintenance burden.
(35:13):
That is exactly why I thought that I could make a big contribution to that project.
In terms of your experience
of working with JPype and
doing maintenance on it and helping your end users to leverage it for being able to call between the Python and Java ecosystems,
I'm wondering what you have found to be some of the most interesting or challenging or unexpected aspects of that work.
(35:38):
So, obviously, the most challenging thing that I have in terms of working with JPype is
trying to work around the fact that neither Java nor Python has adequate hooks. So
simple things like providing a closure slot when you pass an object through a native interface and getting back that extra information
(35:59):
is a real challenge in the sort of development that I've been doing.
So you end up being forced to do suboptimal solutions like static variables
and maps.
But each of those different solutions creates its own problems in terms of things like thread handling
and map cleanup.
So I often have to work many times taking a shot over and over and over
(36:22):
in order to develop
and come up with an elegant workaround
for these sorts of internal processes.
So I would really prefer, of course, that language developers would future proof their APIs so that developers like me don't have to
be creative
as I put it. But perhaps the biggest challenge that I had was just trying to basically add a Java slot
(36:47):
to Python objects.
So as you are probably aware, whenever
you work with Python, it has this concept of dedicated slots
so that you can basically add
something and get order-1 access
to that whenever it's needed, especially if it's being used in something like a tight loop.
(37:09):
So I needed to add a Java slot to basically
every single one of the Python primitive
types, but they all have different memory layouts,
and they don't allow
you to arbitrarily
just add a user slot on top of something that has a different memory layout.
So Python does provide for the concept of multiple inheritance within the native Python interface,
(37:34):
but that's really all just a trick because they've already dedicated
a slot to be able to add a dictionary,
and that dictionary is then the only thing that you can
add to a class
without causing an additional conflict.
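The layout restriction described here can be seen from pure Python, without touching the C API. The sketch below is only an illustration of the general rule in CPython; the class names and the `java` attribute are hypothetical, and it does not show how JPype itself is implemented.

```python
class WithA:
    __slots__ = ("a",)  # a fixed slot changes the instance memory layout


class WithB:
    __slots__ = ("b",)  # a second, incompatible layout


def try_define(make_class):
    """Return the TypeError message if the class cannot be created, else None."""
    try:
        make_class()
    except TypeError as exc:
        return str(exc)
    return None


# Two bases that each add their own slots cannot be combined.
conflict = try_define(lambda: type("Both", (WithA, WithB), {}))

# A variable-sized builtin such as int refuses extra named slots.
int_slot = try_define(lambda: type("JavaInt", (int,), {"__slots__": ("java",)}))


# A plain subclass still works, because all it gains is a __dict__.
class TaggedInt(int):
    pass


value = TaggedInt(7)
value.java = "hypothetical Java handle"  # stored in the instance dictionary
```

Both `conflict` and `int_slot` end up holding an error message, while `TaggedInt` succeeds only because the extra attribute lives in the dictionary rather than in a dedicated slot.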
So the way that I had to deal with that is I had to create a custom memory allocator
(37:55):
that adds extra space after the end of every Python object that needs to also be shared with Java.
And this, of course, turned into a two-month-long nightmare
just resolving all of the edge cases of adding a new allocator into the Python system.
It's unfortunately this limitation
(38:15):
that has made it so that I can't get JPype to work in the PyPy
environment because
I've never managed to finish debugging all of those edge cases
for this extra memory that I need to add on the end.
Obviously, that's a very challenging problem and something that I'm very glad to have solved.
(38:37):
Because of the fact that you're working at such a deep level with the Python language and runtime, I'm curious if there have been any opportunities for you to contribute changes upstream to the CPython runtime to help improve the overall capabilities of the language, particularly for the use cases that you are working on?
(38:57):
Unfortunately, I don't really have a lot of visibility
into
the CPython
development cycle.
I do see the people's names and so forth, and I've tried to reach out on IRC and other communication channels, but it seems really opaque to me as a developer.
So I end up writing out to user groups and so forth, and I never get anything back as far as
(39:23):
how to solve or fix these problems.
I also have a big difficulty that
even if they did solve the problem for me today,
I've got to be able to work with much older versions
of Python
than what is currently in the development cycle. And so
that means that I'm going to be forced to deal with the problem even if I did manage to get the developer support
(39:47):
to add new hooks in tomorrow.
For
anybody who is considering using JPype in their own projects, what are the cases where it's the wrong choice and they might be better served with one of the other language bridges or just going a completely different architectural route?
JPype, of course, is going to have two virtual machines. So this is twice the overhead, which is, of course,
(40:10):
a big impediment in terms of speed
and in terms of resources that you're going to end up using.
So I would say any project that already has access to a high quality Python library
that provides the same capabilities as the Java one that you're considering
using, you'd be much better off avoiding
(40:31):
adding Java in simply because it can be done. The other thing, of course, is that since you're going to be marrying two machines
at the process level, you can't stop one without harming the other.
And so anything in which you want to use
Java as a slave system
(40:51):
rather than directly integrating them, then JPype is probably not the best choice, and you should probably go with Py4J.
This, of course, leads to the most frequent problem that I get of having a project
with this sort of long standing
history
is that we have a huge pile of bad reviews.
(41:12):
The most common way that this comes about is that someone tries to use
JPype in a way that is not appropriate
or is directly called out in our limitations,
which have been ignored.
And when they do so, they end up not getting a very satisfactory
result, and they feel the immediate need to go out and say, oh, you know, go use the other alternative because JPype is just crap.
(41:37):
And as you continue to work on JPype and continue to support your users, what are some of the plans that you have for the future of the project, either in terms of
quality improvements or new features or just paying down technical debt?
I really have two different directions that I am pushing for in future versions of JPype.
(41:58):
The first is to try and press forward on this heavy versus light wrapping.
So the majority of the projects that put out Python modules
are really heavy in that they go and they hide the use of JPype,
and they have to wrap a completely different API,
which is, of course, costly in terms of effort.
(42:19):
And these heavy wrappers tend to be both incompatible with each other
as well as not nearly as complete as the underlying Java API that is already being provided.
So what I'm really looking to do here is to add more customization
to the process
where you can actually store
(42:40):
a Python customization
class in the Java jar
so that the Java jar becomes the actual Python distribution
that allows you to use all of the different things within that Java package.
So, basically,
each Python customizer can, say, do things like rename methods, change the method resolution order, or add new Python-specific idioms
(43:05):
into an existing Java class.
My hope is that this will allow people to actually just distribute the light wrapper
in which you just distribute Java JARs that are usable by both
Java and Python developers
without the need for rewriting the whole API.
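To make the light wrapper idea more concrete, here is a minimal pure-Python sketch of a customizer that layers Python idioms onto a class that only exposes Java-style methods. Everything in it is an assumption made for illustration: `customizer_for`, `build_wrapper`, and `FakeArrayList` are hypothetical names, the Java class is simulated in Python, and none of this is JPype's actual API or the planned jar-based mechanism.

```python
_CUSTOMIZERS = {}  # Java class name -> list of customizer classes (hypothetical registry)


def customizer_for(java_class_name):
    """Register a class whose methods are mixed into the wrapper for java_class_name."""
    def register(cls):
        _CUSTOMIZERS.setdefault(java_class_name, []).append(cls)
        return cls
    return register


def build_wrapper(java_class_name, base):
    """Create the Python-facing class, placing customizers ahead of base in the MRO."""
    bases = tuple(_CUSTOMIZERS.get(java_class_name, ())) + (base,)
    return type(java_class_name.rsplit(".", 1)[-1], bases, {})


class FakeArrayList:
    """Stand-in for a raw Java proxy; it exposes Java-style method names only."""

    def __init__(self):
        self._items = []

    def add(self, item):
        self._items.append(item)
        return True

    def size(self):
        return len(self._items)

    def get(self, index):
        return self._items[index]


@customizer_for("java.util.ArrayList")
class _ArrayListCustomizer:
    def __len__(self):  # Python idiom layered over size()
        return self.size()

    def __getitem__(self, index):
        if not 0 <= index < self.size():
            raise IndexError(index)
        return self.get(index)

    def append(self, item):  # Python-style alias for add()
        self.add(item)


ArrayList = build_wrapper("java.util.ArrayList", FakeArrayList)
```

With this in place, `ArrayList()` supports `append`, `len()`, indexing, and iteration while the underlying Java-style methods stay available, which is the kind of customization a light wrapper could ship alongside the jar instead of a rewritten API.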
Second, I'm working on a complete reverse bridge in which you can allow Java to use Python
(43:31):
as easily as it currently
allows Python to use Java.
So to achieve this, I'm going to basically implement a code extractor,
which pulls all of the Python protocols from the objects that it encounters
and creates customized
stub classes
that exist within Java.
There will then be a Java library that contains all these stubs.
(43:55):
And whenever it encounters a Python class, it's going to create a mixin in the form of different Java interfaces that represent the capabilities of any individual object.
It will then use Java ASM in order to dynamically create this new
Java wrapper, which will be treated just like a native Java object
(44:18):
and thus allow
things like scipy and numpy to be used completely within
the Java environment.
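The protocol extraction step described here might look roughly like the following Python sketch, which inspects an object and lists the stub interfaces a Java-side wrapper would need to implement. The interface names (`PyCallable`, `PySized`, and so on) are invented for illustration, and the sketch stops short of the bytecode generation step, which the plan above assigns to ASM.

```python
from collections import abc

# Hypothetical mapping from Python protocols to Java stub interface names.
_PROTOCOL_INTERFACES = [
    (abc.Callable, "PyCallable"),
    (abc.Iterable, "PyIterable"),
    (abc.Sized, "PySized"),
    (abc.Sequence, "PySequence"),
    (abc.Mapping, "PyMapping"),
]


def java_interfaces_for(obj):
    """List the stub interfaces a Java-side wrapper for obj would implement."""
    return [name for proto, name in _PROTOCOL_INTERFACES if isinstance(obj, proto)]
```

For example, a list reports the iterable, sized, and sequence interfaces, while a plain function reports only the callable one; a generated wrapper would then mix in exactly those interfaces.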
Obviously, this is a lot of work because we have concepts like keywords
and the dynamic nature of Python
that is hard to represent within the Java framework,
but I'm confident that this can be achieved.
(44:40):
I see this as being a way to get great dividends in the scientific community
by being able to reuse and share code between the two.
In particular,
the use case of being able to bridge in both directions
for Java and things like SciPy and NumPy. I'm wondering if you have also looked into being able to integrate the use of the Arrow format
(45:04):
for easy
exchange and interoperability
of in memory data structures?
I haven't looked so much at the Arrow format,
though I have done some work with the Google protocol buffers,
which I use very often for my inter process communication.
But
the key thing with JPype is that it's going to be entirely written
(45:28):
using the JNI interface. And the JNI interface, as far as I know, doesn't have a lot of ability to interact with Arrow.
Are there any other aspects of the work that you've done on JPype or the ways that it's
(45:49):
Well, I would certainly like
some help as far as getting some of these hooks
within the CPython
interface
improved.
It would make a great deal of difference both
if there were hooks within Python or within the JVM itself.
I have not yet looked at GraalVM,
(46:10):
and I understand that it may provide a lot more capabilities
of mixing the 2 languages together, and I'm looking forward to seeing the progress that it makes.
Well, for anybody who wants to get in touch with you or follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. And so with that, I'll move us into the picks. And this week, I'm going to choose going out for a hike. It's just a great way to spend some time and relax and get some of the benefits of being in the outdoors. And for that, there are actually a couple of apps
(46:43):
that I've been using for finding good hiking trails nearby. So one of them is AllTrails,
and another one I just found is called Hiking Project, which is nice because it allows for downloading the maps to be usable offline so that if you don't have cell service, you can still find your way around.
And so, yeah, definitely recommend checking those out and finding some time to get outdoors. And so with that, I'll pass it to you, Carl. Do you have any picks this week?
(47:06):
I don't really have much in the way of that front.
My only thing is I am, of course, an avid gamer, and I really enjoy
fighting it out on Summoner's Rift, as I mentioned in the past.
Unfortunately,
I am once again stuck in silver, and that means that I will, of course, need some help. Now being like all good library maintainers, I only play support,
(47:29):
and this means that, well, I'm stuck with the usual.
Well, I appreciate you taking the time today to join me and discuss the work that you've been doing with JPype and the ways that it's being used. It's definitely a very interesting project, and it's great to see that it has managed to continue
velocity
and stay up to date. So I appreciate all of the time and effort you've put into that, and I hope you enjoy the rest of your day. Thank you very much.
(47:57):
Thank you for listening. Don't forget to check out our other show, the Data Engineering
Podcast at dataengineeringpodcast.com
for the latest on modern data management.
And visit the site at pythonpodcast.com
to subscribe to the show, sign up for the mailing list, and read the show notes.
And if you've learned something or tried out a project from the show, then tell us about it. Email host at podcastinit.com
(48:18):
with your story.
To help other people find the show, please leave a review on iTunes and tell your friends and coworkers.