I am trying out PythonCall.jl and I am loving it so far. However, it seems that when using some Python libraries that call C/C++ under the hood (for example, NumPy), Python objects created with PythonCall.jl are never automatically collected from memory. For example:
pkg> activate --temp
pkg> add PythonCall, CondaPkg
julia> using CondaPkg
pkg> conda pip_add numpy
julia> using PythonCall
julia> np = pyimport("numpy")
julia> function createarray()
a = np.random.rand(10000,10000)
return nothing
end
julia> createarray() # memory usage increases
julia> createarray() # memory usage increases even more
julia> createarray() # memory usage keeps increasing
# ...
# if you keep calling createarray(), the memory usage keeps going up
# until eventually it fills up and the julia process is killed by the OS
Inserting manual calls to GC.gc() and/or PythonCall.pydel!(object) helps release the memory, but forces the user to do “manual memory management” (i.e. keeping track of when specific Python objects stop being needed and inserting the appropriate calls). Failing to do so correctly quickly leads to memory issues if you are dealing with somewhat large data structures, complicating the use of Python libraries from Julia. This behavior is not unique to NumPy; I also noticed it when trying to use the OR-Tools library.
EDIT: Please ignore the following example. It was meant to be about Python lists, but I just realized I was creating a Julia Vector instead, and not even correctly. I am sorry about that. Everything else should still be valid. END OF EDIT
In contrast, Python lists do not seem to suffer from the same problem:
julia> random = pyimport("random")
julia> function createlist()
a = [[random.random() for _ in 10000] for _ in 10000]
return nothing
end
julia> createlist()
julia> createlist() # memory usage stays constant
I hoped that while using PythonCall.jl we could still take advantage of the automatic memory management that makes languages like Python and Julia so convenient to use.
My question is: is it expected behavior that some Python objects (like NumPy arrays) are never collected unless the user manually does it? Or am I using the package wrong somehow?
It’s been a while since I used PythonCall (and I really used it the other way with juliacall), and I don’t pretend I know how Julia, Python, or PythonCall works on any deep level. But considering the basics:
CPython counts references to promptly free non-cyclic garbage, which is most of it in programs with numerical arrays.
CPython and Julia have tracing GCs triggered by allocation thresholds. I don’t know the details of how the number and sizes of allocations are considered, and the existing documentation is not necessarily up to date.
In PythonCall, Julia wrapper objects keep Python objects alive by incrementing reference counts. Julia’s GC freeing the wrapper objects decrements the counts.
In juliacall, Python wrapper objects index a global vector referencing the Julia objects. Python’s reference-counting or GC freeing the wrapper objects kicks the Julia objects out of the vector.
We have a plausible explanation. In createarray, you make 1 small PyArray wrapper of a huge 10^8 NumPy matrix. The Julia side didn’t allocate the matrix, so it has no idea how big it is. The repeated calls discarded a handful of wrappers at most, not enough to spur Julia’s GC. Underinformed GC delays are unfortunately very familiar for interop in general. I can’t rule out some interaction with recent changes to CPython’s reference counting and GC, but I doubt downgrading Python helps.
In createlist, random.random() -> float is converted to Float64, so each float value immediately loses all references. You could make a version that builds Python lists; I’d expect it to run into the same issue, but with PyList.
Thank you very much for your informative reply, what you are saying makes a lot of sense to me and is very helpful.
If your explanation is right (and I understood it correctly), then I would expect this problem to appear in general any time you are using a number of large Python objects, where “a number” and “large” mean enough to fill up your memory before the corresponding PyArray wrappers can trigger Julia’s GC. Essentially, if you are using large Python objects, you cannot rely on them being garbage collected appropriately, because Julia actually “thinks they are small objects” (it only sees the corresponding wrappers).
I also imagine that if Python’s allocations could somehow contribute to reaching the threshold needed for triggering Julia’s GC, then this problem would not happen. Do you think some sort of solution on these lines could exist, or is it technically not feasible?
Indeed, this version with Python lists has the same problem:
julia> @pyexec """
global random
import random
def createlistp():
    return [random.random() for _ in range(10000000)]
""" => createlistp
julia> function createlistj()
a = createlistp()
return nothing
end
julia> createlistj()
julia> createlistj()
# ... and so on, memory usage always goes up
# but manual call to GC.gc() works
while, interestingly, a version that calls PyList directly does not:
julia> function createlistj2()
a = PyList([rand() for _ in 1:10000000])
return nothing
end
julia> createlistj2()
julia> createlistj2()
# ... and so on, memory usage stays roughly constant
I have a similar issue with C++ objects (High memory usage when creating lots of temporary C++ objects) and my explanation is the same as @Benny’s. Unawareness of the Julia runtime of the actual size of foreign objects seems to be the root issue here.
Unfortunately, there seems to be no solution for the general case, only manually triggering GC every so often.
[rand() for _ in 1:10000000] first makes a 10^7-element Julia vector, then PyList converts it to an equivalent Python list to be wrapped. While createarray only makes a small PyArray as garbage, createlistj2 makes a PyList and that giant Julia vector as garbage. The extra memory pressure’s benefit is consistent with existing developer documentation of Julia’s GC managing heap size.
That’s my understanding as well. Finer, proactive manual management with PythonCall.pydel!(wrapper) also saves some work by not triggering the Julia GC cycle like Base.GC.gc(). The similar-looking PythonCall.GC.gc() only works on queued PyPtrs from finalized or GCed Julia wrappers in threads that weren’t holding CPython’s GIL, so it alone won’t help these cases with unfinalized Julia garbage in the main thread holding the GIL.
Something along these lines does exist, but the more I look, the more insurmountable the barriers appear. The main problem is GCs don’t have stable and deep interfaces, so it’s extremely hard to coordinate heaps and GCs between 2 runtimes, let alone multiple runtimes and languages. The implementations aren’t even stable; there has been work to use third-party GCs for Julia, and as I mentioned before, CPython’s GC became incremental in the latest release. PythonCall’s implementation makes many assumptions about both languages’ GCs, and it would likely break if those assumptions stop holding.
So, coordination has been very limited. Some shared libraries allow their heap allocation routines to be replaced, so Julia can insert its allocators to put those objects on the Julia heap; this is how Julia’s arbitrary precision integers work. If you’re writing your own C to begin with, you could use Julia’s allocators. It’s hypothetically possible to build CPython or Julia to share allocators, but I have low confidence because the two GCs don’t agree on collection thresholds, and I’m not sure if PythonCall can handle 2 tracing GCs that share a heap but don’t trace from both languages.
An obvious possibility is for wrapper objects to indicate how big the underlying objects are, but it’s hard to keep that size accurate and updated across chains of pointers, dynamic resizing, wrapper copies, and the separate languages’ threads. Julia avoided the overheads and difficult multithreaded coordination of reference counting; a heap size tally would be much worse.
Given that it looks like there is no general solution on the horizon, I tried using PythonCall.pydel!(wrapper) in my original use case which motivated this post.
I want to solve a series of constraint satisfaction problems using the OR-Tools Python library, but my code is plagued by the problem discussed in this thread. I would like to solve it with pydel! instead of triggering GC.gc because I need to avoid any unnecessary slowdown.
I tried really hard to make it work, but without success. Any idea that could help making pydel! actually release the memory would be greatly greatly appreciated, as I am a bit hopeless at the moment.
MWE
julia> using CondaPkg
pkg> conda pip_add ortools
julia> using PythonCall
julia> cp_model = pyimport("ortools.sat.python.cp_model")
julia> const allowed_assignments = [rand(0:1, 100) for _ in 1:200000];
julia> function createmodel()
model = cp_model.CpModel()
# add variables to the model
variables = [model.new_int_var(0,1, "x_" * string(i)) for i in 1:100]
# add list of possible assignments to the variables
model.add_allowed_assignments(variables, allowed_assignments)
# Omit the solving phase
return nothing
end
Repeatedly calling createmodel leads to Out of Memory events. In contrast, if I add a manual call to GC.gc() inside the function, the memory usage stays roughly constant:
julia> function createmodel_gc()
model = cp_model.CpModel()
variables = [model.new_int_var(0,1, "x_" * string(i)) for i in 1:100]
model.add_allowed_assignments(variables, allowed_assignments)
# trigger garbage collection
GC.gc()
GC.gc()
return nothing
end
This makes me think that probably there are some wrapper objects that are too small to trigger the Julia GC, so the large Python objects that they reference are also never collected. My goal is to have a version of createmodel that avoids Out of Memory events when called repeatedly, through the usage of pydel! instead of GC.gc.
My best (but still failing) attempt at a solution with pydel!
I tried to identify all the places where a Python object wrapper was created in createmodel, keep a reference to them and then call pydel! for each one of them.
julia> function createmodel_pydel()
# the following comments identify where Python wrappers were being created
# model
model = cp_model.CpModel()
# each element of variables
variables = [model.new_int_var(0,1, "x_" * string(i)) for i in 1:100]
# both variables and allowed_assignments were being automatically converted to Py when passed as arguments to add_allowed_assignments
py_variables = Py(variables)
py_allowed_assignments = Py(allowed_assignments)
# the object that is returned by add_allowed_assignments
constraint = model.add_allowed_assignments(py_variables, py_allowed_assignments)
for v in variables PythonCall.pydel!(v) end
PythonCall.pydel!(py_variables)
PythonCall.pydel!(py_allowed_assignments)
PythonCall.pydel!(constraint)
PythonCall.pydel!(model)
return nothing
end
Expected behavior: createmodel_pydel keeps the memory usage roughly constant when called repeatedly.
Actual behavior: createmodel_pydel behaves similarly to createmodel
pydel! should be able to solve the problem in principle
If we add a call to pydel! in the MWE from the first post of this thread, we see that the problem disappears and repeatedly calling createarray keeps the memory usage constant.
julia> function createarray()
a = np.random.rand(10000,10000)
PythonCall.pydel!(a)
return nothing
end
By eye it does indeed look like those pydel!s should free all the Python objects you create in that function. There might be some intermediate objects created somewhere that are not being freed, maybe in Py(...).
Actually, one source of intermediate values is the conversion of Python function arguments (like "x_1") to Python objects. If you do this conversion yourself for all of the arguments and pydel! them yourself afterwards, this might help?
You could also try using @py to rewrite that function body. It inserts pydel! into the generated code for you, though possibly not everywhere.
First, an apology for declining the answer acceptance, but I don’t believe it’s beneficial to mark a thread resolved if your problem isn’t fully solved. It’ll just get promoted less in the feed, and that would happen to an idle unresolved thread with time anyway.
Before this particular call returns, the variables model and variables are still live, so the GC will not affect any Julia objects directly or indirectly referenced by them, and PythonCall will keep the wrapped Python objects alive. By the time the function returns and the variables go out of scope, the referenced objects have to wait until the next call’s GC.gc() to be collected. That’ll still get everything over repeated calls, but be aware that GC.gc() usually occurs between big calls, not in them.
Now I’ll try to micromanage the pydel! attempt:
julia> function createmodel_pydel()
model = cp_model.CpModel()
# model = auto Py wrapper
# CpModel instantiation on Python side
# each element of variables
variables = [model.new_int_var(0,1, "x_" * string(i)) for i in 1:100]
# variables = Julia Vector of 100 auto Py wrappers
# Py converts Julia 0, 1, and String to Python 0, 1, and str, no wrapping
# new_int_var instantiates IntVar on Python side
## ! 3 new variables compared to createmodel
py_variables = Py(variables)
# py_variables = manual Py wrapper
# instantiates juliacall.VectorValue on Python side to wrap variables Vector, Vector referenced by global cache
# indexing on Python side unwraps Py element for existing IntVar
py_allowed_assignments = Py(allowed_assignments)
# py_allowed_assignments = manual Py wrapper
# instantiates VectorValue on Python side to wrap global allowed_assignments Vector, Vector referenced by global cache
# indexing on Python side wraps inner 100-Vector in another VectorValue, Vector referenced by global cache
# inner indexing converts Int to int, no wrapping
constraint = model.add_allowed_assignments(py_variables, py_allowed_assignments)
# constraint = auto Py wrapper
# add_allowed_assignments instantiates a Constraint on Python side
## pydel! decrements reference count, nulls and caches Py wrappers for future auto
for v in variables PythonCall.pydel!(v) end
# Python possibly frees IntVar
# if so, py_variables VectorValue would be wrapping a Vector of nulled Py
PythonCall.pydel!(py_variables)
PythonCall.pydel!(py_allowed_assignments)
# Constraint likely still references these VectorValues
PythonCall.pydel!(constraint)
# CpModel likely still references this Constraint
PythonCall.pydel!(model)
# Python possibly frees CpModel, then Constraint, then VectorValues
# variables and allowed_assignments vectors removed from global cache
return nothing
# variables Vector becomes garbage, its 100 nulled Py wrappers are not
# this function reuses 102 cached Py wrappers at most, nulls&caches 104 Py wrappers
end
There are some not-great things there, but I don’t see an explanation for why your pydel! attempt didn’t work. It appears to disconnect all the Python wrappers on the Julia side, and the CPython GC is free to do its work.
This function alone is expected to shuffle 102 Py wrappers from PYNULL_CACHE and add 2 each call. That’s technically memory growth, but it’s so small it wouldn’t be noticeable on modern systems for millions of calls, let alone run out of memory.
I would discourage so many wrappers, some zigzagging between the languages, because it tends to force extra unwrapping and wrapper instantiation beyond what you wrote, but that’s not a likely explanation because GC.gc() worked, albeit staggered across 2 calls.
The only Julia garbage I see is a Vector with 100 references, and it should be completely disconnected from the Python side. 800 bytes and some change shouldn’t be a problem for the Julia GC.
I’m suspecting that the CpModel, Constraint, etc on the Python side aren’t readily freed on the Python side like NumPy arrays are, but frankly I can’t figure that out from the documentation. Before you hit the error, there are options for checking memory usage without the GC, though I don’t really know how to do it well.
Julia internals (can disappear in future versions):
Base.gc_live_bytes() adds the size of live objects after the last GC cycle and the number of bytes allocated since. That’s not the size of the occupied heap exactly because we have to consider the freed gaps, but I’m not sure what in Base or Sys is appropriate for that.
I ran across Sys.maxrss() giving the maximum resident set size, the portion of physical RAM held by the process. Again, not sure this is an “occupied heap” measurement, but it sounds like it would increase if any memory usage is blowing up.
CPython:
The builtin tracemalloc library can be used to get snapshots of the private Python heap. This does not include allocations by other languages in the same process, so it’d miss a NumPy buffer for example.
Third-party psutil package tracks some memory stats. The resident set size is roughly psutil.Process(os.getpid()).memory_info().rss.
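A minimal sketch of both CPython-side checks; since psutil may not be installed, the resident-set line below uses the stdlib resource module as a Unix-only stand-in (psutil.Process(os.getpid()).memory_info().rss is the portable equivalent):

```python
import tracemalloc

tracemalloc.start()
blob = [bytearray(1024) for _ in range(1000)]     # ~1 MiB on the private Python heap
current, peak = tracemalloc.get_traced_memory()   # bytes, Python heap only
assert current > 1_000_000                        # would miss e.g. a NumPy buffer
tracemalloc.stop()

# Resident set size without psutil (Unix-only stdlib):
import resource
rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss  # KiB on Linux, bytes on macOS
assert rss > 0
```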
PythonCall internals:
PythonCall.Core.PYNULL_CACHE references nulled Py wrappers for future use
PythonCall.JlWrap.Cjl.PYJLVALUES references Julia objects that are wrapped by the Python side; freeing those objects makes some elements nothing, indicated by the PYJLFREEVALUES cache.
Neither suggestion seems to solve the problem. In both cases, I still run out of memory when I call the function repeatedly.
Here I tried to convert every function call argument, including numbers and strings, and pydel! them myself:
function createmodel_convertall()
# create model
model = cp_model.CpModel()
# create variables
py_zero = pyint(0)
py_one = pyint(1)
variable_names = [pystr("x_" * string(i)) for i in 1:100]
variables = [model.new_int_var(py_zero, py_one, variable_names[i]) for i in 1:100]
# create constraint
py_variables = Py(variables)
py_allowed_assignments = Py(allowed_assignments)
constraint = model.add_allowed_assignments(py_variables, py_allowed_assignments)
# pydel! every wrapper
PythonCall.pydel!(py_zero)
PythonCall.pydel!(py_one)
for vn in variable_names PythonCall.pydel!(vn) end
for v in variables PythonCall.pydel!(v) end
PythonCall.pydel!(py_variables)
PythonCall.pydel!(py_allowed_assignments)
PythonCall.pydel!(constraint)
PythonCall.pydel!(model)
return nothing
end
Here I tried to use @py:
function createmodel_pymacro()
@py begin
# create model
model = cp_model.CpModel()
#create variables
variables = []
for i in range(100)
variables.append(model.new_int_var(0,1,"x"+str(i)))
end
# create constraint
model.add_allowed_assignments(variables, @jl(allowed_assignments))
None
end
end
Your analysis is very clear and corresponds to the mental model I had when I wrote the code.
I think this is not true. I have found an example where the pydel! approach actually works and where, interestingly, a small and seemingly meaningless modification makes it not work anymore.
gc = pyimport("gc")
np = pyimport("numpy")
cp_model = pyimport("ortools.sat.python.cp_model")
# the allowed assignments are contained in a numpy matrix, instead of a Vector of Vector.
# OR-Tools accepts both.
const allowed_assignments_np = np.asarray(rand(0:1,200000,100))
# pure Python function that creates and returns the variables
@pyexec """
def createvariables(model):
    return [model.new_int_var(0,1,"x_"+str(i)) for i in range(100)]
""" => createvariables
# pure Python function that creates and returns the constraint
@pyexec """
def createconstraint(model, variables, allowed_assignments):
    return model.add_allowed_assignments(variables, allowed_assignments)
""" => createconstraint
function createmodel1()
# create the model, the variables and the constraint
model = cp_model.CpModel()
variables = createvariables(model)
constraint = createconstraint(model, variables, allowed_assignments_np)
# pydel! the model, the variables and the constraint
PythonCall.pydel!(variables)
PythonCall.pydel!(constraint)
PythonCall.pydel!(model)
# trigger Python GC
gc.collect()
end
function createmodel2()
# create the model, the variables and the constraint
model = cp_model.CpModel()
variables = createvariables(model)
constraint = model.add_allowed_assignments(variables, allowed_assignments_np)
# pydel! the model, the variables and the constraint
PythonCall.pydel!(variables)
PythonCall.pydel!(constraint)
PythonCall.pydel!(model)
# trigger Python GC
gc.collect()
end
Why I find this example interesting:
The two functions createmodel1 and createmodel2 only differ on their 4th line.
The function createmodel1 calls the python function createconstraint, which basically just dispatches its arguments to model.add_allowed_assignments.
The function createmodel2 directly passes those arguments to model.add_allowed_assignments.
I thought this would not make any difference, but actually it does!
The function createmodel1 frees up the memory as expected, so if you call it repeatedly the memory usage will stay roughly constant. In contrast, createmodel2 makes the memory usage always increase and eventually run out of space.
The modification that makes the problem appear seems small enough that it should be possible to understand why it happens, but I personally could not find an explanation for the difference in behavior between the two functions. I am not sure whether this is expected (in which case I would still be interested to know what is going on) or a bug in PythonCall.jl.
Note that:
all the variables are Py objects, so no conversion is happening during the function calls.
gc.collect() is invoked to make sure that the Python GC is being triggered. It is there just to verify that the memory usage increase is not due to the Python GC’s allocation threshold not being met.
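For reference, gc.collect() targets reference cycles specifically; acyclic garbage is freed by reference counting alone. A minimal stdlib illustration:

```python
import gc

class Node:
    pass

a, b = Node(), Node()
a.peer, b.peer = b, a     # reference cycle: refcounts never drop to zero
gc.collect()              # flush any unrelated pre-existing garbage first
del a, b                  # the cycle is now unreachable but still allocated
collected = gc.collect()  # cycle collector reclaims the pair (and their __dict__s)
assert collected >= 2
```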
That’s great work narrowing it down, and it does look like shifting an attribute access and method call between the languages shouldn’t make a difference. Since triggering the Python GC doesn’t save createmodel2, it’s very unlikely to be cyclic garbage on the Python side. Before you try to dive deeper into the heap, check a couple of the simpler things:
Does GC.gc() on the Julia side between, not inside, createmodel2() calls save it?
Can you see anything change in PythonCall.JlWrap.Cjl.PYJLVALUES as the calls accumulate?
As a side note, pydel! was written before Julia had any escape analysis, and in fact the implementation most likely prevents effective escape analysis of any Py allocations. I suspect that if I “unoptimise” pydel! that actually these issues may disappear because Julia will be able to eagerly finalise these intermediate objects.
That sounds right, but it’s surprising to me that the bound methods instantiated upon calls from Python instances are big enough to add up to an observable memory error. I’ve just been lucky to dodge that issue by preferring minimal interop between 2 personal scripts.
Yes! Adding the pydel!s for the methods helps. I created a version of my function where the pydel!s successfully free the Python objects that the Julia-side wrappers were keeping from being collected.
const allowed_assignments = [rand(0:1, 100) for _ in 1:200000]
const allowed_assignments_py = np.asarray(stack(allowed_assignments; dims=1))
function createmodel_pydel()
# model creation
model = cp_model.CpModel()
createvar = model.new_int_var # method wrapper
variables = [createvar(0,1, "x_" * string(i)) for i in 1:100]
createconstraint = model.add_allowed_assignments # method wrapper
constraint = createconstraint(variables, allowed_assignments_py)
# pydel! all necessary wrappers
PythonCall.pydel!(createvar)
for v in variables PythonCall.pydel!(v) end
PythonCall.pydel!(createconstraint)
PythonCall.pydel!(constraint)
PythonCall.pydel!(model)
# make sure to trigger python GC
gc.collect()
return nothing
end
If createmodel_pydel is called in a loop, the memory usage stays constant! Unfortunately, if instead of converting allowed_assignments to a NumPy matrix I simply convert it to Py:
const allowed_assignments_py = Py(allowed_assignments)
and then use the same function, the problem comes back again. I found this very surprising, because the fact that everything works well with the NumPy matrix means that we had identified all the Julia wrappers that needed to be pydel!ed.
I think something “bad” is happening when allowed_assignments_py is used on the Python side, inside the function model.add_allowed_assignments. Here is a simple test:
const allowed_assignments_py = Py(allowed_assignments)
@pyexec """
def read_allowed_assignments(allowed_assignments_py):
    for el in allowed_assignments_py:
        for _ in el:
            pass
    return None
""" => read_allowed_assignments
# now call this repeatedly:
read_allowed_assignments(allowed_assignments_py)
# RAM fills up until the Julia process is killed
The Python function read_allowed_assignments simply loops through allowed_assignments_py, but calling it repeatedly indeed causes the memory to fill up. Moreover, this really seems like the same problem again, because calling gc.collect() has no effect, but calling GC.gc() frees up the memory.
I can transform this example to use a simpler Vector{Int}, instead of a Vector{Vector{Int}} while preserving the same behavior:
v = rand(0:1, 100)
v_py = Py(v)
@pyexec """
def read_vector(v_py):
    for _ in v_py:
        pass
    return None
""" => read_vector
# now call this repeatedly:
for _ in 1:200000 read_vector(v_py) end
# RAM fills up until the Julia process is killed
Here is what I suspect: each time an element of the vector is read on the Python side, a Py object is created which points to a Python int, the result of converting the element inside the Julia array. All these Py objects are not being collected by the Julia GC (maybe they are not large enough?), and the int objects cannot be collected on the Python side because they are referenced by the Py objects.
Moreover, as the important part is happening inside a Python function, I do not see any way to solve this with pydel!.
If this is what is going on, then maybe the problem could be fixed by adding a pydel! in the source of PythonCall.jl every time an item is read from an ArrayValue?
That would be amazing! I am still a beginner in Julia, but I am happy to help if there is something I can do.
CPython ints from -5 to 256 are pre-allocated and reused, not freshly instantiated.
The underlying object being indexed there is a Julia Vector{Int}, and the elements are Julia Ints. Integers interconvert with Python int, no wrapping.
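Both caching behaviors are observable from plain CPython (implementation details, not language guarantees):

```python
a, b = 255 + 1, 2**8              # computed, not literal, to sidestep constant folding
assert a is b                     # -5..256 are pre-allocated singletons
c, d = int("1000000"), int("1000000")
assert c == d and c is not d      # larger ints are fresh objects (CPython detail)
```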
The prior cases involved Julia code using Py to wrap Python objects and obstructing CPython’s reference-counted frees. Here, we’re running Python code to wrap Julia objects, so theoretically the reference-counting should be doing the job. Empirically, you are still right to suspect something is being instantiated on both Python and Julia sides.
I’ll check your simplest example, but keep in mind the bigger examples can involve more. Even before we call the function, we’re already wrapping things in the global scope. Py(v) converts a Julia Vector{Int} to a Python object, specifically wrapping it in a juliacall.VectorValue, that gets wrapped by Py back on the Julia side. The VectorValue object puts the underlying Julia vector in the wrapped Julia objects cache.
Nothing strange so far in this setup.
julia> using PythonCall; begin
v = rand(0:1, 100)
v_py = Py(v)
@pyexec """
def read_vector(v_py):
    for _ in v_py:
        pass
    return None
""" => read_vector
@pyexec v_py => "print(type(v_py))"
PythonCall.JlWrap.Cjl.PYJLVALUES # wrapped Julia objects cache
end
<class 'juliacall.VectorValue'>
7-element Vector{Any}:
Core
Base
Main
Pkg
PythonCall
#init_gui##0 (generic function with 1 method)
[1, 0, 0, 0, 1, 0, 0, 1, 0, 0 … 0, 0, 1, 1, 1, 0, 0, 0, 0, 0]
Each time we call read_vector(v_py), the Julia objects cache gets bigger by 1 JlWrap.Iterator (8 bytes for the reference). Python’s iteration mechanism instantiates a stateful iterator object from iterable objects, in this case a juliacall.IteratorValue from VectorValue. IteratorValue wraps a 16-byte JlWrap.Iterator on the Julia side, which references the underlying Julia vector and its iteration state (stores the 8 bytes allocated per iteration in this case). This also contradicts what I said at the beginning about freed CPython objects (IteratorValue has 0 references when the function returns) kicking the wrapped Julia objects (JlWrap.Iterator) from the cache.
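For context, this is the standard Python iteration protocol: every for loop calls iter() on the iterable and gets a fresh, stateful iterator object, which for a juliacall.VectorValue means a new Julia-side JlWrap.Iterator per loop. A plain-list sketch:

```python
lst = [1, 2, 3]
it1 = iter(lst)            # a for-loop does this implicitly
it2 = iter(lst)
assert it1 is not it2      # each loop allocates a fresh iterator object
assert next(it1) == 1
assert next(it1) == 2
assert next(it2) == 1      # each iterator keeps its own position
```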
GC.gc() clears them from the Julia objects cache and presumably frees them, which contradicts what I said at the beginning about the cache protecting them from the Julia GC until the Python side frees the wrappers. I really don’t know what’s going on here.
Running read_vector(v_py) 200,000 times does not increase the cache size by 200,000 (4.8 MB for iterators, 1.6MB for cache pointers) because the GC is being triggered (@time reports ~2.43 GiB, we know 160MB are iteration states) and leaving nulled indices in the cache for further iterators, though each nulled index also occupies 8 bytes in its own cache. Still, the hot loop evidently instantiates JlWrap.Iterators faster than the GC nulls them, and the cache cannot shrink by design.
julia> Base.gc_num().pause, Base.gc_num().full_sweep
(6, 2)
julia> begin
for _ in 1:200000 read_vector(v_py) end # expensive!
length.((PythonCall.Core.PYNULL_CACHE, # nulled Py wrapper cache
PythonCall.JlWrap.Cjl.PYJLVALUES, # wrapped Julia objects cache
PythonCall.JlWrap.Cjl.PYJLFREEVALUES)) # ^ nulled indices cache
end
(1, 68217, 49162)
julia> Base.gc_num().pause, Base.gc_num().full_sweep
(14, 5)
Ideally, the cache reaches 200000 iterators, the GC clears them after each loop, and the next loop uses the nulled indices. But the MBs of these growing caches appear insignificant next to the GiBs of garbage from each loop, and the GC can’t anticipate the next loop to time these smaller jobs. Several loops’ iterators can even fill the cache with no nulled indices:
julia> begin # omitted several identical evaluations
for _ in 1:200000 read_vector(v_py) end
length.((PythonCall.Core.PYNULL_CACHE,
PythonCall.JlWrap.Cjl.PYJLVALUES,
PythonCall.JlWrap.Cjl.PYJLFREEVALUES))
end
(1, 341502, 0)
Note that while the nulled indices cache can shrink, its underlying Memory buffer doesn’t, which is what actually fills memory. This part is an implementation detail of Julia Vectors, not PythonCall.
My hypothesis is these 2 caches’ buffers are monotonically increasing even while Julia’s GC (or you) frees the much larger garbage, and they escalate the heap size to the point the free physical memory can’t handle the program. I’m uncertain because each big loop is adding ~1GiB to the Julia process despite these caches being only a handful of MB. However, Julia routinely handles GiB-scale programs (for example, repeated x = rand(0:1, 250_000_000) #2GB kept a process steady at 4.1GiB), so I’m inclined to believe the caches are making a difference. In any case, check this on your system because that could change things.
Assuming I’m right, it’s a difficult problem to solve. As you said, Julia-side pydel!s can’t help at all. A hard cap on the cache won’t work because the cache must reference an unbounded number of wrapped Julia objects to prevent use-after-frees. A compacting alternative to Vector might keep the heap size in check, but that adds overhead the current index-nulling design is avoiding, and I don’t actually know if the heap size can dynamically decrease and trigger the GC more often to maintain the cache. Until contributors figure out something, the typical advice still holds up: avoid crossing the language barrier too frequently, especially when the language runtimes have very different opinions on things like automatic memory management.
Minor point, array-making functions generally support specifying multiple dimensions e.g. rand(0:1, 200000, 100), and Julia’s comprehension supports comma-delimited multidimensional inputs e.g. [rand(0:1) for row in 1:200000, col in 1:100] (note that this is not the same as nested for clauses in comprehensions and is in a way written backwards from it). This isn’t like NumPy where base Python forces us to stack lists as rows first.