I found out that creating lots of temporary C++ objects through CxxWrap takes a lot of memory (duh!) and does not return that memory to the system after GC.
My use case is this package wrapping the Voro++ library. Logically, iterating over a Voronoi tessellation is meant to yield a Voronoi cell object on each iteration, and ideally I’d like those objects to be independent of each other.
That raises an issue: each object does not get immediately GC’ed if unused after the current iteration, so memory usage grows. At some later point the GC deletes the objects, but the C++ delete operator does not return the memory to the system; instead it goes back to an allocator pool for reuse. So even though the objects get GC’ed at some point, the memory they allocated isn’t returned to the system.
That alone isn’t a huge problem; the problem arises when I run multiple loops creating such objects. If the GC does not run after each loop, the allocated memory piles up and can consume all RAM.
So I’m wondering: how can I make the objects get collected as soon as possible after they are no longer needed?
The solution I have so far is to provide two iterators: a “safe” one which allocates fresh objects, and an “unsafe” one which reuses a single object throughout the iteration. My feeling, though, is that neither is user-friendly. The first can quickly lead to OOM, especially in an exploratory environment where a user tries the same thing with minor modifications over and over. The second requires the user to be conscious that the results of the iteration are not independent.
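To make the two flavors concrete, here is a minimal sketch of how the two iterators could look; `Tessellation`, `VoronoiCell`, and `compute_cell!` are hypothetical stand-ins for the wrapped Voro++ types, not the package’s actual API:

```julia
struct SafeCellIterator    # yields a fresh cell each iteration
    tess::Tessellation
end

struct UnsafeCellIterator  # yields the same, mutated cell each iteration
    tess::Tessellation
    buffer::VoronoiCell
end

function Base.iterate(it::SafeCellIterator, i = 1)
    i > length(it.tess) && return nothing
    cell = VoronoiCell()              # fresh C++ allocation every time
    compute_cell!(cell, it.tess, i)
    return cell, i + 1
end

function Base.iterate(it::UnsafeCellIterator, i = 1)
    i > length(it.tess) && return nothing
    compute_cell!(it.buffer, it.tess, i)  # overwrites the shared buffer
    return it.buffer, i + 1               # not independent across iterations
end
```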
Am I missing a good third way to do the right thing without risking OOM?
What is the type of the iterator that you are using? If this type needs to be wrapped like a normal “opaque” C++ type, this will indeed allocate for every iterator that is produced. If you could somehow rewrite this to return only references and primitive types from C++, then there should be no heap allocated Julia objects involved.
The objects are composite structures with some nested arrays inside (as I’m not affiliated with the original library devs, I only have a limited high-level understanding of their internals), in total about 200 kB per object.
Actually, the Julia-side allocations are not a problem; the memory is taken on the C++ side. Since GC happens only every so often, a lot of objects accumulate, and I think the Julia runtime is unaware of that memory consumption. What happens is that I can iterate multiple times over a container with several thousand particles without a single GC sweep, while the memory on the C++ side grows substantially.
I think the root of the problem is the same as here, but the additional catch is that the Julia runtime seems to be unaware of the C++ heap size and thus miscalculates when to run GC. At least in my experiments, the system easily reached OOM (at 45 GB of Julia usage) without a GC sweep.
If I add GC.gc() on iterator exhaustion, the memory taken is only one loop’s worth, which is understandable: temp objects stay uncollected until the end of the loop, then GC deletes them; the memory stays reserved by the C++ runtime but at least does not grow on subsequent iterations.
On returning references: that might be helpful, but I am not an expert. Would that need support from the wrapped C++ library to ensure correct memory management?
Tracking memory usage across language barriers is not trivial in general, and it’s extra important for tracing GCs triggered by memory pressure rather than by variable lifetimes or reference counts. I don’t do my own language interop, but I’ve heard that jl_malloc/jl_free can be used to make Julia’s GC account for C-allocated memory, and the BigInt implementation apparently swaps out libgmp’s memory functions for GC-aware wrappers, e.g. jl_gc_counted_malloc. No idea if or how this generalizes.
Assuming you are iterating over some large collection of objects that was already allocated in C++, then returning references instead of copies for each item should avoid this problem. I would need to see an example to know if this is possible here.
As seen in the linked issue, the general problem is indeed unsolved for now: the Julia GC has no idea how much memory is used by the wrapped C++ objects, and as far as I understand there is no functionality in the Julia API yet that we can use to inform it.
A hacky workaround is to run GC.gc() every so often, or maybe even GC.gc(); GC.gc(), since a second full collection tends to find some more garbage to collect.
After that, the allocated memory is still not necessarily returned to the operating system, depending on your platform/libc/allocator. For example, on Linux with Glibc, doing GC.gc(); GC.gc(); @ccall malloc_trim(0::Cint)::Cvoid might be able to free even more memory. The Julia developers are considering moving away from Glibc malloc for this, and other, reasons.
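Putting the pieces together, a small helper along these lines might work (a sketch; malloc_trim is a glibc extension, hence the platform guard):

```julia
function reclaim_memory()
    GC.gc()  # free finalizable wrapped objects
    GC.gc()  # a second full collection often finds more garbage
    if Sys.islinux()
        # Ask glibc malloc to return freed pages to the OS;
        # returns 1 if any memory was actually released.
        @ccall malloc_trim(0::Cint)::Cint
    end
    return nothing
end
```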
I ran into the same problem with my package GTPSA.jl which wraps a C library.
Yes, this has worked for me, and this is a viable solution for now.
You can allocate the memory directly in Julia itself. This requires using ccall with jl_malloc for allocation, and jl_free for deallocation. See my implementation here, where we have a mutable Julia struct wrapping a C structure: GTPSA.jl/src/tps.jl at 15f63c4f0ef1e2697197eb86a3cbd443ade259c4 · bmad-sim/GTPSA.jl · GitHub. We allocate the arrays using jl_malloc and register a finalizer that calls jl_free.
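Stripped down to the essentials, the pattern looks roughly like this (a sketch, not the actual GTPSA.jl code; `WrappedBuffer` and `nbytes` are illustrative names):

```julia
mutable struct WrappedBuffer
    ptr::Ptr{Cvoid}
    function WrappedBuffer(nbytes::Integer)
        # jl_malloc makes the allocation count towards Julia's GC heuristics
        ptr = @ccall jl_malloc(nbytes::Csize_t)::Ptr{Cvoid}
        obj = new(ptr)
        finalizer(obj) do o
            @ccall jl_free(o.ptr::Ptr{Cvoid})::Cvoid  # returned to the GC-counted heap
        end
        return obj
    end
end
```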
That’s the problem: the iteration is over a lazy collection, so the objects are computed on the fly. The author’s intended approach is that the user passes a preallocated buffer that gets mutated. I can make that the default iteration policy, but then users would need to explicitly call copy whenever they want to decouple the current object from the iteration state.
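With buffer reuse as the default, the caller side would look something like this sketch (`tessellation`, `VoronoiCell`, `compute_cell!`, and `is_interesting` are hypothetical stand-ins for the wrapped API):

```julia
buffer = VoronoiCell()
kept = VoronoiCell[]
for i in 1:length(tessellation)
    compute_cell!(buffer, tessellation, i)  # overwrites the shared buffer
    if is_interesting(buffer)
        push!(kept, copy(buffer))  # explicit copy decouples the cell
    end
end
```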
Another option would indeed be to preallocate all cells on domain creation. That puts an upper bound on memory consumption, at least.
Seems so. I guess the iteration state might need to include an iteration counter to trigger GC at consistent intervals.
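For example, something along these lines (a sketch; `CellIterator`, `VoronoiCell`, and `compute_cell!` are hypothetical stand-ins, and the interval is a tunable guess):

```julia
const GC_EVERY = 1_000  # collect after this many temporaries (tunable)

function Base.iterate(it::CellIterator, (i, since_gc) = (1, 0))
    i > length(it.tess) && return nothing
    if since_gc >= GC_EVERY
        GC.gc()       # bound the pile-up of uncollected C++ temporaries
        since_gc = 0
    end
    cell = VoronoiCell()
    compute_cell!(cell, it.tess, i)
    return cell, (i + 1, since_gc + 1)
end
```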
That’s what I see: calling GC.gc() stops the memory footprint from growing but does not release all the memory allocated for the temporary objects.
Not an option for me, I’m afraid, as the class in question does some dynamic memory management during the object’s lifetime, so it would require rewriting the library internals. If not for that, it’d be the best solution (I guess jlcxx::create does something of the kind, if not exactly that).