Memory leak when migrating from Julia 1.10.7 to 1.11.2

Hi,
I’m encountering a memory issue in Julia v1.11.2 with a Monte Carlo simulation. The code spans multiple files, which makes it hard for me to post it here. I can say that the code doesn’t allocate new memory after the Monte Carlo simulation starts (I verified this on my PC by monitoring memory usage). On my PC, memory usage is stable, but on a cluster it grows significantly after the simulation begins, eventually leading to OOM errors. This didn’t happen in Julia v1.10.7, where I ran the same code for over 20,000 CPU hours without issues, with stable memory usage once the simulation began. What could be causing this memory leak in Julia v1.11.2, and how can I debug it?

2 Likes

Perhaps try taking and comparing heap snapshots.
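For example (a minimal sketch; `Profile.take_heap_snapshot` is in the standard `Profile` library since Julia 1.9, and `run_simulation_chunk` is just a stand-in for your own code):

```julia
using Profile

# Snapshot once the simulation has reached steady state...
Profile.take_heap_snapshot("before.heapsnapshot")

run_simulation_chunk()  # stand-in for a chunk of your Monte Carlo loop

# ...and again after memory has grown, then load and diff the two
# files in Chrome DevTools (Memory tab) or the VS Code Julia extension.
Profile.take_heap_snapshot("after.heapsnapshot")
```

Whatever keeps growing between the two snapshots is a good lead on the leak.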

Sounds like Memory leak with Julia 1.11's GC (discovered in SymbolicRegression.jl) · Issue #56759 · JuliaLang/julia · GitHub, which was fixed by gc: improve mallocarrays locality by vtjnash · Pull Request #56801 · JuliaLang/julia · GitHub and is marked for backport.

Can you try nightly?
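If you manage Julia versions with juliaup, this should be all you need (assuming a juliaup install; `your_script.jl` is a placeholder):

```
juliaup add nightly
julia +nightly --project your_script.jl
```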

6 Likes

Thanks for pointing this out. My issue is exactly like the one in Memory leak with Julia 1.11's GC (discovered in SymbolicRegression.jl) · Issue #56759 · JuliaLang/julia · GitHub.

I will try to run a job with the nightly version of Julia and see if that fixes things.

The nightly version fixed the issue. Thanks.

6 Likes

Hi, I’m experiencing something very similar with Julia 1.11.3. I’m doing a calculation using the ITensors.jl library and writing to files using HDF5.jl, on a university cluster (SLURM). I run a somewhat intensive function a few dozen times, but memory usage grows without bound, even overshooting the --heap-size-hint limit. I’m even manually calling GC.gc() after each call to that function.
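Roughly, the outer loop looks like this (a sketch; `run_calculation`, `parameter_list`, and the dataset names are stand-ins for the actual ITensors.jl code):

```julia
using HDF5

for (i, params) in enumerate(parameter_list)
    result = run_calculation(params)     # the memory-intensive step
    h5open("results.h5", "cw") do file   # "cw": create if absent, else append
        write(file, "run_$i", result)
    end
    GC.gc(true)   # request a full collection after each iteration
end
```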

To give an example, if I call this function once, memory usage is ~1–2 GB. After calling it about 30 times, it’s hitting 5 GB, and SLURM is killing my jobs.

Can anyone help me diagnose what the issue is, please?

Edit: To add to this: using the approach from How to track total memory usage of Julia process over time - #6 by sloede, on my computer I’m finding that the process consistently uses slightly more memory after each loop iteration. For a (much) smaller problem it’s about 10 MB each time, adding up to roughly 400 MB of extra memory by the end. I’m running the same diagnostics on the cluster for a bigger problem.
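Concretely, I’m logging something like this each iteration (a minimal sketch assuming Linux, reading the current resident set size from /proc; the linked thread has alternatives):

```julia
# Current resident set size in MiB, read from /proc/self/statm (Linux only).
function current_rss_mib()
    resident_pages = parse(Int, split(read("/proc/self/statm", String))[2])
    return resident_pages * 4096 / 2^20   # assumes 4 KiB pages
end

@info "memory after iteration" rss_mib = current_rss_mib()
```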

Could it be related to Automatic h5_garbage_collect() garbage collection · Issue #1186 · JuliaIO/HDF5.jl · GitHub?
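If you want to rule it out, you can trigger libhdf5’s own garbage collector manually; in recent HDF5.jl versions the low-level wrapper should live in the API module (assuming the current package layout):

```julia
using HDF5
HDF5.API.h5_garbage_collect()   # ask libhdf5 to release its internal free lists
```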

Thanks! I’m not sure that solves my issue, though.

I just ran some diagnostics on the cluster, calling the HDF5 GC function, and memory usage was still growing by hundreds of megabytes each loop (e.g. it increases by 1000 MB during the calculation and then frees only 300 MB after it’s over). I’m not sure it could be HDF5, because the file I’m writing to is only a few MB.

It’s very possible the problem is in ITensors.jl, so I’ll probably pursue this further with them.