Memory build-up when loading DataFrames in a loop

When I repeatedly load a very large DataFrame into memory and then sort it in place, the memory usage keeps growing until the process eventually runs out of memory.

Is there anything I can do to prevent this? Why is the garbage collector not freeing the memory on each loop iteration? My guess is that it has something to do with multi-threading.

My use case is that I want to sort some very large parquet files and then re-save them.


using Parquet, DataFrames

fpath = "./example.parquet"
for i in 1:100
    df = DataFrame(read_parquet(fpath))
    sort!(df, [:timestamp])
    df = nothing
end
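For the stated use case, here is a minimal sketch of the sort-and-resave loop. It assumes Parquet.jl's `write_parquet` for writing, and the file list and `.sorted` output suffix are placeholders, not taken from the thread:

```julia
using Parquet, DataFrames

# Sketch: sort each parquet file and write it back out, requesting a
# collection between iterations. File names and the ".sorted" output
# suffix are placeholders.
for fpath in ["./example.parquet"]
    df = DataFrame(read_parquet(fpath))
    sort!(df, [:timestamp])
    write_parquet(fpath * ".sorted", df)  # assumes Parquet.jl's write_parquet
    df = nothing
    GC.gc()  # request a full collection before the next load
end
```

Calling `GC.gc()` explicitly each iteration is a workaround rather than a fix; it only helps if the previous DataFrame is actually unreachable at that point.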

Is the memory freed only after the loop? (I mean for smaller data, so that the process does not crash.)

In general the GC should be able to reclaim the memory in the loop you presented, even without calling it explicitly.
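That claim can be checked with a self-contained snippet using only Base, no Parquet involved; the function name `churn` and the array sizes here are illustrative:

```julia
# Allocate and drop a large array repeatedly, mimicking the
# load/sort/discard pattern, then report live heap bytes.
function churn(n)
    for i in 1:n
        v = rand(10^7)   # ~80 MB of Float64s per iteration
        sort!(v)         # in-place sort, like sort!(df, ...)
        v = nothing      # drop the only reference
        GC.gc()          # force a collection (GC would also run on its own)
    end
    return Base.gc_live_bytes()
end

GC.gc()
before = Base.gc_live_bytes()
after = churn(5)
# `after` should be close to `before`, not `before` plus 5 × 80 MB,
# showing that each iteration's array was reclaimed.
println((before, after))
```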

No, it’s not freed. If I run the script from a REPL for only 5 iterations, the process initially uses 5.2 GiB of memory; once the for loop ends and control returns to the REPL, the process is using 17.7 GiB of memory.

I’m using Julia 1.10.0, if that helps.