When I repeatedly load a very large DataFrame into memory and then sort it in place, the memory usage keeps building until the process eventually runs out of memory.
Is there anything I can do to prevent this? Why does the garbage collector not free the memory on each loop iteration? My guess is that it has something to do with multi-threading.
My use case is that I want to sort some very large parquet files and then re-save them.
MWE:
using Parquet, DataFrames
fpath = "./example.parquet"
for i in 1:100
    println(i)
    df = DataFrame(read_parquet(fpath))   # load the parquet file into a DataFrame
    sort!(df, [:timestamp])               # sort in place by timestamp
    empty!(df)                            # attempt to release the data
    df = nothing
    GC.gc(true)                           # force a full garbage collection
end
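For context, the per-file step I ultimately want to run looks roughly like the sketch below. It uses write_parquet from Parquet.jl; the function name and the output path are just placeholders for illustration.

function sort_and_resave(in_path, out_path)
    df = DataFrame(read_parquet(in_path))   # load one large parquet file
    sort!(df, [:timestamp])                 # sort in place by timestamp
    write_parquet(out_path, df)             # re-save the sorted table
    return nothing
end

sort_and_resave(fpath, "./example_sorted.parquet")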