When I run the code below, the memory usage blows up:
function func(df::DataFrame)
    X = df[:time]
    indices = findall(X .> 0)
end

# read in R data
rds = "blablab.rds"
objs = load(rds);
params = collect(0.5:0.005:0.7);

for i in 1:length(objs)
    cols = [string(name) for name in names(objs.data[i]) if occursin("blabla", string(name))]
    hypers = [(a, b) for a in cols, b in params]
    results = [DataFrame() for _ in 1:length(hypers)]
    # HERE IS WHERE THE MEMORY BLOWS UP
    Threads.@threads for hi in 1:length(hypers)
        name, val = hypers[hi]
        results[hi] = func(objs.data[i])
    end
end
df is about 0.7 GB. When I run this piece of code, my memory usage goes up to ~30 GB! Is just accessing a column of df inside func() copying the whole thing?
It looks like it could be similar to the issue that I ran into here:
If it is the same issue, I believe it was fixed on the Julia master branch a few months ago, but the fix hasn't made it into a release yet. Again, if it is the same problem, things should work with Julia 1.0.3 and earlier. I've been sticking with that release until we get a new bug-fix release.
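To separate the two possibilities (column access copying vs. something going wrong in the threaded loop), it may help to measure what a single call to the function allocates outside of any threads. The following is only a rough sketch, not the code from the question: the data frame is a made-up stand-in, `positive_rows` is a hypothetical name for `func`, and it assumes a recent DataFrames.jl release where the column is accessed as `df[!, :time]` rather than the older `df[:time]`.

```julia
using DataFrames

# Hypothetical stand-in for one of the data frames in the question.
df = DataFrame(time = randn(1_000_000), other = randn(1_000_000))

# Same body as func in the question, with the column access written
# in the newer DataFrames.jl syntax (assumption: DataFrames.jl >= 0.19).
function positive_rows(df::DataFrame)
    X = df[!, :time]          # returns the stored column vector, no copy
    return findall(X .> 0)    # allocates a temporary mask plus the index vector
end

positive_rows(df)                               # warm-up so compilation is not measured
bytes_per_call = @allocated positive_rows(df)   # bytes allocated by one call
column_bytes = sizeof(df[!, :time])             # size of the column itself

println("allocated per call: ", bytes_per_call, " bytes")
println("size of one column: ", column_bytes, " bytes")

# `!` indexing hands back the same array each time; `:` indexing makes a copy.
println(df[!, :time] === df[!, :time])
println(df[:, :time] === df[!, :time])
```

If the per-call allocation is small compared with the size of the data frame, the column access is probably not what is copying the data, and the growth is more likely coming from the threaded loop, which would be consistent with the issue mentioned above.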