I have some multi-threaded code in which each thread calls a function f(df::DataFrame) that reads a column of the DataFrame and finds the indices where that column is greater than 0:
function f(df::DataFrame)
    X = df[!, :time]
    indices = findall(X .> 0)
    return indices
end
Inside the main thread I read in an R *.rds file, which Julia converts to a DataFrame that I pass to f() as follows:
rds = "blabla.rds"
objs = load(rds);
params = collect(0.5:0.005:0.7)
for i in 1:length(objs)
cols = [string(name) for name in names(objs.data[i]) if occursin("bla",string(name))]
hypers = [(a,b) for a in cols, b in params] # length ~2000
Threads.@threads for hi in 1:length(hypers) # MEMORY BLOWS UP HERE
df = f(df)
end
end
Each df passed to f() is roughly 0.7 GB. When the multi-threaded loop runs, memory usage climbs to ~30 GB. There are 25 threads and ~2000 calls to f(). Any idea why the memory is exploding?
function f(df::DataFrame)
    X = df[!, :time]
    # The predicate form avoids allocating the intermediate BitVector
    # that `X .> 0` would create.
    return findall(x -> x > 0, X)
end
function foo(objs)
    params = 0.5:0.005:0.7  # don't collect
    for i in 1:length(objs)
        df = objs.data[i]
        cols = [string(name) for name in names(df) if occursin("bla", string(name))]
        hypers = [(a, b) for a in cols, b in params]  # length ~2000
        Threads.@threads for hi in 1:length(hypers)   # MEMORY BLOWS UP HERE
            indices = f(df)
        end
    end
end
using BenchmarkTools
rds = "blabla.rds"
objs = load(rds);
@benchmark foo($objs)
Instead of one thread allocating, you have 20 threads allocating. FWIW, I have almost never seen a speedup from multithreading when the part that runs in parallel is allocating. The GC runs on one thread, so if you have 20 threads allocating, it will bottleneck things.
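To see the effect, here is a minimal sketch (the array size is made up): run a threaded loop that allocates on every iteration and note the "% gc time" figure that @time prints:

# Made-up sketch: every iteration allocates a fresh index vector, and
# @time's "% gc time" shows how much of the run the single-threaded GC
# eats when many threads allocate at once.
X = rand(10^6)
@time Threads.@threads for i in 1:2000
    findall(x -> x > 0, X)
end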
rds = "bla.rds"
objs = load(rds);
function f(df::DataFrame)
X = df[:time]
return findall(x->x>0, X)
end
function foo(objs)
for i in 1:length(objs)
df = objs.data[i]
Threads.@threads for hi in 1:2000
f(df)
end
end
end
@benchmark(foo($objs))
Is there really anything unexpected going on here at all? To me it looks like code that allocates one Vector of indices per iteration. The memory use is the size of the index vector times the number of iterations. If you want lower memory consumption, you need to reuse memory between iterations.
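For example, here is a minimal sketch of that reuse (the names f! and foo_inplace are made up for illustration): give each thread its own preallocated index buffer and refill it in place instead of allocating a fresh vector per call:

# Hypothetical sketch: one index buffer per thread, reused across iterations.
function f!(buf::Vector{Int}, X::AbstractVector)
    empty!(buf)  # drops the contents but keeps the buffer's capacity
    for (i, x) in pairs(X)
        x > 0 && push!(buf, i)
    end
    return buf
end

function foo_inplace(X, niter)
    bufs = [Int[] for _ in 1:Threads.nthreads()]
    # :static scheduling pins iterations to threads, so indexing
    # bufs by threadid() is safe here.
    Threads.@threads :static for hi in 1:niter
        f!(bufs[Threads.threadid()], X)
    end
end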
What do the Numba benchmarks look like? Are you sure they are reporting the same thing as here?
Cutting out the DataFrame and just working with vectors gives me the same memory use.
Also, I ran foo() with just a single array, and the memory is still blowing up:
function f(time::Vector{Float64})
    return findall(x -> x > 0, time)
end

function foo(time::Vector{Float64})
    Threads.@threads for hi in 1:2000
        f(time)
    end
end
@benchmark foo($(objs.data[1][!, :time]))
The array has 1,631,339 elements.
I’m running the equivalent code in Numba as follows:
import numpy as np
from numba import jit, prange

@jit(nopython=True)
def f(time):
    return np.where(time > 0)[0]

@jit(nopython=True, parallel=True)
def foo(time):
    for h in prange(2000):
        f(time)

foo(df['time'].values)
I’m not sure how to get the equivalent of @benchmark in Numba, but looking at htop, the memory doesn’t blow up anywhere near what it does with Julia.
This is not a jupyter notebook thing either; running a pure julia script gives the same result.
What do you mean by this? The number reported is the sum of all allocations done during the execution. So if you disabled the GC completely you would have 32 GB of memory allocated. Fortunately, we do have a GC.
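For what it's worth, the arithmetic is consistent with that reading: assuming most of the 1,631,339 elements are positive, each findall call returns a Vector{Int} of ~1.6M Int64s ≈ 13 MB, and ~2000 calls add up to ~26 GB of cumulative allocation, in the ballpark of the ~30 GB reported. A quick way to see that the figure is cumulative rather than resident:

X = rand(1_631_339)   # rand() is in [0, 1), so (almost) every index matches
@time for i in 1:2000 # @time reports the total bytes allocated
    findall(x -> x > 0, X)
end

Resident memory in htop stays roughly flat while this runs; only the cumulative allocation total grows with the iteration count.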