Why is the memory blowing up in this multi-threaded code?

I have some multi-threaded code in which each thread calls a function f(df::DataFrame) which reads a column of that DataFrame and finds the indices where the column is greater than 0:

function f(df::DataFrame)
    X = df[!, :time]             # grab the column without copying
    indices = findall(X .> 0)   # indices where the column is positive
    return indices
end

Inside the main thread I read in an R *.rds file, which Julia converts to a DataFrame, and I pass it to f() as follows:

using RData  # load() for R data files

rds = "blabla.rds"
objs = load(rds);

params = collect(0.5:0.005:0.7)

for i in 1:length(objs)
    cols = [string(name) for name in names(objs.data[i]) if occursin("bla", string(name))]
    hypers = [(a, b) for a in cols, b in params] # length ~2000
    df = objs.data[i]
    Threads.@threads for hi in 1:length(hypers) # MEMORY BLOWS UP HERE
        indices = f(df)
    end
end

Each df passed to f() is roughly 0.7 GB. Watching memory usage while the multi-threaded loop runs, it climbs to ~30 GB. There are 25 threads and ~2000 calls to f(). Any idea why the memory is exploding?

Cross-reference: Julia: Why is the memory blowing up inside this loop? - Stack Overflow

(Just so you know, it’s considered good etiquette to mention it if you have posted the question other places.)


Try rewriting it like this:

function f(df::DataFrame)
    X = df[!, :time]
    return findall(x -> x > 0, X)  # predicate form avoids allocating the temporary X .> 0
end

function foo(objs)
    params = 0.5:0.005:0.7 # don't collect; keep it a lazy range
    for i in 1:length(objs)
        cols = [string(name) for name in names(objs.data[i]) if occursin("bla", string(name))]
        hypers = [(a, b) for a in cols, b in params] # length ~2000
        df = objs.data[i]
        Threads.@threads for hi in 1:length(hypers) # MEMORY BLOWS UP HERE
            indices = f(df)
        end
    end
end

using BenchmarkTools

rds = "blabla.rds"
objs = load(rds);
@benchmark foo($objs)

Instead of one thread allocating, you have 20 threads allocating. FWIW, I have almost never seen a speedup from multithreading when the part that runs in parallel allocates. The GC runs on a single thread, so if you have 20 threads allocating, it will bottleneck things.
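To see the effect in isolation, here's a small sketch (mine, not code from the thread, with a toy vector standing in for the real column):

X = randn(1_000_000)               # toy stand-in for the real column

g(X) = findall(x -> x > 0, X)      # allocates a fresh index vector on every call

function serial(X, n)
    for _ in 1:n
        g(X)
    end
end

function threaded(X, n)
    Threads.@threads for _ in 1:n
        g(X)
    end
end

serial(X, 1); threaded(X, 1)       # warm up / compile first
@time serial(X, 2000)              # total allocation is the same...
@time threaded(X, 2000)            # ...but all threads feed the same single-threaded GC

The cumulative allocation is identical either way; the threaded version just produces it from many threads at once, and the GC has to clean it all up serially.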

BenchmarkTools.Trial: 
  memory estimate:  35.09 GiB
  allocs estimate:  59882
  --------------
  minimum time:     2.778 s (0.00% GC)
  median time:      2.801 s (0.00% GC)
  mean time:        2.801 s (0.00% GC)
  maximum time:     2.825 s (0.00% GC)
  --------------
  samples:          2
  evals/sample:     1

So is this threaded or not? Any chance you can provide a MWE? Right now we’re mainly guessing.

rds = "bla.rds"
objs = load(rds);

function f(df::DataFrame)
    X = df[!, :time]
    return findall(x -> x > 0, X)
end

function foo(objs)
    for i in 1:length(objs)
        df = objs.data[i]
        Threads.@threads for hi in 1:2000
            f(df)
        end
    end
end

@benchmark(foo($objs))

Thanks, but we don’t have any data to run this on. A toy dataset would work.

What sort of memory use are you expecting? How big is the data set?

objs is around 1GB.

So since you iterate 2000 times, if each iteration copied the data you would conceivably expect up to 2000 × 1 GB = 2 TB of memory use?

OK. I guess I'm asking why the DataFrame is not being shared across threads. I thought it was passed by reference?

The same idea in Numba does not blow up the memory.

I guess it is, but you have a bunch of index arrays being allocated in your loop.
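For what it's worth, both halves of that can be checked directly. A small sketch with a toy DataFrame (the :time column just mirrors the code above):

using DataFrames

df = DataFrame(time = randn(10^6))  # toy stand-in for the real data
X = df[!, :time]                    # `!` returns the column itself, no copy
@assert X === df[!, :time]          # same underlying vector: shared, not copied

pred(x) = x > 0
findall(pred, X)                    # warm up so compilation is excluded
@show @allocated findall(pred, X)   # each call still allocates a fresh index vector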

Some toy data with Numba and Julia code would be a big help.

(Gotta run)


Interesting. Do you know whether it is possible, or planned, for each thread to handle its own garbage?

Is there really anything unexpected going on here at all? To me this looks like code that allocates one vector of indices per iteration, so the memory use is the size of the index vector times the number of iterations. If you want lower memory consumption you need to reuse memory between iterations; see the sketch below.
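For example (a sketch of mine, not tested against the real data): give each thread its own preallocated index buffer and refill it in place, so the hot loop stops allocating after the first pass.

function f!(buf::Vector{Int}, X::AbstractVector)
    empty!(buf)                    # resets length but keeps capacity
    @inbounds for i in eachindex(X)
        X[i] > 0 && push!(buf, i)
    end
    return buf
end

function foo_reuse(X, niter)
    bufs = [Int[] for _ in 1:Threads.nthreads()]  # one buffer per thread
    Threads.@threads :static for hi in 1:niter    # :static pins each task to a thread,
        f!(bufs[Threads.threadid()], X)           # so indexing by threadid() is safe
    end
end

After the first iteration each buffer has grown to full size, so later calls to push! reuse the existing capacity and the GC has nothing left to do.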

What do the Numba benchmarks look like? Are you sure they are reporting the same thing as here?

Cutting out the DataFrame and just working with vectors gives me the same memory use.


What’s a good way to get equivalent benchmarks in Numba?

Also, I ran foo() with just a single array, and the memory is still blowing up:

function f(time::Array{Float64,1})
    return findall(x->x>0, time)
end

function foo(time::Array{Float64,1})
    Threads.@threads for hi in 1:2000
        f(time)
    end
end

@benchmark foo($(objs.data[1][!, :time]))

The array has 1,631,339 rows.

I’m running the equivalent code in Numba as follows:

import numpy as np
from numba import jit, prange

@jit(nopython=True)
def f(time):
    return np.where(time > 0)[0]

@jit(nopython=True, parallel=True)
def foo(time):
    for h in prange(2000):
        f(time)

foo(df['time'].values)

I’m not sure how to get the equivalent of @benchmark in Numba, but looking at htop, the memory doesn’t blow up anywhere near what it does with Julia.

This is not a Jupyter notebook thing either; running a pure Julia script gives the same result.

What do you mean by this? The number reported is the sum of all allocations done during the execution. So if you disabled the GC completely you would have 32 GB of memory allocated. Fortunately, we do have a GC.
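And the arithmetic roughly checks out (a back-of-the-envelope of mine, assuming most of the 1,631,339 entries are positive):

n = 1_631_339               # rows in the column, from above
per_call = n * sizeof(Int)  # each findall result is a Vector{Int}: ~12.4 MiB
total = 2000 * per_call     # ≈ 24 GiB allocated cumulatively over the loop,
                            # the same ballpark as the 35 GiB estimate above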

The RES memory in htop increases and doesn’t go back to what it was before I ran the function. I’m not seeing this in Numba.
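One thing that might be worth ruling out (my suggestion, not from the thread): force a full collection and check RES again, since the GC can hold on to freed pages rather than returning them to the OS immediately.

GC.gc()   # force a full collection, then compare RES in htop again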

Could you post your code as text instead of images so that people that want to try it out don’t have to type the whole thing in manually?
