Threads.@threads memory leak

I am running Julia 1.1. Here is my code:

using BenchmarkTools
using DataFrames

df = DataFrame(rand(10_000_000, 10));
Threads.nthreads()

function singlethread(df)
    N = nrow(df)
    M = ncol(df)

    cumsum = 0.0
    for i in 1:N
        for j in 1:M
            cumsum += df[j][i]
        end
    end
    return cumsum
end

function multithread(df)
    N = nrow(df)
    M = ncol(df)

    cumsum = Threads.Atomic{Float64}(0.0)
    Threads.@threads for j in 1:M
        for i in 1:N
            Threads.atomic_add!(cumsum, df[j][i])
        end
    end
    return cumsum[]
end

@btime singlethread(df)
@btime multithread(df)

My Result:

Threads.nthreads()
4

@btime singlethread(df)
3.898 s (299994891 allocations: 4.47 GiB)

Windows Task Manager's memory monitor (screenshot):

The graph above shows no memory leak, which is good.

@btime multithread(df)
5.845 s (143212953 allocations: 2.10 GiB)


Here the memory monitor shows a problem: memory usage grows.

Questions:

  1. According to Windows Task Manager, the multithreaded version appears to leak memory.
    The two functions are very similar: both just compute a cumulative sum without creating any new variables.

  2. However, this contradicts the @btime result:
    the multithreaded version allocates 2.10 GiB, which is less than the 4.47 GiB of the single-threaded version.
    Why is there such a big difference from what Windows Task Manager shows?

  3. The multithreaded version takes 5.8 seconds, which is slower than the single-threaded version (3.9 seconds).
    Is that normal?

Thank you

Nothing here suggests a memory leak. That would mean that memory usage would grow unbounded if you run the function for larger inputs. Does that happen?

As for question 3: yes, that is normal, since you are using atomics, which are very slow.

However, when I @btime the "singlethread" method on the same data frame of 10 million rows, nothing changes in the Windows memory monitor. The graph is a flat line, which suggests that no additional memory is being allocated.

Both the multithreaded and the single-threaded methods read the same large dataset and perform exactly the same operation to compute the sum.
I just don't understand why the single-threaded method does not make memory usage in Windows grow at all, while the multithreaded method does, as the two graphs show.

On a separate note, I found that accessing a data frame with df[j][i] is not a good choice because it is type-unstable. That is why it allocates a huge amount of memory.

I will try again with a plain Array{Float64, 2} of 10 million rows and 10 columns.
With that, there is no memory allocation at all…
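The type instability described above can be reproduced without DataFrames. The following is only a minimal sketch: `ToyFrame`, `sum_unstable`, `sum_column`, and `sum_barrier` are hypothetical names invented for this illustration, and `ToyFrame` is a stand-in that stores its columns with an abstract element type, not the real DataFrames implementation. It shows one common workaround, a function barrier, where each column is passed to a small kernel function so that the inner loop is compiled for a concrete column type.

```julia
# Hypothetical stand-in for a data frame: the columns are stored with an
# abstract element type, so the compiler cannot infer the type of columns[j].
struct ToyFrame
    columns::Vector{AbstractVector}
end

# Type-unstable version: the type of t.columns[j][i] is unknown at compile
# time, so every element access goes through dynamic dispatch.
function sum_unstable(t::ToyFrame)
    total = 0.0
    for j in eachindex(t.columns)
        for i in eachindex(t.columns[j])
            total += t.columns[j][i]
        end
    end
    return total
end

# Kernel that is compiled for the concrete type of the column it receives.
function sum_column(col::AbstractVector{Float64})
    s = 0.0
    @inbounds for i in eachindex(col)
        s += col[i]
    end
    return s
end

# Function barrier: one dynamic dispatch per column instead of one per element.
function sum_barrier(t::ToyFrame)
    total = 0.0
    for col in t.columns
        total += sum_column(col)
    end
    return total
end

t = ToyFrame([collect(1.0:100.0) for _ in 1:3])
println(sum_unstable(t))
println(sum_barrier(t))
```

Both functions compute the same sum. Comparing them with @btime on a larger input is a reasonable way to check how much of the allocation comes from the per-element dispatch.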

Update:

I use an Array now and the problem is gone. So I think this is NOT a threading issue. The issue is that accessing a data frame with "df[j][i]" inside a function is type-unstable, which causes many unnecessary memory allocations.

When I run the code below, there is almost no memory allocation, which is perfect!

using BenchmarkTools
arr = rand(10_000_000, 10);
Threads.nthreads()

function singlethread(arr)
    N, M = size(arr)


    cumsum = 0.0
    for i in 1:N
        for j in 1:M
            cumsum += arr[i, j]
        end
    end
    return cumsum
end

function multithread(arr)
    N, M = size(arr)

    cumsum = Threads.Atomic{Float64}(0.0)
    Threads.@threads for j in 1:M
        for i in 1:N
            Threads.atomic_add!(cumsum, arr[i, j])
        end
    end
    return cumsum[]
end

@btime singlethread(arr)
@btime multithread(arr)

The threaded version is still 50x slower than the single-threaded one because of the atomic. You want something like:

function singlethread(arr)
    N, M = size(arr)
    cumsum = 0.0
    @inbounds for j in 1:M
        for i in 1:N
            cumsum += arr[i, j]
        end
    end
    return cumsum
end

function multithread(arr)
    N, M = size(arr)

    partial_sums = zeros(Float64, Threads.nthreads())
    Threads.@threads for j in 1:M
        t = Threads.threadid()
        @inbounds for i in 1:N
            partial_sums[t] += arr[i, j]
        end
    end
    return sum(partial_sums)
end

julia> @btime singlethread(arr)
  84.051 ms (1 allocation: 16 bytes)
4.9994246012104645e7

julia> @btime multithread(arr)
  26.634 ms (3 allocations: 176 bytes)
4.9994246012085415e7

Note that I also changed the order of i and j in your single-threaded loop so that it goes through the elements column by column, which matches how Julia's arrays are stored in memory (column-major).
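The per-thread partial sums above index a buffer by Threads.threadid(), which suited the Julia 1.1 used in this thread. Newer Julia versions caution against that pattern because a task can move between threads. The following is a hedged sketch of the same idea with one partial sum per task instead, assuming Julia 1.3 or later (where Threads.@spawn is available). The names `sum_columns`, `multithread_spawn`, and the `ntasks` keyword are hypothetical and chosen only for this illustration.

```julia
# Sum the given columns of arr, going down each column (column-major order).
function sum_columns(arr::AbstractMatrix{Float64}, cols)
    s = 0.0
    @inbounds for j in cols
        for i in axes(arr, 1)
            s += arr[i, j]
        end
    end
    return s
end

# Split the columns into contiguous chunks, sum each chunk in its own task,
# then combine the per-task results. No shared accumulator is needed.
function multithread_spawn(arr::AbstractMatrix{Float64}; ntasks::Int = Threads.nthreads())
    M = size(arr, 2)
    chunk = max(1, cld(M, ntasks))      # columns per task, at least 1
    tasks = map(1:chunk:M) do lo
        hi = min(lo + chunk - 1, M)
        Threads.@spawn sum_columns(arr, lo:hi)
    end
    total = 0.0
    for t in tasks
        total += fetch(t)::Float64      # wait for the task and add its partial sum
    end
    return total
end

arr_small = reshape(collect(1.0:60.0), 6, 10)
println(multithread_spawn(arr_small))
```

Each task keeps its running sum in a local variable, so the tasks never write to neighbouring slots of a shared array. Whether this is faster than the threadid-indexed version on a given machine is something to check with @btime.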


Perfect! From now on I will avoid atomic operations in loops like this and use an array to store each thread's result instead.
I read in the Julia documentation that atomics are useful for thread-safe operations, but in a tight loop like this they are not a good choice.
Thank you very much. I will close this topic.

I learned a lot from these slides: julia-parallelism
It is a presentation, so there isn't much textual explanation, but there are a few code snippets, benchmarks, and tricks for dealing with threads in Julia.

You might find that useful. I don't know if the syntax is that of Julia v1.0, though.