Multithreading of a simple loop

JohnZ · November 2, 2020, 8:29pm

Hi, I am trying to use multi-threading to parallelise a simple loop. The actual code is quite complex, so I have given a simplified example.

The multithreaded version runs slower than the single-threaded version. I have probably not used the @threads macro correctly (and I am not sure whether I need to introduce any locks, since the outputs are stored in shared arrays).

How can I improve performance of the multi-threaded version? Is this the right way of using multi-threading?

```julia
using LinearAlgebra, CSV, DataFrames, BenchmarkTools

function generate_data(m)
    Values = rand(20.0:140.0, m)
    return Values
end

function summation(Values)
    A = cumsum(Values, dims=1)
    return A
end

function some_thing(Values)
    B = sum(Values, dims=1)
    return B
end

function run_singlethread(m, n)
    V_id = Array{Float64,2}(undef, m, n)
    A_id = Array{Float64,2}(undef, m, n)
    B_id = Vector{Float64}(undef, n)
    for i in 1:n
        Values = generate_data(m)
        A = summation(Values)
        B = some_thing(Values)
        V_id[:, i] = Values
        A_id[:, i] = A
        B_id[i] = B[1]
    end
    # :auto generates column names x1, x2, ...; recent DataFrames versions require it
    df1 = DataFrame(V_id, :auto)
    df2 = DataFrame(A_id, :auto)
    df3 = DataFrame(ID=1:n, some_thing=B_id)
    CSV.write("DataFrame1.csv", df1)
    CSV.write("DataFrame2.csv", df2)
    CSV.write("DataFrame3.csv", df3)
    return A_id, V_id, B_id
end

function multithread_run(m, n)
    V_id = Array{Float64,2}(undef, m, n)
    A_id = Array{Float64,2}(undef, m, n)
    B_id = Vector{Float64}(undef, n)
    # each iteration writes only to column i / element i of the shared arrays
    Threads.@threads for i in 1:n
        Values = generate_data(m)
        A = summation(Values)
        B = some_thing(Values)
        V_id[:, i] = Values
        A_id[:, i] = A
        B_id[i] = B[1]
    end
    df1 = DataFrame(V_id, :auto)
    df2 = DataFrame(A_id, :auto)
    df3 = DataFrame(ID=1:n, some_thing=B_id)
    CSV.write("DataFrame1mthreads.csv", df1)
    CSV.write("DataFrame2mthreads.csv", df2)
    CSV.write("DataFrame3mthreads.csv", df3)
    return A_id, V_id, B_id
end
```

Benchmark code:

```julia
@btime run_singlethread(3, 10000)   # 48.490 ms
@btime multithread_run(3, 10000)    # 49.847 ms
```

stevengj · November 2, 2020, 8:48pm

Did you remember to set JULIA_NUM_THREADS before launching Julia? What is Threads.nthreads()?
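For reference, a small sketch (not from the original thread) that reports the thread count and records which threads actually ran iterations of a `Threads.@threads` loop; `thread_usage` is a hypothetical helper. The number of threads has to be chosen before Julia starts, for example with `JULIA_NUM_THREADS=4` in the environment or, on Julia 1.5 and later, `julia --threads 4`.

```julia
# Sketch: check how many threads are available and which ones run loop iterations.
# `thread_usage` is a hypothetical helper name used only for illustration.
function thread_usage(n)
    ids = Vector{Int}(undef, n)          # one slot per iteration, so no locking is needed
    Threads.@threads for i in 1:n
        ids[i] = Threads.threadid()      # record the thread that executed iteration i
    end
    return sort(unique(ids))             # distinct thread ids that did some work
end

println("nthreads = ", Threads.nthreads())
println("threads used: ", thread_usage(1_000))
```

If `thread_usage` returns only `[1]`, the session is effectively single-threaded and `@threads` cannot speed anything up.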

Don’t allocate arrays in your inner loop if you can help it (pre-allocate arrays before running performance-critical code). (Both rand and cumsum allocate new arrays.)
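A minimal sketch of what pre-allocation could look like for the MWE, assuming the same m×n layout as the original code: `rand!` and `cumsum!` write directly into column views of the output arrays, so the loop body creates no temporary vectors. `fill_columns!` is a hypothetical name; in the real code, the per-column work would take the place of the `rand!`/`cumsum!`/`sum` calls.

```julia
using Random  # provides rand!

# Sketch of the MWE loop rewritten to fill pre-allocated arrays in place.
# Assumes the same shapes as the original: V and A are m×n, B has length n.
# `fill_columns!` is a hypothetical name, not part of any package.
function fill_columns!(V::Matrix{Float64}, A::Matrix{Float64}, B::Vector{Float64})
    Threads.@threads for i in axes(V, 2)
        v = view(V, :, i)                # column i of V, no copy
        a = view(A, :, i)                # column i of A, no copy
        rand!(v, 20.0:140.0)             # draw values directly into column i
                                         # (default RNG is thread-safe in Julia >= 1.3)
        cumsum!(a, v; dims=1)            # running sum written into column i of A
        B[i] = sum(v)                    # scalar result, no temporary array
    end
    return V, A, B
end

m, n = 3, 10_000
V = Matrix{Float64}(undef, m, n)
A = Matrix{Float64}(undef, m, n)
B = Vector{Float64}(undef, n)
fill_columns!(V, A, B)
```

The output arrays are allocated once outside the loop and can be reused across calls, which is where most of the allocation savings come from.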

(If you are doing lots of calculations on 3-component arrays as in your example here, you should strongly consider using StaticArrays.jl instead, e.g. V should be a Vector{SVector{3,Float64}}(undef, 10^4) rather than a 3 × 10^4 matrix.)
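A sketch of the StaticArrays.jl layout suggested above, assuming the package is installed; `static_run` is a hypothetical name. Each column becomes one `SVector{3,Float64}`, and the cumulative sum is written out by hand for the fixed length 3 so the example relies only on basic `SVector` construction, indexing and `sum`.

```julia
using StaticArrays  # third-party package, assumed installed: ] add StaticArrays

# Sketch: store each 3-component column as an SVector instead of a column
# of a 3×n Matrix. `static_run` is a hypothetical name.
function static_run(n)
    r = 20.0:140.0
    V = Vector{SVector{3,Float64}}(undef, n)
    A = Vector{SVector{3,Float64}}(undef, n)
    B = Vector{Float64}(undef, n)
    Threads.@threads for i in 1:n
        v = SVector(rand(r), rand(r), rand(r))   # fixed-size value, typically no heap allocation
        V[i] = v
        A[i] = SVector(v[1], v[1] + v[2], v[1] + v[2] + v[3])  # cumulative sum, written out
        B[i] = sum(v)
    end
    return V, A, B
end

V, A, B = static_run(10_000)
```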

(I would typically also only try to parallelize code that is expensive enough to run for at least several seconds.)

JohnZ · November 2, 2020, 8:58pm

Thanks. I am using Juno, which I believe starts Julia with the number of threads equal to the number of cores.
Threads.nthreads() is equal to 4. This was just an example, so I used rand to generate some data. In the actual code, I call functions in a loop on different sets of data, so generating random data for an MWE seemed a good choice to me. The actual code has a large number of iterations in the for loop, so parallelising it makes sense.

Do I need to introduce locks in this example? If so, what would be a good choice?

stevengj · November 2, 2020, 9:00pm

You don’t need locks since different loop iterations are writing to disjoint elements of the shared arrays.
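To illustrate the distinction, a sketch (hypothetical function names) contrasting the pattern in the MWE, where iteration `i` writes only to index `i`, with a loop in which every iteration updates one shared total; only the second needs synchronization, shown here with a `ReentrantLock`.

```julia
# Sketch contrasting two patterns inside Threads.@threads.
# `per_iteration_results` and `locked_total` are hypothetical names.

# 1) Each iteration writes only to its own slot out[i]: no lock is needed.
function per_iteration_results(n)
    out = Vector{Float64}(undef, n)
    Threads.@threads for i in 1:n
        out[i] = sqrt(i)                 # disjoint index, no data race
    end
    return out
end

# 2) Every iteration updates ONE shared variable: this is a data race unless
#    the update is protected, here with a ReentrantLock.
function locked_total(n)
    total = Ref(0.0)
    lk = ReentrantLock()
    Threads.@threads for i in 1:n
        x = sqrt(i)                      # do the work outside the lock
        lock(lk) do
            total[] += x                 # read-modify-write guarded by the lock
        end
    end
    return total[]
end

out = per_iteration_results(1_000)
println(sum(out) ≈ locked_total(1_000))
```

Taking a lock on every iteration serializes that part of the loop, so when a single total is needed it is usually simpler and faster to store per-iteration results as in the first function and call `sum` afterwards.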

Skoffer · November 2, 2020, 9:21pm

The problem is the CSV.write calls. They take much longer than the actual data generation, and their run time is dominated by IO. After removing all lines after the loop (from the df1 = DataFrame(...) line up to, but not including, the return) I get the following numbers:

```julia
@btime run_singlethread(3, 10000)
# 1.847 ms (30006 allocations: 3.59 MiB)
@btime multithread_run(3, 10000)
# 814.549 μs (40051 allocations: 4.05 MiB)
```
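The same split can be expressed directly in code: a sketch that keeps the per-column computation in one shared kernel and times only the computation, with the CSV.write calls assumed to move to a separate step run once at the end. `fill_column!`, `compute_serial` and `compute_threaded` are hypothetical names, and Base's `@elapsed` stands in for BenchmarkTools' `@btime` to keep the example dependency-free; absolute timings depend on the machine and on `Threads.nthreads()`.

```julia
# Per-column kernel shared by both versions (hypothetical helper).
function fill_column!(V, A, B, i, m)
    v = rand(20.0:140.0, m)
    V[:, i] = v
    A[:, i] = cumsum(v)
    B[i] = sum(v)
    return nothing
end

function compute_serial(m, n)
    V, A, B = Matrix{Float64}(undef, m, n), Matrix{Float64}(undef, m, n), Vector{Float64}(undef, n)
    for i in 1:n
        fill_column!(V, A, B, i, m)
    end
    return V, A, B
end

function compute_threaded(m, n)
    V, A, B = Matrix{Float64}(undef, m, n), Matrix{Float64}(undef, m, n), Vector{Float64}(undef, n)
    Threads.@threads for i in 1:n
        fill_column!(V, A, B, i, m)
    end
    return V, A, B
end

# Warm up both so compilation time is excluded, then time the computation only.
compute_serial(3, 10); compute_threaded(3, 10)
println("serial:   ", @elapsed(compute_serial(3, 10_000)), " s")
println("threaded: ", @elapsed(compute_threaded(3, 10_000)), " s")
```

Writing the DataFrames to CSV would then happen once, outside the timed function.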

JohnZ · November 3, 2020, 12:12am

Thanks @stevengj and @Skoffer
I am running into a strange error with multi-threading when I use it with JuMP in a similar manner. I will post that as a separate question.

JohnZ · November 3, 2020, 1:07am

Link to the question can be found here
