# Speed up for-loop with multithreading

Hi Guys,

I’m trying to speed up my for-loop with multithreading, as follows (code from test3.jl):

``````julia
# Build the input: n rows of two Float64 values each.
time0 = @elapsed begin
    n = 250
    a1 = Array{Array{Float64, 1}, 1}(undef, n)
    for i in 1:n
        a1[i] = [i+0.12345, 0]
        # push!(a1, [i+0.12345, 0])
    end
end

# Serial version.
time = @elapsed try
    a2 = Array{Array{Float64, 1}, 1}()
    for row in a1
        tmp = [round(x * 0.00001, digits=5) for x in row]
        push!(a2, tmp)
    end
catch
end

# Multithreaded version.
time2 = @elapsed try
    a3 = Array{Array{Float64, 1}, 1}(undef, n)
    Threads.@threads for i in 1:n
        tmp = [round(x * 0.00001, digits=5) for x in a1[i]]
        a3[i] = tmp
    end
catch
end

println("time0=$time0, time=$time, time2=$time2")
``````

and I run it:

``````
julia> include("test/test3.jl")
time0=0.072520981, time=0.020055303, time2=0.070619797
``````

I have set the number of threads to 4.

I know that time2 > time because of overhead: starting 4 threads costs more time than running the loop directly without parallelism. But I still want to ask: is there any chance to speed up this for-loop? Maybe I should write more efficient code? But how?

Best Regards

Hi, it would be nice if you could edit your post to markdown the code in triple back ticks. That way it would be easier for us to copy and paste it. Thanks!

Thanks for your hint! I’ve done it!

Correct, here is what I see using `@btime`:

``````
27.400 μs (1023 allocations: 43.67 KiB)
5.283 μs (2 allocations: 64 bytes)
5.267 μs (1 allocation: 16 bytes)
``````

If you need more speed: this MWE doesn’t seem to be representative of your real problem. I can’t tell whether your problem size is bigger, or whether this represents a hot loop that is executed many times.
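The gap between the `@elapsed` numbers in the question (tens of milliseconds) and the `@btime` numbers above (microseconds) is most likely compilation time: the first call to any Julia code includes JIT compilation. A minimal sketch of how to see this with `@elapsed` alone, assuming a hypothetical helper `scale_rows` that stands in for the loop from the question:

```julia
# Hypothetical helper standing in for the loop body from the question.
scale_rows(rows) = [[round(x * 0.00001, digits=5) for x in row] for row in rows]

rows = [[i + 0.12345, 0.0] for i in 1:250]

t_first  = @elapsed scale_rows(rows)   # first call: includes JIT compilation
t_second = @elapsed scale_rows(rows)   # second call: runs already compiled code

println("first call: $t_first s, second call: $t_second s")
```

Putting the work in a function and timing a second call still returns the time as a variable, but without the one-off compilation cost mixed in.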


Thanks for your answer! This is only one part of a function (f1). f1 contains only one loop, with 250 iterations. I wanted to speed up f1 by speeding up this loop with multithreading, but it doesn’t seem to work.
The whole program is very big and has many different functions. My final goal is to speed up the whole program, and speeding up f1 is only my first attempt. Maybe I should move on and try to speed up other functions.

That is what I suspected. Did you try to profile your program? I’d recommend using a visual profiler in VSCode or Atom/Juno (which I still slightly prefer).

Edit: another idea. If your MWE is representative of your problem, allocations could be part of the problem. I see some room for improvement there.
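To make the allocation idea concrete, here is one possible sketch. It assumes every row has exactly two `Float64` entries, so an n×2 `Matrix` can replace the vector of vectors; `scale_round!` is a hypothetical name, not something from the original program.

```julia
n = 250

# Assumption: each row holds exactly two values, so the data fits an n×2 Matrix
# instead of the Vector{Vector{Float64}} used in the question.
a1 = [j == 1 ? i + 0.12345 : 0.0 for i in 1:n, j in 1:2]

# Fill a preallocated result in place. The fused broadcast does not allocate
# a temporary array per row.
scale_round!(dest, src) = (dest .= round.(src .* 0.00001, digits=5); dest)

a2 = similar(a1)          # allocated once, can be reused across calls
scale_round!(a2, a1)
```

If the real rows have varying lengths this layout does not apply directly, so treat it as an illustration of the idea rather than a drop-in replacement.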


I don’t know about profiling yet. I’ll read the docs :).

But the original benchmark is problematic: the `try`-`catch` blocks mask that `a1` is not defined. Corrected benchmark:

``````julia
using BenchmarkTools

init() = begin
    n = 250
    a1 = Vector{Vector{Float64}}(undef, n)
    for i in 1:n
        a1[i] = [rand(), rand()]
    end
    a1
end

round_serial(a1) = begin
    n = length(a1)
    a2 = Vector{Vector{Float64}}(undef, n)
    for i in 1:n
        a2[i] = [round(x, digits=5) for x in a1[i]]
    end
    a2
end

round_parallel(a1) = begin
    n = length(a1)
    a3 = Vector{Vector{Float64}}(undef, n)
    # Each iteration writes only to its own slot a3[i].
    Threads.@threads for i in 1:n
        a3[i] = [round(x, digits=5) for x in a1[i]]
    end
    a3
end

a1 = init()
a2 = @btime round_serial($a1)
a3 = @btime round_parallel($a1)
@assert isapprox(a3, a2)
``````

shows some speedup for me

``````
29.100 μs (251 allocations: 21.59 KiB)
10.500 μs (270 allocations: 24.00 KiB)
``````
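On the point about `try`-`catch` masking errors: a small sketch of how an empty `catch` can make a timing meaningless. The name `undefined_rows` is deliberately never defined here, purely for illustration.

```julia
# An empty catch swallows the UndefVarError, so the block "finishes" quickly
# without doing any of the intended work.
t_silent = @elapsed try
    for row in undefined_rows
        sum(row)
    end
catch
end

# Capturing the exception instead shows what actually happened.
err = try
    for row in undefined_rows
        sum(row)
    end
    nothing
catch e
    e
end

println("silent block took $t_silent s; actual error: ", typeof(err))
```

Dropping the `try`-`catch` while benchmarking, or at least logging the caught exception, avoids timing a block that failed on its first line.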

OK, thanks for checking! Until now I have still used @elapsed, because I need the returned time variable and the total run time is relevant for me.
I have read the docs about profiling, but I really don’t understand how to interpret the output of

``````
ProfileView.@profview
``````

Code for visualizing profile:

``````julia
using ProfileView

time0 = @elapsed begin
    n = 250
    a1 = Array{Array{Float64, 1}, 1}(undef, n)
    for i in 1:n
        a1[i] = [i+0.12345, 0]
        # push!(a1, [i+0.12345, 0])
    end
end

# Profile the serial loop.
time1 = @elapsed @profview begin
    a2 = Array{Array{Float64, 1}, 1}()
    for row in a1
        tmp = [round(x * 0.00001, digits=5) for x in row]
        push!(a2, tmp)
    end
end

# Profile the multithreaded loop.
time2 = @elapsed @profview begin
    a3 = Array{Array{Float64, 1}, 1}(undef, n)
    Threads.@threads for i in 1:n
        tmp = [round(x * 0.00001, digits=5) for x in a1[i]]
        a3[i] = tmp
    end
end

println("time0=$time0, time1=$time1, time2=$time2")
``````

There’s nothing in this picture that I’m familiar with. What is the crucial information I should extract from those plots?

Using

``````julia
using Profile
Profile.clear()
a1 = init()
round_serial(a1)
@profile for i in 1:1000; round_serial(a1); end
Juno.profiler()
``````

for my example I see in Juno/Atom something like this

where you can navigate from the profile pane to the source code. In the source code pane, bigger bars mean a larger share of the runtime. Red indicates parts of the program that allocate; yellow indicates dynamic dispatch.

Cool for you

I’m using VSCode as my IDE; I haven’t installed or set up Juno. Hmm, maybe I should start a new topic on Discourse?
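Without Juno, the same profile data can be read as plain text using only the `Profile` standard library. A minimal sketch, where `work` is a hypothetical stand-in for the function being investigated:

```julia
using Profile

# Hypothetical stand-in for the function being investigated.
work(rows) = [[round(x, digits=5) for x in row] for row in rows]

rows = [[rand(), rand()] for _ in 1:250]
work(rows)                      # warm up so compilation is not profiled

Profile.clear()
@profile for _ in 1:1000
    work(rows)
end

# Flat text report sorted by sample count: the lines with the most samples
# are where the program spends most of its time.
Profile.print(format=:flat, sortedby=:count)
```

As far as I know, the Julia extension for VS Code also offers its own `@profview` macro in the integrated REPL, which would give a graphical view similar to the Juno one; that part is worth checking against the extension's documentation.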