Hello, everyone. I’m following the tutorial introduction to GPU programming in Julia (Introduction · CUDA.jl), and the first example in the tutorial discusses parallelization on the CPU. I didn’t find a specific section for CPU questions, so I thought General Usage would be adequate. The code is:
using Test
using BenchmarkTools

N = 2^20
x = fill(1.0f0, N)  # a vector filled with 1.0 (Float32)
y = fill(2.0f0, N)

function sequential_add!(y, x)
    for i in eachindex(y, x)
        @inbounds y[i] += x[i]
    end
    return nothing
end

function parallel_add!(y, x)
    Threads.@threads for i in eachindex(y, x)
        @inbounds y[i] += x[i]
    end
    return nothing
end

@btime sequential_add!($y, $x)
@btime parallel_add!($y, $x)
The benchmark results given by the tutorial with 4 threads are 487.303 μs (0 allocations: 0 bytes) for sequential_add! and 259.587 μs (13 allocations: 1.48 KiB) for parallel_add!, showing that the parallel computation is faster.
When I run the code on my PC, the execution times are roughly the same: 387.601 μs (0 allocations: 0 bytes) for sequential and 386.501 μs (20 allocations: 3.39 KiB) for parallel. Running Threads.nthreads() returns 4, so I am also using 4 threads.
This shouldn’t happen, right? Am I doing something wrong when compiling, or misinterpreting how threads are used? Also, is there some package I need to install before parallelizing?
To answer your question: no, you don’t need to install a package or do anything special, except for making sure to start Julia with 4 threads, i.e. julia -t4.
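For reference, here are two equivalent ways to request 4 threads (the thread count is fixed at startup and cannot be changed from within a running session):

```shell
# Request 4 threads via the command-line flag:
julia -t 4

# or via the environment variable:
JULIA_NUM_THREADS=4 julia

# Then verify inside the session with Threads.nthreads(),
# which should return 4.
```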
Yeah, I guess in principle it could be a coincidence (e.g. caused by other OS threads keeping the cores busy). However, the timings are so similar that it’s far more likely the benchmarks were accidentally run with nthreads() == 1.
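One quick way to rule that out, as a sketch: count how many distinct threads actually execute iterations of a @threads loop. If this reports 1, the session was started without extra threads and parallel_add! cannot be faster than sequential_add!.

```julia
using Base.Threads

# Record which thread ids touch the loop body at least once.
touched = zeros(Int, nthreads())
@threads for i in 1:1_000_000
    touched[threadid()] = 1
end

println("nthreads() = ", nthreads(),
        ", threads that did work = ", sum(touched))
```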
Perhaps the number of threads was 4 at some point, but the OP restarted Julia while experimenting and forgot to start it with 4 threads again (just guessing, of course). @leopdsf, could you make sure this was not the case? Perhaps you could post a screenshot of your terminal like this, i.e. including the command used to start Julia:
Yeah, I also missed it in my screenshot above. It would be good to confirm explicitly, but he is clearly starting Julia with -t4 (and the command-line argument has higher precedence than any potentially set environment variable).
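As a quick sketch of that precedence, starting Julia with both set should report the value from the flag:

```shell
# The -t flag takes precedence over JULIA_NUM_THREADS:
JULIA_NUM_THREADS=1 julia -t4 -e 'println(Threads.nthreads())'
# expected to print 4
```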
It’s not the Julia version. (I would know, but to be sure I’ve explicitly run the example on 1.5, 1.6 and 1.7, all with the expected speedup.) But of course, it can’t hurt to update to at least 1.6.
But that’s an aggregated graph; I’m not sure whether 100% there means a single core loaded to the max or all cores loaded to the max. I get 100% and around 380% on my computer, indicating 4 cores working.
The 4 cores are working, as shown in the screenshot. Updating to 1.6.3 also didn’t help. I will try using environments now. For now, thank you very much, guys!