Shared-memory parallelization with large matrix

Have you exported JULIA_NUM_THREADS=20 in your environment? Run Threads.nthreads() in the REPL to check.
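
A quick way to do that check, as a minimal sketch: the variable name `n` and the messages are just for illustration, and `julia -t N` is the command-line alternative to the environment variable (available since Julia 1.5).

```julia
using Base.Threads: nthreads

# Number of threads this Julia session was started with.
n = nthreads()

if n == 1
    # JULIA_NUM_THREADS must be set before Julia starts; it cannot be changed afterwards.
    println("Single-threaded: set JULIA_NUM_THREADS or start Julia with `julia -t N`.")
else
    println("Running with $n threads.")
end
```

If this reports 1, the environment variable was probably not set in the shell that launched Julia.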

Also, I would recommend setting the number of threads to the number of physical (not logical) cores for numerical workloads; otherwise, with hyperthreading, you'll have 2K threads contending for K floating-point units.
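
As a rough sketch of how to pick that number: `Sys.CPU_THREADS` reports logical CPU threads, so it has to be divided by the SMT factor. The value `smt_ways = 2` below is an assumption (typical for hyperthreaded x86 machines) and is not detected automatically; set it to 1 on hardware without SMT, or check the real core count with your OS tools.

```julia
# Logical CPU threads visible to Julia (this count includes hyperthreads).
logical = Sys.CPU_THREADS

# ASSUMPTION: 2-way SMT (hyperthreading). Use 1 if the machine has no SMT.
smt_ways = 2

# Estimated number of physical cores, never less than 1.
physical_estimate = max(1, logical ÷ smt_ways)

println("Logical threads: $logical, estimated physical cores: $physical_estimate")
println("Suggested setting: export JULIA_NUM_THREADS=$physical_estimate")
```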

In addition, as another commenter mentioned, you should pass in a view, or better yet pass in the full array and an index range (since creating views can allocate). For example:

using Base.Threads: @threads

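function work(array, column)
    # Apply sin in place to every element of one column.
    # array[row, column] indexes the column explicitly; array[column] alone
    # would be a linear index into a single element.
    for row in axes(array, 1)
        array[row, column] = sin(array[row, column])
    end
end

function main()
    x = rand(1000, 1000)

    # Each iteration owns one column, so no two threads write the same element.
    @time @threads for i in axes(x, 2)
        work(x, i)
    end
end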

main()
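
The "full array plus index range" idea can also be sketched with one contiguous block of columns per thread instead of one task per column. The names `work!` and `main_chunked` and the chunking scheme are illustrative assumptions, not part of the original code.

```julia
using Base.Threads: @threads, nthreads

# Apply sin in place to every element in the given range of columns.
# Columns are the outer loop because Julia arrays are column-major.
function work!(array, cols)
    for j in cols, i in axes(array, 1)
        array[i, j] = sin(array[i, j])
    end
    return array
end

function main_chunked(n = 1000)
    x = rand(n, n)
    expected = sin.(x)  # single-threaded reference, used only to check the result

    # Split 1:n into at most nthreads() contiguous column ranges.
    chunks = collect(Iterators.partition(1:n, cld(n, nthreads())))

    @threads for cols in chunks
        work!(x, cols)
    end

    return x ≈ expected
end

println(main_chunked())
```

Each thread receives a range of column indices, so nothing is copied and the column blocks never overlap.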