So here are my efforts. First, here's the basic serial code. I had to change it a bit, because in my actual problem there are no issues with the ordering of the matrix indices, and I also wanted 'timeconsuming' to be more time-consuming, so I added a dummy loop to slow it down. The index and value vectors can be shared and the sparse matrix constructed at the end.
using SparseArrays

function timeconsuming(params)
    index = zeros(Int64, 10000, 1)
    value = zeros(Float64, 10000, 1)
    for k = 1:1000
        ind = 10 * (k - 1) .+ (1:10)
        # 10 random row indices within the k-th block of 1000 rows
        index[ind] = 1000 * (k - 1) .+ Int64.(ceil.(1000 .* rand(10, 1)))
        value[ind] = params
    end
    # dummy work, purely to make the function artificially slow
    for k = 1:100
        dum = rand(1000, 1000)
    end
    return index, value
end

function test()
    index = zeros(Int64, 1000000, 1)
    value = zeros(Float64, 1000000, 1)
    # column index for every entry: the p-th block of 10000 rows belongs to column p
    J = repeat(transpose(collect(1:100)), 10000, 1)
    for p = 1:100
        params = rand(1, 10)
        ind = 10000 * (p - 1) .+ (1:10000)
        index[ind], value[ind] = timeconsuming(params)
    end
    Z = sparse(vec(index), J[:], vec(value), 1000000, 100)
    return Z
end
which gives
@time Z = test(); 40.613715 seconds (225.51 k allocations: 74.619 GiB, 21.99% gc time)
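Most of those allocations appear to come from the dummy loop: each rand(1000, 1000) allocates 1000 × 1000 × 8 bytes ≈ 7.6 MiB, and there are 100 of them per call and 100 calls in total, i.e. roughly 10000 × 7.6 MiB ≈ 74.5 GiB, which is essentially the figure @time reports.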
With @threads, this becomes
using SparseArrays, SharedArrays, Distributed

function timeconsuming(params)
    index = zeros(Int64, 10000, 1)
    value = zeros(Float64, 10000, 1)
    for k = 1:1000
        ind = 10 * (k - 1) .+ (1:10)
        index[ind] = 1000 * (k - 1) .+ Int64.(ceil.(1000 .* rand(10, 1)))
        value[ind] = params
    end
    for k = 1:100
        dum = rand(1000, 1000)
    end
    return index, value
end

function test()
    index = SharedArray{Int64,2}(1000000, 1)
    value = SharedArray{Float64,2}(1000000, 1)
    J = repeat(transpose(collect(1:100)), 10000, 1)
    # each p writes to its own disjoint block of index/value, so no locking is needed
    Threads.@threads for p = 1:100
        params = rand(1, 10)
        ind = 10000 * (p - 1) .+ (1:10000)
        index[ind], value[ind] = timeconsuming(params)
    end
    Z = sparse(vec(index), J[:], vec(value), 1000000, 100)
    return Z
end
which gives
@time Z = test(); 22.801165 seconds (228.17 k allocations: 74.604 GiB, 30.57% gc time)
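For what it's worth, Threads.@threads only helps if Julia was started with more than one thread (e.g. julia -t 12, or via the JULIA_NUM_THREADS environment variable); a quick sanity check is
Threads.nthreads()   # number of threads available to Threads.@threads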
and finally with @distributed
using SparseArrays, SharedArrays, Distributed

addprocs(12)

# the worker processes need their own copy of timeconsuming
@everywhere begin
    function timeconsuming(params)
        index = zeros(Int64, 10000, 1)
        value = zeros(Float64, 10000, 1)
        for k = 1:1000
            ind = 10 * (k - 1) .+ (1:10)
            index[ind] = 1000 * (k - 1) .+ Int64.(ceil.(1000 .* rand(10, 1)))
            value[ind] = params
        end
        for k = 1:100
            dum = rand(1000, 1000)
        end
        return index, value
    end
end

function test()
    # SharedArrays are writable from all worker processes
    index = SharedArray{Int64,2}(1000000, 1)
    value = SharedArray{Float64,2}(1000000, 1)
    J = repeat(transpose(collect(1:100)), 10000, 1)
    @sync @distributed for p = 1:100
        params = rand(1, 10)
        ind = 10000 * (p - 1) .+ (1:10000)
        index[ind], value[ind] = timeconsuming(params)
    end
    Z = sparse(vec(index), J[:], vec(value), 1000000, 100)
    return Z
end
which gives
@time Z = test(); 15.664146 seconds (12.10 k allocations: 53.835 MiB)
This last seems to be the better of the two. (The much smaller allocation figure is presumably just because @time only counts allocations on the master process; the workers still allocate the same dummy arrays.)
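As an aside, here is an untested sketch of an alternative that avoids SharedArrays altogether: since every call to timeconsuming produces its own 10000-entry block, the blocks can simply be returned from the workers with pmap and concatenated on the master before calling sparse. It assumes the @everywhere definition of timeconsuming above, and the name test_pmap is just illustrative.
using SparseArrays, Distributed
function test_pmap()
    # one (index, value) block of 10000 entries for each of the 100 columns
    results = pmap(p -> timeconsuming(rand(1, 10)), 1:100)
    index = reduce(vcat, [vec(r[1]) for r in results])   # row indices, in column order
    value = reduce(vcat, [vec(r[2]) for r in results])   # nonzero values
    J = repeat(transpose(collect(1:100)), 10000, 1)      # column index for each entry
    return sparse(index, J[:], value, 1000000, 100)
end
Whether this beats the SharedArray version probably depends on how much data has to be shipped back from the workers, so it would need benchmarking.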
I think I’ve learned a few things, and I’ll now have to see what I can do with my actual code. Thanks to everybody for helping, and any further comments would be gratefully received.