I get better performance from
using Polyester  # provides @batch

function foo_polyester!(num, market_ids)
    @batch per=core for id ∈ market_ids
        @views s = sum(num[id, :])
        @views num[id, :] ./= s + 1
    end
end
This is what predict_shares_bern_6 uses.
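To make the loop body concrete, here is a minimal sketch of the same per-market normalization on a tiny input. It assumes each element of `market_ids` is the row range of one market, which may not match your actual data layout. `foo_threads!` is a hypothetical name, and it swaps Polyester's `@batch` for Base's `Threads.@threads` only so that the snippet runs without extra packages; it is not the version that was benchmarked.

```julia
# Hypothetical stand-in for foo_polyester!: same loop body, but Base's
# Threads.@threads replaces Polyester's @batch so no package is needed.
function foo_threads!(num, market_ids)
    Threads.@threads for id in market_ids
        @views s = sum(num[id, :])    # total over every entry in this market's rows
        @views num[id, :] ./= s + 1   # in-place divide, with the +1 in the denominator
    end
    return num
end

# Assumed layout: two markets, each owning two rows of `num`.
num = Float64[1 2; 3 4; 5 6; 7 8]
market_ids = [1:2, 3:4]

foo_threads!(num, market_ids)
```

Each iteration writes only to its own rows, so the markets can be processed in parallel without locking, provided the index sets in `market_ids` do not overlap.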
Using 4 threads:
julia> @btime predict_shares_bern($delta, $randvar_nu, $randvar_nu_inattention, $mat_1, $vec_1, $market_ids, $nu_bern);
547.625 ms (1262078 allocations: 1.88 GiB)
julia> @btime predict_shares_bern_bis($delta, $randvar_nu, $randvar_nu_inattention, $mat_1, $vec_1, $market_ids, $nu_bern);
67.685 ms (280030 allocations: 15.22 MiB)
julia> @btime predict_shares_bern_ter($delta, $randvar_nu, $randvar_nu_inattention, $mat_1, $vec_1, $market_ids, $nu_bern);
46.552 ms (30 allocations: 1.34 MiB)
julia> @btime predict_shares_bern_4($delta, $randvar_nu, $randvar_nu_inattention, $mat_1, $vec_1, $market_ids, $nu_bern);
39.009 ms (78 allocations: 1.34 MiB)
julia> @btime predict_shares_bern_5($delta, $randvar_nu, $randvar_nu_inattention, $mat_1, $vec_1, $market_ids, $nu_bern);
24.059 ms (35 allocations: 28.04 MiB)
julia> @btime predict_shares_bern_6($delta, $randvar_nu, $randvar_nu_inattention, $mat_1, $vec_1, $market_ids, $nu_bern);
15.691 ms (32 allocations: 28.04 MiB)
julia> Threads.nthreads()
4
My performance doesn't really improve with more threads; about 12 ms is the best I can get on my computer (which has 18 cores).