Speeding up a function

I get better performance from this Polyester.jl version:

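```julia
using Polyester  # provides the @batch macro

function foo_polyester!(num, market_ids)
    # For each id, divide row num[id, :] in place by (its sum + 1)
    @batch per=core for id ∈ market_ids
        @views s = sum(num[id, :])
        @views num[id, :] ./= s + 1
    end
end
```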
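If you want to check the same per-row normalization without the Polyester.jl dependency, here is a minimal sketch that uses Base's `Threads.@threads` instead of `@batch`. The function names `normalize_rows_threads!` and `normalize_rows_serial!` are hypothetical. The sketch assumes every id in `market_ids` is unique, so each task writes to a different row and there is no data race. `@threads` generally has more scheduling overhead than `@batch`, so treat this as a correctness reference rather than a faster alternative.

```julia
using Base.Threads

# Hypothetical sketch: same in-place row normalization as foo_polyester!,
# but using Base's @threads. Assumes ids in market_ids are unique.
function normalize_rows_threads!(num::AbstractMatrix, market_ids)
    @threads for id in market_ids
        @views s = sum(num[id, :])
        @views num[id, :] ./= s + 1
    end
    return num
end

# Serial reference implementation, for comparing results.
function normalize_rows_serial!(num::AbstractMatrix, market_ids)
    for id in market_ids
        @views s = sum(num[id, :])
        @views num[id, :] ./= s + 1
    end
    return num
end

A = rand(100, 50)
B = copy(A)
normalize_rows_threads!(A, 1:size(A, 1))
normalize_rows_serial!(B, 1:size(B, 1))
```

After the two calls, `A` and `B` should agree up to floating-point rounding.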

This is the loop used in predict_shares_bern_6.
Timings with 4 threads:

```
julia> @btime predict_shares_bern($delta, $randvar_nu, $randvar_nu_inattention, $mat_1, $vec_1, $market_ids, $nu_bern);
  547.625 ms (1262078 allocations: 1.88 GiB)

julia> @btime predict_shares_bern_bis($delta, $randvar_nu, $randvar_nu_inattention, $mat_1, $vec_1, $market_ids, $nu_bern);
  67.685 ms (280030 allocations: 15.22 MiB)

julia> @btime predict_shares_bern_ter($delta, $randvar_nu, $randvar_nu_inattention, $mat_1, $vec_1, $market_ids, $nu_bern);
  46.552 ms (30 allocations: 1.34 MiB)

julia> @btime predict_shares_bern_4($delta, $randvar_nu, $randvar_nu_inattention, $mat_1, $vec_1, $market_ids, $nu_bern);
  39.009 ms (78 allocations: 1.34 MiB)

julia> @btime predict_shares_bern_5($delta, $randvar_nu, $randvar_nu_inattention, $mat_1, $vec_1, $market_ids, $nu_bern);
  24.059 ms (35 allocations: 28.04 MiB)

julia> @btime predict_shares_bern_6($delta, $randvar_nu, $randvar_nu_inattention, $mat_1, $vec_1, $market_ids, $nu_bern);
  15.691 ms (32 allocations: 28.04 MiB)

julia> Threads.nthreads()
4
```
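To put the timings above in perspective, here is a small sketch that computes each variant's speedup relative to predict_shares_bern. The numbers are copied directly from the `@btime` output (which reports the minimum time over the benchmark samples); the variable names are just for illustration.

```julia
# Minimum times in milliseconds, copied from the @btime output above.
timings_ms = [
    "predict_shares_bern"     => 547.625,
    "predict_shares_bern_bis" => 67.685,
    "predict_shares_bern_ter" => 46.552,
    "predict_shares_bern_4"   => 39.009,
    "predict_shares_bern_5"   => 24.059,
    "predict_shares_bern_6"   => 15.691,
]

base_ms = last(first(timings_ms))

# Speedup of each variant relative to the original predict_shares_bern.
speedups = Dict(name => base_ms / t for (name, t) in timings_ms)

for (name, _) in timings_ms
    println(rpad(name, 26), round(speedups[name]; digits = 1), "x")
end
```

By this measure predict_shares_bern_6 is roughly 35 times faster than the original.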

Adding more threads doesn't help much beyond this: about 12 ms is the best I can get on my computer, which has 18 cores.
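One possible reason the loop stops scaling, offered as a guess rather than something measured here, is that it is memory-bound: Julia arrays are column-major, so `num[id, :]` walks a row with a stride. A hypothetical alternative is to store the data transposed, so each id owns a contiguous column. The sketch below assumes you can keep a transposed copy `numT = permutedims(num)` throughout (the name `normalize_cols_threads!` and the 200 × 1000 test size are made up), and again assumes the ids in `market_ids` are unique.

```julia
using Base.Threads

# Hypothetical variant: data stored transposed (numT = permutedims(num)),
# so each id's values form a contiguous column instead of a strided row.
# Assumes ids are unique, so each task writes to a different column.
function normalize_cols_threads!(numT::AbstractMatrix, market_ids)
    @threads for id in market_ids
        col = @view numT[:, id]   # contiguous in memory (column-major)
        col ./= sum(col) + 1      # writes back into numT through the view
    end
    return numT
end

num = rand(200, 1000)        # rows indexed by id, as in the original layout
numT = permutedims(num)      # transposed copy: one column per id
normalize_cols_threads!(numT, 1:size(num, 1))
```

Whether this helps in practice depends on the matrix shape and on how the rest of predict_shares_bern_6 uses `num`, so it would need to be benchmarked against the row version.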
