Hi all,
I have a function that needs some speeding up. I have tried many things; here is the current state, as a minimal working example:
```julia
using BenchmarkTools, Random, Distributions, Statistics, Base.Threads

Random.seed!(123)
market_ids = repeat(1:1000, 35)
delta = rand(Normal(1, 5), 35000)
nu_bern = -1
randvar_nu = randn(35000, 100) * 5 .+ 6
randvar_nu_inattention = randn(35000, 100) * 5 .+ 6.5
mat_1 = similar(randvar_nu)   # preallocated buffer, reused across calls
vec_1 = similar(delta)        # preallocated buffer, reused across calls

function predict_shares_bern(delta, randvar_nu, randvar_nu_inattention, mat_1, vec_1, market_ids, nu_bern)
    # numerators of the attentive-type logit shares; num aliases the preallocated mat_1
    num = (mat_1 .= exp.(randvar_nu .+ delta))
    # divide each row by 1 + the column sums over the rows of its market
    @threads for i in 1:length(market_ids)
        @views num[market_ids .== i, :] .= num[market_ids .== i, :] ./ (sum(num[market_ids .== i, :], dims = 1) .+ 1)
    end
    vec_1 .= mean(num, dims = 2)
    # same computation for the inattentive type
    num .= exp.(randvar_nu_inattention .+ delta)
    @threads for i in 1:length(market_ids)
        @views num[market_ids .== i, :] .= num[market_ids .== i, :] ./ (sum(num[market_ids .== i, :], dims = 1) .+ 1)
    end
    # mix the two types with weight exp(nu_bern) / (1 + exp(nu_bern))
    share = (vec_1 .= vec_1 .* (exp(nu_bern) / (1 + exp(nu_bern))) + mean(num, dims = 2) .* (1 - (exp(nu_bern) / (1 + exp(nu_bern)))))
    return share
end

predict_shares_bern(delta, randvar_nu, randvar_nu_inattention, mat_1, vec_1, market_ids, nu_bern)
@btime predict_shares_bern(delta, randvar_nu, randvar_nu_inattention, mat_1, vec_1, market_ids, nu_bern)
```
I get this output:
Some notes:
- Replace 35,000 with 10,000,000 if you want; that is the size I eventually have to handle.
- market_ids: right now it is just 1:1000 repeated, but that is not how it will look in practice. The markets will have different sizes, so hard-coding a loop over 1:1000 will not work (see the sketch after this list).
- I have nthreads() = 8 right now; the code will eventually run on a machine with 260+ cores.
- mat_1 and vec_1 have no purpose other than being preallocated once; the function is called many times, so reusing them keeps allocations down.
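
To illustrate the second note, here is a rough sketch of what market_ids might look like with unequal market sizes (the sizes and the `rows_by_market` name below are made up for illustration, not my real data), together with one way the per-market row indices could be precomputed once:

```julia
# Rough sketch only: hypothetical unequal market sizes, not my real data.
using Random
Random.seed!(123)
market_sizes = rand(20:50, 1000)   # each market gets a different (made-up) size
market_ids = reduce(vcat, [fill(m, market_sizes[m]) for m in 1:1000])

# One possible preprocessing step: collect the row indices of each market once,
# so a per-market loop does not have to rescan the whole id vector every time.
rows_by_market = [findall(==(m), market_ids) for m in unique(market_ids)]
```

With something like rows_by_market, the inner loops could iterate over index vectors instead of broadcasting market_ids .== i on every iteration, but I am not sure that is the best approach.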
Let me know if you have any ideas. Using DataFrames to subset the data seems to be slower, and I could not get improvements out of Tullio or LoopVectorization, though that is probably on me.
Many Thanks!