Speed up a loop with matrix operations

Hello:
I am working on maximum simulated likelihood estimation of a multi-layer (group/firm/time) model. There are three random variables, and I have to take simulation draws for two of them. The estimation is very slow because of the simulations; one small sample took about 8 minutes to estimate via MSLE. That speed becomes a real hurdle in a Monte Carlo exercise, where the model has to be estimated thousands of times.

I extracted the time-consuming part of the log-likelihood function into the following MWE. Any suggestions on how to speed up the run time further would be much appreciated.

using Random, Distributions, FLoops, StatsFuns

@fastmath NH_like(sigmau::Real, lambdau::Real, uit) =
   (exp.(-0.5 * StatsFuns.abs2.(uit / sigmau)) * invsqrt2π) .*
   StatsFuns.erfc.(-(lambdau * uit / sigmau) * invsqrt2) / 2


function test1(; S=200, G=100, F=3, T=2)
    # S=Sw=Sc: simulation draws; G: groups; F: firms; T: time periods
    TF = T*F
    ww = rand(S)
    cc = rand(S)

    wc = (repeat(ww, inner=S) + repeat(cc, outer=S))'  # 1 x (Sc*Sw)
    wc_expand = repeat(wc, outer=(T,1))                # T x (Sc*Sw)

    epsi = rand(G*F*T, 1)  # (G*F*T) x 1

    @floop for g in 1:G
        Like_g = ones(1, S)
        @inbounds for i in 1:F
            # residuals of firm i in group g against every (w, c) draw pair
            @views u_it = epsi[1 + (g-1)*TF + (i-1)*T : (g-1)*TF + i*T] .- wc_expand  # T x (Sw*Sc)
            aa = NH_like(0.5, 1.1, u_it)
            bb = prod(aa, dims=1)                   # 1 x (Sw*Sc): product over time
            cm = mean(reshape(bb, (S, S)), dims=1)  # 1 x Sw: average over the c draws
            Like_g = Like_g .* cm                   # 1 x Sw
        end
        # @reduce keeps the accumulation valid under FLoops' parallel execution
        @reduce lnLike += log(mean(Like_g))
    end
    return lnLike
end

The biggest thing that will help is to make NH_like a scalar function, and then broadcast over it. Doing so is the easiest way to fix the fact that you are currently missing 7 dots, which means the broadcasts don't fuse and you are allocating a ton of intermediate arrays.
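For example, something like this sketch (NH_like_scalar is just an illustrative name; it reuses the StatsFuns constants already imported in your MWE):

NH_like_scalar(sigmau::Real, lambdau::Real, u::Real) =
    exp(-0.5 * abs2(u / sigmau)) * invsqrt2π *
    StatsFuns.erfc(-(lambdau * u / sigmau) * invsqrt2) / 2

# one fused broadcast: a single loop over u_it, a single output allocation
aa = NH_like_scalar.(0.5, 1.1, u_it)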


Thanks for the suggestion. I followed it, and the memory allocation is cut almost in half and the run time improves a lot!

Could you please explain what you meant by “missing 7 dots”? Even after following the suggestion, I don't use or see 7 dots. I am afraid I missed something.

What I meant by 7 missing dots is that

@fastmath NH_like(sigmau::Real, lambdau::Real, uit) =
   (exp.(-0.5 .* StatsFuns.abs2.(uit ./ sigmau)) .* invsqrt2π) .*
   StatsFuns.erfc.(.-(lambdau .* uit ./ sigmau) .* invsqrt2) ./ 2

would have the same performance (although less idiomatic).
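For what it's worth, the @. macro inserts all of those dots automatically, so a sketch of an equivalent definition (dropping @fastmath for simplicity) is

NH_like(sigmau::Real, lambdau::Real, uit) =
    @. exp(-0.5 * abs2(uit / sigmau)) * invsqrt2π *
       StatsFuns.erfc(-(lambdau * uit / sigmau) * invsqrt2) / 2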


I see, thanks! I just compared different versions of the function, and it seems that making it a scalar function (hence without any dots in the definition, broadcasting at the call site) still has a slight advantage over the fully dotted version in terms of both speed and allocation.
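For anyone curious, the comparison was along these lines (a sketch using BenchmarkTools; test1_dotted and test1_scalar are hypothetical variants of test1 calling the fully dotted and the scalar-plus-broadcast kernels, respectively):

using BenchmarkTools
@btime test1()         # original, partially dotted kernel
@btime test1_dotted()  # fully dotted kernel
@btime test1_scalar()  # scalar kernel, broadcast at the call site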

Thanks again.
