Speed up a loop with matrix operations

Hello:
I am working on maximum simulated likelihood estimation of a multi-layer (group/firm/time) model. There are three random variables, and I have to take simulation draws for two of them. The estimation is very slow because of the simulations; one small sample took about 8 minutes to estimate via MSLE. That speed becomes a real hurdle in a Monte Carlo exercise, where the model has to be estimated thousands of times.

I extracted the time-consuming part of the log-likelihood function into the following MWE. Any suggestions on how to speed up the run time further would be much appreciated.

using Random, Distributions, FLoops, StatsFuns

@fastmath NH_like(sigmau::Real, lambdau::Real, uit) =
   (exp.(-0.5 * StatsFuns.abs2.(uit / sigmau)) * invsqrt2π) .*
   StatsFuns.erfc.(-(lambdau * uit / sigmau) * invsqrt2) / 2


function test1(; S=200, G=100, F=3, T=2)
    # S=Sw=Sc: simulation draws; G: groups; F: firms; T: time periods
    TF = T*F
    ww = rand(S)
    cc = rand(S)

    wc = (repeat(ww, inner=S) + repeat(cc, outer=S))'  # 1 x (Sc*Sw)
    wc_expand = repeat(wc, outer=(T,1))                # T x (Sc*Sw)

    epsi = rand(G*F*T, 1)  # (G*F*T) x 1

    @floop for g in 1:G
        Like_g = ones(1, S)
        @inbounds for i in 1:F
            # residuals of firm i in group g against every (w, c) draw pair
            @views u_it = epsi[1 + (g-1)*TF + (i-1)*T : (g-1)*TF + i*T] .- wc_expand  # T x (Sw*Sc)
            aa = NH_like(0.5, 1.1, u_it)
            bb = prod(aa, dims=1)                   # 1 x (Sw*Sc): product over time
            cm = mean(reshape(bb, (S, S)), dims=1)  # 1 x Sw: average over the c draws
            Like_g = Like_g .* cm                   # 1 x Sw
        end
        # @reduce keeps the accumulation valid under FLoops' parallel execution
        @reduce lnLike += log(mean(Like_g))
    end
    return lnLike
end

The biggest thing that will help is to make NH_like a scalar function, and then broadcast over it. Doing so is the easiest way to fix the fact that you are currently missing 7 dots, which means the broadcasts don't fuse and you are allocating a ton of intermediate arrays.
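For example, something like this sketch (NH_like_scalar is just an illustrative name; it reuses the StatsFuns constants already imported in your MWE):

NH_like_scalar(sigmau::Real, lambdau::Real, u::Real) =
    exp(-0.5 * abs2(u / sigmau)) * invsqrt2π *
    StatsFuns.erfc(-(lambdau * u / sigmau) * invsqrt2) / 2

# one fused broadcast: a single loop over u_it, a single output allocation
aa = NH_like_scalar.(0.5, 1.1, u_it)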


Thanks for the suggestion. I followed it, and the memory allocation is cut almost in half and the run time improves a lot!

Could you please explain what you meant by “missing 7 dots”? Even after following the suggestion, I don't use or see 7 dots. I am afraid I missed something.

What I meant by 7 missing dots is that

@fastmath NH_like(sigmau::Real, lambdau::Real, uit) =
   (exp.(-0.5 .* StatsFuns.abs2.(uit ./ sigmau)) .* invsqrt2π) .*
   StatsFuns.erfc.(.-(lambdau .* uit ./ sigmau) .* invsqrt2) ./ 2

would have the same performance (although less idiomatic).
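For what it's worth, the @. macro inserts all of those dots automatically, so a sketch of an equivalent definition (dropping @fastmath for simplicity) is

NH_like(sigmau::Real, lambdau::Real, uit) =
    @. exp(-0.5 * abs2(uit / sigmau)) * invsqrt2π *
       StatsFuns.erfc(-(lambdau * uit / sigmau) * invsqrt2) / 2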


I see, thanks! I just compared different versions of the function, and it seems that making it a scalar function (hence without any dots in the definition, broadcasting at the call site) still has a slight advantage over the fully dotted version in terms of both speed and allocation.
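For anyone curious, the comparison was along these lines (a sketch using BenchmarkTools; test1_dotted and test1_scalar are hypothetical variants of test1 calling the fully dotted and the scalar-plus-broadcast kernels, respectively):

using BenchmarkTools
@btime test1()         # original, partially dotted kernel
@btime test1_dotted()  # fully dotted kernel
@btime test1_scalar()  # scalar kernel, broadcast at the call site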

Thanks again.
