Help with for-loops optimization / Translation from Matlab

Benjamin7 · August 14, 2024, 10:02pm

Hi, I have been asked to translate a matlab code to Julia to gain speed, what I have done is to put each for loop with the @inbound, put each array and all important code in a function, I have defined the type of almost all objects (array, arrays, const). The code is already 6 times faster than Matlab, but I know it can be improved even more I just don’t know where or what to do now, I’m stuck. I would be very grateful if you can give me some advice or recommend some reading to improve this code even more. I’m not looking for someone to do it for me, I’ve learned a lot working on this and I intend to keep learning :). Thank you very much.

function CournotCompetitionCES(C,s,GDP) # Cournot competition [finding the fixed point
    X, N = size(C);              # number of competing firms
    iterations = 16;               # number of iterations
    y = zeros(N,iterations+1)
    BRy = zeros(1,N)
    initial_y = ones(1,N)*GDP/(N*maximum(C))
    infimum = (10^-8)*GDP/((maximum(C))^s)
    y[:,1] = initial_y
    Y = sum(y[:,1])
    #diff = zeros(1,iterations)
    for j = 1:iterations
      for i = 1:N
        yni = Y - y[i,j]
        BRy[i] = max.(infimum,(Y*s)*(1-(C[i]*((Y/GDP)^(1/s)))))
        y[i,j+1] = BRy[i]
        Y = yni+BRy[i]
      end
    #diff(j) = 100*sum(abs(y[:,j]-BRy'))/(N*GDP)
    end
    Result = y[:,iterations+1]'
    return Result
end


years::Int64 = 30                      # length of time in years
L::Int64 = 20                           # number of periods in a year
T::Int64 = L*years 
beta::Float64 = 0.97^(1/L)                # discount factor()
growth_rate::Float64 = 0.015/L
r::Float64 = 1-beta                       # discount rate
Ea::Float64 = 0.010                       # gain & advantage of entry over all existing firms
gamma::Float64 = 0.10/L                   # parameters of the entry production function
m::Float64 = 0.030/L                        # merger chance of X# per year
s::Float64 = 1.75                         # elasticity of substitution
Emerger_synergy::Float64 = 0.050          # percentage reduction in cost from a merger
CostEffort::Float64 = 1.00                   
CostLabor::Float64 = 1.00
psi::Float64 = 3.0                        # Frisch elasticity of substitution for labor/entrepreneurial supply

Eexit_tolerance::Float64 = 0.27           # marginal cost tolerance for exit()

#mu = 0.0031/L;                   # assumes labor productivity growth of frontier firms through learning at the rate of 0.X# per year
sigma::Float64 = 0.095/L                  # standard size of the shocks makes standard deviation of productivity of 33# of average productivity
theta::Float64 = 0.012/L                  # growth rate of TFP of laggard firms in an industry is x# on leader
## industry dynamics

Grid::Int64 = 10
lower_bound::Float64 = 0.450
upper_bound::Float64 = 0.850                                     # size of the grid of effort levels
resolution::Float64 = (upper_bound-lower_bound)/Grid
J::Int64 = 100                        # number of industries
C_max::Int64 = 30                       # maximum number of firms in an industry

function MARCES(years, L, T, beta, growth_rate, r, Ea, gamma, m, s, Emerger_synergy, CostEffort, CostLabor, psi, Eexit_tolerance,
    sigma, theta, Grid, lower_bound, upper_bound, resolution, J, C_max)
    
    TFPaggregate::Matrix{Float64} = zeros(Grid+1,T) 
    Emarkup::Matrix{Float64} = zeros(Grid+1,1)
    
    Effort = (lower_bound:resolution:upper_bound)           # initial allocation of household's labor to production of blueprints
    
    AN::Matrix{Float64} = zeros(1,Grid+1)
    Aexit_rate::Matrix{Float64} = zeros(1,Grid+1)
    Amerger_rate::Matrix{Float64} = zeros(1,Grid+1)
    Amerger_success::Matrix{Float64} = zeros(1,Grid+1)
    Labor_supply::Matrix{Float64} = zeros(1,Grid+1)
    Response_effort::Matrix{Float64} = zeros(1,Grid+1)
    EGDP_growth_rate::Matrix{Float64} = zeros(1,Grid+1)
    EUtility::Matrix{Float64} = zeros(1,Grid+1)
    SD::Matrix{Float64} = zeros(1,Grid+1)
    Alpha::Matrix{Int64} = [1000 1800 2500]
    Beta::Matrix{Int64} = [7500 6000 3000]
    sample::Int64 = 6
    Results::Array{Float64,3} = zeros(6,sample,2)
    Utility::Matrix{Float64} = zeros(1,T)
    Z = zeros(T+1,C_max,J);                                    # grid of productivity levels
    N = ones(1,J);                                           # vector of firms operating in each industry [assume an initial number of 1 firm operating each industry]
    
    Nindustry::Matrix{Float64} = zeros(T,J);                                  # number of firm operating at beginning of each period in each industry
    Exit::Matrix{Float64} = zeros(T,J);                                       # number of firm exits per period in each industry
    Entry::Matrix{Float64} = zeros(T,J);                                      # number of firm enter per period in each industry [binary variable]
    N_merger::Matrix{Float64} = zeros(T,J);                                   # number of mergers per period in each industry [binary variable]
    
    Price::Matrix{Float64} = zeros(T,J)
    Employment::Matrix{Float64} = zeros(T,J)
    Profit::Matrix{Float64} = zeros(T,J)
    
    HHI::Matrix{Float64} = zeros(T,J)
    HHI_post_merger::Matrix{Float64} = zeros(T,J)
    DeltaHHI::Matrix{Float64} = zeros(T,J)
    w::Matrix{Float64} = ones(1,T);         
    GDP::Matrix{Float64} = 1.5*w;           # dumny GDP at each period
    consumption::Matrix{Float64} = zeros(1,J)
    Consumption::Matrix{Float64} = zeros(1,T)
    HHI_index::Matrix{Float64} = zeros(1,J)
    Ugrowth_rate::Matrix{Float64} = zeros(1,T)
    EU::Matrix{Float64} = zeros(1,T)

    Prod::Array{Float64,3} = zeros(T,C_max,J)
    Variance::Array{Float64,3} = zeros(T,C_max,J)
    EP::Matrix{Float64} = zeros(T,J)
    Standard_deviation::Matrix{Float64} = zeros(T,J)
    Standard_deviation2::Matrix{Float64} = zeros(T,J)
    Output::Matrix{Float64} = zeros(T,J) #ya esta afuera
    TFP::Matrix{Float64} = zeros(T,J);  #ya esta afuera del loop de T y J
    Z_merger::Matrix{Float64} = zeros(1,C_max)

    Employ::Matrix{Float64} = zeros(1,T)
    EN::Matrix{Float64} = zeros(1,T)
    AProfit::Matrix{Float64} = zeros(1,T)
    AMarkup::Matrix{Float64} = zeros(1,T)
    Markup::Matrix{Float64} = zeros(1,T)

    Average_Profit::Matrix{Float64} = zeros(1,T)
    Exit_rate::Matrix{Float64} = zeros(1,T)
    Merger_rate::Matrix{Float64} = zeros(1,T)

    @time begin
    @inbounds for guidelines = 1:3
    @inbounds for z = 1:sample
        @inbounds for A = 1:Grid+1
        Z[1,1,:] .= 1; 
        
        @inbounds for j = 1:J
        
        Exit_tolerance = 2*rand()*Eexit_tolerance
        
        @inbounds for t = 1:T
            
             C = w[t]*ones(1,Int(N[j]))
        
             @inbounds for c = 1:Int(N[j])
                C[c] = w[t]/exp(Z[t,c,j]); #set up cost array
            end
            
             Y = CournotCompetitionCES(C,s,GDP[t]); #output array in each period
            Price[t,j] = (GDP[t]/sum(Y))^(1/s)
             employment = zeros(1,Int(N[j]))
             @inbounds for i = 1:Int(N[j])
                employment[i] = Y[i]*C[i]/w[t]
            end
        
            Employment[t,j] = sum(employment)
            Profit[t,j] = GDP[t]/(Price[t,j]^(s-1))-Employment[t,j]*w[t]
        
            @inbounds for  c = 1:Int(N[j]) # firm exit()
                if C[c] .> (1+Exit_tolerance)*Price[t,j]
                    Exit[t,j] = Exit[t,j]+1
                    @inbounds for i = c:Int(N[j])
                        Z[t,i,j] = Z[t,i+1,j]; # exit occurs
                    end
                   N[j] = N[j]-1
                end
            end
        
            if N[j] .> 1
        
                merger = rand();                    # merger opportunity
                
                if merger .< N[j]*m
                   
                   Merger_synergy = 2*Emerger_synergy*rand()
                    i = Int(max(round(merger/m*((N[j]-1)/N[j])),1))
                    C_no_merger = w[t]*ones(1,Int(N[j]))
            
                    @inbounds for k =1:Int(N[j])
                       C_no_merger[k] = w[t]/exp(Z[t,k,j]); #set up cost array
                   end
        
                    Y_no_merger = CournotCompetitionCES(C_no_merger,s,GDP[t]) 
                    profits_no_merger = ((GDP[t]/sum(Y_no_merger))^(1/s)-C_no_merger[i])*Y[i] + ((GDP[t]/sum(Y_no_merger))^(1/s)-C_no_merger[i+1])*Y[i+1]
                   
                   HHI_index = zeros(1,Int(N[j]))
            
                   @inbounds for b = 1:Int(N[j])
                       HHI_index[b] = (100*Y_no_merger[b]/sum(Y_no_merger))^2
                   end
                
                   HHI[t,j] =  sum(HHI_index)
            
                   
                   C_merger = w[t]*ones(1,Int(N[j])-1)
            
                   @inbounds for k = 1:i-1
                       C_merger[k] = C_no_merger[k]
                       Z_merger[k] = Z[t,k,j]
                   end
                   @inbounds for k = i           
                       C_merger[k] = (1-Merger_synergy)*(((HHI_index[k])^(1/2))*C_no_merger[k]+((HHI_index[k+1])^(1/2))*C_no_merger[k+1])/(((HHI_index[k])^(1/2))+((HHI_index[k+1])^(1/2)))
                       Z_merger[k] = log(w[t]/C_merger[k])
                   end
                   @inbounds for k = i+1:Int(N[j])-1
                       C_merger[k] = C_no_merger[k+1]
                       Z_merger[k] = Z[t,k+1,j]
                   end
            
                   Y_merger = CournotCompetitionCES(C_merger,s,GDP[t])
            
                   profits_merger = ((GDP[t]/sum(Y_merger))^(1/s)-C_merger[i])*Y_merger[i]
                   
                   HHI_post_merger_index = zeros(1,Int(N[j])-1)
            
                   @inbounds for b = 1:Int(N[j])-1
                       HHI_post_merger_index[b] = (100*Y_merger[b]/sum(Y_merger))^2
                   end
                
                   HHI_post_merger[t,j] = sum(HHI_post_merger_index)
            
                   DeltaHHI[t,j] = HHI_post_merger[t,j] - HHI[t,j]
        
                   if profits_merger .> profits_no_merger # condition for the merger to occur
                    if HHI_post_merger[t,j] .< Alpha[guidelines]       # condition for the merger to occur
         
                        C = C_merger
                        Z[t,:,j] = Z_merger
                        N[j] = N[j]-1
                        N_merger[t,j] = 1
         
                     else()
                         if HHI_post_merger[t,j] .> Alpha[guidelines] && DeltaHHI[t,j] .< 100+Beta[guidelines]*Merger_synergy          # condition for the merger to occur
                   
                          C = C_merger
                          Z[t,:,j] = Z_merger
                          N[j] = N[j]-1
                          N_merger[t,j] = 1
         
                        end          
        end
        end
        end
        end 
        
        @inbounds for c in 1:Int(N[j]) # productivity process of firms that do not exit
            Z_frontier = maximum(Z[t, :, j])
            if Z[t, c, j] < Z_frontier
                Z[t+1, c, j] = Z[t, c, j] + randn() * sigma + theta
            else
                Z[t+1, c, j] = Z[t, c, j] + randn() * sigma
            end
        end
        
           enter = rand();                                      # entry process
        
        if enter < Effort[A]*gamma                            # probability that entry occurs
           alpha = 2*rand()*Ea
           Z[t+1,Int(N[j])+1,j] = maximum(Z[t,:,j]) + alpha
           N[j] = N[j]+1
           Entry[t,j] = 1
        end
        
        Nindustry[t,j] = N[j]
        
        end
        end

         @inbounds for t =1:T
            Employ[t] = sum(Employment[t,:])/J
            AProfit[t] = sum(Profit[t,:])/J
            AMarkup[t] = AProfit[t]/(AProfit[t]+w[t]*Employ[t])
            EN[t] = sum(Nindustry[t,:])/J
        end
        
         Total_exit = sum(sum(Exit))
         Total_entry = sum(sum(Entry))
        
        
         @inbounds for t = Int(T/2):T
            Markup[t] = AMarkup[t]
        end
        
        Emarkup[A] = sum(Markup)/(T/2+1)
        
        ## value of firms

        @inbounds for t = 1:T
            Average_Profit[t] = sum(Profit[t,:])/(Employ[t]*sum(Nindustry[t,:]));  #normalize by normalized employment which is equal to 1
            Exit_rate[t] = sum(Exit[t,:])/J
            Merger_rate[t] = sum(N_merger[t,:])/J
        end
           
        AN[A] = sum(EN)/T
        Aexit_rate[A] = L*sum(Exit_rate)/T
        Amerger_rate[A] = L*sum(Merger_rate)/T
        Amerger_success[A] = sum(sum(N_merger))/(T*J*AN[A]*m)
        
        V = zeros(1,Int(T/2))
        
        V[Int(T/2)] = Average_Profit[T]/(1-beta*(1+growth_rate))
        
        @inbounds for t = T/2:(T-2)
            V[Int(T-t-1)] = Average_Profit[Int(T-t-1)]+(1+growth_rate)*beta*(1-Exit_rate[Int(T-t-1)]-Merger_rate[Int(T-t-1)]/2)*V[Int(T-t)]
        end
        
        Response_effort[A] = ((gamma*V[1])/(CostEffort*GDP[1]))^psi; # disulity of entrepreneurial effort is CostEffort*GDP*(Effort^(1+1/psi))/(1+1/psi)
        Labor_supply[A] = ((1-Emarkup[A])/CostLabor)^psi; 
        @inbounds for t = 1:T
            @inbounds for j = 1:J
                Output[t,j] = GDP[t]/(Price[t,j]^s)
                TFP[t,j] = Output[t,j]/Employment[t,j]
            end
            TFPaggregate[A,t] = sum(Output[t,:])/sum(Employment[t,:])
        end

        ## GDP index [chained index]
        
        GDP = 100*ones(1,T)
        
        @inbounds for t in 1:T-1
            GDP[t+1] = Employ[t]/Employ[t+1] * GDP[t] * ((sum(Price[t,:] .* Output[t+1,:]))/(sum(Price[t,:] .* Output[t,:])))^(1/2) * ((sum(Price[t+1,:] .* Output[t+1,:]))/(sum(Price[t+1,:] .* Output[t,:])))^(1/2)
        end
        
        # growth rate [after T/2 to measure growth after the economy reached balanced growth path]
        
        #GDP_growth_rate = zeros(1,Int(T/2)) claro que no funcionara porque es una matriz de 1x30, y despues se le asigna valores desde el indice 31, el cual no existe
        GDP_growth_rate = zeros(1,T)
        
        @inbounds for t = Int(T/2):T-1
            GDP_growth_rate[t] = 100*((GDP[t+1]/GDP[t])^(L-1))
        end
        
        EGDP_growth_rate[A] = sum(GDP_growth_rate)/(Int(T/2))


        @inbounds for t = 1:T
            @inbounds for j = 1:J
                consumption[j] = (Output[t,j]/Employ[t])^((s-1)/s)
            end
        
        Consumption[t] = Labor_supply[A]*((sum(consumption)/J)^(s/(s-1)))
        
        Utility[t] = Consumption[t]-((CostEffort/(1+1/psi))*(Effort[A]^(1+1/psi))+(CostLabor/(1+1/psi))*(Labor_supply[A]^(1+1/psi)))*Consumption[t]
        
        end
        
        @inbounds for t = Int(T/2):T
            Ugrowth_rate[t] = Utility[t]/Utility[t-1]-1
            EU[t] = (beta^(t-T/2))*Utility[t]
        end
        
        EUtility[A] = sum(EU)+(beta^(T/2))*EU[T]/(1-beta-Ugrowth_rate[T])

        
        @inbounds for j = 1:J
            @inbounds for t = Int(T/2):T
                @inbounds for i = 1:Int(Nindustry[t,j])
                    Prod[t,i,j] = exp(Z[t,i,j])
                end
                EP[t,j] = sum(Prod[t,:,j])/Nindustry[t,j]
                for i = 1:Int(Nindustry[t,j])
                    Variance[t,i,j] = (Prod[t,i,j]/EP[t,j]-1)^2
                end
                Standard_deviation[t,j] = (sum(Variance[t,:,j])/Nindustry[t,j])^(.5)
            end
        end
        
         SD[A] = sum(sum(Standard_deviation))/(J*(Int(T/2)+1))

        end
end
end
end
end

stevengj · August 14, 2024, 10:40pm

All these declarations do nothing for performance; they just clutter your code and make it harder to make generic (e.g. ideally the same function implementation will work with both double and single precision, depending on the argument types). You don’t need to worry too much about writing generic code for now, but you should definitely gain some understanding of what declarations do and don’t do before you start sprinkling them everywhere.

Benjamin7 · August 14, 2024, 10:43pm

Thank you!
I’ll consider it, do you recommend any resources to understand what declarations do and don’t?

stevengj · August 14, 2024, 10:45pm

e.g. See Argument-Type declarations in the manual.

stevengj · August 14, 2024, 10:52pm

Honestly, your code is too long for me (and probably most others) to want to help you optimize it. I can see lots of things that can be improved (e.g. arrays that seem to be allocated just to sum the contents, rather than accumulating the sum as the quantities are computed; use of x^(1/2) rather than sqrt(x); allocation of Matlab-style 1-row or 1-column matrices rather than 1d arrays; re-computation of sum( Y_no_merger) of the same array over and over in a loop, confusing use of broadcasting calls like .> on scalar operands, allocating slices like Z[t, :, j] without using @views, …), but I feel like I’m just looking at random lines.

It would be more productive to have a self-contained representative example function < 50 lines, along with example data that lets people run it, that people can give you advice on, and then you can carry that advice over to longer code.

Satvik · August 14, 2024, 11:19pm

If you haven’t looked into profiling, that’s the next step.

My general process for optimizing code like this is to profile it with TimerOutputs.jl. You can put @timeit calls around functions or lines that you suspect might be slow, and drill down with more of those calls to find out exactly what’s taking a lot of time or using a lot of allocations. Then you can ask more specific questions about how to optimize those sections.

Noel_Araujo · August 15, 2024, 3:14am

Just to add an extra commend on the profiling side

VSCode is indicating that the operation ^ is consuming half of your execution time. Changing a^(1/2) to sqrt(a) or using other manipulations to avoid this operation is the first advice.

DNF · August 15, 2024, 7:42am

I wouldn’t use @inbounds at all, or at least only do that after you have fixed all other performance issues. But if you do, know that you only need to put @inbounds in front of the outermost loop, then it will apply to all nested loops. This:

@inbounds for j = 1:J
    @inbounds for t = Int(T/2):T
        @inbounds for i = 1:Int(Nindustry[t,j])

is the same as this:

@inbounds for j = 1:J
    for t = Int(T/2):T
        for i = 1:Int(Nindustry[t,j])

Topic		Replies	Views
Matlab versus Julia General Usage	33	4942	July 15, 2021
Optimizing performance (newcomer from Matlab) General Usage first-steps	19	1217	November 8, 2019
MATLAB outperforms Julia (20 times faster) running this nested loop Performance question , performance , matlab , time , finitediff	20	5345	January 10, 2023
Optimize benchmark-code Performance question	11	680	December 10, 2019
Julia vs Matlab - building a Jacobian matrix New to Julia	30	4224	March 28, 2019

Help with for-loops optimization / Translation from Matlab

Related topics