# Help with for-loops optimization / Translation from Matlab

Hi, I have been asked to translate a matlab code to Julia to gain speed, what I have done is to put each for loop with the @inbound, put each array and all important code in a function, I have defined the type of almost all objects (array, arrays, const). The code is already 6 times faster than Matlab, but I know it can be improved even more I just don’t know where or what to do now, I’m stuck. I would be very grateful if you can give me some advice or recommend some reading to improve this code even more. I’m not looking for someone to do it for me, I’ve learned a lot working on this and I intend to keep learning :). Thank you very much.

``````function CournotCompetitionCES(C,s,GDP) # Cournot competition [finding the fixed point
X, N = size(C);              # number of competing firms
iterations = 16;               # number of iterations
y = zeros(N,iterations+1)
BRy = zeros(1,N)
initial_y = ones(1,N)*GDP/(N*maximum(C))
infimum = (10^-8)*GDP/((maximum(C))^s)
y[:,1] = initial_y
Y = sum(y[:,1])
#diff = zeros(1,iterations)
for j = 1:iterations
for i = 1:N
yni = Y - y[i,j]
BRy[i] = max.(infimum,(Y*s)*(1-(C[i]*((Y/GDP)^(1/s)))))
y[i,j+1] = BRy[i]
Y = yni+BRy[i]
end
#diff(j) = 100*sum(abs(y[:,j]-BRy'))/(N*GDP)
end
Result = y[:,iterations+1]'
return Result
end

years::Int64 = 30                      # length of time in years
L::Int64 = 20                           # number of periods in a year
T::Int64 = L*years
beta::Float64 = 0.97^(1/L)                # discount factor()
growth_rate::Float64 = 0.015/L
r::Float64 = 1-beta                       # discount rate
Ea::Float64 = 0.010                       # gain & advantage of entry over all existing firms
gamma::Float64 = 0.10/L                   # parameters of the entry production function
m::Float64 = 0.030/L                        # merger chance of X# per year
s::Float64 = 1.75                         # elasticity of substitution
Emerger_synergy::Float64 = 0.050          # percentage reduction in cost from a merger
CostEffort::Float64 = 1.00
CostLabor::Float64 = 1.00
psi::Float64 = 3.0                        # Frisch elasticity of substitution for labor/entrepreneurial supply

Eexit_tolerance::Float64 = 0.27           # marginal cost tolerance for exit()

#mu = 0.0031/L;                   # assumes labor productivity growth of frontier firms through learning at the rate of 0.X# per year
sigma::Float64 = 0.095/L                  # standard size of the shocks makes standard deviation of productivity of 33# of average productivity
theta::Float64 = 0.012/L                  # growth rate of TFP of laggard firms in an industry is x# on leader
## industry dynamics

Grid::Int64 = 10
lower_bound::Float64 = 0.450
upper_bound::Float64 = 0.850                                     # size of the grid of effort levels
resolution::Float64 = (upper_bound-lower_bound)/Grid
J::Int64 = 100                        # number of industries
C_max::Int64 = 30                       # maximum number of firms in an industry

function MARCES(years, L, T, beta, growth_rate, r, Ea, gamma, m, s, Emerger_synergy, CostEffort, CostLabor, psi, Eexit_tolerance,
sigma, theta, Grid, lower_bound, upper_bound, resolution, J, C_max)

TFPaggregate::Matrix{Float64} = zeros(Grid+1,T)
Emarkup::Matrix{Float64} = zeros(Grid+1,1)

Effort = (lower_bound:resolution:upper_bound)           # initial allocation of household's labor to production of blueprints

AN::Matrix{Float64} = zeros(1,Grid+1)
Aexit_rate::Matrix{Float64} = zeros(1,Grid+1)
Amerger_rate::Matrix{Float64} = zeros(1,Grid+1)
Amerger_success::Matrix{Float64} = zeros(1,Grid+1)
Labor_supply::Matrix{Float64} = zeros(1,Grid+1)
Response_effort::Matrix{Float64} = zeros(1,Grid+1)
EGDP_growth_rate::Matrix{Float64} = zeros(1,Grid+1)
EUtility::Matrix{Float64} = zeros(1,Grid+1)
SD::Matrix{Float64} = zeros(1,Grid+1)
Alpha::Matrix{Int64} = [1000 1800 2500]
Beta::Matrix{Int64} = [7500 6000 3000]
sample::Int64 = 6
Results::Array{Float64,3} = zeros(6,sample,2)
Utility::Matrix{Float64} = zeros(1,T)
Z = zeros(T+1,C_max,J);                                    # grid of productivity levels
N = ones(1,J);                                           # vector of firms operating in each industry [assume an initial number of 1 firm operating each industry]

Nindustry::Matrix{Float64} = zeros(T,J);                                  # number of firm operating at beginning of each period in each industry
Exit::Matrix{Float64} = zeros(T,J);                                       # number of firm exits per period in each industry
Entry::Matrix{Float64} = zeros(T,J);                                      # number of firm enter per period in each industry [binary variable]
N_merger::Matrix{Float64} = zeros(T,J);                                   # number of mergers per period in each industry [binary variable]

Price::Matrix{Float64} = zeros(T,J)
Employment::Matrix{Float64} = zeros(T,J)
Profit::Matrix{Float64} = zeros(T,J)

HHI::Matrix{Float64} = zeros(T,J)
HHI_post_merger::Matrix{Float64} = zeros(T,J)
DeltaHHI::Matrix{Float64} = zeros(T,J)
w::Matrix{Float64} = ones(1,T);
GDP::Matrix{Float64} = 1.5*w;           # dumny GDP at each period
consumption::Matrix{Float64} = zeros(1,J)
Consumption::Matrix{Float64} = zeros(1,T)
HHI_index::Matrix{Float64} = zeros(1,J)
Ugrowth_rate::Matrix{Float64} = zeros(1,T)
EU::Matrix{Float64} = zeros(1,T)

Prod::Array{Float64,3} = zeros(T,C_max,J)
Variance::Array{Float64,3} = zeros(T,C_max,J)
EP::Matrix{Float64} = zeros(T,J)
Standard_deviation::Matrix{Float64} = zeros(T,J)
Standard_deviation2::Matrix{Float64} = zeros(T,J)
Output::Matrix{Float64} = zeros(T,J) #ya esta afuera
TFP::Matrix{Float64} = zeros(T,J);  #ya esta afuera del loop de T y J
Z_merger::Matrix{Float64} = zeros(1,C_max)

Employ::Matrix{Float64} = zeros(1,T)
EN::Matrix{Float64} = zeros(1,T)
AProfit::Matrix{Float64} = zeros(1,T)
AMarkup::Matrix{Float64} = zeros(1,T)
Markup::Matrix{Float64} = zeros(1,T)

Average_Profit::Matrix{Float64} = zeros(1,T)
Exit_rate::Matrix{Float64} = zeros(1,T)
Merger_rate::Matrix{Float64} = zeros(1,T)

@time begin
@inbounds for guidelines = 1:3
@inbounds for z = 1:sample
@inbounds for A = 1:Grid+1
Z[1,1,:] .= 1;

@inbounds for j = 1:J

Exit_tolerance = 2*rand()*Eexit_tolerance

@inbounds for t = 1:T

C = w[t]*ones(1,Int(N[j]))

@inbounds for c = 1:Int(N[j])
C[c] = w[t]/exp(Z[t,c,j]); #set up cost array
end

Y = CournotCompetitionCES(C,s,GDP[t]); #output array in each period
Price[t,j] = (GDP[t]/sum(Y))^(1/s)
employment = zeros(1,Int(N[j]))
@inbounds for i = 1:Int(N[j])
employment[i] = Y[i]*C[i]/w[t]
end

Employment[t,j] = sum(employment)
Profit[t,j] = GDP[t]/(Price[t,j]^(s-1))-Employment[t,j]*w[t]

@inbounds for  c = 1:Int(N[j]) # firm exit()
if C[c] .> (1+Exit_tolerance)*Price[t,j]
Exit[t,j] = Exit[t,j]+1
@inbounds for i = c:Int(N[j])
Z[t,i,j] = Z[t,i+1,j]; # exit occurs
end
N[j] = N[j]-1
end
end

if N[j] .> 1

merger = rand();                    # merger opportunity

if merger .< N[j]*m

Merger_synergy = 2*Emerger_synergy*rand()
i = Int(max(round(merger/m*((N[j]-1)/N[j])),1))
C_no_merger = w[t]*ones(1,Int(N[j]))

@inbounds for k =1:Int(N[j])
C_no_merger[k] = w[t]/exp(Z[t,k,j]); #set up cost array
end

Y_no_merger = CournotCompetitionCES(C_no_merger,s,GDP[t])
profits_no_merger = ((GDP[t]/sum(Y_no_merger))^(1/s)-C_no_merger[i])*Y[i] + ((GDP[t]/sum(Y_no_merger))^(1/s)-C_no_merger[i+1])*Y[i+1]

HHI_index = zeros(1,Int(N[j]))

@inbounds for b = 1:Int(N[j])
HHI_index[b] = (100*Y_no_merger[b]/sum(Y_no_merger))^2
end

HHI[t,j] =  sum(HHI_index)

C_merger = w[t]*ones(1,Int(N[j])-1)

@inbounds for k = 1:i-1
C_merger[k] = C_no_merger[k]
Z_merger[k] = Z[t,k,j]
end
@inbounds for k = i
C_merger[k] = (1-Merger_synergy)*(((HHI_index[k])^(1/2))*C_no_merger[k]+((HHI_index[k+1])^(1/2))*C_no_merger[k+1])/(((HHI_index[k])^(1/2))+((HHI_index[k+1])^(1/2)))
Z_merger[k] = log(w[t]/C_merger[k])
end
@inbounds for k = i+1:Int(N[j])-1
C_merger[k] = C_no_merger[k+1]
Z_merger[k] = Z[t,k+1,j]
end

Y_merger = CournotCompetitionCES(C_merger,s,GDP[t])

profits_merger = ((GDP[t]/sum(Y_merger))^(1/s)-C_merger[i])*Y_merger[i]

HHI_post_merger_index = zeros(1,Int(N[j])-1)

@inbounds for b = 1:Int(N[j])-1
HHI_post_merger_index[b] = (100*Y_merger[b]/sum(Y_merger))^2
end

HHI_post_merger[t,j] = sum(HHI_post_merger_index)

DeltaHHI[t,j] = HHI_post_merger[t,j] - HHI[t,j]

if profits_merger .> profits_no_merger # condition for the merger to occur
if HHI_post_merger[t,j] .< Alpha[guidelines]       # condition for the merger to occur

C = C_merger
Z[t,:,j] = Z_merger
N[j] = N[j]-1
N_merger[t,j] = 1

else()
if HHI_post_merger[t,j] .> Alpha[guidelines] && DeltaHHI[t,j] .< 100+Beta[guidelines]*Merger_synergy          # condition for the merger to occur

C = C_merger
Z[t,:,j] = Z_merger
N[j] = N[j]-1
N_merger[t,j] = 1

end
end
end
end
end

@inbounds for c in 1:Int(N[j]) # productivity process of firms that do not exit
Z_frontier = maximum(Z[t, :, j])
if Z[t, c, j] < Z_frontier
Z[t+1, c, j] = Z[t, c, j] + randn() * sigma + theta
else
Z[t+1, c, j] = Z[t, c, j] + randn() * sigma
end
end

enter = rand();                                      # entry process

if enter < Effort[A]*gamma                            # probability that entry occurs
alpha = 2*rand()*Ea
Z[t+1,Int(N[j])+1,j] = maximum(Z[t,:,j]) + alpha
N[j] = N[j]+1
Entry[t,j] = 1
end

Nindustry[t,j] = N[j]

end
end

@inbounds for t =1:T
Employ[t] = sum(Employment[t,:])/J
AProfit[t] = sum(Profit[t,:])/J
AMarkup[t] = AProfit[t]/(AProfit[t]+w[t]*Employ[t])
EN[t] = sum(Nindustry[t,:])/J
end

Total_exit = sum(sum(Exit))
Total_entry = sum(sum(Entry))

@inbounds for t = Int(T/2):T
Markup[t] = AMarkup[t]
end

Emarkup[A] = sum(Markup)/(T/2+1)

## value of firms

@inbounds for t = 1:T
Average_Profit[t] = sum(Profit[t,:])/(Employ[t]*sum(Nindustry[t,:]));  #normalize by normalized employment which is equal to 1
Exit_rate[t] = sum(Exit[t,:])/J
Merger_rate[t] = sum(N_merger[t,:])/J
end

AN[A] = sum(EN)/T
Aexit_rate[A] = L*sum(Exit_rate)/T
Amerger_rate[A] = L*sum(Merger_rate)/T
Amerger_success[A] = sum(sum(N_merger))/(T*J*AN[A]*m)

V = zeros(1,Int(T/2))

V[Int(T/2)] = Average_Profit[T]/(1-beta*(1+growth_rate))

@inbounds for t = T/2:(T-2)
V[Int(T-t-1)] = Average_Profit[Int(T-t-1)]+(1+growth_rate)*beta*(1-Exit_rate[Int(T-t-1)]-Merger_rate[Int(T-t-1)]/2)*V[Int(T-t)]
end

Response_effort[A] = ((gamma*V[1])/(CostEffort*GDP[1]))^psi; # disulity of entrepreneurial effort is CostEffort*GDP*(Effort^(1+1/psi))/(1+1/psi)
Labor_supply[A] = ((1-Emarkup[A])/CostLabor)^psi;
@inbounds for t = 1:T
@inbounds for j = 1:J
Output[t,j] = GDP[t]/(Price[t,j]^s)
TFP[t,j] = Output[t,j]/Employment[t,j]
end
TFPaggregate[A,t] = sum(Output[t,:])/sum(Employment[t,:])
end

## GDP index [chained index]

GDP = 100*ones(1,T)

@inbounds for t in 1:T-1
GDP[t+1] = Employ[t]/Employ[t+1] * GDP[t] * ((sum(Price[t,:] .* Output[t+1,:]))/(sum(Price[t,:] .* Output[t,:])))^(1/2) * ((sum(Price[t+1,:] .* Output[t+1,:]))/(sum(Price[t+1,:] .* Output[t,:])))^(1/2)
end

# growth rate [after T/2 to measure growth after the economy reached balanced growth path]

#GDP_growth_rate = zeros(1,Int(T/2)) claro que no funcionara porque es una matriz de 1x30, y despues se le asigna valores desde el indice 31, el cual no existe
GDP_growth_rate = zeros(1,T)

@inbounds for t = Int(T/2):T-1
GDP_growth_rate[t] = 100*((GDP[t+1]/GDP[t])^(L-1))
end

EGDP_growth_rate[A] = sum(GDP_growth_rate)/(Int(T/2))

@inbounds for t = 1:T
@inbounds for j = 1:J
consumption[j] = (Output[t,j]/Employ[t])^((s-1)/s)
end

Consumption[t] = Labor_supply[A]*((sum(consumption)/J)^(s/(s-1)))

Utility[t] = Consumption[t]-((CostEffort/(1+1/psi))*(Effort[A]^(1+1/psi))+(CostLabor/(1+1/psi))*(Labor_supply[A]^(1+1/psi)))*Consumption[t]

end

@inbounds for t = Int(T/2):T
Ugrowth_rate[t] = Utility[t]/Utility[t-1]-1
EU[t] = (beta^(t-T/2))*Utility[t]
end

EUtility[A] = sum(EU)+(beta^(T/2))*EU[T]/(1-beta-Ugrowth_rate[T])

@inbounds for j = 1:J
@inbounds for t = Int(T/2):T
@inbounds for i = 1:Int(Nindustry[t,j])
Prod[t,i,j] = exp(Z[t,i,j])
end
EP[t,j] = sum(Prod[t,:,j])/Nindustry[t,j]
for i = 1:Int(Nindustry[t,j])
Variance[t,i,j] = (Prod[t,i,j]/EP[t,j]-1)^2
end
Standard_deviation[t,j] = (sum(Variance[t,:,j])/Nindustry[t,j])^(.5)
end
end

SD[A] = sum(sum(Standard_deviation))/(J*(Int(T/2)+1))

end
end
end
end
end

``````

All these declarations do nothing for performance; they just clutter your code and make it harder to make generic (e.g. ideally the same function implementation will work with both double and single precision, depending on the argument types). You don’t need to worry too much about writing generic code for now, but you should definitely gain some understanding of what declarations do and don’t do before you start sprinkling them everywhere.

3 Likes

Thank you!
I’ll consider it, do you recommend any resources to understand what declarations do and don’t?

e.g. See Argument-Type declarations in the manual.

1 Like

Honestly, your code is too long for me (and probably most others) to want to help you optimize it. I can see lots of things that can be improved (e.g. arrays that seem to be allocated just to sum the contents, rather than accumulating the sum as the quantities are computed; use of `x^(1/2)` rather than `sqrt(x)`; allocation of Matlab-style 1-row or 1-column matrices rather than 1d arrays; re-computation of `sum( Y_no_merger)` of the same array over and over in a loop, confusing use of broadcasting calls like `.>` on scalar operands, allocating slices like `Z[t, :, j]` without using `@views`, …), but I feel like I’m just looking at random lines.

It would be more productive to have a self-contained representative example function < 50 lines, along with example data that lets people run it, that people can give you advice on, and then you can carry that advice over to longer code.

7 Likes

If you haven’t looked into profiling, that’s the next step.

My general process for optimizing code like this is to profile it with TimerOutputs.jl. You can put `@timeit` calls around functions or lines that you suspect might be slow, and drill down with more of those calls to find out exactly what’s taking a lot of time or using a lot of allocations. Then you can ask more specific questions about how to optimize those sections.

2 Likes

Just to add an extra commend on the profiling side

VSCode is indicating that the operation `^` is consuming half of your execution time. Changing `a^(1/2)` to `sqrt(a)` or using other manipulations to avoid this operation is the first advice.

4 Likes

I wouldn’t use `@inbounds` at all, or at least only do that after you have fixed all other performance issues. But if you do, know that you only need to put `@inbounds` in front of the outermost loop, then it will apply to all nested loops. This:

``````@inbounds for j = 1:J
@inbounds for t = Int(T/2):T
@inbounds for i = 1:Int(Nindustry[t,j])
``````

is the same as this:

``````@inbounds for j = 1:J
for t = Int(T/2):T
for i = 1:Int(Nindustry[t,j])
``````
3 Likes