# Very slow loop

Hi all,

I am trying to speed up this loop. I am providing a MWE and I wanted to know if any of you could spot any obvious performance improvements.

``````using LinearAlgebra, Statistics, Random

C=rand([0,1],120,13,8000)

D = rand(120)

E=findall(sum(sum(C,dims=1), dims=2)[1,1,:].>0)

P=rand(120,8000)
H=(P.>0.5)
H_D=(D.>0.5)
G_C=(D.<0.05)
A=ones(8000,13)
NF=8000

@views function fun1(C,D,E,P,H,H_D;G_C=G_C,A=A,NF=NF)
J=size(C,1)
Tmax=size(C,2)
T=Tmax-1
N=size(C,3)
num_large0 = zeros(2)
num_small0 = zeros(2)
num_large1 = zeros(2)
num_small1 = zeros(2)
num2 = zeros(1)
num_b0=zeros(1)
num_b1=zeros(1)
K = length(E)

if N>NF
A= vcat(A,A,A)
H = cat(H ,H ,H ,dims=2)
P= cat(P,P,P,dims=2)
end

@fastmath @inbounds for n=1:K, t=2:Tmax, j=1:J
c0 = C[j,t,E[n]]*A[E[n],t]
c1 = C[j,t-1,E[n]]*(C[j,t,E[n]])*A[E[n],t]

num_large0 = num_large0 .+ c0.*(H[j,E[n]]==2)*P[j,E[n]]*D[j]
num_small0 = num_small0 .+ c0.*(H[j,E[n]]==1)*P[j,E[n]]*D[j]

num_large1 = num_large1 .+ c1.*(H[j,E[n]]==2)*P[j,E[n]]*D[j]
num_small1 = num_small1 .+ c1.*(H[j,E[n]]==1)*P[j,E[n]]*D[j]

num_b0 = num_b0 .+ c0.*(1-G_C[j])*P[j,E[n]]*D[j]
num_b1 = num_b1 .+ c1.*(1-G_C[j])*P[j,E[n]]*D[j]

if  H_D[j]==1
num_large0 = num_large0 .+ c0.*(H[j,E[n]]==2)*P[j,E[n]]*D[j]
num_small0 = num_small0 .+ c0.*(H[j,E[n]]==1)*P[j,E[n]]*D[j]

num_large1 = num_large1 .+ c1.*(H[j,E[n]]==2)*P[j,E[n]]*D[j]
num_small1 = num_small1 .+ c1.*(H[j,E[n]]==1)*P[j,E[n]]*D[j]
end
end
return vcat(num_large0./(J*N),  num_large1./(J*N), num_small0./(J*N), num_small1./(J*N), num_b0./(J*N), num_b1./(J*N))
end

TEST=@time fun1(C,D,E,P,H,H_D)

``````

Performance result:

`91.233291 seconds (1.18 G allocations: 23.470 GiB, 1.82% gc time)`

EDIT: Provided MWE. Please disregard how the data is being generated.

In these two lines you seem to be creating new vectors `c0` and `c1` at every iteration of the loop. If you can predict their size in advance, create them before the loop and mutate their values; that will be better. If those vectors are small, you may want to assign the values to immutable types (tuples, for example) instead of arrays.
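As a sketch of that difference (the function names and values here are made up for illustration, not taken from the thread's code):

``````julia
# Allocating a fresh vector inside the loop: one heap allocation per iteration.
function accumulate_alloc(n)
    acc = zeros(2)
    for i in 1:n
        tmp = [0.5i, 0.25i]   # new 2-element array every pass
        acc .+= tmp
    end
    return acc
end

# Using an immutable tuple instead: no per-iteration heap allocation.
function accumulate_tuple(n)
    acc = (0.0, 0.0)
    for i in 1:n
        acc = (acc[1] + 0.5i, acc[2] + 0.25i)   # tuples stay off the heap
    end
    return acc
end
``````

Both compute the same values, but `@allocated` on the second version reports essentially nothing inside the loop.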

2 Likes

Hi @lmiq, thanks for the reply!

Actually, those are scalars. Can you be more precise on how to write this?

That is the problem of not providing a MWE: the effort required to understand what you are doing is much greater.

I don’t understand why you are defining many variables as `zeros(1)`, for example. If those are scalars, why not simply use scalars?
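To illustrate (a hypothetical accumulator, not the code above): accumulating into a 1-element array with `.+` builds a fresh array on every iteration, while a plain scalar allocates nothing:

``````julia
# Accumulating into a 1-element vector: `s .+ i` allocates each time.
function sum_vec(n)
    s = zeros(1)
    for i in 1:n
        s = s .+ i          # fresh 1-element vector every iteration
    end
    return s[1]
end

# The same accumulation as a plain scalar: no allocations at all.
function sum_scalar(n)
    s = 0.0
    for i in 1:n
        s += i
    end
    return s
end
``````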

5 Likes

What is the type of the elements in `num_large0`? It seems to be `Int`s, but at the same time, in that line you use `.+` as if you were adding `Vector`s elementwise.

Also, what’s the purpose of expanding `A`, `H` and `P` artificially? It’s just the same data repeated three times along a new dimension…

You are totally right, I added a MWE above!

It might be the case that the matrix `C` is three times larger than `A`, `H` and `P`. Since these are data, I need to replicate those matrices three times. Nonetheless, as you can see in the MWE I am providing above, `NF==N`.

I just added a MWE above to make these things clearer!

That’s still not a real MWE, because it contains several variables and is not really clear…
However, there are many operations which can be merged:

``````num_large0 = num_large0 .+ c0.*(H[j,E[n]]==2)*P[j,E[n]]*D[j]
num_small0 = num_small0 .+ c0.*(H[j,E[n]]==1)*P[j,E[n]]*D[j]

num_large1 = num_large1 .+ c1.*(H[j,E[n]]==2)*P[j,E[n]]*D[j]
num_small1 = num_small1 .+ c1.*(H[j,E[n]]==1)*P[j,E[n]]*D[j]
``````

Can be rewritten to:

c0n = c0 * P[j,E[n]] * D[j]
c1n = c1 * P[j,E[n]] * D[j]

H2 = (H[j,E[n]]==2)
H1 = (H[j,E[n]]==1)

num_large0 .+= c0n * H2
num_small0 .+= c0n * H1

num_large1 .+= c1n * H2
num_small1 .+= c1n * H1
``````

Such transformations make your code easier to read and modify, and they are also slightly faster because you reuse computations. Also, by using `.+=` we operate in place instead of creating a new array.
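A minimal standalone demonstration of that in-place distinction (illustrative values only):

``````julia
a = zeros(3)
b = a                     # second binding to the same array
a .+= 1.0                 # in-place broadcast: mutates the existing storage
inplace_kept_identity = (a === b)    # true, no new array was created

a = a .+ 1.0              # plain broadcast `+`: allocates and rebinds `a`
plus_kept_identity = (a === b)       # false, `a` now points at a new array
``````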

Without knowing the problem we don’t know whether these 3 nested loops are the best approach or not. You are already executing that inner body this many times:

``````julia> size(C,1) * size(C,2) * size(E,1)
12480000
``````
1 Like

Removing the unnecessary broadcast operations and precomputing the `H` tests, as above, reduced the time here from 69 to 14 seconds:

Code
``````function fun1(C,D,E,P,H,H_D;G_C=G_C,A=A,NF=NF)
J=size(C,1)
Tmax=size(C,2)
T=Tmax-1
N=size(C,3)
num_large0 = zeros(2)
num_small0 = zeros(2)
num_large1 = zeros(2)
num_small1 = zeros(2)
num_b0=0.
num_b1=0.
K = length(E)

if N>NF
A= vcat(A,A,A)
H = cat(H ,H ,H ,dims=2)
P= cat(P,P,P,dims=2)
end

for n=1:K, t=2:Tmax, j=1:J
c0 = C[j,t,E[n]]*A[E[n],t]
c1 = C[j,t-1,E[n]]*(C[j,t,E[n]])*A[E[n],t]

Htest1 = (H[j,E[n]]==1)
Htest2 = (H[j,E[n]]==2)

num_large0 = num_large0 + c0*Htest2*P[j,E[n]]*D[j]
num_small0 = num_small0 + c0*Htest1*P[j,E[n]]*D[j]

num_large1 = num_large1 + c1*Htest2*P[j,E[n]]*D[j]
num_small1 = num_small1 + c1*Htest1*P[j,E[n]]*D[j]

num_b0 = num_b0 + c0*(1-G_C[j])*P[j,E[n]]*D[j]
num_b1 = num_b1 + c1*(1-G_C[j])*P[j,E[n]]*D[j]

if  H_D[j]==1
num_large0 = num_large0 + c0*Htest2*P[j,E[n]]*D[j]
num_small0 = num_small0 + c0*Htest1*P[j,E[n]]*D[j]

num_large1 = num_large1 + c1*Htest2*P[j,E[n]]*D[j]
num_small1 = num_small1 + c1*Htest1*P[j,E[n]]*D[j]
end
end
return vcat(num_large0./(J*N),  num_large1./(J*N), num_small0./(J*N), num_small1./(J*N), num_b0/(J*N), num_b1/(J*N))
end

``````

The other important thing is that you are running the function with:

``````@time fun1(C,D,E,P,H,H_D)
``````

But the function expects a series of optional parameters, which are not being passed:

``````function fun1(C,D,E,P,H,H_D;G_C=G_C,A=A,NF=NF)
``````

In this case `G_C=G_C`, for example, does not mean anything concrete. But when you run it, it takes `G_C` from the global scope, making the function type-unstable. Either initialize the optional `G_C` with a concrete value (`G_C=(D.<0.05)`) or pass it as a parameter when calling the function. If you do that, with:

``````julia> @time fun1(C,D,E,P,H,H_D,G_C=G_C,A=A,NF=NF)
12.283477 seconds (682.22 M allocations: 10.170 GiB, 7.61% gc time)

``````

time is reduced further (the same argument is valid for the other optional parameters you are passing to the function). You can find out that your function is type-unstable in the previous call with:
(edit: I ran a wrong version before and reported a much smaller time.)

``````julia> @code_warntype fun1(C,D,E,P,H,H_D)
Variables
#self#::Core.Compiler.Const(fun1, false)
C::Array{Int64,3}
D::Array{Float64,1}
E::Array{Int64,1}
P::Array{Float64,2}
H::BitArray{2}
H_D::BitArray{1}
G_C::Any
A::Any
NF::Any

....

``````

I have a code that runs in 7 s on my machine. I am not sure I didn’t mess something up; it would be good to check that it gives the same answer.

``````using LinearAlgebra, Statistics, Random

C=rand([0,1],120,13,8000)

D = rand(120)

E=findall(sum(sum(C,dims=1), dims=2)[1,1,:].>0)

P=rand(120,8000)
H=convert.(Float64, (P.>0.5))
H_D=convert.(Float64, (D.>0.5))
G_C=convert.(Float64, (D.<0.05))
A=ones(8000,13)
NF=8000

@views function fun1(C,D,E,P,H,H_D,G_C,A,NF)
J=size(C,1)
Tmax=size(C,2)
T=Tmax-1
N=size(C,3)
num_large0 = zeros(2)
num_large1 = zeros(2)
num_small0 = zeros(2)
num_small1 = zeros(2)
# num2 = zeros(1) # not used?
num_b0 = num_b1 = 0.0 # changed from 1-element vectors to scalars
K = length(E)

if N>NF
A= vcat(A,A,A)
H = cat(H ,H ,H ,dims=2)
P= cat(P,P,P,dims=2)
end

@fastmath @inbounds for n=1:K, t=2:Tmax, j=1:J
c0 = C[j,t,E[n]]*A[E[n],t]
c1 = C[j,t-1,E[n]]*C[j,t,E[n]]*A[E[n],t] # removed unnecessary ()

HjEn2 = convert(Float64, H[j,E[n]]==2)
HjEn1 = convert(Float64, H[j,E[n]]==1)
PjEnDj = P[j,E[n]] * D[j]
term0 = c0 * PjEnDj
term1 = c1 * PjEnDj
c0H1 = term0 * HjEn1
c0H2 = term0 * HjEn2
c1H1 = term1 * HjEn1
c1H2 = term1 * HjEn2

num_large0 += c0H2
num_small0 += c0H1

num_large1 += c1H2
num_small1 += c1H1

num_b0 += c0 * (1.0 - G_C[j]) * PjEnDj
num_b1 += c1 * (1.0 - G_C[j]) * PjEnDj

if  H_D[j]==1
num_large0 += c0H2
num_small0 += c0H1

num_large1 += c1H2
num_small1 += c1H1
end
end
return vcat(num_large0./(J*N),  num_large1./(J*N), num_small0./(J*N), num_small1./(J*N), num_b0./(J*N), num_b1./(J*N))
end

@time fun1(C,D,E,P,H,H_D, G_C, A, NF)
@time fun1(C,D,E,P,H,H_D, G_C, A, NF)
``````
1 Like

This part of the function is hugely detrimental to performance:

``````    if N>NF
A= vcat(A,A,A)
H = cat(H ,H ,H ,dims=2)
P= cat(P,P,P,dims=2)
end

``````

All those matrices become type-unstable if you leave that there. The time goes down to 0.2 seconds here if I comment that out, and the result is the same.

Thus, my suggestion would be to put that in an outer function, and nest the call to the main function within it:

``````function outer(C,D,E,P,H,H_D;G_C=G_C,A=A,NF=NF)
N=size(C,3)
if N>NF
A= vcat(A,A,A)
H = cat(H ,H ,H ,dims=2)
P= cat(P,P,P,dims=2)
end
fun1(C,D,E,P,H,H_D;G_C=G_C,A=A,NF=NF)
end

``````

Result:

``````julia> @time outer(C,D,E,P,H,H_D,G_C=G_C,A=A,NF=NF)
0.231706 seconds (113 allocations: 3.781 KiB)

``````

Here is the complete code I am running:

Code
``````using LinearAlgebra, Statistics, Random

C=rand([0,1],120,13,8000)

D = rand(120)

E=findall(sum(sum(C,dims=1), dims=2)[1,1,:].>0)

P=rand(120,8000)
H=(P.>0.5)
H_D=(D.>0.5)
G_C=(D.<0.05)
A=ones(8000,13)
NF=8000

function outer(C,D,E,P,H,H_D;G_C=G_C,A=A,NF=NF)
N=size(C,3)
if N>NF
A= vcat(A,A,A)
H = cat(H ,H ,H ,dims=2)
P= cat(P,P,P,dims=2)
end
fun1(C,D,E,P,H,H_D;G_C=G_C,A=A,NF=NF)
end

function fun1(C,D,E,P,H,H_D;G_C=G_C,A=A,NF=NF)
J=size(C,1)
Tmax=size(C,2)
T=Tmax-1
N=size(C,3)
num_large0 = zeros(2)
num_small0 = zeros(2)
num_large1 = zeros(2)
num_small1 = zeros(2)
num_b0=0.
num_b1=0.
K = length(E)

for n=1:K, t=2:Tmax, j=1:J
c0 = C[j,t,E[n]]*A[E[n],t]
c1 = C[j,t-1,E[n]]*(C[j,t,E[n]])*A[E[n],t]

Htest1 = (H[j,E[n]]==1)
Htest2 = (H[j,E[n]]==2)

num_large0 = num_large0 + c0*Htest2*P[j,E[n]]*D[j]
num_small0 = num_small0 + c0*Htest1*P[j,E[n]]*D[j]

num_large1 = num_large1 + c1*Htest2*P[j,E[n]]*D[j]
num_small1 = num_small1 + c1*Htest1*P[j,E[n]]*D[j]

num_b0 = num_b0 + c0*(1-G_C[j])*P[j,E[n]]*D[j]
num_b1 = num_b1 + c1*(1-G_C[j])*P[j,E[n]]*D[j]

if  H_D[j]==1
num_large0 = num_large0 + c0*Htest2*P[j,E[n]]*D[j]
num_small0 = num_small0 + c0*Htest1*P[j,E[n]]*D[j]

num_large1 = num_large1 + c1*Htest2*P[j,E[n]]*D[j]
num_small1 = num_small1 + c1*Htest1*P[j,E[n]]*D[j]
end
end
return vcat(num_large0./(J*N),  num_large1./(J*N), num_small0./(J*N), num_small1./(J*N), num_b0/(J*N), num_b1/(J*N))
end

``````

Here is your original code with minimal modifications, just adding this outer function, it runs in 0.5s:

Code
``````using LinearAlgebra, Statistics, Random

C=rand([0,1],120,13,8000)

D = rand(120)

E=findall(sum(sum(C,dims=1), dims=2)[1,1,:].>0)

P=rand(120,8000)
H=(P.>0.5)
H_D=(D.>0.5)
G_C=(D.<0.05)
A=ones(8000,13)
NF=8000

function outer(C,D,E,P,H,H_D;G_C=G_C,A=A,NF=NF)
N=size(C,3)
if N>NF
A= vcat(A,A,A)
H = cat(H ,H ,H ,dims=2)
P= cat(P,P,P,dims=2)
end
fun1(C,D,E,P,H,H_D;G_C=G_C,A=A,NF=NF)
end

function fun1(C,D,E,P,H,H_D;G_C=G_C,A=A,NF=NF)
J=size(C,1)
Tmax=size(C,2)
T=Tmax-1
N=size(C,3)
num_large0 = zeros(2)
num_small0 = zeros(2)
num_large1 = zeros(2)
num_small1 = zeros(2)
num2 = zeros(1)
num_b0=zeros(1)
num_b1=zeros(1)
K = length(E)

for n=1:K, t=2:Tmax, j=1:J
c0 = C[j,t,E[n]]*A[E[n],t]
c1 = C[j,t-1,E[n]]*(C[j,t,E[n]])*A[E[n],t]

num_large0 = num_large0 .+ c0.*(H[j,E[n]]==2)*P[j,E[n]]*D[j]
num_small0 = num_small0 .+ c0.*(H[j,E[n]]==1)*P[j,E[n]]*D[j]

num_large1 = num_large1 .+ c1.*(H[j,E[n]]==2)*P[j,E[n]]*D[j]
num_small1 = num_small1 .+ c1.*(H[j,E[n]]==1)*P[j,E[n]]*D[j]

num_b0 = num_b0 .+ c0.*(1-G_C[j])*P[j,E[n]]*D[j]
num_b1 = num_b1 .+ c1.*(1-G_C[j])*P[j,E[n]]*D[j]

if  H_D[j]==1
num_large0 = num_large0 .+ c0.*(H[j,E[n]]==2)*P[j,E[n]]*D[j]
num_small0 = num_small0 .+ c0.*(H[j,E[n]]==1)*P[j,E[n]]*D[j]

num_large1 = num_large1 .+ c1.*(H[j,E[n]]==2)*P[j,E[n]]*D[j]
num_small1 = num_small1 .+ c1.*(H[j,E[n]]==1)*P[j,E[n]]*D[j]
end
end
return vcat(num_large0./(J*N),  num_large1./(J*N), num_small0./(J*N), num_small1./(J*N), num_b0./(J*N), num_b1./(J*N))
end

@time outer(C,D,E,P,H,H_D,G_C=G_C,A=A,NF=NF)
``````
5 Likes

The code contains the equality `H[j,E[n]]==2`; however, the array is defined as `H=(P.>0.5)`. In this case `H` can only be `true` or `false` (1 or 0), so the equality is always false. Therefore, all the lines that depend on that equality are also zero. When I run the code, the first four elements of the output are all `0.0`, and those are exactly the elements that depend on that equality.

If this is a bug, then you could try to fix the equality for what you really need. Or, if it’s not a bug, then you could simply remove all the lines that depend on it, saving a ton of processing.
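A quick standalone check of this point, mirroring how `H` is built in the MWE:

``````julia
P = rand(120, 8000)
H = P .> 0.5              # BitMatrix: every entry is a Bool (0 or 1)
never_two = !any(H .== 2) # a Bool never compares equal to 2
has_ones = any(H .== 1)   # `== 1` does fire wherever H is true
``````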

Edit: Running @lmiq’s latest code with the always-zero lines commented out, I get an additional 30% speed-up. Also, this was mentioned, but not very explicitly: it doesn’t really make sense to make `G_C`, `A`, and `NF` keyword arguments, which are typically optional. In this function they are definitely not optional, so it would be simpler to pass them as regular positional arguments.

2 Likes

Dear Leandro,

Thanks for the response. Could you please explain when it is okay to pass optional parameters to a function, and when it is better to pass them directly as arguments?

Is there any case in which I can pass the parameters after the semi-colon without affecting performance?

When I tested it, having `G_C`, `A`, and `NF` as keywords versus standard arguments didn’t result in a speed-up; it was more a matter of good practice. You just don’t want to have non-constant global variables within your function; that is when things slow down.

The problem there was that you were doing something like:

``````julia> function f(;x=x)
2*x
end
f (generic function with 1 method)

julia> x = 1
1

julia> @code_warntype f()
Variables
#self#::Core.Compiler.Const(f, false)
x::Any

Body::Any
1 ─ %1 = Main.x::Any
│        (x = %1)
│   %3 = Main.:(var"#f#1")(x, #self#)::Any
└──      return %3

``````

`x=x` there does not really assign any default value to `x`, and thus that value will be taken from the global variable `x`, which is not constant. Those `Any`s mean that the types are not stable, and things will be slow.

If you want to pass `x` as optional but give it a default value, use an actual concrete value:

``````julia> function g(;x=1)
2*x
end
g (generic function with 1 method)

julia> @code_warntype g()
Variables
#self#::Core.Compiler.Const(g, false)

Body::Int64
1 ─ %1 = Main.:(var"#g#2")(1, #self#)::Core.Compiler.Const(2, false)
└──      return %1

``````

Note that in the first option, your function simply cannot be run without explicitly passing `x`, if `x` was not set globally before (new Julia session):

``````julia> function f(;x=x)
2*x
end
f (generic function with 1 method)

julia> f()
ERROR: UndefVarError: x not defined
Stacktrace:
 f() at ./REPL:2
 top-level scope at REPL:1

``````
1 Like

The problem was not the use of keyword arguments; it was not initializing them with proper default values nor passing their values explicitly.

3 Likes

Thanks so much. I see. If I were to define x as a constant outside of the function, would I have a problem?

If you mean:

``````julia> f(;x=x) = 2*x
f (generic function with 1 method)

julia> const x = 1
1

julia> @code_warntype f()
Variables
#self#::Core.Compiler.Const(f, false)
x::Int64

Body::Int64
1 ─ %1 = Main.x::Core.Compiler.Const(1, false)
│        (x = %1)
│   %3 = Main.:(var"#f#1")(x::Core.Compiler.Const(1, false), #self#)::Core.Compiler.Const(2, false)
└──      return %3

``````

That will perform well. Yet, using `x=x` does not really make sense. You can initialize the variable with something meaningful, and dependent on the other variables, if you want:

``````julia> g(y;x=2*y) = x + y
g (generic function with 1 method)

julia> g(1)
3

``````

or, more generally:

``````julia> initx(y) = 3*y
initx (generic function with 1 method)

julia> h(y;x=initx(y)) = x + y
h (generic function with 1 method)

julia> h(1)
4

julia> h(1,x=5)
6

``````