# For loop Performance

Can somebody help me speed up the following function?

```julia
using Statistics  # `mean` lives in the Statistics stdlib

function mean_loop(p::AbstractArray, q::AbstractArray)
    a = p
    a1 = q
    z = 0.0
    for i in 1:length(a1)
        s = 0.0
        @inbounds s = mean(a .> a1[i])
        z += s
        s = 0.0
    end
    return z
end
```

```julia
julia> @time mean_loop(rand(10^6), rand(10^6))
902.126519 seconds
```

This isn't horribly slow, but I think it can be much more compact:

```julia
julia> mean_cast(p, q) = sum(p .> q') / length(p);

julia> mean_iter(p, q) = sum(p1 > q1 for p1 in p, q1 in q) / length(p);

julia> using BenchmarkTools

julia> @btime mean_loop($(rand(10^4)), $(rand(10^4)));
  107.303 ms (30000 allocations: 54.17 MiB)

julia> @btime mean_cast($(rand(10^4)), $(rand(10^4)));
  128.281 ms (4 allocations: 11.93 MiB)

julia> @btime mean_iter($(rand(10^4)), $(rand(10^4)));
  76.649 ms (7 allocations: 160 bytes)
```

Try this:

```julia
using Statistics  # for `mean`

function mean_loop(p::AbstractArray, q::AbstractArray)
    z = 0.0
    for x in q
        z += mean(t -> t > x, p)
    end
    return z
end
```

Nicely done, that's about 7x faster than my `mean_iter`. Which is a little surprising to me, don't they iterate through the same things?

Welcome to the JuliaLang discourse @sam1988!

I'd recommend reading PSA: make it easier to help you for advice on how to format your code and ask questions in a way that makes it more likely for you to get high-quality responses that are also useful to others in the future (though it appears you've already got some good replies!).

Furthermore, when benchmarking your code, using `@time` can be very misleading. In the example you gave, `@time mean_loop(rand(10^6),rand(10^6))` will include the time it takes to compile the function `mean_loop` and the time it takes to create the arrays `rand(10^6)` and `rand(10^6)`, so most of the time you're measuring is not actually code relevant to `mean_loop`. Instead, I would use the `@btime` macro from BenchmarkTools.jl to properly measure performance, like this:

```julia
using BenchmarkTools
@btime mean_loop($(rand(10^6)), $(rand(10^6)))
```

The `@btime` macro will not include compile time in its benchmarks, and it runs the benchmarked function many times so it can give statistically significant results. When I write `$(rand(10^6))` inside `@btime`, it means that I am precomputing the array `rand(10^6)` and splicing the value in as a constant before the benchmark loop is run.
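To see what the interpolation changes, here is a minimal sketch (the variable `x` and the use of `sum` are just for illustration; the exact timings are machine-dependent, so none are shown):

```julia
using BenchmarkTools

x = rand(10^3)

# Without `$`, `x` is looked up as a non-constant global on every call,
# so the measurement includes dynamic-dispatch overhead.
@btime sum(x)

# With `$`, the array is spliced in as a constant, so only `sum` itself
# is being timed.
@btime sum($x)
```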


I guess there's some kind of overhead to the nested-loop generator. Splitting it into two generators seems faster:

```julia
mean_iter2(p, q) = sum(sum(p1 > q1 for p1 in p) for q1 in q) / length(p)
```

You can go a bit faster with integer arithmetic:

```julia
function mean_loop_int(p::AbstractArray, q::AbstractArray)
    z = 0
    for x in q
        z += count(>(x), p)
    end
    return z / length(p)
end
```
```julia
julia> x = rand(10^4);

julia> y = rand(10^4);

julia> @btime mean_loop($x, $y)
  19.034 ms (0 allocations: 0 bytes)
4968.276699999993

julia> @btime mean_loop_int($x, $y)
  15.797 ms (0 allocations: 0 bytes)
4968.2767
```

But if you change the algorithm, you can get significant speedups:

```julia
function mean_sorted(p, q)
    ps = sort(p)
    Lp = length(p)  # count elements of `p` above each `x`, so use length(p)
    z = 0
    for x in q
        ind = searchsortedfirst(ps, x)
        z += Lp - ind + 1
    end
    return z / Lp
end
```
```julia
julia> @btime mean_sorted($x, $y)
  1.512 ms (2 allocations: 78.20 KiB)
4968.2767
```

Possibly, it can be even faster if we sort both arrays and search for the next larger element at each iteration, but I couldn't immediately find a `searchsortednext` function.
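In lieu of a `searchsortednext`, the same idea can be sketched as a two-pointer sweep over both sorted arrays (`mean_sorted2` is a hypothetical name, not from this thread); each lookup then costs amortized O(1) instead of a binary search:

```julia
# Sketch of a two-pointer variant: sort both arrays once, then sweep.
# Because qs is ascending, the first index i with ps[i] > x only ever
# moves forward, so the sweep is O(N) after the two O(N log N) sorts.
function mean_sorted2(p, q)
    ps = sort(p)
    qs = sort(q)
    Lp = length(ps)
    z = 0
    i = 1  # first index with ps[i] > current x
    for x in qs
        while i <= Lp && ps[i] <= x
            i += 1
        end
        z += Lp - i + 1  # elements of p strictly greater than x
    end
    return z / Lp
end
```

The result does not depend on the iteration order over `q`, so sorting `q` changes nothing but the access pattern.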

This also scales well for longer vectors. For 10^6 length I get:

```julia
julia> @btime mean_sorted(x, y) setup=(x=rand(10^6); y=rand(10^6))
  395.748 ms (2 allocations: 7.63 MiB)
500344.792228
```

That is, `O(N log N)` complexity rather than `O(N^2)`.


How about using `view`?

Yeah, I guess.