# For loop Performance

Can somebody help me speed up the following function?

```julia
using Statistics  # `mean` lives in the Statistics stdlib

function mean_loop(p::AbstractArray, q::AbstractArray)
    a = p
    a1 = q
    z = 0.0
    for i in 1:length(a1)
        s = 0.0
        @inbounds s = mean(a .> a1[i])
        z += s
        s = 0.0
    end
    return z
end
```

```julia
julia> @time mean_loop(rand(10^6), rand(10^6))
902.126519 seconds
```

This isn't horribly slow, but I think it can be much more compact:

```julia
julia> mean_cast(p, q) = sum(p .> q') / length(p);

julia> mean_iter(p, q) = sum(p1 > q1 for p1 in p, q1 in q) / length(p);

julia> using BenchmarkTools

julia> @btime mean_loop($(rand(10^4)), $(rand(10^4)));
  107.303 ms (30000 allocations: 54.17 MiB)

julia> @btime mean_cast($(rand(10^4)), $(rand(10^4)));
  128.281 ms (4 allocations: 11.93 MiB)

julia> @btime mean_iter($(rand(10^4)), $(rand(10^4)));
  76.649 ms (7 allocations: 160 bytes)
```

Try this:

```julia
using Statistics  # for `mean`

function mean_loop(p::AbstractArray, q::AbstractArray)
    z = 0.0
    for x in q
        z += mean(t -> t > x, p)
    end
    return z
end
```

Nicely done, that's about 7x faster than my `mean_iter`. Which is a little surprising to me, don't they iterate through the same things?

Welcome to the JuliaLang discourse @sam1988!

I'd recommend reading PSA: make it easier to help you for advice on how to format your code and ask questions in a way that makes it more likely for you to get high-quality responses that are also useful to others in the future (though it appears you've already got some good replies!).

Furthermore, when benchmarking your code, using `@time` can be very misleading. In the example you gave, `@time mean_loop(rand(10^6),rand(10^6))` will include the time it takes to compile the function `mean_loop` and the time it takes to create the arrays `rand(10^6)` and `rand(10^6)`, so most of the time you're measuring is not actually code relevant to `mean_loop`. Instead, I would use the `@btime` macro from BenchmarkTools.jl to properly measure performance, like this:

```julia
using BenchmarkTools
@btime mean_loop($(rand(10^6)), $(rand(10^6)))
```

The `@btime` macro will not include compile time in its benchmarks, and it runs the benchmarked function many times so it can give statistically significant results. When I write `$(rand(10^6))` inside `@btime`, it means that I am precomputing the array `rand(10^6)` and splicing the value in as a constant before the benchmark loop is run.
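To see what the interpolation changes, here is a minimal sketch (the variable `x` and the use of `sum` are just for illustration; the exact timings are machine-dependent, so none are shown):

```julia
using BenchmarkTools

x = rand(10^3)

# Without `$`, `x` is looked up as a non-constant global on every call,
# so the measurement includes dynamic-dispatch overhead.
@btime sum(x)

# With `$`, the array is spliced in as a constant, so only `sum` itself
# is being timed.
@btime sum($x)
```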


I guess there's some kind of overhead to the nested-loop generator. Splitting it into two generators seems faster:

```julia
mean_iter2(p, q) = sum(sum(p1 > q1 for p1 in p) for q1 in q) / length(p)
```

You can go a bit faster with integer arithmetic:

```julia
function mean_loop_int(p::AbstractArray, q::AbstractArray)
    z = 0
    for x in q
        z += count(>(x), p)
    end
    return z / length(p)
end
```
```julia
julia> x = rand(10^4);

julia> y = rand(10^4);

julia> @btime mean_loop($x, $y)
  19.034 ms (0 allocations: 0 bytes)
4968.276699999993

julia> @btime mean_loop_int($x, $y)
  15.797 ms (0 allocations: 0 bytes)
4968.2767
```

But if you change the algorithm, you can get significant speedups:

```julia
function mean_sorted(p, q)
    ps = sort(p)
    Lp = length(p)  # count elements of `p` above each `x`, so use length(p)
    z = 0
    for x in q
        ind = searchsortedfirst(ps, x)
        z += Lp - ind + 1
    end
    return z / Lp
end
```
```julia
julia> @btime mean_sorted($x, $y)
  1.512 ms (2 allocations: 78.20 KiB)
4968.2767
```

Possibly, it can be even faster if we sort both arrays and search for the next larger element at each iteration, but I couldn't immediately find a `searchsortednext` function.
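In lieu of a `searchsortednext`, the same idea can be sketched as a two-pointer sweep over both sorted arrays (`mean_sorted2` is a hypothetical name, not from this thread); each lookup then costs amortized O(1) instead of a binary search:

```julia
# Sketch of a two-pointer variant: sort both arrays once, then sweep.
# Because qs is ascending, the first index i with ps[i] > x only ever
# moves forward, so the sweep is O(N) after the two O(N log N) sorts.
function mean_sorted2(p, q)
    ps = sort(p)
    qs = sort(q)
    Lp = length(ps)
    z = 0
    i = 1  # first index with ps[i] > current x
    for x in qs
        while i <= Lp && ps[i] <= x
            i += 1
        end
        z += Lp - i + 1  # elements of p strictly greater than x
    end
    return z / Lp
end
```

The result does not depend on the iteration order over `q`, so sorting `q` changes nothing but the access pattern.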

This also scales well for longer vectors. For 10^6 length I get:

```julia
julia> @btime mean_sorted(x, y) setup=(x=rand(10^6); y=rand(10^6))
  395.748 ms (2 allocations: 7.63 MiB)
500344.792228
```

That is, `O(N log N)` complexity rather than `O(N^2)`.


How about using `view`?

Yeah, I guess.