What makes Julia loops so fast?

I did some tests myself, as shown below, and it turns out the loop version of my code is close to 15 times faster.

Why? What made Julia so fast when going through a loop?

My code:

A = rand(1:100, 10^5,1);

@time Ind1 = A .> 35;

function f(x)
    Ind = fill(NaN, 10^5, 1);    # Float64 matrix to hold the results
    for i in 1:10^5
        Ind[i] = x[i] > 35;      # each Bool gets stored as a Float64
    end
    return Ind;
end

@time Ind2 = f(A);

The results:

0.205660 seconds (1.00 M allocations: 50.089 MiB, 2.73% gc time, 99.87% compilation time)
0.015716 seconds (78.34 k allocations: 5.216 MiB, 96.73% compilation time)

https://docs.julialang.org/en/v1.7/manual/performance-tips/#Avoid-global-variables
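To illustrate the point behind that link, here is a minimal sketch of the global-variable penalty (the names G, sum_global, and sum_arg are my own, not from the post above):

```julia
# Sketch: the same loop, once over an untyped global and once over a
# function argument. Reading the global G forces dynamic dispatch on
# every iteration; passing the array as an argument lets the compiler
# specialize the whole loop for its concrete type.
G = rand(1:100, 10^5)

function sum_global()          # reads the global G
    s = 0
    for i in 1:length(G)
        s += G[i]
    end
    return s
end

function sum_arg(x)            # same loop, but x is an argument
    s = 0
    for i in 1:length(x)
        s += x[i]
    end
    return s
end

@assert sum_global() == sum_arg(G)  # identical results...
@time sum_global()                  # ...but this one allocates and is typically much slower
@time sum_arg(G)
```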


Note that in both cases compilation time makes up more than 95% of the time measured. If you run the same time measurements a second time:


julia> @time Ind1 = A .> 35;
  0.000219 seconds (5 allocations: 16.672 KiB)

julia> @time Ind2 = f(A);
  0.001386 seconds (2 allocations: 781.328 KiB)
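One way to see this programmatically is to time the first and second calls separately; only the first call pays the JIT compilation cost. A minimal sketch (g here is a stand-in for the broadcast above, not from the original post):

```julia
g(x) = x .> 35                 # compiled the first time it is called for this argument type

A = rand(1:100, 10^5)
t_first  = @elapsed g(A)       # includes compilation of g for Vector{Int}
t_second = @elapsed g(A)       # runs the already-compiled code

@assert t_second < t_first     # the warm call is far cheaper
```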


While the advice about global variables is generally true and very useful, I think in this case it doesn’t make much difference, since both are ultimately dealing with the global variable A anyway. The loop version actually turns out to be slower, as I posted above, but that’s mostly due to the bounds checking in the loop (and to storing the Bool results in a Float64 matrix). This version is much faster:

julia> function f(x)
           Ind = Vector{Bool}(undef, 10^5);
           for i in 1:10^5
               @inbounds Ind[i] = x[i]>35;
           end
           return Ind;
       end

julia> @time Ind2 = f(A); # compilation run
  0.081470 seconds (46.49 k allocations: 3.090 MiB, 99.79% compilation time)

julia> @time Ind2 = f(A);
  0.000136 seconds (2 allocations: 97.766 KiB)
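For what it’s worth, the same result can also be obtained without a hand-written loop by broadcasting into a preallocated Vector{Bool} with .=, which sidesteps both the bounds-checking question and the per-call allocation; a sketch (not from the posts above):

```julia
A = rand(1:100, 10^5)

Ind = Vector{Bool}(undef, 10^5)  # allocate the output once
Ind .= A .> 35                   # fused, in-place broadcast: no fresh array per call

@assert Ind == (A .> 35)
```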


Many thanks!

It seems that your system is significantly faster than mine. If I may ask, what kind of computer are you using?

The best result I get is ~0.01 seconds. I’m using a MacBook Pro (Intel Core i9, 2.4 GHz, 8 cores, 64 GB memory). Thanks.

I was just thinking that my computer must be much slower (and it likely is), because in your original code (measuring the compilation times), where you get 0.2 and 0.01 seconds, my system needed 2 and 0.06 seconds respectively.

Mine is a measly AMD A10 2.4 GHz/4 GB RAM machine running Linux.


Interesting. I see you can get to 0.0001 seconds for @time Ind2 = f(A), which is very impressive. I can never get to that kind of speed.

Maybe Linux is indeed faster.

Make sure to use BenchmarkTools to benchmark your expressions, interpolating variables into them with $:

julia> using BenchmarkTools

julia> @btime $A .> 35;
  56.285 μs (3 allocations: 16.61 KiB)

julia> @btime f($A)
  429.864 μs (2 allocations: 781.30 KiB)

see

https://juliaci.github.io/BenchmarkTools.jl/stable/


Yeah, I just ran some benchmarks (with a setup to avoid any caching shenanigans from reusing the same vector A). Getting rid of the global (and using a let-local vector) helps a lot, as expected, and the @inbounds version and the broadcast .> version are pretty close:


julia> results = let 
       @benchmark(Ind2 = V .> 35; setup=(V = rand(1:100, 10^5))),
       @benchmark(Ind2 = f_orig(V); setup=(V = rand(1:100, 10^5))),
       @benchmark(Ind2 = f(V); setup=(V = rand(1:100, 10^5)))
       end
(Trial(59.182 μs), Trial(704.202 μs), Trial(53.362 μs))

f_orig is the original function from the first post; f is my version with @inbounds posted above.
