Efficient looping statement in Julia


#1

Happy new year to all.

I have a quick question about writing effcient looping code. To describe a problem, let me use a simple for looping statement which check if a number is even and store that number into an array (result_arr) if it is even. This is just for example.

I can consider two options: Option 1 and Option 2, see below. Option 1 is always faster than Option 2. It seems because Option 2 creates an extra array from [ … ] operation: 7.13% gc_time. But, I would like to write Option 2 since it is simple one line and easy to read.

Could there be any simple and efficeint Julia code, as simple as Option 2 and as fast as Option 1?

Thank you very much in advance.

result_arr = Array{Float64,1}(undef,10^8)

Option 1

@time begin
for i in 1:10^8
iseven(i) ? result_arr[i] = i : result_arr[i] = 0;
end
end

Option 2

@time [iseven(i) ? result_arr[i] = i : result_arr[i] = 0 for i=1:10^8]

output>
2.463796 seconds (150.00 M allocations: 2.235 GiB, 3.99% gc time)
3.201820 seconds (150.05 M allocations: 2.983 GiB, 7.13% gc time)


#3

Your benchmarks are likely to be inaccurate since your code is not in functions and you are benchmarking in global scope. Read https://docs.julialang.org/en/v1/manual/performance-tips/index.html and try @btime from https://github.com/JuliaCI/BenchmarkTools.jl


#4

The foreach function should do exactly what you’re looking for. You can learn about it by running ?foreach at the Julia terminal.


#5

Thank you for mentioning benchmarking. I will check. But, I am asking that if you should write these kind of codes, which option you are going to choose or both not good, then any alternative? If the performance was correctly recorded, then should Option 1 and Option 2 perform equally good?


#6

Oh, thank you very much. This might be probably what I am looking for. I will check.


#7

No, because your option 2 (an array comprehension) allocates a new array as well as performing the loop.


#8

A for loop is at least as fast as foreach, if not faster, and is probably simpler for things like this.

@chuljang, unlike some other languages, you don’t generally optimize Julia code by digging through the language or standard library to find the best “built-in” construct. Instead, you think about types, algorithms, and memory allocations/access.


#9

For reference, here are slightly cleaned-up versions of @chuljang’s two options, put into functions for performance, along with @rdeitsforeach option 3, and a more optimized option 4 that eliminates the iseven check by performing two loop iterations at once:

function option1!(a)
    for i = 1:length(a)
        a[i] = iseven(i) ? 1 : 0
    end
end

function option2!(a)
    # wasteful: allocates an additional array:
    [iseven(i) ? a[i] = i : a[i] = 0 for i=1:length(a)]
end

function option3!(a)
    foreach(1:length(a)) do i
        a[i] = iseven(i) ? 1 : 0
    end
end

function option4!(a)
    @inbounds for i = 1:2:length(a)-1
        a[i] = 0     # odd i
        a[i+1] = 1   # even i
    end
    if isodd(length(a))
        a[end] = 0
    end
end

If I benchmark them with the BenchmarkTools package (which performs multiple timings and picks the fastest):

a = Array{Int}(undef, 10^6);
@btime option1!($a);
@btime option2!($a);
@btime option3!($a);
@btime option4!($a);

I get

  595.217 μs (0 allocations: 0 bytes) # option 1
  2.326 ms (4 allocations: 7.63 MiB)  # option 2
  638.338 μs (1 allocation: 16 bytes) # option 3
  279.508 μs (0 allocations: 0 bytes) # option 4

#10

It is a bit unfair to put @inbounds in one of the functions and not the other though.

julia> function option3_inb!(a)
           foreach(1:length(a)) do i
               @inbounds a[i] = iseven(i) ? 1 : 0
           end
       end;

julia> # option3_inb!(a) = foreach(i -> (@inbounds a[i] = iseven(i) ? i : 0), 1:length(a)) # same as above

julia> @btime option3!($a);
  695.627 μs (1 allocation: 16 bytes)

julia> @btime option4!($a);
  359.022 μs (0 allocations: 0 bytes)

julia> @btime option3_inb!($a);
  359.901 μs (1 allocation: 16 bytes)

While “you don’t generally optimize Julia code by digging through the language or standard library to find the best “built-in” construct” is true, there is also no need to write C-style code as soon as you want something to be fast. Using higher level functions can make the code simpler to read and should in most cases come with little to no overhead.


#11

Oh, cool, I tried @inbounds with option 1 as well and it also matches option 4. Apparently LLVM does some loop unrolling / strength reduction that eliminates the iseven test?

I agree, but I wouldn’t count foreach vs. for as an improvement in readability or simplicity here — rather the reverse. Higher-level constructs are not (necessarily) slow in Julia, but there is no reason to bend over backwards to avoid writing for loops either.


#12

Thank @stevengj and @kristoffer.carlsson for detailed references. Okay, obviously my Option 2 is not an alternative to Option 1 unless a new array is required as its return variable. Now, I know how to benchmark performance correctly! :grin:


#13

I was curious to see if I could get SIMD to kick in. Simply decorating the loop with @simd and switching to ifelse did not seem to do the trick, but manually unrolling two iterations of the loop seems to yield some minor improvement. Does anyone have a clue as to the difference?

function option1!(a)
    @inbounds for i = 1:length(a)
        a[i] = iseven(i) ? 1 : 0
    end
end

function option5!(a)
    @inbounds @simd for i = 1:2:length(a)
        a[i] = ifelse(iseven(i),1,0)
        a[i+1] = ifelse(iseven(i+1),1,0)
    end
end

a = zeros(10^6);

julia> @benchmark option1!($a)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     366.615 μs (0.00% GC)
  median time:      388.473 μs (0.00% GC)
  mean time:        408.216 μs (0.00% GC)
  maximum time:     951.376 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

julia> @benchmark option5!($a)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     336.827 μs (0.00% GC)
  median time:      357.260 μs (0.00% GC)
  mean time:        380.683 μs (0.00% GC)
  maximum time:     1.248 ms (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

#14

An alternative to option2 seems to me to be:

julia> @btime (i->ifelse(iseven(i),1,0)).(1:10^6);;
  620.493 μs (2 allocations: 7.63 MiB)

#15

@Jean_Michel, This syntax looks a bit new to me. Could you refer to any section or part in Julia document (https://docs.julialang.org/en/v1/index.html)?


#16

i->ifelse(iseven(i),1,0) is an anonymous function, which you can broadcast (i.e. apply element by element) to the range 1:1_000_000.

This one-liner is thus equivalent to:

f(i) = iseven(i) ? 1 : 0
f.(1:10^6)

which is in turn more or less equivalent to:

f(i) = iseven(i) ? 1 : 0

r = Array{Int,1}(undef, 10^6)
for i in 1:10^6
    r[i] = f(i)
end
r

#17

Amusingly, somewhere in this topic, an incorrect implementation was introduced, with the value 1 for even values instead of the index value. As a result, most implementations above now solve a different problem.

@baggepinnen, your fixed @simd solution (with index values instead of 1) benchmarks like this for me:

function option5!(a)
   @inbounds @simd for i = 1:2:length(a)
       a[i] = ifelse(iseven(i), i, 0)
       a[i+1] = ifelse(iseven(i+1), i, 0)
   end
end

julia> @btime option5!($a)
  432.995 μs (0 allocations: 0 bytes)

I was able to improve this by using SIMD.jl:

using SIMD

function option_simd!(a)
    v = Vec{4,Float64}((0, 2, 0, 4))
    k = Vec{4,Float64}((0, 4, 0, 4))
    lane = VecRange{4}(0)
    @inbounds @simd for i = 1:4:length(a)-3
        a[lane + i] = v
        v += k
    end
    @inbounds for i = (length(a) & ~3)+1:length(a)
        a[i] = iseven(i) ? i : 0
    end
end

julia> @btime option_simd!($a)
  300.023 μs (0 allocations: 0 bytes)

Just unrolling the loop doesn’t help much since the limiting factor seems to be contention between updating v and writing it. We can improve that by using two separate accumulators:

function option_simd_2!(a)
    v1 = Vec{4,Float64}((0, 2, 0, 4))
    v2 = Vec{4,Float64}((0, 6, 0, 8))
    k = Vec{4,Float64}((0, 8, 0, 8))
    lane = VecRange{4}(0)
    @inbounds @simd for i = 1:8:length(a)-7
        a[lane + i] = v1
        v1 += k
        a[lane + i + 4] = v2
        v2 += k
    end
    @inbounds for i = (length(a) & ~7)+1:length(a)
        a[i] = iseven(i) ? i : 0
    end
end

julia> @btime option_simd_2!($a)
  266.161 μs (0 allocations: 0 bytes)