# For loop optimization

Hello,

I’m not sure how to optimize the following for loop.

`array_1` and `array_2` are two `Vector{Vector{Int64}}`s, and I’m interested in evaluating which combinations (of 2 elements) of their vector components has an element-wise sum equal to a specified `sum_target`, which of course is a `Vector{Int64}`.

The code I’m currently using is the following:

``````using BenchmarkTools

function f()

n_elements = 50

array_1 = [Int64.(rand(Int8,500)) for i in 1:n_elements]
array_2 = [Int64.(rand(Int8,500)) for i in 1:n_elements]

sum_target = Int64.(rand(Int8,500))

solutions = Vector{Vector{Int64}}[]

iterator_over_combinations  = Iterators.product(array_1,array_2)

for combination in iterator_over_combinations
if all(sum(combination) .== sum_target)
push!(solutions, combination  )
end
end

return solutions

end
``````

The number of vector elements of `array_1` and `array_2` is here set to `n_elements::Int64 = 50` in order for `@btime` to work, but in the real application I have like `n_elements::Int64 = 10^11` and combine ten arrays instead of just two.

Right now I get:

``````julia> @btime f() ;
2.587 ms (10205 allocations: 21.02 MiB)
``````

Thank you very much in advance.

1 Like

You can (and should) remove all type assertions on the left hand sides. They do not help performance, but make the code harder to read.

1 Like

`all(sum(combination) .== sum_target)` This is really slow because you are computing all the equalities and then seeing if they are all true rather than only computing until you find a place where they aren’t equal.

4 Likes

The main issue is that `all(sum(combination) .== sum_target)` allocates two arrays at each iteration of the `for` loop, which ends up being very costly. You could instead do the comparisons lazily, in a way that fully avoids allocations. Most importantly, as @Oscar_Smith suggested, this would also allow you to terminate the computations as soon as there is a disagreement, instead of having to compute the whole sum before doing the comparisons.

Below are a few alternatives. Note that I have moved the allocation of the large arrays to outside of the function.

``````function f2(array_1, array_2, sum_target)
solutions = Tuple{eltype(array_1), eltype(array_2)}[]

iterator_over_combinations = Iterators.product(array_1, array_2)

for combination in iterator_over_combinations
if all(sum(combination) .== sum_target)
push!(solutions, combination)
end
end

solutions
end

function f3(array_1, array_2, sum_target)
solutions = Tuple{eltype(array_1), eltype(array_2)}[]

iterator_over_combinations = Iterators.product(array_1, array_2)

for combination in iterator_over_combinations
success = all(
vs -> vs[1] + vs[2] == vs[3],
zip(combination..., sum_target),
)
if success
push!(solutions, combination)
end
end

solutions
end

function f4(array_1, array_2, sum_target)
solutions = Tuple{eltype(array_1), eltype(array_2)}[]

iterator_over_combinations = Iterators.product(array_1, array_2)

for combination in iterator_over_combinations
success = true
a, b = combination
for i ∈ eachindex(sum_target)
if a[i] + b[i] ≠ sum_target[i]
success = false
break
end
end
if success
push!(solutions, combination)
end
end

solutions
end

n_elements = 50
m = 500
array_1 = [Int64.(rand(Int8, m)) for i in 1:n_elements]
array_2 = [Int64.(rand(Int8, m)) for i in 1:n_elements]
sum_target = Int64.(rand(Int8, m))

@btime f()  # 5.152 ms (10205 allocations: 20.98 MiB)

@btime f2(\$array_1, \$array_2, \$sum_target)  # 5.142 ms (10001 allocations: 20.52 MiB)
@btime f3(\$array_1, \$array_2, \$sum_target)  # 14.733 μs (1 allocation: 64 bytes)
@btime f4(\$array_1, \$array_2, \$sum_target)  # 10.009 μs (1 allocation: 64 bytes)

``````

Note that `f2` is nearly the same as `f`, and the performance is the same, while the `f3` and `f4` variants are almost 3 orders of magnitude faster. Moreover, `f4` is (in my opinion) more readable and ends up being the fastest option.

5 Likes