For loop optimization

claudio20497 · October 31, 2021, 6:55pm

Hello,

I’m not sure how to optimize the following for loop.

array_1 and array_2 are two Vector{Vector{Int64}}s, and I’m interested in evaluating which combinations (of 2 elements) of their vector components has an element-wise sum equal to a specified sum_target, which of course is a Vector{Int64}.

The code I’m currently using is the following:

using BenchmarkTools

function f()

    n_elements = 50

    array_1 = [Int64.(rand(Int8,500)) for i in 1:n_elements]
    array_2 = [Int64.(rand(Int8,500)) for i in 1:n_elements]

    sum_target = Int64.(rand(Int8,500)) 

    solutions = Vector{Vector{Int64}}[]

    iterator_over_combinations  = Iterators.product(array_1,array_2)

    for combination in iterator_over_combinations
        if all(sum(combination) .== sum_target)
            push!(solutions, combination  )
        end
    end

    return solutions




end

The number of vector elements of array_1 and array_2 is here set to n_elements::Int64 = 50 in order for @btime to work, but in the real application I have like n_elements::Int64 = 10^11 and combine ten arrays instead of just two.

Right now I get:

julia> @btime f() ;
  2.587 ms (10205 allocations: 21.02 MiB)

Thank you very much in advance.

DNF · October 31, 2021, 7:33pm

You can (and should) remove all type assertions on the left hand sides. They do not help performance, but make the code harder to read.

Oscar_Smith · October 31, 2021, 7:36pm

all(sum(combination) .== sum_target) This is really slow because you are computing all the equalities and then seeing if they are all true rather than only computing until you find a place where they aren’t equal.

jipolanco · October 31, 2021, 7:39pm

The main issue is that all(sum(combination) .== sum_target) allocates two arrays at each iteration of the for loop, which ends up being very costly. You could instead do the comparisons lazily, in a way that fully avoids allocations. Most importantly, as @Oscar_Smith suggested, this would also allow you to terminate the computations as soon as there is a disagreement, instead of having to compute the whole sum before doing the comparisons.

Below are a few alternatives. Note that I have moved the allocation of the large arrays to outside of the function.

function f2(array_1, array_2, sum_target)
    solutions = Tuple{eltype(array_1), eltype(array_2)}[]

    iterator_over_combinations = Iterators.product(array_1, array_2)

    for combination in iterator_over_combinations
        if all(sum(combination) .== sum_target)
            push!(solutions, combination)
        end
    end

    solutions
end

function f3(array_1, array_2, sum_target)
    solutions = Tuple{eltype(array_1), eltype(array_2)}[]

    iterator_over_combinations = Iterators.product(array_1, array_2)

    for combination in iterator_over_combinations
        success = all(
            vs -> vs[1] + vs[2] == vs[3],
            zip(combination..., sum_target),
        )
        if success
            push!(solutions, combination)
        end
    end

    solutions
end

function f4(array_1, array_2, sum_target)
    solutions = Tuple{eltype(array_1), eltype(array_2)}[]

    iterator_over_combinations = Iterators.product(array_1, array_2)

    for combination in iterator_over_combinations
        success = true
        a, b = combination
        for i ∈ eachindex(sum_target)
            if a[i] + b[i] ≠ sum_target[i]
                success = false
                break
            end
        end
        if success
            push!(solutions, combination)
        end
    end

    solutions
end

n_elements = 50
m = 500
array_1 = [Int64.(rand(Int8, m)) for i in 1:n_elements]
array_2 = [Int64.(rand(Int8, m)) for i in 1:n_elements]
sum_target = Int64.(rand(Int8, m))

@btime f()  # 5.152 ms (10205 allocations: 20.98 MiB)

@btime f2($array_1, $array_2, $sum_target)  # 5.142 ms (10001 allocations: 20.52 MiB)
@btime f3($array_1, $array_2, $sum_target)  # 14.733 μs (1 allocation: 64 bytes)
@btime f4($array_1, $array_2, $sum_target)  # 10.009 μs (1 allocation: 64 bytes)

Note that f2 is nearly the same as f, and the performance is the same, while the f3 and f4 variants are almost 3 orders of magnitude faster. Moreover, f4 is (in my opinion) more readable and ends up being the fastest option.

Topic		Replies	Views
Accumulation in the for loop General Usage question	17	2133	January 21, 2022
Sum isn't faster than a loop? New to Julia	11	1852	June 24, 2020
Question: Improve the speed of for loop Performance question	20	1580	April 12, 2023
Dot vs. array comprehension vs. loop New to Julia performance , integral	4	901	July 4, 2021
Summing a vector of vectors in a non-allocating way Performance memory-allocation , arrays	10	2619	November 27, 2021

For loop optimization

Related topics