I have several functions that each update a number of large arrays within a single loop. I learned earlier that fusing the updates into one loop like this is often good for performance. Now I am facing another issue: I need a few variants of these functions that perform the updates only slightly differently. Here is a generic example:
module SomeModule
# Some declarations and preallocations of variable arrays, 
# let's say the following arrays came out of this:
#   A, B, C, D, and E. 
# All have the same size.
function modify_arrays1!(A::T, B::T, C::S, D::T, E::T) where {
    T <: Array{Float64,2},
    S <: BitArray{2}
    }
    @inbounds @simd for i in eachindex(A)
        # Update A[i], B[i], C[i], D[i], and E[i] with some long (but
        #   simple) arithmetic and boolean checks between them.
    end
end
function modify_arrays2!(A::T, B::T, C::S, D::T, E::T) where {
    T <: Array{Float64,2},
    S <: BitArray{2}
    }
    @inbounds @simd for i in eachindex(A)
        # Same updates as in `modify_arrays1!`,
        #   but with a two-line difference,
        #   out of, say, around 10 or 20.
    end
end
function modify_arrays3!(A::T, B::T, C::S, D::T) where {
    T <: Array{Float64,2},
    S <: BitArray{2}
    }
    @inbounds @simd for i in eachindex(A)
        # Still the same updates as in `modify_arrays1!`,
        #   but now E is absent from the process, leading
        #   to another 1 or 2 lines of difference in code,
        #   out of, again, around 10 or 20.
    end
end
end # module SomeModule
These functions have to run many thousands of times as part of a larger iterative process. Is it better to factor out the few differing lines (which, by the way, still involve 2 or 3 of the input arrays) into their own functions (e.g. function few_lines(A, B, D) ... end) and call them from a unified modify_arrays! function (perhaps with E becoming an optional argument), or to leave the functions as they are, with many duplicated lines between them?
Or, is the answer: “It depends”?
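To make the first option concrete, here is a minimal sketch of the kind of unified function I have in mind. The names update_d_v1 and update_d_v2 and all of the arithmetic are placeholders invented for illustration, not my real updates, and I left out @simd to keep the sketch simple:

```julia
# Hypothetical stand-ins for the few lines that differ between variants.
# Each receives the current element values and returns the new D[i].
update_d_v1(a, b, d) = d + a * b
update_d_v2(a, b, d) = d - a * b

# Unified loop: the shared lines are written once, and the differing
# lines are delegated to the function passed as the first argument.
function modify_arrays!(update_d::F, A::T, B::T, C::S, D::T) where {
    F,
    T <: Array{Float64,2},
    S <: BitArray{2}
    }
    # eachindex with several arrays throws if their indices do not match,
    # which is what makes the @inbounds below safe.
    @inbounds for i in eachindex(A, B, C, D)
        # Shared part (placeholder arithmetic, identical in every variant).
        A[i] += B[i]
        C[i] = A[i] > 0.0
        # Variant-specific part.
        D[i] = update_d(A[i], B[i], D[i])
    end
    return nothing
end

# Small demo arrays; the real ones are much larger.
A = ones(2, 2); B = fill(2.0, 2, 2); C = falses(2, 2); D = zeros(2, 2)
modify_arrays!(update_d_v1, A, B, C, D)   # first variant
modify_arrays!(update_d_v2, A, B, C, D)   # second variant, same loop
```

What I cannot judge is whether calling the small function inside the loop costs anything compared with the hand-duplicated versions.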
I have also tried declaring the differing lines as expressions (Expr) and passing them as arguments to a unified modify_arrays! function, but occasionally (after a few hundred successful iterations) it gave me an UndefVarError for a variable name I used in the expression, and I couldn’t figure out why. At that point I had not yet been able to determine whether this approach gave any performance advantage or disadvantage.
(Side question: Is it better or worse, in terms of performance, to “combine” modify_arrays2! and modify_arrays3! to make use of multiple dispatch instead?)
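By “combine” I mean something like the following sketch, where both variants share one name and the number of arguments selects the method. The name modify_arrays_merged! and the loop bodies are hypothetical placeholders; only the two signatures matter here:

```julia
# Method with E: stands in for the updates of `modify_arrays2!`.
function modify_arrays_merged!(A::T, B::T, C::S, D::T, E::T) where {
    T <: Array{Float64,2},
    S <: BitArray{2}
    }
    @inbounds for i in eachindex(A, B, C, D, E)
        A[i] = B[i] + E[i]   # placeholder update that uses E
    end
    return nothing
end

# Method without E: stands in for the updates of `modify_arrays3!`.
function modify_arrays_merged!(A::T, B::T, C::S, D::T) where {
    T <: Array{Float64,2},
    S <: BitArray{2}
    }
    @inbounds for i in eachindex(A, B, C, D)
        A[i] = B[i]          # placeholder update without E
    end
    return nothing
end
```

The call site would then simply pass E or leave it out, e.g. modify_arrays_merged!(A, B, C, D, E) versus modify_arrays_merged!(A, B, C, D).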
Please advise. Any side comments about other aspects of the generic example above are also very welcome. Thank you!
