Can I make this function even faster?

I have the following function, which will be evaluated thousands of times in a for loop. It generates a vector whose elements are computed from a few entries of two different matrices, selected by the indices in H_sim_market. Is there a smarter way to write this function?

function fun_1_b(z_agent_k_matrix::Matrix{Float64}, z_agent_kk_matrix::Matrix{Float64}, H_sim_market::Vector{CartesianIndex{2}})
    z_1_b = Vector{Float64}(undef, length(H_sim_market))
    counter = 1
    for (i,j) in Tuple.(H_sim_market)
        z_1_b[counter] = z_agent_k_matrix[i,j] * (sum(z_agent_kk_matrix[:,j])-z_agent_kk_matrix[i,j])
        counter += 1
    end
    return z_1_b
end

One quick performance tip is to move the allocation of z_1_b outside of the function and pass it in to fun_1_b! (note the rename, so you know it’s mutating). Your allocations should go down dramatically, and you should see some gain (please post benchmarks!).

The advantage is that since you’re mutating z_1_b you don’t even have to return it 🙂, though this shouldn’t impact performance either way.
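
A minimal sketch of what that might look like (the name fun_1_b! and the caller-side allocation below are just my guess at the shape, not the original code):

function fun_1_b!(z_1_b::Vector{Float64}, z_agent_k_matrix::Matrix{Float64},
                  z_agent_kk_matrix::Matrix{Float64}, H_sim_market::Vector{CartesianIndex{2}})
    for (counter, ind) in enumerate(H_sim_market)
        i, j = Tuple(ind)
        z_1_b[counter] = z_agent_k_matrix[i,j] * (sum(z_agent_kk_matrix[:,j]) - z_agent_kk_matrix[i,j])
    end
    return z_1_b   # optional, as noted above
end

# allocate once, outside the hot loop, and reuse the buffer on every call
z_1_b = Vector{Float64}(undef, length(H_sim_market))
fun_1_b!(z_1_b, z_agent_k_matrix, z_agent_kk_matrix, H_sim_market)
# benchmark with e.g. BenchmarkTools:
# @btime fun_1_b!($z_1_b, $z_agent_k_matrix, $z_agent_kk_matrix, $H_sim_market)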

You could split this into two loops: first compute the sum sum(z_agent_kk_matrix[:,j]) for each column just once, using a view (sum(view(z_agent_kk_matrix,:,j))), and then reuse those sums in the loop over H_sim_market.
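
One way to write that (a sketch, assuming z_1_b has already been allocated as in the original function):

# first pass: column sums computed once per j, via views, so no column copies are made
col_sums = [sum(view(z_agent_kk_matrix, :, j)) for j in axes(z_agent_kk_matrix, 2)]
# second pass: one multiplication and subtraction per entry of H_sim_market
for (counter, ind) in enumerate(H_sim_market)
    i, j = Tuple(ind)
    z_1_b[counter] = z_agent_k_matrix[i,j] * (col_sums[j] - z_agent_kk_matrix[i,j])
end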

You should provide example input for the function to make it possible to test, and thereby much easier to improve.

(Also, long and similar-looking variable names make the code hard to read: z_agent_k_matrix and z_agent_kk_matrix are almost visually indistinguishable. Short variable names are preferable unless the names provide some insight into the algorithm, which does not seem to be the case here.)
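
For instance, something along these lines (made-up sizes and values, purely to have something runnable) would already help:

z_agent_k_matrix  = rand(10, 5)
z_agent_kk_matrix = rand(10, 5)
H_sim_market      = [CartesianIndex(rand(1:10), rand(1:5)) for _ in 1:20]
fun_1_b(z_agent_k_matrix, z_agent_kk_matrix, H_sim_market)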

The usual advice is to reduce allocations, especially in a loop. An array slice such as a[i:j] or a[:,j] is, by default, allocating: it makes a copy of the data rather than a view into the parent array.

function fun_1_b(z_agent_k_matrix::Matrix{Float64}, z_agent_kk_matrix::Matrix{Float64}, H_sim_market::Vector{CartesianIndex{2}})
    # assuming that we *want* to allocate the output
    z_1_b = similar(H_sim_market, Float64)
    # lazy per-column views (eachcol copies nothing; each element is a view of one column)
    z_agent_kk_cols = eachcol(z_agent_kk_matrix)
    for counter in eachindex(H_sim_market)
        # destructure indices here to avoid allocation in `Tuple.(H)`
        i, j = Tuple(H_sim_market[counter])
        z_1_b[counter] = z_agent_k_matrix[i,j] * (sum(z_agent_kk_cols[j])-z_agent_kk_matrix[i,j])
    end
    return z_1_b
end

If H_sim_market contains all indices of z_agent_kk_matrix, or at least each column multiple times, then it makes sense to precompute all column sums:

...
z_agent_kk_sums = sum(z_agent_kk_matrix, dims=1)
for ...
    ...
    z_1_b[counter] = z_agent_k_matrix[i,j] * (z_agent_kk_sums[j]-z_agent_kk_matrix[i,j])
end

I took the liberty of rewriting the function with shorter variable names, just to show how much easier it is to read. I couldn’t tell what it was doing before I replaced the variable names. (I also changed H’s type to AbstractVector, since otherwise it won’t accept a CartesianIndices object):

function foo(A::Matrix, B::Matrix, H::AbstractVector{CartesianIndex{2}})
    z = similar(A, size(H))
    for (i, ind) in enumerate(H)
        z[i] = A[ind] * ( sum(B[:, ind[2]]) - B[ind] )
    end
    return z
end

Note that you don’t have to unpack a CartesianIndex into a tuple. That is really the point of CartesianIndex: it makes multidimensional indexing simpler and more elegant.
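
For example, plain indexing already works with it:

A = rand(3, 4)
ind = CartesianIndex(2, 3)
A[ind] == A[2, 3]   # true: a CartesianIndex indexes the matrix directly
ind[2]              # 3: individual components are still available when needed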

Or, maybe just:

function bar(A, B, indices)
    [A[ind] * (sum(B[:, ind[2]]) - B[ind]) for ind in indices]
end
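
With some throwaway inputs (the names below are placeholders), this one-liner should give the same result as the original function:

A, B = rand(6, 4), rand(6, 4)
H = vec(collect(CartesianIndices(B)))   # every index of B, as a Vector{CartesianIndex{2}}
bar(A, B, H) ≈ fun_1_b(A, B, H)         # expected to be true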