Can I make this function even faster?

I have the following function which will be evaluated thousands of times in a for loop. This function will generate a vector. Each element in this vector is computed by some algebra of some elements in two different matrices, which are decided by the index in `H_sim_market`. I wonder if there is a smarter way to code this function?

``````function fun_1_b(z_agent_k_matrix::Matrix{Float64}, z_agent_kk_matrix::Matrix{Float64}, H_sim_market::Vector{CartesianIndex{2}})
z_1_b = Vector{Float64}(undef, length(H_sim_market))
counter = 1
for (i,j) in Tuple.(H_sim_market)
z_1_b[counter] = z_agent_k_matrix[i,j] * (sum(z_agent_kk_matrix[:,j])-z_agent_kk_matrix[i,j])
counter += 1
end
return z_1_b
end
``````

One quick performance tip is to move the allocation of `z_1_b` outside of the function and pass it in to `fun_1_b!` (note: rename it so you know itâ€™s mutating). Your allocations should go down dramatically, and you should see some gain (please post benchmarks!)

Advantage is that since youâ€™re mutating `z_1_b` you donâ€™t even have to return it , though this shouldnâ€™t impact performance.

2 Likes

You should split the two loops and compute the sum `sum(z_agent_kk_matrix[:,j])` using a view ( `sum(view(z_agent_kk_matrix,:,j))`) just once per `j` in the outer loop.

1 Like

You should provide example input for the function to make it possible to test, and thereby much easier to improve.

(Also, long and similar-looking variable names make the code hard to read: `z_agent_k_matrix` and `z_agent_kk_matrix` are almost visually indistinguishable. Short variable names are preferable unless the names provide some insight into the algorithm, which does not seem to be the case here.)

2 Likes

The usual advice is to reduce allocations, especially in a loop. An array slice `a[i:j]` is, by default, allocating and makes a copy of data rather than view into the parent array.

``````function fun_1_b(z_agent_k_matrix::Matrix{Float64}, z_agent_kk_matrix::Matrix{Float64}, H_sim_market::Vector{CartesianIndex{2}})
# assuming that we *want* to allocate the output
z_1_b = similar(H_sim_market, Float64)
# pre-allocate views for columns
z_agent_kk_cols = eachcol(z_agent_kk_matrix)
for counter in eachindex(H_sim_market)
# destructure indices here to avoid allocation in `Tuple.(H)`
i, j = Tuple(H_sim_market[counter])
z_1_b[counter] = z_agent_k_matrix[i,j] * (sum(z_agent_kk_cols[j])-z_agent_kk_matrix[i,j])
end
return z_1_b
end
``````

If `H_sim_market` contains all indices of `z_agent_kk_matrix`, or at least each column multiple times, then it makes sense to precompute all column sums:

``````...
z_agent_kk_sums = sum(z_agent_kk_matrix, dims=1)
for ...
...
z_1_b[counter] = z_agent_k_matrix[i,j] * (z_agent_kk_sums[j]-z_agent_kk_matrix[i,j])
end
``````

I took the liberty to rewrite the function with shorter variable names, just to show how much easier it is to read. I couldnâ€™t see what it was doing before I replaced the variable names (I also changed `H`'s type to `AbstractVector`, since otherwise it wonâ€™t accept `CartesianIndices`):

``````function foo(A::Matrix, B::Matrix, H::AbstractVector{CartesianIndex{2}})
z = similar(A, size(H))
for (i, ind) in enumerate(H)
z[i] = A[ind] * ( sum(B[:, ind[2]]) - B[ind] )
end
return z
end
``````

Note that you donâ€™t have to unpack `CartesianIndex` into a tuple. This is the point of `CartesianIndex` really, to make multidimensional indexing simpler and more elegant.

Or, maybe just:

``````function bar(A, B, indices)
[A[ind] * (sum(B[:, ind[2]]) - B[ind]) for ind in indices]
end
``````
1 Like