I have the following function, which will be evaluated thousands of times in a for loop. It generates a vector in which each element is computed from some algebra on elements of two different matrices, selected by the indices in H_sim_market. I wonder if there is a smarter way to code this function?
function fun_1_b(z_agent_k_matrix::Matrix{Float64}, z_agent_kk_matrix::Matrix{Float64}, H_sim_market::Vector{CartesianIndex{2}})
    z_1_b = Vector{Float64}(undef, length(H_sim_market))
    counter = 1
    for (i, j) in Tuple.(H_sim_market)
        z_1_b[counter] = z_agent_k_matrix[i, j] * (sum(z_agent_kk_matrix[:, j]) - z_agent_kk_matrix[i, j])
        counter += 1
    end
    return z_1_b
end
One quick performance tip is to move the allocation of z_1_b outside of the function and pass it in to fun_1_b! (note: rename it with a ! so you know it's mutating). Your allocations should go down dramatically, and you should see some gain (please post benchmarks!).
An advantage is that since you're mutating z_1_b you don't even have to return it, though this shouldn't impact performance.
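A minimal sketch of that mutating variant (same algebra as the original; only the output allocation moves out of the function):

```julia
function fun_1_b!(z_1_b::Vector{Float64}, z_agent_k_matrix::Matrix{Float64},
                  z_agent_kk_matrix::Matrix{Float64}, H_sim_market::Vector{CartesianIndex{2}})
    for counter in eachindex(H_sim_market)
        i, j = Tuple(H_sim_market[counter])
        z_1_b[counter] = z_agent_k_matrix[i, j] *
            (sum(z_agent_kk_matrix[:, j]) - z_agent_kk_matrix[i, j])
    end
    return z_1_b
end

# Allocate once, then reuse across the thousands of calls:
# z_1_b = Vector{Float64}(undef, length(H_sim_market))
# fun_1_b!(z_1_b, z_agent_k_matrix, z_agent_kk_matrix, H_sim_market)
```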
You should split the loop in two and compute the sum sum(z_agent_kk_matrix[:,j]) with a view (sum(view(z_agent_kk_matrix, :, j))) just once per j in the outer loop.
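A sketch of that idea, assuming it is acceptable to first group the entries of H_sim_market by column (the grouping names here are made up for illustration):

```julia
function fun_1_b_grouped(A::Matrix{Float64}, B::Matrix{Float64}, H::Vector{CartesianIndex{2}})
    z = Vector{Float64}(undef, length(H))
    # Group positions in H by their column index j.
    bycol = Dict{Int,Vector{Int}}()
    for k in eachindex(H)
        push!(get!(bycol, H[k][2], Int[]), k)
    end
    for (j, ks) in bycol
        s = sum(view(B, :, j))   # column sum computed once per j, no copy
        for k in ks
            i = H[k][1]
            z[k] = A[i, j] * (s - B[i, j])
        end
    end
    return z
end
```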
You should provide example input for the function to make it possible to test, and thereby much easier to improve.
(Also, long and similar-looking variable names make the code hard to read: z_agent_k_matrix and z_agent_kk_matrix are almost visually indistinguishable. Short variable names are preferable unless the names provide some insight into the algorithm, which does not seem to be the case here.)
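Something like the following would do as example input (all sizes and values here are made up for illustration):

```julia
using Random
Random.seed!(1)
z_agent_k_matrix  = rand(10, 5)
z_agent_kk_matrix = rand(10, 5)
H_sim_market = [CartesianIndex(rand(1:10), rand(1:5)) for _ in 1:20]
z_1_b = fun_1_b(z_agent_k_matrix, z_agent_kk_matrix, H_sim_market)
```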
The usual advice is to reduce allocations, especially in a loop. An array slice a[i:j] is, by default, allocating: it makes a copy of the data rather than a view into the parent array.
function fun_1_b(z_agent_k_matrix::Matrix{Float64}, z_agent_kk_matrix::Matrix{Float64}, H_sim_market::Vector{CartesianIndex{2}})
    # assuming that we *want* to allocate the output
    z_1_b = similar(H_sim_market, Float64)
    # lazy views of the columns (no copies)
    z_agent_kk_cols = eachcol(z_agent_kk_matrix)
    for counter in eachindex(H_sim_market)
        # destructure the index here to avoid the allocation in `Tuple.(H_sim_market)`
        i, j = Tuple(H_sim_market[counter])
        z_1_b[counter] = z_agent_k_matrix[i, j] * (sum(z_agent_kk_cols[j]) - z_agent_kk_matrix[i, j])
    end
    return z_1_b
end
If H_sim_market contains all indices of z_agent_kk_matrix, or at least each column multiple times, then it makes sense to precompute all column sums:
...
z_agent_kk_sums = sum(z_agent_kk_matrix, dims=1)
for ...
    ...
    z_1_b[counter] = z_agent_k_matrix[i,j] * (z_agent_kk_sums[j] - z_agent_kk_matrix[i,j])
end
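Putting the pieces together, a complete sketch of that precomputed-sums version might look like this (the function name is made up; sum(B, dims=1) returns a 1×n matrix, which linear indexing with [j] handles fine):

```julia
function fun_1_b_presum(A::Matrix{Float64}, B::Matrix{Float64}, H::Vector{CartesianIndex{2}})
    z = Vector{Float64}(undef, length(H))
    colsums = sum(B, dims=1)   # one pass over B, reused for every index in H
    for k in eachindex(H)
        i, j = Tuple(H[k])
        z[k] = A[i, j] * (colsums[j] - B[i, j])
    end
    return z
end
```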
I took the liberty of rewriting the function with shorter variable names, just to show how much easier it is to read. I couldn't see what it was doing before I replaced the variable names. (I also changed H's type to AbstractVector, since otherwise it won't accept CartesianIndices.)
function foo(A::Matrix, B::Matrix, H::AbstractVector{CartesianIndex{2}})
    z = similar(A, size(H))
    for (i, ind) in enumerate(H)
        z[i] = A[ind] * (sum(B[:, ind[2]]) - B[ind])
    end
    return z
end
Note that you don’t have to unpack CartesianIndex into a tuple. This is the point of CartesianIndex really, to make multidimensional indexing simpler and more elegant.
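For instance:

```julia
A = reshape(1.0:12.0, 3, 4)
ind = CartesianIndex(2, 3)
A[ind] == A[2, 3]   # true: a CartesianIndex indexes an array directly
i, j = Tuple(ind)   # unpacking is possible, but often unnecessary
```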
Or, maybe just:
function bar(A, B, indices)
    [A[ind] * (sum(B[:, ind[2]]) - B[ind]) for ind in indices]
end
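Since benchmarks were requested above, here is a sketch of how one might compare the variants, assuming BenchmarkTools.jl is installed (sizes made up):

```julia
using BenchmarkTools

A = rand(100, 50)
B = rand(100, 50)
H = vec(collect(CartesianIndices(B)))  # every index, as a Vector{CartesianIndex{2}}

@btime foo($A, $B, $H);
@btime bar($A, $B, $H);
```

Interpolating the arguments with $ avoids benchmarking global-variable access, which would otherwise distort the timings.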