Can I make this function even faster?

I have the following function, which will be evaluated thousands of times in a for loop. It generates a vector whose elements are computed from a few entries of two different matrices, selected by the indices in H_sim_market. Is there a smarter way to write this function?

function fun_1_b(z_agent_k_matrix::Matrix{Float64}, z_agent_kk_matrix::Matrix{Float64}, H_sim_market::Vector{CartesianIndex{2}})
    z_1_b = Vector{Float64}(undef, length(H_sim_market))
    counter = 1
    for (i,j) in Tuple.(H_sim_market)
        z_1_b[counter] = z_agent_k_matrix[i,j] * (sum(z_agent_kk_matrix[:,j])-z_agent_kk_matrix[i,j])
        counter += 1
    end
    return z_1_b
end

One quick performance tip is to move the allocation of z_1_b outside of the function and pass it in to fun_1_b! (note the rename, so you know it’s mutating). Your allocations should go down dramatically, and you should see some gain (please post benchmarks!).

The advantage is that since you’re mutating z_1_b you don’t even have to return it 🙂, though this shouldn’t impact performance either way.
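
A minimal sketch of what that might look like (the name fun_1_b! and the caller-side allocation below are just my guess at the shape, not the original code):

function fun_1_b!(z_1_b::Vector{Float64}, z_agent_k_matrix::Matrix{Float64},
                  z_agent_kk_matrix::Matrix{Float64}, H_sim_market::Vector{CartesianIndex{2}})
    for (counter, ind) in enumerate(H_sim_market)
        i, j = Tuple(ind)
        z_1_b[counter] = z_agent_k_matrix[i,j] * (sum(z_agent_kk_matrix[:,j]) - z_agent_kk_matrix[i,j])
    end
    return z_1_b   # optional, as noted above
end

# allocate once, outside the hot loop, and reuse the buffer on every call
z_1_b = Vector{Float64}(undef, length(H_sim_market))
fun_1_b!(z_1_b, z_agent_k_matrix, z_agent_kk_matrix, H_sim_market)
# benchmark with e.g. BenchmarkTools:
# @btime fun_1_b!($z_1_b, $z_agent_k_matrix, $z_agent_kk_matrix, $H_sim_market)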

You could split this into two loops: first compute the sum sum(z_agent_kk_matrix[:,j]) for each column just once, using a view (sum(view(z_agent_kk_matrix,:,j))), and then reuse those sums in the loop over H_sim_market.
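
One way to write that (a sketch, assuming z_1_b has already been allocated as in the original function):

# first pass: column sums computed once per j, via views, so no column copies are made
col_sums = [sum(view(z_agent_kk_matrix, :, j)) for j in axes(z_agent_kk_matrix, 2)]
# second pass: one multiplication and subtraction per entry of H_sim_market
for (counter, ind) in enumerate(H_sim_market)
    i, j = Tuple(ind)
    z_1_b[counter] = z_agent_k_matrix[i,j] * (col_sums[j] - z_agent_kk_matrix[i,j])
end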

You should provide example input for the function to make it possible to test, and thereby much easier to improve.

(Also, long and similar-looking variable names make the code hard to read: z_agent_k_matrix and z_agent_kk_matrix are almost visually indistinguishable. Short variable names are preferable unless the names provide some insight into the algorithm, which does not seem to be the case here.)
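
For instance, something along these lines (made-up sizes and values, purely to have something runnable) would already help:

z_agent_k_matrix  = rand(10, 5)
z_agent_kk_matrix = rand(10, 5)
H_sim_market      = [CartesianIndex(rand(1:10), rand(1:5)) for _ in 1:20]
fun_1_b(z_agent_k_matrix, z_agent_kk_matrix, H_sim_market)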

The usual advice is to reduce allocations, especially in a loop. An array slice such as a[i:j] or a[:,j] is, by default, allocating: it makes a copy of the data rather than a view into the parent array.

function fun_1_b(z_agent_k_matrix::Matrix{Float64}, z_agent_kk_matrix::Matrix{Float64}, H_sim_market::Vector{CartesianIndex{2}})
    # assuming that we *want* to allocate the output
    z_1_b = similar(H_sim_market, Float64)
    # lazy per-column views (eachcol copies nothing; each element is a view of one column)
    z_agent_kk_cols = eachcol(z_agent_kk_matrix)
    for counter in eachindex(H_sim_market)
        # destructure indices here to avoid allocation in `Tuple.(H)`
        i, j = Tuple(H_sim_market[counter])
        z_1_b[counter] = z_agent_k_matrix[i,j] * (sum(z_agent_kk_cols[j])-z_agent_kk_matrix[i,j])
    end
    return z_1_b
end

If H_sim_market contains all indices of z_agent_kk_matrix, or at least each column multiple times, then it makes sense to precompute all column sums:

...
z_agent_kk_sums = sum(z_agent_kk_matrix, dims=1)
for ...
    ...
    z_1_b[counter] = z_agent_k_matrix[i,j] * (z_agent_kk_sums[j]-z_agent_kk_matrix[i,j])
end

I took the liberty of rewriting the function with shorter variable names, just to show how much easier it is to read. I couldn’t tell what it was doing before I replaced the variable names. (I also changed H’s type to AbstractVector, since otherwise it won’t accept a CartesianIndices object):

function foo(A::Matrix, B::Matrix, H::AbstractVector{CartesianIndex{2}})
    z = similar(A, size(H))
    for (i, ind) in enumerate(H)
        z[i] = A[ind] * ( sum(B[:, ind[2]]) - B[ind] )
    end
    return z
end

Note that you don’t have to unpack a CartesianIndex into a tuple. That is really the point of CartesianIndex: it makes multidimensional indexing simpler and more elegant.
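
For example, plain indexing already works with it:

A = rand(3, 4)
ind = CartesianIndex(2, 3)
A[ind] == A[2, 3]   # true: a CartesianIndex indexes the matrix directly
ind[2]              # 3: individual components are still available when needed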

Or, maybe just:

function bar(A, B, indices)
    [A[ind] * (sum(B[:, ind[2]]) - B[ind]) for ind in indices]
end
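
With some throwaway inputs (the names below are placeholders), this one-liner should give the same result as the original function:

A, B = rand(6, 4), rand(6, 4)
H = vec(collect(CartesianIndices(B)))   # every index of B, as a Vector{CartesianIndex{2}}
bar(A, B, H) ≈ fun_1_b(A, B, H)         # expected to be true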