Most performant way to perform matrix and vector calculations in a function

Hi there everone. Currently I have the following function:

function start_constraints(S, B, M, V, R, X, values_matrix, risk_vec)
    for i in 1:S
        for m in 1:M
            values_matrix[i, m] = sum(X[i, j] * V[m][j] for j in 1:B)
        end
        risk_vec[i] = sum(X[i, j] * R[j] for j in 1:B)
    end
    return values_matrix, risk_vec
end

Which receives some sizes (S and B), a vector of vectors of ints (V) of size B, a vector of ints (R) of size B, a binary matrix X (which has S rows, B columns) and the results are stored in a matrix (values_matrix) that stores the results for each i in S and m in M and a vector of size S.
All of the values are integers. M is a small number (3), the individual entries on V and R are not very large integers (ranging from 20-200).

I am trying to speed up this calculation and have arrived at the following:

function start_constraints_optimized_v3(S, B, M, V, R, X, values_matrix, risk_vec)
    @inbounds for i in 1:S
        risk_vec[i] = 0
        for m in 1:M
            tmp = 0
            @simd for j in 1:B
                tmp += X[i, j] * V[m][j]
            end
            values_matrix[i, m] = tmp
        end

        @simd for j in 1:B
            risk_vec[i] += X[i, j] * R[j]
        end
    end
    return values_matrix, risk_vec
end

I must say that due to some other parts of the code, values_matrix and risk_vec are “polluted” so thats why they are setted to 0 in this version of the function. I have achieved some gains with this approach, however I am wondering if this is the absolute best I can do. I tried using LoopVectorization but I am doing something wrong because it doesn’t work:


@inline function prepare_values_risk_vec(S, M, values_matrix, risk_vec)
    @inbounds for i in 1:S
        risk_vec[i] = 0
        for m in 1:M
            values_matrix[i, m] = 0
        end
    end
    return values_matrix, risk_vec
end

function start_constraints_optimized_v4(S, B, M, V, R, X, values_matrix, risk_vec)
    values_matrix, risk_vec = prepare_values_risk_vec(S, M, values_matrix, risk_vec)
    @turbo for i in 1:S
        for j in 1:B
            x_val = X[i, j]
            risk_vec[i] += x_val * R[j]
            for m in 1:M
                values_matrix[i, m] += x_val * V[m][j]
            end
        end
    end
    return values_matrix, risk_vec
end

Any other performance tips or tricks or a rewrite? I can provide some working values for all of the parameters of the function if needed. Thank you very much!

I have only skimmed your code, so I might be missing something, but why do

instead of using matrix-vector and matrix-matrix BLAS routines
mul!(risk_vec, X, R) and so on?

Secondly, passing the dimensions of the arrays explicitly seems superfluous, when they can be determined with size, but that of course won’t help improve performance, just usability.

2 Likes

Should I then change the structure of my values_matrix? I can’t seem to find how to fit the mul! function in with that. mul! in risk_vec does make the code marginally faster but I assume most gains would be achieved in the values_matrix calculation.

I would try (essentially doing what @skleinbo suggested)

function start_constraints(S, B, M, V, R, X, values_matrix, risk_vec)
    for m in eachindex(V)
        mul!(view(values_matrix,:,m), X, V[m])
    end
    mul!(risk_vec, X, R)
    return values_matrix, risk_vec
end

This kind of raises the question why V is a vector of vectors and not a matrix in the first place. If V were a vector, we could just do mul!(values_matrix, X, V) without needing any loop.

2 Likes

Thank you very much, that definitely improves performance!
As to why, well I guess I’d have to ask to my version of roughly 2 years ago. I guess I found it more ergonomic at the time to represent some constraints like that and all of my problem instances have been generated with the V parameter as a vector of vectors so I guess it’s too late now to change it.
I will evaluate changing V to a matrix in only this specific context to see if it’s worth it. But that definitely introduces a very good improvement and it’s quite a big step in the right direction. Thank you so much once again.

In the end I rewrote my code to remove the call to this function. After all, the fastest computation is the one you don’t perform at all.