# Most performant way to perform matrix and vector calculations in a function

Hi there everone. Currently I have the following function:

``````function start_constraints(S, B, M, V, R, X, values_matrix, risk_vec)
for i in 1:S
for m in 1:M
values_matrix[i, m] = sum(X[i, j] * V[m][j] for j in 1:B)
end
risk_vec[i] = sum(X[i, j] * R[j] for j in 1:B)
end
return values_matrix, risk_vec
end
``````

Which receives some sizes (S and B), a vector of vectors of ints (V) of size B, a vector of ints (R) of size B, a binary matrix X (which has S rows, B columns) and the results are stored in a matrix (values_matrix) that stores the results for each i in S and m in M and a vector of size S.
All of the values are integers. M is a small number (3), the individual entries on V and R are not very large integers (ranging from 20-200).

I am trying to speed up this calculation and have arrived at the following:

``````function start_constraints_optimized_v3(S, B, M, V, R, X, values_matrix, risk_vec)
@inbounds for i in 1:S
risk_vec[i] = 0
for m in 1:M
tmp = 0
@simd for j in 1:B
tmp += X[i, j] * V[m][j]
end
values_matrix[i, m] = tmp
end

@simd for j in 1:B
risk_vec[i] += X[i, j] * R[j]
end
end
return values_matrix, risk_vec
end
``````

I must say that due to some other parts of the code, values_matrix and risk_vec are â€śpollutedâ€ť so thats why they are setted to 0 in this version of the function. I have achieved some gains with this approach, however I am wondering if this is the absolute best I can do. I tried using LoopVectorization but I am doing something wrong because it doesnâ€™t work:

``````
@inline function prepare_values_risk_vec(S, M, values_matrix, risk_vec)
@inbounds for i in 1:S
risk_vec[i] = 0
for m in 1:M
values_matrix[i, m] = 0
end
end
return values_matrix, risk_vec
end

function start_constraints_optimized_v4(S, B, M, V, R, X, values_matrix, risk_vec)
values_matrix, risk_vec = prepare_values_risk_vec(S, M, values_matrix, risk_vec)
@turbo for i in 1:S
for j in 1:B
x_val = X[i, j]
risk_vec[i] += x_val * R[j]
for m in 1:M
values_matrix[i, m] += x_val * V[m][j]
end
end
end
return values_matrix, risk_vec
end
``````

Any other performance tips or tricks or a rewrite? I can provide some working values for all of the parameters of the function if needed. Thank you very much!

I have only skimmed your code, so I might be missing something, but why do

instead of using matrix-vector and matrix-matrix BLAS routines
`mul!(risk_vec, X, R)` and so on?

Secondly, passing the dimensions of the arrays explicitly seems superfluous, when they can be determined with `size`, but that of course wonâ€™t help improve performance, just usability.

2 Likes

Should I then change the structure of my values_matrix? I canâ€™t seem to find how to fit the mul! function in with that. mul! in risk_vec does make the code marginally faster but I assume most gains would be achieved in the values_matrix calculation.

I would try (essentially doing what @skleinbo suggested)

``````function start_constraints(S, B, M, V, R, X, values_matrix, risk_vec)
for m in eachindex(V)
mul!(view(values_matrix,:,m), X, V[m])
end
mul!(risk_vec, X, R)
return values_matrix, risk_vec
end
``````

This kind of raises the question why V is a vector of vectors and not a matrix in the first place. If V were a vector, we could just do `mul!(values_matrix, X, V)` without needing any loop.

2 Likes

Thank you very much, that definitely improves performance!
As to why, well I guess Iâ€™d have to ask to my version of roughly 2 years ago. I guess I found it more ergonomic at the time to represent some constraints like that and all of my problem instances have been generated with the V parameter as a vector of vectors so I guess itâ€™s too late now to change it.
I will evaluate changing V to a matrix in only this specific context to see if itâ€™s worth it. But that definitely introduces a very good improvement and itâ€™s quite a big step in the right direction. Thank you so much once again.

In the end I rewrote my code to remove the call to this function. After all, the fastest computation is the one you donâ€™t perform at all.