Hi there everyone. Currently I have the following function:
```julia
function start_constraints(S, B, M, V, R, X, values_matrix, risk_vec)
    for i in 1:S
        for m in 1:M
            values_matrix[i, m] = sum(X[i, j] * V[m][j] for j in 1:B)
        end
        risk_vec[i] = sum(X[i, j] * R[j] for j in 1:B)
    end
    return values_matrix, risk_vec
end
```
It receives the sizes S, B and M; a vector V of M integer vectors, each of length B; an integer vector R of length B; and a binary matrix X with S rows and B columns. The results are stored in values_matrix (an S×M matrix with one entry for each i in 1:S and m in 1:M) and in risk_vec (a vector of length S).
All of the values are integers. M is small (3), and the individual entries of V and R are not very large integers (ranging from 20 to 200).
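Because X is S×B and each V[m] has length B, the loops above compute two matrix products: values_matrix equals X * Vmat, where Vmat = reduce(hcat, V) is the B×M matrix whose column m is V[m], and risk_vec equals X * R. A minimal sketch of that reformulation using the in-place `mul!` from the LinearAlgebra standard library is below. The function name `start_constraints_mul!` and the matrix `Vmat` are hypothetical, the data is random, and it assumes V can be converted to a matrix once, outside the hot path.

```julia
using LinearAlgebra

# Original loop version, reproduced for comparison.
function start_constraints(S, B, M, V, R, X, values_matrix, risk_vec)
    for i in 1:S
        for m in 1:M
            values_matrix[i, m] = sum(X[i, j] * V[m][j] for j in 1:B)
        end
        risk_vec[i] = sum(X[i, j] * R[j] for j in 1:B)
    end
    return values_matrix, risk_vec
end

# Hypothetical matrix formulation: Vmat is B×M with column m equal to V[m].
# mul! overwrites its output, so the "polluted" buffers need no explicit reset.
function start_constraints_mul!(values_matrix, risk_vec, X, Vmat, R)
    mul!(values_matrix, X, Vmat)   # (S×B) * (B×M) -> S×M
    mul!(risk_vec, X, R)           # (S×B) * (B)   -> S
    return values_matrix, risk_vec
end

# Made-up data with the value ranges described above.
S, B, M = 50, 40, 3
X = rand(0:1, S, B)
V = [rand(20:200, B) for _ in 1:M]
R = rand(20:200, B)
Vmat = reduce(hcat, V)             # build once, outside the hot loop

vm1, rv1 = start_constraints(S, B, M, V, R, X, zeros(Int, S, M), zeros(Int, S))
vm2, rv2 = start_constraints_mul!(fill(-1, S, M), fill(-1, S), X, Vmat, R)
@assert vm1 == vm2 && rv1 == rv2
```

For integer arrays, `mul!` uses Julia's generic matrix multiplication rather than BLAS; converting X and Vmat to Float64 would hand the work to BLAS, and the integer sums stay exactly representable as long as they remain below 2^53.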
I am trying to speed up this calculation and have arrived at the following:
```julia
function start_constraints_optimized_v3(S, B, M, V, R, X, values_matrix, risk_vec)
    @inbounds for i in 1:S
        risk_vec[i] = 0
        for m in 1:M
            tmp = 0
            @simd for j in 1:B
                tmp += X[i, j] * V[m][j]
            end
            values_matrix[i, m] = tmp
        end
        @simd for j in 1:B
            risk_vec[i] += X[i, j] * R[j]
        end
    end
    return values_matrix, risk_vec
end
```
I should mention that, due to some other parts of the code, values_matrix and risk_vec are "polluted", which is why this version initializes them explicitly before accumulating. I have achieved some gains with this approach, but I am wondering whether this is the best I can do. I also tried LoopVectorization, but I am doing something wrong because it doesn't work:
```julia
using LoopVectorization  # provides @turbo

@inline function prepare_values_risk_vec(S, M, values_matrix, risk_vec)
    @inbounds for i in 1:S
        risk_vec[i] = 0
        for m in 1:M
            values_matrix[i, m] = 0
        end
    end
    return values_matrix, risk_vec
end

function start_constraints_optimized_v4(S, B, M, V, R, X, values_matrix, risk_vec)
    values_matrix, risk_vec = prepare_values_risk_vec(S, M, values_matrix, risk_vec)
    @turbo for i in 1:S
        for j in 1:B
            x_val = X[i, j]
            risk_vec[i] += x_val * R[j]
            for m in 1:M
                values_matrix[i, m] += x_val * V[m][j]
            end
        end
    end
    return values_matrix, risk_vec
end
```
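Julia arrays are column-major, so a loop with j innermost over X[i, j] walks X with a stride of S elements. The sketch below swaps the loop order (j outer, i inner) so the innermost loops read contiguous columns of X and values_matrix, and it stores V as a plain B×M matrix `Vmat` instead of a vector of vectors. The function name is hypothetical. As far as I know, LoopVectorization's `@turbo` expects loops over plain arrays of primitive element types, so indexing a `Vector{Vector{Int}}` as `V[m][j]` is a likely reason v4 fails (an assumption, since the error message is not shown). This sketch uses only `@inbounds`/`@simd` from Base.

```julia
# Hypothetical column-major variant: Vmat is B×M with Vmat[:, m] == V[m].
function start_constraints_colmajor!(values_matrix, risk_vec, X, Vmat, R)
    S, B = size(X)
    M = size(Vmat, 2)
    fill!(values_matrix, 0)   # reset the "polluted" buffers
    fill!(risk_vec, 0)
    @inbounds for j in 1:B
        r = R[j]
        # Innermost loop over i reads column j of X contiguously.
        @simd for i in 1:S
            risk_vec[i] += X[i, j] * r
        end
        for m in 1:M
            v = Vmat[j, m]
            @simd for i in 1:S
                values_matrix[i, m] += X[i, j] * v
            end
        end
    end
    return values_matrix, risk_vec
end

# Made-up data; the buffers start "polluted" with 7s.
S, B, M = 50, 40, 3
X = rand(0:1, S, B)
V = [rand(20:200, B) for _ in 1:M]
R = rand(20:200, B)
Vmat = reduce(hcat, V)

vm, rv = start_constraints_colmajor!(fill(7, S, M), fill(7, S), X, Vmat, R)
@assert vm == X * Vmat
@assert rv == X * R
```

Whether this beats the original loops or a `mul!` call depends on S and B, so benchmarking the candidates on the real data (for example with `@btime` from BenchmarkTools) is the reliable way to decide.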
Are there any other performance tips, tricks, or rewrites I could try? I can provide working values for all of the function's parameters if needed. Thank you very much!