Ayo ayo, time for everyone’s favorite Julia game: MakeItFast!
I’m calculating a discounted value given a controlled Markov process, and for that I construct a matrix betaFP to be used in the context (I-betaFP)\u. There are J choices, nX states, conditional transition matrices F[j] nX-by-nX and conditional choice probabilities P[j] nX-by-1. betaFP is calculated according to the formula: beta(P_1F_1+P_2F_2+…+P_JF_J), where [*] is the hadamard product (element by element, such that P[j][x] is multiplied across all columns in row x of F[j]. For sparse F[j] (large problems) mapreduce seems to be the best bet, but for dense F[j] it isn’t.
Below are two possibilities. I multiply beta directly to P[j] as these are only J*nX multiplications where it would be nX^2 if I multiplied it to the matrix in the end.
Note, my life does not depend on the efficiency of this call (as you saw above, I’m calling \ right after this!), but I just got curious, as I got a vast improvement by doing f1! over
sum(beta*P[j].*F[j] for j in eachindex(P, F)) (would love to implement a
sum!(betaFP, beta*P[j].*F[j] for j in eachindex(P, F)) but I’m not sure if that is wanted in Base, and I havn’t looked at that part of the code base.
function f1!(βFP, β, P, F) J = length(P) βFP .= β*P.*F @inbounds for a = 2:J βFP .+= β*P[a].*F[a] end end function f2!(βFP, β, P, F) J = length(P) nX = length(P) fill!(βFP, 0.0) @inbounds for j = 1:J bpj = β*P[j] Fj = F[j] for x′ = 1:nX for x = 1:nX βFP[x, x′] += bpj[x]*Fj[x, x′] end end end end J=2 nX=400 B = rand(nX,nX) b=0.99 P=[rand(nX) for i=1:J] F=[rand(nX,nX) for i=1:J] f1!(B, b, P, F)
The unrolled loop a bit faster, sometimes! I’m currently using a heuristic like
if nX < 700 || J > 10 call
f2! else call
Is there anything that is obviously stupid in your eyes? I would love to use sum and a generator but that is too slow (100-200% longer runtimes in some cases). @simd does nothing, I guess it is automatically applied here?
I guess there’s simply a limit to how fast a computer can compute this, I was just curious if anyone had any insights. Thanks