Optimizing matrix multiplication code and being compatible with autodiff

I see. My experience with Turing models has been that, with ReverseDiff, I can allocate an Array{T} and update its elements on each iteration. The same model fails with Zygote, though, presumably because Zygote does not support mutating arrays.
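
To make that concrete outside of Turing, here is a minimal sketch of the pattern I mean. The function names (sumsq_mutating, sumsq_buffer) and the toy computation are made up for illustration, and I am assuming the non-mutating workaround on the Zygote side is Zygote.Buffer. I have not checked this against every package version, so treat it as a sketch rather than a recipe.

```julia
using ReverseDiff, Zygote

# Hypothetical example: allocate an Array{T} and fill it element by element.
function sumsq_mutating(x::AbstractVector{T}) where {T}
    y = Array{T}(undef, length(x))
    for i in eachindex(x)
        y[i] = x[i]^2          # in-place update via setindex!
    end
    return sum(y)
end

# Same computation, but writing into a Zygote.Buffer instead of a plain Array.
function sumsq_buffer(x::AbstractVector)
    buf = Zygote.Buffer(x)     # write-once scratch space shaped like x
    for i in eachindex(x)
        buf[i] = x[i]^2
    end
    return sum(copy(buf))      # copy(buf) turns the buffer back into an array
end

x = [1.0, 2.0, 3.0]

# ReverseDiff traces through the element-wise writes.
g_rd = ReverseDiff.gradient(sumsq_mutating, x)

# Zygote is expected to throw on the setindex! call in the mutating version.
try
    Zygote.gradient(sumsq_mutating, x)
catch err
    println("Zygote failed on the mutating version: ", typeof(err))
end

# The Buffer version should differentiate under Zygote.
g_zy = Zygote.gradient(sumsq_buffer, x)[1]
```

Both g_rd and g_zy should be the gradient of sum(x.^2), i.e. 2x. Inside a Turing model the Array{T} would normally come from the usual ::Type{T} model argument rather than from the input vector, but the mutation issue is the same.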