It’s possible there is some type instability happening. Here is your original code in a function:
function foo!(B, y, By)
N = length(y)
for i in 1:N
By[i] = 0.0
for j in 1:N
By[i] += B[i, j] * y[i]
end
end
end
And I get:
julia> @time foo!(B, y, By)
0.956673 seconds
Now let’s make it better. First, I simply change B[i, j]
to B[j, i]
to match Julia’s column-oriented matrices. Then I have:
julia> @time foo!(B, y, By)
0.031369 seconds
Now let me rewrite it a little:
- Instead of accessing
y[i]
and By[i]
in the inner loop, I pull them out to the outer loop so only B
needs to be accessed in the inner loop
- You don’t need to multiply by
y[i]
at each iteration, just multiply the sum by it, it’s the same
- I use
@inbounds @simd
to enable SIMD and disable bounds checks. This requires me to be a little more strict with the types it accepts, and to have some checks at the top of the code.
The code is now
function foo2!(B::Matrix, y::Vector, By::Vector)
N = length(y)
size(B) == (N, N) || error()
@inbounds for i in eachindex(y, By)
Byi = 0.0
@inbounds @simd for j in 1:N
Byi += B[j, i]
end
By[i] = Byi * y[i]
end
end
And the timing:
julia> @time foo2!(B, y, By)
0.014844 seconds
Of course, you could also just write it simpler, like this:
function foo3!(B, y, By)
columns = eachcol(B)
for i in eachindex(columns, y, By)
By[i] = sum(columns[i]; init=zero(eltype(B))) * y[i]
end
end
However, this will be somewhat slower due to the views and the pairwise summation strategy:
julia> @time foo3!(B, y, By)
0.029927 seconds