Julia is surprisingly slow for some simple iteration

It’s possible there is some type instability happening. Here is your original code in a function:

function foo!(B, y, By)
    N = length(y)
    for i in 1:N
        By[i] = 0.0
        for j in 1:N
            By[i] += B[i, j] * y[i]
        end
    end
end

And I get:

julia> @time foo!(B, y, By)
  0.956673 seconds

Now let’s make it better. First, I simply change B[i, j] to B[j, i] to match Julia’s column-oriented matrices. Then I have:

julia> @time foo!(B, y, By)
  0.031369 seconds

Now let me rewrite it a little:

  • Instead of accessing y[i] and By[i] in the inner loop, I pull them out to the outer loop so only B needs to be accessed in the inner loop
  • You don’t need to multiply by y[i] at each iteration, just multiply the sum by it, it’s the same
  • I use @inbounds @simd to enable SIMD and disable bounds checks. This requires me to be a little more strict with the types it accepts, and to have some checks at the top of the code.

The code is now

function foo2!(B::Matrix, y::Vector, By::Vector)
    N = length(y)
    size(B) == (N, N) || error()
    @inbounds for i in eachindex(y, By)
        Byi = 0.0
        @inbounds @simd for j in 1:N
            Byi += B[j, i]
        end
        By[i] = Byi * y[i]
    end
end

And the timing:

julia> @time foo2!(B, y, By)
  0.014844 seconds

Of course, you could also just write it simpler, like this:

function foo3!(B, y, By)
    columns = eachcol(B) 
    for i in eachindex(columns, y, By)
        By[i] = sum(columns[i]; init=zero(eltype(B))) * y[i]
    end
end

However, this will be somewhat slower due to the views and the pairwise summation strategy:

julia> @time foo3!(B, y, By)
  0.029927 seconds
11 Likes