Improving performance elegantly - type stability

One thing I’d question in some of the implementations above is the duplication of logic (the vector dot product). If the two cases (vector and matrix) are supposed to use the same underlying logic, the safest way to achieve that is to use the same code (commonly known as the DRY principle).

For example, let’s say you realize that the repeated summing leads to numerical stability issues, and you want to switch to pairwise or Kahan summation. It’s much better then (less risk of bugs, less coding) to only have to change a single implementation rather than several.
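As an illustration (not from the original post), here is a minimal sketch of what that single-point change could look like, using Kahan (compensated) summation; the name sum_kahan is hypothetical, and note that @simd has to be dropped, since it permits reassociation that would defeat the compensation:

function sum_kahan(a::AbstractVector{T}, M::AbstractVector{T}) where {T}
    s = zero(T)  # running sum
    c = zero(T)  # compensation for lost low-order bits
    @inbounds for i in eachindex(a, M)
        y = a[i]*M[i] - c
        t = s + y        # low-order bits of y are lost in this addition...
        c = (t - s) - y  # ...and recovered here for the next iteration
        s = t
    end
    return s
end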

Perhaps it doesn’t always make sense to do this in practice for performance reasons, but I think it’s worth trying hard (even sacrificing some performance), and in this case I think it can be done quite well:

@inline function sum_generic(a::AbstractVector{T}, M::AbstractVector{T}) where {T}
    c = zero(T)
    # eachindex(a, M) checks that the axes match, so @inbounds is safe here
    @inbounds @simd for i in eachindex(a, M)
        c += a[i]*M[i]
    end
    return c
end

function sum_generic(a::AbstractMatrix{T}, M::AbstractVector{T}) where {T}
    # reuse the vector kernel on each column of a (views avoid copies)
    return map(j -> sum_generic(view(a, :, j), M), axes(a, 2))
end
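Note that the matrix method just maps the vector kernel over each column via view, so a future change to the summation algorithm (like the Kahan variant above) only needs to happen in one place.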

By adding @inbounds and @simd, the code can also be sped up considerably. Sample timings:

julia> using BenchmarkTools

julia> M = rand(1000); a = rand(1000);

julia> @btime sum_generic($a, $M);
  84.707 ns (0 allocations: 0 bytes)    # without simd: 1.138 μs

julia> M = rand(1000); a = rand(1000,1000);

julia> @btime sum_generic($a, $M);
  230.650 μs (3 allocations: 8.00 KiB)  # without simd: 1.130 ms
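As a quick sanity check (not part of the original post), the results can be compared against LinearAlgebra: the vector case is just dot(a, M), and the matrix case matches a' * M, since each column of a is dotted with M. Assuming a and M from the second benchmark above:

using LinearAlgebra
@assert sum_generic(M, M) ≈ dot(M, M)  # vector method
@assert sum_generic(a, M) ≈ a' * M     # matrix method: column-wise dot products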