Matrix multiplication and vector multiplication give different results!

Yes, this is covered in that PSA a bit further down:

With lots of stuff in hardware, doing things like vectorization or reassociation (changing the order of operations) will change results.

julia> v = rand(10000);

julia> sum(v)
5004.669545612722

julia> let s = 0.0
           for x in v
               s += x
           end
           s
       end
5004.669545612701

Here’s an easy way to see why this happens:

julia> (1 + 1e20) - 1e20
0.0

julia> 1 + (1e20 - 1e20)
1.0

Matrix multiplication does all sorts of stuff that can change the order of operations, or even use reduced / improved accuracy fma operations, etc. This will naturally cause discrepancies like those you saw.

This is why people are advised to use isapprox () instead of equality:

julia> m12[:, 400] == m12col400
false

julia> m12[:, 400] ≈ m12col400
true

These two results are equal up to the sorts of resolution guarenteed by the algorithms we use.