Matrix-vector product faster than matrix addition?

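You should expect matrix addition to be roughly 3x slower. Both operations are bottlenecked by memory bandwidth rather than arithmetic, and adding two n-by-n matrices into a third touches about 3n^2 elements (two matrices read, one written), whereas the matrix-vector product touches only about n^2 (the matrix itself, plus two vectors of length n).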
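To make the arithmetic concrete, here is a rough back-of-the-envelope sketch that counts the bytes each operation has to move. The function names and the 8-byte element size (double precision) are my own assumptions for illustration, and the model deliberately ignores cache reuse of the matrices, write-allocate traffic, and other hardware details.

```python
def matvec_traffic(n, itemsize=8):
    """Approximate bytes moved for y = A @ x, with A an n-by-n matrix.

    Assumes A is streamed from memory once and that x and y are each
    touched once (hypothetical simplified model, 8-byte elements).
    """
    reads = n * n + n   # every element of A, plus the vector x
    writes = n          # the result vector y
    return (reads + writes) * itemsize


def matadd_traffic(n, itemsize=8):
    """Approximate bytes moved for C = A + B, all n-by-n matrices."""
    reads = 2 * n * n   # every element of A and of B
    writes = n * n      # every element of C
    return (reads + writes) * itemsize


n = 4096
ratio = matadd_traffic(n) / matvec_traffic(n)
print(f"matrix addition moves about {ratio:.2f}x as much memory")
```

Under this model the ratio is 3n / (n + 2), which approaches 3 as n grows. If both operations really are bandwidth-bound on your machine, the run times should differ by about the same factor.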