You should expect matrix addition to be roughly 3x slower, because it is bottlenecked by memory bandwidth rather than arithmetic: adding two matrices into a third means reading two matrices and writing one, which touches 3x as much memory.
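The 3x figure can be checked with a back-of-the-envelope byte count. This is only a rough sketch: `bytes_touched` is a hypothetical helper, the 4096 x 4096 size and 8-byte elements are arbitrary assumptions, and the baseline is assumed to be an operation that streams a single matrix once. It also ignores caches and write-allocate traffic, which can shift the real ratio.

```python
def bytes_touched(n_rows, n_cols, n_arrays, itemsize=8):
    """Bytes streamed when an elementwise pass touches n_arrays matrices once each.

    itemsize=8 assumes 64-bit floats (an assumption, not from the original text).
    """
    return n_rows * n_cols * itemsize * n_arrays


n = 4096  # arbitrary matrix dimension chosen for illustration

# Assumed baseline: a pass over one matrix.
one_pass = bytes_touched(n, n, n_arrays=1)

# c = a + b: read a, read b, write c.
addition = bytes_touched(n, n, n_arrays=3)

ratio = addition / one_pass
print(f"baseline: {one_pass} bytes, addition: {addition} bytes, ratio: {ratio}")
```

If the operation is truly bandwidth-bound, run time scales with bytes moved, so the ratio of traffic is also the expected ratio of run times.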