Why is this simple function twice as slow as its Python version

After everything I have learned here, my opinion of numpy has improved considerably.
It seems that numpy interprets `tmp2[i*n:(i+1)*n,j*n:(j+1)*n] = t@tmp1` as `np.matmul(t,tmp1, out=tmp2[i*n:(i+1)*n,j*n:(j+1)*n])` without me having to think about it. That's very nice.
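For comparison, here is a minimal sketch of the two forms side by side. The sizes `n` and `m`, the per-block matrices `ts`, and the random data are hypothetical and only stand in for the original loop. The sketch only checks that both forms give the same values; it does not measure whether the sliced-assignment form allocates a temporary for `t @ tmp1` before copying it into the block.

```python
import numpy as np

# Hypothetical sizes: tmp2 is an (m*n)-by-(m*n) array split into an m-by-m grid of n-by-n blocks.
n, m = 4, 3
rng = np.random.default_rng(0)
ts = rng.standard_normal((m, m, n, n))  # hypothetical per-block left factors
tmp1 = rng.standard_normal((n, n))

tmp2_a = np.zeros((m * n, m * n))  # filled by sliced assignment
tmp2_b = np.zeros((m * n, m * n))  # filled by matmul with out=

for i in range(m):
    for j in range(m):
        t = ts[i, j]
        # Form 1: compute t @ tmp1, then assign the result into the block.
        tmp2_a[i*n:(i+1)*n, j*n:(j+1)*n] = t @ tmp1
        # Form 2: basic slicing returns a view, so matmul writes straight into tmp2_b.
        block = tmp2_b[i*n:(i+1)*n, j*n:(j+1)*n]
        np.matmul(t, tmp1, out=block)

print(np.allclose(tmp2_a, tmp2_b))  # True
```

The explicit `out=` form makes the "write directly into the block" intent visible in the code; the sliced-assignment form is shorter and produces the same result.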
