Orders of magnitude runtime difference in row-wise norm

The thread "Preventing broadcast fusing" has a more detailed discussion of the broadcast fusion slowdown. (Is it a failure of LICM, i.e. loop-invariant code motion, for a theoretically pure norm function? EDIT: maybe not, since I suppose LICM can't create temporary arrays.)
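
To make the slowdown concrete, here is a minimal sketch of what I understand to be happening. It assumes the row-wise norm is written as a dotted call over `eachrow`; the counter `NCALLS` and the wrapper `counted_norm` are hypothetical helpers added only for illustration and are not from the linked thread.

```julia
using LinearAlgebra

# Hypothetical helper: counts how many times the norm is evaluated.
const NCALLS = Ref(0)
counted_norm(v) = (NCALLS[] += 1; norm(v))

A = rand(4, 3)

# Fused: both dotted operations become a single loop over the elements of A,
# so the row norm is recomputed for each element instead of once per row.
NCALLS[] = 0
B_fused = A ./ counted_norm.(eachrow(A))
calls_fused = NCALLS[]

# Unfused: materialize the row norms into a temporary vector first,
# then divide. The norm is evaluated once per row.
NCALLS[] = 0
row_norms = counted_norm.(eachrow(A))
B_unfused = A ./ row_norms
calls_unfused = NCALLS[]

println("fused calls: ", calls_fused, ", unfused calls: ", calls_unfused)
```

Both versions should give the same result; the unfused one just pays for a temporary array of row norms, which is the kind of allocation a hoisting pass would have to introduce on its own.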
