In the SciML book, parameter estimation problems are defined with a loss function that uses the norm of the residual, not the squared norm that is typical in least-squares formulations (see here). Is this a typo, or is there more behind it? Since squaring is monotonically increasing on nonnegative values, I presume the minimizers do not change, but I would expect gradient descent to behave differently in the two cases.
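
To make the expected difference concrete, here is a minimal sketch on a toy problem of my own (not taken from the book): fitting a single parameter `p` in `y = p * x` with plain fixed-step gradient descent. The data, step size, and function names are all assumptions for illustration. By the chain rule, the gradient of the norm is the gradient of the squared norm divided by twice the norm, so it does not shrink as the residual goes to zero and is undefined where the residual vanishes.

```python
import math

# Hypothetical toy data, generated with p = 2 and no noise.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

def residual(p):
    # r_i(p) = p * x_i - y_i
    return [p * x - y for x, y in zip(xs, ys)]

def grad_norm_squared(p):
    # d/dp ||r||^2 = 2 * sum(r_i * x_i); shrinks linearly as r -> 0
    return 2.0 * sum(r * x for r, x in zip(residual(p), xs))

def grad_norm(p):
    # d/dp ||r|| = sum(r_i * x_i) / ||r||; undefined at ||r|| = 0
    r = residual(p)
    n = math.sqrt(sum(ri * ri for ri in r))
    if n == 0.0:
        return 0.0  # assumption: choose the subgradient 0 at the kink
    return sum(ri * x for ri, x in zip(r, xs)) / n

def descend(grad, p0, step, iters):
    # Plain gradient descent with a fixed step size.
    p = p0
    for _ in range(iters):
        p -= step * grad(p)
    return p

step = 0.02
p_squared = descend(grad_norm_squared, 0.0, step, 200)
p_norm = descend(grad_norm, 0.0, step, 200)
print("squared norm:", p_squared)
print("norm:        ", p_norm)
```

In this sketch both losses have the same minimizer, `p = 2`, but the gradient of the norm has the constant magnitude `sqrt(sum(x_i^2))` everywhere away from the minimum. With a fixed step, the iterate therefore ends up hopping around the minimizer at a distance of up to one step length, whereas the squared-norm iterate contracts geometrically toward it.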