It’s been the default choice since the 1970s: Gear’s codes, Shampine’s RK codes, Hairer’s RK codes, and others all do this or something similar. If you want DP5 to give the exact same results as dopri5 (which it does, up to floating point error), you have to use this norm (well, other than a typo Hairer had in the norm’s application to the initial dt estimate, but typos in ancient programs are just a fun fact to note). The reason is the assumption that large systems of equations are more alike than different. This is true in many of the cases that grow big; notably, discretizations of partial differential equations have a repeated structure.
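For reference, this is the scaled RMS norm from Hairer and Wanner’s codes; a sketch, where $e_i$ is the local error estimate in component $i$ and $sc_i$ is the usual tolerance-based scaling (the exact form of $sc_i$ varies a little between codes):

$$\text{err} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\frac{e_i}{sc_i}\right)^2}, \qquad sc_i = \text{abstol} + \text{reltol}\cdot\max\left(|u_{n,i}|, |u_{n+1,i}|\right).$$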
But this also gives the property that if a user repeats the equations, say by using a larger vector in order to parallelize over parameters on GPUs, the stepping is not necessarily impacted. If you do not do this division by N, then repeating the same equation twice will simply decrease the time steps, which seems odd given that you’re solving the exact same numerical system. This, plus the aforementioned differentiability of this norm choice, makes it a fairly safe and natural choice for many applications.
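To make that concrete, here is a minimal sketch (the vector `err` stands in for the per-component scaled error estimates, and `scaled_rms` and `plain_2norm` are hypothetical helpers, not library functions):

```julia
using LinearAlgebra

scaled_rms(err) = sqrt(sum(abs2, err) / length(err))  # norm with the 1/N division
plain_2norm(err) = norm(err)                          # unscaled Euclidean norm

err  = [0.1, 0.2, 0.3]   # stand-in for the scaled local error estimates
err2 = vcat(err, err)    # the same system duplicated, e.g. batching over parameters

scaled_rms(err2) ≈ scaled_rms(err)              # true: the error measure is unchanged
plain_2norm(err2) ≈ sqrt(2) * plain_2norm(err)  # true: the “error” grows, shrinking dt
```

Duplicating the system doubles the sum of squares, but the division by N exactly cancels that growth, so the step size controller sees the same error either way.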
That said, if you want the maximum norm, feel free to pass it in via internalnorm.
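For example, a minimal sketch with OrdinaryDiffEq.jl, assuming the two-argument `(u, t)` form of the `internalnorm` callback that recent versions expect (check this against the version you are running):

```julia
using OrdinaryDiffEq

f(u, p, t) = -u                               # simple linear test problem
prob = ODEProblem(f, [1.0, 2.0], (0.0, 1.0))

# Use the maximum norm for error control instead of the default scaled 2-norm.
# `maximum(abs, u)` also works when the solver calls the norm on scalars.
sol = solve(prob, DP5(); internalnorm = (u, t) -> maximum(abs, u))
```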