With this default choice, if only one component has significant error, that error is allowed to grow to sqrt(N) times the tolerance on that component (where N is the vector dimension) before the step is reduced.

I would like to understand the rationale for this choice of default norm. The Java math library Hipparchus (https://www.hipparchus.org/) uses the same norm, so there must be a good reason, but intuitively I would have chosen

errorNorm = \max_i(error_i)

to ensure that every component's error estimate stays below its respective tolerance.
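To make the sqrt(N) effect concrete, here is a quick sketch in Python. I'm assuming the default is the tolerance-scaled RMS norm err = sqrt((1/N) * sum_i (error_i/tol_i)^2), which is consistent with the sqrt(N) behavior described above:

```python
import math

def rms_norm(scaled_errors):
    # root-mean-square of the component errors, each already scaled
    # by its own tolerance; the step is accepted while this is < 1
    return math.sqrt(sum(e * e for e in scaled_errors) / len(scaled_errors))

def max_norm(scaled_errors):
    # the alternative I would have expected: largest scaled error
    return max(abs(e) for e in scaled_errors)

N = 100
# one component sits just below sqrt(N) times its tolerance; the rest are zero
err = [0.0] * N
err[0] = math.sqrt(N) * 0.999

print(rms_norm(err))  # 0.999: still below 1, so the step is accepted
print(max_norm(err))  # 9.99: the max norm would have rejected this long ago
```

So with the RMS norm, a single bad component can be nearly sqrt(N) = 10 times over its tolerance before the controller notices.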

If N is small, the choices are essentially the same. The L2 norm, which appears in most codes I know of, may well be there because “it’s always been this way.” That is what Gear did in his code, and it has likely propagated into the codes from the US. Europe, which is more RK-centric, may do something else.

If you went with the max norm, your time steps would be smaller, and the benefit, if any, might not be worth it. The old codes were designed for computers with less power than a modern garage door opener, so that could have been a factor.

The “maximum norm” that you’re suggesting goes by various names, most commonly the L^\infty norm, denoted \Vert x \Vert_\infty. The default norm is the Euclidean norm, also called the L^2 norm, denoted \Vert x \Vert_2. Together with the L^1 norm \Vert x \Vert_1 = \sum_k |x_k|, these are the three most common vector norms; they have lots of interesting applications, and there are many other useful norms as well.
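For concreteness, here is a small Python sketch computing all three norms on the same vector:

```python
import math

x = [3.0, -4.0, 12.0]

l1   = sum(abs(v) for v in x)            # \Vert x \Vert_1 = 3 + 4 + 12 = 19
l2   = math.sqrt(sum(v * v for v in x))  # \Vert x \Vert_2 = sqrt(169) = 13
linf = max(abs(v) for v in x)            # \Vert x \Vert_\infty = 12

print(l1, l2, linf)  # 19.0 13.0 12.0
```

Note the ordering \Vert x \Vert_\infty \le \Vert x \Vert_2 \le \Vert x \Vert_1, which always holds.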

The Euclidean norm tends to be the default choice: not only is it the oldest and most well known, but it is also differentiable everywhere except x=0 (unlike L^1 and L^\infty), and it has the closest connections to linear algebra (since \Vert x \Vert_2 = \sqrt{\overline{x^T} x}). Hence, for example, one gets the useful property that \Vert x \Vert_2 is invariant under rotations.
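The rotation invariance is easy to check numerically; here is a sketch in 2D (Python, with a hand-rolled rotation for illustration):

```python
import math

def l2(x):
    return math.sqrt(sum(v * v for v in x))

def linf(x):
    return max(abs(v) for v in x)

def rotate(x, theta):
    # rotate a 2-vector by angle theta
    c, s = math.cos(theta), math.sin(theta)
    return [c * x[0] - s * x[1], s * x[0] + c * x[1]]

x = [1.0, 0.0]
y = rotate(x, math.pi / 4)

print(l2(x), l2(y))      # both 1.0: L^2 is unchanged by the rotation
print(linf(x), linf(y))  # 1.0 vs ~0.707: L^infty is not rotation-invariant
```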

But it is definitely not required for error analysis, and you’re right that sometimes it is convenient to use a different norm for convergence criteria. Some Julia libraries (e.g. QuadGK.jl) allow the caller to drop in a replacement norm, but I’m not sure whether this is possible for DifferentialEquations.jl without overloading a bunch of methods.

Note that all finite-dimensional norms are “equivalent” in the sense that they differ by at most a constant factor. However, as you noted in your example, the constant factor often grows with the dimension N.
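The equivalence bound relevant here is \Vert x \Vert_\infty \le \Vert x \Vert_2 \le \sqrt{N} \, \Vert x \Vert_\infty, and the sqrt(N) constant is attained when all components are equal. A quick Python check:

```python
import math

def l2(x):
    return math.sqrt(sum(v * v for v in x))

def linf(x):
    return max(abs(v) for v in x)

for N in (10, 100, 1000):
    x = [1.0] * N            # worst case: all components equal in magnitude
    ratio = l2(x) / linf(x)  # equals sqrt(N), the equivalence constant
    print(N, ratio, math.sqrt(N))
```

So the two norms are interchangeable for proving convergence, but the constant between them grows without bound as N does, which is exactly the sqrt(N) slack in the question.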

It’s been the default choice since the ’70s, with Gear, Shampine’s RK codes, Hairer’s RKs, etc. all doing this or something similar. If you want DP5 to give the exact same results as dopri5 (which it does, up to floating-point error), you have to use this norm (well, apart from a typo Hairer had in the application of the norm to the initial dt, but typos in ancient programs are just a fun fact to note). The reason is the assumption that large sets of equations are more the same than different. This is true in a lot of cases that grow big; notably, discretizations of partial differential equations have a repeated structure.

But this also gives the property that if a user repeats the equations, say by using a larger vector in order to parallelize over parameters with GPUs, the stepping is not impacted. If you do not do this division by N, then repeating the same equation twice will simply decrease the time steps, which, given you’re solving the exact same numerical system, seems odd. This, plus the aforementioned differentiability of this norm choice, makes it a fairly safe and natural choice for many applications.
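To see the duplication property, compare the RMS norm (with the division by N) against a plain L2 norm on a repeated error vector. A Python sketch:

```python
import math

def rms_norm(e):
    # scaled norm with division by N, as in the default described above
    return math.sqrt(sum(v * v for v in e) / len(e))

def l2_norm(e):
    # plain L2 norm, no division by N
    return math.sqrt(sum(v * v for v in e))

err = [0.1, 0.5, 0.3]
doubled = err + err  # the same equations repeated, e.g. batching parameters

print(rms_norm(err), rms_norm(doubled))  # identical: step sizes unaffected
print(l2_norm(err), l2_norm(doubled))    # grows by sqrt(2): steps would shrink
```

With the division by N, duplicating the system leaves the error measure, and hence the step-size control, exactly unchanged.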

That said, if you want the maximum norm, feel free to pass it in via internalnorm.