Julia is Eight times Slower than Go on Numerical Scheme

The present time-stepping scheme is the 4th order Gauss-Legendre. The 1e-10 tolerance is not the desired accuracy of the solution but used for the Newton iterations which solve the implicit scheme to machine precision in order to exactly preserve the energy of the conservative dynamical system being integrated.

The goal is to illustrate how one can obtain an engineering tolerance approximation that still preserves energy without needing to compute a high precision solution. In particular, all approximations in the time resolution study preserve energy to within rounding error even the inaccurate ones.

However, I’m admittedly a beginner with IRK methods.

I have looked through your code for the 8-stage IRK scheme. Since I’m integrating a system of three ODE’s my understanding is that this would lead to 24 variables that need to be implicitly solved to machine precision at each time step. The system isn’t particularly stiff, but the idea is to relax accuracy requirements as much as possible while continuing to exactly preserve energy.

I’ll see what I can do with the code you linked. These mathematical pointers and ideas are greatly appreciated. Thanks!

That’s a simple speedup. Thanks for testing this!

1 Like

It looks like I didn’t fully understand the performance implications of const and other things to the JIT. As I had just watched the conference talk

by Erik Engheim, I may have been lulled into thinking everything would just work out without additional “annotations” which seems not the case.

Again, I appreciate all the helpful comments and friendly community. It looks like I’ll soon be able to know Julia as my friend. Thanks!

2 Likes

I’d definitely suggest reading Performance Tips · The Julia Language if you want to learn more about high-performance Julia. The very first rule is about avoiding (non-constant) global variables, and I think that covers 90% of new user performance issues that I’ve seen. Fortunately, it’s easy to fix and easy to avoid once you know to pay attention.

9 Likes

You are right that the global const is the first thing in performance tips. I think the reason it trips people up so frequently is because that mistake is so simple that they immediately focus on the other performance tips. It’s also astonishing how much of a difference it makes compared to other languages.

Suppose I wanted to run my program checking a parameterized family of different IRK methods for a certain aspect of performance. In this case, I might want to change the global variables a, b and c in an outside loop and then perform the same test for each choice. I think this means a, b and c can no longer be const globals (since I want to keep changing them in the outside loop).

What would be the work around? Do I allocate a, b and c locally and then pass them in as arguments to all the functions? Could I retain performance by using nested function definitions (all inside my main function) or closures?

Yes, that (you may use a struct or a tuple to organize things). You could also use closures, or functors. Here many alternatives were dicussed: Define data-dependent function, various ways

1 Like

You might want to look into alternative ManifoldProjection schemes and such for preserving energy. Doing one Newton iteration each time might still be more than you need. Also, at that level it can be quasi-Newton where you save pre-factorized DGi. Anyways, there’s a ton you can do to the algorithm here so this in general isn’t an efficient way to compute this, and that’s disregarding the fact that the implementation is allocating a lot and such.

1 Like

You can change the values of the contents of a, b, and c to have a family of parameters. It doesn’t matter that they are defined as constants.

In that link you can see:

Note that “constant-ness” does not extend into mutable containers; only the association between a variable and its value is constant. If x is an array or dictionary (for example) you can still modify, add, or remove elements.

The important thing about defining a container (as an array) as a constant is that the compiler can be certain that the type of the variable will not change.

In your case you have global const b=[1/2,1/2]. This makes the compiler aware that the variable b will always be a container of type typeof(b) = Vector{Float64}. However the contents of the container can be changed:

julia> global const b = [1/2,1/2]
2-element Vector{Float64}:
 0.5
 0.5

julia> typeof(b)
Vector{Float64} (alias for Array{Float64, 1})

julia> b[1] = 1/3
0.3333333333333333

julia> b
2-element Vector{Float64}:
 0.3333333333333333
 0.5

So you can indeed change the contents of a and b without problems, as long as you put objects of the same type in there, and you will not lose performance.

However, because b will always be a container of objects of type Float64 (because you defined it as a constant), you cannot put a String into it:

julia> b[2] = "tata"
ERROR: MethodError: Cannot `convert` an object of type String to an object of type Float64
Closest candidates are:
  convert(::Type{T}, ::T) where T<:Number at number.jl:6
  convert(::Type{T}, ::Number) where T<:Number at number.jl:7
  convert(::Type{T}, ::Base.TwicePrecision) where T<:Number at twiceprecision.jl:250
  ...
Stacktrace:
 [1] setindex!(A::Vector{Float64}, x::String, i1::Int64)
   @ Base ./array.jl:839
 [2] top-level scope
   @ REPL[32]:1

This constant-ness of the type of the object is what allows the compiler to produce efficient code.

4 Likes

Woohoo! Being able to change a, b and c even though declared as global constants means I can have my cake and eat it too. Thanks, this just saved me from all of the workarounds I was considering!

1 Like

I think you are right about the method and how it could be improved. This code was the result of a 1-hour lecture demonstration for a numerical analysis course (including the crazy CAS stuff in Mathematica to create the functions fG and fDG).

Since this is only a 6x6 matrix, it’s not clear to me whether switching to a quasi-Newton’s method would be more efficient, especially as the goal is to solve for the k1 and k2 in the IRK step to machine precision in order to preserve energy.

Do you have a good reference for manifold-projection schemes? My impression is that methods like IRK Gauss-Legendre also work well when a dissipative perturbation is added to the dynamics. Again thanks for the ideas.

My main concern was the programming language, but I’ve learned much more!

Damn right!

Most of the implicit IVP codes I know about compute and factor Jacobians only when the number of Newton iterations per time step exceeds 3. I think the number 3 came from Gear’s code decades ago. You adjust time step and order for stability and accuracy and fiddle with the nonlinear solver to avoid Jacobian work.

My recollection is that I need about 3 to 4 Newton iterations to solve k1 and k2 to machine precision. If I don’t, then energy is not conserved to machine precision, which was the point in the first place.

In particular, the tradeoff in the precision of the computation of k1 and k2 relates not just to the precision of the resulting approximation y, but also to the precision with which energy is conserved. From what I understand, by using a conservative IRK scheme, these two aspects of the approximate solution can be decoupled.

Just be careful because while that can be efficient, it makes the code harder to debug and less portable, since the behaviour of your functions will depend on the global variables. Parameters, closures, etc, are safer in that way.

5 Likes

Could it be that your time steps are too long and/or your predictor is having trouble? Four Newtons/time step is not bad if there is a reason.

In many, but not all, cases one would reduce the time step if the nonlinear solver was taking too many steps.

I’ll have to instrument the code again to see what’s actually happening. In this case it’s possible my starting point for k1 and k2 is poor as I’m using an explicit RK2 method for the predictor. What’s the correct way to do this?

https://diffeq.sciml.ai/stable/features/callback_library/#Manifold-Conservation-and-Projection

is our current implementation, but it’s a bit crude. We can probably do a few more Jacobian saving tricks in there. It’s a good undergrad or new contributor project so I just have left it open to improve (:wink:). The best tome for this whole subject is Hairer’s geometric integration book which details the convergence with post-step projections, but then a bunch of schemes to avoid having to do it.

Yeah that was a GEAR thing that was preserved by LSODE and that whole series of codes, but the IRK codes have a lot more convergence rate measures that are used. But indeed, coupling adaptive time stepping with the Newton method gives wild gains, because it allows you to take a YOLO time step and then pull back if Newton isn’t converging, etc. which ultimately drops the number of calculations substantially.

Also, you might want to look at the SciMLBenchmarks, specifically:

https://benchmarks.sciml.ai/html/DynamicalODE/Henon-Heiles_energy_conservation_benchmark.html

https://benchmarks.sciml.ai/html/DynamicalODE/Quadrupole_boson_Hamiltonian_energy_conservation_benchmark.html

@SebastianM-C did a bunch of work there and showed that using really high order methods at low tolerance, and only projecting when the energy error is sufficiently high, can be one of the most efficient ways to solve such a problem. If that’s what you’re aiming for, this combination of techniques can be more efficient than a symplectic integrator.

1 Like

IIRC the Gauss-Legendre methods come with a collocation polynomial, and extrapolating that polynomial is the common predictor.

Also, note that for fully implicit Runge-Kutta methods you can exploit the structure of the A matrix by doing the Newton iteration in the eigenspace of A. Checkout Hairer II page 121 and 127 for more details.

The stage predictor for collocation RKs is the formula (8.5’) on page 120 of Hairer II.

1 Like

I mean, this isn’t really true. If you did the same thing in any other language it would be just as much of a performance issue. It’s just that most dynamic languages are so slow that it doesn’t matter and static languages don’t have any way of declaring an untyped global. So the “problem” that Julia has here is that it makes it easy to create an untyped non-constant global — because that’s a thing that people expect to work in a dynamic language — but unlike other dynamic languages, Julia is fast enough that it matters. It would be possible to fix this with some compiler work, but it’s a low priority because it’s easy to avoid once you know how, and non-constant globals are a bad idea anyway.

22 Likes

What you say makes sense. Having some background with C and Fortran as well as recent experience with Go (which conveys strong typing implicitly through a definition such as a:=1), I hadn’t thought about how some people expect to have non-typed variables in a read, evaluate, print loop. I’m also not sure dynamic typing is really needed for a REPL, as one of the first examples of an interactive language used suffixes such as $ and % to infer type.

Even though there are things that might be improved, it seems to me the most important thing for Julia right now is to stop changing. Over half the advice I find when performing a web search doesn’t work anymore on recent versions of Julia. Fortunately, this forum seems full of helpful and friendly people.

Does the const keyword essentially mean constant type rather than value?