Zombies in a biological ODE: why is my solver not sticking to zero?

It’s essentially a discrete way of implementing the “attracting” modification I proposed before. But I think it makes your system more realistic: when the number of rhinos drops to 1, the rhinos become extinct due to inability to reproduce…

In any case, I found in my zombie model that explicit timestepping with Tsit5 worked well, and that a callback zeroing out the zombie population when Z < 1e-6 made sense (imagine zombies measured as a fraction of an initial population of 1M… Z below this threshold means less than 1 zombie).
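
For concreteness, here is a minimal sketch of that setup. The dynamics, names, and parameter values are all made up for illustration; only the DiscreteCallback pattern is the point:

    using OrdinaryDiffEq

    # Toy susceptible/zombie dynamics; populations as fractions of 1M.
    function zombies!(du, u, p, t)
        S, Z = u
        β, α = p
        du[1] = -β * S * Z             # susceptibles bitten
        du[2] = β * S * Z - α * S * Z  # zombies created minus destroyed
    end

    # After each accepted step, snap a sub-threshold zombie population
    # (less than one zombie out of 1M) to exactly zero.
    condition(u, t, integrator) = 0 < u[2] < 1e-6
    affect!(integrator) = (integrator.u[2] = 0.0)
    cb = DiscreteCallback(condition, affect!)

    prob = ODEProblem(zombies!, [1.0, 1e-3], (0.0, 100.0), (0.6, 0.8))
    sol = solve(prob, Tsit5(), callback = cb)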

No need for the zombie voodoo. I’ve already written down what’s happening here for your colleague. See the issue:

Basically, the issue is that LU-factorization based solving, i.e. A\b, is not guaranteed to preserve zeros. This means that there can be a true zero, but A\b can give 2e-16 or something else close to eps(Float64). When that happens, you get a stage value ki = 2e-16 where it should be zero, and that gives a positive (or negative) derivative to a “dead” term.
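
You can see this effect outside of any ODE solver. A quick demonstration (the matrix is random, so the residue is only typical, not guaranteed): the exact solution has many true zeros, but the LU-based solve generally returns eps()-sized noise in their place:

    using LinearAlgebra

    n = 50
    A = rand(n, n) + n * I         # a well-conditioned random system
    x_true = zeros(n); x_true[1] = 1.0
    b = A * x_true                 # exact solution has 49 true zeros

    x = A \ b                      # LU-factorization based solve
    maximum(abs, x[2:end])         # typically ~1e-16: small, but not 0.0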

There are many solutions to it. For one, if you use an explicit integrator then LU-factorization is never used, so you’re good. Or if you change to a Krylov method like @isaacsas mentioned then you’re fine. Or you can use the trick from the issue and mutate inside the f function:

    for i in (num_nutrient+1):(index_vessel-1)
        if iszero(p[1][i])
            bioS[i] = 0.0 #safety against https://github.com/SciML/OrdinaryDiffEq.jl/issues/1651
        end
    end

In almost all cases, mutating u is a bad idea. In this specific case, it’s actually a good way to improve numerical accuracy.
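
As a self-contained illustration of that pattern (the dynamics and parameter layout here are made up; only the zero-snapping guard is the point): an in-place rate function that first sends known-dead components back to exact zero before computing any derivatives:

    # `extinct` is a BitVector stored in the parameters; `true` marks a
    # species that has been declared extinct.
    function dBdt!(dB, B, p, t)
        r, A, extinct = p
        for i in eachindex(B)
            # guard against ~eps(Float64) residues introduced by A\b
            extinct[i] && (B[i] = 0.0)
        end
        dB .= B .* (r .+ A * B)    # toy Lotka-Volterra-style rates
        return nothing
    end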

Hi again, and many thanks to @kcqpo and @ChrisRackauckas for nailing this down <3

So IIUC, here are the answers we eventually arrived at:

  • dBdt! gets called with nonzero values of B[2] because some internal operation used by solve() is not zero-stable (for performance reasons, I guess), namely the “LU-factorization / A\b” step that we need to get familiar with. This being said:

    • The non-zero-stability only affects LU-factorization based solvers (or at least some of them). Explicit solvers and alternate methods like Krylov are exempt from (at least this kind of) non-zero-stability.

    • The undesired non-zero values (aka zombies) resulting from this phenomenon are guaranteed to be small, for some eps(Float64)-related definition of “small”. However, should we be willing to embrace them, then we would be in charge of not letting them creep up and invade the simulation.

  • Mutating DiffEqCallbacks are not the worst tools for the job, but definitely not the only ones. Alternate approaches to avoiding zombies in our situation include:

    • Forbidding our users the whole class of implicit LU-factorization-based solving methods (like ImplicitEuler, Rosenbrock23, Rodas4, KenCarp47, RadauIIA3, PDIRK44…?), hoping that no other kind of non-zero-stability exists in explicit solvers and/or alternate solving methods.

    • Recording extinction events by raising special flags within p, adding a branch into dBdt! to check these flags like if is_species_extinct(p, i), and forcing B[i] = 0 when this condition is triggered. This effectively mutates the state handed to dBdt! by solve(). If I correctly understand what you are explaining here @ChrisRackauckas, this is usually considered unsafe practice because it violates one of the major assumptions of DifferentialEquations. About this point specifically, I just want to get 100% rid of one critical concern:

      • Is this approach “unsafe” in the soft sense that it only violates high-level assumptions of DifferentialEquations? That is, mutating B in the general case could corrupt the solver’s internal state and lead to reproducible wrong simulations or crashes of DifferentialEquations? If yes, then I trust you that we find ourselves in a particular situation where zeroing an artefactual, small positive number resulting only from LU-factorization is not actually state-breaking (at least in the current implementation), and we can go for it.

      • Or is this approach “unsafe” in the very hard sense that it eventually violates low-level assumptions of Julia’s compiler, which in turn invokes LLVM’s Undefined Behaviour? In that case, mutating B in any situation (including ours) could corrupt low-level code optimizations and lead to non-reproducible crashes of the REPL, memory corruption, unexplainable bugs, or even silently wrong trajectories. If that is so, then I’m afraid there is no way we should use it at all in professional software :scream:

    • Stopping the solver as soon as B[i] < 0 is detected and spawning a new simulation with the variable B[i] removed. This would incur a stop-and-restart on every extinction event, but in large systems with length(B) > 100 and massive extinctions, performance would actually improve as extinctions go, because the system would shrink with time and dBdt! would not remain busy herding zeroes. (See the sketch right after this list.)
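
For concreteness, a rough sketch of how this last approach could look, under strong assumptions: a made-up Lotka-Volterra-style rate function whose parameters (r, A) slice cleanly to any subset of surviving species, plus a ContinuousCallback + terminate! loop. Every name here (rates!, solve_shrinking, thr) is illustrative, not our actual code; it also stops at a small threshold rather than at B[i] < 0:

    using OrdinaryDiffEq

    # Made-up rates: r = growth rates, A = interaction matrix.
    function rates!(dB, B, p, t)
        r, A = p
        dB .= B .* (r .+ A * B)
        return nothing
    end

    function solve_shrinking(B0, tspan, r, A; thr = 1e-6)
        B, t0 = copy(B0), first(tspan)
        keep = collect(eachindex(B0))      # global indices of survivors
        segments = Any[]                   # one solution object per segment
        while true
            prob = ODEProblem(rates!, B, (t0, last(tspan)), (r[keep], A[keep, keep]))
            # stop as soon as any surviving population hits the threshold
            cond(u, t, integrator) = minimum(u) - thr
            sol = solve(prob, Tsit5(), callback = ContinuousCallback(cond, terminate!))
            push!(segments, (keep, sol))
            sol.t[end] >= last(tspan) && return segments  # reached the end
            alive = sol.u[end] .> thr      # drop whoever just went extinct
            keep, B, t0 = keep[alive], sol.u[end][alive], sol.t[end]
            isempty(keep) && return segments  # everyone went extinct
        end
    end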

Not sure exactly which approach to pick now, but we do have very interesting new chunks of information to crunch. In any case KUDOS to y’all folks, you are what makes this community awesome! <3

I’m going to hazard a strong guess and say you don’t need to worry about this. The issue is a mathematical one, not memory safety etc.

Based on what we’ve learned here, I would say you should use explicit methods, and for realism also have a callback that checks whether any population falls below a threshold for “one organism” (however you define this in your parameterization) and sets those variables to 0.0. This is an understandable, scientifically motivated process: it’s not just that extinct species should stay extinct, but also that species should go explicitly extinct if their population falls below 1 organism.
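
A minimal multi-species version of that callback might look like this (the state is assumed to be a vector of populations, and one_organism is whatever the threshold is in your parameterization; both names are placeholders):

    one_organism = 1e-6   # “one organism” in the model’s units (placeholder)

    # Fire whenever any population is strictly between 0 and the threshold…
    condition(u, t, integrator) = any(x -> 0 < x < one_organism, u)

    # …and send every sub-threshold population to exactly zero.
    function affect!(integrator)
        u = integrator.u
        @. u = ifelse(u < one_organism, 0.0, u)
    end

    cb = DiscreteCallback(condition, affect!)
    # sol = solve(prob, Tsit5(), callback = cb)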

No, I never said any of this and this is simply not true. I don’t know where this idea would even come from.

Yes, it’s soft in this sense. It violates the assumptions of any adaptive ODE solver. I show exactly this point in my JuliaCon talk about debugging simulation codes. Here’s where I give a high-level overview of the adaptivity process:

Because you do not always go forward in time due to this rejection behavior, but mutation actually modifies the caches, this means that after a rejection you will have mutated the value of u, which could, in theory, change the behavior of the equation. In a general sense, this could lead to cases where a single f is not well-defined in the sense of u' = f(u), since the way that steps are accepted/rejected changes what values come out of f. That is then precisely the issue I talk about at this point of the talk:

where I say that such a case does not actually define an ODE, since u' = f(u) does not exist: it’s not f(u) but rather some f(how steps are computed), and it does not have a generally unique answer (and I show a case where ODE solvers do not give a unique answer for such a system).

However, this specific case of mutation is deterministic in the sense that, for any reasonable dt, it will always send the value down to zero, and therefore you get the same solution whether the previous step was rejected or accepted. So while it does not necessarily define a strict u' = f(u) ODE, it is “close enough” (for reasonable dt) that it ends up defining an ODE with a unique solution in practice, which then fixes the solver behavior.
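
To make the rejection point concrete, here is a deliberately naive caricature of an adaptive step loop (explicit Euler with step doubling for error control). This is not how OrdinaryDiffEq is implemented, just an illustration of why a mutating f interacts with rejected steps:

    # Toy adaptive integrator: one full Euler step vs. two half steps,
    # with the difference used as a local error estimate.
    # e.g. toy_adaptive_solve((u, t) -> -u, [1.0], (0.0, 1.0))
    function toy_adaptive_solve(f, u0, tspan; tol = 1e-6, dt = 0.1)
        u, t, tend = copy(u0), first(tspan), last(tspan)
        while t < tend
            dt = min(dt, tend - t)
            u_full = u .+ dt .* f(u, t)
            u_half = u .+ (dt / 2) .* f(u, t)
            u_two  = u_half .+ (dt / 2) .* f(u_half, t + dt / 2)
            err = maximum(abs, u_two .- u_full)
            if err <= tol
                u, t = u_two, t + dt   # accept: time advances
                dt *= 1.5
            else
                dt /= 2                # REJECT: time does NOT advance; we
                                       # retry from u, assuming f left it intact
            end
        end
        return u
    end

If f mutates the u it is handed, the rejected branch above retries from a state that f already altered, so the trajectory depends on the accept/reject history and not only on u' = f(u).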

Great ^ ^ Sorry, I had this slight worry just because of the “lack of safety in mutation” phrasing used there. (Context: I personally come from a Rust background, where associating the two words “mutate” and “unsafe” really means scary things ^ ^")

Happy to read that this is nothing to worry about :slight_smile: Thank you again! And especially for this fascinating talk <3