Delay Differential Equation Networks with a different delay per edge

solve(prob, MethodOfSteps(BS3()), saveat=0.01, reltol=0, abstol=1e-5). rtol and atol are not valid option names (the keywords are reltol and abstol), and you should almost never use force_dtmin. I should really just remove that argument. If you’re hitting it, then your problem definition is probably wrong: the warning even says to think of hitting dtmin as divergence, for the reasons described in PSA: How to help yourself debug differential equation solving issues.

And so the first thing to notice is that these are not the same DDE being solved. In fact, the Julia one isn’t even a DDE: it’s not even a well-defined equation!

h(p, t; idxs=nothing) = typeof(idxs) <: Number ? 1.0 : rand(Uniform(0, 2π), n)

The past does not have a value: every single time the history is called, the value changes. So what’s happening is that the starting iterations are unable to converge, because there is nothing to converge to: there are no fixed values!

But it’s even worse than that: you’re saying "u(t) for t < 0 is a random value at every t, but u(t)[1] = 1.0 for any past time". So it’s not even self-consistent :scream:. And u(t)[1] = 1.0 does not match the Python code, which uses a uniform random value for the past.
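
To see the inconsistency concretely, here is a quick check against that history function (renamed h_bad here, with n hard-coded to 10 just for this sketch):

using Distributions
h_bad(p, t; idxs=nothing) = typeof(idxs) <: Number ? 1.0 : rand(Uniform(0, 2π), 10)
h_bad(nothing, -0.1) == h_bad(nothing, -0.1)  # false: a fresh random vector on every call
h_bad(nothing, -0.1; idxs=1)                  # always 1.0, disagreeing with the vector form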

The fixed code is:

using DelayDiffEq
using Distributions
using Random

function f!(dθ, θ, h, p, t)
    ω, A = p
    n = length(θ)
    lags = reshape(lag, n, n)  # delay matrix, built from the global `lag` vector
    @inbounds for j in 1:n
        coupling = 0.0
        @inbounds for i in 1:n
            # each neighbor i is read from the history at its own delayed time
            coupling += A[i, j] * sin(h(p, t - lags[i, j]; idxs = i) - θ[j])
        end
        dθ[j] = ω[j] + coupling
    end
    nothing
end

n = 10
Random.seed!(1)
ω = rand(n)
A = rand(n,n)
const lag = rand(n*n)
θ₀ = rand(Uniform(0, 2π), n)
p = (ω, A)
const past = rand(Uniform(0, 2π), n)
h(p, t; idxs=nothing) = typeof(idxs) <: Number ? past[idxs] : past

prob = DDEProblem(f!, θ₀, h, (0.0, 1.0), p, constant_lags=lag)

using Profile
@profile solve(prob, MethodOfSteps(BS3()), saveat=0.01, reltol=0, abstol=1e-5)
Juno.profiler()

After fixing that it’s still slow though. The issue seems to be that Julia’s type inference is failing in f!. This could be https://github.com/JuliaLang/julia/issues/35800 or https://github.com/SciML/OrdinaryDiffEq.jl/pull/1473


Oh interesting, one thing that is happening is that Julia’s specialization heuristics for function arguments are getting in the way:

function f!(dθ, θ, h::H, p, t) where H  # ::H forces specialization on the history function
    ω, A = p
    n = length(θ)
    lags = reshape(lag, n, n)
    @inbounds for j in 1:n
        coupling = 0.0
        @inbounds for i in 1:n
            coupling += A[i, j] * sin(h(p, t - lags[i, j]; idxs = i) - θ[j])
        end
        dθ[j] = ω[j] + coupling
    end
    nothing
end

is about 4.6 times faster. Before:

julia> @time solve(prob, MethodOfSteps(BS3()), saveat=0.01, reltol=0, abstol=1e-5)
 56.409539 seconds (1.96 G allocations: 32.454 GiB, 7.17% gc time)

After:

julia> @time solve(prob, MethodOfSteps(BS3()), saveat=0.01, reltol=0, abstol=1e-5)
 12.284248 seconds (5.79 M allocations: 455.702 MiB, 1.67% gc time)

https://github.com/SciML/SciMLBase.jl/pull/110 fixes that.


Keep in mind that JiTCDDE is itself generating and compiling C code from user-specified symbolic expressions, and then just putting a Python wrapper around that code, so it might be pretty efficient to begin with. The DiffEq solvers shouldn’t be much slower, but for the same basic numerical method it isn’t clear that this is a case where one would expect them to be noticeably faster.


Thanks for your reply!

I noticed that the function was not type stable when lag was not declared const, which I think was expected: lag was only being passed through the constant_lags keyword argument, so the function itself had no information about it. This is also related to what Chris noticed about Julia’s heuristics for the h function.
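
One way to avoid the global entirely (my own rearrangement, not something suggested in the thread) would be to carry the lag matrix inside the parameter tuple, for example:

# Sketch: pass the n×n lag matrix through `p` so `f!` never reads a global.
function f_p!(dθ, θ, h::H, p, t) where H
    ω, A, lags = p  # lags carried in the parameters instead of a global
    n = length(θ)
    @inbounds for j in 1:n
        coupling = 0.0
        @inbounds for i in 1:n
            coupling += A[i, j] * sin(h(p, t - lags[i, j]; idxs = i) - θ[j])
        end
        dθ[j] = ω[j] + coupling
    end
    nothing
end

# p_mat = (ω, A, reshape(lag, n, n))
# prob = DDEProblem(f_p!, θ₀, h, (0.0, 1.0), p_mat, constant_lags = lag)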

I think I’m comparing with the same method.

I’m sorry, I left force_dtmin = true in from some previous trials with more realistic connectivities and delays, which were raising some warnings, but it is not necessary in this test case. As Chris said, I should just remove it, and in the other case I was probably getting the warning because the problem was not well defined.

Thanks a lot, Chris!! And I’m sorry for such embarrassing mistakes :sweat_smile:

I know, I was biased by the Python notation that I had just looked at and made the mistake :sweat_smile:

About force_dtmin: I left it in from a previous trial with 180 oscillators and more realistic connectivities and delays, which was raising some warnings. It was not necessary in this case. Furthermore, as you say, the warnings might have been due to the wrongly defined problem, and I shouldn’t have used it in the first place. Thanks for the link!

Thanks for noticing and explaining this! This was the first time I was using DDEs and I clearly misunderstood how the h function was being used.

Thanks a lot, Chris!! This fix indeed significantly improves the performance! However, it is still much less performant than the Python version and scales really badly with the size of the system. For example, for n = 20:

julia> @time solve(prob, MethodOfSteps(BS3()), saveat=0.01, reltol=0, abstol=1e-5)
119.585544 seconds (11.80 M allocations: 1.706 GiB, 3.26% gc time)

while the Python version takes ~ 0.007 seconds.

My real need is to run it for n = 180 so I’ll need to improve the performance much more.

Yes, something is happening with the interpolation, but I won’t be able to look at that today.


No problem, I truly appreciate all the help that you gave me so far!! :slight_smile:

I’m running into similar problems at a large scale too (>100 oscillators), and I cannot really compute the solution before the timeout on my HPC. Is there a way to help solve this issue?

Have you confirmed that the same issue is occurring?

Yes, and since I’m actually interested in the same system that is being discussed here, I used the fixed version proposed by @ChrisRackauckas (so it would be the same issue). I’d love to use DifferentialEquations.jl and all its goodies to solve this problem, but the current prohibitive performance issue is making me fall back to Python and JiTCDDE.


Same problem here. I implemented a quite general interface for networks with heterogeneous delays in NetworkDynamics.jl, but unless this issue gets fixed it’s pretty useless: https://github.com/PIK-ICoNe/NetworkDynamics.jl/pull/104

Alright, but it’s a Base inference issue so I don’t know what to say. Help fix Base and that would be a good starting point.

Is there a bug filed against Base?

Yes, it’s already posted in this thread.


Here I reported that, for some reason, when using Julia 1.8.0-beta3 and loading DifferentialEquations.jl, the reproducer of the Base inference issue gives different results (and seems to work well in that case). However, under those conditions the poor performance of the DDE with heterogeneous delays posted here remains the same. So maybe the suggested reproducer is not a good one for that Base inference issue, but it is also possible that the DDE performance problem has another cause.

@ChrisRackauckas, could you run the code that led you to conclude that the cause of the DDE issue is the Base inference issue again, but this time on Julia 1.8.0-beta3 while loading DifferentialEquations.jl, please?

Dear @ChrisRackauckas, the delayed Kuramoto code is still taking a lot more time with the suggested code than with the Python alternative JiTCDDE. Do you know if there is any fix coming soon, by any chance?

Can you try the v1.8-beta3 and nightly to see if the allocations from the interpolations are gone?

https://github.com/SciML/OrdinaryDiffEq.jl/issues/1502

Try the example from Significant allocations with Callbacks (Tsit5). It’s the same problem as with callbacks: that interpolation inference bug makes every use of the interpolations allocate. In theory it was fixed by

https://github.com/JuliaLang/julia/pull/43990

It would be good to get a confirmation of that, since it should be in v1.8. If that’s fixed, then the rest should be much simpler.
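
For example, a quick way to check whether interpolation still allocates on a given Julia version (a sketch using a toy ODE, not the Kuramoto problem):

using OrdinaryDiffEq, BenchmarkTools
prob_ode = ODEProblem((du, u, p, t) -> (du .= -u), ones(3), (0.0, 1.0))
sol = solve(prob_ode, Tsit5())
@btime $sol(0.5)            # full-vector dense interpolation
@btime $sol(0.5; idxs = 1)  # indexed interpolation, the pattern the DDE history uses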


Thanks for your reply!

TL;DR: Both for the Callbacks example and for the delayed Kuramoto, the allocations are significantly reduced but the computation times remain similar. For the delayed Kuramoto case, unfortunately the Python JiTCDDE implementation is still at least 3 orders of magnitude more performant.

You can find the details below:

Example from Significant allocations with Callbacks (Tsit5)

Using

[1dea7af3] OrdinaryDiffEq v6.10.0
[6e4b80f9] BenchmarkTools v1.3.1

Running on Julia versions 1.8.0-beta3 and 1.9.0-DEV.439 (current nightly)

@btime solve(prob,Tsit5()) # no callback

and

@btime solve(prob,Tsit5(), callback=cellFate) # with callback

I get

| case          | Julia version | time       | allocations |
|---------------|---------------|------------|-------------|
| no callback   | 1.8.0-beta3   | 661.347 μs | 997.27 KiB  |
|               | 1.9.0-DEV.439 | 673.936 μs | 997.27 KiB  |
|---------------|---------------|------------|-------------|
| with callback | 1.8.0-beta3   | 2.288 ms   | 1021.34 KiB |
|               | 1.9.0-DEV.439 | 2.292 ms   | 1021.34 KiB |

So there is no difference in performance or allocations between these Julia versions. Compared to the values reported in the original post (before the fix), there is a big improvement in allocations, but the performance ratio (time with callback / time without callback) remains similar.

Delayed Kuramoto

Using

[bcd4f6db] DelayDiffEq v5.35.1
[31c24e10] Distributions v0.25.58
[1dea7af3] OrdinaryDiffEq v6.10.0
[6e4b80f9] BenchmarkTools v1.3.1

the following code (a wrap-up of all your suggestions from the previous replies)

using DelayDiffEq
using Distributions
using Random
using BenchmarkTools

function f!(dθ, θ, h::H, p, t) where H
    ω, A = p
    n = length(θ)
    lags = reshape(lag, n, n)
    @inbounds for j in 1:n
        coupling = 0.0
        @inbounds for i in 1:n
            coupling += A[i, j] * sin(h(p, t - lags[i, j]; idxs = i) - θ[j])
        end
        dθ[j] = ω[j] + coupling
    end
    nothing
end

n = 10
Random.seed!(1)
ω = rand(n)
A = rand(n,n)
const lag = rand(n*n)
θ₀ = rand(Uniform(0, 2π), n)
p = (ω, A)
const past = rand(Uniform(0, 2π), n)
h(p, t; idxs=nothing) = typeof(idxs) <: Number ? past[idxs] : past

prob = DDEProblem(f!, θ₀, h, (0.0, 1.0), p, constant_lags=lag)
@btime solve(prob, MethodOfSteps(BS3()), saveat=0.01, reltol=0.0, abstol=1e-5)

depending on the number of oscillators n, gives the following results:

@btime solve(prob, MethodOfSteps(BS3()), saveat=0.01, reltol=0.0, abstol=1e-5)
| n  | Julia version | time      | allocations |
|----|---------------|-----------|-------------|
| 10 | 1.8.0-beta3   | 10.754 s  | 213.43 MiB  |
|    | 1.9.0-DEV.439 | 10.063 s  | 213.43 MiB  |
|----|---------------|-----------|-------------|
| 20 | 1.8.0-beta3   | 112.689 s | 1.74 GiB    |
|    | 1.9.0-DEV.439 | 119.260 s | 1.49 GiB    |

For the case of n = 10, those are half the allocations that you reported above but with not much difference in time.

On the other hand, when using n = 20 oscillators, the results from both Julia versions (which in this case are slightly different) are similar to those I posted above (before the fix):

@time solve(prob, MethodOfSteps(BS3()), saveat=0.01, reltol=0, abstol=1e-5)
119.585544 seconds (11.80 M allocations: 1.706 GiB, 3.26% gc time)

When solving with n = 20, the warning Interrupted. Larger maxiters is needed. was raised, and I realized that the solver I was using (BS3, matching the JiTCDDE implementation) is a non-stiff solver; a network of many Kuramoto oscillators with very different frequencies is probably a stiff system. When using the recommended TRBDF2 solver for stiff problems, these are the results:

@btime solve(prob, MethodOfSteps(TRBDF2()), saveat=0.01, reltol=0.0, abstol=1e-5)
| n  | Julia version | min time   | allocations |
|----|---------------|------------|-------------|
| 10 | 1.8.0-beta3   | 993.642 ms | 52.72 MiB   |
|    | 1.9.0-DEV.439 | 1.104 s    | 52.72 MiB   |
|----|---------------|------------|-------------|
| 20 | 1.8.0-beta3   | 193.267 s  | 1.62 GiB    |
|    | 1.9.0-DEV.439 | 1566.749 s | 1.62 GiB    |

So you can see that TRBDF2 greatly improved the time and allocations for n = 10, but for n = 20 the performance got much worse (with a greater difference on the nightly version). The maxiters warning was still raised for n = 20 when using TRBDF2, so I’m still looking for the cause of that (a maxiters workaround sketch follows below). If you have any insight, I’ll be very grateful.
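
For completeness, the only workaround I see for the warning itself is raising the limit via the standard maxiters solve keyword, though that treats the symptom rather than the cause:

# Sketch: allow more integrator steps before bailing out.
@btime solve(prob, MethodOfSteps(TRBDF2()), saveat = 0.01, reltol = 0.0,
             abstol = 1e-5, maxiters = 10^7)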

Unfortunately, the Python implementation with JiTCDDE still seems more performant than the Julia implementation. For n = 10 we don’t have the maxiters warning, and the best I could get with TRBDF2 was ~ 1 s, while the Python JiTCDDE (even with the default Bogacki–Shampine pair solver) takes ~ 0.005 s for the n = 10 case and ~ 0.007 s for n = 20.

By the way, what is the right way to include the Jacobian for a DDE? Is it possible to calculate the symbolic Jacobian automatically? I was trying with modelingtoolkitize without success. This won’t solve the problem causing the maxiters warning, but at least I wanted to see how much I could improve the performance for the n = 10 case.

Can you share some flamegraphs made with StatProfilerHTML.jl (https://github.com/tkluck/StatProfilerHTML.jl)?