# Reduce allocations (increase the running speed) for large scale ODEs (136 equations)

Hi, I am wondering how to improve the speed of large scale ODEs, which includes 136 equations. I already use the staticarray, and for loop instead of broadcast to speed up the program and it’s already 10 times faster than Matlab. I also tested and found that without saving everystep, it only has 53 allocations, which is `tspan` independent, that means the program already quite good. I am just wondering anyone can give me more advices to improve it. Any suggestions are welcome. Here is my program, the `Constants.jl` doesn’t need to change, and the long `Test3DMolasses.jl` can be get in this link
https://github.com/xspeng1999/Test3DMolasses/blob/main/Test3DMolasses
I know it’s quite long, I just wondering whether I make obvious stupid mistakes, thanks in advance~

``````using DifferentialEquations, LinearAlgebra, StaticArrays
using Random, Statistics
include("Test3DMolasses.jl")
include("Constants.jl")

# parameters defination
vx, vy, vz = rand(3)
rx, ry, rz = rand(3)
theta1, theta2 = 2pi * rand(2)
para01 = @SVector [vx, vy, vz, rx, ry, rz, theta1, theta2]

delta = 20rand(4, 12)
delta = SMatrix{4,12,Float64}(delta)

s = @SVector [1.0, 1.0, 1.0, 1.0]
Polar = @SVector [1.0, 1.0, 1.0, -1.0]

Len = length(Polar)
a = complex(zeros(Len))
b = complex(zeros(Len))
c = complex(zeros(Len))

p = (para01, delta, s, Polar, a, b, c, Len)

# initial value
II = zeros(136)
II = 1.0 / 12.0
II = 1.0 / 12.0
II = 1.0 / 12.0
II = 1.0 / 12.0
II = 1.0 / 12.0
II = 1.0 / 12.0
II = 1.0 / 12.0
II = 1.0 / 12.0
II = 1.0 / 12.0
II = 1.0 / 12.0
II = 1.0 / 12.0
II = 1.0 / 12.0
u0 = complex(II)

tend = 1e-5
tstart = 0
tspan = (tstart * Gamma, tend * Gamma)

prob = ODEProblem(Test3DMolasses, u0, tspan, p)
@time y = solve(prob, Tsit5(), save_everystep=false)  # 53 allocations in my computer

``````

The following program is saved in a seperate file `Constants.jl`

``````## Physical constants
const h = 6.626e-34
Mu = 1.660538e-27
Mn = 59
const Mm = Mu * Mn
const lamda = 606.3e-9
const k = 2pi / lamda
const Gamma = 2pi * 8.3e6
const velfact = Gamma / k
ZXtemp = [0, 0, 0, 76.2939, 122.9512, 122.9512, 122.9512, 147.8177, 147.8177, 147.8177, 147.8177, 147.8177]   # importdata('X_ZeemanCaF.mat');    # 0:1:1000G
const ZX = round.((ZXtemp .- ZXtemp) / 8.3, digits=2)   # unit Gamma  relative to the lowest level round to 0.01 Gamma
ZBtemp = [-5 * 2 * pi * 1e6, 0, 0, 0] / Gamma
const ZB = round.(ZBtemp, digits=2)        # round(ZB*100)/100;     # round to 0.01Gamma

#=
#Fp  0           1            1        1
#    13          14          15       16
#mFp 0           1            0       -1
ftemp = [-0.1355 0.4649 -0.4645 0.0000    #  1  1  1
-0.1355 0.4650 -0.0001 -0.4641    #  0  2  1
-0.1354 -0.0000 0.4646 -0.4642    # -1  3  1
-0.0020 -0.4705 0.4714 -0.4722    #  0  4  0
0.5614 0.0000 -0.1853 0.1846     # -1  5  1
-0.5612 0.1865 -0.0010 -0.1832    #  0  6  1
-0.5610 0.1851 -0.1844 0.0000    #  1  7  1
-0.0000 0.0000 -0.0000 0.4083    # -2  8  2
0.0009 -0.0000 0.2880 0.2894     # -1  9  2
-0.0011 -0.1659 -0.3334 -0.1675    #  0  10 2
-0.0009 -0.2880 -0.2894 0.0000    #  1  11 2
0.0000 0.4083 0.0000 0.0000]   #  2  12 2
=#

ftemp = [-0.1354 0.4646 0.4646 0
0.1354 -0.4646 0 0.4646
-0.1354 0 -0.4646 -0.4646
0 0.4714 0.4714 0.4714
0.5612 0 0.1849 0.1849
-0.5612 0.1849 0 -0.1849
0.5612 -0.1849 -0.1849 0
0 0 0 0.4082
0 0 0.2887 -0.2887
0 0.1667 -0.3333 0.1667
0 -0.2887 0.2887 0
0 0.4082 0 0]   #  from Yang's method

for i = 1:4
ftemp[:, i] = ftemp[:, i] ./ sqrt(sum(ftemp[:, i] .^ 2))
end

const f = ftemp
const r = ftemp   # the branching ratio

``````

You have a lot of repeated computations, that could be done only once in your Test3DMolasses function. For example:

``````dy += -factor * s[jj] * (f[1, 1] * a[jj] * cis((delta[jj, 1] + ZB) * t) * conj(y) + f[1, 2] * b[jj] * cis((delta[jj, 1] + ZB) * t) * conj(y) + f[1, 3] * a[jj] * cis((delta[jj, 1] + ZB) * t) * conj(y) - conj(f[1, 1] * a[jj]) * cis(-(delta[jj, 1] + ZB) * t) * y - conj(f[1, 2] * b[jj]) * cis(-(delta[jj, 1] + ZB) * t) * y - conj(f[1, 3] * a[jj]) * cis(-(delta[jj, 1] + ZB) * t) * y)
dy += -factor * s[jj] * (f[1, 1] * a[jj] * cis((delta[jj, 1] + ZB) * t) * conj(y) + f[1, 2] * b[jj] * cis((delta[jj, 1] + ZB) * t) * conj(y) + f[1, 3] * a[jj] * cis((delta[jj, 1] + ZB) * t) * conj(y) - conj(f[2, 1] * b[jj]) * cis(-(delta[jj, 2] + ZB) * t) * y - conj(f[2, 2] * c[jj]) * cis(-(delta[jj, 2] + ZB) * t) * y - conj(f[2, 3] * b[jj]) * cis(-(delta[jj, 2] + ZB) * t) * y - conj(f[2, 4] * a[jj]) * cis(-(delta[jj, 2] + ZB) * t) * y)
...
``````

If you compute the common parts into factors and do not repeat them, your computation will very likely be faster.

Good suggestion，I will try ~

Regarding the inital topic: “reducing allocations”: you already did a pretty good job here - I looked at the code and didn’t see where they could come from. After reading that i’ts only 53 independent of tspan, I think that allocation-wise it is optimal and I guess there are no allocations in your Test3DMolasses. Your 53 allocations have negligible influence on the runtime.

So yeah I would second @lmiq - the main reserves probably are in the pre-calculation of the common subexpressions.

1 Like