Reduce allocations (increase the running speed) for large scale ODEs (136 equations)

xspeng · October 25, 2022, 3:30pm

Hi, I am wondering how to improve the speed of large scale ODEs, which includes 136 equations. I already use the staticarray, and for loop instead of broadcast to speed up the program and it’s already 10 times faster than Matlab. I also tested and found that without saving everystep, it only has 53 allocations, which is tspan independent, that means the program already quite good. I am just wondering anyone can give me more advices to improve it. Any suggestions are welcome. Here is my program, the Constants.jl doesn’t need to change, and the long Test3DMolasses.jl can be get in this link
https://github.com/xspeng1999/Test3DMolasses/blob/main/Test3DMolasses
I know it’s quite long, I just wondering whether I make obvious stupid mistakes, thanks in advance~

using DifferentialEquations, LinearAlgebra, StaticArrays
using Random, Statistics
include("Test3DMolasses.jl")
include("Constants.jl")

# parameters defination
vx, vy, vz = rand(3)
rx, ry, rz = rand(3)
theta1, theta2 = 2pi * rand(2)
para01 = @SVector [vx, vy, vz, rx, ry, rz, theta1, theta2]

delta = 20rand(4, 12)
delta = SMatrix{4,12,Float64}(delta)

s = @SVector [1.0, 1.0, 1.0, 1.0]
Polar = @SVector [1.0, 1.0, 1.0, -1.0]

Len = length(Polar)
a = complex(zeros(Len))
b = complex(zeros(Len))
c = complex(zeros(Len))

p = (para01, delta, s, Polar, a, b, c, Len)

# initial value
II = zeros(136)
II[1] = 1.0 / 12.0
II[13] = 1.0 / 12.0
II[24] = 1.0 / 12.0
II[34] = 1.0 / 12.0
II[43] = 1.0 / 12.0
II[51] = 1.0 / 12.0
II[58] = 1.0 / 12.0
II[64] = 1.0 / 12.0
II[69] = 1.0 / 12.0
II[73] = 1.0 / 12.0
II[76] = 1.0 / 12.0
II[78] = 1.0 / 12.0
u0 = complex(II)

tend = 1e-5
tstart = 0
tspan = (tstart * Gamma, tend * Gamma)

prob = ODEProblem(Test3DMolasses, u0, tspan, p)
@time y = solve(prob, Tsit5(), save_everystep=false)  # 53 allocations in my computer

The following program is saved in a seperate file Constants.jl

## Physical constants
const h = 6.626e-34
Mu = 1.660538e-27
Mn = 59
const Mm = Mu * Mn     
const lamda = 606.3e-9
const k = 2pi / lamda
const Gamma = 2pi * 8.3e6
const velfact = Gamma / k
ZXtemp = [0, 0, 0, 76.2939, 122.9512, 122.9512, 122.9512, 147.8177, 147.8177, 147.8177, 147.8177, 147.8177]   # importdata('X_ZeemanCaF.mat');    # 0:1:1000G
const ZX = round.((ZXtemp .- ZXtemp[1]) / 8.3, digits=2)   # unit Gamma  relative to the lowest level round to 0.01 Gamma
ZBtemp = [-5 * 2 * pi * 1e6, 0, 0, 0] / Gamma
const ZB = round.(ZBtemp, digits=2)        # round(ZB*100)/100;     # round to 0.01Gamma

#=
#Fp  0           1            1        1
#    13          14          15       16
#mFp 0           1            0       -1
ftemp = [-0.1355 0.4649 -0.4645 0.0000    #  1  1  1
    -0.1355 0.4650 -0.0001 -0.4641    #  0  2  1
    -0.1354 -0.0000 0.4646 -0.4642    # -1  3  1
    -0.0020 -0.4705 0.4714 -0.4722    #  0  4  0
    0.5614 0.0000 -0.1853 0.1846     # -1  5  1
    -0.5612 0.1865 -0.0010 -0.1832    #  0  6  1
    -0.5610 0.1851 -0.1844 0.0000    #  1  7  1
    -0.0000 0.0000 -0.0000 0.4083    # -2  8  2
    0.0009 -0.0000 0.2880 0.2894     # -1  9  2
    -0.0011 -0.1659 -0.3334 -0.1675    #  0  10 2
    -0.0009 -0.2880 -0.2894 0.0000    #  1  11 2
    0.0000 0.4083 0.0000 0.0000]   #  2  12 2
=#

ftemp = [-0.1354 0.4646 0.4646 0
    0.1354 -0.4646 0 0.4646
    -0.1354 0 -0.4646 -0.4646
    0 0.4714 0.4714 0.4714
    0.5612 0 0.1849 0.1849
    -0.5612 0.1849 0 -0.1849
    0.5612 -0.1849 -0.1849 0
    0 0 0 0.4082
    0 0 0.2887 -0.2887
    0 0.1667 -0.3333 0.1667
    0 -0.2887 0.2887 0
    0 0.4082 0 0]   #  from Yang's method

for i = 1:4
    ftemp[:, i] = ftemp[:, i] ./ sqrt(sum(ftemp[:, i] .^ 2))
end

const f = ftemp
const r = ftemp   # the branching ratio

lmiq · October 25, 2022, 4:53pm

You have a lot of repeated computations, that could be done only once in your Test3DMolasses function. For example:

dy[1] += -factor * s[jj] * (f[1, 1] * a[jj] * cis((delta[jj, 1] + ZB[1]) * t) * conj(y[89]) + f[1, 2] * b[jj] * cis((delta[jj, 1] + ZB[2]) * t) * conj(y[90]) + f[1, 3] * a[jj] * cis((delta[jj, 1] + ZB[3]) * t) * conj(y[91]) - conj(f[1, 1] * a[jj]) * cis(-(delta[jj, 1] + ZB[1]) * t) * y[89] - conj(f[1, 2] * b[jj]) * cis(-(delta[jj, 1] + ZB[2]) * t) * y[90] - conj(f[1, 3] * a[jj]) * cis(-(delta[jj, 1] + ZB[3]) * t) * y[91])
dy[2] += -factor * s[jj] * (f[1, 1] * a[jj] * cis((delta[jj, 1] + ZB[1]) * t) * conj(y[93]) + f[1, 2] * b[jj] * cis((delta[jj, 1] + ZB[2]) * t) * conj(y[94]) + f[1, 3] * a[jj] * cis((delta[jj, 1] + ZB[3]) * t) * conj(y[95]) - conj(f[2, 1] * b[jj]) * cis(-(delta[jj, 2] + ZB[1]) * t) * y[89] - conj(f[2, 2] * c[jj]) * cis(-(delta[jj, 2] + ZB[2]) * t) * y[90] - conj(f[2, 3] * b[jj]) * cis(-(delta[jj, 2] + ZB[3]) * t) * y[91] - conj(f[2, 4] * a[jj]) * cis(-(delta[jj, 2] + ZB[4]) * t) * y[92])
...

If you compute the common parts into factors and do not repeat them, your computation will very likely be faster.

xspeng · October 25, 2022, 5:11pm

Good suggestion，I will try ~

j-fu · October 25, 2022, 6:54pm

Regarding the inital topic: “reducing allocations”: you already did a pretty good job here - I looked at the code and didn’t see where they could come from. After reading that i’ts only 53 independent of tspan, I think that allocation-wise it is optimal and I guess there are no allocations in your Test3DMolasses. Your 53 allocations have negligible influence on the runtime.

So yeah I would second @lmiq - the main reserves probably are in the pre-calculation of the common subexpressions.

Topic		Replies	Views
Reducing the memory allocation in ODEs Performance question	11	1325	January 22, 2022
Reduce allocations for 3D Arrays General Usage question	10	383	November 2, 2022
Increase the speed of For Loop in julia General Usage question	9	538	November 10, 2022
Advanced tricks to reduce memory allocations of ODE Performance diffeq	7	531	June 27, 2023
Reduce allocations for matrix differential equations function Performance	10	1329	February 8, 2019

Reduce allocations (increase the running speed) for large scale ODEs (136 equations)

Related topics