How to optimize a manual Jacobian vs. ForwardDiff.jl

Hello, I compute the Jacobian manually for this system:

using StaticArrays  # provides SA and SMatrix

function TM(u, p, t)
    U(y) = p[8] + p[9] / ( 1.0 + exp( -50.0 * (y - p[7]) ) )
    σ(x) = 1.0 / ( 1.0 + exp( -20.0 * (x-p[6]) ) )
    du1 = (-u[1] + p[1] * log( 1.0 + exp( (p[5] * U(u[3]) * u[2] * u[1] + p[11]  ) / (p[1]) ) ) ) / p[2]
    du2 = (1.0 - u[2])/p[3] - U(u[3])*u[2]*u[1]
    du3 = (-u[3])/p[4] + p[10] * σ(u[2])
    return SA[du1, du2, du3]
end

The hand-written Jacobian:

U(y, p) = p[8] + p[9] / ( 1.0 + exp( -50.0 * (y - p[7]) ) )
σ(x, p) = 1.0 / ( 1.0 + exp( -20.0 * (x - p[6]) ) )
U_y(y, p) = ( 50.0 * p[9] * exp(-50.0 * (y - p[7])) ) / (1.0 + exp( -50.0*(y - p[7]) ) )^2
g(E, x, y, p) = exp((p[5]  * U(y, p) * x * E + p[11]) / p[1])
σ_der(x, p) = exp( (-20.0) * (x - p[6]) )
function jacob_TM_(u, p, t)
    
    E_E = ( -1.0 + (p[5] * U(u[3], p) * u[2] * g(u[1], u[2], u[3], p)) / (1.0 + g(u[1], u[2], u[3], p)) ) / p[2]
    E_x = (p[5] * U(u[3], p) * u[1] * g(u[1], u[2], u[3], p)) / (p[2] * (1.0 + g(u[1], u[2], u[3], p) ))
    E_y = ( p[5]  * U_y(u[3], p) * u[2] * u[1] * g(u[1], u[2], u[3], p) ) / (p[2] * (1.0 + g(u[1], u[2], u[3], p)) )
    
    x_E = -U(u[3], p)*u[2]
    x_x = -1.0 / p[3] - U(u[3], p)*u[1]
    x_y = (-U_y(u[3], p)) * u[2] * u[1]
    
    y_x = 20.0 * p[10] * σ_der(u[2], p) / (1.0 + σ_der(u[2], p))^2
    y_y = -1.0/p[4]
    
    SMatrix{3,3}(E_E, x_E, 0.0,
        E_x, x_x, y_x,
        E_y, x_y, y_y)
end
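Before benchmarking, it is worth checking that the analytic Jacobian agrees with ForwardDiff at a test point, to catch algebra mistakes. A self-contained sketch (the definitions are repeated from above with the global `J` written as `p[5]`; the state and parameter values are arbitrary placeholders, not the model's actual parameters):

```julia
using StaticArrays, ForwardDiff

function TM(u, p, t)
    U(y) = p[8] + p[9] / (1.0 + exp(-50.0 * (y - p[7])))
    σ(x) = 1.0 / (1.0 + exp(-20.0 * (x - p[6])))
    du1 = (-u[1] + p[1] * log(1.0 + exp((p[5] * U(u[3]) * u[2] * u[1] + p[11]) / p[1]))) / p[2]
    du2 = (1.0 - u[2]) / p[3] - U(u[3]) * u[2] * u[1]
    du3 = (-u[3]) / p[4] + p[10] * σ(u[2])
    return SA[du1, du2, du3]
end

U(y, p) = p[8] + p[9] / (1.0 + exp(-50.0 * (y - p[7])))
U_y(y, p) = (50.0 * p[9] * exp(-50.0 * (y - p[7]))) / (1.0 + exp(-50.0 * (y - p[7])))^2
g(E, x, y, p) = exp((p[5] * U(y, p) * x * E + p[11]) / p[1])
σ_der(x, p) = exp(-20.0 * (x - p[6]))

function jacob_TM_(u, p, t)
    g_ = g(u[1], u[2], u[3], p)
    E_E = (-1.0 + p[5] * U(u[3], p) * u[2] * g_ / (1.0 + g_)) / p[2]
    E_x = p[5] * U(u[3], p) * u[1] * g_ / (p[2] * (1.0 + g_))
    E_y = p[5] * U_y(u[3], p) * u[2] * u[1] * g_ / (p[2] * (1.0 + g_))
    x_E = -U(u[3], p) * u[2]
    x_x = -1.0 / p[3] - U(u[3], p) * u[1]
    x_y = -U_y(u[3], p) * u[2] * u[1]
    y_x = 20.0 * p[10] * σ_der(u[2], p) / (1.0 + σ_der(u[2], p))^2
    SMatrix{3,3}(E_E, x_E, 0.0, E_x, x_x, y_x, E_y, x_y, -1.0 / p[4])
end

# Arbitrary placeholder values, only to exercise the code paths.
p = [1.5, 0.013, 0.013, 3.3, 1.0, 0.3, 0.05, 0.3, 0.45, 0.5, -1.8]
u = SA[0.5, 0.4, 0.3]

J_manual = jacob_TM_(u, p, 0.0)
J_ad = ForwardDiff.jacobian(v -> TM(v, p, 0.0), u)
@assert J_manual ≈ J_ad  # both Jacobians agree to floating-point tolerance
```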

I compared my Jacobian with the ForwardDiff Jacobian:

BenchmarkTools.Trial: 10000 samples with 600 evaluations.
 Range (min … max):  195.833 ns …  40.754 μs  ┊ GC (min … max): 0.00% … 99.38%
 Time  (median):     215.500 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   222.815 ns ± 567.948 ns  ┊ GC (mean ± σ):  3.60% ±  1.41%

         ▄       ▁▂   ▆█                                         
  ▂▂▃▂▄▆▆█▆▂▂▁▁▂▃██▃▄▆██▇▅▃▃▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  196 ns           Histogram: frequency by time          262 ns <

 Memory estimate: 80 bytes, allocs estimate: 1.
BenchmarkTools.Trial: 10000 samples with 254 evaluations.
 Range (min … max):  283.858 ns … 135.250 μs  ┊ GC (min … max): 0.00% … 99.67%
 Time  (median):     344.094 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   378.551 ns ±   1.993 μs  ┊ GC (mean ± σ):  9.03% ±  1.72%

  ▆▃▅       ▄█▆▃▅▁                                               
  ███▄▃▂▂▁▂███████▇▄▃▂▂▁▁▁▁▁▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  284 ns           Histogram: frequency by time          600 ns <

 Memory estimate: 336 bytes, allocs estimate: 4.

Then I computed lyapunovspectrum with DynamicalSystems.jl, and the spectrum with the manual Jacobian was slower:

BenchmarkTools.Trial: 18 samples with 1 evaluation.
 Range (min … max):  277.929 ms … 287.031 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     281.196 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   281.640 ms ±   3.042 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

         █                             ▃                         
  ▇▇▁▁▇▇▁█▁▁▇▁▁▁▁▁▇▁▁▁▁▁▁▁▁▁▁▇▁▁▁▁▁▁▁▁▇█▇▁▁▁▁▁▁▁▇▇▁▁▁▁▁▁▁▁▇▁▁▁▇ ▁
  278 ms           Histogram: frequency by time          287 ms <

 Memory estimate: 3.28 KiB, allocs estimate: 47.
BenchmarkTools.Trial: 26 samples with 1 evaluation.
 Range (min … max):  195.002 ms … 201.918 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     195.455 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   196.571 ms ±   1.996 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▂ ▂                                                           
  ████▅▁▅▅▁▁▅▁▁▁▁▁▅▁▅▁▁▁▁▁▁▁▅▁▁▁▅▁▅█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▅▁▁▁▅ ▁
  195 ms           Histogram: frequency by time          202 ms <

 Memory estimate: 3.28 KiB, allocs estimate: 47.

Why is the manual Jacobian slower?


ForwardDiff is able to do some SIMD that can help in a lot of cases. Also, it probably avoids recomputing a few things compared to your version — for example, computing U(u[3], p) only once instead of many times.
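As a side note, ForwardDiff composes well with StaticArrays: with an SVector input it dispatches to a static implementation and returns an SMatrix. A toy example (not the TM system) to illustrate:

```julia
using StaticArrays, ForwardDiff

# Toy 3-state right-hand side, standing in for the real model.
f(u) = SA[u[1] * u[2], u[2]^2, u[1] + u[3]]

u = SA[1.0, 2.0, 3.0]
Jf = ForwardDiff.jacobian(f, u)  # 3×3 SMatrix
# Jf == SA[2.0 1.0 0.0; 0.0 4.0 0.0; 1.0 0.0 1.0]
```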


Thank you!
I managed to reduce the time. Is it still possible to improve the result?
Jacobian:

BenchmarkTools.Trial: 10000 samples with 973 evaluations.
 Range (min … max):  75.026 ns …  24.038 μs  ┊ GC (min … max): 0.00% … 99.52%
 Time  (median):     93.320 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   98.275 ns ± 410.607 ns  ┊ GC (mean ± σ):  7.22% ±  1.72%

  ▅▂▇▅▄▃▂▁▁         ▅▇▅██▅▃▂▂▂▁▁▁▁▁▁▁▁                         ▂
  ██████████▇▆▆▅▅▆▄██████████████████████▇▇▇▇▆▆▆▆▆▇███▇▇▆▆▆▆▆▆ █
  75 ns         Histogram: log(frequency) by time       127 ns <

 Memory estimate: 80 bytes, allocs estimate: 1.
BenchmarkTools.Trial: 10000 samples with 267 evaluations.
 Range (min … max):  286.142 ns … 98.949 μs  ┊ GC (min … max):  0.00% … 99.54%
 Time  (median):     349.813 ns              ┊ GC (median):     0.00%
 Time  (mean ± σ):   378.079 ns ±  1.933 μs  ┊ GC (mean ± σ):  10.21% ±  1.99%

  ▆▇▆▅▄▃▂▁     ▂▃▅▃▅▇█▇▅▄▃                       ▁ ▁           ▂
  █████████▇█▇▇████████████▇▇▇▅▆▆▆▅▆▆▇▆▇▇▅▅▆▅▄▇▆█████▇▇▇▆▅▅▅▄▆ █
  286 ns        Histogram: log(frequency) by time       494 ns <

 Memory estimate: 336 bytes, allocs estimate: 4.

lyapunovspectrum:

BenchmarkTools.Trial: 32 samples with 1 evaluation.
 Range (min … max):  157.645 ms … 163.120 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     159.589 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   159.418 ms ±   1.001 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

   ▁        ▁  ▄       ██▄▁   ▁                                  
  ▆█▁▆▁▁▁▁▁▁█▁▁█▆▆▁▆▁▁▆████▆▁▆█▁▆▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▆ ▁
  158 ms           Histogram: frequency by time          163 ms <

 Memory estimate: 3.28 KiB, allocs estimate: 47.
BenchmarkTools.Trial: 26 samples with 1 evaluation.
 Range (min … max):  194.245 ms … 205.892 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     197.141 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   197.773 ms ±   2.971 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

    ▁▁           ▁   █                                           
  ▆▁██▆▆▁▆▁▆▁▁▆▆▆█▁▆▆█▁▁▁▁▁▁▁▁▁▁▆▁▆▁▁▁▁▁▁▁▁▆▁▆▁▁▁▆▁▁▁▁▁▁▁▁▁▁▁▁▆ ▁
  194 ms           Histogram: frequency by time          206 ms <

 Memory estimate: 3.28 KiB, allocs estimate: 47.

Code:

function jacob_TM_(u, p, t)
    # `@inbounds` goes inside the body: written as `@inbounds function ...`
    # it only wraps the method definition and does not affect the indexing.
    @inbounds begin
        U(y, p, exp50) = p[8] + p[9] / (1.0 + exp50)
        U_y(y, p, exp50) = (50.0 * p[9] * exp50) / (1.0 + exp50)^2
        g(E, x, y, p, U_) = exp((p[5] * U_ * x * E + p[11]) / p[1])
        σ_der(x, p) = exp(-20.0 * (x - p[6]))

        exp50 = exp(-50.0 * (u[3] - p[7]))

        U_ = U(u[3], p, exp50)
        Uy = U_y(u[3], p, exp50)
        g_ = g(u[1], u[2], u[3], p, U_)
        σ_deri = σ_der(u[2], p)

        g_plus = 1.0 + g_
        g_mult = g_ * U_
        g_plus_mult = p[2] * g_plus
        u1p5 = p[5] * u[1]
        Uyu2 = Uy * u[2]

        E_E = (-1.0 + (p[5] * u[2] * g_mult) / g_plus) / p[2]  # J is p[5]
        E_x = (u1p5 * g_mult) / g_plus_mult
        E_y = (u1p5 * Uyu2 * g_) / g_plus_mult

        x_E = -U_ * u[2]
        x_x = -1.0 / p[3] - U_ * u[1]
        x_y = -Uyu2 * u[1]

        y_x = 20.0 * p[10] * σ_deri / (1.0 + σ_deri)^2
        y_y = -1.0 / p[4]

        SMatrix{3,3}(E_E, x_E, 0.0,
            E_x, x_x, y_x,
            E_y, x_y, y_y)
    end
end
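For completeness, here is roughly how a hand-written Jacobian is passed to DynamicalSystems.jl in the v3 API (CoupledODEs, TangentDynamicalSystem with the `J` keyword, lyapunovspectrum). TM and a condensed Jacobian are repeated so the sketch runs standalone; u0 and p are arbitrary placeholders, not the model's actual parameters:

```julia
using StaticArrays, DynamicalSystems

function TM(u, p, t)
    U(y) = p[8] + p[9] / (1.0 + exp(-50.0 * (y - p[7])))
    σ(x) = 1.0 / (1.0 + exp(-20.0 * (x - p[6])))
    du1 = (-u[1] + p[1] * log(1.0 + exp((p[5] * U(u[3]) * u[2] * u[1] + p[11]) / p[1]))) / p[2]
    du2 = (1.0 - u[2]) / p[3] - U(u[3]) * u[2] * u[1]
    du3 = (-u[3]) / p[4] + p[10] * σ(u[2])
    return SA[du1, du2, du3]
end

function jacob_TM_(u, p, t)
    exp50 = exp(-50.0 * (u[3] - p[7]))
    U_ = p[8] + p[9] / (1.0 + exp50)
    Uy = (50.0 * p[9] * exp50) / (1.0 + exp50)^2
    g_ = exp((p[5] * U_ * u[2] * u[1] + p[11]) / p[1])
    gp = 1.0 + g_
    σd = exp(-20.0 * (u[2] - p[6]))
    SMatrix{3,3}(
        (-1.0 + p[5] * U_ * u[2] * g_ / gp) / p[2], -U_ * u[2], 0.0,
        p[5] * U_ * u[1] * g_ / (p[2] * gp), -1.0 / p[3] - U_ * u[1], 20.0 * p[10] * σd / (1.0 + σd)^2,
        p[5] * Uy * u[2] * u[1] * g_ / (p[2] * gp), -Uy * u[2] * u[1], -1.0 / p[4])
end

# Arbitrary placeholder values, only to make the example executable.
u0 = SA[0.5, 0.4, 0.3]
p  = [1.5, 0.013, 0.013, 3.3, 1.0, 0.3, 0.05, 0.3, 0.45, 0.5, -1.8]

ds    = CoupledODEs(TM, u0, p)                     # out-of-place continuous system
tands = TangentDynamicalSystem(ds; J = jacob_TM_)  # attach the analytic Jacobian
λ     = lyapunovspectrum(tands, 1000; Δt = 0.1)    # three Lyapunov exponents
```

As far as I know, if the `J` keyword is omitted, TangentDynamicalSystem falls back to ForwardDiff internally, which is the comparison being benchmarked in this thread.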