Hello! Brand new to Julia. I have a very stiff system of ODEs that I’m trying to optimize for large N. I’ve achieved reasonable performance in Python using SciPy+Numba, but I wanted to see if Julia would be superior. So far it’s been 2-3x slower (and worse as N increases to, say, 100), and I can’t tell what’s going wrong. I’ve run time and profile and noticed that the memory allocation is quite severe, which is confusing to me because as far as I can tell I’m just mutating my arrays.

Here is my MWE in Julia:

```
using OrdinaryDiffEq, BenchmarkTools, LSODA, Plots, Profile
N = 20
parameters = [2, N,
-500, 0.2, 0.0, 100.0, 0.7, 0.0, 15.0, 0.24, 0.02, 1700.0, 8.0, 0.19, -0.5,
25.0, 6.0, 12.0, 10.0,
2.0*1.4 .* (rand(Float64, (N, 2)) .- 0.5), 0.5 .* (rand(Float64, (N, N)) .- 0.8), 0.14 .* (rand(Float64, (N, 1)) .- 0.5),
rand(Float64, (2, 4))]
function forward_RHS(du, u, p, t)
v_s = view(u, 1:p[2])
u_s = view(u, p[2]+1:2*p[2])
w_s = view(u, 2*p[2]+1:3*p[2])
s = view(u, 3*p[2]+1:4*p[2])
du[1:p[2]] = ((p[3].*(v_s .- p[4]).^2.0 .+ p[5]).*(v_s .- 1.0) .- p[12].*u_s.*(v_s .- p[14]) .- p[13].*w_s.*(v_s .- p[15]) .+ p[22] .+ p[20]*sin.(p[23][:, 1].*t) .+ p[21]*s)./p[16]
du[p[2]+1:2*p[2]] = (exp.(p[6].*(v_s .- p[7])) .+ p[8] .- u_s)./p[17]
du[2*p[2]+1:3*p[2]] = (p[9].*(v_s .- p[10]).^2.0 .+ p[11] .- w_s)./p[18]
du[3*p[2]+1:4*p[2]] = (-s .+ (v_s .> 0.25).*(v_s .< 0.3).*(du[1:p[2]] .> 0.0).*du[1:p[2]]./0.05)./p[19]
end
prob = ODEProblem(forward_RHS, zeros(4*N), (0.0, 500.0), parameters)
@time solution = solve(prob, lsoda(), saveat=0.1, reltol=1e-8, abstol=1e-8)
plot(solution, vars = (1:N), plotdensity = 10000, lw = 2.0)
```

On my laptop, time gives

“6.487295 seconds (13.95 M allocations: 1.183 GiB, 1.89% gc time)”

Here it is in Python:

```
import numba
import numpy
import time as timer
from scipy import integrate
from matplotlib import pyplot
N = 20
parameters = (0, 2, N,
-500, 0.2, 0.0, 100.0, 0.7, 0.0, 15.0, 0.24, 0.02, 1700.0, 8.0, 0.19, -0.5,
25.0, 6.0, 12.0, 10.0,
2.0*1.4 * (numpy.random.rand(N, 2) - 0.5), 0.5 * (numpy.random.rand(N, N) - 0.8), 0.14 * (numpy.random.rand(N, ) - 0.5),
numpy.random.rand(2, 4))
@numba.njit
def forward_RHS(u, t, *p):
v_s = u[0:p[2]]
u_s = u[p[2]:2*p[2]]
w_s = u[2*p[2]:3*p[2]]
s = u[3*p[2]:4*p[2]]
du = numpy.zeros((4*p[2], ))
du[0:p[2]] = ((p[3]*(v_s - p[4])**2.0 + p[5])*(v_s - 1.0) - p[12]*u_s*(v_s - p[14]) - p[13]*w_s*(v_s - p[15]) + p[22] + p[20]@numpy.sin(p[23][:, 1]*t) + p[21]@s)/p[16]
du[p[2]:2*p[2]] = (numpy.exp(p[6]*(v_s - p[7])) + p[8] - u_s)/p[17]
du[2*p[2]:3*p[2]] = (p[9]*(v_s - p[10])**2.0 + p[11] - w_s)/p[18]
du[3*p[2]:4*p[2]] = (-s + (v_s > 0.25)*(v_s < 0.3)*(du[0:p[2]] > 0.0)*du[0:p[2]]/0.05)/p[19]
return du
tic = timer.time()
y = integrate.odeint(forward_RHS, numpy.zeros(4*N, ), numpy.linspace(0.0, 500.0, 5000), args = parameters, rtol = 1e-8, atol = 1e-8)
print(str(timer.time() - tic) + ' seconds')
pyplot.plot(y[:, 0:N])
```

On my laptop this takes ~2.4 seconds.

I’ve looked through the performance tips, but I remain confused. Am I missing something obvious?