Optimal approach to simulating large system of time-dependent ODEs?

I’m looking to solve a large (N ~ 1000) system of 1st order, linear ODEs with time-dependent coefficients,
\frac{d u}{d t} = A(t) \cdot u(t).
Because both the numerical constants involved in the coefficients and the number of equations N are not fixed, I have to generate them on-the-fly, using Symbolics.jl to generate a function to return the time-varying coefficient matrix A(t).
I have a MWE to do so:

using DifferentialEquations, Symbolics, BenchmarkTools
@variables x
N = 10;

f = zeros(Num, N,N);
for i in 1:N
    for j in 1:i
        f[i, j] = cos((i-j) * x)
        f[j, i] = cos((i-j) * x)
    end
end

h = eval(build_function(f, x, parallel=Symbolics.MultithreadedForm())[2]);

A0 = zeros(ComplexF64, N,N);
function ih(A, u, p, t)
    h(A, t)
    A .*= im
end
ih(A0, nothing, nothing, 0);
A = DiffEqArrayOperator(A0, update_func=ih);
prob = ODEProblem(A, ones(ComplexF64, N), (0., 10.));
@btime solve(prob, saveat = 0.1, abstol = 1e-6, reltol = 1e-6, maxiters=1e8)

which apparently evaluates in times on the order of ms

# N = 10
@btime solve(prob, saveat = 0.1, abstol = 1e-6, reltol = 1e-6, maxiters=1e8)
  5.370 ms (22878 allocations: 3.24 MiB)

# N = 20
@btime solve(prob, saveat = 0.1, abstol = 1e-6, reltol = 1e-6, maxiters=1e8)
  20.012 ms (38128 allocations: 4.68 MiB)

The time seems to increase quadratically with N, at least for small N. Are there any optimizations I can make to speed this up?

Turn off multithreading. That is way too small for multithreading to make sense and will only slow it down.