However, I am referring to the general case where I have some f! that operates in-place on a D-element Vector.
Now I want to modify this function so that it operates on k such Vectors “in parallel”. These Vectors are currently the columns of a matrix S, but the storage format does not matter and can be changed.
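For concreteness, such an f! could look like the in-place Lorenz field below (just an illustration of the expected signature, not my actual system):
# hypothetical in-place field with the usual DiffEq signature: writes the derivative into du
function lorenz!(du, u, p, t)
    du[1] = p[1] * (u[2] - u[1])
    du[2] = u[1] * (p[2] - u[3]) - u[2]
    du[3] = u[1] * u[2] - p[3] * u[3]
    return nothing
end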
What I do is:
function create_parallel_eom(f!, S)
    k = size(S, 2)    # number of columns = number of parallel copies
    veom! = (du, u, p, t) -> begin
        for j in 1:k
            # apply the original in-place field to the j-th column
            f!(view(du, :, j), view(u, :, j), p, t)
        end
        return nothing
    end
    return veom!
end
and then use this veom! to create a new ODEProblem. If I try this for 2 parallel vectors, I get a slowdown of 150x instead of the “theoretical” 2x, simply because the combined vector field is absurdly expensive compared to just calling f! twice without views.
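For reference, the way I then build and solve the combined problem is roughly this (a sketch; prob is the original single-trajectory ODEProblem, and parprob is just an illustrative name):
veom! = create_parallel_eom(prob.f, S)
u0 = copy(S)                                    # the k columns evolved side by side
parprob = ODEProblem{true}(veom!, u0, prob.tspan, prob.p)
solve(parprob, Tsit5())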
Any tips for making this more performant? At the moment I am looking at ArrayPartitions.
Huuuuh, that is weird. I now ran the same example and got almost the same speeds with the 1st and the 2nd approach (the first evolves 1 trajectory, the second evolves 2):
1.178 ms (12613 allocations: 1.33 MiB)
1.762 ms (47011 allocations: 3.14 MiB)
Super confused.
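For context, the two timings above come from calls roughly like these (names as in the sketch above: prob is the single-trajectory problem, parprob the view-based two-trajectory one):
@btime solve(prob, Tsit5());      # 1 trajectory, plain f!
@btime solve(parprob, Tsit5());   # 2 trajectories through veom! and views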
Edit: ArrayPartition:
function g(prob, S)
    f! = prob.f
    k = size(S, 2)
    # ArrayPartition is provided by RecursiveArrayTools
    u0 = ArrayPartition((S[:, i] for i in 1:k)...)   # one partition per column of S
    veom! = (du, u, p, t) -> begin
        for j in 1:k
            f!(du.x[j], u.x[j], p, t)   # each partition is a plain Vector, no views
        end
        return nothing
    end
    varprob = ODEProblem{true}(veom!, u0, prob.tspan, prob.p)
    return varprob
end
varprob = g(prob, S)
@btime solve(varprob, Tsit5());   # @btime from BenchmarkTools
I think that the basic issue is that allocating zillions of tiny view objects in your inner loop is expensive. (This should get better in Julia 0.7, but in general I suspect that you are using a suboptimal data structure.)
I’m also not sure why you think that operating on an array means “in parallel”. I don’t see a @threads or anything in your code that would imply parallelization.
For a small fixed-size system you can do something along these lines with StaticArrays (Lorenz as an example):
using StaticArrays
f(u, p) = SVector(p[1]*(u[2]-u[1]), u[1]*(p[2]-u[3])-u[2], u[1]*u[2]-p[3]*u[3])
U = [SVector(1.0, 0.0, 0.0), SVector(0.0, 1.0, 0.0)]   # e.g. one SVector state per trajectory
P = [SVector(10.0, 28.0, 8/3) for _ in U]              # matching parameter SVectors
DU = f.(U, P)    # broadcast the field over all trajectories
DU .= f.(U, P)   # update DU in-place
The choice of data structure depends a lot on what sort of computations you need to perform, of course, but as a general rule of thumb I find that problems involving lots of small fixed-size arrays are much faster using StaticArrays.
You are very right. At the moment I have to assume a specific form for the equations of motion and go with it: either in-place using Vector, or out-of-place using SVector. I chose the first one because it is maybe a bit more “scalable”. Maybe it is time to support both versions, but I don’t have that much free time.
Your suggestion of a Vector of SVector seems a good one (although I do not know how well DiffEq handles vectors of such containers). But I also need an option for the in-place form of the equations of motion.
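I haven’t tested how that would go through DiffEq, but the shape I have in mind is roughly the following sketch (assuming a Vector of SVectors as the state; all names are illustrative and I don’t know yet whether the solver is happy with this container):
using StaticArrays, DifferentialEquations
f(u, p) = SVector(p[1]*(u[2]-u[1]), u[1]*(p[2]-u[3])-u[2], u[1]*u[2]-p[3]*u[3])
field!(DU, U, P, t) = (DU .= f.(U, P); nothing)          # broadcast the per-trajectory field
U0 = [SVector(1.0, 0.0, 0.0), SVector(0.0, 1.0, 0.0)]    # one SVector state per trajectory
P  = [SVector(10.0, 28.0, 8/3) for _ in U0]              # one parameter set per trajectory
vprob = ODEProblem{true}(field!, U0, (0.0, 10.0), P)
solve(vprob, Tsit5())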
I suspect the same, which is why I posted here!
What I meant is that 2 trajectories are evolved “in parallel”. It didn’t have anything to do with parallel computing; it was about the actual physical problem.