Reduce views for a function that is applied piece-wise over an array

I have a function f! that operates in-place in a D dimensional Vector, used in DifferentialEquations. For example

@inline @inbounds function f!(du, u, p, t)
    σ = p[1]; ρ = p[2]; β = p[3]
    du[1] = σ*(u[2]-u[1])
    du[2] = u[1]*(ρ-u[3]) - u[2]
    du[3] = u[1]*u[2] - β*u[3]
    return nothing
end

However, I am referring to the general case where I have a general f! that operates in-place on a D-element Vector.

Now I want to modify this function, so that it operates on k such vectors “in parallel”. These Vectors are currently the columns of a matrix S, but the storage format does not matter and can be changed.

What I Do is

function create_parallel_eom(f!, S)
    k = size(S)[2]

    veom! = (du, u, p, t) -> begin
        for j in 1:k
            f!(view(du, :, j), view(u, :, j), p, t)
        end
        return
    end
    return veom!
end

and then use this veom! to create a new ODEProblem. If I try to do this for 2 parallel vectors, I get a slowdown of 150x instead of the “theoretical” 2x, simply because the vector field function is absurdly expensive compared to calculating it twice without views.

Any tips for making this more performant? At the moment I am looking at ArrayPartitions.

Please provide enough data so that we can actually run the code and reproduce the slowdown.

Sorry, you are very right:

using OrdinaryDiffEq, BenchmarkTools
@inline @inbounds function lorenz63_eom(du, u, p, t)
    σ = p[1]; ρ = p[2]; β = p[3]
    du[1] = σ*(u[2]-u[1])
    du[2] = u[1]*(ρ-u[3]) - u[2]
    du[3] = u[1]*u[2] - β*u[3]
    return nothing
end
p = [10,28,8/3]
u0 = [1.0;0.0;0.0]
tspan = (0.0,100.0)
prob = ODEProblem(lorenz63_eom,u0,tspan,p)
@btime solve(prob,Tsit5())

# Function that evolves `k` orbits in parallel:
S = rand(3, 2)
function f(prob, S)
    D = length(prob.u0)
    f! = prob.f
    k = size(S)[2]

    veom! = (du, u, p, t) -> begin
        for j in 1:k
            f!(view(du, :, j), view(u, :, j), p, t)
        end
        return
    end

    varprob = ODEProblem{true}(veom!, S, prob.tspan, prob.p)
end
varprob = f(prob, S)
@btime solve(varprob,Tsit5());

Huuuuh that is weird? I now run the same example and get almost same speeds with 1st and 2nd approach (the first evolves 1, the second evolves 2 trajectories).

  1. 1.178 ms (12613 allocations: 1.33 MiB)
  2. 1.762 ms (47011 allocations: 3.14 MiB)

Super confused.

Edit: ArrayPartition:

function g(prob, S)
    f! = prob.f
    k = size(S)[2]
    u0 = ArrayPartition((S[:, i] for i in 1:k)...)

    veom! = (du, u, p, t) -> begin
        for j in 1:k
            f!(du.x[j], u.x[j], p, t)
        end
        return
    end
    varprob = ODEProblem{true}(veom!, u0, prob.tspan, prob.p)
end
varprob = g(prob,  S)
@btime solve(varprob,Tsit5());

is super slow and gives 35ms…

I think that the basic issue is that allocating zillions of tiny view objects in your inner loop is expensive. (This should get better in Julia 0.7, but in general I suspect that you are using a suboptimal data structure.)

I’m also not sure why you think that operating on an array means “in parallel”. I don’t see a @threads or anything in your code that would imply parallelization.

If you have a bunch of D-dimensional vectors where D is small (e.g. 2 or 3), I would strongly consider using a collection of SVectors (from GitHub - JuliaArrays/StaticArrays.jl: Statically sized arrays for Julia).

Then you could do, e.g.

f(u,p) = SVector(p[1]*(u[2]-u[1]), u[1]*(p[2]-u[3])-u[2], u[1]*u[2]-p[3]*u[3])

U,P = ....arrays of SVector u and p
DU = f.(U,P)
DU .= f.(U,P)  # update DU in-place

The choice of data structure depends a lot on what sort of computations you need to perform, of course, but as a general rule of thumb I find that problems involving lots of small fixed-size arrays are much faster using StaticArrays.

You are very right. At the moment I have to assume a specific form of equations of motion, and go with it: either in-place using Vector or out-of-place using SVector. I chose to go with the first one, because it is maybe a bit more “scalable”. Maybe it is time to support both versions, but I don’t have so much free time :frowning:

I do use SVectors all the time in Discrete systems however.

Your suggestion of a Vector of SVector seems a good one (although I do not know how good are vectors of stuff handled by DiffEq). But I also need an option for the in-place form of equations of motion.

I suspect the same, which is why I posted here! :smiley:

What I meant is that 2 trajectories are evolved “in parallel”. It didn’t have anything to do with computer parallelization, it was more about the actual physical problem.

You can use Vectors of SVectors with DiffEq just fine. That’s a great suggestion for this case.