Reduce views for a function that is applied piece-wise over an array

Datseris · February 4, 2018, 11:39am

I have a function f! that operates in-place in a D dimensional Vector, used in DifferentialEquations. For example

@inline @inbounds function f!(du, u, p, t)
    σ = p[1]; ρ = p[2]; β = p[3]
    du[1] = σ*(u[2]-u[1])
    du[2] = u[1]*(ρ-u[3]) - u[2]
    du[3] = u[1]*u[2] - β*u[3]
    return nothing
end

However, I am referring to the general case where I have a general f! that operates in-place on a D-element Vector.

Now I want to modify this function, so that it operates on k such vectors “in parallel”. These Vectors are currently the columns of a matrix S, but the storage format does not matter and can be changed.

What I Do is

function create_parallel_eom(f!, S)
    k = size(S)[2]

    veom! = (du, u, p, t) -> begin
        for j in 1:k
            f!(view(du, :, j), view(u, :, j), p, t)
        end
        return
    end
    return veom!
end

and then use this veom! to create a new ODEProblem. If I try to do this for 2 parallel vectors, I get a slowdown of 150x instead of the “theoretical” 2x, simply because the vector field function is absurdly expensive compared to calculating it twice without views.

Any tips for making this more performant? At the moment I am looking at ArrayPartitions.

kristoffer.carlsson · February 4, 2018, 11:46am

Please provide enough data so that we can actually run the code and reproduce the slowdown.

Datseris · February 4, 2018, 11:47am

Sorry, you are very right:

using OrdinaryDiffEq, BenchmarkTools
@inline @inbounds function lorenz63_eom(du, u, p, t)
    σ = p[1]; ρ = p[2]; β = p[3]
    du[1] = σ*(u[2]-u[1])
    du[2] = u[1]*(ρ-u[3]) - u[2]
    du[3] = u[1]*u[2] - β*u[3]
    return nothing
end
p = [10,28,8/3]
u0 = [1.0;0.0;0.0]
tspan = (0.0,100.0)
prob = ODEProblem(lorenz63_eom,u0,tspan,p)
@btime solve(prob,Tsit5())

# Function that evolves `k` orbits in parallel:
S = rand(3, 2)
function f(prob, S)
    D = length(prob.u0)
    f! = prob.f
    k = size(S)[2]

    veom! = (du, u, p, t) -> begin
        for j in 1:k
            f!(view(du, :, j), view(u, :, j), p, t)
        end
        return
    end

    varprob = ODEProblem{true}(veom!, S, prob.tspan, prob.p)
end
varprob = f(prob, S)
@btime solve(varprob,Tsit5());

Datseris · February 4, 2018, 11:58am

Huuuuh that is weird? I now run the same example and get almost same speeds with 1st and 2nd approach (the first evolves 1, the second evolves 2 trajectories).

1.178 ms (12613 allocations: 1.33 MiB)
1.762 ms (47011 allocations: 3.14 MiB)

Super confused.

Edit: ArrayPartition:

function g(prob, S)
    f! = prob.f
    k = size(S)[2]
    u0 = ArrayPartition((S[:, i] for i in 1:k)...)

    veom! = (du, u, p, t) -> begin
        for j in 1:k
            f!(du.x[j], u.x[j], p, t)
        end
        return
    end
    varprob = ODEProblem{true}(veom!, u0, prob.tspan, prob.p)
end
varprob = g(prob,  S)
@btime solve(varprob,Tsit5());

is super slow and gives 35ms…

stevengj · February 4, 2018, 12:22pm

I think that the basic issue is that allocating zillions of tiny view objects in your inner loop is expensive. (This should get better in Julia 0.7, but in general I suspect that you are using a suboptimal data structure.)

I’m also not sure why you think that operating on an array means “in parallel”. I don’t see a @threads or anything in your code that would imply parallelization.

If you have a bunch of D-dimensional vectors where D is small (e.g. 2 or 3), I would strongly consider using a collection of SVectors (from GitHub - JuliaArrays/StaticArrays.jl: Statically sized arrays for Julia).

Then you could do, e.g.

f(u,p) = SVector(p[1]*(u[2]-u[1]), u[1]*(p[2]-u[3])-u[2], u[1]*u[2]-p[3]*u[3])

U,P = ....arrays of SVector u and p
DU = f.(U,P)
DU .= f.(U,P)  # update DU in-place

The choice of data structure depends a lot on what sort of computations you need to perform, of course, but as a general rule of thumb I find that problems involving lots of small fixed-size arrays are much faster using StaticArrays.

Datseris · February 4, 2018, 12:50pm

You are very right. At the moment I have to assume a specific form of equations of motion, and go with it: either in-place using Vector or out-of-place using SVector. I chose to go with the first one, because it is maybe a bit more “scalable”. Maybe it is time to support both versions, but I don’t have so much free time

I do use SVectors all the time in Discrete systems however.

Your suggestion of a Vector of SVector seems a good one (although I do not know how good are vectors of stuff handled by DiffEq). But I also need an option for the in-place form of equations of motion.

I suspect the same, which is why I posted here!

What I meant is that 2 trajectories are evolved “in parallel”. It didn’t have anything to do with computer parallelization, it was more about the actual physical problem.

ChrisRackauckas · February 4, 2018, 3:10pm

You can use Vectors of SVectors with DiffEq just fine. That’s a great suggestion for this case.

Topic		Replies	Views
Parallel implementation is not very effective General Usage question , performance , parallel , multithreading	19	790	October 8, 2022
Making a view of a combination of array indexes and scalars, creating an array to send to a function Performance vector , views	7	271	May 25, 2023
Usage of arrays of static arrays New to Julia performance , staticarrays	16	1109	February 22, 2023
Reduce allocations for shaped views Performance	9	1002	February 13, 2019
Trying to improve readability of an ODE function; attempt using views adds allocations New to Julia	10	628	May 1, 2019

Reduce views for a function that is applied piece-wise over an array

Related topics