Switching out the underlying array in a SubArray

Hi everyone,
I am developing a package for simulating (possibly very heterogeneous) dynamical systems that live on networks:

It’s very early days but usable; we just pushed v0.1. The core of this package is a loop over edges and vertices that applies a user-provided function to views of the appropriate locations in the array of states. I have copy-pasted the code below.

I was initially surprised that this code allocates memory, but then I realized that it creates a whole bunch of views at every iteration, so I assume this is the source of the allocations. It seems that this is the major performance bottleneck for simple network dynamics.

For e_int I control the array, so I can preallocate the views; this is not possible for x and dx.

So, finally, what it seems I would need is to define a bunch of views into an array, but then be able to switch out the underlying array on the fly. One option would be to create a mutable version of SubArray; this would then require mutating the array referenced by the SubArray every iteration. I have tried a bunch of other options but haven’t found anything that works.

Does somebody see a (better) option for this?


Here gs is a struct holding the internal variable e_int, views into e_int, and the index ranges required to get the right views into the state variables x and dx that we get passed from DifferentialEquations.


@with_kw struct nd_ODE_Static
    vertices!
    edges!
    graph
    graph_structure
end

function (d::nd_ODE_Static)(dx, x, p, t)
    gs = d.graph_structure
    @views begin
        for i in 1:gs.num_e
            d.edges![i].f!(gs.e_int[gs.e_idx[i]], x[gs.s_idx[i]], x[gs.d_idx[i]], p, t)
        end
        for i in 1:gs.num_v
            d.vertices![i].f!(dx[gs.v_idx[i]], x[gs.v_idx[i]], gs.e_s[i], gs.e_d[i], p, t)
        end
    end
    nothing
end

And here is one attempt at getting around the problem that didn’t work:

mutable struct GS
    internals
    view
    function GS(internals)
        gs = new(internals)
        gs.view = view(gs.internals, 1:3)
        gs
    end
end

arr1 = ones(5) .+ 1.
arr2 = ones(5) .+ 2.

gs = GS(arr1)

gs.view[1] # 2.
gs.internals = arr2
gs.view[1] # still 2.

The .parent field in the SubArray specifically references the original internals array. And SubArray is immutable, so I don’t think you can change that after it’s constructed.

Why not roll your own type that does exactly what you want? SubArray isn’t very complicated; just build the parts you need into GS.

To expand on this: if you have some field that is mutable and can have a different array swapped into it, write a view type that works like SubArray but references the container field, and accesses whatever array is currently in it, just like SubArray accesses its immutable parent field (I assume that’s roughly how SubArray works).
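
The idea might look like this in code. A minimal, hypothetical sketch: Container, IndirectView, and the field names are mine, not from the package.

```julia
# A mutable container whose array can be swapped out at any time.
mutable struct Container
    data::Vector{Float64}
end

# A view-like type that indirects through the container instead of
# holding the array directly, unlike SubArray's immutable .parent.
struct IndirectView
    c::Container          # shared container, not the array itself
    idxs::UnitRange{Int}  # which slots this view covers
end

Base.getindex(v::IndirectView, i::Int) = v.c.data[v.idxs[i]]
Base.setindex!(v::IndirectView, val, i::Int) = (v.c.data[v.idxs[i]] = val)

c = Container(collect(1.0:5.0))
v = IndirectView(c, 2:4)
v[1]                   # 2.0
c.data = fill(10.0, 5) # swap the array once...
v[1]                   # ...and the view follows: 10.0
```

Every view built on the same Container follows a single `c.data = ...` assignment, which is exactly the "switch out the array on the fly" behavior asked for above.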

Edit: you will also want to put parametric types on those structs, type instability might be another source of allocations

Thanks. Yeah I looked into SubArray already, that’s certainly a viable strategy.

The reason I didn’t directly go for this is that the structure of views I am creating is quite complicated (there are potentially hundreds of thousands of them). Rather than updating where all of them point all the time, I would have liked an additional level of indirection where I only have to switch out the common array that they all look into.

Directly giving GS the right behavior is also tricky, as I have to hand over gs.edges[i] and similar things. So gs.edges[i] should be a pre-allocated object that, when indexed, finds the right element in whatever gs.internals happens to be.

I’ll have to think a bit more about this. Thanks for the input.

Could you expand on your input a bit more, or point me somewhere? I was planning to add type parameters anyway but not really for instability reasons, which I didn’t realize could occur.

You can just add another level of indirection then: make the mutable field (or better, a Ref) in the 10000 GS structs point to a single, centralised cell that holds the actual array. I’m not sure what the performance cost of that will be.
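
A minimal sketch of the Ref variant (RefView and cell are hypothetical names): every view shares one Base.RefValue cell, so a single assignment to the cell redirects all of them.

```julia
# Each view stores the shared Ref cell, not the array itself.
struct RefView
    arr::Base.RefValue{Vector{Float64}}  # shared cell holding the current array
    idxs::UnitRange{Int}
end

# Dereference the cell on every access, then index into whatever it holds.
Base.getindex(v::RefView, i::Int) = v.arr[][v.idxs[i]]

cell = Ref(collect(1.0:5.0))
views = [RefView(cell, 1:2), RefView(cell, 3:5)]  # could be thousands
views[2][1]        # 3.0
cell[] = zeros(5)  # one central swap
views[2][1]        # 0.0
```

The extra pointer dereference on every access is the cost of the indirection; whether it matters relative to the work done per view is something to benchmark.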

Also, no type on struct fields means Any, i.e. boxed and type-unstable. You want parametric types:

struct GS{V,E}
    vertices::V   
    edges::E
end

I have been trying to find information on how to implement what you suggest (using Ref). I haven’t found anything, could you point me to something for that?

I finally figured out a nice way to achieve everything I wanted by using a self referential struct:

import Base.getindex

struct GS_x_View
    gs
    idx_offset
end

function getindex(gs_x_view::GS_x_View, idx)
    gs_x_view.gs.x[idx + gs_x_view.idx_offset]
end

mutable struct GS
    x
    gs_x_view
    function GS(arr)
        gs = new(arr)
        gs.gs_x_view = GS_x_View(gs, 1)
        gs
    end
end

arr1 = [1.,2.,3.]
arr2 = [1.,2.,3.] .+ 0.5

gs = GS(arr1)

gs.x[1] # 1.
gs.gs_x_view[1] # 2.

gs.x = arr2

gs.x[1] # 1.5
gs.gs_x_view[1] # 2.5
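
Following the earlier advice about parametric types, the same self-referential pattern can be made type-stable. A sketch, assuming Vector{T} state (the recursive parameter GS{T} inside the view’s field type is the only tricky part; it relies on mutable structs allowing incomplete initialization in the inner constructor):

```julia
# A view that indirects through its owning struct; G will be GS{T}.
struct GS_x_View{G}
    gs::G
    idx_offset::Int
end

Base.getindex(v::GS_x_View, idx) = v.gs.x[idx + v.idx_offset]

mutable struct GS{T}
    x::Vector{T}
    gs_x_view::GS_x_View{GS{T}}   # concrete, recursive field type
    function GS{T}(arr::Vector{T}) where {T}
        gs = new{T}(arr)          # gs_x_view left undefined for now
        gs.gs_x_view = GS_x_View(gs, 1)
        gs
    end
end
GS(arr::Vector{T}) where {T} = GS{T}(arr)

g = GS([1.0, 2.0, 3.0])
g.gs_x_view[1]          # 2.0
g.x = [1.5, 2.5, 3.5]   # swap the array
g.gs_x_view[1]          # 2.5
```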

This is surprising (for any nontrivial calculation).

Or, alternatively, allocate a buffer (a dense array) and just copy into that.
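
A minimal sketch of that buffer strategy, using the five-argument copyto! to gather into a preallocated scratch array and scatter back (the index values are made up for illustration):

```julia
x = collect(1.0:100.0)            # full state vector
buf = zeros(3)                    # scratch buffer, preallocated once

idx_start = 10                    # hypothetical offset for one vertex
copyto!(buf, 1, x, idx_start, 3)  # gather: copy x[10:12] into buf, no allocation
buf .*= 2.0                       # work on the dense, contiguous buffer
copyto!(x, idx_start, buf, 1, 3)  # scatter the results back into x

x[10]  # 20.0
```

This trades the view allocations for explicit copies, so it pays off only when the per-slice work outweighs the copy cost.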

Since mutability precludes a lot of compiler optimizations, this may actually be slower. YMMV.

Maybe not if he has to construct all 10000 nodes to change the array pointer…

That’s the case here: the complexity is in the network; the rest is just additions in the test case I am looking at. That said, I don’t trust my ability to use the profiler enough. Seems I have plenty to do to optimize performance on this.

The dense buffer is a strategy I was already using to circumvent this problem, but it adds a lot of copying, which I thought I wanted to avoid. I should definitely benchmark these various options.

Yep, benchmark and @code_warntype everything. All the code you have posted will be pretty ugly as-is…

First make sure that the allocations actually are a bottleneck. It will depend on how much work you do on each view. After that you can try inlining (with @inline) the function where the view is used. If the view is only used in inlined functions, its allocation can usually be elided by the compiler. If that doesn’t work you can start looking at stuff like https://github.com/oschulz/UnsafeArrays.jl, but bear in mind that erroneous usage will make Julia crash or give you silently wrong results.
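
As an illustration of the inlining point (accumulate_edge! and rhs! are invented names, not from the package): when a view produced under @views is only consumed by an @inline function, it never escapes, and its allocation can typically be elided.

```julia
# The view is only touched inside this inlined function, so it need
# not be materialized on the heap.
@inline accumulate_edge!(out, v) = (out[1] += sum(v); nothing)

function rhs!(out, x, ranges)
    @views for r in ranges
        accumulate_edge!(out, x[r])  # x[r] becomes view(x, r) under @views
    end
    nothing
end

out = [0.0]
rhs!(out, collect(1.0:10.0), [1:3, 4:6])
out[1]  # (1+2+3) + (4+5+6) = 21.0
```

Whether the elision actually happens depends on inference succeeding, which is another reason to fix the Any-typed fields first.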


I tried that, but I am not sure the inlining is sticking, because the function is in a field of a struct in an array when it is called. I could refactor this to have an array of functions if that would help. But I will revisit all these things once I have actually fixed the basics.

The dynamic dispatch in calling that function might very well dominate the cost of allocating the views.


Almost certainly so. Dynamic dispatch also prevents inlining and a whole host of other optimizations. I highly recommend following these tips before trying to optimize your views:

https://docs.julialang.org/en/v1.1/manual/performance-tips/


As all the other people said, your primary issue is the fact that all your things are Any because you did not type the fields of your structs. Make the types parametric or explicit. Inference failure creates a ton of extra allocations, and expensive dynamic dispatch! In fact, the dynamic dispatch is probably more expensive than the allocations and garbage collection. Consider the many allocations as a convenient hint that your code needs better typing.

In many applications, views don’t get allocated at all. It is very unlikely that this is your performance problem.

I recommend reading @code_warntype output, or even better @descend from Cthulhu.jl, and fixing as many typing problems as reasonably possible.

As a next issue, you have d.edges![i].f!(some_args...). This suggests that d.edges! is some kind of vector, and the elements are structs that have a field f! that is callable. Suppose d.edges! <: Vector{edgeT}. Is f! a closure?

This kind of deep nesting is questionable. Take a piece of paper and sketch the desired memory layout of Vector{edgeT}: Where do you have indirection (pointers)? Where do you have structure padding? What are the alignments?

Then you tweak your data structures until you are happy, and then you think about whether you can even get good-looking code with this (i.e. rewrite this inner loop to use the new structures). Do not use closures for this; instead use callable structs (i.e. explicit closures): you want precise control over your layout in order to avoid unnecessary indirection and structure padding. The compiler is not super good at structure layout for closures.
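
A minimal example of such an explicit closure (CoupledEdge is a made-up model, not from the package): what a closure would capture becomes a typed field, so its layout is fully under your control.

```julia
# Explicit closure: the "captured" coupling strength is a concrete,
# inline field instead of a boxed capture.
struct CoupledEdge
    k::Float64
end

# Make instances callable, like the f! fields discussed in this thread.
(e::CoupledEdge)(out, x_s, x_d) = (out[1] = e.k * (x_d[1] - x_s[1]); nothing)

edge! = CoupledEdge(2.0)
buf = zeros(1)
edge!(buf, [1.0], [3.0])
buf[1]  # 2.0 * (3.0 - 1.0) = 4.0
```

A Vector{CoupledEdge} then has a flat, pointer-free layout, whereas a vector of anonymous closures generally does not.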

Could you explain what kind of network models you want to describe here? As in: You have a directed graph with vertices V, and directed edges E (presumably including self-loops). What is your state space? What is the structure of your ODE?

I would guess that the state space is one real dimension for each vertex, and D_t x_i = F_i(neighbors_in(x_i)...)? What are your assumptions, e.g. that some F_i are the same function, or that F_i has some permutation symmetry or is linear, etc.? Things like Kuramoto make crazy strong assumptions with respect to the kind of linearity/symmetry and inhomogeneity (in the sense that different vertices have qualitatively different F_i).

Can you give some hand-wavy examples with more or less assumptions that you care about?

Is your goal a convenience package (user thinks and types in terms of networks and gets to use general purpose ODE solvers) or a package where your algorithms actually use the network structure (e.g. try to precompute stuff from the sparsity structure, or use the fact that the automorphism group of your graph can give you structurally stable equivariance)?


Well, I did. I profiled, I paid attention to allocations, and, after several hours of studying the profiler and simplified examples, I probably identified a cost that is not dominant (but I measured it; it is significant…).

And I indeed did not realize that the ambiguous fields in a container would be a problem if I’m not dispatching on that container (after all, typeof(S.x) returns the correct unboxed type rather than Any).

Plus, because the code is rather complex in order to be right, the profiler output and the LLVM IR are pretty much unreadable to me.

Are you using the excellent ? It highlights type-unstable method invocations.

Hi foobar, the goal here is a package that allows people to specify the behaviour of nodes and edges and obtain something that can be passed to DifferentialEquations; specifically, the output is of type ODEFunction. This needs to be able to handle highly heterogeneous dynamics, with a different differential equation at every node/edge, so there is no way to avoid having an array of functions that are called on the variables.

The package already supports an arbitrary number of variables at each edge and vertex. I really need this level of generality, or I would not have produced this high level of indirection in the code.

Further, the behavior of an edge can be given either by an explicit calculation or by an ODE itself; there are two separate types for this.

The code contains no closures (unless the user supplies them); it’s all callable structs (as in the first call example). I have a type VertexFunction and a type EdgeFunction that have a field f! that is a plain old function, and d.edges! is an Array{EdgeFunction}. It would of course be trivial to extract the functions and have an array of functions. In fact that’s how it was at first, but I changed things around for readability and maintainability.

There is the beginning of an idea of documentation here: https://fhell.github.io/NetworkDynamics.jl/

I will get back to working on this second half of next week or so, and try to upload some examples. We just finished getting it to work reliably. The main application for this is to simulate power grid dynamics.

I did use ProfileView, but found it quite hard to interpret the results. There is simply too much going on in that inner loop to get a clear picture that way. It just marks the whole function red, but all the calls to SubArrays are actually not red. And the calls to the edge and vertex functions I couldn’t even find there.

@Everyone, thanks for the input. I will get working on fixing up the Types of everything that’s happening and see where that gets me.

You keep missing this most important point:

@code_warntype

Run it on everything and make sure it’s clean before doing any benchmarking.
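
A tiny illustration of what @code_warntype catches (Loose and Tight are made-up toy structs): an untyped field infers as Any, while a parametric one infers concretely.

```julia
struct Loose;    x;    end   # untyped field: element type is Any
struct Tight{T}; x::T; end   # parametric field: concrete type

get_x(s) = s.x

# In the REPL:
# @code_warntype get_x(Loose(1.0))   # Body::Any, highlighted in red
# @code_warntype get_x(Tight(1.0))   # Body::Float64

Base.return_types(get_x, (Loose,))[1]           # Any
Base.return_types(get_x, (Tight{Float64},))[1]  # Float64
```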


Nah, I got it, but that part is fairly obvious how to implement (except possibly for the self-referential structure above). I did not previously appreciate that it causes allocations, so I misinterpreted the allocations as coming from the views. I know there is a lot to do to get this code fast; I’ll report back when I have made some progress.
