# How to represent a DAG?

What is the best way to represent a DAG (Directed Acyclic Graph) in Julia?

I need to have mutable nodes that know who their children and their parents are, so that I can propagate information up and down them. The use case is constraint propagation, which has the same basic structure as reverse-mode automatic differentiation.

What is the best structure for this? The calculations should be as fast as possible.

So far I have something like

``````mutable struct Node
value
left::Union{Void, Node}
right::Union{Void, Node}
parent
end
``````

Nodes can have values which are operations like `+`, variables like `x` or constants like `2`.

Obviously I would like to concretely type the nodes, but I donâ€™t see a way to do that.

Probably the answer is â€śuse Cassette.jlâ€ť, but I would like to try doing it by handâ€¦

1 Like

If you want it to be fast, definitely donâ€™t do the recursive thing. Lay out the `value`s in a flat vector (probably toposorted in the case of a DAG). Iâ€™d recommend storing the vertex/edge relationships separately as integer indices, again in vectors.

The real experts are probably the LightGraphs people; Iâ€™m sure they can help you out with this. In fact you could just use LightGraphs straight up (perhaps with MetaGraphs to store associated data).

Personally, I rolled my own in an effort to keep dependencies for my package to a minimum (which was more of an issue in the past, when LightGraphs was anything but light in terms of dependencies) and because of some missing features at the time. See https://github.com/JuliaRobotics/RigidBodyDynamics.jl/blob/master/src/graphs/directed_graph.jl and https://github.com/JuliaRobotics/RigidBodyDynamics.jl/blob/master/test/test_graph.jl. Itâ€™s reasonably fast, but I actually stopped using it in performance-sensitive code, partly to improve performance.

If I were to reimplement it right now, Iâ€™d do things slightly differently. I thought it would be nice to be able to really treat the data associated with vertices and edges as being identical to the vertices and edges, but that convenience has a performance cost associated with it. In my design for `DirectedGraph{V, E}`, the vertex type `V` and edge type `E` each store an integer index along with the data (your `value`). Relationships between edges and vertices are stored in types like `Vector{V}` and `Vector{Vector{E}}`, which are indexed by these integer vertex and edge indices. This makes it so that graph traversal requires doing a lot of lookups of the integer indices in the (in my case big, heap-allocated) `V`â€™s and `E`â€™s, which is terrible for cache locality. So Iâ€™d store the vertex/edge relationships using integer vertex indices and edge indices if I were to do a second pass. This would make it closer to `MetaGraphs.MetaDiGraph`, but instead of storing the data associated with edges and vertices in `Dict`s, Iâ€™d still use flat `Vector`s indexed by integer vertex index and edge index.

Sorry about the wall of text, I hope that made some sense.

10 Likes

If I remember correctly https://github.com/mlubin/ReverseDiffSparse.jl uses DAGs.

Thanks, that will take a while to process!

I forgot to ask the related question: how to extract the DAG form using Dataflow.jl:

ps://github.com/MikeInnes/DataFlow.jl/issues/14

Iâ€™d second the suggestion to see whether LightGraphs could help you here. The advantage is that if you have to do anything graph-like with the DAG, itâ€™s probably already in the package, and (reasonably) optimized.

If you have one value you need to store per node (or edge), then you might just use a `SimpleGraph` with an external vector mapping vertex to value (and the reverse as a `Dict` if you need it); otherwise, check out `MetaGraphs.jl`.

5 Likes

I am interested too. I am unable to get where the trade off is about this (I am not a CS guy). Why it would be better to have data related to nodes far away in memory from the nodes themselves? Letâ€™s assume I access all the data of a node at each node selection and then i pick another one, isnâ€™t it going to always happen more cache misses, by having the data at another address?
Is the trade off behind how cache is associative and how addresses are virtualized?

Is this related to the current julia status about supporting mutation of immutable types in mutable containers?

Why it would be better to have data related to nodes far away in memory from the nodes themselves?

There are really two primary reasons for this:

If youâ€™re not using the â€śdata related to nodesâ€ť (letâ€™s call this metadata for now) as part of your graph operations, then yes - as you suggest, cache locality can be improved in many cases by simplifying the data passed around within the operation. As an example, consider an algorithm that processes outneighbors of a given vertex. For `LightGraphs.SimpleGraph`, this is simply a (sorted) vector of `Int` (by default; the eltype can be as small as `UInt8`) which can fit into cache in all but the largest / most pathological cases, so when youâ€™re accessing the elements in the vector itâ€™s very quick. If youâ€™re using a larger data structure that canâ€™t fit into cache, you will take a performance hit.

The other reason is historical: that is, `LightGraphs` started with only one graph type, a `SimpleGraph` that could not store any metadata for nodes or edges at all. This approach results in very fast memory-efficient graph operations at the expense of flexibility. We justified this for the performance gains and the facts that 1) itâ€™s fairly easy to create external data structures to hold the metadata you need, and 2) metadata generally plays no part in the graph operations folks tend to use most often. With `MetaGraphs.jl`, weâ€™ve created a generic way to associate metadata with graphs while keeping the graph performance of `SimpleGraphs`. The current significant drawbacks are 1) increased complexity, 2) type instability in the metadata dictionaries, 3) increased memory usage, and 4) the likelihood of bugs due to 1). Thereâ€™s also been an attempt to interface with `JuliaDB` which may be of interest.

Is this related to the current julia status about supporting mutation of immutable types in mutable containers?

Not directly, as far as Iâ€™m aware. At least we havenâ€™t consciously worried about this issue when designing the graph structures.

4 Likes

OH, one other thing I forgot to mention (and I think it deserves its own post which is why I didnâ€™t edit my previous one): given how easy it is to construct your own graph types now in LightGraphs, you might consider doing just that if `MetaGraphs` is too unwieldy for you.