Hi,

can anyone shed some light on how minibatching in GeometricFlux works? In particular, I am interested in running minibatches on a set of graphs with different edge matrices.

Thanks in advance,

Tomas

Hi!

Mini-batch training on the GPU is currently supported only for graphs with the same topology. It could be quite hard to batch up adjacency matrices of graphs with different topologies, and some effort is required to come up with an approach for it. I am still thinking about this problem. If you know of any idea or paper, please let me know and I will try to implement it.

Hi,

thanks for your answer. I have written a dirty fix, but it is not particularly nice. What I do is wrap the graphs in a structure that keeps track of independent components. My assumption is that a GNN operating over two disconnected components treats them independently, since all interaction happens between vertices connected by edges. On top of that I have a function that reduces (pools) the vertex embeddings per component.
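The independence assumption can be checked directly with plain linear algebra (a minimal sketch, not part of the fix itself): one step of neighbourhood aggregation over a block-diagonal adjacency matrix gives the same result as aggregating each component separately, so disconnected components never mix.

```julia
using SparseArrays  # provides blockdiag for sparse matrices

A1 = sparse([0 1; 1 0])             # adjacency of graph 1 (2 vertices)
A2 = sparse([0 1 0; 1 0 1; 0 1 0])  # adjacency of graph 2 (3 vertices)
A  = blockdiag(A1, A2)              # batched graph: two disconnected components

X1 = rand(4, 2)                     # vertex features, one column per vertex
X2 = rand(4, 3)
X  = hcat(X1, X2)                   # batched features

# aggregation on the batched graph equals aggregation per component
@assert X * A ≈ hcat(X1 * A1, X2 * A2)
```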

Unfortunately, the construction of the minibatch is a bit clunky, since I do not fully understand (read: I was lazy / did not have time to study) the format of SparseGraph and its mapping to edges. So what I do is create the minibatch by concatenating `SimpleGraph`s and then make a `FeaturedGraph` out of it. Something along the following lines; note that the code below is written for multigraphs, but you will get the idea.

```
using Graphs: SimpleGraph  # LightGraphs on older versions
using Statistics: mean

struct MultiGraph{N,G,T<:AbstractMatrix,C}
    graphs::NTuple{N,G}  # one SimpleGraph per edge type
    vprops::T            # vertex features, one column per vertex
    components::C        # vertex ranges of the original graphs
end

# Concatenate several MultiGraphs into one graph with disconnected components.
function Base.reduce(::typeof(Base.cat), mgs::Vector{<:MultiGraph})
    # TODO: consider adding some sane asserts
    components = map(g -> g.components === nothing ? [1:size(g.vprops, 2)] : g.components, mgs)
    fadj = [map(g -> g.graphs[i].fadjlist, mgs) for i in 1:length(first(mgs).graphs)]
    offset = 0
    for (i, g) in enumerate(mgs)
        # shift vertex indices of each graph past those already batched
        components[i] = map(c -> c .+ offset, components[i])
        for j in 1:length(fadj)
            fadj[j][i] = map(c -> c .+ offset, fadj[j][i])
        end
        offset += size(g.vprops, 2)
    end
    components = reduce(vcat, components)
    fadj = map(fa -> reduce(vcat, fa), fadj)
    graphs = tuple(map(fa -> SimpleGraph(sum(length.(fa)), fa), fadj)...)
    vprops = reduce(hcat, [g.vprops for g in mgs])
    MultiGraph(graphs, vprops, components)
end

# Per-component pooling: concatenate mean and max of the vertex features.
function meanmax(g::MultiGraph{<:Any,<:Any,<:Any,<:Vector{<:UnitRange}})
    vprops = g.vprops
    xx = map(g.components) do c
        x = @view vprops[:, c]
        vcat(mean(x, dims = 2), maximum(x, dims = 2))
    end
    reduce(hcat, xx)
end
```

BTW, how does TorchGeometric do it? Does it also only allow creating minibatches for a fixed graph topology?

It seems that stacking the adjacency matrices in a block-diagonal fashion could be a solution [1].
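A minimal sketch of the block-diagonal idea, assuming sparse adjacency matrices (`SparseArrays.blockdiag` does the stacking; the `components` bookkeeping for later unbatching is my own addition):

```julia
using SparseArrays

# adjacency matrices of two graphs with different topologies
A1 = sparse([0 1; 1 0])              # 2 vertices
A2 = sparse([0 1 1; 1 0 0; 1 0 0])   # 3 vertices

# batched adjacency: one large graph, zeros off the diagonal blocks,
# so no edges connect the two original graphs
A = blockdiag(A1, A2)                # 5×5

# vertex features are simply concatenated along the vertex dimension
X1, X2 = rand(8, 2), rand(8, 3)
X = hcat(X1, X2)

# keep per-graph vertex ranges so graph-level readout can unbatch later
components = [1:2, 3:5]
```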

I think I can take this approach. Thank you for pointing this out.

[1] Advanced Mini-Batching — pytorch_geometric documentation

Glad to help,

I implemented this in the past when I was writing an extension of our Mill.jl to work over graphs. The main nuisance was achieving full feature parity, which meant writing `getobs`.

If it would help, I can post the code and you can adapt it.

Tomas

Batching graphs (in the usual way of creating a large graph with disconnected components) has been supported in GraphNeuralNetworks.jl since the very beginning.

You can look at the example on the homepage Home · GraphNeuralNetworks.jl, where batching is done implicitly when iterating a `DataLoader` created with the option `collate=true`, or you can do it explicitly with `g = MLUtils.batch([g1, g2, ...])`.

It is also compatible with `MLUtils.numobs` and `MLUtils.getobs`:

```
julia> using GraphNeuralNetworks, MLUtils
julia> gs = [rand_graph(10, 20) for _=1:10]
10-element Vector{GNNGraph{Tuple{Vector{Int64}, Vector{Int64}, Nothing}}}:
GNNGraph(10, 20)
GNNGraph(10, 20)
GNNGraph(10, 20)
GNNGraph(10, 20)
GNNGraph(10, 20)
GNNGraph(10, 20)
GNNGraph(10, 20)
GNNGraph(10, 20)
GNNGraph(10, 20)
GNNGraph(10, 20)
julia> g = MLUtils.batch(gs)
GNNGraph:
num_nodes = 100
num_edges = 200
num_graphs = 10
julia> getobs(g, 1)
GNNGraph:
num_nodes = 10
num_edges = 20
julia> getobs(g, 1) == gs[1]
true
```
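For completeness, the implicit path mentioned above can be sketched like this (a minimal example assuming the `MLUtils.DataLoader` keyword API; with `collate=true` each iterate is a single batched `GNNGraph`):

```julia
using GraphNeuralNetworks, MLUtils

gs = [rand_graph(10, 20) for _ in 1:10]

# collate=true makes the loader call MLUtils.batch on every minibatch,
# so each `g` below is one GNNGraph with `batchsize` disconnected components
loader = DataLoader(gs; batchsize = 2, collate = true)

for g in loader
    @assert g.num_graphs == 2
    @assert g.num_nodes == 20
end
```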

Thanks Carlo, I will take a look!!!

Tomas