Batching in GeometricFlux


Can anyone shed some light on how minibatching in GeometricFlux works? In particular, I am interested in running minibatches on a set of graphs with different edge matrices.

Thanks in advance,


Mini-batch training on the GPU is only supported for graphs with the same topology. It is quite hard to batch up the adjacency matrices of different graphs, and some effort is required to come up with an approach for doing so. I am still considering this problem. If you know of any idea or paper, please let me know and I will try to implement it.


Thanks for the answer. I have written a dirty fix, but it is not particularly nice. What I do is wrap the graph in a structure containing independent components. My assumption is that a GNN operating over two components treats them independently, since all interaction happens between vertices connected by edges. Then I have a function reducing the embeddings of vertices, which understands the components.
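The independence assumption can be illustrated with a small self-contained sketch (plain Julia, nothing from GeometricFlux; a single multiplication by the adjacency matrix stands in for one message-passing step, and all names are made up for illustration):

```julia
# Two toy graphs given as dense adjacency matrices.
A1 = [0.0 1.0; 1.0 0.0]                        # 2-node graph: 1 -- 2
A2 = [0.0 1.0 0.0; 1.0 0.0 1.0; 0.0 1.0 0.0]   # 3-node path: 1 -- 2 -- 3

# Node features, one column per vertex.
X1 = rand(4, 2)
X2 = rand(4, 3)

# Batch the two graphs: block-diagonal adjacency, hcat-ed features.
A = [A1 zeros(2, 3); zeros(3, 2) A2]
X = hcat(X1, X2)

# One neighbourhood-aggregation step on the batch vs. per graph.
Y  = X * A
Y1 = X1 * A1
Y2 = X2 * A2

# The off-diagonal zero blocks guarantee the components never interact,
# so the batched result splits back into the per-graph results exactly.
@assert Y[:, 1:2] ≈ Y1
@assert Y[:, 3:5] ≈ Y2
```

This is why batching only has to shift vertex indices by a per-graph offset; no interaction between the components needs to be handled.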

Unfortunately, the construction of the minibatch is a bit clunky, since I do not fully understand (read: I was lazy / did not have time to learn) the format of SparseGraph and its mapping to edges. So what I do is create the minibatch by concatenating SimpleGraphs and then make a FeaturedGraph out of it. Something along the following lines; note that the code below is written for multigraphs, but you will get the idea behind it.

using Graphs: SimpleGraph

struct MultiGraph{N,G,T<:AbstractMatrix,C}
	graphs::NTuple{N,G}
	vprops::T
	components::C
end

function Base.reduce(::typeof(vcat), mgs::Vector{<:MultiGraph})
	# TODO: consider adding some sane asserts
	components = map(g -> g.components === nothing ? [1:size(g.vprops, 2)] : g.components, mgs)
	fadj = [map(g -> g.graphs[i].fadjlist, mgs) for i in 1:length(first(mgs).graphs)]
	offset = 0
	for (i, g) in enumerate(mgs)
		components[i] = map(c -> c .+ offset, components[i])
		for j in 1:length(fadj)
			fadj[j][i] = map(c -> c .+ offset, fadj[j][i])
		end
		offset += size(g.vprops, 2)
	end
	components = reduce(vcat, components)
	fadj = map(fa -> reduce(vcat, fa), fadj)
	# each undirected edge appears twice in fadjlist, hence the div by 2
	graphs = tuple(map(fa -> SimpleGraph(div(sum(length.(fa)), 2), fa), fadj)...)
	vprops = reduce(hcat, [g.vprops for g in mgs])
	MultiGraph(graphs, vprops, components)
end

using Statistics: mean

function meanmax(g::MultiGraph{<:Any,<:Any,<:Any,<:Vector{<:UnitRange}})
	vprops = g.vprops
	xx = map(g.components) do c
		x = @view vprops[:, c]
		vcat(mean(x, dims = 2), maximum(x, dims = 2))
	end
	reduce(hcat, xx)
end

BTW, how does TorchGeometric do it? Does it also only allow creating minibatches for a fixed graph?

It seems that stacking up the adjacency matrices in a diagonal fashion could be a solution [1].
I think I can take this approach. Thank you for pointing this out.
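For reference, the diagonal stacking from [1] can be sketched in a few lines of plain Julia; `blockdiag` from the standard-library SparseArrays does the stacking for sparse matrices, and the `graph_id` vector here is just an illustrative analogue of the `batch` vector PyTorch Geometric attaches to track which graph each node came from:

```julia
using SparseArrays

# Adjacency matrices of two graphs of different sizes.
A1 = sparse([0 1; 1 0])
A2 = sparse([0 1 1; 1 0 0; 1 0 0])

# Stack along the diagonal: one big graph whose connected
# components are the original graphs.
A = blockdiag(A1, A2)

# Bookkeeping vector mapping each node to its source graph,
# analogous to PyTorch Geometric's `batch` vector.
graph_id = vcat(fill(1, size(A1, 1)), fill(2, size(A2, 1)))

size(A)        # (5, 5)
A[1:2, 3:5]    # zero block: no edges between the two graphs
```

The sparse representation is the point: the off-diagonal zero blocks cost nothing, so the batch scales with the total number of edges, not with the square of the total number of nodes.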

[1] Advanced Mini-Batching — pytorch_geometric documentation

Glad to help,

I implemented this in the past when I was writing an extension of our Mill.jl to work over graphs. The main nuisance was achieving full feature parity, which meant writing getobs.

If it would help, I can post the code and you can adapt it.


Batching graphs (in the usual way of creating a large graph with disconnected components) has been supported in GraphNeuralNetworks.jl since the very beginning.

You can look at the example on the homepage Home · GraphNeuralNetworks.jl
where batching is done implicitly when iterating a DataLoader created with the option collate=true,
or you can do it explicitly with g = MLUtils.batch([g1, g2, ...]).

It is also compatible with MLUtils.numobs and MLUtils.getobs:

julia> using GraphNeuralNetworks, MLUtils

julia> gs = [rand_graph(10, 20) for _=1:10]
10-element Vector{GNNGraph{Tuple{Vector{Int64}, Vector{Int64}, Nothing}}}:
 GNNGraph(10, 20)
 GNNGraph(10, 20)
 GNNGraph(10, 20)
 GNNGraph(10, 20)
 GNNGraph(10, 20)
 GNNGraph(10, 20)
 GNNGraph(10, 20)
 GNNGraph(10, 20)
 GNNGraph(10, 20)
 GNNGraph(10, 20)

julia> g = MLUtils.batch(gs)
GNNGraph:
    num_nodes = 100
    num_edges = 200
    num_graphs = 10

julia> getobs(g, 1)
GNNGraph:
    num_nodes = 10
    num_edges = 20

julia> getobs(g, 1) == gs[1]
true

Thanks Carlo, I will take a look!!!