@gdalle @datnamer @bertschi tl;dr: used your advice but haven’t found a speed-up yet. Best Julia: 131.717 μs, best Python: 151.624 μs (my bad, I reported it wrong in the OG post).
- Below is the Julia implementation without the random number generator.
- I’m actually interested in the backwards pass, but I wanted to see what the forward passes clocked in at first (a sketch of how I’d time that is below this list).
- I have used torch.compile() on my model, but this weirdly didn’t improve performance:
  - 151.6243975 μs (without torch.compile)
  - 609.6357 μs (with torch.compile)
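For the backwards pass, this is roughly how I plan to time it, a minimal sketch using Flux’s built-in gradient (Zygote) and the myLayer, xi, xj from the listing at the end of this post:

# Time one reverse pass: differentiate a scalar loss w.r.t. the layer's weights.
# sum(...) is only there to turn the layer output into a scalar.
@btime Flux.gradient(m -> sum(m($xi, $xj)), $myLayer)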
@bertschi Thanks for the info. I’ve changed the struct to take in type parameters.
Here’s a test for a single Flux Dense layer vs just multiplying the arrays. Dense layer wins.
Wq = Flux.glorot_uniform(n_in, n_neighbours)
Wq_dense = Flux.Dense(n_in => n_neighbours, identity; bias=false, init=Flux.glorot_uniform)
xi = rand(Float32, n_in, 1)
xii = rand(Float32, 1, n_in)
@btime Wq_dense(xi): 5.329 μs (2 allocations: 288 bytes)
@btime xii * Wq: 6.703 μs (1 allocation: 144 bytes)
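One caveat on these numbers: the @btime calls above use non-const globals, so some of the measured time may be global-variable overhead. Interpolating with $ rules that out (same calls, just interpolated):

@btime $Wq_dense($xi)   # Dense layer call
@btime $xii * $Wq       # plain matrix multiply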
For the whole layer, the difference is still small.
xj = rand(Float32, n_in, n_neighbours)
xjj = rand(Float32, n_neighbours, n_in)
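(Wk, Wv, Wk_dense and Wv_dense are assumed here to be defined the same way as the Q versions above, matching the constructors in the full listing further down:)

Wk = Flux.glorot_uniform(n_in, n_neighbours)
Wv = Flux.glorot_uniform(n_in, n_out)
Wk_dense = Flux.Dense(n_in => n_neighbours, identity; bias=false, init=Flux.glorot_uniform)
Wv_dense = Flux.Dense(n_in => n_out, identity; bias=false, init=Flux.glorot_uniform)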
@btime (Wv_dense(xj)) * (Wk_dense(xj) * Wq_dense(xi)): 128.563 μs (8 allocations: 5.80 KiB)
@btime (xii * Wq) * (xjj * Wk) * (xjj * Wv): 136.162 μs (5 allocations: 3.02 KiB)
For the full implementation, my current version w/o the Dense layers is still quicker…
@btime test_function(): 131.717 μs (14 allocations: 5.42 KiB)
@btime test_function_dense(): 133.867 μs (18 allocations: 10.58 KiB)
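Given how close those two are, the full timing distribution is probably more informative than just the minimum that @btime reports; BenchmarkTools’ @benchmark gives that directly:

@benchmark test_function()
@benchmark test_function_dense()

Full code: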
using Flux
using Statistics
using LinearAlgebra
using GraphNeuralNetworks
using BenchmarkTools
# Attention-style layer holding plain weight matrices (now type-parameterised)
struct CustomLayer{L} <: GNNLayer
    Wq::L
    Wk::L
    Wv::L
    dk::Float64
    σ::typeof(softmax)
end
Flux.@functor CustomLayer
function CustomLayer(n_in::Int, n_out::Int, n_neighbours::Int)
    Wq = Flux.glorot_uniform(n_in, n_neighbours)
    Wk = Flux.glorot_uniform(n_in, n_neighbours)
    Wv = Flux.glorot_uniform(n_in, n_out)
    dk = √n_in   # attention scaling factor
    σ = softmax
    CustomLayer(Wq, Wk, Wv, dk, σ)
end
# Forward pass: softmax((xi*Wq)*(xj*Wk)' / dk) * (xj*Wv), transposed back to a column
function (m::CustomLayer)(xi, xj)
    return transpose(m.σ(((xi * m.Wq) * transpose(xj * m.Wk)) ./ m.dk, dims=2) * (xj * m.Wv))
end
# Same layer, but with Flux.Dense used for the Q/K/V projections
struct CustomDense{Q} <: GNNLayer
    Wq::Q
    Wk::Q
    Wv::Q
    dk::Float64
    σ::typeof(softmax)
end
Flux.@functor CustomDense
function CustomDense(n_in::Int, n_out::Int, n_neighbours::Int)
    Wq = Flux.Dense(n_in => n_neighbours, identity; bias=false, init=Flux.glorot_uniform)
    Wk = Flux.Dense(n_in => n_neighbours, identity; bias=false, init=Flux.glorot_uniform)
    Wv = Flux.Dense(n_in => n_out, identity; bias=false, init=Flux.glorot_uniform)
    dk = √n_in
    σ = softmax
    CustomDense(Wq, Wk, Wv, dk, σ)
end
# Dense-layer forward pass (inputs arrive pre-transposed, see test_function_dense below)
function (m::CustomDense)(xi, xj)
    return m.σ(m.Wv(xj) * m.Wk(xj) ./ m.dk, dims=2) * m.Wq(xi)
end
n_in = 1000
n_out = 10
n_neighbours = 20
myLayer = CustomLayer(n_in, n_out, n_neighbours)
xi = rand(Float32, 1, n_in)
xj = rand(Float32, n_neighbours, n_in)
myDense = CustomDense(n_in, n_out, n_neighbours)
# Wrappers around the two layers (closing over the globals above) used for the @btime results
function test_function()
    out = myLayer(xi, xj)
    return
end

function test_function_dense()
    out = myDense(xi', xj')
    return
end
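For reference, calling the layers directly with interpolated arguments should give the same picture without the closures over globals (a sketch):

@btime $myLayer($xi, $xj)
@btime $myDense($(xi'), $(xj'))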