Speed Comparison: Python vs Julia for custom layers

@gdalle @datnamer @bertschi tl;dr: I used your advice but haven’t found a speed-up yet. Best Julia: 131.717 μs, best Python: 151.624 μs (my bad, I reported the wrong number in the original post).

@gdalle

  • below is the Julia implementation without the random number generator
  • I am actually interested in the backward pass, but I wanted to see what the forward passes clocked in at first (see the sketch right after this list)
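
For reference, here is a minimal sketch of how the backward pass could be timed with Flux.gradient over the myLayer instance from the full listing below; loss is just a throwaway name to get a scalar out:

loss(m, xi, xj) = sum(m(xi, xj))                # scalar loss for timing only
@btime Flux.gradient(loss, $myLayer, $xi, $xj)  # times forward + reverse pass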

@datnamer

  • I have used torch.compile() on my model, but oddly it made things slower rather than faster:
  • 151.6243975 μs (without torch.compile)
  • 609.6357 μs (with torch.compile)

@bertschi Thanks for the info. I’ve changed the struct to take in type parameters.

Here’s a test for a single Flux Dense layer vs just multiplying the arrays. Dense layer wins.

Wq = Flux.glorot_uniform(n_in, n_neighbours)
Wq_dense = Flux.Dense(n_in => n_neighbours, identity; bias=false, init=Flux.glorot_uniform)
xi = rand(Float32, n_in, 1)
xii = rand(Float32, 1, n_in)

@btime Wq_dense(xi) : 5.329 μs (2 allocations: 288 bytes)
@btime xii * Wq : 6.703 μs (1 allocation: 144 bytes)
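
For context, Flux’s Dense stores its weight as an (out, in) matrix and computes weight * x, which is why the two calls above act on transposed data. A quick way to confirm they compute the same product up to a transpose (Wq_dense2 is a throwaway layer reusing the same matrix, no bias):

Wq_dense2 = Flux.Dense(permutedims(Wq), false)   # weight = transpose(Wq), no bias
Wq_dense2(xi) ≈ permutedims(xi' * Wq)            # expected: true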

For the whole layer, the difference is still small.
xj = rand(Float32, n_in, n_neighbours)
xjj = rand(Float32, n_neighbours, n_in)
@btime (Wv_dense(xj)) * (Wk_dense(xj) * Wq_dense(xi)): 128.563 μs (8 allocations: 5.80 KiB)
@btime (xii * Wq) * (xjj * Wk) * (xjj * Wv): 136.162 μs (5 allocations: 3.02 KiB)

For the whole thing, my current implementation without the Dense layers is still slightly quicker…
@btime test_function() : 131.717 μs (14 allocations: 5.42 KiB)
@btime test_function_dense() : 133.867 μs (18 allocations: 10.58 KiB)
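
A caveat on these timings: myLayer, myDense, xi and xj are non-const globals, and BenchmarkTools recommends interpolating such globals with $ so the benchmark doesn’t also measure dynamic dispatch, e.g.:

@btime $myLayer($xi, $xj)          # interpolate globals so @btime sees concrete types
@btime $myDense($(xi'), $(xj'))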

using Flux
using Statistics
using LinearAlgebra
using GraphNeuralNetworks
using BenchmarkTools

# Scaled dot-product attention over a node's neighbours, using plain weight matrices.
struct CustomLayer{L} <: GNNLayer
    Wq::L
    Wk::L
    Wv::L
    dk::Float64
    σ::typeof(softmax)
end

Flux.@functor CustomLayer

function CustomLayer(n_in::Int, n_out::Int, n_neighbours::Int)
    Wq = Flux.glorot_uniform(n_in, n_neighbours)
    Wk = Flux.glorot_uniform(n_in, n_neighbours)
    Wv = Flux.glorot_uniform(n_in, n_out)
    dk = √n_in
    σ = softmax
    CustomLayer(Wq, Wk, Wv, dk, σ)
end

function (m::CustomLayer)(xi, xj)
    # xi: 1 × n_in query row, xj: n_neighbours × n_in neighbour features
    return transpose(m.σ(((xi * m.Wq) * transpose(xj * m.Wk)) ./ m.dk, dims=2) * (xj * m.Wv))
end


# Same layer, but with Flux.Dense layers holding the weights.
struct CustomDense{Q} <: GNNLayer
    Wq::Q
    Wk::Q
    Wv::Q
    dk::Float64
    σ::typeof(softmax)
end

Flux.@functor CustomDense

function CustomDense(n_in::Int, n_out::Int, n_neighbours::Int)
    Wq = Flux.Dense(n_in => n_neighbours, identity; bias=false, init=Flux.glorot_uniform)
    Wk = Flux.Dense(n_in => n_neighbours, identity; bias=false, init=Flux.glorot_uniform)
    Wv = Flux.Dense(n_in => n_out, identity; bias=false, init=Flux.glorot_uniform)
    dk = √n_in
    σ = softmax
    CustomDense(Wq, Wk, Wv, dk, σ)
end

function (m::CustomDense)(xi, xj)
    # xi: n_in × 1 query column, xj: n_in × n_neighbours neighbour features
    # (transposed relative to CustomLayer), so the softmax runs along dims=1 here
    return m.Wv(xj) * m.σ((m.Wk(xj) * m.Wq(xi)) ./ m.dk, dims=1)
end


n_in = 1000
n_out = 10
n_neighbours = 20

myLayer = CustomLayer(n_in, n_out, n_neighbours)
xi = rand(Float32, 1, n_in)
xj = rand(Float32, n_neighbours, n_in)
myDense = CustomDense(n_in, n_out, n_neighbours)

# Wrappers used with @btime above; note they read the non-const globals defined here.
function test_function()
    out = myLayer(xi, xj)
    return
end

function test_function_dense()
    out = myDense(xi', xj')
    return
end
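
As a sanity check (separate from the timings above), the two versions should agree once the Dense layers reuse the plain matrices; Dense keeps its weight transposed, so copying looks like this:

myDense.Wq.weight .= myLayer.Wq'     # Dense stores its weight as (out, in)
myDense.Wk.weight .= myLayer.Wk'
myDense.Wv.weight .= myLayer.Wv'
myLayer(xi, xj) ≈ myDense(xi', xj')  # expected: true once the weights match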