I'm new to Julia and am comparing its forward-pass speed against PyTorch. I want to optimise the performance of training a neural network that will contain custom layers, and I have provided some example code below. I want to know how much I can speed up my Julia code.

Currently the two implementations clock in at 146 μs for Python and 34.363 μs for Julia, which suggests that Julia is much faster. Could someone help me compare the two languages fairly for speed? Is it fairer to compare the backward pass as well? I am primarily interested in speed for inference and secondly in memory overhead, which is why the examples do not include any optimisation of the network. Also, would these results change as the layers become sparser or as the layer sizes scale?
```julia
using Flux
using Statistics
using LinearAlgebra
using GraphNeuralNetworks
using BenchmarkTools  # provides @btime

struct CustomLayer <: GNNLayer
    Wq::AbstractMatrix
    Wk::AbstractMatrix
    Wv::AbstractMatrix
    dk::Float64
    σ::typeof(softmax)
end

Flux.@functor CustomLayer

function CustomLayer(n_in::Int, n_out::Int, n_neighbors::Int)
    Wq = Flux.glorot_uniform(n_in, n_neighbors)
    Wk = Flux.glorot_uniform(n_in, n_neighbors)
    Wv = Flux.glorot_uniform(n_in, n_out)
    dk = √n_in
    σ = softmax
    CustomLayer(Wq, Wk, Wv, dk, σ)
end

# Attention-style forward pass: softmax over the scaled query-key scores,
# applied to the values; the result is transposed before returning.
function (m::CustomLayer)(xi, xj)
    return transpose(m.σ(((xi * m.Wq) * transpose(xj * m.Wk)) ./ m.dk, dims=2) * (xj * m.Wv))
end

myLayer = CustomLayer(1000, 10, 20)

# Random input generation is included in the timed region.
function test_function()
    xi = rand(Float32, 1, 1000)
    xj = rand(Float32, 10, 1000)
    out = myLayer(xi, xj)
    return
end

@btime test_function()
```
```python
import torch
import torch.nn as nn
import timeit
import numpy as np


class CustomLayer(nn.Module):
    def __init__(self, n_in, n_out, n_neighbors):
        super().__init__()
        # Note: nn.Linear adds a bias term by default.
        self.Wq = nn.Linear(n_in, n_neighbors)
        self.Wk = nn.Linear(n_in, n_neighbors)
        self.Wv = nn.Linear(n_in, n_out)
        self.dk = np.sqrt(n_in)
        self.softmax = nn.Softmax(dim=-1)

    def forward(self, xi, xj):
        # Softmax over the scaled query-key scores, applied to the values.
        attention = self.softmax(self.Wq(xi) @ self.Wk(xj).T / self.dk)
        return attention @ self.Wv(xj)


myLayer = CustomLayer(1000, 10, 20)


# Random input generation is included in the timed region.
def test_function():
    xi = torch.rand(1, 1000)
    xj = torch.rand(10, 1000)
    out = myLayer(xi, xj)
    return


number_exc = 10000
# Mean time per call in seconds.
print(timeit.timeit("test_function()", setup="from __main__ import test_function", number=number_exc) / number_exc)
```
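One thing that may matter for a fair comparison: `@btime` from BenchmarkTools prints the minimum time over its samples, whereas `timeit.timeit(...) / number` gives a mean, so the two figures above are not quite the same statistic. The sketch below shows one way to get both a minimum and a mean on the Python side, with the inputs built once outside the timed region. The helper name `time_per_call` and the `forward` stand-in workload are hypothetical and only there to keep the example runnable without PyTorch.

```python
import timeit


def time_per_call(fn, number=200, repeat=5):
    """Return (min, mean) per-call time in seconds for a zero-argument callable.

    Hypothetical helper for illustration; not part of the original post.
    """
    # Each entry is the total time for `number` calls of fn.
    totals = timeit.repeat(fn, number=number, repeat=repeat)
    best = min(totals) / number
    mean = sum(totals) / (number * repeat)
    return best, mean


# Stand-in inputs (assumption): created once, outside the timed region,
# so only the "forward pass" itself is measured.
xs = [float(i) for i in range(1000)]
ws = [0.001 * i for i in range(1000)]


def forward():
    # Plain dot product as a placeholder for myLayer(xi, xj).
    return sum(x * w for x, w in zip(xs, ws))


best, mean = time_per_call(forward)
print(f"min:  {best * 1e6:.2f} us per call")
print(f"mean: {mean * 1e6:.2f} us per call")
```

With the PyTorch layer from the post, one could create `xi` and `xj` beforehand and pass `lambda: myLayer(xi, xj)` instead of `forward`, possibly inside a `torch.no_grad()` block if only inference is of interest. On the Julia side, `@benchmark` reports the minimum, median and mean together, which makes it easier to compare like with like.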