Without a bias in nn.Linear(), Python is faster (without torch.compile). Python: 87.9 μs, Julia: 128.3 μs.
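
For anyone who wants to reproduce the Python side, here is a rough sketch of how a timing like this could be taken. The layer sizes, batch size, CPU execution, and the helper names median_time_us and bench_linear_no_bias are all my assumptions; the post does not say which shapes or benchmarking tool were used, so do not expect the same numbers.

```python
import statistics
import time


def median_time_us(fn, warmup=100, repeats=1000):
    """Return the median wall-clock time of fn() in microseconds."""
    # Warm-up calls so one-time setup costs are not measured.
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1e6)
    return statistics.median(samples)


def bench_linear_no_bias(in_features=64, out_features=64, batch=32):
    """Time the forward pass of a bias-free linear layer in eager mode."""
    # Assumption: these sizes are placeholders, not the ones from the post.
    import torch  # imported lazily so the timing helper works without PyTorch

    layer = torch.nn.Linear(in_features, out_features, bias=False)
    x = torch.randn(batch, in_features)

    def forward():
        # No gradient tracking, so only the forward pass is timed.
        with torch.no_grad():
            layer(x)

    return median_time_us(forward)
```

Calling bench_linear_no_bias() returns the median forward time in microseconds for eager PyTorch, i.e. without torch.compile. The median is used rather than the mean so that occasional slow runs do not skew the result.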