Hi,

I’m new to Julia and am trying it out for machine learning, hoping to use it for reinforcement learning (RL), which is my main area of work. Flux seems to run much slower than I expected. Below is a simple example of how I’m using Flux, along with the same model and training loop written in Python with PyTorch (1.6). The PyTorch version runs about 8 times faster. I’m not sure what I’m doing wrong, so any pointers would be much appreciated.

I’m using Julia v1.5, Flux v0.11.1, Python v3.7 and PyTorch v1.6 on Ubuntu 18.04, CPU only (no GPU).
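
To put the timings in context, here is a rough back-of-the-envelope count of the work in one training step of the network used in both examples (10 → 128 → 128 → 1, batch of 2000). This is my own estimate, not something either library reports: it approximates each dense layer's matrix multiply as 2 · batch · n_in · n_out FLOPs, ignores biases and the tanh activations, and assumes the backward pass costs about twice the forward pass.

```python
# Rough size of the workload in both examples: an MLP 10 -> 128 -> 128 -> 1
# trained on a batch of 2000 samples. The FLOP estimate is approximate:
# biases and tanh activations are ignored.

LAYERS = [(10, 128), (128, 128), (128, 1)]  # (inputs, outputs) per dense layer
BATCH = 2000


def param_count(layers):
    """Weights plus biases of every dense layer."""
    return sum(n_in * n_out + n_out for n_in, n_out in layers)


def forward_matmul_flops(layers, batch):
    """About 2 * batch * n_in * n_out FLOPs per layer (one multiply and one add each)."""
    return sum(2 * batch * n_in * n_out for n_in, n_out in layers)


if __name__ == "__main__":
    params = param_count(LAYERS)
    fwd = forward_matmul_flops(LAYERS, BATCH)
    # Assumption: backward ~ 2x forward for dense layers, so a full
    # training step is roughly 3x the forward cost.
    print(f"parameters: {params}")
    print(f"forward matmul FLOPs per step: {fwd:,}")
    print(f"approx. training FLOPs per step: {3 * fwd:,}")
```

By this estimate the model has 18,049 parameters and each step is dominated by the 128 × 128 hidden layer, so the comparison is mostly a test of dense matrix multiplication plus elementwise tanh on a single CPU.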

Flux example:

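```
using Flux
using Flux: params, update!
using Dates: now
using Statistics: mean

# 10 -> 128 -> 128 -> 1 MLP, matching the PyTorch Net below
model = Chain(
    Dense(10, 128, tanh),
    Dense(128, 128, tanh),
    Dense(128, 1)
)
opt = ADAM(3e-4)
p = params(model)

# Flux uses (features, batch) layout: 2000 samples with 10 features each
x = rand(Float32, 10, 2000)
y = rand(Float32, 1, 2000)

# Mean squared error, the same loss as nn.MSELoss in the PyTorch version
function loss(x, y)
    ŷ = model(x)
    mean((y .- ŷ).^2)
end

# 10 timed batches of 10 gradient steps each
for j = 1:10
    st = now()
    for i = 1:10
        g = gradient(() -> loss(x, y), p)
        update!(opt, p, g)
    end
    println(now() - st)
end
```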
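
The placement of the parentheses in the loss matters: `mean((y .- ŷ).^2)` is the mean squared error that `nn.MSELoss` computes with its default `reduction='mean'`, while `mean(y .- ŷ).^2` squares the mean residual, so positive and negative errors cancel before squaring. A small pure-Python illustration with made-up numbers (the function names are my own):

```python
from statistics import mean


def mse(y, y_hat):
    """Mean of the squared residuals (what nn.MSELoss computes by default)."""
    return mean((a - b) ** 2 for a, b in zip(y, y_hat))


def squared_mean_residual(y, y_hat):
    """Square of the mean residual: errors of opposite sign cancel first."""
    return mean(a - b for a, b in zip(y, y_hat)) ** 2


y = [1.0, 2.0, 3.0]
y_hat = [2.0, 2.0, 2.0]  # residuals y - y_hat: -1, 0, +1

print(mse(y, y_hat))                    # 2/3 (about 0.667)
print(squared_mean_residual(y, y_hat))  # 0.0
```

Here every prediction but one is wrong, yet the squared mean residual is zero, so it gives no gradient signal where the MSE does.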

PyTorch example (Python):

```
import time

import torch
import torch.nn as nn
import torch.optim as optim


class Net(nn.Module):
    """10 -> 128 -> 128 -> 1 MLP, matching the Flux Chain above."""

    def __init__(self):
        super().__init__()
        self.l1 = nn.Linear(10, 128)
        self.l2 = nn.Linear(128, 128)
        self.l3 = nn.Linear(128, 1)

    def forward(self, x):
        x = torch.tanh(self.l1(x))  # elementwise tanh activation
        x = torch.tanh(self.l2(x))
        return self.l3(x)


model = Net()
loss_f = nn.MSELoss()
opt = optim.Adam(model.parameters(), lr=3e-4)

# PyTorch uses (batch, features) layout: 2000 samples with 10 features each
x = torch.rand(2000, 10)
y = torch.rand(2000, 1)

# 10 timed batches of 10 gradient steps each
for j in range(10):
    st = time.time()
    for i in range(10):
        opt.zero_grad()
        y_hat = model(x)
        loss = loss_f(y_hat, y)
        loss.backward()
        opt.step()
    print(f"{(time.time() - st) * 1000:.0f} milliseconds")
```
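
Both loops time every batch of ten steps, including the first one. In Julia the first call also triggers JIT compilation, so the first printed time is not representative of steady-state speed. Below is a framework-agnostic sketch of a fairer measurement that discards a few warm-up runs and reports the median; `benchmark`, `warmup`, `repeats` and `workload` are illustrative names of my own. On the Julia side, the BenchmarkTools.jl package (`@btime`, `@benchmark`) serves the same purpose.

```python
import statistics
import time
from typing import Callable


def benchmark(fn: Callable[[], None], repeats: int = 10, warmup: int = 2) -> float:
    """Return the median wall-clock time of fn() in milliseconds.

    The first `warmup` calls run but are not timed, so one-off costs such as
    JIT compilation or first-use allocations do not skew the result.
    """
    for _ in range(warmup):
        fn()
    times_ms = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        times_ms.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(times_ms)


if __name__ == "__main__":
    # Hypothetical stand-in workload; in the PyTorch script this would be a
    # function running the ten zero_grad/forward/backward/step iterations.
    def workload() -> None:
        sum(i * i for i in range(100_000))

    print(f"median: {benchmark(workload):.1f} ms")
```

Wrapping the inner ten-step loop of each script in a function and timing it this way should give numbers that are more directly comparable than the raw per-batch prints.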