Hello everyone. I’m new to Flux, and as a learning exercise I’m trying to reproduce an example TF model. So far it seems to work, but there is one major snag: Flux training is more than an order of magnitude slower. The TF training epochs take about 30 s each, while the Flux training epochs take about 13 minutes each (roughly 26x slower).

Here is what I am trying to reproduce:

```
import tensorflow as tf

INPUT_SHAPE = [train_df.shape[1]]  # 1024
BATCH_SIZE = 5120

model = tf.keras.Sequential([
    tf.keras.layers.BatchNormalization(input_shape=INPUT_SHAPE),
    tf.keras.layers.Dense(units=512, activation='relu'),
    tf.keras.layers.Dense(units=512, activation='relu'),
    tf.keras.layers.Dense(units=512, activation='relu'),
    tf.keras.layers.Dense(units=num_of_labels, activation='sigmoid'),  # num_of_labels = 1500
])

# Compile model
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss='binary_crossentropy',
    metrics=['binary_accuracy', tf.keras.metrics.AUC()],
)

history = model.fit(
    train_df, labels_df,
    batch_size=BATCH_SIZE,
    epochs=5,
)
```

My “translation” is:

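```
using Flux, ProgressMeter  # @showprogress comes from ProgressMeter
using DataFrames           # assumed: train_df and labels_df are DataFrames

INPUT_SHAPE = size(train_df)[2]  # 1024
BATCH_SIZE = 5120

model = Chain(
    BatchNorm(INPUT_SHAPE),
    Dense(1024 => 512, relu),
    Dense(512 => 512, relu),
    Dense(512 => 512, relu),
    Dense(512 => 1500, sigmoid),
)

# Flux expects features x samples, so transpose the DataFrame matrices
obs = Matrix(train_df) |> permutedims
labels = Matrix(labels_df) |> permutedims
loader = Flux.DataLoader((data = obs, label = labels), batchsize = BATCH_SIZE)

# Adam matching the Keras defaults (betas 0.9/0.999, epsilon 1e-7)
optim = Flux.setup(Flux.Adam(0.001, (0.9, 0.999), 1.0e-7), model)

for epoch in 1:5
    println("epoch: $epoch")
    @showprogress for (data, label) in loader
        # Gradient of the batch loss with respect to the model
        grads = Flux.gradient(model) do m
            result = m(data)
            Flux.Losses.binarycrossentropy(result, label)
        end
        Flux.update!(optim, model, grads[1])
    end
end
```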
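For reference, the loss both versions optimize is the binary cross-entropy averaged over every output of the batch (Keras averages over the label axis and then over the batch, which gives the same value when every row has 1500 labels). The NumPy sketch below only illustrates that quantity; the clipping with `eps=1e-7` is an assumption for illustration, since Keras and Flux guard against `log(0)` in slightly different ways, and the toy arrays are hypothetical.

```python
import numpy as np


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over every element of the batch.

    Assumption: predictions are clipped to [eps, 1 - eps] to avoid log(0);
    the exact epsilon handling differs between Keras and Flux.
    """
    p = np.clip(y_pred, eps, 1.0 - eps)
    per_element = -(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))
    return per_element.mean()


# Hypothetical toy batch: 2 samples x 3 labels
y_true = np.array([[1.0, 0.0, 1.0], [0.0, 0.0, 1.0]])
y_pred = np.array([[0.9, 0.2, 0.7], [0.1, 0.4, 0.8]])
print(binary_crossentropy(y_true, y_pred))
```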

As I mentioned, the TF training epochs take about 30 s each and the Flux epochs about 13 minutes.

I feel like I have to be missing something simple. Any feedback you may have would be greatly appreciated.

Thanks