I think Julia isn't using the GPU efficiently. It looks a bit like it spends more time moving data around than computing on it.
Edit: I did a test run with TensorFlow and got this result:
Epoch 10/10 time=5.79 mins: step 7800 total loss=0.8476 loss=0.4077 reg loss=0.4400 accuracy=0.7989
Batch size was 64. Much faster than Flux.
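
One way to check the data-movement suspicion would be to time the copy and the computation separately. Here is a rough sketch in Python; `time_phases`, `fake_transfer` and `fake_compute` are made-up names, and the two fake functions are just sleeps standing in for the real host-to-GPU copy and training step. With a real GPU the calls are usually asynchronous, so each phase would need a device synchronization before the clock is read, otherwise the numbers mean little.

```python
import time

def time_phases(batches, transfer, compute):
    """Return (seconds spent in transfer, seconds spent in compute)."""
    transfer_s = 0.0
    compute_s = 0.0
    for batch in batches:
        t0 = time.perf_counter()
        device_batch = transfer(batch)  # hypothetical host-to-GPU copy
        t1 = time.perf_counter()
        compute(device_batch)           # hypothetical training step
        t2 = time.perf_counter()
        transfer_s += t1 - t0
        compute_s += t2 - t1
    return transfer_s, compute_s

# Stand-ins only: sleeps imitate the cost of each phase (assumed values).
def fake_transfer(batch):
    time.sleep(0.002)
    return batch

def fake_compute(batch):
    time.sleep(0.001)
    return sum(batch)

batches = [list(range(64)) for _ in range(5)]  # 5 batches of 64 items
transfer_s, compute_s = time_phases(batches, fake_transfer, fake_compute)
print(f"transfer: {transfer_s:.4f}s  compute: {compute_s:.4f}s")
```

If the transfer total comes out close to or above the compute total on the real workload, that would support the idea that data movement is the bottleneck.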