In my case, your Flux implementation takes around 7 min per epoch with a batch size of 64, but my GPU might not be as fast as yours. The GPU is quite busy, at 100% utilization. Does TensorFlow train in 6 min per epoch, or 6 min in total?
Edit: are you using FP16 on the RTX 2070?