I’m using an RTX 2070. It took 6 min in total. My full TensorFlow output can be seen here: https://pastebin.com/qa1Zgft3
If I use the same GPU metrics as you have above, I get the following results:
When I use TensorFlow, my GPU metrics are as follows:
I was given a hint that this might help, but I haven’t had time to try it properly yet: https://juliagpu.gitlab.io/CUDA.jl/development/profiling/#Application-profiling-1