I was searching for an answer in the Flux.jl, CUDA.jl, and cuDNN.jl documentation, but only found https://juliagpu.org/2020-10-02-cuda_2.0/#low--and-mixed-precision-operations, which talks about GPU operations using CUDA.jl directly, independent of Flux.

I have not found any specific information about Flux.jl exploiting this technology.

I don’t think Flux uses mixed precision, so probably no. It is possible to configure CUDA.jl to use tensor cores more eagerly, at the expense of some precision, by starting Julia with fast math enabled or by calling `CUDA.math_mode!(CUDA.FAST_MATH)`, which will e.g. use TF32 when doing a Float32×Float32 matmul. Further speed-ups are possible by setting CUDA.jl’s math precision to `:BFloat16` or even `:Float16`. Ideally though, I guess Flux.jl would have an interface to use mixed-precision arithmetic.
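For reference, a minimal sketch of what that looks like with plain CuArrays (not Flux). The `precision` keyword passed to `math_mode!` is my assumption of how the lower-precision setting is exposed; check the docs of your CUDA.jl version:

```julia
using CUDA

# Opt in to faster, lower-precision math: under FAST_MATH, CUBLAS may use
# tensor cores (e.g. TF32) for Float32 matmuls.
CUDA.math_mode!(CUDA.FAST_MATH)

# Assumption: the `precision` keyword selects an even lower compute precision.
# CUDA.math_mode!(CUDA.FAST_MATH; precision=:BFloat16)

A = CUDA.rand(Float32, 4096, 4096)
B = CUDA.rand(Float32, 4096, 4096)
C = A * B  # dispatched to CUBLAS; eligible for tensor cores under FAST_MATH
```

Note this only affects CUDA.jl’s BLAS-level operations; it doesn’t give you a mixed-precision training loop (loss scaling, FP32 master weights, etc.), which is the part that would need support in Flux itself.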