I don’t think Flux uses mixed precision, so probably not. It is possible to configure CUDA.jl to use tensor cores more eagerly, at the expense of some precision, by starting Julia with fast math enabled or by calling `CUDA.math_mode!(CUDA.FAST_MATH)`, which will e.g. use TF32 when doing an F32×F32 matmul. Further speed-ups are possible by setting CUDA.jl’s math precision to `:BFloat16` or even `:Float16`.
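For concreteness, here is a minimal sketch of what that configuration could look like; it assumes `math_mode!` accepts a `precision` keyword as in recent CUDA.jl versions, so check the docs for the version you have installed:

```julia
using CUDA

# Let CUDA.jl trade some precision for speed globally
# (an alternative to starting Julia with fast math enabled).
CUDA.math_mode!(CUDA.FAST_MATH)

# A plain Float32 matmul can now be executed with TF32 tensor cores
# on hardware that supports them.
A = CUDA.rand(Float32, 1024, 1024)
B = CUDA.rand(Float32, 1024, 1024)
C = A * B

# Lower the math precision further for additional speed-ups
# (assumes the `precision` keyword of recent CUDA.jl versions):
CUDA.math_mode!(CUDA.FAST_MATH; precision=:BFloat16)  # or :Float16
```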
Ideally though, I guess Flux.jl would have an interface to use mixed-precision arithmetic.