I guess we have all seen the hype about the A100. Is this going to be useful for Julia GPU?
In the near term, the BFloat16 format will see broader support, based on this news:
BFloat16 Support About To Land Within LLVM (13 May 2020)
“Arm has been pushing along the BFloat16 support for LLVM with ARMv8.6-A supporting the new format. But this BFloat16 LLVM support is also relevant ultimately for Intel AVX-512 BF16, Intel Nervana, Google Cloud TPUs, and other hardware coming out with BF16 support to bolster their machine learning capabilities.”
Disclosure - I work for Nvidia (specifically TensorRT) and have written unit tests that can distinguish whether TF32 kicked in or not.
The advantage of TF32 is that the format is the same as FP32. When computing inner products with TF32, the input operands have their mantissas rounded from 23 bits to 10 bits. The rounded operands are multiplied exactly, and accumulated in normal FP32.
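To make the rounding concrete, here is a minimal sketch (not Nvidia's actual hardware behavior) that emulates the TF32 operand rounding on the CPU: it keeps FP32's sign and 8-bit exponent and drops the low 13 mantissa bits, leaving the 10 explicit mantissa bits TF32 uses. For simplicity this sketch truncates, whereas real hardware rounds to nearest.

```python
import struct

def tf32_round(x: float) -> float:
    """Emulate TF32 operand rounding: keep FP32's 8-bit exponent,
    truncate the 23-bit mantissa to 10 bits (sketch only; the actual
    hardware rounds to nearest rather than truncating)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits &= ~0x1FFF  # clear the low 13 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]

print(tf32_round(1.0))  # exactly representable, unchanged
print(tf32_round(0.1))  # loses the mantissa bits below bit 10
```

Note that the result is still an ordinary FP32 value, which is exactly why no new data type is needed anywhere else in the code.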
The big advantage of TF32 is that compiler support is required only at the deepest levels, i.e. inside the CUDA compiler. The rest of the code just sees FP32 with less precision, but the same dynamic range. Big linear operations are usually done via libraries anyway, e.g. the BLAS sgemm. So exploiting TF32 will largely be a matter of tweaking callers of these libraries to indicate whether TF32 is okay. E.g., perhaps use it for the initial iterations of a linear solver, and then use slower FP32 to polish the results.
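The solver pattern mentioned above (cheap low-precision iterations, then full-precision polish) is classic iterative refinement. Here is a small self-contained sketch of the idea, with a hypothetical `solve2_low` standing in for a TF32-backed library solve: it truncates operands to roughly TF32 mantissa precision, and the refinement loop computes residuals in full precision to recover an accurate answer.

```python
import struct

def trunc(x: float) -> float:
    """Crude low-precision stand-in: truncate a float32 mantissa to
    10 bits, roughly mimicking TF32 operand rounding (sketch only)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0] & ~0x1FFF
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def solve2_low(A, b):
    """2x2 solve via Cramer's rule on truncated operands --
    a hypothetical stand-in for a TF32 GEMM-based solver."""
    (a11, a12), (a21, a22) = A
    det = trunc(a11) * trunc(a22) - trunc(a12) * trunc(a21)
    x1 = (trunc(b[0]) * trunc(a22) - trunc(a12) * trunc(b[1])) / det
    x2 = (trunc(a11) * trunc(b[1]) - trunc(b[0]) * trunc(a21)) / det
    return [x1, x2]

def refine(A, b, iters=3):
    x = solve2_low(A, b)  # cheap low-precision initial solve
    for _ in range(iters):
        # residual in full precision, correction in low precision
        r = [b[i] - sum(A[i][j] * x[j] for j in range(2)) for i in range(2)]
        dx = solve2_low(A, r)
        x = [x[i] + dx[i] for i in range(2)]
    return x

A = [[4.1, 1.2], [1.3, 3.7]]
b = [1.0, 2.0]
x = refine(A, b)  # converges toward the exact solution [1.3/13.61, 6.9/13.61]
```

The point of the pattern is that each refinement step shrinks the error by roughly the low-precision unit roundoff times the condition number, so a few cheap iterations recover full-precision accuracy on well-conditioned problems.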
Formats such as FP16 and BFloat16 are more work, since they involve different bit layouts. We still encourage programmers to put effort into using those formats, since they reduce memory bandwidth and consequently permit even faster execution. TF32 exists as something that can be quickly plugged in to exploit Tensor Core speed without much work.
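As a small illustration of why BFloat16 is a different bit layout rather than just "FP32 with fewer bits in place": a BFloat16 value is the top 16 bits of an IEEE float32, so converting changes the storage width and requires a distinct type. This sketch truncates; hardware conversions typically round to nearest even.

```python
import struct

def to_bfloat16_bits(x: float) -> int:
    """BFloat16 is the top 16 bits of an IEEE float32 (truncation
    sketch; hardware typically rounds to nearest even)."""
    return struct.unpack("<I", struct.pack("<f", x))[0] >> 16

def from_bfloat16_bits(h: int) -> float:
    """Widen a BFloat16 bit pattern back to float32 by zero-filling
    the low 16 mantissa bits."""
    return struct.unpack("<f", struct.pack("<I", h << 16))[0]

print(hex(to_bfloat16_bits(1.0)))        # 0x3f80
print(from_bfloat16_bits(0x4049))        # pi truncated to 3.140625
```

Unlike TF32, every producer and consumer of these 16-bit values has to agree on the layout, which is why FP16/BFloat16 need explicit support up and down the software stack.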