Status of BFloat16

Tomas_Pevny · August 23, 2023, 11:42am

Hello,

I have tried the Llama2 large language model in Julia, following https://github.com/chengchingwen/Transformers.jl/blob/master/example/Llama2_example.ipynb. This works nicely and smoothly, but the example uses Float32. To save memory, I wanted to use Float16, since the Llama2 model card says that torch_dtype is Float16. When I try that, the model starts to hallucinate, so I guess that something overflows or underflows. I wanted to give BFloat16 a try, since it handles large differences in magnitude better. Does anyone have experience with BFloats and CUDA? Is there some bf16 equivalent of f16?
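To illustrate the overflow/underflow concern, here is a minimal sketch using only Base Julia. It does not use BFloat16s.jl or CUDA.jl; the helpers `to_bf16_bits` and `from_bf16_bits` are hypothetical names introduced for illustration, and they emulate bfloat16 by simply truncating a Float32 to its upper 16 bits (a real implementation would round to nearest).

```julia
# Float16 has 5 exponent bits; bfloat16 keeps the 8 exponent bits of Float32
# and gives up mantissa bits instead.
# Hypothetical helpers (not from any package): emulate bfloat16 by truncation.
to_bf16_bits(x::Float32) = UInt16(reinterpret(UInt32, x) >> 16)
from_bf16_bits(b::UInt16) = reinterpret(Float32, UInt32(b) << 16)

big   = 70000f0   # larger than floatmax(Float16) == 65504
small = 1f-8      # smaller than the smallest positive Float16 subnormal

big_f16    = Float16(big)                          # overflows to Inf16
small_f16  = Float16(small)                        # underflows to zero
big_bf16   = from_bf16_bits(to_bf16_bits(big))     # finite, but coarser
small_bf16 = from_bf16_bits(to_bf16_bits(small))   # still nonzero

println("Float16:  ", big_f16, "  ", small_f16)
println("bfloat16: ", big_bf16, "  ", small_bf16)
```

The bfloat16 values stay within range but lose precision, which is the trade-off I am hoping to exploit.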

I have tried this repository, but I am not sure how relevant it is.

Thanks for answers in advance.
Tomas

ctkelley · August 23, 2023, 11:52am

That repo is a software implementation, so I suspect it will be slow. If it works as advertised, you should be able to get going with small problems to see whether it solves your issue. There is BFloat16 support in hardware out there (Apple M* seems to have it somewhere, maybe in the neural engine), but software support could be hard to come by.

Tomas_Pevny · August 23, 2023, 11:55am

I would expect the software support to suck. But CUDA has hardware support, so I was hoping it would be possible to use it with CUDA.jl.

maleadt · August 23, 2023, 3:05pm

CUDA.jl already supports BFloat16s for some common API functions, like gemm, gemv, etc. Native kernel support for BFloat16 depends on Julia properly supporting the type, i.e., not through BFloat16s.jl's emulation. Keep an eye on Add support for BFloat16 · Issue #41075 · JuliaLang/julia · GitHub for the status of that.

Tomas_Pevny · August 24, 2023, 6:57am

Thanks, I will.
