Llama2-7b: difference in inference between Float16 and Float32

My first thought was: are you sure it's not bfloat16? It seems not. But either 16-bit format in Julia rounds after every operation, losing accuracy, so the error accumulates.
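To illustrate what I mean (a minimal Julia sketch of my own; the 0.001 values and the count are made up purely for illustration): a naive loop sum shows how the per-operation rounding in Float16 piles up, while Float32 stays close to the true value.

```julia
# Naive loop sum so that every addition rounds to the element type's precision
# (Julia's built-in `sum` uses pairwise summation, which would hide the effect).
function naive_sum(xs)
    acc = zero(eltype(xs))
    for x in xs
        acc += x          # each += rounds the result to eltype(xs)
    end
    return acc
end

xs32 = fill(0.001f0, 100_000)   # 100_000 copies of 0.001 as Float32
xs16 = Float16.(xs32)           # the same data stored as Float16

naive_sum(xs32)   # close to 100.0
naive_sum(xs16)   # stalls around 4: once the accumulator is large enough,
                  # adding 0.001 rounds away entirely
```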

Are you running the model on the GPU? It might be that GPUs do all operations with a larger accumulator. I'm not sure a CPU has that capability unless you cast to Float32 or Float64, and you would likely need to do that explicitly.
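Roughly what I mean, as a sketch (my own illustration, not anyone's actual kernel): keep the data stored in Float16 but carry the running value in a wider Float32 accumulator, which is more or less what mixed-precision GPU hardware does for you and what you would have to write out explicitly on the CPU.

```julia
# Dot product of Float16 vectors with an explicit Float32 accumulator.
function dot_widened(x::AbstractVector{Float16}, y::AbstractVector{Float16})
    acc = 0.0f0                                  # wider accumulator
    @inbounds for i in eachindex(x, y)
        acc += Float32(x[i]) * Float32(y[i])     # widen before multiply/add
    end
    return Float16(acc)                          # round back to storage format once, at the end
end

x = Float16.(randn(Float32, 10_000) .* 0.01f0)
y = Float16.(randn(Float32, 10_000) .* 0.01f0)
dot_widened(x, y)
```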

The Llama2 models were trained using bfloat16, but the original inference uses float16. The checkpoints uploaded on the Hub use `torch_dtype = 'float16'`, which will be used by the `AutoModel` API to cast the checkpoints from `torch.float32` to `torch.float16`.

The dtype of the online weights is mostly irrelevant unless you are using `torch_dtype="auto"` when initializing a model with `model = AutoModelForCausalLM.from_pretrained("path", torch_dtype="auto")`. The reason is that the model will first be downloaded (using the dtype of the checkpoints online), then it will be cast to the default dtype of torch (`torch.float32`), and finally, if a `torch_dtype` is provided in the config, it will be used.

Training the model in float16 is not recommended and is known to produce `nan`; as such, the model should be trained in bfloat16.
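To see the failure mode that warning is about (my own Julia sketch, not something from the model card): Float16's small exponent range overflows to Inf for fairly modest values, and Inf arithmetic then turns into NaN, whereas bfloat16 keeps Float32's exponent range and only gives up mantissa bits.

```julia
floatmax(Float16)             # 6.55e4 — easy to exceed with unscaled activations or gradients
Float16(1f5)                  # Inf16: already past the representable range
Float16(1f5) - Float16(9f4)   # Inf16 - Inf16 == NaN16
floatmax(Float32)             # ≈ 3.4e38 — bfloat16 shares this 8-bit exponent range,
                              # so overflow-driven NaNs are far less likely
```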

I couldn't confirm, since that link doesn't work. Also, it was most likely trained on a GPU, thus not really in 16 bits only. Maybe it's just natural that you can't use Float16, at least on CPUs. Besides, it's very much slower, and really only thought of as a storage format.

Note, in case it's helpful to you:

C++ (but not C) has bfloat16 since C++23, i.e. std::bfloat16_t (it also has std::float16_t, which C has as well):
https://en.cppreference.com/w/cpp/types/floating-point

@Oscar_Smith Maybe Julia should add bfloat16, to catch up with C++'s future… though a package is just as good (a different argument can be made for standardized languages and their stdlibs). Maybe there's no need to have it in non-standardized Julia; rather, Float16 could even be excised…?
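For what it's worth, a package already fills this gap today. A minimal sketch assuming the registered BFloat16s.jl package and the `BFloat16` type it provides (I haven't double-checked the current API, so treat the details as an assumption):

```julia
using BFloat16s      # assumed package providing a BFloat16 type

x = BFloat16(1/3)    # construct from a Float64
y = x + x            # arithmetic works; Float32's exponent range, but far fewer mantissa bits
Float32(y)           # widen back to Float32 when more precision is needed
```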
