I am trying to calculate the mean of some values of type Float16. When I perform the mean operation, the result is Inf16. I also tried assigning the result to a variable of type Float64, but that didn't work out: the type of the final mean is still Float16. Can you suggest how to handle the result going out of the range of Float16 here?
You need to convert before the calculation happens. Converting the result won't work because the "true value" is already lost: the intermediate Float16 sum has already overflowed to Inf16.
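A minimal sketch of the difference, assuming values chosen so that each one fits in Float16 (largest finite value 65504) but their sum does not:

```julia
# Four values that are individually representable in Float16 (max finite value 65504)
a = fill(Float16(60000), 4)

# Converting afterwards: the Float16 sum has already overflowed, so Float64 only sees Inf
late = Float64(sum(a) / length(a))

# Converting first: every element is widened to Float64 before it is added
early = sum(Float64, a) / length(a)

println(late)   # Inf
println(early)  # 60000.0
```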
Or, if you do not want to convert the whole array to `Float64`, and you do not mind losing some performance of `sum`, you can define your “own sum” that uses a `Float64` as accumulator:
```julia
julia> a = rand(Float16, 1000);

julia> r = sum(a)
Float16(496.8)

julia> r2 = foldl(+, a; init = zero(Float64))
497.798828125
```
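The same idea can be written as a plain loop with an explicit `Float64` accumulator. This is only a sketch; `sum_float64` is a hypothetical helper name, not a Base function:

```julia
# Hypothetical helper: sum Float16 values using an explicit Float64 accumulator
function sum_float64(a::AbstractVector{Float16})
    acc = 0.0                 # Float64 accumulator
    for v in a
        acc += v              # Float64 + Float16 promotes to Float64, so no Float16 overflow
    end
    return acc
end

a = fill(Float16(60000), 4)
s = sum_float64(a)            # 240000.0
```

Because `acc` starts as a `Float64` and `Float64 + Float16` promotes to `Float64`, the accumulator's type stays stable throughout the loop.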
`sum(Float64, a)` also works.
Oh, OK. I did not know that `sum` could take a function as its first parameter; it makes sense.
If each element is always converted to the wider type before the addition, then `sum(Float64, a)` seems the way to go.
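Applied to the original question, a mean that accumulates in `Float64` could look like this (a sketch; `mean_float64` is a hypothetical name):

```julia
# Hypothetical helper: mean of Float16 data, with each element widened to Float64 before summing
mean_float64(a) = sum(Float64, a) / length(a)

a = Float16[60000, 60000, 1, 3]
m = mean_float64(a)    # (60000 + 60000 + 1 + 3) / 4 = 30001.0, returned as a Float64
```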
I think there’s some misunderstanding here. Variables don’t have a type, so you cannot assign anything to a “variable of type Float64”. If you assign a value of type Float16 to a variable, the value keeps its type.
It is possible to declare a type for a variable (e.g. `local x::Float64`), in which case any assignment to that variable converts the right-hand side to that type.
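A small sketch of a type-declared local variable inside a function (`declared_type_demo` is a hypothetical name). The conversion happens on assignment, which is why it cannot recover a sum that already overflowed in Float16:

```julia
function declared_type_demo()
    x::Float64 = Float16(1.5)    # declared local: the Float16 value is converted on assignment
    a = fill(Float16(60000), 4)
    y::Float64 = sum(a)          # sum runs in Float16 and overflows; only Inf16 is converted
    return x, y
end

x, y = declared_type_demo()
println(typeof(x))  # Float64
println(y)          # Inf
```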
It looks like using `sum` as a higher-order function allocates in this case. If performance matters, a variant of this approach would be to use `reduce`, taking care to initialize the accumulator to a `Float64` zero. This does not allocate and should be faster for not-too-large arrays:
```julia
julia> using BenchmarkTools

julia> x = rand(Float16, 10_000);

# standard use of sum, everything in Float16
julia> @btime sum($x)
  198.709 μs (0 allocations: 0 bytes)
Float16(4.972e3)

# using sum as a higher-order function to convert each element to Float64
julia> s1(x) = sum(Float64, x)

julia> @btime s1($x)
  930.281 μs (29999 allocations: 468.73 KiB)
4976.8818359375

# using reduce with the accumulator initialized to a Float64 zero
julia> s2(x) = reduce(+, x, init=0.0)

julia> @btime s2($x)
  32.444 μs (0 allocations: 0 bytes)
4976.8818359375
```
It seems like this is an inference bug of some kind. If I pass a function instead of the `Float64` constructor, I get no allocations:

```julia
julia> s1(x) = sum(y -> Float64(y), x)
s1 (generic function with 1 method)

julia> @btime s1($x);
  41.182 μs (0 allocations: 0 bytes)
```
Filed as julia#36783.