My view is that lu does the right thing. If you want to compute in Float16, the language should let you. I think this is likely an unintended consequence of that line. It would
seem that
@inline choltype(A) = typeof(sqrt(oneunit(eltype(A))))
would make more sense.
However, I’m not sure that it wouldn’t break something else.
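For reference, if I’m reading LinearAlgebra/src/cholesky.jl right (treat this as my reading of the source, not gospel), the current definition promotes to at least Float32, which is what forces the Float16 case up:

@inline choltype(A) = promote_type(typeof(sqrt(oneunit(eltype(A)))), Float32)

so choltype on a Float16 matrix is Float32 today, while the version above would keep it at Float16.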
I can make a guess at the reason, which would make this a bug. In the cholesky.jl code:
# The dispatch structure in the cholesky, and cholesky! methods is a bit
# complicated and some explanation is therefore provided in the following
#
# In the methods below, LAPACK is called when possible, i.e. StridedMatrices with Float32,
# Float64, ComplexF32, and ComplexF64 element types.
My guess is that choltype was originally intended to ensure at least Float32 for the LAPACK branch, but is accidentally applied even to the types that don’t get sent to LAPACK. In fact, later in the same comments,
# FixMe? The dispatch below seems overly complicated. One simplification could be to
# merge the two Cholesky types into one.
So I believe the correct fix would be to keep both your version and the original, e.g. by naming the LAPACK one choltypeforlapack or by dispatch; see the sketch below. As you say, it’s not clear whether your fix would break something else, but I would label the current non-LAPACK behavior a bug.
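For concreteness, a minimal sketch of that split (choltypeforlapack is just the placeholder name from above, not an existing function):

choltype(A) = typeof(sqrt(oneunit(eltype(A))))             # generic path: keep the element type
choltypeforlapack(A) = promote_type(choltype(A), Float32)  # LAPACK path: promote to a BLAS-supported type

The LAPACK-eligible methods would then call choltypeforlapack, and everything else would call choltype.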
I think there are two issues at play. The first is that the LAPACK implementation Julia uses doesn’t have Float16 routines, which is probably why the matrix has to get promoted; the other is that Julia currently has a compiler pass that promotes Float16 computations to Float32.
One thing that could be done is to truncate the result after doing the LAPACK routine so the user gets a Float16 back.
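Roughly like this hypothetical sketch (cholesky16 is a made-up name, and it assumes the Cholesky(factors, uplo, info) constructor accepts these arguments):

using LinearAlgebra

function cholesky16(A::AbstractMatrix{Float16})  # assumes A is symmetric positive definite
    F = cholesky(Float32.(A))                      # promote and use the LAPACK path
    Cholesky(Float16.(F.factors), F.uplo, F.info)  # truncate the stored factor back to Float16
end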
I think I gave more details in a previous thread you opened: Half precision
Strange. Everything works as I would expect with lu. In particular, lu! takes a long time in Float16 because Float16 computations are done in software.
However, cholesky seems to do all of its work in Float32, so the matrix gets promoted on the way in, before LAPACK is called. You can see that the timings below are much closer than they are for lu. That promotion makes some sense given LAPACK’s limits, but why is lu different? Somehow I can’t get cholesky! to work right with @btime (a possible workaround is sketched after the timings).
julia> using LinearAlgebra, BenchmarkTools

julia> A=rand(Float32,1000,1000); A ./= opnorm(A);
julia> B=I + Hermitian(A*A); B16=Float16.(B);
julia> @btime cholesky($B);
3.025 ms (4 allocations: 3.81 MiB)
julia> @btime cholesky($B16);
3.386 ms (4 allocations: 3.81 MiB)
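On the cholesky! problem: since cholesky! overwrites its argument, repeated @btime runs see an already-factored matrix (which is generally no longer positive definite). A pattern that should work with BenchmarkTools is to rebuild the input in setup and run each sample once:

julia> @btime cholesky!(Bc) setup=(Bc = copy($B16)) evals=1;

Here Bc is just a scratch name; evals=1 keeps every measurement on a fresh copy.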
The type promotion to at least Float32 is indeed to make use of LAPACK. Converting back down to Float16 was implemented in https://github.com/JuliaLang/julia/pull/41352 and is going to be available in v1.8.
That’s good news. But @ctkelley, note that this still promotes and computes in Float32 and then converts the output down to Float16. It still promotes in choltype, whether for the LAPACK or the generic path, so in this sense it’s still different from lu. It does ensure the right type is returned, but if what you want is a true Float16 emulation, the precision here may actually be a bit higher.
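If I’m reading the code right, cholesky! doesn’t go through choltype, so you can see the difference by comparing the in-place (native Float16, generic algorithm) and out-of-place (promoted) results, reusing B16 from above:

julia> F16 = cholesky!(copy(B16));   # generic code, arithmetic done in Float16
julia> F32 = cholesky(B16);          # promoted to Float32 internally
julia> maximum(abs, F16.U - F32.U)   # typically nonzero: the roundoff differs

The exact discrepancy will vary; the point is only that the two paths round differently.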
The main reason for using Float16 and such is performance.
Converting to Float32, paying the run-time tax, and then going back is the worst of both worlds.
Something like not having your cake and not eating it either.
If it is an intermediate phase, because the current implementation (LAPACK) doesn’t support Float16, that is OK. But the end goal should be that using Float16 is faster than Float32, possibly at the cost of some accuracy.
Is there a plan for wide native linear algebra support for Float16?
You forget one important element of performance: throughput.
While the execution ports of most CPUs don’t support Float16 natively, for most linear algebra operations the bottleneck isn’t the execution rate but memory bandwidth, and halving the element size effectively doubles the usable bandwidth (unless Float16 arrays in Julia aren’t dense in memory).
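For what it’s worth, a quick check suggests the density assumption holds:

julia> sizeof(zeros(Float16, 1000))   # 2 bytes per element, half of Float32’s 4
2000

so Array{Float16} is stored densely.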
But what I wrote was the end goal; we’re in an intermediate phase on CPUs as well.
What I wrote would apply on CPUs with native Float16 support, so it holds.