Hi all, sorry if this is a weird newbie question, but I’ll go ahead and ask in hopes of learning something. I recently followed the advice in this GitHub issue to build Julia so that it does not promote Float16 to Float32 for arithmetic on the A64FX, and I am trying out different functions to see how performance compares between the two precisions.
The computation I am interested in right now is a linear solve, so I was looking at the following for different `n`, where `T` is either `Float64` or `Float16`:

```julia
a = rand(T, n, n);
b = rand(T, n);
@code_llvm c = a\b
```
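(For concreteness, timing both element types might look something like this minimal sketch — `n = 1024` is just an arbitrary example size, and I'm assuming BenchmarkTools for the measurements.)

```julia
using BenchmarkTools, LinearAlgebra

n = 1024  # example size; I vary this in practice
for T in (Float16, Float64)
    a = rand(T, n, n)
    b = rand(T, n)
    println(T)
    @btime $a \ $b   # times the full factorize-and-solve for each element type
end
```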
I realize that Julia uses OpenBLAS as a backend for a lot of its operations, so for `Float16` it would need to use something else. In the output of `@code_llvm c = a\b` for `Float16` I see that it uses `j_#generic_lufact!`, while for `Float64` it uses `j_getrf!`, and I'm guessing this is why `Float64` performs faster than `Float16` (I'm only interested in performance right now, not necessarily accuracy). This leads me to my questions. If `Float64` were to use `j_#generic_lufact!` as well, might I see `Float16` perform faster on the A64FX, which has native half-precision arithmetic? And how would I go about telling Julia not to use the BLAS backend for `Float64`, and instead use the same generic LU factorization, so I can test this?
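The only idea I've had so far is to call the generic fallback directly, since `generic_lufact!` is the name that shows up in the `@code_llvm` output. Something like the sketch below, though I'm not sure this is the intended way (it's an internal, non-exported function) or whether it's fully representative:

```julia
using LinearAlgebra

n = 1024
a = rand(Float64, n, n)
b = rand(Float64, n)

# Call the internal generic LU routine directly so that the O(n^3)
# factorization step skips LAPACK's getrf! even for Float64.
F = LinearAlgebra.generic_lufact!(copy(a))

# The triangular solves here may still dispatch to LAPACK/BLAS for Float64,
# but the factorization above should be the dominant cost.
c = F \ b
```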
Thanks in advance for indulging me and my weird question, and I would appreciate any insights.