I want to store a matrix in Float32 and do several solves with Float64 right-hand sides. In LINPACK (and, I think, LAPACK) this just works. In Matlab (see below) it’s less clear. However, something bizarre happens with Julia in the solve phase. Here’s my story.
I’m reporting on lu!; qr! is similar, with more allocations.
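For reference, the qr! version of the experiment looks like this (a minimal sketch with illustrative names, not the timed runs below):

using LinearAlgebra
n = 1000; Aq = rand(n, n); Bq = Float32.(Aq); xq = rand(n); bq = Aq*xq;
AFq = qr!(Aq); BFq = qr!(Bq);          # QR factorizations in Float64 and Float32
z2 = BFq\bq; z3 = BFq\Float32.(bq);    # same mixed-precision solves as the lu! runs below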
What is happening? Is there a way I can fix this (MKL)? Is there something I’m missing about LAPACK?
julia> using LinearAlgebra, BenchmarkTools

julia> n=1000; A=rand(n,n); B=Float32.(A); x=rand(n); b=A*x;
julia> AC=copy(A);
julia> AF=lu!(A); BF=lu!(B);
julia> y1=AF\b; y2=BF\b; y3=BF\Float32.(b);
julia> [norm(y1-x), norm(y2-x), norm(y3-x)]./norm(x)
3-element Array{Float64,1}:
2.45032e-12
4.97286e-05
2.86294e-04
julia> [norm(AC*y1-b), norm(AC*y2-b), norm(AC*y3-b)]./norm(b)
3-element Array{Float64,1}:
7.71294e-16
6.27307e-08
3.17710e-07
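As a rough sanity check (a side note, not part of the timing question): the usual forward-error bound is roughly norm(y-x)/norm(x) on the order of cond(A)*eps(T), with eps(Float32) ≈ 1.2e-7 and eps(Float64) ≈ 2.2e-16, and the condition number of this particular matrix can be checked directly:

cond(AC)   # 2-norm condition number of the original Float64 matrix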
The residual and error norms look reasonable, and it seems that y2 is more accurate than y3. That is no surprise, and it has been the case in all the testing I’ve done. I’d be happy, but something is funny with the solve: I’d expect y2 and y3 to cost about the same, yet y3 is much faster than y2 …
julia> @btime $y1=$AF\$b;
278.196 μs (1 allocation: 7.94 KiB)
julia> @btime $y2=$BF\$b;
269.095 μs (1 allocation: 7.94 KiB)
julia> @btime $y3=$BF\Float32.($b);
88.929 μs (2 allocations: 8.13 KiB)
Why?
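One thing I can do to dig into this (a sketch, no output shown) is ask which \ methods the three solves dispatch to; my guess, which I have not verified, is that the mixed Float32/Float64 case takes a promoting generic path rather than the single-precision LAPACK triangular solve:

using InteractiveUtils      # loaded by default in the REPL
@which AF\b                 # Float64 factorization, Float64 rhs
@which BF\b                 # Float32 factorization, Float64 rhs (the slow one)
@which BF\Float32.(b)       # Float32 factorization, Float32 rhs (the fast one)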
Matlab gives consistent results and seems to compute y3 even when I ask for y2.
>> n=1000; A=rand(n,n); B=single(A); x=rand(n,1); b=A*x;
>> [l,u]=lu(A); [ls,us]=lu(B);
>> tic; y1=u\(l\b); toc
Elapsed time is 0.009094 seconds.
>> tic; y2=us\(ls\b); toc
Elapsed time is 0.006940 seconds.
>> tic; y3=us\(ls\single(b)); toc
Elapsed time is 0.006864 seconds.
>> [norm(y2-y3), norm(y1-x), norm(y2-x)]/norm(x)
ans =
1x3 single row vector
0 5.6362e-13 2.7937e-04
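For what it’s worth, since converting the right-hand side by hand already hits the fast path, one workaround is a small wrapper that demotes b before the solve and promotes the result afterwards (solve32 is just an illustrative name, not an API). It reproduces y3, so it is fast but, as the error norms above show, somewhat less accurate than y2.

# Illustrative wrapper: solve with a Float32 factorization and a Float64
# right-hand side by demoting b, then promoting the solution back.
solve32(F, b) = Float64.(F\Float32.(b))

y3b = solve32(BF, b);   # same as y3 above, just stored in Float64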