Linear solve in mixed precision koan

I want to store a matrix in Float32 and do several solves with Float64 right-hand sides. In LINPACK (and I think LAPACK) this just works. In Matlab (see below) it’s less clear. However, something bizarre happens with Julia in the solve phase. Here’s my story.

I’m reporting on lu!. qr! is similar with more allocations.

What is happening? Is there a way I can fix this (MKL)? Is there something I’m missing about LAPACK?

julia> n=1000; A=rand(n,n); B=Float32.(A); x=rand(n); b=A*x;
julia> AC=copy(A);
julia> AF=lu!(A); BF=lu!(B);
julia> y1=AF\b; y2=BF\b; y3=BF\Float32.(b);
julia> [norm(y1-x), norm(y2-x), norm(y3-x)]./norm(x)
3-element Array{Float64,1}:
 2.45032e-12
 4.97286e-05
 2.86294e-04
julia> [norm(AC*y1-b), norm(AC*y2-b), norm(AC*y3-b)]./norm(b)
3-element Array{Float64,1}:
 7.71294e-16
 6.27307e-08
 3.17710e-07

The residual and error norms look reasonable, and it seems that y2 is more accurate than y3. That is no surprise and has been the case in all the testing I’ve done. I’d be happy, but something is funny with the solve. I’d expect y2 and y3 to cost about the same.
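For scale, single-precision machine epsilon is about 1.2e-7, the same order as the y2 and y3 residual norms above:

julia> eps(Float32)
1.1920929f-7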

But y3 is much faster than y2 …

julia> @btime $y1=$AF\$b;
  278.196 μs (1 allocation: 7.94 KiB)

julia> @btime $y2=$BF\$b;
  269.095 μs (1 allocation: 7.94 KiB)

julia> @btime $y3=$BF\Float32.($b);
  88.929 μs (2 allocations: 8.13 KiB)

Why?

Matlab gives consistent results and seems to compute y3 even when I ask for y2.

>> n=1000; A=rand(n,n); B=single(A); x=rand(n,1); b=A*x;
>> [l,u]=lu(A); [ls,us]=lu(B);
>> tic; y1=u\(l\b); toc
Elapsed time is 0.009094 seconds.
>> tic; y2=us\(ls\b); toc
Elapsed time is 0.006940 seconds.
>> tic; y3=us\(ls\single(b)); toc
Elapsed time is 0.006864 seconds.
>> [norm(y2-y3), norm(y1-x), norm(y2-x)]/norm(x)
ans =
  1x3 single row vector
           0  5.6362e-13  2.7937e-04

As far as I can tell, neither LINPACK nor LAPACK support mixed-precision solves — you have to first promote to the same precision — so I’m not sure what you’re referring to here. Matlab probably promotes to double precision under the hood.

In Julia, if you ask it for a mixed-precision solve, maybe it falls back to a generic Julia routine for the backsubstitution — i.e. it actually does it in mixed precision, which is slower than calling the optimized BLAS/LAPACK routine?
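One way to check that guess (just a sketch; @which lives in InteractiveUtils, which the REPL loads automatically) is to see which method each call dispatches to:

using InteractiveUtils

@which BF \ b             # Float32 factorization, Float64 right-hand side
@which BF \ Float32.(b)   # element types match

If the mixed call lands in a generic ldiv!/triangular-solve method instead of a LAPACK wrapper, that would explain the timing gap.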


It seems that Matlab demotes to single before the solve, and it does so without bothering the user.

In Julia it seems I’ll have to do it myself with something like

T = eltype(BF)   # precision of the stored factorization
# demote the right-hand side to the factorization's precision if needed
T == Float64 ? (step .= -(BF \ b)) : (step .= -(BF \ T.(b)))

I’m doing that now and it puts a band-aid on the problem. I do not see why Julia can’t do what Matlab does and handle the demotion on its own.
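A small wrapper can hide the branch; here is a minimal sketch (the name mpsolve is made up, not from any package): demote the right-hand side to the factorization's element type, then solve.

# Hypothetical helper: make the right-hand side match the factorization's
# element type so the same-precision solve path is taken.
function mpsolve(F, b)
    T = eltype(F)
    return F \ (eltype(b) == T ? b : T.(b))
end

With that, the two branches above collapse to step .= -mpsolve(BF, b).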

Automatic demotion is a really weird (and bad) choice here. Generally, if you have mixed precision, you should be returning results in the highest precision.


No. The whole idea is to store and factor the operator in single, solve equations with a double right hand side, and use the result. The application is Newton’s method, where a single precision step is fine for most problems if you compute the nonlinear residual in double.

Theory supports this and Matlab does exactly the right thing. I’m having to hack Julia, but it works ok.
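For concreteness, here is a rough sketch of that pattern; newton_mixed, F!, J!, and x0 are stand-ins for a user's own problem, not from any package:

using LinearAlgebra

# Illustrative mixed-precision Newton loop. Assumptions: F!(r, x) writes the
# nonlinear residual into r and J!(J, x) writes the Jacobian into J.
function newton_mixed(F!, J!, x0; maxit = 20, tol = 1e-10)
    x = copy(x0)
    r = similar(x)                            # residual stays in Float64
    J = zeros(Float32, length(x), length(x))  # Jacobian stored in Float32
    for _ in 1:maxit
        F!(r, x)                              # double-precision residual
        norm(r) <= tol && break
        J!(J, x)                              # refill and refactor in single
        JF = lu!(J)
        x .-= JF \ Float32.(r)                # demoted RHS -> fast same-precision solve
    end
    return x
end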