I'm writing a GPU version of LAPACK.gesv! using CuArrays.jl.
The following works:
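```julia
using CuArrays, LinearAlgebra, Test

# Solve A x = b on the GPU: LU-factorize A in place (getrf!),
# then overwrite b with the solution (getrs!).
function gpugesv!(A, b)
    A, ipiv = CuArrays.CUSOLVER.getrf!(A)
    CuArrays.CUSOLVER.getrs!('N', A, ipiv, b)
    return nothing
end

###
A = rand(32^2, 32^2); b = rand(32^2);
A_d = CuArray(A); b_d = CuArray(b);
LAPACK.gesv!(A, b);                  # CPU reference solution
gpugesv!(A_d, b_d)                   # GPU version
A_d = Array(A_d); b_d = Array(b_d);  # copy the results back to the host
@test isapprox(A_d, A) && isapprox(b_d, b)
###
```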
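For reference, gesv! is the same LU factorization plus triangular solve that gpugesv! performs with getrf!/getrs!. The sketch below checks that equivalence on the CPU with the LinearAlgebra.LAPACK wrappers, so it runs without a GPU; the name cpugesv! and the diagonally dominant test matrix are illustrative choices, not part of the original code.

```julia
using LinearAlgebra

# Hypothetical CPU analogue of gpugesv!: factorize, then solve.
# LAPACK.getrf! overwrites A with its LU factors and returns (A, ipiv, info).
function cpugesv!(A::Matrix{Float64}, b::Vector{Float64})
    A, ipiv, info = LAPACK.getrf!(A)
    info == 0 || throw(LinearAlgebra.SingularException(info))
    LAPACK.getrs!('N', A, ipiv, b)   # overwrites b with the solution x
    return nothing
end

n = 64
A = rand(n, n) + n * I   # diagonally dominant, hence nonsingular
b = rand(n)

A1, b1 = copy(A), copy(b)
LAPACK.gesv!(A1, b1)     # one-shot LAPACK driver

A2, b2 = copy(A), copy(b)
cpugesv!(A2, b2)         # two-step getrf! + getrs!

println(A1 ≈ A2 && b1 ≈ b2)
println(A * b2 ≈ b)
```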
I'll have to run this computation repeatedly for different A and b (with N = 128^2), but I'm not sure how to clean up the GPU after each call to gpugesv!. After several iterations, the CPU gets stuck at 100% load.
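One general way to avoid allocating and freeing a fresh pair of arrays on every iteration is to preallocate work buffers once and overwrite them with copyto!. The sketch below is generic over the array type and is exercised here on the CPU with LAPACK.gesv!; the helper solve_all! is hypothetical, and using it with CuArray buffers and gpugesv! is an untested assumption. It only reduces allocation churn and does not by itself explain a hang.

```julia
using LinearAlgebra

# Hypothetical helper: solve A_k x = b_k for each pair while reusing one
# matrix buffer and one vector buffer. `solve!(A, b)` must factorize A in
# place and overwrite b with the solution (e.g. LAPACK.gesv!).
function solve_all!(solve!, Abuf, bbuf, As, bs)
    return map(zip(As, bs)) do (A, b)
        copyto!(Abuf, A)      # overwrite the buffer, no new allocation
        copyto!(bbuf, b)
        solve!(Abuf, bbuf)
        Array(bbuf)           # copy out: bbuf is overwritten next iteration
    end
end

n = 32
As = [rand(n, n) + n * I for _ in 1:5]
bs = [rand(n) for _ in 1:5]
Abuf = similar(As[1])
bbuf = similar(bs[1])
xs = solve_all!(LAPACK.gesv!, Abuf, bbuf, As, bs)
println(all(As[k] * xs[k] ≈ bs[k] for k in eachindex(xs)))
```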
Do you have an MWE that shows the problematic behavior? I tried putting it in a loop but don't see excessive GC overhead:
```julia
using CuArrays, LinearAlgebra, Test
function gpugesv!(A, b)
    A, ipiv = CuArrays.CUSOLVER.getrf!(A)
    CuArrays.CUSOLVER.getrs!('N', A, ipiv, b)
    return
end
function main(; N=32^2, i=25)
    CuArrays.pool_timings!()
    CuArrays.@time for _ in 1:i
        A = rand(N, N)
        b = rand(N)
        A_d = CuArray(A)
        b_d = CuArray(b)
        LAPACK.gesv!(A, b)
        gpugesv!(A_d, b_d)
        @test Array(A_d) ≈ A && Array(b_d) ≈ b
    end
    CuArrays.pool_timings()
end
main()
```
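Ah, maybe you're running into the cost of synchronizing with the GPU. GPU operations execute asynchronously, so the time spent waiting for queued kernels shows up in the next operation that synchronizes, such as a download (i.e., a call to Array(x::CuArray)). Try wrapping your GPU code (e.g., the call to gpugesv!) in CuArrays.@sync. That synchronizes the GPU, after which the download itself will be essentially free.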
It’s fine to put the @sync on the call to gpugesv!.
However, if your session really freezes when running main(N=32^2, i=3), something else is going on. Could you attach gdb and inspect where the process hangs?
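To attach gdb to a running Julia session you need its process ID. A minimal sketch: print it from the REPL before calling main(), then, once the session freezes, run gdb -p with that PID from another terminal and type bt at the (gdb) prompt. Depending on the system, attaching may require extra ptrace permissions.

```julia
# Print this Julia process's ID so gdb can be attached from another
# terminal while main() is hanging.
pid = getpid()
println("Attach with: gdb -p ", pid)
```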
Attaching gdb finally lets me interrupt the call to main(N=32^2, i=4).
I can sometimes complete the loop with i=3 in main, but not consistently.
I don’t know what to make of the backtrace.
I'm not sure what you're trying to show with that backtrace: it just points to the SIGINT handler after you pressed Ctrl-C (as expected), in a case where execution simply finishes. My idea was to attach gdb while the process is frozen and see where it hangs, since I can't reproduce a hang with any problem size or iteration count.
Sorry, here's the backtrace taken while main is hanging:
```
(gdb) bt
#0 0x00007ffd5cb7b7c2 in clock_gettime ()
#1 0x00002b6c584b993d in clock_gettime () from /usr/lib64/libc.so.6
#2 0x00002b6c88f3be5e in ?? () from /usr/lib64/nvidia/libcuda.so
#3 0x00002b6c88fc9a05 in ?? () from /usr/lib64/nvidia/libcuda.so
#4 0x00002b6c88fe813b in ?? () from /usr/lib64/nvidia/libcuda.so
#5 0x00002b6c88f1e01d in ?? () from /usr/lib64/nvidia/libcuda.so
#6 0x00002b6c88e419ba in ?? () from /usr/lib64/nvidia/libcuda.so
#7 0x00002b6c88e44f8a in ?? () from /usr/lib64/nvidia/libcuda.so
#8 0x00002b6c88f7e265 in cuMemcpyDtoH_v2 () from /usr/lib64/nvidia/libcuda.so
#9 0x00002b6c83734b2e in ?? ()
#10 0x0000000000000002 in ?? ()
#11 0x00002b6c842d2a20 in ?? ()
#12 0x0000000000000000 in ?? ()
```