LinearAlgebra./ breaks CuArray

Hi,
I have made a minimal reproducible example of the errors reported in my previous posts.
It seems to me that the A / b operation breaks something in the CuArray, in a way that is opaque to the user.
Strangely, the same code runs without errors when launched from VSCode with the Julia extension.

(base) dabajabaza@XXXX:~$ cat jianguoyun/Nutstore/RigorousCoupledWaveAnalysis.jl-master/examples/test.ma2018.jl 
using LinearAlgebra
using CUDA
b = CUDA.rand(ComplexF64,5,5)
A = CUDA.rand(ComplexF64,5,5)
w = CUDA.rand(ComplexF64,5)
#V = A \ b
V = A / b
V * w
(base) dabajabaza@XXXX:~$ julia jianguoyun/Nutstore/RigorousCoupledWaveAnalysis.jl-master/examples/test.ma2018.jl 
ERROR: LoadError: CUBLASError: an invalid value was used as an argument (code 7, CUBLAS_STATUS_INVALID_VALUE)
Stacktrace:
 [1] throw_api_error(res::CUDA.CUBLAS.cublasStatus_t)
   @ CUDA.CUBLAS ~/.julia/packages/CUDA/DfvRa/lib/cublas/error.jl:50
 [2] macro expansion
   @ ~/.julia/packages/CUDA/DfvRa/lib/cublas/error.jl:63 [inlined]
 [3] cublasZgemv_v2(handle::Ptr{Nothing}, trans::Char, m::Int64, n::Int64, alpha::Bool, A::CuArray{ComplexF64, 2, CUDA.Mem.DeviceBuffer}, lda::Int64, x::CuArray{ComplexF64, 1, CUDA.Mem.DeviceBuffer}, incx::Int64, beta::Bool, y::CuArray{ComplexF64, 1, CUDA.Mem.DeviceBuffer}, incy::Int64)
   @ CUDA.CUBLAS ~/.julia/packages/CUDA/DfvRa/lib/utils/call.jl:26
 [4] gemv!
   @ ~/.julia/packages/CUDA/DfvRa/lib/cublas/wrappers.jl:331 [inlined]
 [5] gemv_dispatch!(Y::CuArray{ComplexF64, 1, CUDA.Mem.DeviceBuffer}, A::CuArray{ComplexF64, 2, CUDA.Mem.DeviceBuffer}, B::CuArray{ComplexF64, 1, CUDA.Mem.DeviceBuffer}, alpha::Bool, beta::Bool)
   @ CUDA.CUBLAS ~/.julia/packages/CUDA/DfvRa/lib/cublas/linalg.jl:179
 [6] mul!
   @ ~/.julia/packages/CUDA/DfvRa/lib/cublas/linalg.jl:188 [inlined]
 [7] mul!
   @ ~/julia/share/julia/stdlib/v1.7/LinearAlgebra/src/matmul.jl:275 [inlined]
 [8] *(A::CuArray{ComplexF64, 2, CUDA.Mem.DeviceBuffer}, x::CuArray{ComplexF64, 1, CUDA.Mem.DeviceBuffer})
   @ LinearAlgebra ~/julia/share/julia/stdlib/v1.7/LinearAlgebra/src/matmul.jl:51
 [9] top-level scope
   @ ~/jianguoyun/Nutstore/RigorousCoupledWaveAnalysis.jl-master/examples/test.ma2018.jl:8
in expression starting at /home/dabajabaza/jianguoyun/Nutstore/RigorousCoupledWaveAnalysis.jl-master/examples/test.ma2018.jl:8
(base) dabajabaza@XXXX:~$ 
(base) dabajabaza@XXXX:~$ julia --version
julia version 1.7.2
(base) dabajabaza@XXXX:~$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_18_09:45:30_PST_2021
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0
(base) dabajabaza@XXXX:~$ 
(base) dabajabaza@XXXX:~$ julia
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.7.2 (2022-02-06)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> using CUDA
julia> CUDA.version
version (generic function with 2 methods)

julia> CUDA.version()
v"11.5.0"

julia> 

@maleadt Do you have any suggestions?

Just a guess: maybe it is related to the stream-ordered allocation feature.

The following line was invoked many times while I was profiling some code in a project (working inside the VSCode environment), and there were no issues with the A / b operation. Running the code from the command line, however, produces the error, which I then reduced to the example in the original post.

julia> @code_typed (A/b)
CodeInfo(
1 ── %1  = Base.getfield(A, :dims)::Tuple{Int64, Int64}
│    %2  = Base.getfield(%1, 2, true)::Int64
│    %3  = Base.getfield(B, :dims)::Tuple{Int64, Int64}
│    %4  = Base.getfield(%3, 2, true)::Int64
│    %5  = (%2 === %4)::Bool
│    %6  = Base.not_int(%5)::Bool
└───       goto #3 if not %6
2 ── %8  = LinearAlgebra.DimensionMismatch("Both inputs should have the same number of columns")::Any
│          LinearAlgebra.throw(%8)::Union{}
└───       unreachable
3 ── %11 = %new(Adjoint{ComplexF64, CuArray{ComplexF64, 2, CUDA.Mem.DeviceBuffer}}, B)::Adjoint{ComplexF64, CuArray{ComplexF64, 2, CUDA.Mem.DeviceBuffer}}
│    %12 = %new(Adjoint{ComplexF64, CuArray{ComplexF64, 2, CUDA.Mem.DeviceBuffer}}, A)::Adjoint{ComplexF64, CuArray{ComplexF64, 2, CUDA.Mem.DeviceBuffer}}
└───       goto #5 if not false
4 ──       nothing::Nothing
5 ┄─       goto #6
6 ──       goto #7
7 ──       goto #8
8 ──       goto #9
9 ──       goto #10
10 ─       goto #12 if not false
11 ─       nothing::Nothing
12 ┄       goto #13
13 ─       goto #14
14 ─       goto #15
15 ─       goto #16
16 ─       goto #17
17 ─       goto #18
18 ─       goto #19
19 ─       goto #20
20 ─ %30 = %new(CUDA.CUSOLVER.var"#2765#2766"{ComplexF64})::CUDA.CUSOLVER.var"#2765#2766"{ComplexF64}
│    %31 = invoke %30(%11::Adjoint{ComplexF64, CuArray{ComplexF64, 2, CUDA.Mem.DeviceBuffer}})::CuArray{ComplexF64, 2, CUDA.Mem.DeviceBuffer}
│    %32 = invoke %30(%12::Adjoint{ComplexF64, CuArray{ComplexF64, 2, CUDA.Mem.DeviceBuffer}})::CuArray{ComplexF64, 2, CUDA.Mem.DeviceBuffer}
└───       goto #21
21 ─ %34 = CUDA.CUSOLVER.getrf!::typeof(CUDA.CUSOLVER.getrf!)
│    %35 = invoke %34(%31::CuArray{ComplexF64, 2, CUDA.Mem.DeviceBuffer})::Tuple{CuArray{ComplexF64, 2, CUDA.Mem.DeviceBuffer}, CuArray{Int32, 1, CUDA.Mem.DeviceBuffer}, Int32}
│    %36 = Base.getfield(%35, 1)::CuArray{ComplexF64, 2, CUDA.Mem.DeviceBuffer}
│    %37 = Base.getfield(%35, 2)::CuArray{Int32, 1, CUDA.Mem.DeviceBuffer}
│    %38 = CUDA.CUSOLVER.getrs!::typeof(CUDA.CUSOLVER.getrs!)
│    %39 = invoke %38('N'::Char, %36::CuArray{ComplexF64, 2, CUDA.Mem.DeviceBuffer}, %37::CuArray{Int32, 1, CUDA.Mem.DeviceBuffer}, %32::CuArray{ComplexF64, 2, CUDA.Mem.DeviceBuffer})::CuArray{ComplexF64, 2, CUDA.Mem.DeviceBuffer}
└───       goto #22
22 ─ %41 = %new(Adjoint{ComplexF64, CuArray{ComplexF64, 2, CUDA.Mem.DeviceBuffer}}, %39)::Adjoint{ComplexF64, CuArray{ComplexF64, 2, CUDA.Mem.DeviceBuffer}}
│    %42 = invoke LinearAlgebra.copy(%41::Adjoint{ComplexF64, CuArray{ComplexF64, 2, CUDA.Mem.DeviceBuffer}})::CuArray{ComplexF64, 2, CUDA.Mem.DeviceBuffer}
└───       return %42
) => CuArray{ComplexF64, 2, CUDA.Mem.DeviceBuffer}

I guess the problem may be related to block 22, i.e. the %41 = ... and %42 = ... statements.
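
For reference, the typed code above appears to implement right division as a left solve on the adjoints, i.e. A / b = (b' \ A')', with the final adjoint-and-copy happening in block 22. Below is a minimal CPU-only sketch of that identity. It assumes plain Array inputs instead of CuArray (so it runs without a GPU) and uses lu from LinearAlgebra as a stand-in for the getrf!/getrs! pair; it is an illustration, not the actual CUDA.jl implementation.

```julia
using LinearAlgebra

# CPU stand-ins for the CuArray inputs of the original script
# (assumption: plain Arrays, so no GPU is needed).
A = rand(ComplexF64, 5, 5)
b = rand(ComplexF64, 5, 5)

# Right division rewritten as a left solve on the adjoints:
#   V * b = A   <=>   b' * V' = A'   <=>   V = (b' \ A')'
F = lu(copy(b'))      # LU factorization of the materialized adjoint of b (cf. getrf!)
X = F \ copy(A')      # solve with the adjoint of A as right-hand side (cf. getrs!)
V_manual = copy(X')   # adjoint of the solution, materialized (cf. %41 and %42)

# Compare against the built-in right division.
V_builtin = A / b
println(V_manual ≈ V_builtin)
```

If the two results agree on the CPU but the GPU version still fails in a later call such as V * w, the right-division logic itself is unlikely to be the culprit, and the installed libraries are worth checking.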

Solved.

It was my fault: I had installed incompatible versions of CUBLAS.
My apologies to everyone.