Error "ERROR_NO_BINARY_FOR_GPU" with CUDAnative.jl/CuArrays.jl

Hi,
I’m trying to install CuArrays.jl. Even though installation finishes without any errors, the tests are failing with the following error message:

  CUDA error: no kernel image is available for execution on the device (code #209, ERROR_NO_BINARY_FOR_GPU)

The same error occurs when I test CUDAnative.jl. However, CUDAdrv.jl passes all of its tests.
I even tried rebuilding:

(v1.1) pkg> build CuArrays
  Building CUDAdrv ───→ `~/.julia/packages/CUDAdrv/3cR2F/deps/build.log`
  Building LLVM ──────→ `~/.julia/packages/LLVM/tg8MX/deps/build.log`
  Building CUDAnative → `~/.julia/packages/CUDAnative/wU0tS/deps/build.log`
  Building Conda ─────→ `~/.julia/packages/Conda/CpuvI/deps/build.log`
  Building FFTW ──────→ `~/.julia/packages/FFTW/p7sLQ/deps/build.log`
  Building CuArrays ──→ `~/.julia/packages/CuArrays/PwSdF/deps/build.log`

but the same error occurs again during testing.

System information is:

Julia Version 1.1.0
Commit 80516ca202 (2019-01-21 21:24 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, haswell)

Package status:

(v1.1) pkg> st
    Status `~/.julia/environments/v1.1/Project.toml`
  [3a865a2d] CuArrays v1.0.2
  [438e738f] PyCall v1.91.2

Result of nvidia-smi from terminal:

Sun May 12 09:13:54 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.87                 Driver Version: 390.87                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GT 610      Off  | 00000000:02:00.0 N/A |                  N/A |
| 56%   70C    P0    N/A /  N/A |    721MiB /  1985MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX TIT...  Off  | 00000000:03:00.0 Off |                  N/A |
| 36%   77C    P8    22W / 250W |    117MiB / 12212MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

Which test fails? Please reduce it to a minimal failing case, if possible, and file an issue with some info on your device and set-up (model, compute capability, CUDA version, etc.).

Probably some feature is being tested although it isn’t supported by your device. We currently don’t have a way to dispatch on that.

Lots of tests failed. I believe this is some kind of configuration issue.
Below is the test summary.

Test Summary:                                   | Pass  Fail  Error  Total
CuArrays                                        |  706    12    917   1635
  GPUArrays test suite                          |  526          366    892
    construction                                |  382            4    386
      constructors + similar                    |  180                 180
      comparison against Array                  |  119                 119
      conversion                                |   72                  72
      value constructors                        |   11            1     12
      iterator constructors                     |                 3      3
    parallel execution interface                |                 1      1
    indexing                                    |   72           10     82
      Indexing with Float32                     |   32            2     34
      multi dim, sliced setindex                |                 1      1
      Indexing with Int32                       |   32            2     34
      multi dim, sliced setindex                |                 1      1
      Indexing with Float32                     |    1            1      2
      Indexing with Int32                       |    1            1      2
      issue #42 with Float32                    |    3                   3
      issue #42 with Int32                      |    3                   3
      Colon() Float32                           |                 1      1
      Colon() Int32                             |                 1      1
    input/output                                |    5                   5
    base functionality                          |    4           16     20
      copyto!                                   |                 1      1
      vcat + hcat                               |                 5      5
      reinterpret                               |    2                   2
      ntuple test                               |                 1      1
      cartesian iteration                       |                 1      1
      Custom kernel from Julia function         |                 1      1
      map                                       |                 3      3
      repeat                                    |                 4      4
      heuristics                                |    2                   2
    mapreduce                                   |   51          162    213
      Float32                                   |    8           26     34
        mapreducedim                            |                18     18
        sum maximum minimum prod                |    8            8     16
      Float64                                   |    8           26     34
        mapreducedim                            |                18     18
        sum maximum minimum prod                |    8            8     16
      Int32                                     |    8           26     34
        mapreducedim                            |                18     18
        sum maximum minimum prod                |    8            8     16
      Int64                                     |    8           26     34
        mapreducedim                            |                18     18
        sum maximum minimum prod                |    8            8     16
      Complex{Float32}                          |    4           22     26
        mapreducedim                            |                18     18
        sum maximum minimum prod                |    4            4      8
      Complex{Float64}                          |    4           22     26
        mapreducedim                            |                18     18
        sum maximum minimum prod                |    4            4      8
      any all ==                                |    9           12     21
      isapprox                                  |    2            2      4
    broadcast                                   |    2          105    107
      broadcast Float32                         |    1           20     21
        RefValue                                |                 1      1
        Tuple                                   |                 1      1
        Adjoint and Transpose                   |                 1      1
      broadcast Float64                         |                21     21
        RefValue                                |                 1      1
        Tuple                                   |                 1      1
        Adjoint and Transpose                   |                 1      1
      broadcast Int32                           |    1           15     16
        RefValue                                |                 1      1
        Tuple                                   |                 1      1
        Adjoint and Transpose                   |                 1      1
      broadcast Int64                           |                16     16
        RefValue                                |                 1      1
        Tuple                                   |                 1      1
        Adjoint and Transpose                   |                 1      1
      broadcast Complex{Float32}                |                16     16
        RefValue                                |                 1      1
        Tuple                                   |                 1      1
        Adjoint and Transpose                   |                 1      1
      broadcast Complex{Float64}                |                16     16
        RefValue                                |                 1      1
        Tuple                                   |                 1      1
        Adjoint and Transpose                   |                 1      1
      vec 3                                     |                 1      1
    linear algebra                              |   10            4     14
      transpose                                 |    2                   2
      permutedims                               |                 4      4
      issymmetric/ishermitian                   |    8                   8
    FFT with ND = 1                             |                 4      4
    FFT with ND = 2                             |                 4      4
    FFT with ND = 3                             |                 4      4
    BLAS                                        |                51     51
      matmul with element type Float32          |                12     12
      matmul with element type Float64          |                12     12
      matmul with element type Complex{Float32} |                12     12
      matmul with element type Complex{Float64} |                12     12
      rmul! Complex{Float32}                    |                 1      1
      rmul! Float32                             |                 1      1
      gbmv                                      |                 1      1
    Random                                      |                 1      1
      rand                                      |                 1      1
  Memory                                        |    5                   5
  Array                                         |   20            2     22
  Adapt                                         |    1            1      2
  Broadcast                                     |                10     10
  Cufunc                                        |    3            3      6
  Ref Broadcast                                 |                 1      1
  Broadcast Fix                                 |                 4      4
  Reduce                                        |    2            4      6
  0D                                            |                 1      1
  Slices                                        |   15            2     17
  Reshape                                       |    1                   1
  LinearAlgebra.triu! with diagonal -2          |                 1      1
  LinearAlgebra.triu! with diagonal -1          |                 1      1
  LinearAlgebra.triu! with diagonal 0           |                 1      1
  LinearAlgebra.triu! with diagonal 1           |                 1      1
  LinearAlgebra.triu! with diagonal 2           |                 1      1
  LinearAlgebra.tril! with diagonal -2          |                 1      1
  LinearAlgebra.tril! with diagonal -1          |                 1      1
  LinearAlgebra.tril! with diagonal 0           |                 1      1
  LinearAlgebra.tril! with diagonal 1           |                 1      1
  LinearAlgebra.tril! with diagonal 2           |                 1      1
  Utilities                                     |    2                   2
  accumulate                                    |    1            7      8
  logical indexing                              |                15     15
  generic fallbacks                             |                 1      1
  CUDNN                                         |   59           11     70
    NNlib                                       |   59            8     67
    Activations and Other Ops                   |                 3      3
  CUBLAS                                        |    4            1      5
  CUSPARSE                                      |   66     8    359    433
    util                                        |    7            1      8
    char                                        |   15                  15
    conversion                                  |    8           52     60
      elty = Float32                            |    2           13     15
        make_csc                                |    2                   2
        make_csr                                |                 1      1
        convert_r2c                             |                 1      1
        convert_r2b                             |                 1      1
        convert_c2b                             |                 1      1
        convert_c2h                             |                 1      1
        convert_r2h                             |                 1      1
        convert_d2h                             |                 1      1
        convert_d2b                             |                 1      1
        convert_c2r                             |                 1      1
        convert_r2d                             |                 1      1
        convert_c2d                             |                 1      1
        convert_d2c                             |                 1      1
        convert_d2r                             |                 1      1
      elty = Float64                            |    2           13     15
        make_csc                                |    2                   2
        make_csr                                |                 1      1
        convert_r2c                             |                 1      1
        convert_r2b                             |                 1      1
        convert_c2b                             |                 1      1
        convert_c2h                             |                 1      1
        convert_r2h                             |                 1      1
        convert_d2h                             |                 1      1
        convert_d2b                             |                 1      1
        convert_c2r                             |                 1      1
        convert_r2d                             |                 1      1
        convert_c2d                             |                 1      1
        convert_d2c                             |                 1      1
        convert_d2r                             |                 1      1
      elty = Complex{Float32}                   |    2           13     15
        make_csc                                |    2                   2
        make_csr                                |                 1      1
        convert_r2c                             |                 1      1
        convert_r2b                             |                 1      1
        convert_c2b                             |                 1      1
        convert_c2h                             |                 1      1
        convert_r2h                             |                 1      1
        convert_d2h                             |                 1      1
        convert_d2b                             |                 1      1
        convert_c2r                             |                 1      1
        convert_r2d                             |                 1      1
        convert_c2d                             |                 1      1
        convert_d2c                             |                 1      1
        convert_d2r                             |                 1      1
      elty = Complex{Float64}                   |    2           13     15
        make_csc                                |    2                   2
        make_csr                                |                 1      1
        convert_r2c                             |                 1      1
        convert_r2b                             |                 1      1
        convert_c2b                             |                 1      1
        convert_c2h                             |                 1      1
        convert_r2h                             |                 1      1
        convert_d2h                             |                 1      1
        convert_d2b                             |                 1      1
        convert_c2r                             |                 1      1
        convert_r2d                             |                 1      1
        convert_c2d                             |                 1      1
        convert_d2c                             |                 1      1
        convert_d2r                             |                 1      1
    bsric02                                     |                 8      8
      elty = Float32                            |                 2      2
        bsric02!                                |                 1      1
        bsric02                                 |                 1      1
      elty = Float64                            |                 2      2
        bsric02!                                |                 1      1
        bsric02                                 |                 1      1
      elty = Complex{Float32}                   |                 2      2
        bsric02!                                |                 1      1
        bsric02                                 |                 1      1
      elty = Complex{Float64}                   |                 2      2
        bsric02!                                |                 1      1
        bsric02                                 |                 1      1
    bsrilu02                                    |                 8      8
      elty = Float32                            |                 2      2
        bsrilu02!                               |                 1      1
        bsrilu02                                |                 1      1
      elty = Float64                            |                 2      2
        bsrilu02!                               |                 1      1
        bsrilu02                                |                 1      1
      elty = Complex{Float32}                   |                 2      2
        bsrilu02!                               |                 1      1
        bsrilu02                                |                 1      1
      elty = Complex{Float64}                   |                 2      2
        bsrilu02!                               |                 1      1
        bsrilu02                                |                 1      1
    bsrsm2                                      |                 8      8
      elty = Float32                            |                 2      2
        bsrsm2!                                 |                 1      1
        bsrsm2                                  |                 1      1
      elty = Float64                            |                 2      2
        bsrsm2!                                 |                 1      1
        bsrsm2                                  |                 1      1
      elty = Complex{Float32}                   |                 2      2
        bsrsm2!                                 |                 1      1
        bsrsm2                                  |                 1      1
      elty = Complex{Float64}                   |                 2      2
        bsrsm2!                                 |                 1      1
        bsrsm2                                  |                 1      1
    bsrsv2                                      |                 8      8
      elty = Float32                            |                 2      2
        bsrsv2!                                 |                 1      1
        bsrsv2                                  |                 1      1
      elty = Float64                            |                 2      2
        bsrsv2!                                 |                 1      1
        bsrsv2                                  |                 1      1
      elty = Complex{Float32}                   |                 2      2
        bsrsv2!                                 |                 1      1
        bsrsv2                                  |                 1      1
      elty = Complex{Float64}                   |                 2      2
        bsrsv2!                                 |                 1      1
        bsrsv2                                  |                 1      1
    ilu0                                        |                 8      8
      elty = Float32                            |                 2      2
        csr                                     |                 1      1
        csc                                     |                 1      1
      elty = Float64                            |                 2      2
        csr                                     |                 1      1
        csc                                     |                 1      1
      elty = Complex{Float32}                   |                 2      2
        csr                                     |                 1      1
        csc                                     |                 1      1
      elty = Complex{Float64}                   |                 2      2
        csr                                     |                 1      1
        csc                                     |                 1      1
    ilu02                                       |                 8      8
      elty = Float32                            |                 2      2
        csr                                     |                 1      1
        csc                                     |                 1      1
      elty = Float64                            |                 2      2
        csr                                     |                 1      1
        csc                                     |                 1      1
      elty = Complex{Float32}                   |                 2      2
        csr                                     |                 1      1
        csc                                     |                 1      1
      elty = Complex{Float64}                   |                 2      2
        csr                                     |                 1      1
        csc                                     |                 1      1
    ic0                                         |                 8      8
      elty = Float32                            |                 2      2
        csr                                     |                 1      1
        csc                                     |                 1      1
      elty = Float64                            |                 2      2
        csr                                     |                 1      1
        csc                                     |                 1      1
      elty = Complex{Float32}                   |                 2      2
        csr                                     |                 1      1
[... output skipped due to Discourse post length limits ...]
      csreigs                                   |                 1      1
      csrlsqvqr!                                |                 1      1
ERROR: LoadError: Some tests did not pass: 706 passed, 12 failed, 917 errored, 0 broken.
in expression starting at /home/kach/.julia/packages/CuArrays/PwSdF/test/runtests.jl:17
ERROR: Package CuArrays errored during testing

Not necessarily. CUDA errors are sticky, so once one occurs the same error keeps popping up in every later test. If it also fails with CUDAnative, please post that test output, or try to figure out which test is the first to fail.
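
Since the errors are sticky, the first error in the log matters most. As a rough sketch (not part of any CUDA package), the first failing test can be pulled out of a saved log with plain Julia. It assumes the log contains the Test stdlib's usual "Error During Test" / "Test Failed" markers. The function name `first_failure` and the sample log below are made up for illustration.

```julia
# Sketch: locate the first failing test in a saved test log.
# Assumption: the log uses the Test stdlib markers "Error During Test"
# and "Test Failed". `first_failure` is a hypothetical helper name.

"""
Return `(line_number, line_text)` for the first line of `log` that contains
one of `markers`, or `nothing` if no such line exists.
"""
function first_failure(log::AbstractString;
                       markers = ("Error During Test", "Test Failed"))
    for (i, line) in enumerate(split(log, '\n'))
        if any(m -> occursin(m, line), markers)
            return (i, String(strip(line)))
        end
    end
    return nothing
end

# Made-up log excerpt for illustration only (testset names and paths are
# hypothetical; the CUDA message is the one from this thread).
sample = """
alpha: all fine
beta: Error During Test at b.jl:3
  CUDA error: no kernel image is available for execution on the device (code #209, ERROR_NO_BINARY_FOR_GPU)
gamma: Error During Test at c.jl:7
"""

println(first_failure(sample))
```

For a real log, something like `first_failure(read("test.log", String))` should work, where `test.log` is a placeholder for wherever the test output was saved.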

Hi @maleadt
The complete test log for CUDAnative.jl is available at https://gist.github.com/v-i-s-h/26b95fffa232d3c3bf80784b57a1b40a

All the errors are due to the same condition:
CUDA error: no kernel image is available for execution on the device (code #209, ERROR_NO_BINARY_FOR_GPU)
