This example in CUDA.jl does not work for me?

https://cuda.juliagpu.org/stable/api/kernel/#Element-access-and-broadcasting

using Test

using CUDA

a     = rand(Float16, (16, 16))
b     = rand(Float16, (16, 16))
c     = rand(Float32, (16, 16))

a_dev = CuArray(a)
b_dev = CuArray(b)
c_dev = CuArray(c)
d_dev = similar(c_dev)

function kernel(a_dev, b_dev, c_dev, d_dev)
    conf = WMMA.Config{16, 16, 16, Float32}

    a_frag = WMMA.load_a(pointer(a_dev), 16, WMMA.ColMajor, conf)
    b_frag = WMMA.load_b(pointer(b_dev), 16, WMMA.ColMajor, conf)
    c_frag = WMMA.load_c(pointer(c_dev), 16, WMMA.ColMajor, conf)

    c_frag = 0.5f0 .* c_frag

    d_frag = WMMA.mma(a_frag, b_frag, c_frag, conf)

    WMMA.store_d(pointer(d_dev), d_frag, 16, WMMA.ColMajor, conf)

    return
end

@cuda threads=32 kernel(a_dev, b_dev, c_dev, d_dev)
d = Array(d_dev)

@test all(isapprox.(a * b + 0.5 * c, d; rtol=0.01))

Produces the error:

ERROR: LLVM error: Cannot select: intrinsic %llvm.nvvm.wmma.m16n16k16.store.d.col.stride.f32

Any one knows the solution

Kind regards

Which Julia version are you using? My guess is that you likely need to upgrade to get a newer LLVM.

Works here on 1.8.

I gave it a shot on 1.8.0 yesterday. Will try again from a clean terminal and if it still does not work I will try updating to latest 1.8.x

Kind regards

Also make sure you’re using Julia as downloaded from the homepage or juliaup, other sources (e.g., your Linux distro) are not supported.

2 Likes

I’m getting the same error with Julia 1.8.3.

julia> CUDA.versioninfo()
CUDA toolkit 11.7, artifact installation
NVIDIA driver 470.161.3, for CUDA 11.4
CUDA driver 11.4

Libraries:

  • CUBLAS: 11.10.1
  • CURAND: 10.2.10
  • CUFFT: 10.7.2
  • CUSOLVER: 11.3.5
  • CUSPARSE: 11.7.3
  • CUPTI: 17.0.0
  • NVML: 11.0.0+470.161.3
  • CUDNN: 8.30.2 (for CUDA 11.5.0)
  • CUTENSOR: 1.4.0 (for CUDA 11.5.0)

Toolchain:

  • Julia: 1.8.3
  • LLVM: 13.0.1
  • PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0, 7.1, 7.2
  • Device capability support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80, sm_86

1 device:
0: NVIDIA GeForce GTX 1070 (sm_61, 6.201 GiB / 7.921 GiB available)