Why does this CUDA.jl example not work for me?

Ahmed_Salih · February 8, 2023, 6:57pm

I am trying to run the WMMA example from the CUDA.jl documentation:

https://cuda.juliagpu.org/stable/api/kernel/#Element-access-and-broadcasting

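```julia
using Test
using CUDA

# 16×16 Float16 input matrices and a Float32 accumulator matrix
a     = rand(Float16, (16, 16))
b     = rand(Float16, (16, 16))
c     = rand(Float32, (16, 16))

a_dev = CuArray(a)
b_dev = CuArray(b)
c_dev = CuArray(c)
d_dev = similar(c_dev)

function kernel(a_dev, b_dev, c_dev, d_dev)
    # WMMA shape M = N = K = 16 with a Float32 accumulator
    conf = WMMA.Config{16, 16, 16, Float32}

    # Load each matrix into a fragment (column-major, leading dimension 16)
    a_frag = WMMA.load_a(pointer(a_dev), 16, WMMA.ColMajor, conf)
    b_frag = WMMA.load_b(pointer(b_dev), 16, WMMA.ColMajor, conf)
    c_frag = WMMA.load_c(pointer(c_dev), 16, WMMA.ColMajor, conf)

    # Fragments support elementwise broadcasting
    c_frag = 0.5f0 .* c_frag

    # d_frag = a_frag * b_frag + c_frag, computed cooperatively by the warp
    d_frag = WMMA.mma(a_frag, b_frag, c_frag, conf)

    WMMA.store_d(pointer(d_dev), d_frag, 16, WMMA.ColMajor, conf)

    return
end

# WMMA operations are warp-wide, so launch exactly one warp (32 threads)
@cuda threads=32 kernel(a_dev, b_dev, c_dev, d_dev)
d = Array(d_dev)

@test all(isapprox.(a * b + 0.5 * c, d; rtol=0.01))
```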

It produces the following error:

```
ERROR: LLVM error: Cannot select: intrinsic %llvm.nvvm.wmma.m16n16k16.store.d.col.stride.f32
```

Does anyone know the solution?

Kind regards

carstenbauer · February 9, 2023, 6:05am

Which Julia version are you using? My guess is that you need to upgrade to get a newer LLVM.
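A quick way to answer the version question is to print the running Julia version and the LLVM version it was built with, using only Base. A minimal sketch; `toolchain_versions` is a hypothetical helper name, while `VERSION` and `Base.libllvm_version` are standard Base constants:

```julia
# Hypothetical helper: report the Julia and LLVM versions of the running session.
toolchain_versions() = (julia = VERSION, llvm = Base.libllvm_version)

v = toolchain_versions()
println("Julia: ", v.julia)
println("LLVM:  ", v.llvm)
```

`CUDA.versioninfo()` reports the same two versions in its "Toolchain" section, together with the CUDA driver and device details.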

maleadt · February 9, 2023, 8:26am

Works here on 1.8.

Ahmed_Salih · February 9, 2023, 9:06am

I tried it on 1.8.0 yesterday. I will try again from a clean terminal, and if it still does not work, I will update to the latest 1.8.x.

Kind regards

maleadt · February 9, 2023, 3:56pm

Also make sure you’re using Julia as downloaded from the homepage or installed via juliaup; other sources (e.g., your Linux distro) are not supported.
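To see where the running Julia binary comes from, `Sys.BINDIR` gives its directory. A rough sketch, assuming the usual layout where juliaup keeps its installs under a `juliaup` directory inside `~/.julia`; `installed_via_juliaup` is a hypothetical heuristic, not an official check:

```julia
# Directory containing the running julia executable.
println(Sys.BINDIR)

# Hypothetical heuristic: juliaup installs usually live under ~/.julia/juliaup/...,
# whereas distro packages typically live somewhere like /usr/bin.
installed_via_juliaup() = occursin("juliaup", Sys.BINDIR)

println("installed via juliaup? ", installed_via_juliaup())
```

A path such as `/usr/bin` is a hint that Julia came from a system package manager rather than from the official binaries.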

Phil_Tomson · February 17, 2023, 11:23pm

I’m getting the same error with Julia 1.8.3.

```
julia> CUDA.versioninfo()
CUDA toolkit 11.7, artifact installation
NVIDIA driver 470.161.3, for CUDA 11.4
CUDA driver 11.4

Libraries:
- CUBLAS: 11.10.1
- CURAND: 10.2.10
- CUFFT: 10.7.2
- CUSOLVER: 11.3.5
- CUSPARSE: 11.7.3
- CUPTI: 17.0.0
- NVML: 11.0.0+470.161.3
- CUDNN: 8.30.2 (for CUDA 11.5.0)
- CUTENSOR: 1.4.0 (for CUDA 11.5.0)

Toolchain:
- Julia: 1.8.3
- LLVM: 13.0.1
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0, 7.1, 7.2
- Device capability support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80, sm_86

1 device:
  0: NVIDIA GeForce GTX 1070 (sm_61, 6.201 GiB / 7.921 GiB available)
```
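The GTX 1070 listed here reports sm_61, i.e. compute capability 6.1. WMMA (tensor core) operations require compute capability 7.0 (Volta) or newer, so a likely explanation for the "Cannot select: intrinsic" error on this card is the hardware itself rather than the Julia or LLVM version. A minimal sketch of a pre-launch check, assuming that 7.0 threshold; `supports_wmma` and `WMMA_MIN_CAPABILITY` are hypothetical names:

```julia
# WMMA (tensor core) operations require compute capability 7.0 (Volta) or newer.
const WMMA_MIN_CAPABILITY = v"7.0"

# Hypothetical helper: true if a device with capability `cap` can run WMMA kernels.
supports_wmma(cap::VersionNumber) = cap >= WMMA_MIN_CAPABILITY

# The GTX 1070 above reports sm_61, i.e. compute capability 6.1.
println(supports_wmma(v"6.1"))   # false
println(supports_wmma(v"7.0"))   # true

# With CUDA.jl loaded, query the current device instead:
#   supports_wmma(CUDA.capability(CUDA.device()))
```

Keeping the comparison in a plain function lets it be tested without a GPU; on a real system, `CUDA.capability(CUDA.device())` returns the current device's capability as a `VersionNumber`.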
