Julia version / CUDA compatibility with Quadro K4100M (compute capability 3.0)

Hi,
I’m really struggling to find a compatible combination of versions (NVIDIA CUDA toolkit, Julia, CUDA.jl) to get a simple linear shift-and-add (see function below) to run efficiently on my old Quadro K4100M GPU with compute capability 3.0.

julia> versioninfo()
Julia Version 1.5.4
Commit 69fcb5745b (2021-03-11 19:13 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i7-4940MX CPU @ 3.10GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, haswell)
Environment:
  JULIA_NUM_THREADS = 4

julia> CUDA.device()
CuDevice(0): Quadro K4100M

function saad(Sd::CuArray{T}, Id::CuArray{T}, d::Int) where {T<:UInt16}
    Sd[1+d:end] .+= Id[1:end-d]
end

I found that CUDA toolkit 10.1 with driver 418 is the latest combination for compute capability 3.0. I’m currently running Julia 1.5.4 with CUDA.jl v1.3.3. The GPU runs at 100% utilization, but performance is worse than similar code using a single-threaded loop on the CPU:

function saa(S::Array{T}, I::Array{T}, d::Int) where {T<:UInt16}
    n = length(S)
    for i=1+d:n
        @inbounds S[i] += I[i-d]
    end
end
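
As an aside (my own sketch, not part of the original question): the loop above is parallel across `i`, since each iteration writes a distinct `S[i]` and only reads from `I`, so with `JULIA_NUM_THREADS = 4` a multithreaded variant makes a fairer CPU baseline. A minimal sketch:

```julia
# Multithreaded variant of `saa`; safe to parallelize because every
# iteration writes a distinct S[i] and only reads from I.
function saa_mt(S::Array{T}, I::Array{T}, d::Int) where {T<:UInt16}
    n = length(S)
    Threads.@threads for i in 1+d:n
        @inbounds S[i] += I[i-d]
    end
end
```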

I suspect that broadcast and thread usage are not optimal on the GPU. So far I cannot find any compatibility documentation on combining Julia and CUDA.jl versions to make optimal use of this old GPU.

Thanks for your guidance
The following is my test data:

using BenchmarkTools
n = 60 * 10^6
S = zeros(UInt16, n)
Sd = CuArray(S)
I = rand(UInt16.(1:9), n)
Id = CuArray(I)
d = 1
iter = 720
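
One note on timing (my addition, not from the original post): GPU operations in CUDA.jl launch asynchronously, so a plain `@btime` of `saad` largely measures launch overhead rather than the kernel itself. Wrapping the call in `CUDA.@sync` waits for completion. A sketch of how the two versions could be compared, assuming the variables above and a working GPU:

```julia
using BenchmarkTools, CUDA

# CPU baseline: single-threaded loop.
@btime saa($S, $I, $d)

# GPU version: CUDA.@sync blocks until the kernel completes;
# without it, @btime only times the asynchronous launch.
@btime CUDA.@sync saad($Sd, $Id, $d)
```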

You’re broadcasting an operation that is much too simple, and the GPU needs some arithmetic complexity to hide the latency of its memory operations.
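
One hedged illustration of that point (my sketch, not from the reply above): fusing several shift distances into a single custom kernel raises the arithmetic work done per element loaded from memory. Since `saad` only reads `I`, applying it once for each `d` in a tuple `ds` is equivalent to the fused loop below. The kernel uses CUDA.jl’s `@cuda` launch macro; the function names and the `ds` tuple are hypothetical:

```julia
using CUDA

# Kernel: each thread handles one element of S and accumulates all
# shifted contributions from I in a register before a single write-back.
function saad_fused_kernel!(S, I, ds)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(S)
        @inbounds acc = S[i]
        for d in ds
            if i > d
                @inbounds acc += I[i-d]
            end
        end
        @inbounds S[i] = acc
    end
    return nothing
end

# Launch wrapper: `ds` is a tuple of shift distances, e.g. (1, 2, 3, 4).
function saad_fused!(S::CuArray, I::CuArray, ds::NTuple)
    threads = 256
    blocks = cld(length(S), threads)
    @cuda threads=threads blocks=blocks saad_fused_kernel!(S, I, ds)
end
```

This way each element of `I` and `S` is touched once per pass instead of once per shift, which is the kind of added arithmetic intensity the reply is pointing at.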