Performance regression with GPUArrays subset sum

I wrapped all the scalar accesses as you suggested, but I still see very slow timing and huge memory allocations:

julia> CUDA.allowscalar(false)
julia> @btime subsetSumCuArrays($s, $S)
  2.762 s (898 allocations: 7.31 GiB)
false

Here’s how I modified the function:

function subsetSumCuArrays(s, S)
    n = length(s)
    F_d = CUDA.zeros(Int8, S+1, n)
    s_d = CuArray{Int64,1}(s)
    F_d[1, :] .= 1
    CUDA.@allowscalar(s_d[1] ≤ S && (F_d[s_d[1]+1, 1] = 1))
    @views for j in 2:n
        F_d[2:S+1, j] .= F_d[2:S+1, j-1]
        if CUDA.@allowscalar(s_d[j] ≤ S)
            F_d[CUDA.@allowscalar(s_d[j]+1):S+1, j] .= F_d[CUDA.@allowscalar(s_d[j]+1):S+1, j] .| F_d[1:CUDA.@allowscalar(S+1-s_d[j]), j-1]
        end
    end
    synchronize()
    return Bool(CUDA.@allowscalar F_d[S+1, n])
end
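
One thing I noticed while writing this up: `s` is already a host `Vector`, so every `CUDA.@allowscalar s_d[j]` forces a device-to-host read just to recover a value the CPU already has, and line 21 does three of those per iteration. Here's a sketch of the variant I'm considering (untested on my side, the function name is mine): index the host vector directly, hoist the bound into a local, and use `.|=` for the in-place OR.

function subsetSumCuArraysHost(s, S)
    n = length(s)
    F_d = CUDA.zeros(Int8, S+1, n)
    F_d[1, :] .= 1
    s[1] ≤ S && CUDA.@allowscalar(F_d[s[1]+1, 1] = 1)
    @views for j in 2:n
        F_d[2:S+1, j] .= F_d[2:S+1, j-1]
        sj = s[j]   # plain CPU read, no GPU round-trip
        if sj ≤ S
            F_d[sj+1:S+1, j] .|= F_d[1:S+1-sj, j-1]
        end
    end
    return Bool(CUDA.@allowscalar F_d[S+1, n])
end

Would this be expected to remove the per-iteration synchronization cost, or is the bulk of the 7.31 GiB coming from somewhere else?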