I’ve been away from Julia for several months, and have been rerunning GPU code from this past summer on a new installation of Julia 1.5.3 and CUDA.jl (previously I used the separate CUDA packages). My subset-sum program now runs much slower (3.38 s compared to 159 ms):
julia> s = rand(1:10000000,10);
julia> S = (10000000*10)÷4;
julia> @btime subsetSumCuArrays($s, $S)
3.384 s (901 allocations: 7.47 GiB)
false
Clearly the problem is the huge amount of memory being allocated (compared to 126.64 KiB previously), but I can’t figure out why so much is being allocated. Has something perhaps changed with the @views macro? Can anyone help?
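To explain what I mean about @views: without it, each right-hand-side slice in a broadcast materializes a copy, which @views is supposed to avoid. A quick CPU-only sketch (plain Arrays, nothing CUDA-specific; names are just for illustration) of the behavior I’m relying on:

```julia
# Compare allocations for a slice-to-slice broadcast with and without @views.
copy_assign!(A) = (A[:, 2] .= A[:, 1]; nothing)         # RHS slice allocates a copy
view_assign!(A) = (@views A[:, 2] .= A[:, 1]; nothing)  # RHS slice is a lightweight view

A = zeros(Int8, 10^6, 2)
copy_assign!(A); view_assign!(A)   # warm up so compilation isn't measured

a_copy = @allocated copy_assign!(A)
a_view = @allocated view_assign!(A)
println((a_copy, a_view))          # the @views version allocates far less
```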
Here’s the code:
function subsetSumCuArrays(s, S)
    n = length(s)
    F_d = CUDA.zeros(Int8, S+1, n)   # F_d[i+1, j] == 1 iff some subset of s[1:j] sums to i
    s_d = CuArray{Int64,1}(s)
    F_d[1, :] .= 1                   # the empty subset sums to 0
    s_d[1] ≤ S && (F_d[s_d[1]+1, 1] = 1)
    @views for j in 2:n
        F_d[2:S+1, j] .= F_d[2:S+1, j-1]   # carry forward sums reachable without s[j]
        if s_d[j] ≤ S
            F_d[s_d[j]+1:S+1, j] .= F_d[s_d[j]+1:S+1, j] .| F_d[1:S+1-s_d[j], j-1]
        end
    end
    synchronize()
    return Bool(F_d[S+1, n])
end
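For reference, here is a plain-CPU version of the same dynamic program (my own sketch with ordinary Arrays, not part of the GPU code above) that I use to sanity-check results:

```julia
# CPU reference: F[i+1, j] == 1 iff some subset of s[1:j] sums to i.
function subsetSumCPU(s, S)
    n = length(s)
    F = zeros(Int8, S + 1, n)
    F[1, :] .= 1                       # the empty subset sums to 0
    s[1] ≤ S && (F[s[1]+1, 1] = 1)
    @views for j in 2:n
        F[2:S+1, j] .= F[2:S+1, j-1]   # sums reachable without s[j]
        if s[j] ≤ S
            F[s[j]+1:S+1, j] .= F[s[j]+1:S+1, j] .| F[1:S+1-s[j], j-1]
        end
    end
    return Bool(F[S+1, n])
end

println(subsetSumCPU([3, 5, 7], 12))   # true:  5 + 7 == 12
println(subsetSumCPU([3, 5, 7], 11))   # false: no subset sums to 11
```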
Here's my version info:
julia> CUDA.versioninfo()
CUDA toolkit 10.2.89, local installation
CUDA driver 10.2.0
NVIDIA driver 440.33.1
Libraries:
- CUBLAS: 10.2.2
- CURAND: 10.1.2
- CUFFT: 10.1.2
- CUSOLVER: 10.3.0
- CUSPARSE: 10.3.1
- CUPTI: 12.0.0
- NVML: 10.0.0+440.33.1
- CUDNN: 8.0.5 (for CUDA 10.2.0)
- CUTENSOR: 1.2.1 (for CUDA 10.2.0)
Toolchain:
- Julia: 1.5.3
- LLVM: 9.0.1
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4
- Device support: sm_30, sm_32, sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75
Environment:
- JULIA_CUDA_USE_BINARYBUILDER: false
16 devices:
0: Tesla V100-SXM3-32GB (sm_70, 31.365 GiB / 31.749 GiB available)
...