Hi all.
I’m performing the following computation using GPUArrays:
julia> using CUDA, GPUArrays
julia> src = CuArray(rand(0:9, 10))
10-element CuArray{Int64, 1, CUDA.DeviceMemory}:
3
4
3
7
3
9
7
2
9
6
julia> dst = similar(src);
julia> op = (acc, x) -> acc + (1 - ((x >> 0) & 0x1))
#211 (generic function with 1 method)
julia> GPUArrays.neutral_element(::typeof(op), T) = one(T)
julia> accumulate!(op, dst, src; dims=1)
10-element CuArray{Int64, 1, CUDA.DeviceMemory}:
1
2
2
2
2
2
2
3
3
2
julia> src = collect(src)
10-element Vector{Int64}:
3
4
3
7
3
9
7
2
9
6
julia> dst = collect(dst);
julia> accumulate!(op, dst, src; init=0)
10-element Vector{Int64}:
0
1
1
1
1
1
1
2
2
3
The GPU and CPU runs give different results for the same `op` and the same input values.
Why is that the case? Also, could someone explain what the “neutral element” means in this context?
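
In case it helps narrow this down, here is a small CPU-only sketch that checks two properties I understand a parallel scan to rely on: that `op` is associative, and that the value returned by `GPUArrays.neutral_element` behaves as an identity for `op`. This is my assumption about what the GPU implementation needs, not something I have confirmed in the GPUArrays source. The names `vals`, `is_associative`, and `is_identity` are made up for this check.

```julia
# Same operator as above: adds 1 to the accumulator when the low bit of x is 0.
op = (acc, x) -> acc + (1 - ((x >> 0) & 0x1))

# A few small integers to test with (hypothetical sample, not exhaustive).
vals = 0:3

# Associativity: does op(op(a, b), c) == op(a, op(b, c)) hold for all sampled triples?
is_associative = all(op(op(a, b), c) == op(a, op(b, c)) for a in vals, b in vals, c in vals)

# Identity: does e satisfy op(e, x) == x and op(x, e) == x for all sampled x?
is_identity(e) = all(op(e, x) == x && op(x, e) == x for x in vals)

println("associative:              ", is_associative)
println("one(Int) is an identity:  ", is_identity(one(Int)))
println("zero(Int) is an identity: ", is_identity(zero(Int)))
```

All three checks print `false` for this `op`. For example, `op(op(0, 0), 0)` is 2 while `op(0, op(0, 0))` is 0. If the GPU scan does regroup the operations, that alone could explain the mismatch, independent of the fact that the CPU call starts from `init=0` while the GPU call uses `one(T)` as the neutral element.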