As my first attempt at something serious and practical I want to achieve with a GPU, I have a problem where I have an array of elements, each element is a tuple of (value, count_of_value). count_of_value is initially uniformly 1:
julia> arr2
50-element Array{Tuple{Int64,Int64},1}:
(6, 1)
(6, 1)
(1, 1)
(10, 1)
(10, 1)
(4, 1)
(9, 1)
(4, 1)
(10, 1)
(4, 1)
(4, 1)
(1, 1)
(6, 1)
(1, 1)
(9, 1)
(6, 1)
(8, 1)
(9, 1)
(5, 1)
(6, 1)
(9, 1)
(2, 1)
(6, 1)
(10, 1)
(2, 1)
(5, 1)
(8, 1)
(5, 1)
(6, 1)
(5, 1)
(10, 1)
(8, 1)
(7, 1)
(4, 1)
(1, 1)
(8, 1)
(7, 1)
(9, 1)
(4, 1)
(1, 1)
(6, 1)
(5, 1)
(2, 1)
(4, 1)
(10, 1)
(7, 1)
(10, 1)
(3, 1)
(7, 1)
(10, 1)
If I want to count these values, on the CPU, In my optimised code (not shown) I sort the vector, and then collapse runs of the same value, ending up with something that looks like this:
julia> sort!(arr2)
50-element Array{Tuple{Int64,Int64},1}:
(1, 1)
(1, 1)
(1, 1)
(1, 1)
(1, 1)
(2, 1)
(2, 1)
(2, 1)
(3, 1)
(4, 1)
(4, 1)
(4, 1)
(4, 1)
(4, 1)
(4, 1)
(4, 1)
(5, 1)
(5, 1)
(5, 1)
(5, 1)
(5, 1)
(6, 1)
(6, 1)
(6, 1)
(6, 1)
(6, 1)
(6, 1)
(6, 1)
(6, 1)
(7, 1)
(7, 1)
(7, 1)
(7, 1)
(8, 1)
(8, 1)
(8, 1)
(8, 1)
(9, 1)
(9, 1)
(9, 1)
(9, 1)
(9, 1)
(10, 1)
(10, 1)
(10, 1)
(10, 1)
(10, 1)
(10, 1)
(10, 1)
(10, 1)
julia> arr3 = [(y[1], count(x -> x == y, arr2)) for y in unique(arr2)] # NOT my optimised collapsing version
10-element Array{Tuple{Int64,Int64},1}:
(1, 5)
(2, 3)
(3, 1)
(4, 7)
(5, 5)
(6, 8)
(7, 4)
(8, 4)
(9, 5)
(10, 8)
I’m interested to see if this can be done with CUDA on the GPU, I’ve had a look at Kernel for building histogram on GPU - #11 by jeremiedb but the discussion starts talking about shared dot operations and the relevance gets away from me.
CUDA.sort
in the REPL returns a reference to a function so I guess there is a CuArray version of sort for the GPU?
The next part I’m unsure of is if it’s possible to collapse an array on the GPU using a high-level function of if a custom kernel is nessecery.