As my first attempt at something serious and practical I want to achieve with a GPU, I have a problem where I have an array of elements, each element is a tuple of (value, count_of_value). count_of_value is initially uniformly 1:
julia> arr2
50-element Array{Tuple{Int64,Int64},1}:
 (6, 1)
 (6, 1)
 (1, 1)
 (10, 1)
 (10, 1)
 (4, 1)
 (9, 1)
 (4, 1)
 (10, 1)
 (4, 1)
 (4, 1)
 (1, 1)
 (6, 1)
 (1, 1)
 (9, 1)
 (6, 1)
 (8, 1)
 (9, 1)
 (5, 1)
 (6, 1)
 (9, 1)
 (2, 1)
 (6, 1)
 (10, 1)
 (2, 1)
 (5, 1)
 (8, 1)
 (5, 1)
 (6, 1)
 (5, 1)
 (10, 1)
 (8, 1)
 (7, 1)
 (4, 1)
 (1, 1)
 (8, 1)
 (7, 1)
 (9, 1)
 (4, 1)
 (1, 1)
 (6, 1)
 (5, 1)
 (2, 1)
 (4, 1)
 (10, 1)
 (7, 1)
 (10, 1)
 (3, 1)
 (7, 1)
 (10, 1)
If I want to count these values, on the CPU, In my optimised code (not shown) I sort the vector, and then collapse runs of the same value, ending up with something that looks like this:
julia> sort!(arr2)
50-element Array{Tuple{Int64,Int64},1}:
 (1, 1)
 (1, 1)
 (1, 1)
 (1, 1)
 (1, 1)
 (2, 1)
 (2, 1)
 (2, 1)
 (3, 1)
 (4, 1)
 (4, 1)
 (4, 1)
 (4, 1)
 (4, 1)
 (4, 1)
 (4, 1)
 (5, 1)
 (5, 1)
 (5, 1)
 (5, 1)
 (5, 1)
 (6, 1)
 (6, 1)
 (6, 1)
 (6, 1)
 (6, 1)
 (6, 1)
 (6, 1)
 (6, 1)
 (7, 1)
 (7, 1)
 (7, 1)
 (7, 1)
 (8, 1)
 (8, 1)
 (8, 1)
 (8, 1)
 (9, 1)
 (9, 1)
 (9, 1)
 (9, 1)
 (9, 1)
 (10, 1)
 (10, 1)
 (10, 1)
 (10, 1)
 (10, 1)
 (10, 1)
 (10, 1)
 (10, 1)
julia> arr3 = [(y[1], count(x -> x == y, arr2)) for y in unique(arr2)] # NOT my optimised collapsing version
10-element Array{Tuple{Int64,Int64},1}:
 (1, 5)
 (2, 3)
 (3, 1)
 (4, 7)
 (5, 5)
 (6, 8)
 (7, 4)
 (8, 4)
 (9, 5)
 (10, 8)
I’m interested to see if this can be done with CUDA on the GPU, I’ve had a look at Kernel for building histogram on GPU - #11 by jeremiedb but the discussion starts talking about shared dot operations and the relevance gets away from me.
CUDA.sort in the REPL returns a reference to a function so I guess there is a CuArray version of sort for the GPU?
The next part I’m unsure of is if it’s possible to collapse an array on the GPU using a high-level function of if a custom kernel is nessecery.