Hi all,
I am new to GPU programming (and therefore to CLArrays and GPUArrays). I am building a simple kernel with these two packages, and I need to extract an entire row (or any sub-matrix) of a very large Array (call it A) and perform some operations on it (sum, min, max…).
Here is an extremely simplified version of my code. Note that I want to grab all the elements of the Array A along one specific dimension and then do some basic computations.
using CLArrays
using GPUArrays

function Example_Reduction(x_points::Int64, y_points::Int64)
    A = rand(x_points, y_points, 10)
    CL_A = CLArray{Float32,3}(A)
    CL_A_sub = CLArray{Float32,2}((x_points, y_points))
    # Dummy array whose shape defines the kernel's index space
    tmp_indices = zeros(x_points, y_points)
    CL_grid_kb = CLArray(Array{Float32,2}(tmp_indices))
    gpu_call(CL_grid_kb, (CL_grid_kb, CL_A, CL_A_sub)) do state, CL_grid_kb, CL_A, CL_A_sub
        idx2 = @cartesianidx CL_grid_kb
        i_k = idx2[1]
        i_b = idx2[2]
        # CL_A_sub[i_k,i_b] = CL_A[i_k,i_b,1] #<This works
        CL_A_sub[i_k,i_b] = minimum(CL_A[i_k,i_b,:]) #<This does not work
        # CL_A_sub[i_k,i_b] = sum(CL_A[i_k,i_b,:]) #<This does not work
        return
    end
    return Array{Float64,2}(CL_A_sub)
end

A_sub = Example_Reduction(30, 20)
The code fails whenever I try to extract more than one element of A inside the kernel (e.g., CL_A[i_k,i_b,:]). This is the output that I get:
LoadError: AssertionError: Found non concrete return type: Union{}
As a workaround, I have written a loop that extracts the desired elements of the Array A one at a time and performs the desired operations. This works fine for now, but it is certainly not the most efficient solution. Any ideas on what the most efficient approach would be?
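To make the loop workaround concrete, here is a minimal sketch of the idea rather than my exact code. It is written against a plain Array so it runs without a GPU, and the helper name min_along_third is made up for this post. The point is that each (i, j) position reads its elements one scalar at a time instead of building the slice A[i, j, :].

```julia
# Hypothetical helper (name invented for this post): reduce over the third
# dimension with scalar reads only, so no temporary slice is allocated.
function min_along_third(A::AbstractArray{T,3}, i::Integer, j::Integer) where {T}
    acc = A[i, j, 1]
    for k in 2:size(A, 3)
        acc = min(acc, A[i, j, k])
    end
    return acc
end

# CPU stand-in for the kernel: one call per (i, j) pair, which corresponds
# to the work a single GPU thread would do for its own cartesian index.
A = rand(Float32, 30, 20, 10)
A_sub = [min_along_third(A, i, j) for i in 1:size(A, 1), j in 1:size(A, 2)]
```

In the kernel, a loop of this shape takes the place of the minimum(CL_A[i_k,i_b,:]) call; swapping min for + gives the sum version.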
FYI: I am on Windows 10, using AMD Radeon RX 570.
Thanks a lot!