I am trying to find the largest error (sum of squared differences) between any two columns of a matrix, and I wanted to try parallelizing it on the GPU:
using FLoops
using CUDA
using FoldsCUDA
# To loop through all column pairs
allpairs(v) = ((i,j) for j in v for i in v if i > j)

function maxScore(data::CuArray{T}) where T
    @floop CUDAEx() for (i,j) in allpairs(axes(data,2))
        X = view(data,:,i)
        Y = view(data,:,j)
        # Sum of squared differences between columns i and j
        currentScore = sum(abs2,X-Y)
        # Keep the largest score seen across all pairs
        @reduce() do (bestScore = zero(T); currentScore)
            if bestScore < currentScore
                bestScore = currentScore
            end
        end
    end
    return bestScore
end
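For checking results, a plain CPU reference for the same quantity may help. This is a sketch under the assumption that the intended score is the sum of squared differences between two distinct columns; `maxScoreCPU` is a hypothetical name. It redefines `allpairs` so it runs on its own, and it computes each pair's score with a scalar loop over rows rather than `sum(abs2, X - Y)`, so no temporary array is created per pair.

```julia
# Same pair iterator as above: every (i, j) with i > j.
allpairs(v) = ((i, j) for j in v for i in v if i > j)

# Hypothetical CPU reference: largest sum of squared differences
# between any two distinct columns of `data`.
function maxScoreCPU(data::AbstractMatrix{T}) where {T}
    best = zero(T)
    for (i, j) in allpairs(axes(data, 2))
        s = zero(T)
        # Scalar loop over rows instead of sum(abs2, X - Y):
        # no temporary column vector is allocated.
        for k in axes(data, 1)
            s += abs2(data[k, i] - data[k, j])
        end
        best = max(best, s)
    end
    return best
end
```

With a CuArray input, `maxScoreCPU(Array(data))` copies the data to the host first and gives a value to compare against the GPU result.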
When I run maxScore, I get some odd errors: an InvalidIRError reporting an unsupported dynamic function invocation (call to print_to_string(xs...)), which doesn't make sense to me. Does anyone see the problem I'm missing? The input data was just data = CUDA.rand(100,100).
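A plausible cause, not confirmed here: in the `@floop CUDAEx()` loop body, `X - Y` creates a new array, and a GPU kernel cannot allocate arrays; the `print_to_string` call likely comes from an error-message path (for example a dimension check inside the subtraction) that the GPU compiler cannot handle. Computing the score with a scalar loop over rows, `s += abs2(data[k, i] - data[k, j])`, avoids that allocation. Another option, sketched below, avoids a custom kernel altogether by using the identity `||x - y||^2 = ||x||^2 + ||y||^2 - 2 x'y`, so the whole computation becomes one matrix product plus broadcasting. `maxScoreGram` is a hypothetical name; it accepts any `AbstractMatrix`, so it runs on a CPU `Array`, and because it only uses `*`, `sum(...; dims = 1)`, broadcasting and `maximum`, it is expected (but untested here) to run unchanged on a `CuArray`.

```julia
using LinearAlgebra

# Hypothetical helper: largest squared Euclidean distance between any two
# columns, using ||x - y||^2 = ||x||^2 + ||y||^2 - 2 * dot(x, y).
function maxScoreGram(data::AbstractMatrix)
    G = data' * data              # G[i, j] = dot(column i, column j)
    n = sum(abs2, data; dims = 1) # 1×m row of squared column norms
    D = n' .+ n .- 2 .* G         # D[i, j] = squared distance between columns i and j
    # The diagonal of D is (numerically close to) zero and every true distance
    # is nonnegative, so the maximum over all entries equals the maximum over pairs.
    return maximum(D)
end
```

One trade-off: the subtraction can lose precision when two columns are nearly identical, so entries near zero may come out slightly negative. That does not affect the maximum unless all columns are (nearly) equal. The product also computes both (i, j) and (j, i), doing roughly twice the necessary work, in exchange for a single dense matrix multiply.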