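I don’t have a GPU to test this right now, but `reduce` and `mapreduce` should be fast on CUDA arrays, so something like

```julia
mapreduce(+, a; dims=1) do val
    isnan(val) ? zero(val) : val   # map each NaN to zero, keep everything else
end
```

should be an efficient way to filter NaNs out of a sum. Mapping NaNs to zero before summing keeps the reduction operator a plain, associative `+`, which matters because GPU reductions combine partial results in a tree rather than strictly left to right. For more complex statistics, it’s worth checking whether they’re easy to get using, say, Transducers.jl or OnlineStats.jl. In principle both packages should work with `reduce` and thus be GPU compatible (haven’t checked though).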
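Before reaching for another package, a NaN-skipping mean can also be built from the same pieces. Here is a small sketch using only `mapreduce` and `count` from Base. The helper names `nansum_dims` and `nanmean_dims` are hypothetical, and I’ve only reasoned about this on a plain CPU `Array`; that the same calls run unchanged on a `CuArray` is an assumption I haven’t tested.

```julia
# Sketch: NaN-skipping sum and mean along a dimension, using only Base.
# Assumption (untested): the same calls also work on a CuArray.

"""
    nansum_dims(a; dims=1)

Hypothetical helper: sum `a` along `dims`, treating NaN entries as zero.
NaNs are replaced in the map step, so the reduction itself is a plain `+`.
"""
nansum_dims(a; dims=1) = mapreduce(x -> isnan(x) ? zero(x) : x, +, a; dims=dims)

"""
    nanmean_dims(a; dims=1)

Hypothetical helper: mean of the non-NaN entries along `dims`.
A slice that is entirely NaN gives 0/0, i.e. NaN.
"""
function nanmean_dims(a; dims=1)
    s = nansum_dims(a; dims=dims)
    n = count(!isnan, a; dims=dims)   # number of non-NaN entries per slice
    return s ./ n
end

a = [1.0 NaN;
     NaN NaN;
     3.0 4.0]

println(nansum_dims(a))   # [4.0 4.0]
println(nanmean_dims(a))  # [2.0 4.0]
```

Computing the sum and the count as two separate reductions keeps each operator simple and associative; a single pass over `(sum, count)` tuples would also work but needs a custom combining function and an `init`.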