Do you need the exact result that fld guarantees? If not, plain floating-point division followed by floor is much faster:
julia> x = rand(); y = 0.5;

julia> @btime Int(fld($x, $y)); # sometimes only 9ns
  19.523 ns (0 allocations: 0 bytes)

julia> @btime floor(Int, $x/$y);
  4.090 ns (0 allocations: 0 bytes)
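For what it's worth, here is a small sketch of where the two can disagree. fld rounds based on the true quotient of the two floating-point values, while x/y is rounded to the nearest Float64 before floor ever sees it. The values 6.0 and 0.1 are only an illustration of mine, not taken from the benchmark above. With y = 0.5, a power of two, the division is exact, so both forms should agree there.

```julia
# Illustrative values only, not taken from the benchmark above.
x, y = 6.0, 0.1

exact = fld(x, y)       # floors the true quotient of the two Float64 values
fast  = floor(x / y)    # x / y is rounded to the nearest Float64 first, then floored

println((exact, fast))  # the two can differ by one when x is close to a multiple of y

# With a power-of-two divisor such as 0.5 the division is exact,
# so both forms should agree for ordinary (non-subnormal) inputs.
agree = all(fld(v, 0.5) == floor(v / 0.5) for v in rand(1000))
println(agree)
```

So whether the faster version is acceptable depends on whether a point sitting within rounding error of a cell boundary may land in the neighbouring cell.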
I tried to integrate it into the loop inside BatchExtractCells!, but that slowed things down for some reason.
With
function BatchExtractCells3!(Cells, Points, CutOff)
    # @batch per=thread
    for i ∈ eachindex(Cells)
        t = map(Tuple(Points[i])) do x
            floor(Int, x/CutOff)+2
        end
        Cells[i] = CartesianIndex(t)
    end
end
this gives:
julia> @btime BatchExtractCells!($Cells, $Points, $CutOff)
  5.172 ms (0 allocations: 0 bytes)

julia> @btime BatchExtractCells3!($Cells, $Points, $CutOff)
  318.036 μs (0 allocations: 0 bytes)
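If exactness near cell boundaries is a concern, a quick sanity check is to count how often the two formulas put a point in different cells. This is only a sketch under my own assumptions: cells_floor!, cells_fld!, the plain-tuple points, and CutOff = 0.1 are hypothetical stand-ins, since the original BatchExtractCells! and the actual data are not shown here.

```julia
# Hypothetical stand-ins for the real functions; points are plain 3-tuples
# so the example runs without any packages.
function cells_floor!(Cells, Points, CutOff)
    for i in eachindex(Cells, Points)
        # fast variant: divide, then floor
        t = map(x -> floor(Int, x / CutOff) + 2, Tuple(Points[i]))
        Cells[i] = CartesianIndex(t)
    end
    return Cells
end

function cells_fld!(Cells, Points, CutOff)
    for i in eachindex(Cells, Points)
        # reference variant: floored division via fld
        t = map(x -> Int(fld(x, CutOff)) + 2, Tuple(Points[i]))
        Cells[i] = CartesianIndex(t)
    end
    return Cells
end

Points = [(rand(), rand(), rand()) for _ in 1:10_000]  # assumed test data
CutOff = 0.1                                           # assumed cutoff

A = Vector{CartesianIndex{3}}(undef, length(Points))
B = similar(A)
cells_floor!(A, Points, CutOff)
cells_fld!(B, Points, CutOff)

# Number of points that the two variants assign to different cells.
mismatches = count(i -> A[i] != B[i], eachindex(A))
println("mismatches: ", mismatches, " of ", length(Points))
```

Running the same comparison on the real Points and CutOff would show whether the faster version ever changes the cell assignment in practice.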