CUDA.jl kernel is half as fast as C++ kernel

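I guess map could be useful here: mapping a type such as Int32 over a range converts its endpoints, and the result is still a range rather than an array.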

julia> typeof(2:3)
UnitRange{Int64}

julia> typeof(map(Int32, 2:3))
UnitRange{Int32}
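
For comparison, here is a small sketch of a few equivalent ways to get the same Int32 range, plus a check that iterating it yields Int32 values. The helper name sum_range is made up for illustration, and I am assuming a 64-bit machine where 2:3 defaults to Int64. Whether this actually changes kernel speed is not something the snippet shows.

```julia
# Three ways to build an Int32 range; each stays a lazy range, no array is allocated.
r64    = 2:3                      # UnitRange{Int64} on a 64-bit machine
r_map  = map(Int32, r64)          # the map approach shown above
r_ends = Int32(2):Int32(3)        # convert the endpoints directly
r_ctor = UnitRange{Int32}(r64)    # convert the whole range via the constructor

# Hypothetical helper: sums a range using its own element type as the accumulator.
function sum_range(r)
    total = zero(eltype(r))       # Int32 accumulator for an Int32 range
    for i in r
        total += i                # Int32 + Int32 stays Int32
    end
    return total
end

println(typeof(r_map), " ", typeof(r_ends), " ", typeof(r_ctor))
println(typeof(sum_range(r_map)))
```

All three ranges compare equal and have the same type, so which one to use is mostly a matter of taste; map is handy when the range already exists as an Int64 one.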