Strange behavior of `mapreduce`

I have a kernel that uses the `exp` operation, and it breaks down in a strange way.

Minimal working example:

```julia
using CuArrays, CUDAnative
CuArrays.allowscalar(false)

x = randn(1000) |> cu
CUDAnative.map(CUDAnative.exp, x)
CUDAnative.reduce(+, x)

# works, but no kernel is evaluated
@device_code_warntype CUDAnative.mapreduce(exp, +, x)

# the following call crashes Julia
@device_code_warntype CUDAnative.mapreduce(CUDAnative.exp, +, x)
```

Questions:

  1. Why can `mapreduce` run without calling a kernel, while `map` does call one?
  2. Is it possible to keep Julia from crashing when `CUDAnative.exp` (or other CUDA intrinsics) is called directly on the CPU? Relaunching the Julia REPL or Atom is frustrating.
  3. How can I make the kernel function reusable, i.e. switch between `CUDAnative.exp` and `Base.exp` without changing the kernel function?
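For question 3, one pattern I have been considering is a dispatch-based translation layer: the kernel helper only ever refers to the generic `Base` function, and a single overloadable mapping (the `cufunc` name below is my own, not a CUDAnative API) swaps in the device intrinsic. A CPU-runnable sketch:

```julia
# Hypothetical sketch (cufunc is not an official CUDAnative API):
# translate Base functions to device intrinsics at the call site.

cufunc(f) = f  # default: on the CPU, use the function as-is

# On a machine with CUDAnative loaded, one would add overloads such as:
#   cufunc(::typeof(exp)) = CUDAnative.exp

# The reusable helper stays generic; the translation happens once,
# outside the kernel body, so the body never hardcodes either variant.
apply_elementwise(f, xs) = map(cufunc(f), xs)

@assert apply_elementwise(exp, [0.0, 1.0]) == [1.0, exp(1.0)]
```

With this, the same `apply_elementwise(exp, xs)` call works on a plain `Array` and, once the overload is defined, on a `CuArray` without touching the helper itself.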
The crash output looks like this:

```
julia> @device_code_warntype CUDAnative.mapreduce(CUDAnative.exp, +, x)
Invalid instruction at 0x7fcb18a3707c: 0x06, 0x00, 0x00, 0x00, 0xd0, 0x8b, 0x07, 0x00, 0x00, 0x00, 0x00, 0x00, 0xd0, 0x8b, 0x27

signal (4): Illegal instruction
```

That indicates GPU code is being executed on the CPU: `mapreduce` for `CuArray`s is hitting a CPU fallback, but one that does not go through scalar indexing, so the check from https://github.com/JuliaGPU/CuArrays.jl/issues/180 would have been useful there.

Please open an issue on CuArrays. The entire CuArrays/GPUArrays mapreduce implementation needs a clean-up.
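Until that clean-up lands, one workaround suggested by the fact that `map` and `reduce` each work on their own is to compose them explicitly, at the cost of one temporary array. A minimal sketch (CPU-runnable; on a `CuArray` both steps stay in device code):

```julia
# Workaround sketch: sidestep the broken fused mapreduce by composing
# the two kernels that do work. Allocates one intermediate array.
safe_mapreduce(f, op, xs) = reduce(op, map(f, xs))

@assert safe_mapreduce(abs, +, [-1.0, 2.0]) == 3.0
```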

Thanks a lot for your explanation! May I also merge this issue,
https://github.com/JuliaGPU/CuArrays.jl/issues/199,
into a list of improvements for `mapreduce`?