Mapping functions using CuArrays

I am trying to map a function over GPU arrays using CUDA, CuArrays, and map(). My ultimate goal is to take what is currently a serial grid search and run each reduction in parallel on the GPU. To do this, I need to pass several static arrays and a single varying parameter to the function, which then performs some operations and returns a vector. Stripped down to a minimal example, this works on the CPU and gives the result I expect:

julia> function testfunction(x,y=y1)
           x .+ y
       end
testfunction (generic function with 2 methods)

julia> y1=(collect(range(1.0,stop=20,length=20)));

julia> a=ones(4);

julia> res=map(testfunction,a)
4-element Array{Array{Float64,1},1}:
 [2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0, 20.0, 21.0]
 [2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0, 20.0, 21.0]
 [2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0, 20.0, 21.0]
 [2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0, 20.0, 21.0]

My understanding is that I should be able to use the exact same code with CuArrays as the inputs and get the same result back. Unfortunately, I get an error when I try this. The following code fails with “ERROR: CuArray only supports bits types”.

julia> using CUDA

julia> function testfunction2(x,y=y2)
           x .+ y
       end
testfunction2 (generic function with 2 methods)

julia> y2=CuArray(collect(range(1.0,stop=10,length=10)));

julia> a2=CuArray(ones(4));

julia> res=map(testfunction2,a2)
┌ Warning: Performing scalar operations on GPU arrays: This is very slow, consider disallowing these operations with `allowscalar(false)`
└ @ GPUArrays C:\Users\matda\.julia\packages\GPUArrays\eVYIC\src\host\indexing.jl:43
ERROR: CuArray only supports bits types
Stacktrace:
 [1] error(::String) at .\error.jl:33
 [2] CuArray{CuArray{Float64,1},1}(::UndefInitializer, ::Tuple{Int64}) at C:\Users\matda\.julia\packages\CUDA\dZvbp\src\array.jl:115
 [3] CuArray{CuArray{Float64,1},N} where N(::UndefInitializer, ::Tuple{Int64}) at C:\Users\matda\.julia\packages\CUDA\dZvbp\src\array.jl:124
 [4] similar(::Type{CuArray{CuArray{Float64,1},N} where N}, ::Tuple{Int64}) at .\abstractarray.jl:675
 [5] similar(::Type{CuArray{CuArray{Float64,1},N} where N}, ::Tuple{Base.OneTo{Int64}}) at .\abstractarray.jl:674
 [6] similar(::Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1},Tuple{Base.OneTo{Int64}},typeof(testfunction2),Tuple{Base.Broadcast.Extruded{CuArray{Float64,1},Tuple{Bool},Tuple{Int64}}}}, ::Type{CuArray{Float64,1}}) at C:\Users\matda\.julia\packages\CUDA\dZvbp\src\broadcast.jl:11
 [7] copy at .\broadcast.jl:877 [inlined]
 [8] materialize(::Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1},Nothing,typeof(testfunction2),Tuple{CuArray{Float64,1}}}) at .\broadcast.jl:837
 [9] map(::Function, ::CuArray{Float64,1}) at C:\Users\matda\.julia\packages\GPUArrays\eVYIC\src\host\broadcast.jl:89
 [10] top-level scope at REPL[5]:1

I am unsure whether this is a misunderstanding on my part or an actual problem. Any explanation or assistance is appreciated.

This thread on arrays of arrays and GPUs may be helpful: Map Performance with CuArrays
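
To make the arrays-of-arrays point concrete: the stack trace shows map trying to allocate a CuArray{CuArray{Float64,1},1}, because the function returns a whole vector for every element. A rough sketch of one possible workaround, assuming the per-parameter work can be written as a single broadcast, is to produce one 2D result with a column per parameter value instead of an array of arrays. The helper name grid_eval is hypothetical and not part of CUDA.jl. The example runs on plain Arrays so it can be checked without a GPU, and the CuArray call at the end is an untested assumption left as a comment.

```julia
# A CuArray's element type must be a plain bits type; an array is not one.
@show isbitstype(Float64)           # scalar element type
@show isbitstype(Vector{Float64})   # array element type

# Hypothetical helper: one column of the result per element of `a`.
# Written generically, so it accepts any arrays that support reshape and broadcasting.
grid_eval(a, y) = y .+ reshape(a, 1, :)

y = collect(range(1.0, stop=10, length=10))
a = ones(4)
res = grid_eval(a, y)   # size (10, 4); res[:, j] equals a[j] .+ y

# Assumed GPU usage (needs CUDA.jl and a working device), not run here:
# using CUDA
# res_gpu = grid_eval(CuArray(a), CuArray(y))
```

Each column of res corresponds to one row of the CPU map output in the question, so the values are the same and only the container changes from a vector of vectors to a matrix.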

Thank you for the reply. This does shed some light on the issue.