# Problems with LinearAlgebra functions within KernelAbstractions and CUDA

I’m almost done porting my solver over to the GPU with the help of CUDA and KernelAbstractions, but I keep running into problems with LinearAlgebra functions. Take this simple example:

```julia
using StaticArrays
using CUDA: allowscalar, CuArray
allowscalar(false)
using KernelAbstractions
using LinearAlgebra: norm2, ×  # norm2 is unexported but accessible this way
using BenchmarkTools

N = (100, 100, 100)
f = Array  # swap for CuArray to run on the GPU
p = zeros(Float32, N) |> f;
u = rand(Float32, N..., 3) |> f;

@kernel function kern(p, u)
    I = @index(Global, Cartesian)
    p[I] = norm2(SVector{3}(I.I) × SVector{3}(u[I, :]))
end

apply!(p, u) = kern(get_backend(p), 64)(p, u, ndrange=size(p))
@btime apply!(p, u); # 16.459 ms (1000199 allocations: 76.31 MiB)
```

This gives a huge number of allocations on the CPU and won’t compile on the GPU:

```
Reason: unsupported call through a literal pointer (call to ijl_alloc_array_1d)
Reason: unsupported dynamic function invocation (call to print_to_string(xs...) in Base at strings/io.jl:133)
Reason: unsupported dynamic function invocation (call to dimension_mismatch_fail(SA::Type, a::AbstractArray) in StaticArrays at C:\Users\gweymouth\.julia\packages\StaticArrays\4uslg\src\convert.jl:190)
```

If I take out the `SVector` conversion entirely, the allocations go up even further on the CPU, and the GPU throws errors such as `Reason: unsupported dynamic function invocation (call to mapreduce_empty_iter(f, op, itr, ItrEltype) in Base at reduce.jl:375)`.

It seems the main issue is with `SVector{3}(u[I,:])`. Replacing this with `SA[u[I,1],u[I,2],u[I,3]]` runs and cuts most of the allocations. Can someone explain why?
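For reference, here is a CPU-only sketch (plain Base, no StaticArrays or GPU needed) that isolates what I think is happening: the slice `u[I,:]` materializes a heap-allocated `Vector`, while indexing the three components individually stays on the stack. The function names are made up for illustration.

```julia
# Sketch of the allocation difference between slicing the last dimension
# and indexing its components one by one (assumption: this mirrors what
# happens inside the kernel).
slice_components(u, I) = u[I, :]                      # materializes a new Vector{Float32}
scalar_components(u, I) = (u[I, 1], u[I, 2], u[I, 3]) # NTuple, no heap allocation

function compare()
    u = rand(Float32, 4, 4, 4, 3)
    I = CartesianIndex(1, 2, 3)
    slice_components(u, I); scalar_components(u, I)   # warm up / compile
    (@allocated(slice_components(u, I)),              # typically > 0 bytes per call
     @allocated(scalar_components(u, I)))             # typically 0 bytes per call
end
```

If that picture is right, `SA[u[I,1],u[I,2],u[I,3]]` wins simply because it never creates the intermediate `Vector` that `SVector{3}(u[I,:])` has to convert from.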

Would a view of `u[I,:]` help?
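For context, this is the kind of thing I mean (a CPU-side sketch; I have not verified the GPU behaviour): `@view` avoids copying the data, though I suspect `SVector{3}(...)` of a view would still carry the runtime length check that trips up GPU compilation.

```julia
# Comparing an eager slice against a lazy view of the same components
# (illustrative names; plain Base only).
copy_slice(u, I) = u[I, :]        # eager copy into a new Vector
view_slice(u, I) = @view u[I, :]  # lazy SubArray, no copy of the data

sum_copy(u, I) = sum(copy_slice(u, I))  # pays for the copy's allocation
sum_view(u, I) = sum(view_slice(u, I))  # typically avoids it after warm-up
```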