I’m trying to broadcast a long function over a GPU array (a `CuArray`), but the broadcast fails to compile.

```
using CUDA
v1 = CUDA.rand(10)
function fun1(x)
cos(x) * sin(x) +
cos(2 * x) * sin(x) +
cos(3 * x) * sin(x) +
cos(4 * x) * sin(x) +
cos(5 * x) * sin(x) +
cos(6 * x) * sin(x) +
cos(7 * x) * sin(x) +
cos(8 * x) * sin(x) +
cos(9 * x) * sin(x) +
cos(10 * x) * sin(x) +
cos(11 * x) * sin(x) +
cos(12 * x) * sin(x) +
cos(13 * x) * sin(x) +
cos(14 * x) * sin(x) +
cos(15 * x) * sin(x) +
cos(16 * x) * sin(x) +
cos(17 * x) * sin(x) +
cos(18 * x) * sin(x) +
cos(19 * x) * sin(x) +
cos(20 * x) * sin(x) +
cos(21 * x) * sin(x) +
cos(22 * x) * sin(x) +
cos(23 * x) * sin(x) +
cos(24 * x) * sin(x) +
cos(25 * x) * sin(x) +
cos(26 * x) * sin(x) +
cos(27 * x) * sin(x) +
cos(28 * x) * sin(x) +
cos(29 * x) * sin(x) +
cos(30 * x) * sin(x) +
cos(31 * x) * sin(x) +
cos(32 * x) * sin(x) +
cos(33 * x) * sin(x) +
cos(34 * x) * sin(x) +
cos(35 * x) * sin(x) +
cos(36 * x) * sin(x)
end
function fun2(x)
cos(x) * sin(x) +
cos(2 * x) * sin(x) +
cos(3 * x) * sin(x) +
cos(4 * x) * sin(x) +
cos(5 * x) * sin(x) +
cos(6 * x) * sin(x) +
cos(7 * x) * sin(x) +
cos(8 * x) * sin(x) +
cos(9 * x) * sin(x) +
cos(10 * x) * sin(x) +
cos(11 * x) * sin(x) +
cos(12 * x) * sin(x) +
cos(13 * x) * sin(x) +
cos(14 * x) * sin(x) +
cos(15 * x) * sin(x) +
cos(16 * x) * sin(x) +
cos(17 * x) * sin(x) +
cos(18 * x) * sin(x) +
cos(19 * x) * sin(x) +
cos(20 * x) * sin(x) +
cos(21 * x) * sin(x) +
cos(22 * x) * sin(x) +
cos(23 * x) * sin(x) +
cos(24 * x) * sin(x) +
cos(25 * x) * sin(x) +
cos(26 * x) * sin(x) +
cos(27 * x) * sin(x) +
cos(28 * x) * sin(x) +
cos(29 * x) * sin(x) +
cos(30 * x) * sin(x) +
cos(31 * x) * sin(x) +
cos(32 * x) * sin(x) +
cos(33 * x) * sin(x) +
cos(34 * x) * sin(x) +
cos(35 * x) * sin(x)
end
fun1.(v1)  # fails with InvalidIRError (36 terms)
fun2.(v1)  # works (35 terms)
```

Here `fun1` fails, while `fun2` works. The only difference is that `fun2` drops the final `cos(36 * x) * sin(x)` term, so it sums 35 terms instead of 36.

The error is:

```
ERROR: InvalidIRError: compiling kernel #broadcast_kernel#17(CUDA.CuKernelContext, CuDeviceVector{Float32, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1}, Tuple{Base.OneTo{Int64}}, typeof(fun1), Tuple{Base.Broadcast.Extruded{CuDeviceVector{Float32, 1}, Tuple{Bool}, Tuple{Int64}}}}, Int64) resulted in invalid LLVM IR
Reason: unsupported call to an unknown function (call to jl_f__apply_iterate)
Stacktrace:
[1] +
@ ./operators.jl:591
[2] fun1
@ ~/Downloads/test/fun.jl:4
[3] _broadcast_getindex_evalf
@ ./broadcast.jl:670
[4] _broadcast_getindex
@ ./broadcast.jl:643
[5] getindex
@ ./broadcast.jl:597
[6] broadcast_kernel
@ ~/.julia/packages/GPUArrays/fqD8z/src/host/broadcast.jl:57
Hint: catch this exception as `err` and call `code_typed(err; interactive = true)` to introspect the erronous code with Cthulhu.jl
```
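
The first stack frame is `+` in `operators.jl`, which suggests the trouble comes from the long chain itself: Julia parses `a + b + c + ...` as a single n-ary call `+(a, b, c, ...)`, and with this many arguments the compiler may fall back to a dynamic call (`jl_f__apply_iterate`) that GPU code cannot contain. A minimal sketch of a possible workaround, assuming that diagnosis is right, is to accumulate the same 36 terms in a loop so that every `+` has only two operands. The name `fun1_loop` is hypothetical and not part of the code above.

```julia
# Hypothetical rewrite of fun1: the same 36 terms, summed with an explicit
# loop so that no single `+` call receives dozens of arguments.
function fun1_loop(x)
    s = sin(x)        # sin(x) is common to every term
    acc = zero(x)     # accumulator with the same type as x (e.g. Float32)
    for k in 1:36
        acc += cos(k * x) * s
    end
    return acc
end

# CPU sanity check against a direct sum of the same terms
x = 0.3f0
@assert fun1_loop(x) ≈ sum(cos(k * x) * sin(x) for k in 1:36)

# On the GPU (assumes CUDA.jl and a working device), the broadcast would be:
# using CUDA
# v1 = CUDA.rand(10)
# fun1_loop.(v1)
```

Splitting the original expression into a few parenthesized groups of terms should have a similar effect, since each `+` call then receives far fewer arguments.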