Unsupported call through a literal pointer on float power broadcast (CuArrays)

A compiler error occurs when broadcasting a floating-point power over a CuArray: .^(::CuArray{Float32,1}, ::Float32).
I opened an issue on GitHub as well (here); I'm posting here in case this is a usage error.
The minimal working example for this bug:

using CuArrays, Flux
w = gpu(collect(1:10))
w.^2 # works
w.^2.0 # error
w.^2.0f0 # error
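
My reading of the stacktrace below: w.^2 works because Julia lowers integer literal powers through Base.literal_pow (for a literal 2 this reduces to x*x, with no error branch), while the Float32 ^ method from math.jl carries a throw_exp_domainerror branch whose DomainError message allocates a String, and that allocation is what the GPU compiler rejects. A quick way to see the difference in lowering on the CPU (a sketch, not part of the original report):

# Integer literal exponent: lowered through Base.literal_pow, no error path.
@code_lowered (x -> x^2)(1f0)
# Float32 exponent: lowered to the generic ^ method from math.jl, which
# contains the throw_exp_domainerror branch seen in the stacktrace below.
@code_lowered (x -> x^2.0f0)(1f0)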

Stacktrace:

ERROR: InvalidIRError: compiling #23(CuArrays.CuKernelState, CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global}, Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64}},typeof(^),Tuple{Base.Broadcast.Extruded{CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global},Tuple{Bool},Tuple{Int64}},Float32}}) resulted in invalid LLVM IR
Reason: unsupported call through a literal pointer (call to jl_alloc_string)
Stacktrace:
 [1] _string_n at strings/string.jl:60
 [2] string at strings/substring.jl:180
 [3] throw_exp_domainerror at math.jl:35
 [4] ^ at math.jl:789
 [5] _broadcast_getindex_evalf at broadcast.jl:578
 [6] _broadcast_getindex at broadcast.jl:551
 [7] getindex at broadcast.jl:511
 [8] #23 at C:\Users\Henri\.julia\packages\GPUArrays\t8tJB\src\broadcast.jl:50
Reason: unsupported call through a literal pointer (call to memcpy)
Stacktrace:
 [1] unsafe_copyto! at array.jl:225
 [2] __unsafe_string! at strings/substring.jl:167
 [3] string at strings/substring.jl:183
 [4] throw_exp_domainerror at math.jl:35
 [5] ^ at math.jl:789
 [6] _broadcast_getindex_evalf at broadcast.jl:578
 [7] _broadcast_getindex at broadcast.jl:551
 [8] getindex at broadcast.jl:511
 [9] #23 at C:\Users\Henri\.julia\packages\GPUArrays\t8tJB\src\broadcast.jl:50
Reason: unsupported call to the Julia runtime (call to jl_box_float32)
Stacktrace:
 [1] throw_exp_domainerror at math.jl:35
 [2] ^ at math.jl:789
 [3] _broadcast_getindex_evalf at broadcast.jl:578
 [4] _broadcast_getindex at broadcast.jl:551
 [5] getindex at broadcast.jl:511
 [6] #23 at C:\Users\Henri\.julia\packages\GPUArrays\t8tJB\src\broadcast.jl:50
Stacktrace:
 [1] check_ir(::CUDAnative.CompilerContext, ::LLVM.Module) at C:\Users\Henri\.julia\packages\CUDAnative\PFgO3\src\compiler\validation.jl:77
 [2] compile(::CUDAnative.CompilerContext) at C:\Users\Henri\.julia\packages\CUDAnative\PFgO3\src\compiler\driver.jl:97
 [3] #compile#109(::Bool, ::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Function, ::VersionNumber, ::Any, ::Any)
at C:\Users\Henri\.julia\packages\CUDAnative\PFgO3\src\compiler\driver.jl:45
 [4] compile at C:\Users\Henri\.julia\packages\CUDAnative\PFgO3\src\compiler\driver.jl:43 [inlined]
 [5] #compile#108(::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Function, ::CUDAdrv.CuDevice, ::Function, ::Any)
at C:\Users\Henri\.julia\packages\CUDAnative\PFgO3\src\compiler\driver.jl:18
 [6] compile at C:\Users\Henri\.julia\packages\CUDAnative\PFgO3\src\compiler\driver.jl:16 [inlined]
 [7] macro expansion at C:\Users\Henri\.julia\packages\CUDAnative\PFgO3\src\execution.jl:269 [inlined]
 [8] #cufunction#123(::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::typeof(CUDAnative.cufunction), ::getfield(GPUArrays, Symbol("##23#24")), ::Type{Tuple{CuArrays.CuKernelState,CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global},Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64}},typeof(^),Tuple{Base.Broadcast.Extruded{CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global},Tuple{Bool},Tuple{Int64}},Float32}}}}) at C:\Users\Henri\.julia\packages\CUDAnative\PFgO3\src\execution.jl:240
 [9] cufunction(::Function, ::Type) at C:\Users\Henri\.julia\packages\CUDAnative\PFgO3\src\execution.jl:240
 [10] macro expansion at C:\Users\Henri\.julia\packages\CUDAnative\PFgO3\src\execution.jl:208 [inlined]
 [11] macro expansion at .\gcutils.jl:87 [inlined]
 [12] macro expansion at C:\Users\Henri\.julia\packages\CUDAnative\PFgO3\src\execution.jl:205 [inlined]
 [13] _gpu_call(::CuArrays.CuArrayBackend, ::Function, ::CuArray{Float32,1}, ::Tuple{CuArray{Float32,1},Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64}},typeof(^),Tuple{Base.Broadcast.Extruded{CuArray{Float32,1},Tuple{Bool},Tuple{Int64}},Float32}}}, ::Tuple{Tuple{Int64},Tuple{Int64}}) at C:\Users\Henri\.julia\packages\CuArrays\qZCAt\src\gpuarray_interface.jl:59
 [14] gpu_call(::Function, ::CuArray{Float32,1}, ::Tuple{CuArray{Float32,1},Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64}},typeof(^),Tuple{Base.Broadcast.Extruded{CuArray{Float32,1},Tuple{Bool},Tuple{Int64}},Float32}}}, ::Int64) at C:\Users\Henri\.julia\packages\GPUArrays\t8tJB\src\abstract_gpu_interface.jl:151
 [15] gpu_call at C:\Users\Henri\.julia\packages\GPUArrays\t8tJB\src\abstract_gpu_interface.jl:128 [inlined]
 [16] copyto! at C:\Users\Henri\.julia\packages\GPUArrays\t8tJB\src\broadcast.jl:48 [inlined]
 [17] copyto! at .\broadcast.jl:797 [inlined]
 [18] copy at .\broadcast.jl:773 [inlined]
 [19] materialize(::Base.Broadcast.Broadcasted{Base.Broadcast.ArrayStyle{CuArray},Nothing,typeof(^),Tuple{CuArray{Float32,1},Float32}}) at .\broadcast.jl:753
 [20] top-level scope at none:0

Build log

  Building MbedTLS ─────────→ `C:\Users\Henri\.julia\packages\MbedTLS\X4xar\deps\build.log`
  Building WebIO ───────────→ `C:\Users\Henri\.julia\packages\WebIO\7G1ZY\deps\build.log`
  Building Conda ───────────→ `C:\Users\Henri\.julia\packages\Conda\CpuvI\deps\build.log`
  Building FFTW ────────────→ `C:\Users\Henri\.julia\packages\FFTW\p7sLQ\deps\build.log`
  Building SpecialFunctions → `C:\Users\Henri\.julia\packages\SpecialFunctions\fvheQ\deps\build.log`
  Building Rmath ───────────→ `C:\Users\Henri\.julia\packages\Rmath\Py9gH\deps\build.log`
  Building PyCall ──────────→ `C:\Users\Henri\.julia\packages\PyCall\ttONZ\deps\build.log`
  Building CUDAdrv ─────────→ `C:\Users\Henri\.julia\packages\CUDAdrv\lu32K\deps\build.log`
  Building GR ──────────────→ `C:\Users\Henri\.julia\packages\GR\KGODl\deps\build.log`
  Building LLVM ────────────→ `C:\Users\Henri\.julia\packages\LLVM\tg8MX\deps\build.log`
  Building CodecZlib ───────→ `C:\Users\Henri\.julia\packages\CodecZlib\9jDi1\deps\build.log`
  Building Arpack ──────────→ `C:\Users\Henri\.julia\packages\Arpack\cu5By\deps\build.log`
  Building ZipFile ─────────→ `C:\Users\Henri\.julia\packages\ZipFile\YHTbb\deps\build.log`
  Building CUDAnative ──────→ `C:\Users\Henri\.julia\packages\CUDAnative\PFgO3\deps\build.log`
  Building Plots ───────────→ `C:\Users\Henri\.julia\packages\Plots\47Tik\deps\build.log`
  Building CuArrays ────────→ `C:\Users\Henri\.julia\packages\CuArrays\qZCAt\deps\build.log`

Environment details
Details on Julia:

Julia Version 1.1.0
Commit 80516ca202 (2019-01-21 21:24 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i5-7300HQ CPU @ 2.50GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)
Environment:
  JULIA_EDITOR = "C:\Users\Henri\AppData\Local\atom\app-1.37.0\atom.exe"  -a
  JULIA_NUM_THREADS = 2

Julia packages:

  • CuArrays.jl
  • Flux.jl

CUDA toolkit version: v10.1; driver version: I don’t know where to find that information.
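
For completeness, a sketch of how the driver version can usually be queried (assuming CUDAdrv provides CUDAdrv.version() on this release; nvidia-smi on the command line also reports the driver version):

using CUDAdrv
# Reports the CUDA version supported by the installed driver
# (cuDriverGetVersion under the hood).
CUDAdrv.version()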

GitHub solution from @maleadt:

JuliaGPU/CUDAnative.jl#367

You can always use CUDAnative intrinsics directly:

julia> CUDAnative.pow.(w,2f0)
10-element CuArray{Float32,1}:
  1.0     
  4.0     
  9.0     
 16.0     
 24.999998
 36.0     
 48.999996
 64.0     
 81.0     
 99.99999
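
A possible follow-up to that workaround (a hypothetical helper, not part of CuArrays or Flux): dispatch on the array type so a single call site uses Base's ^ on the CPU and the CUDAnative intrinsic on the GPU.

using CuArrays, CUDAnative

# GPU arrays: broadcast the libdevice-backed intrinsic, converting the
# exponent to the element type so the Float32 method is used.
elementwise_pow(A::CuArray, p) = CUDAnative.pow.(A, eltype(A)(p))
# CPU (and any other) arrays: plain broadcast of Base's ^.
elementwise_pow(A::AbstractArray, p) = A .^ p

elementwise_pow(w, 2f0)                      # GPU path, avoids the error above
elementwise_pow(collect(Float32, 1:10), 2f0) # CPU path, unchanged behaviour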