ERROR: InvalidIRError: compiling reduce_kernel with `eigen` command

I’m trying to convert my Julia code to CUDA programming, because I get an OutOfMemory() error when I run the original code on Julia v1.2. My converted code starts with the following lines:

using CuArrays
using Distances
using LinearAlgebra
using Distributions

data = Float32.(rand(10000, 15))
Eucldist = pairwise(Euclidean(), data, dims=1)
D = maximum(Eucldist.^2)
sigma2hat = mean(((Eucldist.^2) ./ D)[tril!(trues(size((Eucldist.^2) ./ D)), -1)])
L = exp.(-(Eucldist.^2 / D) / (2 * sigma2hat))
L = cu(L)
K = eigen(L)

With the last command, I get the following error:

┌ Warning: Performing scalar operations on GPU arrays: This is very slow, consider disallowing these operations with allowscalar(false)
└ @ GPUArrays C:\Users\User.julia\packages\GPUArrays\J4c3Q\src\indexing.jl:16
ERROR: InvalidIRError: compiling reduce_kernel(CuArrays.CuKernelState, typeof(==), typeof(&), Bool, CuDeviceArray{Float32,2,CUDAnative.AS.Global}, Val{256}, CuDeviceArray{Bool,1,CUDAnative.AS.Global}, Adjoint{Float32,CuDeviceArray{Float32,2,CUDAnative.AS.Global}}) resulted in invalid LLVM IR
Reason: unsupported dynamic function invocation (call to simple_broadcast_index)
Stacktrace:
[1] reduce_kernel at C:\Users\User.julia\packages\GPUArrays\J4c3Q\src\mapreduce.jl:141
Stacktrace:
[1] check_ir(::CUDAnative.CompilerJob, ::LLVM.Module) at C:\Users\User.julia\packages\CUDAnative\LkH1v\src\compiler\validation.jl:114
[2] macro expansion at C:\Users\User.julia\packages\CUDAnative\LkH1v\src\compiler\driver.jl:188 [inlined]
[3] macro expansion at C:\Users\User.julia\packages\TimerOutputs\7zSea\src\TimerOutput.jl:216 [inlined]
[4] #codegen#130(::Bool, ::Bool, ::Bool, ::Bool, ::Bool, ::typeof(CUDAnative.codegen), ::Symbol, ::CUDAnative.CompilerJob) at C:\Users\User.julia\packages\CUDAnative\LkH1v\src\compiler\driver.jl:186
[5] #codegen at .\none:0 [inlined]
[6] #compile#129(::Bool, ::Bool, ::Bool, ::Bool, ::Bool, ::typeof(CUDAnative.compile), ::Symbol, ::CUDAnative.CompilerJob) at C:\Users\User.julia\packages\CUDAnative\LkH1v\src\compiler\driver.jl:47
[7] #compile#128 at .\none:0 [inlined]
[8] #compile at .\none:0 [inlined] (repeats 2 times)
[9] macro expansion at C:\Users\User.julia\packages\CUDAnative\LkH1v\src\execution.jl:389 [inlined]
[10] #cufunction#170(::Nothing, ::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{,Tuple{}}}, ::typeof(cufunction), ::typeof(GPUArrays.reduce_kernel), ::Type{Tuple{CuArrays.CuKernelState,typeof(==),typeof(&),Bool,CuDeviceArray{Float32,2,CUDAnative.AS.Global},Val{256},CuDeviceArray{Bool,1,CUDAnative.AS.Global},Adjoint{Float32,CuDeviceArray{Float32,2,CUDAnative.AS.Global}}}}) at C:\Users\User.julia\packages\CUDAnative\LkH1v\src\execution.jl:357
[11] cufunction(::Function, ::Type) at C:\Users\User.julia\packages\CUDAnative\LkH1v\src\execution.jl:357
[12] macro expansion at C:\Users\User.julia\packages\CUDAnative\LkH1v\src\execution.jl:174 [inlined]
[13] macro expansion at .\gcutils.jl:87 [inlined]
[14] macro expansion at C:\Users\User.julia\packages\CUDAnative\LkH1v\src\execution.jl:171 [inlined]
[15] _gpu_call(::CuArrays.CuArrayBackend, ::Function, ::CuArray{Bool,1}, ::Tuple{typeof(==),typeof(&),Bool,CuArray{Float32,2},Val{256},CuArray{Bool,1},Adjoint{Float32,CuArray{Float32,2}}}, ::Tuple{Tuple{Int64},Tuple{Int64}}) at C:\Users\User.julia\packages\CuArrays\wXQp8\src\gpuarray_interface.jl:60
[16] gpu_call(::Function, ::CuArray{Bool,1}, ::Tuple{typeof(==),typeof(&),Bool,CuArray{Float32,2},Val{256},CuArray{Bool,1},Adjoint{Float32,CuArray{Float32,2}}}, ::Tuple{Tuple{Int64},Tuple{Int64}}) at C:\Users\User.julia\packages\GPUArrays\J4c3Q\src\abstract_gpu_interface.jl:151
[17] acc_mapreduce(::Function, ::Function, ::Bool, ::CuArray{Float32,2}, ::Tuple{Adjoint{Float32,CuArray{Float32,2}}}) at C:\Users\User.julia\packages\GPUArrays\J4c3Q\src\mapreduce.jl:186
[18] ishermitian at C:\Users\User.julia\packages\GPUArrays\J4c3Q\src\mapreduce.jl:15 [inlined]
[19] issymmetric at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.2\LinearAlgebra\src\generic.jl:1009 [inlined]
[20] #eigen!#56(::Bool, ::Bool, ::typeof(LinearAlgebra.eigsortby), ::typeof(eigen!), ::CuArray{Float32,2}) at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.2\LinearAlgebra\src\eigen.jl:53
[21] #eigen! at .\none:0 [inlined]
[22] #eigen#58(::Bool, ::Bool, ::typeof(LinearAlgebra.eigsortby), ::typeof(eigen), ::CuArray{Float32,2}) at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.2\LinearAlgebra\src\eigen.jl:139
[23] eigen(::CuArray{Float32,2}) at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.2\LinearAlgebra\src\eigen.jl:137
[24] top-level scope at none:0

According to https://github.com/JuliaGPU/CuArrays.jl, “because CuArray is an AbstractArray, it doesn’t have much of a learning curve; just use your favourite array ops as usual.” So what am I doing wrong if I cannot perform a simple eigendecomposition…? And that’s just the beginning of my code…

CuArrays indeed aims to provide a relatively simple programming model for GPUs. We generally aim to have linear algebra and broadcast/map just work for users, but that doesn’t mean all packages just work. Examples are functions that are implemented as for loops over the array and therefore do scalar indexing into GPU memory (which is rather slow).
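As a workaround in the meantime, one option is to keep the eigendecomposition on the CPU and only move the factors to the GPU afterwards. A minimal sketch (the matrix size and `sigma2hat` value here are small placeholders, not the ones from the original snippet):

```julia
using LinearAlgebra

# Small symmetric stand-in for the kernel matrix L from the question.
Eucldist = rand(Float32, 100, 100)
Eucldist = (Eucldist + Eucldist') / 2
D = maximum(Eucldist.^2)
sigma2hat = 0.5f0                       # placeholder for the estimated value
L = exp.(-(Eucldist.^2 / D) / (2 * sigma2hat))

# Wrapping in Symmetric dispatches directly to the LAPACK symmetric solver
# and skips the issymmetric check that triggers the GPU kernel error.
F = eigen(Symmetric(L))

# If the rest of the pipeline runs on the GPU, upload the factors afterwards:
# using CuArrays
# vals = cu(F.values); vecs = cu(F.vectors)
```

This trades GPU speed for correctness on the decomposition itself, but for a 10000×10000 matrix the LAPACK symmetric solver is usually acceptable, and the downstream broadcasts can still run on the device.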

Can you post the full stacktrace? It sounds like something somewhere is calling `pointer` on a CuArray, and that is not possible, since most pieces of code that operate on a Ptr expect it to be a CPU address.

I started answering your post before you updated it.


The updated snippet indeed looks like a bug, probably in GPUArrays.jl.
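For reference, the trace suggests the failure can probably be reproduced without `eigen` at all: `eigen` first calls `issymmetric`/`ishermitian`, which boils down to an elementwise `==` of the array against its adjoint, reduced with `&` — the `reduce_kernel` over an `Adjoint`-wrapped `CuDeviceArray` in the trace. A sketch (untested; assumes the same CuArrays/GPUArrays versions as above, and requires a GPU):

```julia
using CuArrays, LinearAlgebra

A = cu(rand(Float32, 4, 4))
# issymmetric reduces A == A' on the device; the Adjoint wrapper around the
# device array appears to be what trips the dynamic dispatch in reduce_kernel.
issymmetric(A)
```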

I put the whole stacktrace in my post :slight_smile: From [1] to [24]

Yeah that was a response to version 1 of your post :slight_smile:

https://github.com/JuliaGPU/GPUArrays.jl/issues/201

Yes :slight_smile: I tried to put more lines of my code :slight_smile: