I’m trying to convert my Julia code to CUDA (GPU) code, because I get an OutOfMemory()
error when I run the original code on Julia v1.2. My converted code starts with the following lines:
using CuArrays
using Distances
using LinearAlgebra
using Distributions
using Statistics  # for mean (also re-exported by Distributions)

data = Float32.(rand(10000, 15))                # 10000 observations in rows
Eucldist = pairwise(Euclidean(), data, dims=1)  # 10000x10000 pairwise distance matrix
D = maximum(Eucldist.^2)
scaled = (Eucldist.^2) ./ D
sigma2hat = mean(scaled[tril!(trues(size(scaled)), -1)])  # mean over the strict lower triangle
L = exp.(-scaled ./ (2 * sigma2hat))            # Gaussian (RBF) kernel matrix
L = cu(L)                                       # move to the GPU
K = eigen(L)
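For clarity, the matrix I’m decomposing is a Gaussian (RBF) kernel built from the pairwise Euclidean distances d_ij between rows of data; in formula form:

L_{ij} = \exp\left(-\frac{d_{ij}^2 / D}{2\hat{\sigma}^2}\right), \quad D = \max_{i,j} d_{ij}^2, \quad \hat{\sigma}^2 = \operatorname{mean}_{i>j}\left(d_{ij}^2 / D\right)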
With the last line, K = eigen(L), I get the following error:
┌ Warning: Performing scalar operations on GPU arrays: This is very slow, consider disallowing these operations with
allowscalar(false)
└ @ GPUArrays C:\Users\User\.julia\packages\GPUArrays\J4c3Q\src\indexing.jl:16
ERROR: InvalidIRError: compiling reduce_kernel(CuArrays.CuKernelState, typeof(==), typeof(&), Bool, CuDeviceArray{Float32,2,CUDAnative.AS.Global}, Val{256}, CuDeviceArray{Bool,1,CUDAnative.AS.Global}, Adjoint{Float32,CuDeviceArray{Float32,2,CUDAnative.AS.Global}}) resulted in invalid LLVM IR
Reason: unsupported dynamic function invocation (call to simple_broadcast_index)
Stacktrace:
[1] reduce_kernel at C:\Users\User\.julia\packages\GPUArrays\J4c3Q\src\mapreduce.jl:141
Stacktrace:
[1] check_ir(::CUDAnative.CompilerJob, ::LLVM.Module) at C:\Users\User\.julia\packages\CUDAnative\LkH1v\src\compiler\validation.jl:114
[2] macro expansion at C:\Users\User\.julia\packages\CUDAnative\LkH1v\src\compiler\driver.jl:188 [inlined]
[3] macro expansion at C:\Users\User\.julia\packages\TimerOutputs\7zSea\src\TimerOutput.jl:216 [inlined]
[4] #codegen#130(::Bool, ::Bool, ::Bool, ::Bool, ::Bool, ::typeof(CUDAnative.codegen), ::Symbol, ::CUDAnative.CompilerJob) at C:\Users\User\.julia\packages\CUDAnative\LkH1v\src\compiler\driver.jl:186
[5] #codegen at .\none:0 [inlined]
[6] #compile#129(::Bool, ::Bool, ::Bool, ::Bool, ::Bool, ::typeof(CUDAnative.compile), ::Symbol, ::CUDAnative.CompilerJob) at C:\Users\User\.julia\packages\CUDAnative\LkH1v\src\compiler\driver.jl:47
[7] #compile#128 at .\none:0 [inlined]
[8] #compile at .\none:0 [inlined] (repeats 2 times)
[9] macro expansion at C:\Users\User\.julia\packages\CUDAnative\LkH1v\src\execution.jl:389 [inlined]
[10] #cufunction#170(::Nothing, ::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::typeof(cufunction), ::typeof(GPUArrays.reduce_kernel), ::Type{Tuple{CuArrays.CuKernelState,typeof(==),typeof(&),Bool,CuDeviceArray{Float32,2,CUDAnative.AS.Global},Val{256},CuDeviceArray{Bool,1,CUDAnative.AS.Global},Adjoint{Float32,CuDeviceArray{Float32,2,CUDAnative.AS.Global}}}}) at C:\Users\User\.julia\packages\CUDAnative\LkH1v\src\execution.jl:357
[11] cufunction(::Function, ::Type) at C:\Users\User\.julia\packages\CUDAnative\LkH1v\src\execution.jl:357
[12] macro expansion at C:\Users\User\.julia\packages\CUDAnative\LkH1v\src\execution.jl:174 [inlined]
[13] macro expansion at .\gcutils.jl:87 [inlined]
[14] macro expansion at C:\Users\User\.julia\packages\CUDAnative\LkH1v\src\execution.jl:171 [inlined]
[15] _gpu_call(::CuArrays.CuArrayBackend, ::Function, ::CuArray{Bool,1}, ::Tuple{typeof(==),typeof(&),Bool,CuArray{Float32,2},Val{256},CuArray{Bool,1},Adjoint{Float32,CuArray{Float32,2}}}, ::Tuple{Tuple{Int64},Tuple{Int64}}) at C:\Users\User\.julia\packages\CuArrays\wXQp8\src\gpuarray_interface.jl:60
[16] gpu_call(::Function, ::CuArray{Bool,1}, ::Tuple{typeof(==),typeof(&),Bool,CuArray{Float32,2},Val{256},CuArray{Bool,1},Adjoint{Float32,CuArray{Float32,2}}}, ::Tuple{Tuple{Int64},Tuple{Int64}}) at C:\Users\User\.julia\packages\GPUArrays\J4c3Q\src\abstract_gpu_interface.jl:151
[17] acc_mapreduce(::Function, ::Function, ::Bool, ::CuArray{Float32,2}, ::Tuple{Adjoint{Float32,CuArray{Float32,2}}}) at C:\Users\User\.julia\packages\GPUArrays\J4c3Q\src\mapreduce.jl:186
[18] ishermitian at C:\Users\User\.julia\packages\GPUArrays\J4c3Q\src\mapreduce.jl:15 [inlined]
[19] issymmetric at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.2\LinearAlgebra\src\generic.jl:1009 [inlined]
[20] #eigen!#56(::Bool, ::Bool, ::typeof(LinearAlgebra.eigsortby), ::typeof(eigen!), ::CuArray{Float32,2}) at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.2\LinearAlgebra\src\eigen.jl:53
[21] #eigen! at .\none:0 [inlined]
[22] #eigen#58(::Bool, ::Bool, ::typeof(LinearAlgebra.eigsortby), ::typeof(eigen), ::CuArray{Float32,2}) at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.2\LinearAlgebra\src\eigen.jl:139
[23] eigen(::CuArray{Float32,2}) at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.2\LinearAlgebra\src\eigen.jl:137
[24] top-level scope at none:0
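For what it’s worth, the failure does not seem specific to my kernel matrix; this minimal sketch (same package versions, the name A is just illustrative) appears to hit the same error:

using CuArrays, LinearAlgebra

A = cu(rand(Float32, 4, 4))  # any small GPU matrix
eigen(A)                     # seems to fail the same way, apparently inside issymmetric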
According to https://github.com/JuliaGPU/CuArrays.jl, “because CuArray is an AbstractArray, it doesn’t have much of a learning curve; just use your favourite array ops as usual.” So what am I doing wrong if I cannot perform a simple eigendecomposition…? And that’s just the beginning of my code…
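A stopgap that seems to work is to pull the matrix back to the host for the decomposition, which of course defeats the point of the conversion (a sketch, assuming L fits in host memory):

L_host = Array(L)             # copy the kernel matrix back from the GPU
K = eigen(Symmetric(L_host))  # CPU LAPACK eigendecomposition; L is symmetric by construction
vecs = cu(K.vectors)          # move eigenvectors back if later steps need them on the GPU

Is there a way to keep the eigendecomposition itself on the GPU?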