Hello,
I have a symbolic expression that I'd like to evaluate a large number of times on a GPU. A few days ago I made a post about creating an evaluator function (see: the previous post), which led me to Symbolics.build_function, but functions built this way do not seem to be broadcastable over CuArrays. Here's an MWE that reproduces the error I'm getting:
using CUDA, Symbolics

@variables x, y
to_calculate = 1 + x*y

# expression=Val{false} asks for a callable rather than an Expr;
# what comes back is a RuntimeGeneratedFunction
f = build_function(to_calculate, x, y, expression=Val{false})

out = CUDA.fill(0.0f0, 5)
xvals = CUDA.fill(4.0f0, 5)
yvals = CUDA.fill(5.0f0, 5)
out .= f.(xvals, yvals)  # throws the error below
And the stacktrace:
ERROR: GPU compilation of kernel #broadcast_kernel#17(CUDA.CuKernelContext, CuDeviceVector{Float32, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1}, Tuple{Base.OneTo{Int64}}, RuntimeGeneratedFunctions.RuntimeGeneratedFunction{(:x, :y), Symbolics.var"#_RGF_ModTag", Symbolics.var"#_RGF_ModTag", Expr}, Tuple{Base.Broadcast.Extruded{CuDeviceVector{Float32, 1}, Tuple{Bool}, Tuple{Int64}}, Base.Broadcast.Extruded{CuDeviceVector{Float32, 1}, Tuple{Bool}, Tuple{Int64}}}}, Int64) failed
KernelError: passing and using non-bitstype argument
Argument 4 to your kernel function is of type Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1}, Tuple{Base.OneTo{Int64}}, RuntimeGeneratedFunctions.RuntimeGeneratedFunction{(:x, :y), Symbolics.var"#_RGF_ModTag", Symbolics.var"#_RGF_ModTag", Expr}, Tuple{Base.Broadcast.Extruded{CuDeviceVector{Float32, 1}, Tuple{Bool}, Tuple{Int64}}, Base.Broadcast.Extruded{CuDeviceVector{Float32, 1}, Tuple{Bool}, Tuple{Int64}}}}, which is not isbits:
  .f is of type RuntimeGeneratedFunctions.RuntimeGeneratedFunction{(:x, :y), Symbolics.var"#_RGF_ModTag", Symbolics.var"#_RGF_ModTag", Expr} which is not isbits.
    .body is of type Expr which is not isbits.
      .head is of type Symbol which is not isbits.
      .args is of type Vector{Any} which is not isbits.
Stacktrace:
[1] check_invocation(job::GPUCompiler.CompilerJob)
@ GPUCompiler C:\Users\Robert\.julia\packages\GPUCompiler\iaKrd\src\validation.jl:86
[2] macro expansion
@ C:\Users\Robert\.julia\packages\GPUCompiler\iaKrd\src\driver.jl:413 [inlined]
[3] macro expansion
@ C:\Users\Robert\.julia\packages\TimerOutputs\jgSVI\src\TimerOutput.jl:252 [inlined]
[4] macro expansion
@ C:\Users\Robert\.julia\packages\GPUCompiler\iaKrd\src\driver.jl:412 [inlined]
[5] emit_asm(job::GPUCompiler.CompilerJob, ir::LLVM.Module; strip::Bool, validate::Bool, format::LLVM.API.LLVMCodeGenFileType)
@ GPUCompiler C:\Users\Robert\.julia\packages\GPUCompiler\iaKrd\src\utils.jl:64
[6] cufunction_compile(job::GPUCompiler.CompilerJob, ctx::LLVM.Context)
@ CUDA C:\Users\Robert\.julia\packages\CUDA\DfvRa\src\compiler\execution.jl:354
[7] #224
@ C:\Users\Robert\.julia\packages\CUDA\DfvRa\src\compiler\execution.jl:347 [inlined]
[8] JuliaContext(f::CUDA.var"#224#225"{GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{GPUArrays.var"#broadcast_kernel#17", Tuple{CUDA.CuKernelContext, CuDeviceVector{Float32, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1}, Tuple{Base.OneTo{Int64}}, RuntimeGeneratedFunctions.RuntimeGeneratedFunction{(:x, :y), Symbolics.var"#_RGF_ModTag", Symbolics.var"#_RGF_ModTag", Expr}, Tuple{Base.Broadcast.Extruded{CuDeviceVector{Float32, 1}, Tuple{Bool}, Tuple{Int64}}, Base.Broadcast.Extruded{CuDeviceVector{Float32, 1}, Tuple{Bool}, Tuple{Int64}}}}, Int64}}}})
@ GPUCompiler C:\Users\Robert\.julia\packages\GPUCompiler\iaKrd\src\driver.jl:74
[9] cufunction_compile(job::GPUCompiler.CompilerJob)
@ CUDA C:\Users\Robert\.julia\packages\CUDA\DfvRa\src\compiler\execution.jl:346
[10] cached_compilation(cache::Dict{UInt64, Any}, job::GPUCompiler.CompilerJob, compiler::typeof(CUDA.cufunction_compile), linker::typeof(CUDA.cufunction_link))
@ GPUCompiler C:\Users\Robert\.julia\packages\GPUCompiler\iaKrd\src\cache.jl:90
[11] cufunction(f::GPUArrays.var"#broadcast_kernel#17", tt::Type{Tuple{CUDA.CuKernelContext, CuDeviceVector{Float32, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1}, Tuple{Base.OneTo{Int64}}, RuntimeGeneratedFunctions.RuntimeGeneratedFunction{(:x, :y), Symbolics.var"#_RGF_ModTag", Symbolics.var"#_RGF_ModTag", Expr}, Tuple{Base.Broadcast.Extruded{CuDeviceVector{Float32, 1}, Tuple{Bool}, Tuple{Int64}}, Base.Broadcast.Extruded{CuDeviceVector{Float32, 1}, Tuple{Bool}, Tuple{Int64}}}}, Int64}}; name::Nothing, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ CUDA C:\Users\Robert\.julia\packages\CUDA\DfvRa\src\compiler\execution.jl:299
[12] cufunction(f::GPUArrays.var"#broadcast_kernel#17", tt::Type{Tuple{CUDA.CuKernelContext, CuDeviceVector{Float32, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1}, Tuple{Base.OneTo{Int64}}, RuntimeGeneratedFunctions.RuntimeGeneratedFunction{(:x, :y), Symbolics.var"#_RGF_ModTag", Symbolics.var"#_RGF_ModTag", Expr}, Tuple{Base.Broadcast.Extruded{CuDeviceVector{Float32, 1}, Tuple{Bool}, Tuple{Int64}}, Base.Broadcast.Extruded{CuDeviceVector{Float32, 1}, Tuple{Bool}, Tuple{Int64}}}}, Int64}})
@ CUDA C:\Users\Robert\.julia\packages\CUDA\DfvRa\src\compiler\execution.jl:293
[13] macro expansion
@ C:\Users\Robert\.julia\packages\CUDA\DfvRa\src\compiler\execution.jl:102 [inlined]
[14] #launch_heuristic#248
@ C:\Users\Robert\.julia\packages\CUDA\DfvRa\src\gpuarrays.jl:17 [inlined]
[15] _copyto!
@ C:\Users\Robert\.julia\packages\GPUArrays\gok9K\src\host\broadcast.jl:73 [inlined]
[16] materialize!
@ C:\Users\Robert\.julia\packages\GPUArrays\gok9K\src\host\broadcast.jl:51 [inlined]
[17] materialize!(dest::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, bc::Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1}, Nothing, RuntimeGeneratedFunctions.RuntimeGeneratedFunction{(:x, :y), Symbolics.var"#_RGF_ModTag", Symbolics.var"#_RGF_ModTag", (0x46ad784e, 0x1596c737, 0x6d1e01f0, 0xc25942f6, 0x1ba02e3f)}, Tuple{CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}})
@ Base.Broadcast .\broadcast.jl:868
[18] top-level scope
@ c:\Users\Robert\OneDrive\Research\General Work Files\GPU_testing.jl:11
My understanding of this error is that the RuntimeGeneratedFunction returned by build_function carries around non-isbits objects (its Expr body, whose fields are a Symbol and a Vector{Any}), which can't be passed as arguments to a GPU kernel. Is there a way to make a RuntimeGeneratedFunction broadcastable over CuArrays?
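To sanity-check that reading, this is the comparison I have in mind (just a sketch; the two isbits results are what I infer from the error message above and from ordinary generic functions having empty singleton types, not from anything in the Symbolics docs):

using Symbolics

@variables x, y
f = build_function(1 + x*y, x, y, expression=Val{false})

isbits(f)    # false: the RuntimeGeneratedFunction holds an Expr body
isbits(sin)  # true: an ordinary generic function's type has no fields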
Or, alternatively, is there a different way to generate a CuArray-broadcastable Julia function that evaluates a symbolic expression?
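For what it's worth, the one workaround I've thought of is to ask build_function for the expression instead (expression=Val{true}, which is the default) and eval it into an ordinary generic function, whose type should then be an isbits singleton. This is only a sketch of the idea, and I don't know whether it's the recommended approach, or how it interacts with world age if done inside a function rather than at the top level:

using CUDA, Symbolics

@variables x, y
to_calculate = 1 + x*y

# Get the Expr form and eval it into a plain anonymous function.
# Its type has no fields, so the broadcast kernel should accept it.
f_expr = build_function(to_calculate, x, y, expression=Val{true})
f_eval = eval(f_expr)

out = CUDA.fill(0.0f0, 5)
xvals = CUDA.fill(4.0f0, 5)
yvals = CUDA.fill(5.0f0, 5)
out .= f_eval.(xvals, yvals)  # no RuntimeGeneratedFunction among the kernel arguments

Is something along these lines sensible, or is there a cleaner way?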
Any help would be greatly appreciated!
Thanks!