Broadcasting an expression-evaluator function

Hello,

I have a symbolic expression that I’d like to evaluate a large number of times on a GPU. A few days ago I made a post about creating an evaluator function (see: the previous post), which led me to Symbolics.build_function, but the functions it produces do not seem to be broadcastable over CuArrays. Here’s an MWE along with the error I’m getting:

using CUDA, Symbolics

@variables x, y
to_calculate = 1 + x*y

# expression=Val{false} returns a callable RuntimeGeneratedFunction
f = build_function(to_calculate, x, y, expression=Val{false})

out = CUDA.fill(0.0f0, 5)
xvals = CUDA.fill(4.0f0, 5)
yvals = CUDA.fill(5.0f0, 5)

out .= f.(xvals, yvals)  # errors here

And the stacktrace:

ERROR: GPU compilation of kernel #broadcast_kernel#17(CUDA.CuKernelContext, CuDeviceVector{Float32, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1}, Tuple{Base.OneTo{Int64}}, RuntimeGeneratedFunctions.RuntimeGeneratedFunction{(:x, :y), Symbolics.var"#_RGF_ModTag", Symbolics.var"#_RGF_ModTag", Expr}, Tuple{Base.Broadcast.Extruded{CuDeviceVector{Float32, 1}, Tuple{Bool}, Tuple{Int64}}, Base.Broadcast.Extruded{CuDeviceVector{Float32, 1}, Tuple{Bool}, Tuple{Int64}}}}, Int64) failed
KernelError: passing and using non-bitstype argument

Argument 4 to your kernel function is of type Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1}, Tuple{Base.OneTo{Int64}}, RuntimeGeneratedFunctions.RuntimeGeneratedFunction{(:x, :y), Symbolics.var"#_RGF_ModTag", Symbolics.var"#_RGF_ModTag", Expr}, Tuple{Base.Broadcast.Extruded{CuDeviceVector{Float32, 1}, Tuple{Bool}, Tuple{Int64}}, Base.Broadcast.Extruded{CuDeviceVector{Float32, 1}, Tuple{Bool}, Tuple{Int64}}}}, which is not isbits:
  .f is of type RuntimeGeneratedFunctions.RuntimeGeneratedFunction{(:x, :y), Symbolics.var"#_RGF_ModTag", Symbolics.var"#_RGF_ModTag", Expr} which is not isbits.
    .body is of type Expr which is not isbits.
      .head is of type Symbol which is not isbits.
      .args is of type Vector{Any} which is not isbits.

Stacktrace:
  [1] check_invocation(job::GPUCompiler.CompilerJob)
    @ GPUCompiler C:\Users\Robert\.julia\packages\GPUCompiler\iaKrd\src\validation.jl:86
  [2] macro expansion
    @ C:\Users\Robert\.julia\packages\GPUCompiler\iaKrd\src\driver.jl:413 [inlined]
  [3] macro expansion
    @ C:\Users\Robert\.julia\packages\TimerOutputs\jgSVI\src\TimerOutput.jl:252 [inlined]
  [4] macro expansion
    @ C:\Users\Robert\.julia\packages\GPUCompiler\iaKrd\src\driver.jl:412 [inlined]
  [5] emit_asm(job::GPUCompiler.CompilerJob, ir::LLVM.Module; strip::Bool, validate::Bool, format::LLVM.API.LLVMCodeGenFileType)
    @ GPUCompiler C:\Users\Robert\.julia\packages\GPUCompiler\iaKrd\src\utils.jl:64
  [6] cufunction_compile(job::GPUCompiler.CompilerJob, ctx::LLVM.Context)
    @ CUDA C:\Users\Robert\.julia\packages\CUDA\DfvRa\src\compiler\execution.jl:354     
  [7] #224
    @ C:\Users\Robert\.julia\packages\CUDA\DfvRa\src\compiler\execution.jl:347 [inlined]
  [8] JuliaContext(f::CUDA.var"#224#225"{GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{GPUArrays.var"#broadcast_kernel#17", Tuple{CUDA.CuKernelContext, CuDeviceVector{Float32, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1}, Tuple{Base.OneTo{Int64}}, RuntimeGeneratedFunctions.RuntimeGeneratedFunction{(:x, :y), Symbolics.var"#_RGF_ModTag", Symbolics.var"#_RGF_ModTag", Expr}, Tuple{Base.Broadcast.Extruded{CuDeviceVector{Float32, 1}, Tuple{Bool}, Tuple{Int64}}, Base.Broadcast.Extruded{CuDeviceVector{Float32, 1}, Tuple{Bool}, Tuple{Int64}}}}, Int64}}}})
    @ GPUCompiler C:\Users\Robert\.julia\packages\GPUCompiler\iaKrd\src\driver.jl:74
  [9] cufunction_compile(job::GPUCompiler.CompilerJob)
    @ CUDA C:\Users\Robert\.julia\packages\CUDA\DfvRa\src\compiler\execution.jl:346
 [10] cached_compilation(cache::Dict{UInt64, Any}, job::GPUCompiler.CompilerJob, compiler::typeof(CUDA.cufunction_compile), linker::typeof(CUDA.cufunction_link))
    @ GPUCompiler C:\Users\Robert\.julia\packages\GPUCompiler\iaKrd\src\cache.jl:90
 [11] cufunction(f::GPUArrays.var"#broadcast_kernel#17", tt::Type{Tuple{CUDA.CuKernelContext, CuDeviceVector{Float32, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1}, Tuple{Base.OneTo{Int64}}, RuntimeGeneratedFunctions.RuntimeGeneratedFunction{(:x, :y), Symbolics.var"#_RGF_ModTag", Symbolics.var"#_RGF_ModTag", Expr}, Tuple{Base.Broadcast.Extruded{CuDeviceVector{Float32, 1}, Tuple{Bool}, Tuple{Int64}}, Base.Broadcast.Extruded{CuDeviceVector{Float32, 1}, Tuple{Bool}, Tuple{Int64}}}}, Int64}}; name::Nothing, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ CUDA C:\Users\Robert\.julia\packages\CUDA\DfvRa\src\compiler\execution.jl:299
 [12] cufunction(f::GPUArrays.var"#broadcast_kernel#17", tt::Type{Tuple{CUDA.CuKernelContext, CuDeviceVector{Float32, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1}, Tuple{Base.OneTo{Int64}}, RuntimeGeneratedFunctions.RuntimeGeneratedFunction{(:x, :y), Symbolics.var"#_RGF_ModTag", Symbolics.var"#_RGF_ModTag", Expr}, Tuple{Base.Broadcast.Extruded{CuDeviceVector{Float32, 1}, Tuple{Bool}, Tuple{Int64}}, Base.Broadcast.Extruded{CuDeviceVector{Float32, 1}, Tuple{Bool}, Tuple{Int64}}}}, Int64}})
    @ CUDA C:\Users\Robert\.julia\packages\CUDA\DfvRa\src\compiler\execution.jl:293
 [13] macro expansion
    @ C:\Users\Robert\.julia\packages\CUDA\DfvRa\src\compiler\execution.jl:102 [inlined]
 [14] #launch_heuristic#248
    @ C:\Users\Robert\.julia\packages\CUDA\DfvRa\src\gpuarrays.jl:17 [inlined]
 [15] _copyto!
    @ C:\Users\Robert\.julia\packages\GPUArrays\gok9K\src\host\broadcast.jl:73 [inlined]
 [16] materialize!
    @ C:\Users\Robert\.julia\packages\GPUArrays\gok9K\src\host\broadcast.jl:51 [inlined]
 [17] materialize!(dest::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, bc::Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1}, Nothing, RuntimeGeneratedFunctions.RuntimeGeneratedFunction{(:x, :y), Symbolics.var"#_RGF_ModTag", Symbolics.var"#_RGF_ModTag", (0x46ad784e, 0x1596c737, 0x6d1e01f0, 0xc25942f6, 0x1ba02e3f)}, Tuple{CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}})
    @ Base.Broadcast .\broadcast.jl:868
 [18] top-level scope
    @ c:\Users\Robert\OneDrive\Research\General Work Files\GPU_testing.jl:11

My understanding of this error is that the RuntimeGeneratedFunction returned by build_function carries its body around as an Expr (which contains a Symbol and a Vector{Any}), so it isn’t isbits and can’t be passed into a GPU kernel. Is there a way to make a RuntimeGeneratedFunction broadcastable over CuArrays?

Or alternatively, is there a different way to generate a CuArray-broadcastable Julia function to evaluate some expression?
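
For what it’s worth, the isbits diagnosis seems to check out in the REPL (assuming I’m reading the KernelError correctly, kernel arguments must be isbits):

isbits(f)  # false; the Expr body stored in the RuntimeGeneratedFunction is not isbits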

Any help would be greatly appreciated!

Thanks!

Try

@eval f(x, y) = $(build_function(to_calculate, x, y, expression=Val{true})) 

instead of

f = build_function(to_calculate, x, y, expression=Val{false})

RuntimeGeneratedFunctions.jl doesn’t really play nicely with CUDA.jl.
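
The difference is that expression=Val{true} hands back a plain Expr instead of a RuntimeGeneratedFunction, so @eval splices it into an ordinary top-level definition that CUDA.jl can compile like any other Julia function. Schematically (not the exact generated code):

ex = build_function(to_calculate, x, y, expression=Val{true})
ex isa Expr  # true; roughly of the form :(function (x, y) ... 1 + x*y ... end)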

Thanks very much, that seems to do the trick! But it results in some pretty weird call syntax (and the x and y on the left-hand side end up decoupled from the x and y passed to build_function):

@eval f(x, y) = $(build_function(to_calculate, x, y, expression=Val{true}))

out .= f(xvals, yvals).(xvals, yvals)  # correct result
out .= f(0, 0).(xvals, yvals)          # correct result
out .= f.(xvals, yvals)                # InvalidIRError; Reason: unsupported dynamic function invocation [etc.]
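
If I’m reading the generated code right, the spliced expression is itself an anonymous function (x, y) -> ..., so f just returns that inner function and ignores its own arguments. A schematic sketch of what the @eval above effectively defines (not the exact Expr):

f(x, y) = function (x, y)  # the inner (x, y) shadow f's arguments
    1 + x*y
end

inner = f(0, 0)      # f's arguments are ignored
inner(4.0f0, 5.0f0)  # 21.0f0

That would explain why f(anything, anything).(xvals, yvals) works.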

Looks like the (x, y) can simply be removed, though, so that the name is bound directly to the anonymous function; that fixes the weird syntax:

@eval g = $(build_function(to_calculate, x, y, expression=Val{true}))

out .= g.(xvals, yvals)  # correct result
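
For anyone who lands here later, the full working version of the MWE:

using CUDA, Symbolics

@variables x, y
to_calculate = 1 + x*y

# Splice the generated Expr into a plain top-level binding; g is then
# an ordinary anonymous function that broadcasts over CuArrays.
@eval g = $(build_function(to_calculate, x, y, expression=Val{true}))

out = CUDA.fill(0.0f0, 5)
xvals = CUDA.fill(4.0f0, 5)
yvals = CUDA.fill(5.0f0, 5)

out .= g.(xvals, yvals)  # each element is 1 + 4*5 = 21.0f0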

Thanks again for your help!