Broadcasting an expression-evaluator function

Hello,

I have a symbolic expression that I’d like to evaluate a large number of times on a GPU. A few days ago I made a post about creating an evaluator function (see: the previous post), which led me to Symbolics.build_function, but the functions it produces do not seem to be broadcastable over CuArrays. Here’s an MWE along with the error I’m getting:

using CUDA, Symbolics

@variables x, y
to_calculate = 1 + x*y

# expression=Val{false} returns a callable RuntimeGeneratedFunction
f = build_function(to_calculate, x, y, expression=Val{false})

out = CUDA.fill(0.0f0, 5)
xvals = CUDA.fill(4.0f0, 5)
yvals = CUDA.fill(5.0f0, 5)

out .= f.(xvals, yvals)  # errors here

And the stacktrace:

ERROR: GPU compilation of kernel #broadcast_kernel#17(CUDA.CuKernelContext, CuDeviceVector{Float32, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1}, Tuple{Base.OneTo{Int64}}, RuntimeGeneratedFunctions.RuntimeGeneratedFunction{(:x, :y), Symbolics.var"#_RGF_ModTag", Symbolics.var"#_RGF_ModTag", Expr}, Tuple{Base.Broadcast.Extruded{CuDeviceVector{Float32, 1}, Tuple{Bool}, Tuple{Int64}}, Base.Broadcast.Extruded{CuDeviceVector{Float32, 1}, Tuple{Bool}, Tuple{Int64}}}}, Int64) failed
KernelError: passing and using non-bitstype argument

Argument 4 to your kernel function is of type Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1}, Tuple{Base.OneTo{Int64}}, RuntimeGeneratedFunctions.RuntimeGeneratedFunction{(:x, :y), Symbolics.var"#_RGF_ModTag", Symbolics.var"#_RGF_ModTag", Expr}, Tuple{Base.Broadcast.Extruded{CuDeviceVector{Float32, 1}, Tuple{Bool}, Tuple{Int64}}, Base.Broadcast.Extruded{CuDeviceVector{Float32, 1}, Tuple{Bool}, Tuple{Int64}}}}, which is not isbits:
  .f is of type RuntimeGeneratedFunctions.RuntimeGeneratedFunction{(:x, :y), Symbolics.var"#_RGF_ModTag", Symbolics.var"#_RGF_ModTag", Expr} which is not isbits.
    .body is of type Expr which is not isbits.
      .head is of type Symbol which is not isbits.
      .args is of type Vector{Any} which is not isbits.

Stacktrace:
  [1] check_invocation(job::GPUCompiler.CompilerJob)
    @ GPUCompiler C:\Users\Robert\.julia\packages\GPUCompiler\iaKrd\src\validation.jl:86
  [2] macro expansion
    @ C:\Users\Robert\.julia\packages\GPUCompiler\iaKrd\src\driver.jl:413 [inlined]
  [3] macro expansion
    @ C:\Users\Robert\.julia\packages\TimerOutputs\jgSVI\src\TimerOutput.jl:252 [inlined]
  [4] macro expansion
    @ C:\Users\Robert\.julia\packages\GPUCompiler\iaKrd\src\driver.jl:412 [inlined]
  [5] emit_asm(job::GPUCompiler.CompilerJob, ir::LLVM.Module; strip::Bool, validate::Bool, format::LLVM.API.LLVMCodeGenFileType)
    @ GPUCompiler C:\Users\Robert\.julia\packages\GPUCompiler\iaKrd\src\utils.jl:64
  [6] cufunction_compile(job::GPUCompiler.CompilerJob, ctx::LLVM.Context)
    @ CUDA C:\Users\Robert\.julia\packages\CUDA\DfvRa\src\compiler\execution.jl:354     
  [7] #224
    @ C:\Users\Robert\.julia\packages\CUDA\DfvRa\src\compiler\execution.jl:347 [inlined]
  [8] JuliaContext(f::CUDA.var"#224#225"{GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{GPUArrays.var"#broadcast_kernel#17", Tuple{CUDA.CuKernelContext, CuDeviceVector{Float32, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1}, Tuple{Base.OneTo{Int64}}, RuntimeGeneratedFunctions.RuntimeGeneratedFunction{(:x, :y), Symbolics.var"#_RGF_ModTag", Symbolics.var"#_RGF_ModTag", Expr}, Tuple{Base.Broadcast.Extruded{CuDeviceVector{Float32, 1}, Tuple{Bool}, Tuple{Int64}}, Base.Broadcast.Extruded{CuDeviceVector{Float32, 1}, Tuple{Bool}, Tuple{Int64}}}}, Int64}}}})
    @ GPUCompiler C:\Users\Robert\.julia\packages\GPUCompiler\iaKrd\src\driver.jl:74
  [9] cufunction_compile(job::GPUCompiler.CompilerJob)
    @ CUDA C:\Users\Robert\.julia\packages\CUDA\DfvRa\src\compiler\execution.jl:346
 [10] cached_compilation(cache::Dict{UInt64, Any}, job::GPUCompiler.CompilerJob, compiler::typeof(CUDA.cufunction_compile), linker::typeof(CUDA.cufunction_link))
    @ GPUCompiler C:\Users\Robert\.julia\packages\GPUCompiler\iaKrd\src\cache.jl:90
 [11] cufunction(f::GPUArrays.var"#broadcast_kernel#17", tt::Type{Tuple{CUDA.CuKernelContext, CuDeviceVector{Float32, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1}, Tuple{Base.OneTo{Int64}}, RuntimeGeneratedFunctions.RuntimeGeneratedFunction{(:x, :y), Symbolics.var"#_RGF_ModTag", Symbolics.var"#_RGF_ModTag", Expr}, Tuple{Base.Broadcast.Extruded{CuDeviceVector{Float32, 1}, Tuple{Bool}, Tuple{Int64}}, Base.Broadcast.Extruded{CuDeviceVector{Float32, 1}, Tuple{Bool}, Tuple{Int64}}}}, Int64}}; name::Nothing, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ CUDA C:\Users\Robert\.julia\packages\CUDA\DfvRa\src\compiler\execution.jl:299
 [12] cufunction(f::GPUArrays.var"#broadcast_kernel#17", tt::Type{Tuple{CUDA.CuKernelContext, CuDeviceVector{Float32, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1}, Tuple{Base.OneTo{Int64}}, RuntimeGeneratedFunctions.RuntimeGeneratedFunction{(:x, :y), Symbolics.var"#_RGF_ModTag", Symbolics.var"#_RGF_ModTag", Expr}, Tuple{Base.Broadcast.Extruded{CuDeviceVector{Float32, 1}, Tuple{Bool}, Tuple{Int64}}, Base.Broadcast.Extruded{CuDeviceVector{Float32, 1}, Tuple{Bool}, Tuple{Int64}}}}, Int64}})
    @ CUDA C:\Users\Robert\.julia\packages\CUDA\DfvRa\src\compiler\execution.jl:293
 [13] macro expansion
    @ C:\Users\Robert\.julia\packages\CUDA\DfvRa\src\compiler\execution.jl:102 [inlined]
 [14] #launch_heuristic#248
    @ C:\Users\Robert\.julia\packages\CUDA\DfvRa\src\gpuarrays.jl:17 [inlined]
 [15] _copyto!
    @ C:\Users\Robert\.julia\packages\GPUArrays\gok9K\src\host\broadcast.jl:73 [inlined]
 [16] materialize!
    @ C:\Users\Robert\.julia\packages\GPUArrays\gok9K\src\host\broadcast.jl:51 [inlined]
 [17] materialize!(dest::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, bc::Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1}, Nothing, RuntimeGeneratedFunctions.RuntimeGeneratedFunction{(:x, :y), Symbolics.var"#_RGF_ModTag", Symbolics.var"#_RGF_ModTag", (0x46ad784e, 0x1596c737, 0x6d1e01f0, 0xc25942f6, 0x1ba02e3f)}, Tuple{CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}})
    @ Base.Broadcast .\broadcast.jl:868
 [18] top-level scope
    @ c:\Users\Robert\OneDrive\Research\General Work Files\GPU_testing.jl:11

My understanding of this error is that the RuntimeGeneratedFunction returned by build_function carries its body around as an Expr (which contains a Symbol and a Vector{Any}), so it isn’t isbits and can’t be passed into a GPU kernel. Is there a way to make a RuntimeGeneratedFunction broadcastable over CuArrays?

Or alternatively, is there a different way to generate a CuArray-broadcastable Julia function to evaluate some expression?
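
For what it’s worth, the isbits diagnosis seems to check out in the REPL (assuming I’m reading the KernelError correctly, kernel arguments must be isbits):

isbits(f)  # false; the Expr body stored in the RuntimeGeneratedFunction is not isbits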

Any help would be greatly appreciated!

Thanks!

Try

@eval f(x, y) = $(build_function(to_calculate, x, y, expression=Val{true})) 

instead of

f = build_function(to_calculate, x, y, expression=Val{false})

RuntimeGeneratedFunctions.jl doesn’t really play nicely with CUDA.jl.
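
The difference is that expression=Val{true} hands back a plain Expr instead of a RuntimeGeneratedFunction, so @eval splices it into an ordinary top-level definition that CUDA.jl can compile like any other Julia function. Schematically (not the exact generated code):

ex = build_function(to_calculate, x, y, expression=Val{true})
ex isa Expr  # true; roughly of the form :(function (x, y) ... 1 + x*y ... end)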

Thanks very much, that seems to do the trick! But it results in some pretty weird call syntax (and the x and y on the left-hand side end up decoupled from the x and y passed to build_function):

@eval f(x, y) = $(build_function(to_calculate, x, y, expression=Val{true}))

out .= f(xvals, yvals).(xvals, yvals)  # correct result
out .= f(0, 0).(xvals, yvals)          # correct result
out .= f.(xvals, yvals)                # InvalidIRError; Reason: unsupported dynamic function invocation [etc.]
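
If I’m reading the generated code right, the spliced expression is itself an anonymous function (x, y) -> ..., so f just returns that inner function and ignores its own arguments. A schematic sketch of what the @eval above effectively defines (not the exact Expr):

f(x, y) = function (x, y)  # the inner (x, y) shadow f's arguments
    1 + x*y
end

inner = f(0, 0)      # f's arguments are ignored
inner(4.0f0, 5.0f0)  # 21.0f0

That would explain why f(anything, anything).(xvals, yvals) works.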

Looks like the (x, y) can simply be removed, though, so that the name is bound directly to the anonymous function; that fixes the weird syntax:

@eval g = $(build_function(to_calculate, x, y, expression=Val{true}))

out .= g.(xvals, yvals)  # correct result
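
For anyone who lands here later, the full working version of the MWE:

using CUDA, Symbolics

@variables x, y
to_calculate = 1 + x*y

# Splice the generated Expr into a plain top-level binding; g is then
# an ordinary anonymous function that broadcasts over CuArrays.
@eval g = $(build_function(to_calculate, x, y, expression=Val{true}))

out = CUDA.fill(0.0f0, 5)
xvals = CUDA.fill(4.0f0, 5)
yvals = CUDA.fill(5.0f0, 5)

out .= g.(xvals, yvals)  # each element is 1 + 4*5 = 21.0f0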

Thanks again for your help!