Alright, so that big red bar in the flame graph turned out to be this call in CUDA.jl's `execution.jl`:
```julia
function (kernel::HostKernel)(args...; threads::CuDim=1, blocks::CuDim=1, kwargs...)
    call(kernel, map(cudaconvert, args)...; threads, blocks, kwargs...)
end
```
If I simply apply this change:
```diff
- function (kernel::HostKernel)(args...; threads::CuDim=1, blocks::CuDim=1, kwargs...)
+ function (kernel::HostKernel)(args::Vararg{Any,M}; threads::CuDim=1, blocks::CuDim=1, kwargs...) where {M}
```
the method specializes on the argument types and I get another 20% improvement!
This is because Julia doesn't automatically specialize on `Vararg` arguments that are only passed through to another function (see "Be aware of when Julia avoids specializing" in the Performance Tips section of the Julia manual), so the `map(cudaconvert, args)...` call goes through non-specialized code and causes a bit of damage.
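The heuristic can be seen outside CUDA.jl with a small sketch (the function names here are hypothetical, just for illustration):

```julia
# Julia's heuristic: a `Vararg` argument that is only forwarded to another
# call (here, to `map`) may not trigger specialization on the argument types.
passthrough(args...) = map(string, args)

# Spelling the signature as `Vararg{Any,N} where {N}` forces specialization,
# which is exactly the change proposed above.
forced(args::Vararg{Any,N}) where {N} = map(string, args)

# Both behave identically; the difference is only in what gets compiled.
@assert passthrough(1, 2.0) == forced(1, 2.0)
```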
@maleadt are you okay if we upstream this change?