I am trying to train a model in Flux whose loss contains a nested gradient. I know I should avoid dynamic function invocation on the GPU, but I am unsure how to do that here. The following code works on the CPU but fails on CUDA/GPU:
using Flux
using Zygote
using CUDA   # needed when device = gpu on recent Flux versions
device = cpu # or gpu
λ  = 0.1f0                            # weight of the gradient-matching term
X  = rand(Float32, 30, 20) |> device  # inputs
y  = rand(Float32, 10, 20) |> device  # targets
dx = rand(Float32, 30, 20) |> device  # target for the input gradient
dy = ones(Float32, 10, 20) |> device  # cotangent (seed) passed to the pullback in case 2
model = Chain(
    Dense(30 => 10, relu),
    Dense(10 => 10, relu),
    Dense(10 => 10)
) |> device
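# For reference, a plain, non-nested loss like the one below differentiates
# without trouble; my assumption is that only the nested gradient further down
# is what breaks on the GPU.
baseline_loss, baseline_grad = Zygote.withgradient(m -> Flux.mse(m(X), y), model)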
# 1) If dy = ∂sum(ŷ)/∂ŷ (i.e. dy is all ones)
loss, grad = Zygote.withgradient(model) do model
    ret = Zygote.withgradient(X) do X
        ŷ = model(X)
        return sum(ŷ)
    end
    return Flux.mse(model(X), y) + λ * Flux.mse(dx, ret.grad[1])
end
# 2) General dy
loss_general, grad_general = Zygote.withgradient(model) do model
    ŷ, pb = Zygote.pullback(X) do X
        model(X)
    end
    return Flux.mse(ŷ, y) + λ * Flux.mse(dx, pb(dy)[1])
end
It would be great to get both 1) and 2) working on CUDA, but I would be happy with a solution for just one of them.
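For context, here is roughly how I plan to use the returned gradient once either version works. The optimiser and learning rate are placeholders, and it assumes Flux's explicit-gradient API (Flux.setup / Flux.update!), so treat it as a sketch rather than my actual training loop:

opt_state = Flux.setup(Adam(1f-3), model)   # placeholder optimiser and learning rate
for epoch in 1:10                           # placeholder number of epochs
    loss, grad = Zygote.withgradient(model) do model
        ŷ, pb = Zygote.pullback(X) do X
            model(X)
        end
        Flux.mse(ŷ, y) + λ * Flux.mse(dx, pb(dy)[1])
    end
    Flux.update!(opt_state, model, grad[1])
end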