Help using CUDA, Zygote, and random numbers

I get the error “llvmcall requires the compiler” when trying to take the gradient of a function that involves generating random numbers in CUDA. Here is a minimal example:

using GPUArraysCore: GPUArraysCore
using CUDA, Flux
using LinearAlgebra
using Zygote

function f3(v::AbstractVector{T}) where {T}
    randn(T, 4,4) * v[1:4]
end
function f3(v::GPUArraysCore.AbstractGPUVector{T}) where {T}
    CUDA.randn(T, 4,4) * v[1:4]
end
v_orig = collect(1.0:10.0)
Zygote.gradient(v -> sum(f3(v)), v_orig) # works

v = v_orig |> gpu
m = f3(v)
Zygote.gradient(v -> sum(f3(v)), v) # fails
Zygote.gradient(v -> sum(cpu(f3(v))), v) # fails

I suspect that I do not yet sufficiently understand how CUDA and Zygote work together. Could someone please explain why this fails, what I need to do, and point me to resources for understanding this better?
Background for generating random numbers: I want to use a Monte-Carlo approximation of an expectation inside a cost function of a stochastic gradient descent.

Not sure why this fails but can you maybe generate the random numbers outside of your differentiated function and pass them as arguments?

Yes, thanks. Passing in another pre-allocated CuArray of random numbers works; I have tried that before.

However, it is awkward and probably not very efficient to pre-allocate a lot of random data and pass it through the DataLoader of a machine-learning optimization to the cost function. There are far fewer observations, covariates, and parameters than random numbers that I need. Maybe I need to implement a special DataLoader that generates the random numbers when asked for the next batch.
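One way to avoid pre-allocating everything might be to draw the random numbers per batch inside the training loop, just outside the gradient call. The following is only a minimal CPU sketch of that idea: `cost`, `batches`, and the step size are hypothetical stand-ins, and on the GPU one would presumably swap `randn` for `CUDA.randn`.

```julia
using Zygote

# Hypothetical cost function: a Monte Carlo style average over the rows of ξ.
cost(θ, x, ξ) = sum(abs2, ξ * θ .- x) / size(ξ, 1)

θ = collect(1.0:4.0)               # parameters
batches = [rand(8) for _ in 1:3]   # stand-in for a DataLoader

for x in batches
    # Fresh random numbers for this batch, drawn outside the differentiated closure.
    ξ = randn(eltype(θ), length(x), length(θ))
    g, = Zygote.gradient(p -> cost(p, x, ξ), θ)
    θ .-= 0.01 .* g                # plain gradient step, for illustration only
end
```

Only one batch worth of random numbers is alive at a time, so nothing has to be stored with the data.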

What’s probably happening is that Zygote is trying to differentiate the code inside CUDA.randn, which ultimately calls non-Julia code via llvmcall.

The reason it does not try to do this with randn is that there is a rule marking randn as non-differentiable, which tells Zygote not to look inside it.

You can define such a rule for CUDA.randn in your code, or make a PR adding it for everyone here.
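A minimal sketch of such a rule, assuming the `@non_differentiable` macro from ChainRulesCore and the vararg signature `::Any...` (my choice here, not something confirmed in this thread). Note that defining it in your own code is type piracy, so it is best treated as a stopgap. The function name `f3_rule` is just for illustration.

```julia
using CUDA, Zygote
using ChainRulesCore

# Tell the AD system not to differentiate through CUDA.randn, whatever its arguments.
ChainRulesCore.@non_differentiable CUDA.randn(::Any...)

f3_rule(v::CUDA.CuVector{T}) where {T} = CUDA.randn(T, 4, 4) * v[1:4]

# Only try the gradient when a working GPU is available.
if CUDA.functional()
    v = CUDA.cu(collect(1.0:10.0))
    g, = Zygote.gradient(v -> sum(f3_rule(v)), v)
    println(size(g))
end
```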

In general, you can also tell Zygote to ignore some bit of code by doing this (or the ChainRulesCore equivalent):

r = Zygote.@ignore CUDA.randn(T, 4,4)
r * v[1:4]
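For the ChainRulesCore variant, a sketch using `ignore_derivatives` with a do-block could look like this. The function name `f3_ignored` is made up for illustration; the rest follows the example from the question.

```julia
using CUDA, Zygote
using ChainRulesCore: ignore_derivatives

function f3_ignored(v::CUDA.CuVector{T}) where {T}
    # Everything inside the do-block is treated as a constant by the AD system.
    r = ignore_derivatives() do
        CUDA.randn(T, 4, 4)
    end
    r * v[1:4]
end

# Only try the gradient when a working GPU is available.
if CUDA.functional()
    v = CUDA.cu(collect(1.0:10.0))
    g, = Zygote.gradient(v -> sum(f3_ignored(v)), v)
    println(size(g))
end
```

This keeps the change local to the one function instead of adding a rule for `CUDA.randn` globally.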