I get the error “llvmcall requires the compiler” when trying to take the gradient of a function that involves generating random numbers in CUDA. Here is a minimal example:
using GPUArraysCore: GPUArraysCore
using CUDA, Flux
using LinearAlgebra
using Zygote
function f3(v::AbstractVector{T}) where {T}
    randn(T, 4, 4) * v[1:4]
end
function f3(v::GPUArraysCore.AbstractGPUVector{T}) where {T}
    CUDA.randn(T, 4, 4) * v[1:4]
end
v_orig = collect(1.0:10.0)
Zygote.gradient(v -> sum(f3(v)), v_orig) # works
v = v_orig |> gpu
m = f3(v)
Zygote.gradient(v -> sum(f3(v)), v) # fails
Zygote.gradient(v -> sum(cpu(f3(v))), v) # fails
I suspect I do not yet sufficiently understand how CUDA and Zygote work together. Could someone please explain why this fails, what I need to do, and point me to resources to understand it better?
Background on why I generate random numbers: I want to use a Monte Carlo approximation of an expectation inside the cost function of a stochastic gradient descent, roughly as in the sketch below.
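A minimal CPU-only sketch of the kind of cost function I have in mind (g, mc_cost, and n_draws are just placeholders, not my actual model):

using Statistics

# Placeholder "model": a scalar function of the parameters θ and a noise draw z.
g(θ, z) = sum(θ .* z)

# Monte-Carlo approximation of E_z[g(θ, z)] with z ~ N(0, I), averaged over n_draws samples.
mc_cost(θ; n_draws = 100) = mean(g(θ, randn(eltype(θ), length(θ))) for _ in 1:n_draws)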
Yes, thanks, passing in a pre-allocated CuArray of random numbers works; I have tried that before, roughly like the sketch below.
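For reference, something along these lines (a sketch, not my exact code):

# Variant of f3 that takes the random matrix as an argument, so the random draw
# happens outside of the code that Zygote differentiates.
f3(v::AbstractVector, R::AbstractMatrix) = R * v[1:4]

R = CUDA.randn(eltype(v), 4, 4)          # pre-allocated random numbers on the GPU
Zygote.gradient(v -> sum(f3(v, R)), v)   # works; gradient only with respect to v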
However, it is awkward and probably not very efficient to pre-allocate a lot of random data and pass it through the DataLoader of a machine-learning optimization to the cost function. There are far fewer observations, covariates, and parameters than random numbers that I need. Maybe I need to implement a special DataLoader that generates the random numbers when asked for the next batch, along the lines of the sketch below.
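Something like the following is what I had in mind (just a sketch; the names and sizes are placeholders, and it reuses the two-argument f3 from the sketch above):

# An iterator that draws a fresh random matrix for every batch,
# so nothing has to be pre-allocated and stored alongside the data.
struct RandnBatches
    n_batches::Int
    dims::Tuple{Int,Int}
end

Base.length(it::RandnBatches) = it.n_batches
Base.iterate(it::RandnBatches, state = 1) =
    state > it.n_batches ? nothing : (CUDA.randn(Float32, it.dims...), state + 1)

for R in RandnBatches(3, (4, 4))
    Zygote.gradient(v -> sum(f3(v, R)), v)   # fresh random numbers per batch, drawn outside the AD
end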