Autodiff with Zygote: issues with setting seeds

Hello all,

I would like to differentiate a function with a fixed seed, but I obtain a “Can’t differentiate foreigncall expression” error when using Zygote. Any advice would be appreciated.

The following is a minimal working example together with one possible, but for me somewhat limiting, workaround.

using Zygote
using Random

# Random.seed!() does not work with Zygote: it produces a "Can't differentiate foreigncall expression" error
function simulator(x, id::Int64)
    Random.seed!(id)
    return simulator(x)
end

"This is a work around. The simulator needs to take a rng as input."
function simulator(x, rng::AbstractRNG)
    noise1 = randn(rng)
    noise2 = randn(rng)
    @show noise1
    @show noise2
    return x + noise1 + noise2
end

function simulator(x)
    noise1 = randn()
    noise2 = randn()
    @show noise1
    @show noise2
    return x + noise1 + noise2
end

function distance(sim, obs)
    return sum((sim - obs).^2)
end

"This will work"
function loss(x, obsdata, id::Int64)
    rng = Xoshiro(id) 
    sim = simulator(x, rng)
    return distance(sim, obsdata)
end

"This won't work"
function loss_with_issue(x, obsdata, id::Int64)
    sim = simulator(x, id)
    return distance(sim, obsdata)
end

# data
myobs = 2.0;

# to fix the seed
id = 123

# test point
xtest = 3.0

# This works (the second line is a manual check of the analytic gradient)
Zygote.gradient(x->loss(x, myobs, id), xtest)
2*(simulator(xtest, Xoshiro(id))-myobs)

# This throws an error: "Can't differentiate foreigncall expression"
Zygote.gradient(x->loss_with_issue(x, myobs, id), xtest)

Arguably, simulator(x, rng::AbstractRNG) is cleaner code and may be preferable anyway, but I also need to be able to differentiate my loss for simulators such as simulator(x) that do not take an explicit RNG instance.

Would someone know how to make Zygote work without having to pass around an RNG instance, i.e. for the loss_with_issue case?

Many thanks!

I think you can just tell Zygote not to look inside that function, like so:

julia> Zygote.gradient(x->loss_with_issue(x, myobs, id), xtest)
noise1 = -0.6457306721039767
noise2 = -1.4632513788889214
ERROR: Can't differentiate foreigncall expression $(Expr(:foreigncall, :(:jl_get_current_task), Ref{Task}, svec(), 0, :(:ccall))).
Stacktrace:
...
  [4] setstate!
    @ /Applications/Julia-1.10.app/Contents/Resources/julia/share/julia/stdlib/v1.10/Random/src/Xoshiro.jl:132 [inlined]

julia> function simulator(x, id::Int64)
           Zygote.@ignore Random.seed!(id)
           return simulator(x)
       end
simulator (generic function with 3 methods)

julia> Zygote.gradient(x->loss_with_issue(x, myobs, id), xtest)
noise1 = -0.6457306721039767
noise2 = -1.4632513788889214
(-2.2179641019857965,)

I believe that could be made permanent by a one-line PR here.
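For completeness, this is roughly what such a change would amount to, and you can also try it in your own code without waiting for a PR. An untested sketch, assuming nothing upstream already defines a rule for Random.seed!:

using ChainRulesCore
using Random

# Declare seeding as non-differentiable, so Zygote (which consumes
# ChainRules rules) skips it instead of tracing into the foreigncall
# inside setstate!
ChainRulesCore.@non_differentiable Random.seed!(::Any...)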

Tangential remark: why do you want to differentiate a function that returns random values? Autodiff engines are not designed to deal with such situations by default, so you might obtain unexpected (and backend-dependent) results.

Thank you very much. That indeed resolves the issue.

The motivation for this is the implementation of a statistical inference procedure that works by fixing the seed of the stochastic generative model (the simulator). Details about the method can be found here.

@gdall Isn’t that basically what all stochastic gradient descent-based methods do? At least we do that all over the place in AdvancedVI.

Not really. Stochastic gradient descent approximates the gradient of a deterministic function f(x) = \sum_{i \in \mathcal{I}} f_i(x) using the gradients of a random subset \mathcal{S} \subset \mathcal{I} of its components. Here, we’re talking about a function f which itself involves randomness. The right way to think about it is as a stochastic computational graph, see https://arxiv.org/abs/1506.05254.
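To make the distinction concrete, here is a toy sketch of the subsampling case (my own illustration, not code from the paper): the objective f itself is deterministic, and only the gradient estimate is random, via the random mini-batch.

using Zygote
using Random

# Deterministic objective f(x) = sum_i f_i(x), with f_i(x) = (x - yᵢ)²
ys = randn(Xoshiro(1), 100)
f_i(x, y) = (x - y)^2
f(x) = sum(y -> f_i(x, y), ys)

# Stochastic *estimate* of ∇f(x): differentiate only a random mini-batch
function sgd_grad_estimate(x, rng; batchsize = 10)
    batch = rand(rng, ys, batchsize)
    scale = length(ys) / batchsize   # rescale so the estimator is unbiased
    return first(Zygote.gradient(z -> scale * sum(y -> f_i(z, y), batch), x))
end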

I would say that is only one specific type of SGD, where the stochasticity is discrete due to subsampling. In variational inference, for example, we deal with a more general type of stochastic gradient descent, where the gradient is defined as

\nabla_{x} \; \mathbb{E}_{\epsilon} f\left(x, \epsilon\right),

where \epsilon is general (often continuous) noise.
To me, this is the same as differentiating a random function if we think of the randomness \epsilon as being implicitly generated inside the function. In fact, abstractly speaking, AdvancedVI operates exactly like the snippet shown in the original post here.
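In code, the abstract recipe looks roughly like this (a toy pathwise/reparameterised Monte-Carlo estimator of my own, not the actual AdvancedVI internals):

using Zygote
using Random
using Statistics

# The quantity of interest is ∇ₓ E_ε[f(x, ε)], with ε ~ N(0, 1)
f(x, ε) = (x + ε - 2.0)^2

# Pathwise estimate: sample ε, then differentiate x ↦ f(x, ε), and average
function grad_estimate(x, rng; nsamples = 1_000)
    mean(1:nsamples) do _
        ε = randn(rng)
        first(Zygote.gradient(x -> f(x, ε), x))
    end
end

grad_estimate(3.0, Xoshiro(123))  # ≈ 2(x - 2) = 2 for large nsamples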

You’re right, but even in the general SGD setting you’re differentiating an expectation, which is a deterministic function, and you’re only replacing its gradient with a stochastic approximation. While the implementation may be similar, conceptually it is very different from differentiating a function with random outputs (the best review on Monte-Carlo gradients is https://jmlr.org/papers/volume21/19-346/19-346.pdf). I’m just highlighting that users should be aware of which function they’re considering, and whether it is inherently stochastic or not.

Yes, it’s a rather different situation. In our case, by setting a random seed (or setting \epsilon to \epsilon_0), we are working on a single realisation of the random process, i.e. on one instance f(x, \epsilon_0) of the random function. When we change x, we keep the seed (i.e. \epsilon_0) fixed. In mini-batch approaches to stochastic optimisation, one would instead take a new random sample (i.e. change \epsilon) after updating x.
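Reusing the definitions and the seed from the original post, the contrast looks roughly like this:

# Fixed seed: every evaluation sees the same realisation f(x, ε₀),
# so repeated gradient calls agree
g1 = Zygote.gradient(x -> loss(x, myobs, 123), xtest)
g2 = Zygote.gradient(x -> loss(x, myobs, 123), xtest)
g1 == g2  # true

# Fresh noise on every call (mini-batch style): a different stochastic
# gradient each time
Zygote.gradient(x -> distance(simulator(x), myobs), xtest)
Zygote.gradient(x -> distance(simulator(x), myobs), xtest)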
