Clipping gradients with Zygote/Flux

I am currently working with Flux, and due to a high amount of stochasticity in my data, I often receive
“Loss is NaN” and “Loss is infinite” errors during training. Reducing the step size of my gradient updates helps, of course, but it slows down training unnecessarily.

I would like to avoid this issue by clipping gradients, but I have not found a way to do it. What I want is to clip the gradient by its L2 norm: if the gradient has a norm greater than 1 (or some other constant), I want to divide it by its norm.
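Just to make the rule concrete, this is roughly what I mean, treating all parameters together as one long gradient vector (clip_by_norm! is just a name I made up, not something from Flux or Zygote; clipping each parameter array separately would also be fine):

# Sketch: rescale every gradient in a Zygote.Grads object so that the
# combined L2 norm over all parameters is at most c.
function clip_by_norm!(gs, ps; c = 1.0)
    total = sqrt(sum(sum(abs2, gs[p]) for p in ps if gs[p] !== nothing))
    if total > c
        for p in ps
            gs[p] === nothing && continue
            gs[p] .= gs[p] .* (c / total)   # rescale in place
        end
    end
    return gs
end

# usage: gs = Zygote.gradient(() -> loss(S), ps); clip_by_norm!(gs, ps)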

After searching around for a bit, I found the hook function in the Zygote package, which theoretically should be able to do this.
Here is the gradient descent function I currently have:


using Flux, Zygote

# Policy-gradient style update: step along ∇ log(model(S)[A]), scaled by α * γ^t * δ.
function pupdate!(S, A, δ, model, α, γ, t)
    loss(x) = log(model(x)[A])
    ps = Flux.params(model)
    gs = Zygote.gradient(() -> loss(S), ps)
    #@info "neural network before: $(model(S)[A])"
    for p in ps
        # apply the (ascent) update in place; Flux.Tracker.update! belongs to the pre-Zygote Tracker AD
        p .+= α * (γ^t) * δ .* gs[p]
    end
    #@info "neural network after: $(model(S)[A])"
end

I thought that replacing the gradient line with:

Zygote.gradient(() -> Zygote.hook(clipper, loss(S)), ps)

where

using LinearAlgebra: norm

function clipper(x)
    if norm(x) > 1
        return x ./ norm(x)
    else
        return x
    end
end

should do the trick, but unfortunately this does not work.
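In case it clarifies what I am after: if hook is not the right tool, I suppose I could just modify the Grads object directly before the update, something like the sketch below (pupdate_clipped! is just a made-up name, reusing the clipper above per parameter), but I was hoping there is a cleaner way.

function pupdate_clipped!(S, A, δ, model, α, γ, t)
    loss(x) = log(model(x)[A])
    ps = Flux.params(model)
    gs = Zygote.gradient(() -> loss(S), ps)
    for p in ps
        gs[p] === nothing && continue
        gs[p] .= clipper(gs[p])          # clip each parameter's gradient before using it
        p .+= α * (γ^t) * δ .* gs[p]     # same update as in pupdate! above
    end
end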
Any help would be appreciated!

Is BatchNorm something that you are looking for?

Correct me if I am wrong, but I'm not sure BatchNorm helps me here. I frequently need to pass single arrays into my network, so a BatchNorm layer would just return zeros for single inputs.
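To illustrate what I mean (just a quick sketch, assuming Flux's default BatchNorm settings and forcing training mode so batch statistics are used):

using Flux

bn = BatchNorm(3)              # defaults: γ = 1, β = 0
Flux.trainmode!(bn)            # force batch statistics, as during training
x = randn(Float32, 3, 1)       # a "batch" containing a single sample
bn(x)                          # ≈ zeros(3, 1): each feature is normalised against its own mean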