How to do gradient clipping in Julia for large for loops

Devetak · January 5, 2024, 6:26pm

Hello.

I have a function with a very long for loop (many iterations). For example:

using ForwardDiff

function f(x)
    res = 1.0
    for i in 1:1000000
        res = res * x^2
    end
    return res / x   # mathematically x^(2*1000000 - 1); overflows to Inf for x = 2.0
end

Of course, then

ForwardDiff.derivative(f, 2.0) == Inf

How do I do gradient clipping? I am not sure that is the right name for it. What I want is to normalize the gradient after every iteration of the for loop so that the final result is a direction that makes sense.

In particular, my input is an array and I want to preserve the relative importance of the gradient components. Say x[1] is very impactful, so it has gradient 0.999, and x[2] is not very impactful, so it has gradient 0.00001, and so on…
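A minimal sketch of the distinction that matters for keeping relative importance, assuming the gradient vector `g` has already been computed and is finite: rescaling the whole vector by one positive factor (norm-based clipping, or plain normalization) keeps every ratio `g[i] / g[j]`, while clamping each component separately does not. The helper names `clip_by_norm` and `clip_by_value` are hypothetical, not taken from any package.

```julia
using LinearAlgebra  # norm

# Norm-based clipping (hypothetical helper): if the gradient norm exceeds c,
# shrink the whole vector by one positive factor, so all ratios g[i]/g[j] survive.
function clip_by_norm(g::AbstractVector{<:Real}, c::Real)
    n = norm(g)
    return n > c ? g .* (c / n) : g
end

# Value clipping (hypothetical helper): clamp each component to [-c, c] independently.
clip_by_value(g::AbstractVector{<:Real}, c::Real) = clamp.(g, -c, c)

g  = [999.0, 0.01]          # made-up gradient: x[1] far more influential than x[2]
gn = clip_by_norm(g, 1.0)   # norm 1, same direction as g
gv = clip_by_value(g, 1.0)  # [1.0, 0.01]

println(gn[1] / gn[2])      # ratio preserved (≈ 99900)
println(gv[1] / gv[2])      # ratio distorted (≈ 100)
```

Neither helper rescues a gradient that already contains Inf or NaN: if any entry is Inf, `norm(g)` is Inf, the factor `c / n` is 0.0, and `Inf * 0.0` is NaN. Clipping only helps if it is applied before the numbers overflow.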

I am not sure how to do this or how to actually make it work, or, MORE IMPORTANTLY, whether there are better things to do in this case.

PS: Maybe this is not the best example, since f(2.0) == Inf. In practice my gradient comes out as NaN, which is even less informative than Inf.
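One common way an overflowed loop turns Inf into NaN, sketched with plain floats standing in for the value and derivative that forward mode carries (the names `res` and `dres` are illustrative, and whether ForwardDiff follows exactly this arithmetic internally is an assumption): once both have overflowed, the quotient rule for `res / x` subtracts Inf from Inf.

```julia
# Value and derivative of res after the loop, assumed to have overflowed
res, dres = Inf, Inf
x = 2.0

value = res / x                 # Inf / 2.0 == Inf
deriv = dres / x - res / x^2    # quotient rule: Inf - Inf == NaN

println((value, deriv))         # (Inf, NaN)
```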

gdalle · January 5, 2024, 8:08pm

Hi,
Can you tell us a little more about the context, why you need AD, why your function might diverge, etc?

stevengj · January 6, 2024, 12:40am

Do you have an example of a function that is finite and differentiable but the gradient still comes out to be NaN?

If your function is theoretically finite but is overflowing to Inf due to the finite floating-point precision, maybe you should instead compute the logarithm or similar. For example,

function logf(x)
    logres = 0.0
    for i in 1:1000000
        logres += 2 * log(x)
    end
    return logres - log(x)
end

computes the logarithm of your function f(x) above, but without overflowing, and both logf and its derivative work fine:

julia> logf(2.0)
1.3862936679852684e6

julia> ForwardDiff.derivative(logf, 2.0)
999999.5
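For an array input, the same idea also covers the "relative importance" part of the question, assuming f(x) > 0: since ∇f = f ∇(log f) and f is a positive scalar, ∇(log f) points in exactly the same direction as ∇f, so normalizing it gives the unit direction with all component ratios intact, without ever forming the overflowing f. The function below is a hypothetical vector analogue of the example, f(x) = (sum of x[i]^2)^N / x[1]; the gradient is accumulated by hand with the chain rule only to keep the sketch dependency-free.

```julia
using LinearAlgebra  # normalize

# Hypothetical vector analogue of f: f(x) = (sum(x.^2))^N / x[1], assuming x[1] > 0.
# Returns log f(x) and ∇ log f(x), both accumulated inside the loop.
function logf_and_grad(x::AbstractVector{Float64}; N::Int = 1_000_000)
    S = sum(abs2, x)
    logres = 0.0
    g = zeros(length(x))
    for _ in 1:N
        logres += log(S)          # log of res = res * S
        g .+= (2 / S) .* x        # ∇ log(S) = 2x / S
    end
    logres -= log(x[1])           # log of res / x[1]
    g[1] -= 1 / x[1]              # ∇ of -log(x[1])
    return logres, g
end

logval, g = logf_and_grad([2.0, 1.0])
dir = normalize(g)   # unit vector; same direction as ∇f because f = exp(logval) > 0
```

If you would rather not derive the gradient by hand, `ForwardDiff.gradient` applied to a function that returns only `logres` should give the same `g`.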

Devetak · January 18, 2024, 10:10am

Hi. Sorry for the radio silence I was sick.

The NaNs came from an unrelated bug. Nevertheless, to answer @gdalle’s question:

I just have a very “long” function (i.e. many loop iterations) that I want to differentiate. It’s quite similar to applying an RNN many times, so of course the gradients either explode or go to zero.
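A short worked equation for why long loops make derivatives explode or vanish, using a scalar state h_t updated once per iteration (this notation is introduced here for illustration): by the chain rule, the derivative through T iterations is a product of T per-step factors, so its size behaves like a per-step factor raised to the power T.

```latex
\frac{\partial h_T}{\partial h_0} = \prod_{t=1}^{T} \frac{\partial h_t}{\partial h_{t-1}},
\qquad
h_t = w\,h_{t-1} \;\Rightarrow\;
\left|\frac{\partial h_T}{\partial h_0}\right| = |w|^{T}
\;\xrightarrow{\;T \to \infty\;}\;
\begin{cases} \infty, & |w| > 1, \\ 0, & |w| < 1. \end{cases}
```

In the example f from the first post, each iteration multiplies res by x^2, so for x = 2.0 the per-step factor is 4 and the product over 10^6 iterations is 4^(10^6), far beyond the largest Float64 (about 1.8e308).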

I read that gradient clipping helps in that case. I wanted to implement the same thing in ForwardDiff (and/or Zygote) but could not find any resources.
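As commonly used for RNNs, gradient clipping is applied to the final gradient, just before the parameter update, rather than inside the forward computation. A minimal sketch on a made-up one-parameter problem, loss(w) = w^4 starting from w = 10, with the derivative 4w^3 written by hand (in practice it would come from ForwardDiff or Zygote); the learning rate, threshold, step count and the helper names `grad`, `clip` and `descend` are arbitrary, hypothetical choices. If you train with Flux, Optimisers.jl provides `ClipNorm` and `ClipGrad` rules for this; check its documentation for the exact usage.

```julia
# Hypothetical one-parameter problem: loss(w) = w^4, so dloss/dw = 4w^3.
grad(w) = 4w^3

# Scalar version of norm clipping: keep the sign, cap the magnitude at c.
clip(g, c) = abs(g) > c ? sign(g) * c : g

function descend(w0; lr = 0.1, steps = 200, c = Inf)
    w = w0
    for _ in 1:steps
        g = clip(grad(w), c)   # clip the computed gradient, then update
        w -= lr * g
    end
    return w
end

w_plain   = descend(10.0)           # c = Inf: no clipping; overshoots, overflows, ends as NaN
w_clipped = descend(10.0; c = 1.0)  # each step moves w by at most lr * c = 0.1
```

Clipping caps the step size but does not repair a broken gradient: `clip(Inf, 1.0)` returns 1.0 (silently hiding the overflow), and `clip(NaN, 1.0)` stays NaN.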

To make it clearer: what I want is to renormalize the gradient after every iteration of the for loop so that it never blows up.
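Renormalizing the derivative alone after each iteration would change the answer, because by the product rule the next step's derivative still depends on the unscaled value. For loops like the example f, whose update is linear in the pair (value, derivative), one safe variant is to divide value and derivative by the same positive factor each iteration and keep the logarithm of the discarded factors. Below is a hand-written forward-mode sketch for the scalar example from the first post; the name `f_rescaled` is hypothetical, x is assumed positive, and the trick relies on that linearity, so it does not carry over directly to a nonlinear recurrence such as an RNN.

```julia
# Forward-mode derivative of f(x) = x^(2N) / x from the first post,
# rescaling value and derivative together each iteration (assumes x > 0).
function f_rescaled(x::Float64; N::Int = 1_000_000)
    res, dres = 1.0, 0.0      # res and d(res)/dx, both divided by exp(logscale)
    logscale = 0.0            # log of the total factor divided out so far
    for _ in 1:N
        # product rule for res * x^2 (the right-hand side uses the old res)
        res, dres = res * x^2, dres * x^2 + res * 2x
        # divide both by one positive factor: the ratio dres / res is unchanged
        s = abs(res)
        res /= s
        dres /= s
        logscale += log(s)
    end
    # quotient rule for res / x
    val  = res / x
    dval = dres / x - res / x^2
    # true values: f(x) = val * exp(logscale), f'(x) = dval * exp(logscale)
    return val, dval, logscale
end

val, dval, logscale = f_rescaled(2.0)
dval / val            # f'(x) / f(x), finite even though f(2.0) overflows
logscale + log(val)   # log f(2.0), the quantity stevengj's logf computes
```

For a scalar input the "direction" is only a sign, so the informative quantity is the ratio dval / val = f'(x) / f(x), which is the derivative of log f; for x = 2.0 it comes out as 999999.5, matching `ForwardDiff.derivative(logf, 2.0)` above.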

I thought this could be a nice post to have for the community.
