How to do gradient clipping in Julia for large for loops

Hello.

I have a function that loops many times. For example:

function f(x)
    res = 1.0
    for i in 1:1000000
        res = res * x^2
    end
    return res / x
end

Of course, then

ForwardDiff.derivative(f, 2.0) == Inf

How do I do gradient clipping? I am not sure that is the right name for it. What I want to do is normalize the gradient after every iteration of the for loop so that the final answer is a direction that makes sense.

In particular, my input is an array and I want to preserve the relative importance of the gradient components. Say x[1] is very impactful so it has gradient 0.999, and x[2] is not very impactful so it has gradient 0.00001, and so on…

I am not sure how to do it or how to actually make this work, or, MORE IMPORTANTLY, whether there are better approaches in this case.
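To make it concrete, here is a rough, untested sketch of the kind of “clipping” I mean, applied to a full gradient vector (clip_by_norm is just a name I made up): rescale the gradient when its norm gets too large, so the components keep their relative sizes.

using ForwardDiff
using LinearAlgebra: norm

# Rescale the gradient g whenever its norm exceeds maxnorm; the ratios
# between the components (their relative importance) are preserved.
function clip_by_norm(g, maxnorm)
    n = norm(g)
    return n > maxnorm ? g .* (maxnorm / n) : g
end

g = ForwardDiff.gradient(x -> sum(abs2, x), [3.0, 4.0])  # [6.0, 8.0], norm 10
clip_by_norm(g, 1.0)                                     # ≈ [0.6, 0.8]

But this only rescales the final gradient after the fact; what I would really like is to do something like this inside the loop, before things blow up to Inf or NaN.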

PS: Maybe this is not the best example, as f(2.0) == Inf. In practice my gradient comes out to be NaN, which is less informative than Inf.

Hi,
Can you tell us a little more about the context: why you need AD, why your function might diverge, etc.?

Do you have an example of a function that is finite and differentiable but the gradient still comes out to be NaN?

If your function is theoretically finite but is overflowing to Inf due to the finite range of floating-point numbers, maybe you should instead compute the logarithm or something similar. For example,

function logf(x)
    logres = 0.0
    for i in 1:1000000
        logres += 2 * log(x)   # accumulate log(x^2) instead of multiplying by x^2
    end
    return logres - log(x)     # corresponds to the final division by x
end

computes the logarithm of your function f(x) above, but without overflowing, and both logf and its derivative work fine:

julia> logf(2.0)
1.3862936679852684e6

julia> ForwardDiff.derivative(logf, 2.0)
999999.5

Hi. Sorry for the radio silence, I was sick.

The NaNs were part of an unrelated bug. Nevertheless, to answer @gdalle’s question:

I just have a very “long” function (i.e. many loop iterations) that I want to take a derivative of. It's quite similar to applying an RNN multiple times. Of course the gradients then explode or go to zero.

I read that gradient clipping helps in that case. I wanted to implement the same in ForwardDiff (and/or Zygote) but could not find any resource on it.

To make it clearer: what I want is to renormalize the gradient after every iteration of the for loop so that it never blows up.
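Something like the following untested sketch is what I have in mind for ForwardDiff. The helper clip_partials is just a name I made up, and it assumes the Dual internals (ForwardDiff.value, ForwardDiff.partials, the Partials arithmetic, and the Dual{T} constructor) can be used this way:

using ForwardDiff

# Plain numbers pass through unchanged (when the function is called without AD).
clip_partials(x::Real, maxnorm) = x

# Rescale the partials of a Dual so that their norm never exceeds maxnorm,
# keeping their relative sizes (the "direction" of the derivative).
function clip_partials(d::ForwardDiff.Dual{T}, maxnorm) where {T}
    p = ForwardDiff.partials(d)
    n = sqrt(sum(abs2, p))
    n <= maxnorm && return d
    return ForwardDiff.Dual{T}(ForwardDiff.value(d), p * (maxnorm / n))
end

function f_clipped(x)
    res = one(x)
    for i in 1:1000000
        res = clip_partials(res * x^2, 1.0)  # renormalize the partials every iteration
    end
    return res / x
end

Note that this only rescales the derivative part of the Dual: for my toy f above the value itself still overflows to Inf, so the log-space rewrite suggested earlier is probably the better fix there. It would mainly make sense when the value stays bounded (as in an RNN) and only the derivative explodes.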

I thought this could be a nice post to have for the community.