# How to do gradient clipping in Julia for large for loops

Hello.

I have a function that has many for loops. For example:

```julia
function f(x)
    res = 1.0
    for i in 1:1000000
        res = res * x^2
    end
    return res / x
end
```

of course then

```julia
ForwardDiff.derivative(f, 2.0) == Inf
```

How do I do gradient clipping? I am not sure that is the right name for it. What I want is to normalize the gradient after every iteration of the for loop, so that the final answer is a direction that makes sense.

In particular, my input is an array and I want to preserve the relative importance of the gradients. Say `x[1]` is very impactful, so it has gradient `0.999`, while `x[2]` is not very impactful and has gradient `0.00001`, and so on…
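That is, something that just rescales the whole gradient vector would be fine, since dividing by a common norm keeps the ratios intact:

```julia
using LinearAlgebra: norm

g = [0.999, 0.00001]
ĝ = g / norm(g)  # rescaled to unit norm, but g[1]/g[2] is unchanged
```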

I am not sure how to do it, how to actually make this work, or, more importantly, whether there are better things to do in this case.

PS: Maybe this is not the best example, since `f(2.0) == Inf`. In practice my gradient comes out as `NaN`, which is less informative than `Inf`.

Hi,
Can you tell us a little more about the context, why you need AD, why your function might diverge, etc?

Do you have an example of a function that is finite and differentiable but the gradient still comes out to be `NaN`?

If your function is theoretically finite but is overflowing to `Inf` due to the finite floating-point precision, maybe you should instead compute the logarithm or similar. For example,

```julia
function logf(x)
    logres = 0.0
    for i in 1:1000000
        logres += 2 * log(x)
    end
    return logres - log(x)
end
```

computes the logarithm of your function `f(x)` above, but without overflowing, and both `logf` and its derivative work fine:

```julia
julia> logf(2.0)
1.3862936679852684e6

julia> ForwardDiff.derivative(logf, 2.0)
999999.5
```

Hi. Sorry for the radio silence, I was sick.

The NaNs were part of an unrelated bug. Nevertheless, to answer @gdalle's question:

I just have a very "long" function (i.e. many loops) that I want to take a derivative of. It's quite similar to applying an RNN multiple times. Of course the gradients then explode or go to zero.

I read that gradient clipping helps in that case. I wanted to implement the same thing in ForwardDiff (and/or Zygote) but could not find any resources.

To make it clearer: what I want is to renormalize the gradient after every iteration of the for loop, so that it never blows up.
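For ForwardDiff, since the "gradient" is carried along in the partials of a `Dual` number, one way to sketch this is a small helper that rescales those partials in place after every iteration. `clip_dual` is a made-up name, not part of ForwardDiff, and note that the result is then no longer the true derivative, only a bounded direction:

```julia
using ForwardDiff
using ForwardDiff: Dual, value, partials

# Hypothetical helper (not part of ForwardDiff): rescale the partials of a
# Dual so their norm never exceeds `maxnorm`, leaving the primal value alone.
function clip_dual(d::Dual{T}, maxnorm) where {T}
    p = partials(d)
    n = sqrt(sum(abs2, p))
    n <= maxnorm && return d
    return Dual{T}(value(d), p * (maxnorm / n))
end
clip_dual(x::Real, maxnorm) = x  # plain numbers pass through unchanged

function f_clipped(x)
    res = one(x)
    for i in 1:1_000_000
        res = clip_dual(res * x^2, 1e3)  # clip after every iteration
    end
    return clip_dual(res / x, 1e3)      # clip the final division too
end

# `x` must stay close to 1 so the primal value itself does not overflow;
# once the value hits Inf, clipping the partials cannot help anymore.
ForwardDiff.derivative(f_clipped, 1.0001)
```

The clipped result only preserves the sign/direction information, which seems to be what you are after.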

I thought this could be a nice post for the community to have.
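For the Zygote side, a reverse-mode analogue might be `Zygote.hook`, which is the identity on the forward pass but applies a function of your choice to the adjoint of an intermediate value during the backward pass. A rough sketch, where `clipgrad` is a made-up helper and the loop is kept short:

```julia
using Zygote
using LinearAlgebra: norm

# Hypothetical helper: shrink an adjoint so its norm never exceeds `maxnorm`.
clipgrad(g, maxnorm) = (n = norm(g); n > maxnorm ? g * (maxnorm / n) : g)

function f_hooked(x)
    res = 1.0
    for i in 1:1000
        # Zygote.hook leaves `res * x^2` unchanged on the forward pass;
        # on the reverse pass it clips the adjoint flowing back into it.
        res = Zygote.hook(ḡ -> clipgrad(ḡ, 1.0), res * x^2)
    end
    return res / x
end

Zygote.gradient(f_hooked, 1.0001)
```

The same caveat as in forward mode applies: the result is a bounded surrogate for the gradient, not the true gradient itself.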