Noise with gradient works!

Tarny_GG_Channie · May 12, 2024, 9:30pm

In this video, it was explained why the gradient of noise was needed.

In most programming languages, you would have to either write your own noise function to get the gradient or dig deep into the library to make it work.

Not here!

using CoherentNoise
using ForwardDiff
using StaticArrays
sampler = opensimplex2_2d(seed=1)

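function sample_gradient_forwarddiff(sampler,x,y)
    # Pack the coordinates into a static vector so both are differentiated at once.
    arr = @SVector [x,y]
    # Noise value as a function of the coordinate vector.
    f = a-> @inbounds sample(sampler,a[1],a[2])
    return ForwardDiff.gradient(f,arr)
end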

println(sample_gradient_forwarddiff(sampler,0.87,0.788))

Unfortunately, there is no luck with Enzyme or Zygote. Maybe someone can explain why.
Edit: ReverseDiff works too.

jling · May 12, 2024, 9:48pm

Are we sure this snippet is taking the same gradient as needed for whatever the video is talking about? No crash != it’s doing the right thing, I guess.

Tarny_GG_Channie · May 12, 2024, 9:52pm

I tried ForwardDiff and ReverseDiff and they gave the same result. Approximating the gradient with finite differences gives roughly the same result too.
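
The finite-difference check mentioned above can be sketched roughly as follows. This is only an illustration: `height` is a hypothetical stand-in for `(x, y) -> sample(sampler, x, y)`, chosen because its gradient is known in closed form, and `finite_difference_gradient` is a made-up helper name, not part of any package.

```julia
# Central-difference approximation of the gradient of f at (x, y).
# `f` is any function of two coordinates; `h` is the step size.
function finite_difference_gradient(f, x, y; h=1e-6)
    dfdx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    dfdy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return (dfdx, dfdy)
end

# Hypothetical stand-in for `(x, y) -> sample(sampler, x, y)`.
height(x, y) = sin(3 * x) * cos(2 * y)

# Analytic gradient of the stand-in, used as the reference.
height_gradient(x, y) = (3 * cos(3 * x) * cos(2 * y), -2 * sin(3 * x) * sin(2 * y))

fd = finite_difference_gradient(height, 0.87, 0.788)
exact = height_gradient(0.87, 0.788)
println(fd)
println(exact)
```

Swapping `height` for the real sampler call and comparing against the autodiff result gives a quick sanity check that the gradient is not just "something that didn't crash".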

jling · May 12, 2024, 10:07pm

hmm I quickly watched the video. IIUC it’s not actually taking the gradient of the sampling function itself; the gradient is taken on the “landscape”, which starts out as a random 2D matrix (value = landscape height).

So really what’s working is that the ForwardDiff and ReverseDiff packages know to “skip” diffing into the sample(::sampler, ...) call, rather than crashing?

Tarny_GG_Channie · May 12, 2024, 10:10pm

The landscape was likely generated by sampling lots of points in this fashion. Each sampled point corresponds to one entry of the matrix. In real use cases, you would need to sample a grid. This was just a proof of concept that it works.

jling · May 12, 2024, 10:15pm

right, and I’m saying the video talks about taking the gradient of the landscape (i.e. height change with respect to x,y coordinate changes); at no point in this process do you need to take the gradient “through” the sample() process
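
To make the "gradient of the landscape" view concrete, here is a minimal sketch that samples a height function on a grid and then estimates the slope purely from neighbouring matrix entries. `height` is again a hypothetical stand-in for the noise sampler, and `landscape_gradient` is an illustrative name; whether the video computes its gradient this way is an assumption.

```julia
# Hypothetical stand-in for `(x, y) -> sample(sampler, x, y)`.
height(x, y) = sin(3 * x) * cos(2 * y)

# Sample the height function on a regular grid: H[i, j] = height(xs[i], ys[j]).
xs = range(0.0, 1.0; length=101)
ys = range(0.0, 1.0; length=101)
H = [height(x, y) for x in xs, y in ys]

# Slope of the landscape at an interior cell (i, j), from its four neighbours.
function landscape_gradient(H, xs, ys, i, j)
    dx = (H[i + 1, j] - H[i - 1, j]) / (xs[i + 1] - xs[i - 1])
    dy = (H[i, j + 1] - H[i, j - 1]) / (ys[j + 1] - ys[j - 1])
    return (dx, dy)
end

g = landscape_gradient(H, xs, ys, 51, 31)   # cell at (x, y) = (0.5, 0.3)
println(g)
```

For a smooth height function, this matrix-based estimate approximates the gradient of the underlying function at the same coordinate, so the two views in this thread describe the same quantity at different resolutions.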

Tarny_GG_Channie · May 12, 2024, 10:17pm

The sample function takes in the noise and the coordinates and returns the value of the noise (which maps to height in this case). You need to take the gradient of height WRT x and y, which is exactly what these are doing. The real deal is a bit more complicated than this, but given that you can take the gradient through the noise function, it’s probably not too hard.

wsmoses · May 12, 2024, 11:39pm

What’s the Enzyme code that didn’t work and corresponding error message?

Tarny_GG_Channie · May 12, 2024, 11:48pm

Reverse mode:

using Enzyme

function sample_gradient_enzyme(sampler,x,y)
    arr = @SVector [x,y]
    f = a-> @inbounds sample(sampler,a[1],a[2])
    return Enzyme.gradient(Reverse,f,arr)
end

println(sample_gradient_enzyme(sampler,0.87,0.88))

This prints out a wrong gradient, [0.0,0.0].

Forward mode:

function sample_gradient_enzyme(sampler,x,y)
    arr = @SVector [x,y]
    f = a-> @inbounds sample(sampler,a[1],a[2])
    return Enzyme.gradient(Forward,f,arr)
end

println(sample_gradient_enzyme(sampler,0.87,0.88))

ERROR: MethodError: no method matching BatchDuplicated(::SVector{2, Float64}, ::Tuple{MVector{2, Float64}, MVector{2, Float64}})
Closest candidates are:
  BatchDuplicated(::T, ::Tuple{Vararg{T, N}}) where {T, N} at C:\Users\User\.julia\packages\EnzymeCore\5yOUk\src\EnzymeCore.jl:85

Are you an Enzyme maintainer or something along that line?

wsmoses · May 13, 2024, 2:29am

Can you post a full log and the versions of the packages you’re using?

FWIW the reverse mode Enzyme one works for me.

julia> println(sample_gradient_enzyme(sampler,0.87,0.88))
[0.07474945410688591, 0.017113162351589075]

I can reproduce the SVector / MVector forward mode mismatch though.

“Are you an Enzyme maintainer or something along that line?”

I dabble from time to time.

Tarny_GG_Channie · May 13, 2024, 3:13am

I still use Julia version 1.8.4, if it matters. (I don’t really use new features yet, but I still use LoopVectorization, so I stuck to the version where it still worked.)
I used Enzyme version 0.10.18 (it installed this version for me).
Reverse mode ran without any error but produced a [0.0, 0.0] gradient. CoherentNoise is at version 1.6.6.

I updated StaticArrays to version 1.9.3 but the issue still persisted.

Tarny_GG_Channie · May 13, 2024, 4:47am

Update: After updating Enzyme to 0.12.5 (and somehow also updating most of my other packages in the process), it works.

wsmoses · May 13, 2024, 5:38am

The forward mode sarray gradient error in Enzyme should be resolved by this PR: “Fix static arrays on forward mode gradient call” by wsmoses (Pull Request #1438, EnzymeAD/Enzyme.jl)

Tarny_GG_Channie · May 13, 2024, 5:58am

Nice! I never thought I’d get to contribute some test case to Enzyme. I was just playing with gradient because the video said that you need the gradient of a noise and I decided to play with it a bit.

gdalle · May 13, 2024, 8:08am

The video is long so I didn’t watch but if @jling is right then your problem is deeper than a specific autodiff package. Differentiating a function with stochastic output is not what these packages are made for, and it could reasonably be classified as undefined behavior. Autodiff of stochastic computational graphs is its own research field and has its own libraries (like storchastic), but I suspect that is not what you need here, so your function might have been ill-specified?

Tarny_GG_Channie · May 13, 2024, 9:14am

This function is quite complicated. The function involved, noise, has some stochasticity in its computation. However, the function is cleverly designed to make it differentiable everywhere. The first-order gradient of the function is very well-defined.

Let me elaborate further then. Noise functions typically work on a grid. The stochastic part is simply in determining which block of the grid you’re in. Once you’ve determined that, the value of the noise at the position is determined by the corner points of the block. The position of the point within the block is differentiable WRT the global position almost everywhere inside the same block.

The function is cleverly designed so that it is differentiable WRT the position within the block, and so that the influence of each corner drops continuously as the point moves away from that corner, reaching zero exactly at the boundary beyond which the corner could no longer influence the value of the noise (because the point has moved to a different block in the grid). This ensures that the function is continuous and differentiable everywhere. I’d assume that means that if the autograd can differentiate a piecewise differentiable function, it can indeed differentiate this. Some versions of the noise are designed so that even the gradient of the function is continuous everywhere.
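
The construction described above can be illustrated with a toy 1D gradient noise. This is a sketch under stated assumptions, not how CoherentNoise implements OpenSimplex2: `fade`, `lattice_slope` and `toy_noise` are hypothetical names, and the hash-based slope is just one convenient deterministic choice.

```julia
# Smooth fade curve with zero first and second derivative at t = 0 and t = 1.
fade(t) = t^3 * (t * (6 * t - 15) + 10)

# Deterministic pseudo-random slope in [-1, 1) for lattice point i (hypothetical hash).
lattice_slope(i::Integer, seed::Integer) = 2 * (hash((i, seed)) % 1000) / 1000 - 1

# 1D gradient noise: blend the two linear ramps anchored at the neighbouring lattice points.
function toy_noise(x::Real; seed::Integer=1)
    i = floor(Int, x)                              # discrete step: which block are we in?
    t = x - i                                      # position inside the block, in [0, 1)
    left = lattice_slope(i, seed) * t              # ramp from the left corner
    right = lattice_slope(i + 1, seed) * (t - 1)   # ramp from the right corner
    return left + fade(t) * (right - left)
end

# One-sided slopes on either side of the block boundary at x = 3.
h = 1e-6
left_slope = (toy_noise(3.0) - toy_noise(3.0 - h)) / h
right_slope = (toy_noise(3.0 + h) - toy_noise(3.0)) / h
println((left_slope, right_slope))
```

The only discrete step is picking the block with `floor`. Because `fade` has zero slope at both ends of the block, the value and the first derivative agree on both sides of each boundary, which is why the two printed slopes should match closely.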

Tarny_GG_Channie · May 13, 2024, 10:34pm

Oops… I was perhaps confusing stochastic with discrete. The noise process is entirely deterministic.

jling · May 13, 2024, 11:35pm

yeah I didn’t continue since I think that’s an unrelated tangent. the video is NOT about StochasticAD.jl (gaurav-arya/StochasticAD.jl on GitHub, a research package for automatic differentiation of programs containing discrete randomness).

Help me with how to describe it, because all the ways I can think of to describe it involve already knowing the difference… but I was trying to make the point that the gradient propagation doesn’t have to pass through sample(), i.e. the reparametrization trick (see “How does the reparameterization trick for VAEs work and why is it important?” on Cross Validated) works in the trivial sense that it’s never part of the gradient flow path? It’s also possible I’m just mixing things up.

Tarny_GG_Channie · May 14, 2024, 1:24am

The program output is pseudo-random WRT the seed, but is continuous and differentiable WRT the coordinates.

gdalle · May 14, 2024, 6:46am

So in a way you sample a random function, and then you fix that function and differentiate with respect to it? I feel like this might be represented better by having the noise drawn first (a discrete, grid-like object probably), and then the function constructed on top by interpolation.
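
A minimal sketch of that two-step picture, assuming a plain grid of random values and smoothstep-weighted bilinear interpolation (the names `grid`, `smooth` and `interpolated_height` are illustrative, and this is value noise rather than the OpenSimplex2 noise used earlier in the thread):

```julia
using Random

# Step 1: draw the randomness once, as a fixed grid of values.
rng = MersenneTwister(1)
grid = rand(rng, 8, 8)

# Smoothstep weight: 0 at t = 0, 1 at t = 1, with zero slope at both ends.
smooth(t) = t * t * (3 - 2 * t)

# Step 2: a deterministic function of (x, y) built on top of the fixed grid.
# Valid for 1 <= x < size(grid, 1) and 1 <= y < size(grid, 2).
function interpolated_height(grid, x, y)
    i, j = floor(Int, x), floor(Int, y)
    u, v = smooth(x - i), smooth(y - j)
    bottom = (1 - u) * grid[i, j] + u * grid[i + 1, j]
    top = (1 - u) * grid[i, j + 1] + u * grid[i + 1, j + 1]
    return (1 - v) * bottom + v * top
end

println(interpolated_height(grid, 2.5, 3.25))
```

Once `grid` is fixed, all randomness is gone and `interpolated_height` is an ordinary deterministic function of the coordinates, so differentiating it WRT `x` and `y` raises no stochastic-autodiff questions.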