This video explains why you need the gradient of a noise function.
In most programming languages, you'd be looking at either writing your own noise to get the gradient or diving deep into a library's internals to make it work.
Not here!
using CoherentNoise
using ForwardDiff
using StaticArrays

sampler = opensimplex2_2d(seed=1)

function sample_gradient_forwarddiff(sampler, x, y)
    arr = @SVector [x, y]
    # Close over the sampler so only the coordinates are differentiated.
    f = a -> @inbounds sample(sampler, a[1], a[2])
    return ForwardDiff.gradient(f, arr)
end

println(sample_gradient_forwarddiff(sampler, 0.87, 0.788))
Unfortunately, I had no luck with Enzyme or Zygote. Maybe someone can explain why.
Edit: ReverseDiff works too.
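A ReverseDiff version could look something like this (a sketch, not necessarily the exact code I ran; note it uses a plain Vector rather than an SVector, since ReverseDiff tracks ordinary arrays):

using ReverseDiff

function sample_gradient_reversediff(sampler, x, y)
    arr = [x, y]  # ReverseDiff works on ordinary arrays
    f = a -> sample(sampler, a[1], a[2])
    return ReverseDiff.gradient(f, arr)
end

println(sample_gradient_reversediff(sampler, 0.87, 0.788))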
Hmm, I quickly watched the video – IIUC it's not actually taking the gradient of the sampling function itself; the gradient is taken of the "landscape", which starts out as a random 2D matrix (value = landscape height).
So really what's working is that the ForwardDiff and ReverseDiff packages know to "skip" diffing into the sample(::sampler, ...) call, rather than crashing?
The landscape was likely generated by sampling lots of points in this fashion; each point in the matrix corresponds to one sampled point. In real use cases you would need to sample a whole grid. This was just a proof of concept that it works.
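For example, a minimal sketch of sampling the gradient field on a grid, reusing sample_gradient_forwarddiff from above (the grid extent and resolution are arbitrary placeholders):

# One gradient (an SVector{2,Float64}) per grid point.
xs = range(0.0, 10.0; length=64)
ys = range(0.0, 10.0; length=64)
grad_field = [sample_gradient_forwarddiff(sampler, x, y) for x in xs, y in ys]

# The heights themselves (the landscape values), for comparison.
height_field = [sample(sampler, x, y) for x in xs, y in ys]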
Right, and I'm saying the video talks about taking the gradient of the landscape (i.e. how the height changes with respect to changes in the x, y coordinates); at no point in this process do you need to take the gradient "through" the sample() call.
The sample function takes the sampler and the coordinates and returns the value of the noise (which maps to height in this case). You need to take the gradient of height with respect to x and y, which is exactly what these functions compute. The real deal is a bit more complicated than this, but given that you can take the gradient through the noise function, it's probably not too hard.
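A quick central-difference cross-check (the step size is arbitrary) should agree with the ForwardDiff result if that's right:

# Central finite differences as a sanity check on the AD gradient.
function sample_gradient_fd(sampler, x, y; h=1e-6)
    dx = (sample(sampler, x + h, y) - sample(sampler, x - h, y)) / (2h)
    dy = (sample(sampler, x, y + h) - sample(sampler, x, y - h)) / (2h)
    return (dx, dy)
end

println(sample_gradient_fd(sampler, 0.87, 0.788))           # finite differences
println(sample_gradient_forwarddiff(sampler, 0.87, 0.788))  # ForwardDiff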
using Enzyme

function sample_gradient_enzyme(sampler, x, y)
    arr = @SVector [x, y]
    f = a -> @inbounds sample(sampler, a[1], a[2])
    return Enzyme.gradient(Reverse, f, arr)
end

println(sample_gradient_enzyme(sampler, 0.87, 0.88))
This prints out a wrong gradient, [0.0,0.0].
Forward mode:
function sample_gradient_enzyme(sampler, x, y)
    arr = @SVector [x, y]
    f = a -> @inbounds sample(sampler, a[1], a[2])
    return Enzyme.gradient(Forward, f, arr)
end

println(sample_gradient_enzyme(sampler, 0.87, 0.88))
ERROR: MethodError: no method matching BatchDuplicated(::SVector{2, Float64}, ::Tuple{MVector{2, Float64}, MVector{2, Float64}})
Closest candidates are:
BatchDuplicated(::T, ::Tuple{Vararg{T, N}}) where {T, N} at C:\Users\User\.julia\packages\EnzymeCore\5yOUk\src\EnzymeCore.jl:85
Are you an Enzyme maintainer or something along those lines?
I'm still on Julia 1.8.4, if that matters. (I don't really use new features yet, but I do use LoopVectorization, so I've stuck with a version where it still works.)
I used Enzyme version 0.10.18 (it installed this version for me).
The Reverse mode ran without any error but produced a [0.0, 0.0] gradient. CoherentNoise is version 1.6.6.
I updated StaticArrays to version 1.9.3, but the issue still persists.
Nice! I never thought I'd get to contribute a test case to Enzyme. I was just playing with the gradient because the video said you need the gradient of a noise function, and I decided to try it out.
The video is long so I didn't watch it, but if @jling is right then your problem is deeper than any specific autodiff package. Differentiating a function with stochastic output is not what these packages are made for, and it could reasonably be classified as undefined behavior. Autodiff of stochastic computational graphs is its own research field and has its own libraries (like storchastic), but I suspect that is not what you need here, so your function might have been ill-specified?
This function is quite complicated. The noise function involved has some stochasticity in its computation, but it is cleverly designed to be differentiable everywhere. Its first-order gradient is very well-defined.
Let me elaborate further then. Noise functions typically work on a grid. The stochastic part is only in determining which cell of the grid you're in and what random data is attached to it; once that's determined, the value of the noise at a position is determined by the cell's corner points. The position within the cell is differentiable with respect to the global position everywhere inside that cell. The function is designed so that it is differentiable with respect to the position within the cell, and so that the influence of each corner drops continuously to zero as the point moves away from that corner, reaching exactly zero at the boundary beyond which the corner could no longer influence the value (because the point would then be in a different cell). This makes the function continuous and differentiable everywhere. I'd assume that means that if the autodiff package can handle a piecewise-differentiable function, it can differentiate this. Some variants of these noises are even designed so that the gradient itself is continuous everywhere.
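To make the "influence drops to zero at the boundary" point concrete, here is the standard quintic fade curve used in improved Perlin-style noise (illustrative only; not necessarily the exact polynomial CoherentNoise uses internally). Both the fade and its derivative vanish at the cell edges, which is what keeps the gradient continuous across cells:

# Quintic smoothstep: fade(0) = 0, fade(1) = 1, and fade'(0) = fade'(1) = 0,
# so each corner's weight and the weight's derivative go to zero at the cell edge.
fade(t)  = t^3 * (t * (6t - 15) + 10)    # 6t^5 - 15t^4 + 10t^3
dfade(t) = 30 * t^2 * (t - 1)^2          # its derivative, zero at t = 0 and t = 1

println(fade(0.0), " ", fade(1.0))    # 0.0 1.0
println(dfade(0.0), " ", dfade(1.0))  # 0.0 0.0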
Help me with how to describe it, because all the ways I can think of to describe it involve already knowing the difference… but I was trying to make the point that the gradient propagation doesn't have to pass through sample(), i.e. the reparametrization trick (mathematical statistics - How does the reparameterization trick for VAEs work and why is it important? - Cross Validated) applies in the trivial sense that the sampling is never part of the gradient flow path? It's also possible I'm just mixing things up.
So in a way you sample a random function, and then you fix that function and differentiate it? I feel like this might be represented better by having the noise drawn first (a discrete, grid-like object, probably), and then the function constructed on top of it by interpolation.
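Something like this minimal value-noise sketch captures that picture: draw a random grid once, then build a smooth interpolant on top of it. This is purely illustrative (not how CoherentNoise is implemented), but ForwardDiff differentiates straight through it, and the fade function makes it differentiable even at cell boundaries:

using ForwardDiff, Random

# Draw the random "noise" once: one value per grid node, then never re-drawn.
grid = rand(MersenneTwister(1), 8, 8)

# Quintic smoothstep: zero derivative at 0 and 1, so the interpolant is C^1
# across cell boundaries.
fade(t) = t^3 * (t * (6t - 15) + 10)

# Bilinear interpolation of the fixed grid with faded weights; differentiable
# in (x, y) because only the weights depend on the position.
function valuenoise(grid, x, y)
    i, j = floor(Int, x), floor(Int, y)   # which cell (x, y) falls in
    u, v = fade(x - i), fade(y - j)       # faded position within the cell
    a, b = grid[i + 1, j + 1], grid[i + 2, j + 1]
    c, d = grid[i + 1, j + 2], grid[i + 2, j + 2]
    return (1 - u) * (1 - v) * a + u * (1 - v) * b +
           (1 - u) * v * c + u * v * d
end

# Gradient of the interpolated "landscape" with respect to position.
println(ForwardDiff.gradient(p -> valuenoise(grid, p[1], p[2]), [2.3, 4.7]))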