Auto-diff Friendly GPU Stencils

Handwritten CUDA.jl kernels recompile for different number types (dual numbers for forward mode, for example), so that’s fine. That said, stencils are linear operators, so directly defining their derivatives isn’t hard: I’d just define the derivative overload for NNlib.jl’s conv and use it.
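
To make the "linear operator" point concrete, here is a minimal CPU sketch (not the NNlib.jl or CUDA.jl code itself). A stencil is y = A*x for a fixed sparse matrix A, so the reverse-mode pullback is just A' applied to the incoming gradient. The names `stencil` and `stencil_pullback`, the three-point weights, and the zeroed boundary entries are all assumptions made for illustration.

```julia
using LinearAlgebra  # stdlib; only needed for `dot` in the check below

# Forward pass: hypothetical 3-point stencil with weights w applied to the
# interior of x. Boundary entries of the output are left at zero (assumption).
function stencil(w, x)
    y = zero(x)
    for i in 2:length(x)-1
        y[i] = w[1] * x[i-1] + w[2] * x[i] + w[3] * x[i+1]
    end
    return y
end

# Pullback with respect to x: because y = A*x is linear, the gradient of x
# is A' * ybar, i.e. each ybar[i] is scattered back with the same weights.
function stencil_pullback(w, ybar)
    xbar = zero(ybar)
    for i in 2:length(ybar)-1
        xbar[i-1] += w[1] * ybar[i]
        xbar[i]   += w[2] * ybar[i]
        xbar[i+1] += w[3] * ybar[i]
    end
    return xbar
end

# Usage: second-difference weights, plus the adjoint identity
# dot(A*x, v) == dot(x, A'*v) as a sanity check on the pullback.
w = (1.0, -2.0, 1.0)
x = collect(1.0:6.0) .^ 2
v = collect(1.0:6.0)
y = stencil(w, x)
@assert dot(y, v) ≈ dot(x, stencil_pullback(w, v))
```

In practice one would hand the pullback to the AD system, for instance through a ChainRulesCore.jl `rrule`. Away from the boundaries, the scatter above can also be written as a gather with the weights reversed, which is probably the easier form to put in a GPU kernel.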