diff! (in-place or preallocated-output version of diff)

The Julia manual nicely makes the case for pre-allocating outputs.
Yet diff, a function used extensively in scientific computing, seems to have no corresponding diff!. I searched and could not find one, so I wrote one myself (with some extra features, such as periodic boundary conditions).
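For readers who want a concrete picture, here is a minimal 1D sketch of the idea. It is not the code referred to above: the name `diff!`, the `periodic` keyword, and the output-length convention are assumptions made for illustration.

```julia
# Hypothetical in-place first difference of a vector (illustrative sketch only).
# With periodic=false, `out` has length(x) - 1 entries, like Base.diff.
# With periodic=true, `out` has length(x) entries and the last one wraps around.
function diff!(out::AbstractVector, x::AbstractVector; periodic::Bool=false)
    Base.require_one_based_indexing(out, x)
    n = length(x)
    expected = periodic ? n : max(n - 1, 0)
    length(out) == expected ||
        throw(DimensionMismatch("out must have length $expected"))
    @inbounds for i in 1:n-1
        out[i] = x[i+1] - x[i]
    end
    if periodic && n > 0
        @inbounds out[n] = x[1] - x[n]   # wrap-around difference
    end
    return out
end

x = [1.0, 4.0, 9.0, 16.0]
out = similar(x, length(x) - 1)   # allocate once, reuse across calls
diff!(out, x)

out_periodic = similar(x)
diff!(out_periodic, x; periodic=true)
```

Because `out` is allocated once by the caller, repeated calls in a loop should not need to allocate a new result array each time.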

Would some version of diff! be appropriate for the standard library, or is it better left to external packages? (And did I reinvent the wheel?)
You might find this package of interest: StaticKernels.


Thanks so much for the pointer to that package!
I updated my version to use @inbounds, then added a comparison, and found that StaticKernels and my diff-specific code had essentially identical compute times. That is a testament to the efficiency of the more general approach in StaticKernels.

My advice to anyone who comes here looking for diff! is to use StaticKernels, but be sure to use map! (which writes into a preallocated output) rather than map (which allocates a new array).
Here’s an equivalent of diff!(out, in, dims=2) for a 2D input in:

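using StaticKernels
# Backward difference along dimension 2: current element minus its left neighbor.
@inline kf(w) = @inbounds w[0,0] - w[0,-1] # for dims=2
k = Kernel{(0:0,-1:0)}(kf)
# Write into the preallocated `out`; edge values are replicated at the boundary.
map!(k, out, extend(in, StaticKernels.ExtensionReplicate()))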
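To make explicit what that kernel computes, here is a plain-loop reading of it using only Base. This is a sketch under an assumption: that the replicate extension behaves like nearest-edge padding, so the output has the same size as the input and its first column is zero. The name `diff2_replicate!` is hypothetical and not part of StaticKernels.

```julia
# Plain-loop reading of the kernel w[0,0] - w[0,-1] with replicated edges.
# `diff2_replicate!` is a hypothetical name used only for this sketch.
function diff2_replicate!(out::AbstractMatrix, a::AbstractMatrix)
    Base.require_one_based_indexing(out, a)
    size(out) == size(a) ||
        throw(DimensionMismatch("out must match the input size"))
    for j in axes(a, 2), i in axes(a, 1)
        jm = max(j - 1, 1)   # at the left edge, reuse the first column
        out[i, j] = a[i, j] - a[i, jm]
    end
    return out
end

a = [1 2 4; 3 6 10]
out = similar(a)
diff2_replicate!(out, a)
# Columns 2:end agree with diff(a, dims=2); column 1 is all zeros.
```

Under that reading, the result differs from Base's diff(a, dims=2) only by the extra leading column of zeros, which is worth keeping in mind when sizing `out`.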