I would like to call the diff function in Base a few times in a row to compute n-th differenced data. Of course, I could import Base.diff and write an extended version of the function with an ad-hoc for-loop to compute the desired results. However, I was wondering if there's a simpler way to do it (perhaps a one-liner, maybe through function composition). What would you suggest I try?

I have edited your solution to be compatible with the Base.diff keyword arguments and compared it with a similar solution from the link included in the OP:

using BenchmarkTools, Random;
# Redefine composition so keyword arguments are forwarded through the whole
# chain (note: this overwrites Base's generic two-argument ∘ method).
import Base.∘;
∘(f, g) = (x...; kw...) -> f(g(x...; kw...); kw...);
# n-fold composition of diff, built in two equivalent ways:
ndiff1(n) = reduce(∘, (diff for _ = 1:n));
ndiff2(n) = reduce(∘, ntuple(_ -> diff, n));
Random.seed!(1)
A = randn(50, 200);
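As a quick sanity check (not the benchmark itself), the composed version should agree with nested diff calls, keywords included. A standalone sketch, restating the definitions above but using a fresh helper name (kwcomp, a made-up name) instead of overwriting Base's ∘:

```julia
using Random

# Keyword-forwarding composition: same idea as redefining ∘ above, written as
# a separate helper so Base's ∘ is left untouched.
kwcomp(f, g) = (x...; kw...) -> f(g(x...; kw...); kw...)
ndiff1(n) = reduce(kwcomp, (diff for _ = 1:n))

Random.seed!(1)
A = randn(50, 200)

# The 3-fold composition should match three nested diff calls exactly,
# with the dims keyword forwarded to every application.
@assert ndiff1(3)(A; dims=2) == diff(diff(diff(A; dims=2); dims=2); dims=2)
```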

EDIT: Ah, I missed the part in your original post where you said you didn’t want to write a manual loop. Oh well. I’m going to leave this up anyway because why not.

Sort of a fun exercise. Maybe something very manual could be faster with enough optimizations? Here's a little code that manually computes the binomial coefficients and does the linear convolution by hand. As written, it is not faster than the composed diff, but the binomial-coefficient function is about half the runtime on my computer, so if you forced that into compile time, or just wrote something slightly smarter, I bet it would win out by a meaningful margin.

# See text above, this function is slow and if you fixed this up you'd
# probably see a good speedup.
function binomial_coefs_altsign(n)
    if iseven(n)
        return ntuple(j -> iseven(j-1) ? binomial(n, j-1) : -binomial(n, j-1), n+1)
    else
        return ntuple(j -> iseven(j-1) ? -binomial(n, j-1) : binomial(n, j-1), n+1)
    end
end

function ndiff(x, n)
    conv_coefs = binomial_coefs_altsign(n)
    out = Vector{eltype(x)}(undef, length(x) - n)
    @inbounds for j in eachindex(out)
        tmp = zero(eltype(x))
        # Again, I'm sure this isn't the best way to do this. But you see the point.
        @simd for k in 1:length(conv_coefs)
            tmp += conv_coefs[k] * x[j+k-1]
        end
        out[j] = tmp
    end
    out
end
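As a quick correctness check, the alternating binomial coefficients reproduce n applications of diff on a vector. Here's a standalone sketch (nthdiff is a throwaway name; the coefficients are restated in the closed form (-1)^(n-k+1) * binomial(n, k-1), which gives the same values as binomial_coefs_altsign above):

```julia
# n-th difference by direct convolution with alternating binomial coefficients.
nthdiff(x, n) = [sum((-1)^(n - k + 1) * binomial(n, k - 1) * x[j + k - 1] for k in 1:n+1)
                 for j in 1:length(x)-n]

x = randn(100)
# n applications of diff, for comparison.
ref = foldl((v, _) -> diff(v), 1:4; init = x)
@assert nthdiff(x, 4) ≈ ref
```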

I bet with LoopVectorization.jl something like this could fly, although I see that you don’t want to add any more dependencies.

It is not clear to me what use can be made of the results produced by these functions…

Comparing, on a small matrix, a function built with reduce() and one built with nth(iterated(diff)), the latter seems to be more efficient (at least more so than my reduce() version).
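For reference, the nth(iterated(diff, x)) idiom (iterated and nth come from IterTools.jl) can be written with Base alone as a fold that applies diff n times. A sketch, where applyn is a made-up helper name:

```julia
# Apply f to x a total of n times; a Base-only stand-in for
# IterTools' nth(iterated(f, x), n + 1).
applyn(f, x, n) = foldl((acc, _) -> f(acc), 1:n; init = x)

x = collect(1.0:10.0)
applyn(diff, x, 2)   # second difference; zero for linearly spaced data
```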

I was unable to reproduce any useful results with the function in the form you gave.
Instead I tried to change it in the following way.
But compared with "my" reduce() version it seems to perform about 10× worse, for A = randn(5, 30);