# Efficient way to chain a series of `diff` calls

Hi,

I would like to call the `diff` function in Base a few times in a row to compute n-th differenced data. Of course, I could `import Base.diff` and write an extended version of the function with an ad-hoc for-loop to compute the desired results. However, I was wondering if there is a simpler way to do it (perhaps through a one-liner, maybe via function composition). What would you suggest I try?

EDIT:

I have found Repeat a function N times in Julia (composition) – Tech Help Notes, but I am not sure whether there are more efficient / faster solutions.
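For reference, the ad-hoc for-loop version alluded to above might look like this (a minimal sketch; the name `ndiff_loop` is made up for illustration):

```julia
# Apply `diff` n times in sequence with a plain loop.
function ndiff_loop(v::AbstractVector, n::Integer)
    for _ in 1:n
        v = diff(v)
    end
    return v
end

ndiff_loop([1.0, 4.0, 9.0, 16.0], 2)  # second differences → [2.0, 2.0]
```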

Try something like this and see if it works:

``````
using IterTools
nth(iterated(diff, v), n)
``````

Thanks, I would prefer a solution without external dependencies (if possible) since I would like to integrate it in a package.

Why not just pick it up as a dependency? Are you trying to write a package with no dependencies?

No, of course not. However, I prefer to minimise dependencies whenever possible to avoid future conflicts.

``````
julia> ndiff(n) = reduce(∘, (diff for _=1:n))
ndiff (generic function with 1 method)

julia> ndiff(10)
diff ∘ diff ∘ diff ∘ diff ∘ diff ∘ diff ∘ diff ∘ diff ∘ diff ∘ diff

julia> @code_typed ndiff(3)(rand(10))
CodeInfo(
1 ─ %1 = Core.getfield(x, 1)::Vector{Float64}
│   %2 = invoke Base.:(var"#_#95")($(QuoteNode(Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}()))::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, diff ∘ diff ∘ diff::ComposedFunction{ComposedFunction{typeof(diff), typeof(diff)}, typeof(diff)}, %1::Vector{Float64})::Vector{Float64}
└──      return %2
) => Vector{Float64}
``````

I have edited your solution to be compatible with the `Base.diff` keyword arguments and compared it with a similar solution from the link included in the OP:

``````
using BenchmarkTools, Random;
import Base.∘;
∘(f, g) = (x...; kw...)->f(g(x...; kw...); kw...);
ndiff1(n) = reduce(∘, (diff for _=1:n));
ndiff2(n) = reduce(∘, ntuple(_ -> diff, n));

Random.seed!(1)
A = randn(50, 200);
``````
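Note that redefining `Base.∘` for generic functions is type piracy and could change behaviour for unrelated code. A dependency-free alternative is a local helper (a sketch; the names `composekw` and `ndiffkw` are hypothetical):

```julia
# Compose f and g while forwarding keyword arguments to both,
# without touching Base.∘.
composekw(f, g) = (x...; kw...) -> f(g(x...; kw...); kw...)

ndiffkw(n) = reduce(composekw, ntuple(_ -> diff, n))

A = [1 2 4; 8 16 32]
ndiffkw(2)(A; dims=2)  # second differences along dims=2: a 2×1 matrix [1; 8]
```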

via `@btime`

``````
julia> @btime ndiff1(10)(A, dims=2);
48.344 μs (28 allocations: 761.03 KiB)

julia> @btime ndiff2(10)(A, dims=2);
46.929 μs (28 allocations: 761.03 KiB)
``````

and they look pretty similar.

EDIT: Ah, I missed the part in your original post where you said you didn’t want to write a manual loop. Oh well. I’m going to leave this up anyway because why not.

Sort of a fun exercise. Maybe something very manual could be faster with enough optimization? Here is a little code that manually computes the binomial coefficients and does the linear convolution by hand. It is not faster than the composed `diff` as written, but the binomial coefficient function accounts for about half the runtime on my computer, so if you forced that into compile time, or just wrote something slightly smarter, I bet it would win out by a meaningful margin.
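For reference, the n-th forward difference expands into a binomial-weighted sum with alternating signs, which is what the convolution in the code below implements:

```latex
(\Delta^n x)_j = \sum_{k=0}^{n} (-1)^{\,n-k} \binom{n}{k}\, x_{j+k}
```

For n = 1 this reduces to the familiar `diff`, i.e. $x_{j+1} - x_j$.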

``````
# See text above, this function is slow and if you fixed this up you'd
# probably see a good speedup.
function binomial_coefs_altsign(n)
    if iseven(n)
        return ntuple(j -> iseven(j-1) ? binomial(n, j-1) : -binomial(n, j-1), n+1)
    else
        return ntuple(j -> iseven(j-1) ? -binomial(n, j-1) : binomial(n, j-1), n+1)
    end
end

function ndiff(x, n)
    conv_coefs = binomial_coefs_altsign(n)
    out = Vector{eltype(x)}(undef, length(x)-n)
    @inbounds for j in eachindex(out)
        tmp = zero(eltype(x))
        # Again, I'm sure this isn't the best way to do this. But you see the point.
        @simd for k in 1:length(conv_coefs)
            tmp += conv_coefs[k]*x[j+k-1]
        end
        out[j] = tmp
    end
    out
end
``````
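One way to sketch the "force the coefficients into compile time" idea is to dispatch on `Val(n)` (the names `ndiff_val` and the `Val`-based method are hypothetical, and whether the compiler fully constant-folds the coefficients depends on the Julia version):

```julia
# Alternating-sign binomial coefficients computed with a Val-length ntuple,
# so the tuple construction can be unrolled and folded at compile time.
binomial_coefs_altsign(::Val{n}) where {n} =
    ntuple(j -> (-1)^(n - (j - 1)) * binomial(n, j - 1), Val(n + 1))

function ndiff_val(x, ::Val{n}) where {n}
    c = binomial_coefs_altsign(Val(n))
    out = Vector{eltype(x)}(undef, length(x) - n)
    @inbounds for j in eachindex(out)
        tmp = zero(eltype(x))
        @simd for k in 1:(n + 1)
            tmp += c[k] * x[j + k - 1]
        end
        out[j] = tmp
    end
    return out
end

ndiff_val([1.0, 4.0, 9.0, 16.0], Val(2))  # → [2.0, 2.0]
```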

I bet with `LoopVectorization.jl` something like this could fly, although I see that you don’t want to add any more dependencies.


It is not clear to me what use could be made of the results produced by these functions…

Comparing, on a small matrix, a function using `reduce()` and one using `nth(iterated(diff))`, the latter seems to be more efficient (at least compared to my `reduce()` version):

``````
using BenchmarkTools, IterTools, Random

diff2(v) = diff(v, dims=2)
ndiff3(v, n) = nth(iterated(diff2, v), n)

Random.seed!(1)
A = randn(5, 3);
julia> @btime reduce((s,c)->diff(s, dims=2) ,1:2,init=A)
297.200 ns (6 allocations: 336 bytes)
5×1 Matrix{Float64}:
-0.31968564074647304
1.6818135824352962
1.2511855101108242
1.2215110676216048
1.3895094561306975

julia> @btime ndiff3(A, 3)
93.165 ns (2 allocations: 240 bytes)
5×1 Matrix{Float64}:
-0.31968564074647304
1.6818135824352962
1.2511855101108242
1.2215110676216048
1.3895094561306975

``````
``````
julia> A = randn(5, 30);

julia> @btime ndiff3(A, 30)
2.962 μs (29 allocations: 19.25 KiB)
5×1 Matrix{Float64}:
-3.418403825078213e8
3.636354157326177e7
-3.2392377884552043e7
2.8511979487494624e8
-2.8843616239420044e8

julia> @btime reduce((s,c)->diff(s, dims=2) ,1:29,init=A)
3.200 μs (33 allocations: 19.34 KiB)
5×1 Matrix{Float64}:
-3.418403825078213e8
3.636354157326177e7
-3.2392377884552043e7
2.8511979487494624e8
-2.8843616239420044e8

julia> A = randn(5, 300);

julia> @btime ndiff3(A, 300)
256.600 μs (299 allocations: 1.75 MiB)
5×1 Matrix{Float64}:
7.268371206834352e88
2.8213726309711136e88
3.4632886383215444e89
-6.612909793775844e88
2.1018201135043341e89

julia> @btime reduce((s,c)->diff(s, dims=2) ,1:299,init=A)
259.000 μs (303 allocations: 1.75 MiB)
5×1 Matrix{Float64}:
7.268371206834352e88
2.8213726309711136e88
3.4632886383215444e89
-6.612909793775844e88
2.1018201135043341e89
``````

For more than about 1000 columns, these functions diverge to Inf and then produce NaN at the next step.


My bad, I posted a buggy version of these functions; I have amended the previous post.

The below refers to the up-to-date functions:

Your `ndiff3` shows marginal improvements under the same tests:

``````
julia> @btime ndiff3(A, 10);
44.090 μs (18 allocations: 686.45 KiB)
``````

However, as mentioned, I would prefer not to add further dependencies, unless strictly necessary.

I have also tried a different option reported in that link (appropriately edited to support keyword arguments):

``````
ndiff4(n) = ∘(ntuple(_ -> diff, n)...);
``````

This was supposed to be faster for small n, but the results are still in the same order of magnitude:

``````
julia> @btime ndiff4(10)(A, dims=2);
47.632 μs (28 allocations: 761.03 KiB)
``````

I was unable to reproduce any useful results with your form of the function.
Instead, I tried to change it in the following way.
But compared with "my" `reduce()` version, it seems about 10× slower for `A = randn(5, 30)`:

``````
julia> diff2(v)=diff(v,dims=2)
diff2 (generic function with 1 method)

julia> ndiff4(n) = ∘(ntuple(_ -> diff2, n)...)
ndiff4 (generic function with 1 method)

julia> @btime ndiff4(29)(A)
39.800 μs (30 allocations: 19.30 KiB)
5×1 Matrix{Float64}:
2.7278188522516155e8
3.2284838861503527e6
-1.794536175117959e7
-9.667977844172284e7
-3.432410437907561e8

julia> @btime reduce((s,c)->diff(s, dims=2) ,1:29,init=A)
3.150 μs (33 allocations: 19.34 KiB)
5×1 Matrix{Float64}:
2.7278188522516155e8
3.2284838861503527e6
-1.794536175117959e7
-9.667977844172284e7
-3.432410437907561e8
``````

FWIW, the `reduce()` function seems to be equivalent here to `foldl()`:

``````
foldl((y,_) -> diff(y; dims=2), 1:29; init=A)
``````
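A quick sanity check that the `foldl` form produces exactly the same result as repeated composition (a minimal sketch, using closures so no keyword-forwarding `∘` is needed):

```julia
using Random

Random.seed!(1)
A = randn(5, 30)

# Compose 29 copies of the same closure; order of composition is irrelevant here.
composed = reduce(∘, ntuple(_ -> (y -> diff(y; dims=2)), 29))
folded = foldl((y, _) -> diff(y; dims=2), 1:29; init=A)

@assert folded == composed(A)       # identical sequence of floating-point ops
@assert size(folded) == (5, 1)      # 29 column differences leave one column
```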