Loop faster than `reduce`?

Hello,

Is there a way to do better than such a loop:

for t in 2:n
     jarray = jack(t, jarray)
end

I tried jarray = reduce((a, t) -> jack(t, a), 2:n; init = jarray), but this is slower than the loop.

Please provide a MWE that includes all necessery code to run it. For example, what’s jack? What’s jarray?

5 Likes

Hello,

The jack function is complex and not short. I’m afraid nobody will try to help me if I post it “as is”. I will try with a simple one to see if the behavior is the same. See you later.

3 Likes

I think it is a little strange that reduce is slower than the loop if jack is a complex function. If most of the processing was being done by jack calls any differences between reduce and the loop would be minimal.

2 Likes

Let me first say that your operation looks to me more like a foldr than a plain reduce (unless there is some commutativity going on…)

As others said, both the for loop and the higher-order function should achieve the same kind of performance in such cases. See for example the following functions:

foo(x, y) = x=>y

function f1(n)
    res = 1
    for t in 2:n
        res = foo(t, res)
    end
    res
end

f2(n) = foldr(foo, n:-1:2, init=1)

which yield the same results:

julia> f1(5)
5 => (4 => (3 => (2 => 1)))

julia> f2(5)
5 => (4 => (3 => (2 => 1)))

and benchmark very similarly:

julia> using BenchmarkTools

julia> @benchmark f1(100)
BenchmarkTools.Trial: 
  memory estimate:  41.70 KiB
  allocs estimate:  99
  --------------
  minimum time:     15.080 ÎĽs (0.00% GC)
  median time:      16.603 ÎĽs (0.00% GC)
  mean time:        21.520 ÎĽs (5.91% GC)
  maximum time:     1.549 ms (97.01% GC)
  --------------
  samples:          10000
  evals/sample:     1

julia> @benchmark f2(100)
BenchmarkTools.Trial: 
  memory estimate:  41.70 KiB
  allocs estimate:  99
  --------------
  minimum time:     16.240 ÎĽs (0.00% GC)
  median time:      17.434 ÎĽs (0.00% GC)
  mean time:        21.865 ÎĽs (6.05% GC)
  maximum time:     1.780 ms (96.66% GC)
  --------------
  samples:          10000
  evals/sample:     1

julia> @benchmark f1(1000)
BenchmarkTools.Trial: 
  memory estimate:  3.90 MiB
  allocs estimate:  1488
  --------------
  minimum time:     1.195 ms (0.00% GC)
  median time:      1.297 ms (0.00% GC)
  mean time:        1.529 ms (12.72% GC)
  maximum time:     5.582 ms (57.30% GC)
  --------------
  samples:          3255
  evals/sample:     1

julia> @benchmark f2(1000)
BenchmarkTools.Trial: 
  memory estimate:  3.90 MiB
  allocs estimate:  1488
  --------------
  minimum time:     1.188 ms (0.00% GC)
  median time:      1.295 ms (0.00% GC)
  mean time:        1.554 ms (13.43% GC)
  maximum time:     5.287 ms (43.73% GC)
  --------------
  samples:          3206
  evals/sample:     1

(the same is true for even less costly operations, such as a plain floating-point sum, and should apply even more for more costly functions…)

I guess this is related to the hypergeometric function you mentioned in a previous post?

You could provide a link to the repository, for those who are interested. But the best thing you could do is to try to compress the problem into a self-contained piece of code that captures the essential part of that function.

It also looks like you are passing in an array, and getting an updated version of that array out. That looks like it might have some performance issues. For example, are you updating the array in-place, or creating a new one for each iteration? How big is the array?

2 Likes

Sorry there was only 10% difference actually. This could be normal fluctuations.

I’m pretty sure you can get a significant speedup if you are willing to share a bit more about what you’re doing. I have a hunch, from looking at

for t in 2:n
     jarray = jack(t, jarray)
end

that it’s really inefficient.