More upgrades, more slowdown?

Here is a small benchmark:

using BenchmarkTools

powersign(i) = ifelse(i % 2 == 0, 1, -1)

function leibniz(n)
    s = 0
    for i = 0:n
        s += powersign(i) / (2i + 1)
    end
    4s
end

n = 10^5
@benchmark leibniz(n)

and the result for multiple versions of Julia:

         v1.0.2    v1.1.0    v1.4.2    v1.5.0
min.     173.301   186.601   199.325   204.133
median   179.043   186.773   199.453   204.204
mean     181.39    190.515   203.224   207.635
max.     369.603   536.942   464.297   424.371

(all units are μs)

Why are later versions slower?

On my machine:

          v1.0.5    v1.1.1    v1.2.0    v1.3.1    v1.4.2    v1.5.0
minimum   133.699   133.699   135.899   138.000   151.299   155.300
median    138.100   138.100   138.100   138.100   151.499   155.599
mean      140.788   145.842   141.484   150.637   154.460   159.071
maximum   736.101   674.799   659.399   639.000   715.301   767.301

And before someone mentions it, these timings are with the benchmark argument interpolated:
@benchmark leibniz($n)


On mine, 1.5 looks a little slower, by roughly the same percentage as for @greg_plowman:

v1.3.0

  minimum time:     117.799 μs (0.00% GC)
  median time:      120.200 μs (0.00% GC)
  mean time:        123.794 μs (0.00% GC)
  maximum time:     612.300 μs (0.00% GC)

v1.5.0-beta1

  minimum time:     134.399 μs (0.00% GC)
  median time:      135.399 μs (0.00% GC)
  mean time:        140.439 μs (0.00% GC)
  maximum time:     663.200 μs (0.00% GC)

It seems to depend on the CPU or OS.

Intel Core i5-5250U / Ubuntu 18.04.5
                            v1.0.5    v1.1.1    v1.2.0    v1.3.1    v1.4.2    v1.5.0
minimum                     178.450   185.680   173.397   173.299   199.317   204.126
median                      179.032   186.778   179.037   179.031   199.456   204.186
mean                        182.881   189.929   183.635   183.448   203.279   208.294
maximum                     354.239   404.800   370.055   368.178   391.177   401.432
ratio of median to v1.0.5   1.000     1.043     1.000     1.000     1.114     1.141
ratio of mean to v1.0.5     1.000     1.039     1.004     1.003     1.112     1.139

Intel Core i7-7660U / macOS 10.15.6
                            v1.0.5    v1.1.1    v1.2.0    v1.3.1    v1.4.2    v1.5.0
minimum                     117.882   126.724   125.325   125.344   150.376   136.314
median                      125.338   127.701   125.335   125.356   150.405   137.267
mean                        127.036   130.031   127.712   127.969   153.818   139.832
maximum                     903.391   343.934   402.069   436.871   451.494   349.415
ratio of median to v1.0.5   1.000     1.019     1.000     1.000     1.200     1.095
ratio of mean to v1.0.5     1.000     1.024     1.005     1.007     1.211     1.101

using BenchmarkTools

function powersign(i)
	if i % 2 == 0
		1
	else
		-1
	end
end

function leibniz(n)
	s = 0
	for i = 0:n
		s += powersign(i) / (2i + 1)
	end
	4s
end

n = 10^5
result = @benchmark leibniz($n)
show(stdout, MIME"text/plain"(), result)

Can you try changing the initialisation of s from s = 0 to s = 0.0? Note that the first case is type unstable, as s gets converted to Float64 inside the loop.

If I make that change, runtimes drop by about 30% using Julia 1.5 on my machine. For instance, the median drops from 128 μs to 91 μs. I’m curious to see if older Julia versions are still faster in that case.
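To make the suggestion concrete, here is a minimal sketch of the two variants side by side. The name leibniz_stable is mine, not from the thread, and I am assuming Base.return_types is an acceptable way to peek at what the compiler infers (@code_warntype gives a more detailed view interactively). It does not measure anything; it only shows that the two versions differ in inferred types, not in results.

```julia
powersign(i) = ifelse(i % 2 == 0, 1, -1)

# Original form: `s` starts as an Int and becomes a Float64 after the first addition.
function leibniz(n)
    s = 0
    for i = 0:n
        s += powersign(i) / (2i + 1)
    end
    4s
end

# Hypothetical variant (the name is an assumption): `s` is a Float64 from the start.
function leibniz_stable(n)
    s = 0.0
    for i = 0:n
        s += powersign(i) / (2i + 1)
    end
    4s
end

# Ask the compiler which return type it infers for an Int argument.
println(Base.return_types(leibniz, (Int,)))
println(Base.return_types(leibniz_stable, (Int,)))

# Both variants add the same terms in the same order, so the values should agree.
println(leibniz(10^5) == leibniz_stable(10^5))
```

Whether the type change translates into a speedup seems to depend on the machine and Julia version, so it is worth benchmarking both with @benchmark and an interpolated argument.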


On my 7-year-old computer there is virtually no difference. I also find no performance difference between s = 0 and s = 0.0, or between leibniz(n) and leibniz($n).

Intel Core i7-4770 / MX Linux 19.1
                            v1.0.5    v1.5.0
minimum                     359.830   359.832
median                      359.838   359.839
mean                        366.613   368.265
maximum                     684.231   668.273
ratio of median to v1.0.5   1.000     1.0000002
ratio of mean to v1.0.5     1.000     1.0045

What about v1.6.0?

Here’s a more compact way to write this function:

h(i) = (i%2==0 ? 1 : -1)/(2i+1)
leibniz2(n) = 4sum(h,0:n)

On my laptop with Julia 1.5.0, this is marginally faster (by about 1%) than the original function. Not sure why.

leibniz2(10^5) gives a result that differs in the 15th digit from leibniz(10^5). Not sure which of the two is more accurate. This probably has to do with the non-associativity of floating-point addition, as has been discussed many times here on Discourse.
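One way to check which result is closer is to compare both against a higher-precision sum of the same series. The sketch below is only an illustration: leibniz_loop, leibniz_sum and leibniz_big are names I made up, and the BigFloat reference is my own choice of yardstick. As far as I know, sum over a range may group the additions differently from a plain left-to-right loop, which would explain a last-digit difference.

```julia
# Floating-point addition is not associative: the grouping changes the rounding.
println((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))   # false for Float64

h(i) = (i % 2 == 0 ? 1 : -1) / (2i + 1)

# Sequential left-to-right accumulation, as in the original loop.
function leibniz_loop(n)
    s = 0.0
    for i = 0:n
        s += h(i)
    end
    4s
end

# `sum` is free to group the additions differently from the loop.
leibniz_sum(n) = 4sum(h, 0:n)

# Hypothetical reference: the same partial sum, with terms and accumulator in BigFloat.
function leibniz_big(n)
    s = big(0.0)
    for i = 0:n
        s += (iseven(i) ? 1 : -1) / BigFloat(2i + 1)
    end
    4s
end

n = 10^5
ref = leibniz_big(n)
println("loop error: ", Float64(abs(leibniz_loop(n) - ref)))
println("sum  error: ", Float64(abs(leibniz_sum(n) - ref)))
```

The smaller of the two printed errors indicates which Float64 result is closer to the true partial sum; both include the rounding of the individual terms, so neither should be expected to be exactly zero.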

I think it would not be fair to compare to 1.6.0 yet, as it is not out.

Virtually no difference for me as well:

Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: AMD Ryzen 9 3950X 16-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, znver1)

v1.0.5:

BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     108.900 μs (0.00% GC)
  median time:      112.001 μs (0.00% GC)
  mean time:        113.777 μs (0.00% GC)
  maximum time:     155.100 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

v1.5.0:

BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     110.099 μs (0.00% GC)
  median time:      112.000 μs (0.00% GC)
  mean time:        113.242 μs (0.00% GC)
  maximum time:     154.400 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

Are these performance differences especially pronounced for this particular function? Otherwise it seems really problematic…