Benchmark showing 2x slower result between nearly identical functions on Julia 0.6

jmert · January 27, 2017, 6:17pm

I’ve been doing some performance testing of variations of a math kernel operation, and in doing so, I’ve come across a benchmarking result which sensitively depends on whether a supposedly useless line is included or not.

I’ve reduced the test case down to:

using BenchmarkTools

function kernel1(ℓ,x)
    const T = typeof(x)
    y = x*x
    g = 1 / sqrt(ℓ)
    return ifelse(y<eps(T), 0.0, y)
end

function kernel2(ℓ,x)
    const T = typeof(x)
    const U = typeof(ℓ)
    y = x*x
    g = 1 / sqrt(ℓ)
    return ifelse(y<eps(T), 0.0, y)
end

ℓ = 700
x = 0.5
a = @benchmark kernel1($ℓ, $x)
b = @benchmark kernel2($ℓ, $x)

ratio(minimum(b),minimum(a))

Originally, it was the addition/removal of the const U = typeof(ℓ) line that caused the change I was seeing, but in trying to reduce down the test case, I’ve gotten to the point where apparently changing almost anything else in the function removes the performance difference. (I.e. even though g is unused, removing that line causes both to perform the same.)

When I inspect the LLVM or native code with code_llvm and code_native, respectively, they both appear to give identical results.

Am I doing something stupid in how I’m invoking @benchmark?

The behavior is being seen on:

Julia Version 0.6.0-dev.2375
Commit 1303dfb96* (2017-01-26 06:59 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i7-7500U CPU @ 2.70GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Prescott)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, broadwell)

and

julia> Pkg.status("BenchmarkTools")
 - BenchmarkTools                0.0.6

Edit: I forgot to mention that I don’t see this difference on Julia 0.5 (though the benchmarks seem to have less resolution on 0.5??).

yuyichao · January 27, 2017, 6:42pm

The difference is that the second one just barely exceeds the inlining threshold, while the first one is just barely under it.

Note that you should not use const here; those declarations are not doing anything useful.

jmert · January 27, 2017, 6:46pm

Thank you. Is there a way to verify that this is what happened, or is that something you recognize from experience?

ChrisRackauckas · January 27, 2017, 7:04pm

Is there something that describes what this threshold is?

yuyichao · January 27, 2017, 8:22pm

Sort of. I’ve seen something similar before, so I checked the code_warntype of another function calling this function, e.g. g(l, x) = kernel1(l, x), and you can clearly see that one is inlined whereas the other is not.
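The check described above can be sketched roughly as follows. The wrapper names g1 and g2 are hypothetical, and the kernels are copied from the original post with the const declarations dropped. This sketch also assumes a more recent Julia, where code_warntype shows unoptimized code by default, so code_typed (which runs the optimizer, including inlining) is used instead. Newer compilers use a different inlining heuristic, so both wrappers may well look the same there.

```julia
using InteractiveUtils  # provides code_typed outside the REPL

# Kernels from the original post, minus the `const` declarations.
function kernel1(ℓ, x)
    T = typeof(x)
    y = x*x
    g = 1 / sqrt(ℓ)
    return ifelse(y < eps(T), 0.0, y)
end

function kernel2(ℓ, x)
    T = typeof(x)
    U = typeof(ℓ)
    y = x*x
    g = 1 / sqrt(ℓ)
    return ifelse(y < eps(T), 0.0, y)
end

# Hypothetical thin wrappers: if the kernel is inlined, its body shows up
# directly in the wrapper's typed code; if not, a call to the kernel remains.
g1(ℓ, x) = kernel1(ℓ, x)
g2(ℓ, x) = kernel2(ℓ, x)

# Print the optimized, typed code of each wrapper for (Int, Float64) arguments.
display(code_typed(g1, (Int, Float64)))
display(code_typed(g2, (Int, Float64)))
```

In the output, a remaining invoke of kernel1 or kernel2 inside the wrapper indicates that the kernel was not inlined.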

The threshold is some count of expressions after inlining (of other functions into this function). It currently doesn’t really know about the difference between the cost of different operations, and other optimization passes were not able to delete all the unused code before it. That’s why the two functions have different inlining behaviors even though they are supposed to be the same…
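If a benchmark should not depend on which side of this heuristic a function happens to land, one option (not something suggested in the thread itself) is to state the intent explicitly with the @inline and @noinline macros. A minimal sketch, with hypothetical function names:

```julia
# Hypothetical variants of the kernel with the inlining decision made explicit.
# @inline asks the compiler to inline the function at its call sites.
@inline function kernel_inlined(ℓ, x)
    y = x*x
    g = 1 / sqrt(ℓ)
    return ifelse(y < eps(typeof(x)), 0.0, y)
end

# @noinline asks the compiler to keep it as a separate call.
@noinline function kernel_outlined(ℓ, x)
    y = x*x
    g = 1 / sqrt(ℓ)
    return ifelse(y < eps(typeof(x)), 0.0, y)
end

# Both variants compute the same value; only the call overhead should differ.
println(kernel_inlined(700, 0.5))
println(kernel_outlined(700, 0.5))
```

The two variants can then be timed with the same @benchmark calls as in the original post to see how much of the measured difference is call overhead alone.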
