Help with performance improvements with repeated computations


#1

Guys,

I have a question about how “smart” the Julia compiler is. Sorry, maybe this is already answered but I could not find maybe because I do not even know what is the English term that describes what I want to say :smiley:

In my code, sometimes I have a lot of repeated operations like:

            g211 =    3.6160 -  13.2470e_0 + 16.29000e_0^2
            g310 = - 19.3020 + 117.3900e_0 - 228.4190e_0^2 + 156.5910e_0^3
            g322 = - 18.9068 + 109.7927e_0 - 214.6334e_0^2 + 146.5816e_0^3
            g410 = - 41.1220 + 242.6940e_0 - 471.0940e_0^2 + 313.9530e_0^3
            g422 = - 146.407 + 841.8800e_0 - 1629.014e_0^2 + 1083.435e_0^3
            g520 = - 532.114 + 3017.977e_0 - 5740.032e_0^2 + 3708.276e_0^3

Question: is the compiler is smart enough to see that all those ^ are the same operations or should I compute it before and store in a variable?

The same happens for a lot of sin and cos.


#2

Many times it will put this in another form that will reduce the unnecessary computations. But this isn’t what you’re really looking for. This way of computing polynomials is not numerically stable so you need to be careful, especially when some of the coefficients sum to almost zero. You’ll want to use @evalpoly instead.


#3

Here’s a reference on the computation reduction. It’s called the common subexpression elimination pass in LLVM

http://transit.iut2.upmf-grenoble.fr/cgi-bin/dwww/usr/share/doc/llvm-3.8-doc/html/tutorial/OCamlLangImpl4.html#llvm-optimization-passes


#4

Thanks @ChrisRackauckas, you were very helpful!


#5

Well, it seems that Julia does not fully optimize those repeated expressions. For example:

function test1(a::Number)
    b = 1 + a + a^2 + a^3 + a^4
    c = 2 + 2a + 2a^2 + 2a^3 + 2a^4
    d = 3 + 3a + 3a^2 + 3a^3 + 3a^4
end

function test2(a::Number)
    a2 = a*a
    a3 = a*a*a
    a4 = a*a*a*a
    b = 1 + a + a2 + a3 + a4
    c = 2 + 2a + 2a2 + 2a3 + 2a4
    d = 3 + 3a + 3a2 + 3a3 + 3a4
end

function test3(a::Number)
    b = @evalpoly a 1 1 1 1 1
    c = @evalpoly a 2 2 2 2 2
    d = @evalpoly a 3 3 3 3 3
end

We have

julia> @benchmark test1(10.0)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     20.082 ns (0.00% GC)
  median time:      20.165 ns (0.00% GC)
  mean time:        20.371 ns (0.00% GC)
  maximum time:     83.892 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     997

julia> @benchmark test2(10.0)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     3.195 ns (0.00% GC)
  median time:      3.465 ns (0.00% GC)
  mean time:        3.513 ns (0.00% GC)
  maximum time:     26.934 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

julia> @benchmark test3(10.0)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     3.237 ns (0.00% GC)
  median time:      3.267 ns (0.00% GC)
  mean time:        3.330 ns (0.00% GC)
  maximum time:     33.811 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

Hence, I will use @evalpoly for polynomials, but I have to store the repeated computations for complex expressions like:

    C2 = QOMS2T*ξ^4*nll_0*aux1*
         ( all_0*( 1 + (3/2)η^2 + 4e_0*η + e_0*η^3) +
           3/2*(k_2*ξ)/aux0*(-1/2 + (3/2)θ²)*(8 + 24η^2 + 3η^4) )

    C1 = bstar*C2

    C3 = (e_0 > 1e-4) ? QOMS2T*ξ^5*A_30*nll_0*AE*sin_i_0/(k_2*e_0) : T(0)

    C4 = 2nll_0*QOMS2T*aux2*
         ( 2η*(1+e_0*η) + (1/2)e_0 + (1/2)η^3 -
           2*k_2*ξ/(all_0*aux0)*
                (3*(1-3θ²)*(1 + (3/2)η^2 - 2*e_0*η - (1/2)e_0*η^3) +
                3/4*(1-θ²)*(2η^2 - e_0*η - e_0*η^3)*cos(2*ω_0)))

In this case, it is better to store the repeated computations.

Am I right or am I missing something?


#6

Do note that a*a*a and a^3 are not equivalent: a^3 is guaranteed to be exact to machine precision, a*a*a is not, and so the former is faster than the latter. Use @fastmath to give the compiler license to perform substitutions that are equivalent in infinite precision.


#7

I might be wrong, but I remember a professor saying (long time ago) never use -ffast-math for scientific programming. Is it still true today?

Nice!!! Using the ^ make test1 and test2 have the same benchmarks. The compiler is very smart :slight_smile:


#8

If you’re feeling adventurous, you could try out my common-subexpression-elimination tool: https://github.com/rdeits/CommonSubexpressions.jl . It aggressively removes repeated computations by making wild assumptions about your code (specifically that every function call is pure). It can sometimes do better than the compiler because it’s much easier to assume purity than for the compiler to prove purity. Of course, if that assumption is wrong you will get the wrong answer. Buyer beware.


#9

Good! This will be awesome to use in some expressions! I will do some tests here. Thanks for the info!