Help with performance improvements with repeated computations

Ronis_BR · June 2, 2018, 6:06pm

Guys,

I have a question about how “smart” the Julia compiler is. Sorry, maybe this is already answered but I could not find maybe because I do not even know what is the English term that describes what I want to say

In my code, sometimes I have a lot of repeated operations like:

            g211 =    3.6160 -  13.2470e_0 + 16.29000e_0^2
            g310 = - 19.3020 + 117.3900e_0 - 228.4190e_0^2 + 156.5910e_0^3
            g322 = - 18.9068 + 109.7927e_0 - 214.6334e_0^2 + 146.5816e_0^3
            g410 = - 41.1220 + 242.6940e_0 - 471.0940e_0^2 + 313.9530e_0^3
            g422 = - 146.407 + 841.8800e_0 - 1629.014e_0^2 + 1083.435e_0^3
            g520 = - 532.114 + 3017.977e_0 - 5740.032e_0^2 + 3708.276e_0^3

Question: is the compiler is smart enough to see that all those ^ are the same operations or should I compute it before and store in a variable?

The same happens for a lot of sin and cos.

ChrisRackauckas · June 2, 2018, 6:13pm

Many times it will put this in another form that will reduce the unnecessary computations. But this isn’t what you’re really looking for. This way of computing polynomials is not numerically stable so you need to be careful, especially when some of the coefficients sum to almost zero. You’ll want to use @evalpoly instead.

ChrisRackauckas · June 2, 2018, 6:20pm

Here’s a reference on the computation reduction. It’s called the common subexpression elimination pass in LLVM

https://gitter.im/symengine/symengine?at=5ae772c615c9b031145157d9

http://transit.iut2.upmf-grenoble.fr/cgi-bin/dwww/usr/share/doc/llvm-3.8-doc/html/tutorial/OCamlLangImpl4.html#llvm-optimization-passes

Ronis_BR · June 2, 2018, 6:22pm

Thanks @ChrisRackauckas, you were very helpful!

Ronis_BR · June 2, 2018, 6:41pm

Well, it seems that Julia does not fully optimize those repeated expressions. For example:

function test1(a::Number)
    b = 1 + a + a^2 + a^3 + a^4
    c = 2 + 2a + 2a^2 + 2a^3 + 2a^4
    d = 3 + 3a + 3a^2 + 3a^3 + 3a^4
end

function test2(a::Number)
    a2 = a*a
    a3 = a*a*a
    a4 = a*a*a*a
    b = 1 + a + a2 + a3 + a4
    c = 2 + 2a + 2a2 + 2a3 + 2a4
    d = 3 + 3a + 3a2 + 3a3 + 3a4
end

function test3(a::Number)
    b = @evalpoly a 1 1 1 1 1
    c = @evalpoly a 2 2 2 2 2
    d = @evalpoly a 3 3 3 3 3
end

We have

julia> @benchmark test1(10.0)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     20.082 ns (0.00% GC)
  median time:      20.165 ns (0.00% GC)
  mean time:        20.371 ns (0.00% GC)
  maximum time:     83.892 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     997

julia> @benchmark test2(10.0)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     3.195 ns (0.00% GC)
  median time:      3.465 ns (0.00% GC)
  mean time:        3.513 ns (0.00% GC)
  maximum time:     26.934 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

julia> @benchmark test3(10.0)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     3.237 ns (0.00% GC)
  median time:      3.267 ns (0.00% GC)
  mean time:        3.330 ns (0.00% GC)
  maximum time:     33.811 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

Hence, I will use @evalpoly for polynomials, but I have to store the repeated computations for complex expressions like:

    C2 = QOMS2T*ξ^4*nll_0*aux1*
         ( all_0*( 1 + (3/2)η^2 + 4e_0*η + e_0*η^3) +
           3/2*(k_2*ξ)/aux0*(-1/2 + (3/2)θ²)*(8 + 24η^2 + 3η^4) )

    C1 = bstar*C2

    C3 = (e_0 > 1e-4) ? QOMS2T*ξ^5*A_30*nll_0*AE*sin_i_0/(k_2*e_0) : T(0)

    C4 = 2nll_0*QOMS2T*aux2*
         ( 2η*(1+e_0*η) + (1/2)e_0 + (1/2)η^3 -
           2*k_2*ξ/(all_0*aux0)*
                (3*(1-3θ²)*(1 + (3/2)η^2 - 2*e_0*η - (1/2)e_0*η^3) +
                3/4*(1-θ²)*(2η^2 - e_0*η - e_0*η^3)*cos(2*ω_0)))

In this case, it is better to store the repeated computations.

Am I right or am I missing something?

antoine-levitt · June 2, 2018, 7:15pm

Do note that a*a*a and a^3 are not equivalent: a^3 is guaranteed to be exact to machine precision, a*a*a is not, and so the former is faster than the latter. Use @fastmath to give the compiler license to perform substitutions that are equivalent in infinite precision.

Ronis_BR · June 2, 2018, 7:53pm

I might be wrong, but I remember a professor saying (long time ago) never use -ffast-math for scientific programming. Is it still true today?

Nice!!! Using the ^ make test1 and test2 have the same benchmarks. The compiler is very smart

rdeits · June 2, 2018, 8:52pm

If you’re feeling adventurous, you could try out my common-subexpression-elimination tool: https://github.com/rdeits/CommonSubexpressions.jl . It aggressively removes repeated computations by making wild assumptions about your code (specifically that every function call is pure). It can sometimes do better than the compiler because it’s much easier to assume purity than for the compiler to prove purity. Of course, if that assumption is wrong you will get the wrong answer. Buyer beware.

Ronis_BR · June 2, 2018, 8:56pm

Good! This will be awesome to use in some expressions! I will do some tests here. Thanks for the info!

Topic		Replies	Views
Are there tricks to reduce compile-time for this code? General Usage	4	805	December 7, 2017
CAS benchmarks (Symbolics.jl and Maxima) Performance symbolic	36	9195	April 5, 2021
Tricks to compile away step in calculation? General Usage	20	969	November 9, 2023
Improving Computational Performance Performance fast-math	6	942	June 20, 2021
Fma's and poly eval Performance performance , polynomials	4	760	May 19, 2021

Help with performance improvements with repeated computations

Related topics