CAS benchmarks (Symbolics.jl and Maxima)

It's more readable when using Intel syntax and removing debug info:

julia> @cn h(1)
        .text
        lea     rax, [rdi + 2*rdi]
        add     rax, rax
        add     rax, 4
        ret
        nop     dword ptr [rax]

We can see that it first calculates rdi + 2*rdi (== 3*rdi) and assigns it to rax, then adds rax to itself (same as *2), and finally adds 4.
This is exactly what we would have gotten from f(x) = 6x+4 (i.e., the compiler splits the *6 into an lea and an add).
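For anyone following along at home, here is a minimal sketch of how to reproduce this, assuming @cn is just a convenience wrapper around @code_native with Intel syntax and debug info stripped (the actual definition used in this thread may differ):

    using InteractiveUtils  # provides @code_native (loaded by default in the REPL)

    # Hypothetical shorthand: print native code in Intel syntax without debug info.
    macro cn(ex)
        esc(:(@code_native syntax=:intel debuginfo=:none $ex))
    end

    f(x) = 6x + 4   # should compile to essentially the same lea/add sequence as above
    @cn f(1)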
This is different from what you reported, where the last two adds seem to have been replaced with

        lea     rax, [rax + rax + 4]

which does the same thing but uses one fewer instruction.

I tried a couple of different Julia+LLVM versions and got the two adds each time, so I don’t think this difference is due to the LLVM version.
I assume this was on your Zen2 computer?
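(As a side note, you can check which microarchitecture Julia/LLVM detected for your machine via Sys.CPU_NAME; a Zen2 machine typically reports "znver2", and Skylake-X typically reports "skylake-avx512". For example, on Zen2:)

    julia> Sys.CPU_NAME
    "znver2"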

According to Agner Fog’s instruction tables, lea is faster on Zen2 than it is on Skylake(-X), so LLVM picked the version that is fastest for each of our specific CPUs.

Reciprocal throughputs (lower is better):

Instruction   Zen2   Skylake-X
lea-2         1/4    1/2
lea-3         1/4    1
add           1/3    1/4

The N in lea-N indicates the number of arguments, so lea rax, [rax + rax + 4] is lea-3, which is much faster on Zen2 than on Skylake-X.
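To make that concrete, summing the reciprocal throughputs of the two possible instruction sequences (a very crude cost model that ignores latency, dependency chains, and port pressure) shows why each choice makes sense:

    # Numbers from the table above; lower total is better.
    for (cpu, lea2, lea3, add) in (("Zen2", 1/4, 1/4, 1/3), ("Skylake-X", 1/2, 1, 1/4))
        println(cpu, ":  lea-2 + add + add = ", round(lea2 + add + add, digits = 2),
                     ",  lea-2 + lea-3 = ", round(lea2 + lea3, digits = 2))
    end

This gives roughly 0.92 vs 0.5 on Zen2 (so the lea-3 form wins there) and 1.0 vs 1.5 on Skylake-X (so the two adds win there).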
