One useful proxy for performance is the length of the generated machine code. As a package writer, I would like to monitor how concise the assembly generated for my functions is. This would improve the performance and loading times of my packages in the long term.
For a practical example, consider the following two ways of computing the maximum of two numbers:
maximum((1,2))
maximum([1,2])
At first, I guessed that the former option would be more efficient because tuples are immutable and the compiler can optimize them aggressively. However, when I inspect the generated native code, this is what I get:
maximum((1,2))
julia> @code_native maximum((1,2))
.text
Filename: reduce.jl
pushq %rbp
movq %rsp, %rbp
pushq %r15
pushq %r14
pushq %r12
pushq %rbx
subq $64, %rsp
movq %rdi, %r15
movq %fs:0, %rbx
addq $-10888, %rbx # imm = 0xD578
leaq -64(%rbp), %r14
vxorps %ymm0, %ymm0, %ymm0
vmovups %ymm0, -64(%rbp)
movq $10, -88(%rbp)
movq (%rbx), %rax
movq %rax, -80(%rbp)
leaq -88(%rbp), %rax
movq %rax, (%rbx)
movq $0, -72(%rbp)
Source line: 454
movabsq $140402333763728, %r12 # imm = 0x7FB1F73AC890
leaq 398277240(%r12), %rax
movq %rax, -64(%rbp)
leaq 398277144(%r12), %rax
movq %rax, -56(%rbp)
leaq 398277080(%r12), %rax
movq %rax, -48(%rbp)
movabsq $jl_gc_pool_alloc, %rax
movl $1456, %esi # imm = 0x5B0
movl $32, %edx
movq %rbx, %rdi
vzeroupper
callq *%rax
leaq 397451744(%r12), %rcx
movq %rcx, -8(%rax)
vmovups (%r15), %xmm0
vmovups %xmm0, (%rax)
movq %rax, -40(%rbp)
movabsq $jl_invoke, %rax
movl $4, %edx
movq %r12, %rdi
movq %r14, %rsi
callq *%rax
movq %rax, -72(%rbp)
movq (%rax), %rax
movq -80(%rbp), %rcx
movq %rcx, (%rbx)
addq $64, %rsp
popq %rbx
popq %r12
popq %r14
popq %r15
popq %rbp
retq
nopw %cs:(%rax,%rax)
maximum([1,2])
julia> @code_native maximum([1,2])
.text
Filename: reduce.jl
pushq %rbp
movq %rsp, %rbp
Source line: 454
callq _mapreduce
popq %rbp
retq
nopl (%rax,%rax)
So clearly, I cannot trust my intuition here, and probably not in many other cases either. What workflow do you suggest for tracking these kinds of changes? Is there any package that facilitates such diagnostics? I wonder if something like __precompile__() could be extended to warn package writers whenever a function is re-implemented and the change causes a large increase in machine code.
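As a starting point for such tracking, here is a minimal sketch that captures the output of code_native in a buffer and counts the lines that look like instructions. The helper name native_line_count is hypothetical, the filtering of comments, directives, and labels is only a rough heuristic, and the sketch assumes a Julia 1.x release where code_native accepts an IO argument and the debuginfo keyword.

```julia
using InteractiveUtils  # provides code_native

# Hypothetical helper (not part of any package): rough count of
# instruction lines in the native code generated for f(types...).
function native_line_count(f, types)
    io = IOBuffer()
    # debuginfo=:none suppresses the source-location comments
    code_native(io, f, types; debuginfo=:none)
    asm = String(take!(io))
    # Heuristic: skip blank lines, comments (;), directives (.), and labels (name:)
    count(split(asm, '\n')) do line
        s = strip(line)
        !isempty(s) && !startswith(s, ';') && !startswith(s, '.') && !endswith(s, ':')
    end
end

# Compare the two alternatives from the example above
tuple_len = native_line_count(maximum, Tuple{Tuple{Int,Int}})
array_len = native_line_count(maximum, Tuple{Vector{Int}})
println("tuple: ", tuple_len, " lines, array: ", array_len, " lines")
```

The counts depend on the Julia version and the CPU, so they are only meaningful when compared on the same machine, for example before and after a change to a function. A short listing can also hide work behind a call, as in the maximum([1,2]) output above, where nearly everything happens inside the callee.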
Related to this issue, it would be nice if I could start Julia in a “warn_type” mode. That is, every command I type in the REPL would give me a warning if there is a type instability. Adding @code_warntype everywhere by hand is not very efficient from the perspective of someone who is only interested in implementing a cool new feature in the package. I’d rather have the warning from the start than have to go back in a second pass to optimize the code.
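Until something like that exists, a programmatic check can stand in for reading @code_warntype output by hand. The sketch below is only an approximation: the helper is_type_stable and the two example functions are hypothetical, it relies on Base.return_types (an unexported function), and it only looks at the inferred return type, so instabilities inside a function that still returns a concrete type go unnoticed.

```julia
# Hypothetical helper: true if inference finds a concrete return type
# for every method matching f(types...).
is_type_stable(f, types) = all(isconcretetype, Base.return_types(f, types))

# Illustrative functions (not from any package)
unstable(x) = x > 0 ? x : 0.0      # returns Int or Float64 for an Int argument
stable(x)   = x > 0 ? x : zero(x)  # always returns the type of x

for f in (stable, unstable)
    if !is_type_stable(f, Tuple{Int})
        @warn "Return type is not concrete" func = f
    end
end
```

A check like this could run in a package's test suite over the functions that matter. The Test standard library also provides the @inferred macro, which throws an error when the inferred return type of a call does not match the actual one.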