This should not be necessary. I don’t know assembly or LLVM primitives either, but by looking at the output of `@code_llvm` you can get a pretty good idea of what is going on, what the compiler figured out, and what it didn’t. But even that is not necessary in most cases: go with `@code_warntype` 99% of the time.
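To make that concrete, here is a minimal sketch of the kind of problem `@code_warntype` flags. The two functions are made up for illustration; the only difference is whether both branches return the same type.

```julia
using InteractiveUtils  # provides @code_warntype outside the REPL

# Hypothetical functions, for illustration only.
# Returns a Float64 or an Int depending on the runtime value of x.
clamp_unstable(x::Float64) = x > 0 ? x : 0

# Returns a Float64 on both branches.
clamp_stable(x::Float64) = x > 0 ? x : zero(x)

@code_warntype clamp_unstable(1.0)  # return type is a Union of Float64 and Int
@code_warntype clamp_stable(1.0)    # return type is Float64
```

In the REPL, the non-concrete return type of the first call is highlighted, which is usually all you need to spot the problem.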
I would also argue that `@code_native` is less useful than `@code_llvm` for most cases, since it is rather a lot of information with even less structure, and easy to get lost in. It is also a common mistake to assume that assembly code length somehow maps to execution speed; this does not necessarily hold when optimizing the same computation.
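If you do want to look at the LLVM output, a small sketch like the following keeps it manageable. The function `affine` is a made-up example, and the exact listing depends on your Julia version and platform.

```julia
using InteractiveUtils  # provides @code_llvm and code_llvm outside the REPL

# Hypothetical function, for illustration only.
affine(x) = 2x + 1

# debuginfo=:none drops the source-location comments, leaving a shorter listing
@code_llvm debuginfo=:none affine(1.0)

# The same listing captured as a string, e.g. to search it
ir = sprint(io -> code_llvm(io, affine, (Float64,); debuginfo=:none))
```

For a `Float64` argument the listing should come down to a couple of floating-point instructions on `double` values, which is about as much as you need to read from it.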
Also, about the original post: I think that “best possible performance” is not a good goal except in very special situations. Diminishing returns kick in rather early in Julia (because it is so fast already) — typically, in order of descending importance, after you have
- chosen the right algorithm and data structures,
- made sure things are concretely typed,
- made sure the compiler can figure out types (
- taken care of major allocations in inner loops,
- profiled, optimized, and annotated some parts (e.g. `@inbounds`), but strictly in Julia.
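As a rough sketch of the typing and allocation items in this list: the `Particles` type and both functions below are hypothetical, and the point is only the contrast between the two loop bodies.

```julia
# Hypothetical type: the parameter T makes both fields concretely typed
struct Particles{T<:AbstractFloat}
    x::Vector{T}
    v::Vector{T}
end

# Allocating version: `dt * p.v` and the sum each create a temporary array
function advance_alloc!(p::Particles, dt, nsteps)
    for _ in 1:nsteps
        p.x .= p.x + dt * p.v
    end
    return p
end

# In-place version: the fused broadcast writes straight into p.x
function advance!(p::Particles, dt, nsteps)
    for _ in 1:nsteps
        @. p.x += dt * p.v
    end
    return p
end

p = Particles(zeros(3), ones(3))
advance!(p, 0.1, 10)
```

Both versions compute the same thing; comparing them with `@time` or a profiler should show the first one allocating on every step, which is the sort of thing worth fixing before reaching for anything more exotic.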
These days, especially with `master`, this usually gets you within 10–20% of what you could get with micro-optimizations which are much, much more costly to maintain in the long run. Be wary of the latter: a lot of libraries are littered with hacks that looked like a good idea at the time, but then 3 years later are still around and may even be suboptimal until someone bothers to benchmark and refactor (I am guilty of doing this too, but the punishment is self-contained when I need to touch said code).
It sometimes happens that you run into issues where the compiler could figure out something but currently doesn’t. In these cases, you can either wait for the fix and not worry about it in the meantime, or separate the functionality into a kernel function that contains the workaround, and make a note about the issue so that you can remove it once it is fixed.
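A minimal sketch of the kernel-function pattern, using an uninferable `Dict{Symbol,Any}` lookup as a stand-in for whatever the compiler cannot currently figure out. The names `sumsq` and `_sumsq_kernel` are made up, and the comment is a placeholder for a link to the actual issue.

```julia
# Kernel: compiled separately for each concrete type of `xs`,
# so the loop itself is fully typed.
function _sumsq_kernel(xs::AbstractVector{T}) where {T}
    s = zero(T)
    for x in xs
        s += x * x
    end
    return s
end

# WORKAROUND: the type of cfg[:values] is not known to the compiler here,
# so the hot loop lives in the kernel above. Remove once no longer needed.
sumsq(cfg::Dict{Symbol,Any}) = _sumsq_kernel(cfg[:values])

sumsq(Dict{Symbol,Any}(:values => [1.0, 2.0, 3.0]))
```

The rest of the code calls `sumsq` and never sees the workaround, so removing it later is a change to one small function.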
The strategy I would recommend is writing clean Julia code, trusting that the compiler is getting more and more clever, and confining micro-optimizations to the 1% of cases where they are really needed.