Code_native opcode annotations

milesf · October 6, 2019, 7:41pm

For those of us not yet fluent in reading assembly, code_native output could be enhanced with human-readable annotations of what the opcodes represent. This will save us from constantly searching through docs to find meanings of the more obscure opcodes.

Are there any existing annotation tools? I searched and did not find anything. I’m surprised that no one has already built a simple web app that annotates blobs of assembly.

This doesn’t seem too difficult to introduce to Julia’s code_native function. Biggest challenge seems to be handling different arg counts and types for some opcodes, and manually writing concise descriptions for all the opcodes. First version could just be a simple opcode lookup.

Here’s the proposal:

# Existing original output
julia> code_native(summation, (UInt64,); debuginfo=:none, syntax=:intel)
        .text
        test    rdi, rdi
        je      L32
        lea     rdx, [rdi - 1]
        lea     rax, [rdi - 2]
        mulx    rcx, rax, rax
        shld    rcx, rax, 63
        lea     rax, [rcx + 2*rdi]
        add     rax, -1
        ret
L32:
        xor     eax, eax
        ret
        nop     word ptr cs:[rax + rax]

# Proposed annotated output with intel_annotate flag
julia> code_native(summation, (UInt64,); debuginfo=:none, syntax=:intel_annotate)
        .text
        test    rdi, rdi                ; test S1, S2       Set flags to result of S1 AND S2
        je      L32                     ; je L              Jump to label if equal/zero
        lea     rdx, [rdi - 1]          ; lea D, S          Load effective address of source into destination
        lea     rax, [rdi - 2]          ; lea D, S          Load effective address of source into destination
        mulx    rcx, rax, rax           ; mulx D1, D2, S    Multiply S by rdx. Product upper half to D1, lower half to D2
        shld    rcx, rax, 63            ; shld D, R, N      Shift D left N bits, shifting in bits from R
        lea     rax, [rcx + 2*rdi]      ; lea D, S          Load effective address of source into destination
        add     rax, -1                 ; add D, S          Add source to destination
        ret                             ; ret               Return from procedure
L32:
        xor     eax, eax                ; xor D, S          Exclusive OR. Store in D
        ret                             ; ret               Return from procedure
        nop     word ptr cs:[rax + rax] ; nop               No operation
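
The "simple opcode lookup" first version might look roughly like the sketch below. It works on plain assembly text and is not part of InteractiveUtils; the names `OPCODE_NOTES`, `annotate_line` and `annotate` are hypothetical, and the table only covers a few of the mnemonics from the example above.

```julia
# Hypothetical lookup table: mnemonic => (operand template, short description).
# Only a handful of entries are shown; a real table would need many more.
const OPCODE_NOTES = Dict(
    "test" => ("test S1, S2", "Set flags to result of S1 AND S2"),
    "je"   => ("je L", "Jump to label if equal/zero"),
    "lea"  => ("lea D, S", "Load effective address of source into destination"),
    "add"  => ("add D, S", "Add source to destination"),
    "xor"  => ("xor D, S", "Exclusive OR. Store in D"),
    "ret"  => ("ret", "Return from procedure"),
    "nop"  => ("nop", "No operation"),
)

# Annotate one line of assembly text. Lines whose first token is not a known
# mnemonic (labels, directives, blanks, unknown opcodes) are returned unchanged.
function annotate_line(line::AbstractString; width::Int = 40)
    tokens = split(line)                  # split on whitespace
    isempty(tokens) && return String(line)
    note = get(OPCODE_NOTES, lowercase(tokens[1]), nothing)
    note === nothing && return String(line)
    template, description = note
    # Pad the instruction and the template so the comments line up in columns.
    return string(rpad(rstrip(line), width), "; ", rpad(template, 18), description)
end

# Annotate a whole blob of assembly text, line by line.
annotate(asm::AbstractString) = join((annotate_line(l) for l in split(asm, '\n')), '\n')
```

Assuming `summation` from above is defined, the input text could be captured with `sprint(io -> code_native(io, summation, (UInt64,); debuginfo=:none, syntax=:intel))` (after `using InteractiveUtils` when outside the REPL) and passed to `annotate`. The column alignment assumes spaces; tab-indented output would need extra handling.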

yuyichao · October 6, 2019, 10:58pm

You should almost never use code_native. There are very few cases where code_native is the right tool and for all of them, this kind of annotation is the last thing you need to worry about. (Basically, if you are looking at it where this level of annotation solves your problem, you shouldn’t look at the assembly code, you should at most look at the LLVM code.)

Most of the info you are showing now is also just annoying for anyone who already knows it. I’d say that once you remember most of the common instructions, one of the most annoying things about reading assembly is actually keeping track of dependencies. It would actually be useful if you could do that accurately in general, but that’s nothing like a simple table lookup.

I’m not saying that such annotation is useless for everyone, but it adds a lot of noise and the particular form you are proposing isn’t that useful in real cases where you need to read assembly. You would be better off developing an independent asm debugging / learning tool.

Also, on the implementation side, Julia knows little about assembly; this is information you should add to, or query from, LLVM instead.

milesf · October 7, 2019, 1:16am

This feature is aimed more at folks who are just beginning to peek at the assembly output out of curiosity, rather than the experts who are already familiar with most of the mnemonics.

For example in this thread, an embedded reference on operand order could have saved a bunch of time and effort.

But yes, this could be an external tool. It just seems straightforward enough to offer within Julia through an additional flag.

Good point about looking at the llvm output. I previously assumed that the llvm code is further refined before becoming machine code, but I now see the optimized form is displayed by default.
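
For anyone who wants to compare the views side by side, here is a minimal sketch. It assumes a Julia version where `code_llvm` accepts the `optimize` and `debuginfo` keywords (1.1 or later, as far as I know), and `sum_to` is a hypothetical stand-in since the definition of `summation` isn't shown in this thread.

```julia
using InteractiveUtils  # provides code_llvm and code_native outside the REPL

# Hypothetical stand-in for the thread's `summation`, whose definition is not shown.
function sum_to(n::UInt64)
    total = zero(UInt64)
    for i in one(UInt64):n
        total += i
    end
    return total
end

# Capture each view as a string so the stages can be compared or searched.
llvm_opt = sprint(io -> code_llvm(io, sum_to, (UInt64,); debuginfo=:none))
llvm_raw = sprint(io -> code_llvm(io, sum_to, (UInt64,); debuginfo=:none, optimize=false))
native   = sprint(io -> code_native(io, sum_to, (UInt64,); debuginfo=:none, syntax=:intel))

println(llvm_opt)  # the optimized IR, which is what code_llvm shows by default
```

Printing `llvm_raw` next to `llvm_opt` shows how much work the LLVM passes do before any machine code is generated.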

yuyichao · October 7, 2019, 1:59am

Well, the question is why are you doing it. If your goal is to learn anything about julia, then please don’t do that or you are just confusing yourself unnecessarily. I’m not implying that you are not smart enough to understand assembly but the assembly code is really not designed to be easy to understand. Sure if you are smart enough you can get most of the info from it, but you’ll still be wasting your time by going through that path when there are much better ways to get what you want.
Now if you are just trying to learn assembly, then I don’t think @code_native should really be optimized for teaching people that. That’s why it should really be an external tool. This also means that features that help analyse assembly code could be added, but AFAICT those are much harder to implement, and might also have to be interactive, which means that they should probably also be an add-on in a different package instead…

That is a perfect example of what I’m talking about. Basically all useful information in that thread is shown in the LLVM code and most of the confusion actually comes from the weirdness of (x86) assembly. All of the wasted time that you want to save can simply be avoided by using the correct tool, by looking at things that are intrinsically clearer.

The point is also that, as I already mentioned above, such annotation can easily be annoying for the real target user. Adding debug info already caused people to complain about the verbosity of the output and I’d say that debug info (line numbers) is actually much more important/useful in real debugging using that output than a copy-paste of the instruction description.

No. As I said, julia doesn’t even contain any of the related information.

There is some optimization when generating machine code (entirely in LLVM) but there’s very little. Those are the few cases I mentioned that genuinely require looking at the assembly code.

yuyichao · October 7, 2019, 2:09am

And just to give a few examples of features that are actually useful for users of all levels during real debugging using the assembly code. There are probably other useful features but I have not used much fancier tools =(…

gdb annotates PC-relative addressing in the code with the real address
perf-report adds pointers between jump instructions and their jump targets
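
As a rough illustration of the second item, linking jumps to their targets can be sketched over plain assembly text. This is not how perf does it; `jump_targets` is a hypothetical helper, and it assumes labels sit alone on a line as `NAME:` and that every jump mnemonic starts with `j`.

```julia
# List every jump together with the line its target label sits on.
# Returns tuples of (jump line number, label name, target line number).
function jump_targets(asm::AbstractString)
    lines = split(asm, '\n')

    # First pass: record the line number of each bare "NAME:" label.
    labels = Dict{String,Int}()
    for (i, line) in enumerate(lines)
        m = match(r"^\s*([A-Za-z_.][\w.]*):\s*$", line)
        m === nothing || (labels[String(m[1])] = i)
    end

    # Second pass: find "jXX NAME" instructions and look up their targets.
    jumps = Tuple{Int,String,Int}[]
    for (i, line) in enumerate(lines)
        m = match(r"^\s*j\w+\s+([A-Za-z_.][\w.]*)\s*$", line)
        m === nothing && continue
        name = String(m[1])
        # 0 means no matching label was found (e.g. an indirect jump via a register).
        push!(jumps, (i, name, get(labels, name, 0)))
    end
    return jumps
end
```

A viewer could use the returned line pairs to draw arrows or make the targets clickable, which is closer to the kind of interactive feature that helps in real debugging.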

jw3126 · October 7, 2019, 5:26am

FWIW I would be such a curious person. (I understand that it will not make me a better julia programmer).

I also think that this could live in a package:

using CodeNativeAnnotations: @code_native

milesf · October 7, 2019, 6:15am

It’s curiosity for me too, and to appreciate how Julia can generate efficient code. Probably the same reason why code_native output is demonstrated in so many Julia guides and presentations. Maybe code_llvm should be showcased instead. I assume native assembly is chosen because most folks are more familiar with what the machine code represents and trust it more than LLVM IR. At least in my experience, I’ve read and written assembly for lower-level devices, but Julia has been my first exposure to LLVM output.

It’s wonderful how easy it is to inspect code at different stages of compilation in an interactive Julia environment, and so I’ve been taking advantage of this feature to challenge assumptions about what the compiler can optimize away.

Now there’s another great learning opportunity to become more familiar with how LLVM IR maps to native assembly, and it wouldn’t hurt to optionally make the esoteric x86 assembly output more accessible.

Absolutely.
Although based on the usage of @code_native in introductory tutorials, shouldn’t we consider the Julia novice to be a target user too?
Is an opt-in flag for annotation a fair compromise?

I agree that Julia has no more information beyond a blob of assembly text. I meant that it is straightforward to handle an additional syntax flag value in the code_native function in InteractiveUtils and invoke the annotator there, rather than have this feature be part of an external tool that wraps/redefines/extends code_native. But the latter approach is relatively straightforward too, so that seems like a better path to pursue to avoid the controversy.

I appreciate hearing the case for using llvm output instead though, and I will incorporate that into my workflow.

milesf · October 7, 2019, 6:28am

How about calling the package InteractiveUtilsExtensions, and including @code_native_diff and @code_llvm_diff in there too, as discussed in the code diff thread?

Tamas_Papp · October 7, 2019, 6:36am

I am not sure how one would showcase this. The facilities are available, but working through LLVM IR is not a Julia-specific skill.

That said, even though I could read some assembly before using Julia, I found LLVM code much easier to understand from the very beginning. It is nicely annotated, and you don’t get lost in less-relevant details. LLVM constructs are reasonably well-documented and easy to search for.

The only thing one would miss is when LLVM doesn’t compile to the most efficient native code. Those things happen, but investigating and fixing that requires a level of expertise few users have.

yuyichao · October 7, 2019, 12:17pm

Yes, that would be totally fine.

Nah, it’s shown often because it makes people feel better but it usually doesn’t show anything useful. (I believe the example Jeff liked to use is a slide showing a function’s “very short” assembly code which is just a call (or maybe jmp) = = …)

No, rather, those tutorials should really remove the use of it. As long as it’s julia intro material, mentioning code_native to any user who doesn’t already know assembly well is just misleading.

Again, what I’m saying is that there are two completely different uses of code_native.

If you want to understand how julia is compiled, you really shouldn’t look at it because you’ll be lost in the detail. (Or if you are an assembly expert, you can look at it but you won’t need any of the basic annotation either…)
If you want to learn assembly, and I totally agree that the interactive nature of julia makes it a very easy tool to do that, you should be aware that this would not help with your julia problem >99% of the time and this isn’t the goal of code_native. Also, I believe that when you go down this path, as soon as you start to actually understand the assembly code, there is so much more information/tricks/patterns to learn in the assembly code that could use annotation depending on the context. I have no doubt that code_native (or at least its interface, since it’s no more than an LLVM wrapper) makes a very good starting point for a learning tool, but given the complexity of assembly and the variety of things you may want to learn I think it’ll definitely fit better in a package. Once you’ve got more information in and once it’s more mature, if it still looks similar to code_native (I kind of question that a little, some info might be interactive) it’s certainly possible to hook that in as an option.
