Output for @code_llvm changed from 0.6 to 0.7, 1.0

The output of @code_llvm is more verbose on 0.7 and 1.0 than it was on 0.6.

On 0.6.4

julia> f(x) = 3x+2
f (generic function with 1 method)

julia> @code_llvm f(1.0)

define double @julia_f_62581(double) #0 !dbg !5 {
top:
  %1 = fmul double %0, 3.000000e+00
  %2 = fadd double %1, 2.000000e+00
  ret double %2
}

On 1.0.0

julia> f(x) = 3x+2
f (generic function with 1 method)

julia> @code_llvm f(1.0)

; Function f
; Location: REPL[6]:1
define double @julia_f_36035(double) {
top:
; Function *; {
; Location: promotion.jl:314
; Function *; {
; Location: float.jl:399
  %1 = fmul double %0, 3.000000e+00
;}}
; Function +; {
; Location: promotion.jl:313
; Function +; {
; Location: float.jl:395
  %2 = fadd double %1, 2.000000e+00
;}}
  ret double %2
}

The more compact 0.6 output seems better for showing how Julia generates efficient machine code. Is there a good way to remove all the added comments? None of the keyword settings suggested in help mode does the trick.

3 Likes

@Jameson decided that the new very verbose output is such a significant improvement that not only should it be the default, but there shouldn’t even be an option to turn it off. Personally, I disagree and think that much less verbose output should not only be an option but be the default as it once was. A pull request to quell some of this very, very long output would be much appreciated. I would do it myself but I’m going on vacation soon and won’t have time until I get back.

12 Likes

I have no preference about an option to disable debug info (though I do think it should be on by default), but this is absolutely wrong: there are countless cases where longer LLVM/native code gives better performance. I know some people like to show it, but doing so is just misleading and really shows nothing significant at all.

2 Likes

When giving a talk, I’d rather show two lines of unadorned LLVM code than ask the audience to filter out all the comments by eye and see that what’s left is two lines of LLVM code. That’s all I mean.

2 Likes

And showing that in a talk without actually explaining each instruction is exactly the misleading thing I was talking about… I’ve heard/seen cases where the code is very short but with a call in it.

2 Likes

C’mon, @yuyichao, I have the exact same complaint for the exact same reason. The previous level of light inline comments were about right for understanding where some code comes from. The current level is so much that there’s about twelve lines of comments per line of code—it’s impossible to read anything. I’ve forgotten what the previous line was by the time I get to the next one; also my beard has grown long and I’ve forgotten my own name.

16 Likes

I have a PR open for improving the layout and structure. It wasn’t critical for 1.0, so I haven’t had time to finish it yet.

Yichao is correct though: a long function is generally going to be much faster than a short one, so any analysis based on the length of the code_llvm output is flawed. That said, I think what’s interesting to show here is specifically that optimizations are able to cut across multiple user functions, and without impacting debug info; having this accurate printing at all levels is what has helped reduce the error rate in our backtrace/profile information.

5 Likes

That is my intent. I’m trying to illustrate this capability of Julia with something like the following

function iterator(g, N)
    # construct gᴺ, the Nth iterate of g
    function gᴺ(x)
        for i ∈ 1:N
            x = g(x)
        end
        return x
    end
    return gᴺ
end

f(x)  = 4*x*(1-x)

fᴺ = iterator(f, 10^6);

With julia-0.6.4, @code_llvm fᴺ(0.3) returns the fairly comprehensible

define double @"julia_g\E1\B4\BA_62655"(%"#g\E1\B4\BA#1"* nocapture readonly dereferenceable(8), double) #0 !dbg !5 {
top:
  %2 = getelementptr inbounds %"#g\E1\B4\BA#1", %"#g\E1\B4\BA#1"* %0, i64 0, i32 1
  %3 = load i64, i64* %2, align 8
  %4 = icmp slt i64 %3, 1
  br i1 %4, label %L14, label %if.preheader

if.preheader:                                     ; preds = %top
  br label %if

if:                                               ; preds = %if.preheader, %if
  %x.03 = phi double [ %8, %if ], [ %1, %if.preheader ]
  %"#temp#.02" = phi i64 [ %5, %if ], [ 1, %if.preheader ]
  %5 = add i64 %"#temp#.02", 1
  %6 = fmul double %x.03, 4.000000e+00
  %7 = fsub double 1.000000e+00, %x.03
  %8 = fmul double %6, %7
  %9 = icmp eq i64 %"#temp#.02", %3
  br i1 %9, label %L14.loopexit, label %if

L14.loopexit:                                     ; preds = %if
  br label %L14

L14:                                              ; preds = %L14.loopexit, %top
  %x.0.lcssa = phi double [ %1, %top ], [ %8, %L14.loopexit ]
  ret double %x.0.lcssa
}

showing that Julia has inlined the f function into the iterator and optimized them down to a simple for loop.

However, with julia-1.0.0, the output of @code_llvm is so excessively laden with comments that it’s hard for a talk audience to see that this optimization has occurred:

; Function gᴺ
; Location: REPL[22]:5
define double @"julia_g\E1\B4\BA_36152"({ i64 } addrspace(11)* nocapture nonnull readonly dereferenceable(8), double) {
top:
  %2 = getelementptr inbounds { i64 }, { i64 } addrspace(11)* %0, i64 0, i32 0
; Function Colon; {
; Location: range.jl:5
; Function Type; {
; Location: range.jl:255
; Function unitrange_last; {
; Location: range.jl:260
; Function >=; {
; Location: operators.jl:333
; Function <=; {
; Location: int.jl:428
  %3 = load i64, i64 addrspace(11)* %2, align 8
  %4 = icmp sgt i64 %3, 0
;}}}}}
  br i1 %4, label %L9.L13_crit_edge, label %L28

L9.L13_crit_edge:                                 ; preds = %top
  br label %L13

L13:                                              ; preds = %L13, %L9.L13_crit_edge
  %value_phi2 = phi i64 [ 1, %L9.L13_crit_edge ], [ %9, %L13 ]
  %value_phi3 = phi double [ %1, %L9.L13_crit_edge ], [ %7, %L13 ]
; Location: REPL[22]:6
; Function f; {
; Location: REPL[16]:1
; Function -; {
; Location: promotion.jl:315
; Function -; {
; Location: float.jl:397
  %5 = fsub double 1.000000e+00, %value_phi3
;}}
; Function *; {
; Location: operators.jl:502
; Function *; {
; Location: promotion.jl:314
; Function *; {
; Location: float.jl:399
  %6 = fmul double %value_phi3, 4.000000e+00
;}}
; Function *; {
; Location: float.jl:399
  %7 = fmul double %6, %5
;}}}
; Function iterate; {
; Location: range.jl:575
; Function ==; {
; Location: promotion.jl:425
  %8 = icmp eq i64 %value_phi2, %3
;}
; Location: range.jl:576
; Function +; {
; Location: int.jl:53
  %9 = add nuw i64 %value_phi2, 1
;}}
  br i1 %8, label %L28, label %L13

L28:                                              ; preds = %L13, %top
  %value_phi6 = phi double [ %1, %top ], [ %7, %L13 ]
; Location: REPL[22]:8
  ret double %value_phi6
}

This example is slightly artificial. I would like to show the same thing with an ODE integrator (e.g. rungekutta4) and a user-defined dx/dt = f(x), but the above suffices to make the point.

Trust me, I’m not equating the length of the @code_llvm output or the LLVM IR with the efficiency of its execution. This is all about pedagogy and clarity.

Sure, I’m all for improving the formatting. The current PR was literally just to show it was possible, and to be able to see why our line numbers were wrong. Tim worked on some formatting idea, and has proposed this:

;  @ REPL[1]:5 within `gᴺ'
define double @"julia_gᴺ_65337"({ i64 } addrspace(11)* nocapture nonnull readonly dereferenceable(8), double) !dbg !5 {
top:
  %2 = getelementptr inbounds { i64 }, { i64 } addrspace(11)* %0, i64 0, i32 0, !dbg !7
; ┌ @ range.jl:5 within `Colon'
; │┌ @ range.jl:255 within `Type'
; ││┌ @ range.jl:260 within `unitrange_last'
; │││┌ @ operators.jl:333 within `>='
; ││││┌ @ int.jl:428 within `<='
       %3 = load i64, i64 addrspace(11)* %2, align 8, !dbg !8, !tbaa !21, !invariant.load !4
       %4 = icmp sgt i64 %3, 0, !dbg !8
; ┘┘┘┘┘
  br i1 %4, label %L9.L13_crit_edge, label %L28, !dbg !7

L9.L13_crit_edge:                                 ; preds = %top
  br label %L13, !dbg !7

L13:                                              ; preds = %L13, %L9.L13_crit_edge
  %value_phi2 = phi i64 [ 1, %L9.L13_crit_edge ], [ %9, %L13 ]
  %value_phi3 = phi double [ %1, %L9.L13_crit_edge ], [ %7, %L13 ]
;  @ REPL[1]:6 within `gᴺ'
; ┌ @ REPL[2]:1 within `f'
; │┌ @ promotion.jl:315 within `-'
; ││┌ @ float.jl:397 within `-'
     %5 = fsub double 1.000000e+00, %value_phi3, !dbg !24
; │┘┘
; │┌ @ operators.jl:502 within `*'
; ││┌ @ promotion.jl:314 within `*'
; │││┌ @ float.jl:399 within `*'
      %6 = fmul double %value_phi3, 4.000000e+00, !dbg !34
; ││┘┘
; ││┌ @ float.jl:399 within `*'
     %7 = fmul double %6, %5, !dbg !40
; ┘┘┘
; ┌ @ range.jl:575 within `iterate'
; │┌ @ promotion.jl:425 within `=='
    %8 = icmp eq i64 %value_phi2, %3, !dbg !41
; │┘
; │ @ range.jl:576 within `iterate'
; │┌ @ int.jl:53 within `+'
    %9 = add nuw i64 %value_phi2, 1, !dbg !45
; ┘┘
  br i1 %8, label %L28, label %L13, !dbg !33

L28:                                              ; preds = %L13, %top
  %value_phi6 = phi double [ %1, %top ], [ %7, %L13 ]
;  @ REPL[1]:8 within `gᴺ'
  ret double %value_phi6, !dbg !48
}

Aside: from looking at this, I notice that it might be very useful to collapse chains of identically named functions, so that we don’t indicate inlining depth changes, but simply note the recursion information on the left:

; │┌ @ float.jl:399 within `*' @ promotion.jl:314 @ operators.jl:502
1 Like

Looks nice. It would also be nice to have an option to hide comments.

1 Like

The indentation helps decipher the structure, but it doesn’t really address the fundamental problem that this output is too verbose for most common cases. The default should be less verbose, with an option for verbose output, which can then be as fancy and detailed as one wants, since the user has explicitly asked for it.

8 Likes

The Unicode box drawings that Keno uses in the current code_warntype printing seem like a decent way of conveying similar source-line information more compactly; is there a reason that wouldn’t extend to code_llvm? I recognize that the needs of Jameson’s web-based profiler/IR explorer are a little different and more verbose output seems like a good fit there.

Have you played around with the idea of making the code bold and/or the comments gray ?
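As a rough sketch of what that could look like (this is a hypothetical helper, not an existing option or anyone’s actual implementation, and it assumes comment lines are exactly those whose first non-blank character is `;`):

```julia
using InteractiveUtils

# Hypothetical sketch: reprint @code_llvm output with comment lines
# dimmed (gray) and instruction lines bold. Not a built-in option.
function code_llvm_styled(f, types)
    io = IOBuffer()
    code_llvm(io, f, types)          # capture the IR as text
    for line in eachline(seekstart(io))
        if startswith(lstrip(line), ";")
            printstyled(line, '\n'; color = :light_black)  # gray comments
        else
            printstyled(line, '\n'; bold = true)           # bold code
        end
    end
end

g(x) = 3x + 2
code_llvm_styled(g, Tuple{Float64})
```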

1 Like

We wish you the best vacation ever, after all the terrific work you and friends have achieved!

5 Likes

Here are a few little definitions which add code_native_nocomment and code_llvm_nocomment; maybe someone will find them useful 🙂
https://gist.github.com/simonfxr/d85d537499f84abb9731b257d21b2284
Just put it in your startup.jl
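The gist is the real reference; a minimal sketch of the same idea (my own, not the gist’s actual code, assuming comment lines are exactly those whose first non-blank character is `;`) looks like:

```julia
using InteractiveUtils

# Sketch: capture @code_llvm output into a buffer, then print only the
# lines that are not comment-only (i.e. don't start with ';').
function code_llvm_nocomment(f, types)
    io = IOBuffer()
    code_llvm(io, f, types)
    for line in eachline(seekstart(io))
        startswith(lstrip(line), ";") || println(line)
    end
end

g(x) = 3x + 2
code_llvm_nocomment(g, Tuple{Float64})
```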

2 Likes

I was reading Redirect and was reminded of this issue again. I really do think that the default should omit line information.

4 Likes

The code in that PR shows code_native results, which had this formatting for many years. The change in v0.7 was to use the same formatting everywhere.

Yes, though currently the only way to do this is to run the output through a tool like pygments. Eventually we can do it for all output in the terminal. First, however, we need to make sure the output is good without formatting, so that it can be written to a file and will copy/paste successfully. Then some judicious use of extra styling can provide that last bit of slight improvement.