Output for @code_llvm changed from 0.6 to 0.7, 1.0

The output of @code_llvm is more verbose on 0.7 and 1.0 than it was on 0.6.

On 0.6.4

julia> f(x) = 3x+2
f (generic function with 1 method)

julia> @code_llvm f(1.0)

define double @julia_f_62581(double) #0 !dbg !5 {
top:
  %1 = fmul double %0, 3.000000e+00
  %2 = fadd double %1, 2.000000e+00
  ret double %2
}

On 1.0.0

julia> f(x) = 3x+2
f (generic function with 1 method)

julia> @code_llvm f(1.0)

; Function f
; Location: REPL[6]:1
define double @julia_f_36035(double) {
top:
; Function *; {
; Location: promotion.jl:314
; Function *; {
; Location: float.jl:399
  %1 = fmul double %0, 3.000000e+00
;}}
; Function +; {
; Location: promotion.jl:313
; Function +; {
; Location: float.jl:395
  %2 = fadd double %1, 2.000000e+00
;}}
  ret double %2
}

The more compact 0.6 output seems better for showing how Julia generates efficient machine code. Is there a good way to remove all the added comments? None of the keyword settings suggested in help mode does the trick.

3 Likes

@Jameson decided that the new very verbose output is such a significant improvement that not only should it be the default, but there shouldn’t even be an option to turn it off. Personally, I disagree and think that much less verbose output should not only be an option but be the default as it once was. A pull request to quell some of this very, very long output would be much appreciated. I would do it myself but I’m going on vacation soon and won’t have time until I get back.

12 Likes

I have no preference about an option to disable debug info (though I do think it should be on by default), but this is absolutely wrong: there are countless cases where longer LLVM/native code gives better performance. I know some people like to show it, but doing so is just misleading and really shows nothing significant at all.

2 Likes

When giving a talk, I’d rather show two lines of unadorned LLVM code than ask the audience to filter out all the comments by eye and see that what’s left is two lines of LLVM code. That’s all I mean.

2 Likes

And showing that in a talk without actually explaining each instruction is exactly the misleading thing I was talking about… I’ve heard/seen cases where the code is very short but with a call in it.

2 Likes

C’mon, @yuyichao, I have the exact same complaint for the exact same reason. The previous level of light inline comments were about right for understanding where some code comes from. The current level is so much that there’s about twelve lines of comments per line of code—it’s impossible to read anything. I’ve forgotten what the previous line was by the time I get to the next one; also my beard has grown long and I’ve forgotten my own name.

16 Likes

I have a PR open for improving the layout and structure. It wasn’t critical for 1.0, so I haven’t had time to finish it yet.

Yichao is correct though: a long function is generally going to be much faster than a short one, so any analysis based on the length of the code_llvm output is flawed. That said, I think what’s interesting to show here is specifically that optimizations are able to cut across multiple user functions, and without impacting debug info; having this accurate printing at all levels is what has helped reduce the error rate in our backtrace/profile information.

5 Likes

That is my intent. I’m trying to illustrate this capability of Julia with something like the following

function iterator(g, N)
    # construct gᴺ, the Nth iterate of g
    function gᴺ(x)
        for i ∈ 1:N
            x = g(x)
        end
        return x
    end
    return gᴺ
end

f(x)  = 4*x*(1-x)

fᴺ = iterator(f, 10^6);

With julia-0.6.4, @code_llvm fᴺ(0.3) returns the fairly comprehensible

define double @"julia_g\E1\B4\BA_62655"(%"#g\E1\B4\BA#1"* nocapture readonly dereferenceable(8), double) #0 !dbg !5 {
top:
  %2 = getelementptr inbounds %"#g\E1\B4\BA#1", %"#g\E1\B4\BA#1"* %0, i64 0, i32 1
  %3 = load i64, i64* %2, align 8
  %4 = icmp slt i64 %3, 1
  br i1 %4, label %L14, label %if.preheader

if.preheader:                                     ; preds = %top
  br label %if

if:                                               ; preds = %if.preheader, %if
  %x.03 = phi double [ %8, %if ], [ %1, %if.preheader ]
  %"#temp#.02" = phi i64 [ %5, %if ], [ 1, %if.preheader ]
  %5 = add i64 %"#temp#.02", 1
  %6 = fmul double %x.03, 4.000000e+00
  %7 = fsub double 1.000000e+00, %x.03
  %8 = fmul double %6, %7
  %9 = icmp eq i64 %"#temp#.02", %3
  br i1 %9, label %L14.loopexit, label %if

L14.loopexit:                                     ; preds = %if
  br label %L14

L14:                                              ; preds = %L14.loopexit, %top
  %x.0.lcssa = phi double [ %1, %top ], [ %8, %L14.loopexit ]
  ret double %x.0.lcssa
}

showing that Julia has inlined the f function into the iterator and optimized them down to a simple for loop.

However, with julia-1.0.0, the output of @code_llvm is so excessively laden with comments that it’s hard for a talk audience to see that this optimization has occurred:

; Function gᴺ
; Location: REPL[22]:5
define double @"julia_g\E1\B4\BA_36152"({ i64 } addrspace(11)* nocapture nonnull readonly dereferenceable(8), double) {
top:
  %2 = getelementptr inbounds { i64 }, { i64 } addrspace(11)* %0, i64 0, i32 0
; Function Colon; {
; Location: range.jl:5
; Function Type; {
; Location: range.jl:255
; Function unitrange_last; {
; Location: range.jl:260
; Function >=; {
; Location: operators.jl:333
; Function <=; {
; Location: int.jl:428
  %3 = load i64, i64 addrspace(11)* %2, align 8
  %4 = icmp sgt i64 %3, 0
;}}}}}
  br i1 %4, label %L9.L13_crit_edge, label %L28

L9.L13_crit_edge:                                 ; preds = %top
  br label %L13

L13:                                              ; preds = %L13, %L9.L13_crit_edge
  %value_phi2 = phi i64 [ 1, %L9.L13_crit_edge ], [ %9, %L13 ]
  %value_phi3 = phi double [ %1, %L9.L13_crit_edge ], [ %7, %L13 ]
; Location: REPL[22]:6
; Function f; {
; Location: REPL[16]:1
; Function -; {
; Location: promotion.jl:315
; Function -; {
; Location: float.jl:397
  %5 = fsub double 1.000000e+00, %value_phi3
;}}
; Function *; {
; Location: operators.jl:502
; Function *; {
; Location: promotion.jl:314
; Function *; {
; Location: float.jl:399
  %6 = fmul double %value_phi3, 4.000000e+00
;}}
; Function *; {
; Location: float.jl:399
  %7 = fmul double %6, %5
;}}}
; Function iterate; {
; Location: range.jl:575
; Function ==; {
; Location: promotion.jl:425
  %8 = icmp eq i64 %value_phi2, %3
;}
; Location: range.jl:576
; Function +; {
; Location: int.jl:53
  %9 = add nuw i64 %value_phi2, 1
;}}
  br i1 %8, label %L28, label %L13

L28:                                              ; preds = %L13, %top
  %value_phi6 = phi double [ %1, %top ], [ %7, %L13 ]
; Location: REPL[22]:8
  ret double %value_phi6
}

This example is slightly artificial. I would like to show the same thing with an ODE integrator (e.g. rungekutta4) and a user-defined dx/dt = f(x), but the above suffices to make the point.

Trust me, I’m not equating the length of the @code_llvm output or the LLVM IR with the efficiency of its execution. This is all about pedagogy and clarity.

Sure, I’m all for improving the formatting. The current PR was literally just to show it was possible, and to be able to see why our line numbers were wrong. Tim worked on some formatting idea, and has proposed this:

;  @ REPL[1]:5 within `gᴺ'
define double @"julia_gᴺ_65337"({ i64 } addrspace(11)* nocapture nonnull readonly dereferenceable(8), double) !dbg !5 {
top:
  %2 = getelementptr inbounds { i64 }, { i64 } addrspace(11)* %0, i64 0, i32 0, !dbg !7
; ┌ @ range.jl:5 within `Colon'
; │┌ @ range.jl:255 within `Type'
; ││┌ @ range.jl:260 within `unitrange_last'
; │││┌ @ operators.jl:333 within `>='
; ││││┌ @ int.jl:428 within `<='
       %3 = load i64, i64 addrspace(11)* %2, align 8, !dbg !8, !tbaa !21, !invariant.load !4
       %4 = icmp sgt i64 %3, 0, !dbg !8
; ┘┘┘┘┘
  br i1 %4, label %L9.L13_crit_edge, label %L28, !dbg !7

L9.L13_crit_edge:                                 ; preds = %top
  br label %L13, !dbg !7

L13:                                              ; preds = %L13, %L9.L13_crit_edge
  %value_phi2 = phi i64 [ 1, %L9.L13_crit_edge ], [ %9, %L13 ]
  %value_phi3 = phi double [ %1, %L9.L13_crit_edge ], [ %7, %L13 ]
;  @ REPL[1]:6 within `gᴺ'
; ┌ @ REPL[2]:1 within `f'
; │┌ @ promotion.jl:315 within `-'
; ││┌ @ float.jl:397 within `-'
     %5 = fsub double 1.000000e+00, %value_phi3, !dbg !24
; │┘┘
; │┌ @ operators.jl:502 within `*'
; ││┌ @ promotion.jl:314 within `*'
; │││┌ @ float.jl:399 within `*'
      %6 = fmul double %value_phi3, 4.000000e+00, !dbg !34
; ││┘┘
; ││┌ @ float.jl:399 within `*'
     %7 = fmul double %6, %5, !dbg !40
; ┘┘┘
; ┌ @ range.jl:575 within `iterate'
; │┌ @ promotion.jl:425 within `=='
    %8 = icmp eq i64 %value_phi2, %3, !dbg !41
; │┘
; │ @ range.jl:576 within `iterate'
; │┌ @ int.jl:53 within `+'
    %9 = add nuw i64 %value_phi2, 1, !dbg !45
; ┘┘
  br i1 %8, label %L28, label %L13, !dbg !33

L28:                                              ; preds = %L13, %top
  %value_phi6 = phi double [ %1, %top ], [ %7, %L13 ]
;  @ REPL[1]:8 within `gᴺ'
  ret double %value_phi6, !dbg !48
}

Aside: from looking at this, I notice that it might be very useful to collapse chains of identically named functions, so that we don’t indicate inlining depth changes, but simply note the recursion information on the left:

; │┌ @ float.jl:399 within `*' @ promotion.jl:314 @ operators.jl:502
1 Like

Looks nice. It would also be nice to have an option to hide comments.

1 Like

The indentation helps decipher the structure, but it doesn’t really address the fundamental problem that this output is too verbose for most common cases. The default should be less verbose, with an option for verbose output, which can then be as fancy and detailed as one wants, since the user has explicitly asked for it.

8 Likes

The Unicode box drawings that Keno uses in the current code_warntype printing seem like a decent way of conveying similar source-line information more compactly; is there a reason that wouldn’t extend to code_llvm? I recognize that the needs of Jameson’s web-based profiler/IR explorer are a little different and more verbose output seems like a good fit there.

Have you played around with the idea of making the code bold and/or the comments gray ?
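As a rough sketch of what that could look like (this is a hypothetical helper, not an existing option or anyone’s actual implementation, and it assumes comment lines are exactly those whose first non-blank character is `;`):

```julia
using InteractiveUtils

# Hypothetical sketch: reprint @code_llvm output with comment lines
# dimmed (gray) and instruction lines bold. Not a built-in option.
function code_llvm_styled(f, types)
    io = IOBuffer()
    code_llvm(io, f, types)          # capture the IR as text
    for line in eachline(seekstart(io))
        if startswith(lstrip(line), ";")
            printstyled(line, '\n'; color = :light_black)  # gray comments
        else
            printstyled(line, '\n'; bold = true)           # bold code
        end
    end
end

g(x) = 3x + 2
code_llvm_styled(g, Tuple{Float64})
```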

1 Like

We wish you the best vacation ever, after all the terrific work you and friends have achieved!

5 Likes

Here are a few little definitions which add code_native_nocomment and code_llvm_nocomment; maybe someone will find them useful 🙂
https://gist.github.com/simonfxr/d85d537499f84abb9731b257d21b2284
Just put it in your startup.jl
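The gist is the real reference; a minimal sketch of the same idea (my own, not the gist’s actual code, assuming comment lines are exactly those whose first non-blank character is `;`) looks like:

```julia
using InteractiveUtils

# Sketch: capture @code_llvm output into a buffer, then print only the
# lines that are not comment-only (i.e. don't start with ';').
function code_llvm_nocomment(f, types)
    io = IOBuffer()
    code_llvm(io, f, types)
    for line in eachline(seekstart(io))
        startswith(lstrip(line), ";") || println(line)
    end
end

g(x) = 3x + 2
code_llvm_nocomment(g, Tuple{Float64})
```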

2 Likes

I was reading Redirect and was reminded of this issue again. I really do think that the default should omit line information.

4 Likes

The code in that PR shows code_native results, which had this formatting for many years. The change in v0.7 was to use the same formatting everywhere.

Yes, though currently the only way to do this is to run the output through a tool like pygments. Eventually we can do it for all output in the terminal. First, however, we need to make sure the output is good without formatting, so that it can be written to a file and will copy/paste successfully. Then some judicious use of extra styling can provide that last bit of slight improvement.