Compiler optimization for variables and functions

C++ compilers can optimize away extra variables and unused parts of the code.
It seems that Julia doesn’t perform such optimizations, e.g.:

function addone(x::Int)
    local y::Int = 10
    local z::Int = 3
    y = y % x
    x += 1
    x
end

Running @code_native addone(3) shows that the rem function is called, even though its result only ends up in a local variable.
Can you explain this?

Put things in functions:

julia> f() = addone(3)
f (generic function with 1 method)

julia> @code_llvm f()

; Function f
; Location: REPL[21]:1
define i64 @julia_f_241591136() {
top:
  ret i64 4
}

What you did was equivalent to writing

code_llvm(addone, Tuple{Int})

which doesn’t give the compiler any hint about the value of the argument.
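To make the contrast concrete, here is a minimal sketch (the function bodies are simplified stand-ins for the ones above): the reflection macros only see argument *types*, so constant propagation can fire only when the value is baked into the function body.

```julia
# Sketch: reflection only knows argument types, not values.
function addone(x::Int)
    y = 10 % x   # result unused, but see the error-check discussion below
    x + 1
end

f() = addone(3)   # here the literal 3 is visible to the compiler

# These two are equivalent ways of asking for the generic code:
#   @code_llvm addone(3)
#   code_llvm(addone, Tuple{Int})
# whereas @code_llvm f() can fold everything down to `ret i64 4`.
println(addone(3))  # 4
println(f())        # 4
```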

2 Likes

Thanks, but why is it Tuple{Int}?
Is it about bad performance with global variables, even when their type is known?
If I write

x = 3 # typeof(x) == Int64
f(x) = addone(x)
@code_llvm f(x)

it calls rem, but if I wrap it with some main function, it becomes constant again:

function main()
    x = 3
    addone(x)
end

Does this affect the @btime addone($x) and @time addone(x) macros?
So, should I put all my code inside some main() or f() function with no input arguments, and NOT benchmark any functions with global input arguments?

1 Like

The Tuple{..., ...} is the list of argument types for the function.

If it is not const, the type is not known, because you can change it at any point (e.g. by doing x = 3.0). If you declare it const, then it works fine:

julia> const x = 3
3

julia> f() = addone(x)
f (generic function with 1 method)

julia> @code_llvm f()

; Function f
; Location: REPL[3]:1
define i64 @julia_f_312611515() {
top:
  ret i64 4
}

You should read the performance tips carefully and use https://github.com/JuliaCI/BenchmarkTools.jl to benchmark. But yes, putting things into functions and not using global variables is indeed correct.
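A small sketch of the const/non-const distinction (the names g, h, a, B are mine): the compiler cannot assume the type of a non-const global stays fixed, while a const global's type (and here, value) is known.

```julia
a = 3          # non-const global: its type may change at any time
const B = 3    # const global: type and value are fixed

g() = a + 1    # must look up `a` dynamically on each call
h() = B + 1    # can be constant-folded to 4

println(g())   # 4, but via a dynamic global lookup
println(h())   # 4, constant-folded
```

With BenchmarkTools, interpolating the global (`@btime addone($a)`) passes its value in as a typed argument, which sidesteps the non-const global lookup during the benchmark.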

1 Like

Which version are you using? We are not very good at recognizing and optimizing out pure functions in general, but this case shouldn’t have that problem, and on 1.0 no rem is called.

What the compiler can’t do, though, is remove the error check: y % x throws an error when x is 0, and that cannot be optimized out.
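A quick sketch of that side effect (safe_rem is a name I made up): integer % in Julia is not side-effect free, because a zero divisor throws, so even a "dead" y % x keeps its zero check.

```julia
# `%` throws DivideError on a zero divisor, so it cannot be
# silently dropped even when its result is unused.
safe_rem(y, x) = x == 0 ? nothing : y % x

println(10 % 3)            # 1
println(safe_rem(10, 0))   # nothing, the guarded version never divides
try
    10 % 0                 # the unguarded version throws
catch e
    println(e isa DivideError)  # true
end
```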

That’s unrelated.

That’s just how code_native (the function) should be used.

No, and none of your examples, above or below, use a global variable anyway…

He didn’t have any of those problems…

3 Likes

Yeah, I guess y should have been eliminated anyway.

The answer was about benchmarking in general.

Ok, but in normal use cases the input value is not known, and even if I write

const z = 3
@code_native addone(z)

then it calls rem on locals.

And even if I put it inside a function and call it with an unknown argument value

function main()
    x::Int64 = rand(Int64)
    addone(x)
end

@code_native main()

it again calls rem on locals…

1 Like

Sure, I know you are talking about benchmarking, but none of the code above uses global variables… The difference between the compiler doing constant propagation or not comes from having the variable as a function input rather than as a locally known value…

1 Like

There are really multiple problems you are seeing, and all I’m complaining about, @kristoffer.carlsson, is that you bring up many issues that unnecessarily complicate the discussion…

For a start, code_native (either the function or the macro) does not show you code corresponding to the expression you give it; it shows the function you’ll call with it. So having the input to the macro be a constant or not will not change the result.

Secondly, because of this, global variables play no role in any of the code above. You aren’t seeing any issue caused by type instability, so bringing it up just complicates things… It is a thing to be careful about when doing benchmarks, but @code_native isn’t a benchmark and you aren’t running the code.

It should go without saying that putting it in a function (it already is in one) does not change the result.

And now the real question. This shouldn’t be the case. Are you just referring to the rem in the debug info, or are you actually seeing a call to the rem function? If it’s the former, it’s just the error issue I mentioned above; if it’s the latter, what’s your version and what’s the code you see?

And finally, something unrelated to the main issue but worth bringing up: none of the local and ::Int or ::Int64 annotations on your local variables are doing anything, and you should get rid of them. There are cases where using them is good, but here they just make the code harder to read. (If you come from C and like explicit variable declarations, that’s fine, but it won’t help the compiler.)

4 Likes

And as for why code_native for addone and f() = addone(3) differ with respect to the rem, but not in C: I believe % has no side effect in C (integer division by 0 is UB in C). However, this is not the case in Julia, where integer division by 0 always throws an error. This means that in C a % whose result is not used is a no-op and can be removed, but in Julia that is impossible unless you can prove that there is no error (edit: the error check can’t be removed, but the actual divide will be). This is why, when you check for zero explicitly, there will be no code left for the rem even if the input is not known:

julia> function addone(x::Int)
           local y::Int = 10
           local z::Int = 3
           if x != 0
               y = y % x
           end
           x += 1
           x
       end
addone (generic function with 1 method)

julia> @code_llvm addone(3)

; Function addone
; Location: REPL[1]:2
define i64 @julia_addone_36031(i64) {
top:
; Location: REPL[1]:7
; Function +; {
; Location: int.jl:53
  %1 = add i64 %0, 1
;}
; Location: REPL[1]:8
  ret i64 %1
}

I also don’t think you are actually seeing a rem call in any case, since

  1. Unless you disabled inlining, rem the Julia function should be inlined.
  2. There isn’t a C function rem that you’d call for this. On all supported platforms it is implemented as (inlined) assembly in the LLVM backend, so you shouldn’t see any call.

This also suggests that, since you are probably not super familiar with assembly, you should use code_llvm for this. (This is a case where inference, i.e. code_warntype, won’t give you enough info, but code_llvm is more than enough.)

4 Likes

Thanks, just as you said, with the zero check it has no rem, but without it the output is as follows:

; Function addone
; Location: REPL[1]:2
; Function Attrs: uwtable
define i64 @julia_addone_34873(i64) #0 {
top:
; Location: REPL[1]:4
; Function rem; {
; Location: int.jl:233
  %cond = icmp eq i64 %0, 0
  br i1 %cond, label %fail, label %after_srem

fail:                                             ; preds = %top
  call void @jl_throw(%jl_value_t addrspace(12)* addrspacecast (%jl_value_t* inttoptr (i64 142147408 to %jl_value_t*) to %jl_value_t addrspace(12)*))
  unreachable

after_srem:                                       ; preds = %top
;}
; Location: REPL[1]:5
; Function +; {
; Location: int.jl:53
  %1 = add i64 %0, 1
;}
; Location: REPL[1]:6
  ret i64 %1
}

Seems like I picked the wrong operations to test on local variables…
By the way, I’m running Julia 0.7, so it’s not about the version, just the rem exception handling.

And here, there’s no rem function call (read: no divide is calculated) in your code.
The “Function rem” is just debug info telling you that code was inlined from the rem function. So, depending on what you mean by “rem is still there”:

  1. There’s no rem call.
  2. There’s no division.
  3. The part of the rem function that has a side effect not present in C is still there.

So the compiler is equally good at removing no-ops, but the definition of a no-op is different, and the safer semantics prevented the removal of a branch here (which should be pretty cheap…).

Yes, pretty much… And you should be able to see the difference by just calling addone(0)
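To see that retained branch fire, here is a hedged sketch (addone_unchecked is my name for the version without the zero check): the dead y % x still throws when x == 0.

```julia
# Without a zero check, the branch kept from `y % x` throws for x == 0,
# even though y is never used afterwards.
function addone_unchecked(x::Int)
    y = 10 % x   # dead result, live error check
    x + 1
end

println(addone_unchecked(3))    # 4
try
    addone_unchecked(0)         # the retained branch throws here
catch e
    println(e isa DivideError)  # true
end
```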

4 Likes

Ok, so there are some functions that can’t be optimized away.

I tried some number generators, such as:
y = rand(Int)
y = time_ns()
And allocating local arrays:
y = zeros(Int, 10, 10)
y = Matrix{Int}(undef, 10, 10)

And none of these were optimized out.
I wonder if there is any guidance about which variables and functions can be optimized out, and which cannot?

rand(Int) cannot be optimized out: it changes the state of the global RNG. There isn’t anything defined about this and there won’t be. Things that call external functions are generally hard to optimize out.

2 Likes

Also, filling an array with some values and then overwriting those values with others is not optimized either: the native code is slightly longer.

function fillone(x::Vector{Int})
    L = length(x)
    v = Int(15)
    for i=1:L
        # x[i] = v # ! no opt-out
        x[i] = 1
    end
    x
end

And even with one element:

function setone(x::Vector{Int}, i::Int)
    if length(x) < i return x end
    v = Int(15)
    # x[i] = v # ! no opt-out
    x[i] = 1
    x
end

I suppose that container types, and in particular the setindex! function, are too complex to optimize.

So, if you use arrays to store calculation results, those calculations are less optimized.

No. And again, the fact that you are not seeing a function call to it means that it’s not that complex.

And no, just don’t do that…

  1. As I said, it seems that you don’t actually read assembly all that well, so please use code_llvm.
  2. It is never a good idea to just count how many lines there are in, well, basically any code to see what optimization is done. You were misled by this once and now again…

You are just seeing the bounds check, which for one reason or another isn’t being optimized out, since LLVM’s range analysis isn’t super happy about the mix of signed and unsigned comparisons…

And again, the “longer” code you see (or at least what I see from your code) is just debug info. When you have two setindex! calls, the first one throws the bounds error instead of the second one, so you end up with the debug info for two functions instead of one. Again, stop counting lines and actually read the code. The debug info shows perfectly where you can find the bounds check and the store.
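For completeness, a hedged sketch of how that bounds check can be dropped (fillone! is my variant of the earlier fillone): iterating with eachindex, or asserting in-bounds access with @inbounds, lets the compiler prove the indices are valid.

```julia
# The "extra" code in fillone is the bounds check on x[i]; eachindex
# yields provably in-bounds indices, and @inbounds asserts the same.
function fillone!(x::Vector{Int})
    for i in eachindex(x)
        @inbounds x[i] = 1   # no bounds check emitted here
    end
    x
end

println(fillone!(zeros(Int, 4)))  # [1, 1, 1, 1]
```

Note that @inbounds is a promise to the compiler; using it with indices that could be out of range is undefined behavior, which is why eachindex is the safer default.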

1 Like

Ok, thank you for pointing out my mistakes; I will use code_llvm instead of code_native, then.