Do C++ performance optimisations apply to Julia?

So how much of this applies to immutable structs? Nobody would bother writing one with 2500 fields, but I am wondering how immutables are implemented. Probably off-topic, but my curiosity started over whether a compiler could edit an individual field in an immutable struct instance (which may also be an Array element rather than living on the stack) under the hood rather than the semantic action of creating a new instance. It seems like a useful optimization for setfield!-like syntax and operations if there’s no need for the change to be shared by multiple references (a job for mutable structs with all their downsides). I know about Setfield.jl but I have yet to figure out how to edit a field of an Array element without copying the whole array.

Thank you for checking in with the community on this. My general sense is that the 27 points are likely more applicable to Julia than not. The main question is how important are they. If they are not applicable to Julia, I would point to a specific Julia feature rather than philosophical differences.

In Julia, the advice to avoid is global dynamically-typed variables. In some cases, global constant references may be a useful optimization. Constant references with concrete types are useful because they can be statically typed in that they can take new values but cannot change type. Another approach would be passing reusing an argument rather creating a new variable. Methods ending with ! particularly emphasize this pattern.

Note that structs in Julia are by default immutable. This is an important language feature.

Mutable structs are passed by reference in Julia.

Having type-stable stable functions is important in Julia. If the compiler can determine the return type from the argument types, then less dynamic dispatch is needed. Thus, this advice applies to Julia as well. If no return value is needed, consider returning nothing.

Some operations can be optimized through fusion. In Julia also consider is the use of .= and operations such as .*=. For example, note the allocation when using f! as opposed to g!.

julia> f!(A) =  A .= A*2
f! (generic function with 1 method)

julia> g!(A) =  A .*= 2
g! (generic function with 1 method)

...

julia> @time f!(A);
  0.000018 seconds (1 allocation: 8.125 KiB)

julia> @time g!(A);
  0.000011 seconds

I always find this comment weird. Academics write a lot of code in many languages and often release it as open source. The overall prevalence of open source code written by academics is high. It really comes down to the individual experience of the each author.

Julia structs are often directly mappable to C structs. We rely on this for interoperability with C via @ccall. See the function sizeof.

We think about this a lot in Julia because numerical computing is a major application of the language. We also have a type promotion system. On a small scale, it probably does not matter. On a large scale, memory can be important due to processor memory caches.

If you use smaller types, you can potential fit more of them into a 256 or 512 bit SIMD register and thus do more operations per cycle. Not all of this is as automatic as one would like though, so sometimes using SIMD explicitly is helpful.

2 Likes

No, Julia (like many lisps, Java, Perl, Python, Ruby, and others) uses passing by sharing. This is also documented.

7 Likes

How is that different from pass-by-reference?

Function arguments themselves act as new variable bindings (new locations that can refer to values), but the values they refer to are identical to the passed values.

Quoting from the linked Wikipedia article:

The semantics of call by sharing differ from call by reference: “In particular it is not call by value because mutations of arguments performed by the called routine will be visible to the caller. And it is not call by reference because access is not given to the variables of the caller, but merely to certain objects”.[36] So, for example, if a variable was passed, it is not possible to simulate an assignment on that variable in the callee’s scope.[37] However, since the function has access to the same object as the caller (no copy is made), mutations to those objects, if the objects are mutable, within the function are visible to the caller, which may appear to differ from call by value semantics. Mutations of a mutable object within the function are visible to the caller because the object is not copied or cloned—it is shared.

5 Likes

Okay, I see the subtle difference between pass-by-reference and pass-by-sharing, but I still think the original quote that mutable structs are passed by reference is true (and more insightful). In that specific case a reference to the struct is passed (and so the struct can be mutated in the caller), even though pass-by-sharing is more accurate when conceptually describing how values of all types are passed in Julia.

It’s not though. If it were, the following would return [1,2,3,]

function modx(x)
  x = [1,2,3]
end
function f()
  x = [0]
  modx(x)
   ## returns [0] because x is passed-by-sharing, not reference
  return x
end

Compare pass-by-reference in C++

#include <iostream>
#include <string>
#include <vector>

void modx(std::vector<int>& x){
    x = {1,2,3};
}
void f() {
    std::vector<int> x = {0};
    modx(x);  
    for (int n: x) {
        std::cout<<n<<",";
    }
}

int main()
{
  f(); ## prints '1,2,3,'
}

While the former behavior is sane and passing-by-reference as the default will lead to disaster (IMHO), telling someone who comes from e.g. C that Julia is pass-by-reference, might mislead and confuse them.

13 Likes

Hmm, seems I overlooked the case of assigning to the reference. I was mostly thinking of this usage (in which I see no difference between pass-by-sharing and pass-by-reference):

julia> mutable struct S
           f::Float32
       end

julia> s = S(1.234)
S(1.234f0)

julia> function mods(s)
           s.f = 456
       end
mods (generic function with 1 method)

julia> s
S(1.234f0)

julia> mods(s)
456

julia> s
S(456.0f0)

But okay, I learned something today on pass-by-sharing :slight_smile:

Note that assignment never modifies the memory object pointed to by a variable but rebinds the variable to something else. Your modx doesn’t really modify in-place anything. If for example you push! to x you’re actually modifying it and the caller would see it.

I think that was the point @skleinbo was trying to make about Julia’s pass-by-sharing. In contrast, in C++ you can use a reference variable to modify in-place (as the second example shows).

1 Like

But that doesn’t have anything to do with evaluation strategy, but with assignment semantic.

I felt I was making that exact point :slight_smile: If x were passed as a reference in Julia, it would behave just like the C++ program. Edit Bad example. If I want that behavior in Julia, I explicitly need to pass a Ref to modx and dereference it,
while in C++ I can write a function that accepts a reference unbeknownst to the caller with all the hideous side effects.

Again, no, that’s because x[] = is not pure assignment but a syntactic sugar for setindex! which is an in-place operation. The difference you’re showing is in the semantic of assignment between C/C++ and Julia.

In a sense I may agree with you that there isn’t much difference with passing by reference for mutable objects. My point was that there aren’t different evaluation strategies, it’s always pass-by-sharing, strategy which is fairly popular in dynamic languages, not even specific to Julia or other niche languages. I tend to summaries it with “mutable objects can be mutated and immutable ones can’t” (and in fact the use of Ref suggested above meant to wrap an immutable object inside a mutable container to simulate the ability to mutate it), but for example this is very different from fortran where you can modify pretty much everything (unless limited by in/out scoping probably?). In Julia it’s fairly easy to know whether a type is mutable or not, for example I find that more opaque in python (probably all classes are mutable?)

8 Likes

Hmm, I’ve been unable to confirm this. Looking at the code below, you find that the only case where anything other than a chain of conditions is evaluated is in the boundary case, where the value is determined before the function is compiled and so it can just return a single value every time. I’m not seeing any switch-like behaviour.

function switch(n)
    m = n ≤ 0.1 ? 0.0 :
        n ≤ 0.6 ? 1.0 :
        n ≤ 0.8 ? 1.1 :
        n ≤ 1.0 ? 1.2 :
        n ≤ 1.3 ? 1.35 :
        2.2        #=otherwise=#
    m
end

let
    n = 1.1
    @code_llvm switch(n) # Long
end

begin
    const a = 1.1
    @code_llvm switch(a) # Long
end

function boundary()
    a = 1.1
    switch(a)
end

@code_llvm boundary()

Output:

;  @ c:\Users\jmmsm\Documents\GitHub\speedy_julia\examples\switch_statement.jl:2 within `switch`
; Function Attrs: uwtable
define double @julia_switch_1237(double %0) #0 {
top:
;  @ c:\Users\jmmsm\Documents\GitHub\speedy_julia\examples\switch_statement.jl:3 within `switch`
; ┌ @ float.jl:447 within `<=`
   %1 = fcmp ugt double %0, 1.000000e-01
; └
  br i1 %1, label %L4, label %L21

L4:                                               ; preds = %top
; ┌ @ float.jl:447 within `<=`
   %2 = fcmp ugt double %0, 6.000000e-01
; └
  br i1 %2, label %L7, label %L21

L7:                                               ; preds = %L4
; ┌ @ float.jl:447 within `<=`
   %3 = fcmp ugt double %0, 8.000000e-01
; └
  br i1 %3, label %L10, label %L21

L10:                                              ; preds = %L7
; ┌ @ float.jl:447 within `<=`
   %4 = fcmp ugt double %0, 1.000000e+00
; └
  %5 = fcmp ugt double %0, 1.300000e+00
  %. = select i1 %5, double 2.200000e+00, double 1.350000e+00
  %value_phi3 = select i1 %4, double %., double 1.200000e+00
  br label %L21

L21:                                              ; preds = %L10, %L7, %L4, %top
  %value_phi = phi double [ 0.000000e+00, %top ], [ 1.000000e+00, %L4 ], [ 1.100000e+00, %L7 ], [ %value_phi3, %L10 ]
;  @ c:\Users\jmmsm\Documents\GitHub\speedy_julia\examples\switch_statement.jl:9 within `switch`
  ret double %value_phi
}
;  @ c:\Users\jmmsm\Documents\GitHub\speedy_julia\examples\switch_statement.jl:2 within `switch`
; Function Attrs: uwtable
define double @julia_switch_1239(double %0) #0 {
top:
;  @ c:\Users\jmmsm\Documents\GitHub\speedy_julia\examples\switch_statement.jl:3 within `switch`
; ┌ @ float.jl:447 within `<=`
   %1 = fcmp ugt double %0, 1.000000e-01
; └
  br i1 %1, label %L4, label %L21

L4:                                               ; preds = %top
; ┌ @ float.jl:447 within `<=`
   %2 = fcmp ugt double %0, 6.000000e-01
; └
  br i1 %2, label %L7, label %L21

L7:                                               ; preds = %L4
; ┌ @ float.jl:447 within `<=`
   %3 = fcmp ugt double %0, 8.000000e-01
; └
  br i1 %3, label %L10, label %L21

L10:                                              ; preds = %L7
; ┌ @ float.jl:447 within `<=`
   %4 = fcmp ugt double %0, 1.000000e+00
; └
  %5 = fcmp ugt double %0, 1.300000e+00
  %. = select i1 %5, double 2.200000e+00, double 1.350000e+00
  %value_phi3 = select i1 %4, double %., double 1.200000e+00
  br label %L21

L21:                                              ; preds = %L10, %L7, %L4, %top
  %value_phi = phi double [ 0.000000e+00, %top ], [ 1.000000e+00, %L4 ], [ 1.100000e+00, %L7 ], [ %value_phi3, %L10 ]
;  @ c:\Users\jmmsm\Documents\GitHub\speedy_julia\examples\switch_statement.jl:9 within `switch`
  ret double %value_phi
}
;  @ c:\Users\jmmsm\Documents\GitHub\speedy_julia\examples\switch_statement.jl:22 within `boundary`
; Function Attrs: uwtable
define double @julia_boundary_1241() #0 {
top:
;  @ c:\Users\jmmsm\Documents\GitHub\speedy_julia\examples\switch_statement.jl:24 within `boundary`
  ret double 1.350000e+00
}
3 Likes

I thought it existed because of this comment from Jeff in a issue about adding or not a switch keyword for Julia. However, in a duplicate Jeff admits he is not sure if LLVM had really done it at some point.

1 Like

On 1.8 I tested and it emits a jump table sometimes.

2 Likes

Edit: Marking this as the solution as it best summarises or “answers” the original question. Feel kinda weird doing it but I think for discoverability it might help.

While writing the article I mentioned in the OP, I ended up not being satisfied with the end result of many of the conclusions and I think a much more interesting article can be written about a subset of the tips. I’m going to focus on that subset for the final article.

With that said, I thought it might be useful or interesting to those who stumble upon this thread in the future to read a nicely formatted document with code examples so I’ll leave the link here to an unlisted draft article on Forem:

Also, there is a GitHub repository containing all of the code snippets.

Thanks to everyone who contributed in this thread!

2 Likes

Can you post an example of the code?

I saw this example from this old thread which I have been able to recreate:

const CODES = (2, 3, 5, 7, 11)
const RCODES = reverse(CODES)

f(x) = 0
f(x, y, ys...) = x == y ? 1 + length(ys) : f(x, ys...)
findcode3(x) = f(x, RCODES...)

@code_llvm findcode3(3)

Additionally, I can edit the code to get close to what I wanted originally as we can see that there’s a table lookup happening in the LLVM:

const RETURNVALUES = (0, 1, 5, 10, 11, 13)
const RRETURNVALUES = reverse(RETURNVALUES)

h(x) = RETURNVALUES[0]
h(x, y, ys...) = x == y ? RRETURNVALUES[length(ys)+1] : h(x, ys...)
hindcode3(x) = h(x, RCODES...)

hindcode3(3)

@code_llvm hindcode3(3)
; Function Attrs: uwtable
define i64 @julia_hindcode3_1356(i64 signext %0) #0 {
top:
; ┌ @ c:\Users\jmmsm\Documents\GitHub\speedy_julia\examples\ss_try2.jl:23 within `h`
   %switch.tableidx = add i64 %0, -2
   %1 = icmp ult i64 %switch.tableidx, 10
   br i1 %1, label %switch.hole_check, label %L16

L16:                                              ; preds = %switch.hole_check, %top
; │ @ c:\Users\jmmsm\Documents\GitHub\speedy_julia\examples\ss_try2.jl:23 within `h` @ c:\Users\jmmsm\Documents\GitHub\speedy_julia\examples\ss_try2.jl:23 @ c:\Users\jmmsm\Documents\GitHub\speedy_julia\examples\ss_try2.jl:23 @ c:\Users\jmmsm\Documents\GitHub\speedy_julia\examples\ss_try2.jl:23 @ c:\Users\jmmsm\Documents\GitHub\speedy_julia\examples\ss_try2.jl:23
   %2 = call nonnull {}* @j_h_1358(i64 signext %0) #0
   call void @llvm.trap()
   unreachable

switch.hole_check:                                ; preds = %top
; │ @ c:\Users\jmmsm\Documents\GitHub\speedy_julia\examples\ss_try2.jl:23 within `h`
   %switch.maskindex = trunc i64 %switch.tableidx to i16
   %switch.shifted = lshr i16 555, %switch.maskindex
   %3 = and i16 %switch.shifted, 1
   %switch.lobit.not = icmp eq i16 %3, 0
   br i1 %switch.lobit.not, label %L16, label %switch.lookup

switch.lookup:                                    ; preds = %switch.hole_check
   %switch.gep = getelementptr inbounds [10 x i64], [10 x i64]* @switch.table.julia_hindcode3_1356, i64 0, i64 %switch.tableidx
   %switch.load = load i64, i64* %switch.gep, align 8
   ret i64 %switch.load
; └
}

I have a couple of examples of optimizations.


#simple LUT

julia> function foo(x)
           if x == 1
               7
           elseif x == 2
               8
           elseif x == 3
               9
           elseif x == 4
               10
           elseif x == 5
               11
           elseif x == 6
               12
           elseif x == 7
               13
           elseif x == 8
               14
           elseif x == 9
               15
           elseif x == 10
               16
           elseif x == 11
               17
           else
               0
           end
       end
foo (generic function with 1 method)

julia> @code_llvm foo(4)
;  @ REPL[1]:1 within `foo`
define i64 @julia_foo_102(i64 signext %0) #0 {
top:
;  @ REPL[1]:2 within `foo`
  %switch.tableidx = add i64 %0, -1
  %1 = icmp ult i64 %switch.tableidx, 10
  %.not10 = icmp eq i64 %0, 11
  %. = select i1 %.not10, i64 17, i64 0
  %switch.offset = add i64 %0, 6
  %common.ret.op = select i1 %1, i64 %switch.offset, i64 %.
;  @ REPL[1] within `foo`
  ret i64 %common.ret.op
}

#jump table julia> function foo(x)
           if x == 1
               x * 7
           elseif x == 2
               x * 8
           elseif x == 3
               x * 9
           elseif x == 4
               x * 10
           elseif x == 5
               x * 11
           elseif x == 16
               x * 12
           elseif x == 71
               x * 13
           elseif x == 8
               x * 14
           elseif x == 9
               x * 15
           elseif x == 10
               x * 16
           elseif x == 11
               x * 17
           else
               x * 0
           end
       end
foo (generic function with 1 method)

julia> @code_llvm foo(4)
;  @ REPL[5]:1 within `foo`
define i64 @julia_foo_119(i64 signext %0) #0 {
top:
;  @ REPL[5]:2 within `foo`
  switch i64 %0, label %common.ret [
    i64 1, label %L3
    i64 2, label %L7
    i64 3, label %L11
    i64 4, label %L15
    i64 5, label %L19
    i64 16, label %L23
    i64 71, label %L27
    i64 8, label %L31
    i64 9, label %L35
    i64 10, label %L39
    i64 11, label %L43
  ]

common.ret:                                       ; preds = %L43, %L39, %L35, %L31, %L27, %L23, %L19, %L15, %L11, %L7, %L3, %top
  %common.ret.op = phi i64 [ 7, %L3 ], [ 16, %L7 ], [ 27, %L11 ], [ 40, %L15 ], [ 55, %L19 ], [ 192, %L23 ], [ 923, %L27 ], [ 112, %L31 ], [ 135, %L35 ], [ 160, %L39 ], [ 187, %L43 ], [ 0, %top ]
;  @ REPL[5] within `foo`
  ret i64 %common.ret.op

L3:                                               ; preds = %top
  br label %common.ret

L7:                                               ; preds = %top
  br label %common.ret

L11:                                              ; preds = %top
  br label %common.ret

L15:                                              ; preds = %top
  br label %common.ret

L19:                                              ; preds = %top
  br label %common.ret

L23:                                              ; preds = %top
  br label %common.ret

L27:                                              ; preds = %top
  br label %common.ret

L31:                                              ; preds = %top
  br label %common.ret

L35:                                              ; preds = %top
  br label %common.ret

L39:                                              ; preds = %top
  br label %common.ret

L43:                                              ; preds = %top
  br label %common.ret
}

#more sophisticated LUT

julia> @code_llvm foo(a)
;  @ REPL[9]:1 within `foo`
define i64 @julia_foo_150(i64 signext %0) #0 {
top:
;  @ REPL[9]:2 within `foo`
  %switch.tableidx = add i64 %0, -1
  %1 = icmp ult i64 %switch.tableidx, 10
  br i1 %1, label %switch.lookup, label %L31

switch.lookup:                                    ; preds = %top
  %switch.gep = getelementptr inbounds [10 x i64], [10 x i64]* @switch.table.julia_foo_150, i64 0, i64 %switch.tableidx
  %switch.load = load i64, i64* %switch.gep, align 8
  br label %common.ret

common.ret:                                       ; preds = %L31, %switch.lookup
  %common.ret.op = phi i64 [ %., %L31 ], [ %switch.load, %switch.lookup ]
;  @ REPL[9] within `foo`
  ret i64 %common.ret.op

L31:                                              ; preds = %top
;  @ REPL[9]:22 within `foo`
; ┌ @ promotion.jl:477 within `==`
   %.not10 = icmp eq i64 %0, 11
   %. = select i1 %.not10, i64 17, i64 0
   br label %common.ret
; └
}
2 Likes