Passing struct vs struct fields as function arguments

Hi,

I have noticed that passing struct fields as function arguments generally results in better performance compared to passing the struct directly. For instance,

using BenchmarkTools;
using Random;

struct MyStruct
    A::Vector{Float64}
    B::Matrix{Float64}
end

Random.seed!(1);
something = MyStruct(randn(100), randn(100,100));

function test1(A::Vector{Float64}, B::Matrix{Float64})
    return B*A;
end

call_test1(something::MyStruct) = test1(something.A, something.B);

function test2(something::MyStruct)
    return something.B*something.A;
end

@inline function test3(something::MyStruct)
    return something.B*something.A;
end
julia> @btime call_test1($something);
  2.331 μs (1 allocation: 896 bytes)

julia> @btime test2($something);
  2.517 μs (1 allocation: 896 bytes)

julia> @btime test3($something);
  2.598 μs (1 allocation: 896 bytes)

julia> @benchmark call_test1($something)
BenchmarkTools.Trial: 10000 samples with 9 evaluations.
 Range (min … max):  2.428 μs … 94.531 μs  ┊ GC (min … max): 0.00% … 95.36%
 Time  (median):     2.899 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.978 μs ±  1.030 μs  ┊ GC (mean ± σ):  0.30% ±  0.95%

          β–β–…β–‡β–ˆβ–ƒ                                               
  β–β–‚β–„β–…β–†β–†β–…β–†β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–†β–†β–†β–†β–…β–ƒβ–‚β–‚β–‚β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β– β–‚
  2.43 ΞΌs        Histogram: frequency by time        4.86 ΞΌs <

 Memory estimate: 896 bytes, allocs estimate: 1.

julia> @benchmark test2($something)
BenchmarkTools.Trial: 10000 samples with 9 evaluations.
 Range (min … max):  2.664 μs … 95.310 μs  ┊ GC (min … max): 0.00% … 95.29%
 Time  (median):     3.091 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   3.363 μs ±  1.503 μs  ┊ GC (mean ± σ):  0.27% ±  0.95%

    ▂▇█▆▁
  ▁▃█████▅▃▂▂▂▂▂▃▃▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  2.66 μs        Histogram: frequency by time        7.51 μs <

 Memory estimate: 896 bytes, allocs estimate: 1.

julia> @benchmark test3($something)
BenchmarkTools.Trial: 10000 samples with 9 evaluations.
 Range (min … max):  2.379 μs … 97.867 μs  ┊ GC (min … max): 0.00% … 96.19%
 Time  (median):     3.214 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   3.586 μs ±  1.513 μs  ┊ GC (mean ± σ):  0.26% ±  0.96%

  ▁▃ ▃▆██▇▆▆▅▆▆▅▃▁  ▁▁▁▁                                     ▂
  █████████████████████████▇██▆▇██▇▆▇▇▇▆▇▆▇▆▇▇▆▇▆▆▆▆▆▆▆▆▆▆▅▅ █
  2.38 μs      Histogram: log(frequency) by time     8.93 μs <

 Memory estimate: 896 bytes, allocs estimate: 1.

Can someone explain why? Any link to the appropriate documentation (if available) would be appreciated.

My two cents is that this is only benchmarking noise:

julia> @btime call_test1($something);
  2.614 μs (1 allocation: 896 bytes)

julia> @btime test2($something);
  2.808 μs (1 allocation: 896 bytes)

julia> @btime test3($something);
  2.821 μs (1 allocation: 896 bytes)

julia> @btime call_test1($something);
  2.805 μs (1 allocation: 896 bytes)

julia> @btime test2($something);
  2.879 μs (1 allocation: 896 bytes)

julia> @btime test3($something);
  2.765 μs (1 allocation: 896 bytes)

julia> @btime call_test1($something);
  2.735 μs (1 allocation: 896 bytes)

julia> @btime test2($something);
  3.511 μs (1 allocation: 896 bytes)

julia> @btime test3($something);
  3.441 μs (1 allocation: 896 bytes)

julia> @btime call_test1($something);
  3.692 μs (1 allocation: 896 bytes)

julia> @btime test2($something);
  2.986 μs (1 allocation: 896 bytes)

julia> @btime test3($something);
  3.778 μs (1 allocation: 896 bytes)



Yeah, I get essentially the same performance for all three:

julia> @btime call_test1($something);
  2.579 μs (1 allocation: 896 bytes)

julia> @btime test2($something);
  2.525 μs (1 allocation: 896 bytes)

julia> @btime test3($something);
  2.552 μs (1 allocation: 896 bytes)

which makes sense; I wouldn’t expect there to be any difference.
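
One way to sanity-check that expectation (a sketch, not something anyone in this thread ran): both fields of MyStruct have concrete types, so the compiler knows exactly what `something.A` and `something.B` are, and the field access should be negligible next to the matrix-vector product. The snippet below redefines the struct and functions from the original post in compact form, checks that the field types are concrete, and prints the typed code of both variants so they can be compared by eye. The variable name `s` is introduced here purely for illustration.

```julia
using InteractiveUtils  # provides @code_typed outside the REPL

struct MyStruct
    A::Vector{Float64}
    B::Matrix{Float64}
end

test1(A::Vector{Float64}, B::Matrix{Float64}) = B * A
call_test1(s::MyStruct) = test1(s.A, s.B)
test2(s::MyStruct) = s.B * s.A

s = MyStruct(randn(100), randn(100, 100))

# Both fields are concretely typed, so field access is resolved at compile time.
@show all(isconcretetype, fieldtypes(MyStruct))

# Print the typed code of both variants; they would be expected to look
# very similar (field loads followed by the same multiplication).
display(@code_typed call_test1(s))
display(@code_typed test2(s))

# Both paths compute the same product.
@show call_test1(s) ≈ test2(s)
```

If the two listings differ only in how the fields are loaded, any timing gap of a few hundred nanoseconds is more likely measurement noise than a real cost of passing the struct.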


Thank you both. I agree, I was not expecting to find any difference. However, when I run @btime many times, I keep getting the same pattern as in the output I posted above (in relative terms). Could it be something specific to my laptop?

@btime reports the minimum time. Under load, your laptop may throttle its speed, making your benchmark appear slower. Since your timings seem to get slower and slower, I’d guess that during the first benchmark the CPU is running at a higher clock speed than during the subsequent ones.
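
If throttling is the cause, one way to take it out of the comparison (a rough sketch under my own assumptions, not a definitive method) is to run the two benchmarks in alternating rounds, so that any clock-speed drift affects both, and to look at the ratio of minimum and median times rather than at a single run. The number of rounds, the `seconds=1` budget, and the names `b1`, `b2`, and `ratios_min` are illustrative choices; the struct and functions are the ones from the original post, written compactly.

```julia
using BenchmarkTools
using Statistics  # median

struct MyStruct
    A::Vector{Float64}
    B::Matrix{Float64}
end

test1(A::Vector{Float64}, B::Matrix{Float64}) = B * A
call_test1(s::MyStruct) = test1(s.A, s.B)
test2(s::MyStruct) = s.B * s.A

s = MyStruct(randn(100), randn(100, 100))

# Define each benchmark once; `seconds=1` caps the time budget of each run.
b1 = @benchmarkable call_test1($s) seconds=1
b2 = @benchmarkable test2($s) seconds=1
tune!(b1)
tune!(b2)

# Alternate the two benchmarks so both see similar CPU conditions.
ratios_min = Float64[]
for r in 1:5
    t1 = run(b1)
    t2 = run(b2)
    rmin = time(minimum(t1)) / time(minimum(t2))
    rmed = time(median(t1)) / time(median(t2))
    push!(ratios_min, rmin)
    println("round ", r, ": min ratio = ", round(rmin, digits=3),
            ", median ratio = ", round(rmed, digits=3))
end
```

Ratios that hover around 1 and move in both directions from round to round would be consistent with noise or throttling rather than a real difference between the two functions.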

What CPU do you have?

Thanks. I have an Intel(R) Core™ i7-1060NG7 CPU @ 1.20GHz.