Hi,
I have noticed that passing a struct's fields as separate function arguments seems to give slightly better performance than passing the struct itself. For instance:
using BenchmarkTools
using Random

struct MyStruct
    A::Vector{Float64}
    B::Matrix{Float64}
end

Random.seed!(1)
something = MyStruct(randn(100), randn(100, 100))

function test1(A::Vector{Float64}, B::Matrix{Float64})
    return B * A
end

call_test1(something::MyStruct) = test1(something.A, something.B)

function test2(something::MyStruct)
    return something.B * something.A
end

@inline function test3(something::MyStruct)
    return something.B * something.A
end
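Before comparing timings, it may be worth confirming that the three variants really compute the same thing. The snippet below is only a minimal sketch: it repeats the definitions in short form so it runs on its own, and `same_result` is a hypothetical helper I made up for this check, not something from a package.

```julia
using Random

# Same definitions as in the post, repeated so the snippet is self-contained.
struct MyStruct
    A::Vector{Float64}
    B::Matrix{Float64}
end

test1(A::Vector{Float64}, B::Matrix{Float64}) = B * A
call_test1(s::MyStruct) = test1(s.A, s.B)
test2(s::MyStruct) = s.B * s.A
@inline test3(s::MyStruct) = s.B * s.A

Random.seed!(1)
something = MyStruct(randn(100), randn(100, 100))

# Hypothetical helper: true when all three call paths agree
# up to floating-point tolerance.
same_result(s::MyStruct) = call_test1(s) ≈ test2(s) ≈ test3(s)

println(same_result(something))
```

If this prints `true`, any timing difference comes from how the call is made rather than from the work being done.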
julia> @btime call_test1($something);
  2.331 μs (1 allocation: 896 bytes)
julia> @btime test2($something);
  2.517 μs (1 allocation: 896 bytes)
julia> @btime test3($something);
  2.598 μs (1 allocation: 896 bytes)
julia> @benchmark call_test1($something)
BenchmarkTools.Trial: 10000 samples with 9 evaluations.
 Range (min … max):  2.428 μs … 94.531 μs  ┊ GC (min … max): 0.00% … 95.36%
 Time  (median):     2.899 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.978 μs ±  1.030 μs  ┊ GC (mean ± σ):  0.30% ±  0.95%
 2.43 μs        Histogram: frequency by time        4.86 μs <
 Memory estimate: 896 bytes, allocs estimate: 1.
julia> @benchmark test2($something)
BenchmarkTools.Trial: 10000 samples with 9 evaluations.
 Range (min … max):  2.664 μs … 95.310 μs  ┊ GC (min … max): 0.00% … 95.29%
 Time  (median):     3.091 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   3.363 μs ±  1.503 μs  ┊ GC (mean ± σ):  0.27% ±  0.95%
 2.66 μs        Histogram: frequency by time        7.51 μs <
 Memory estimate: 896 bytes, allocs estimate: 1.
julia> @benchmark test3($something)
BenchmarkTools.Trial: 10000 samples with 9 evaluations.
 Range (min … max):  2.379 μs … 97.867 μs  ┊ GC (min … max): 0.00% … 96.19%
 Time  (median):     3.214 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   3.586 μs ±  1.513 μs  ┊ GC (mean ± σ):  0.26% ±  0.96%
 2.38 μs     Histogram: log(frequency) by time      8.93 μs <
 Memory estimate: 896 bytes, allocs estimate: 1.
Can someone explain why? Any link to the appropriate documentation (if available) would be appreciated.