If we get rid of the random number, can't the compiler just calculate the result ahead of time? I think that is what I am seeing when I simplify the example.
The results differ between my PC and my laptop (which also run different Julia versions) and depend on the optimization flag.
Slimmed MWE
using Parameters, StaticArrays, Accessors, BenchmarkTools
using Distributions, Random
@with_kw struct Str00{N}
    fat   :: SVector{N, Float64}
    sh_c0 :: SVector{N, Float64}
    sh_cm :: SVector{N, Float64}
end
Random.seed!(1234)
bg1 = Str00(fat   = SVector{16}(fill(Float64(1.0), 16)),
            sh_c0 = SVector{16}(rand(Uniform(4, 25), 16)),
            sh_cm = SVector{16}(rand(Uniform(4, 25), 16)))
bg2 = Str00(fat   = SVector{16}(fill(Float64(1.0), 16)),
            sh_c0 = SVector{16}(rand(Uniform(4, 25), 16)),
            sh_cm = SVector{16}(rand(Uniform(4, 25), 16)))
gt = (bg1, bg2)
function test1(gt)
    for t in 1:100
        @reset gt[1].sh_cm .= gt[1].sh_c0 .* gt[1].fat
        @reset gt[2].sh_cm .= gt[2].sh_c0 .* gt[2].fat
    end
    return gt
end
function test2(gt)
    for t in 1:100
        gt = map(gt) do x
            @reset x.sh_cm .= x.sh_c0 .* x.fat
            return x
        end
    end
    return gt
end
@btime test1($gt);
@btime test2($gt);
res1 = test1(gt);
res2 = test2(gt);
res1 == res2
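To rule out the benchmark inputs themselves being treated as compile-time constants, one option is the `Ref` trick described in the BenchmarkTools manual. The sketch below is only an illustration: `scale` is a made-up stand-in for the update in the MWE, not part of it, and the two timings may well come out the same.

```julia
using StaticArrays, BenchmarkTools

# Hypothetical stand-in for the elementwise update in the MWE.
scale(v, w) = v .* w

v = rand(SVector{16, Float64})
w = ones(SVector{16, Float64})

# Plain interpolation: the values are spliced into the benchmark expression.
t_plain = @belapsed scale($v, $w)

# Ref trick: the values are read through a Ref inside the benchmark,
# which makes it harder for the compiler to treat them as constants.
t_ref = @belapsed scale($(Ref(v))[], $(Ref(w))[])

println("plain: ", t_plain * 1e9, " ns,  via Ref: ", t_ref * 1e9, " ns")
```

If the two numbers are close, constant propagation of the inputs is probably not what makes the fast cases fast.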
System info
Laptop (i5-6300U, Julia 1.10.4)
PC (2990WX, Julia 1.11.0-rc-1)
Laptop
- No flag, does not depend on outer loop length
julia> @btime test1($gt);
23.730 ns (0 allocations: 0 bytes)
julia> @btime test2($gt);
24.456 ns (0 allocations: 0 bytes)
With the -O1 optimization flag (but not -O2/-O3), the timing does depend on the outer loop length:
julia> @btime test1($gt);
12.909 μs (0 allocations: 0 bytes)
julia> @btime test2($gt);
12.497 μs (0 allocations: 0 bytes)
PC
- No flag, test2 timing depends on outer loop length
julia> @btime test1($gt);
24.407 ns (0 allocations: 0 bytes)
julia> @btime test2($gt);
6.592 μs (0 allocations: 0 bytes)
PC -O1 flag
julia> @btime test1($gt);
8.403 μs (0 allocations: 0 bytes)
julia> @btime test2($gt);
9.490 μs (0 allocations: 0 bytes)
PC -O2 (or -O3) flag
julia> @btime test1($gt);
25.980 ns (0 allocations: 0 bytes)
julia> @btime test2($gt);
6.594 μs (0 allocations: 0 bytes)
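Since `sh_c0` and `fat` never change inside the loop, every iteration writes the same `sh_cm`, so the compiler is allowed to collapse the 100 iterations into a single one. That would explain the ~24 ns cases. A rough way to check this is to make the loop length an argument and see whether the time scales with it. This is a sketch under my own assumptions: `Triple` and `update_n` are hypothetical names, and I use plain `=` with `@reset` here rather than `.=`.

```julia
using StaticArrays, Accessors, BenchmarkTools

# Hypothetical stand-in for Str00 from the MWE (no Parameters needed).
struct Triple{N}
    fat   :: SVector{N, Float64}
    sh_c0 :: SVector{N, Float64}
    sh_cm :: SVector{N, Float64}
end

# Same shape as test2, but with the iteration count as an argument.
function update_n(gt, n)
    for _ in 1:n
        gt = map(gt) do x
            @reset x.sh_cm = x.sh_c0 .* x.fat
            x
        end
    end
    return gt
end

mk() = Triple(ones(SVector{16, Float64}),
              rand(SVector{16, Float64}),
              rand(SVector{16, Float64}))
gt = (mk(), mk())

# One iteration and 100 iterations give the same result,
# which is what lets the compiler skip the repeated work.
r1   = update_n(gt, 1)
r100 = update_n(gt, 100)

t100  = @belapsed update_n($gt, 100)
t1000 = @belapsed update_n($gt, 1000)
println("n=100: ", t100 * 1e9, " ns,  n=1000: ", t1000 * 1e9, " ns")
```

If the time stays flat as `n` grows, the loop is most likely being optimized away; if it grows roughly tenfold, the loop is really being executed on that machine and optimization level.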