Can someone explain why the memory allocation differs between foo and foo2, and why the time increases between foo2(21) and foo2(22)?
using Distributed, SharedArrays, BenchmarkTools, StatsBase
function foo(m::Int64)
    z = SharedArray{Int64, 1}(m)
    for i = 1:m
        z[i] = i
    end
    StatsBase.mean(z)
end
foo (generic function with 1 method)
function foo2(m::Int64)
    z::Float64 = 0.0
    for i = 1:m
        z += i / m
    end
    z
end
foo2 (generic function with 1 method)
# function foo3(m::Int64)
#     z = 0.0
#     for i = 1:m
#         z += i / m
#     end
#     z
# end
@btime foo(10)
@btime foo(21)
@btime foo(22)
@btime foo2(10)
@btime foo2(21)
@btime foo2(22)
27.077 μs (104 allocations: 4.66 KiB)
27.077 μs (104 allocations: 4.66 KiB)
27.897 μs (104 allocations: 4.66 KiB)
1.230 ns (0 allocations: 0 bytes)
1.230 ns (0 allocations: 0 bytes)
21.743 ns (0 allocations: 0 bytes)
11.5
@btime foo2(100)
100.228 ns (0 allocations: 0 bytes)
50.5
foo2(10)
5.5
1.2 ns is so short that the entire loop is probably being compiled away. Beyond a certain loop length, it seems the compiler no longer does this?
Have you inspected the output of @code_llvm or @code_native?
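For anyone following along, here is a minimal sketch of what that inspection could look like. The zero-argument wrapper h is a hypothetical name introduced only for illustration, so that the literal 10 is visible to the compiler; whether the call actually gets folded depends on the Julia and LLVM versions in use.

```julia
using InteractiveUtils  # provides @code_llvm / @code_native (loaded automatically in the REPL)

function foo2(m::Int64)
    z::Float64 = 0.0
    for i = 1:m
        z += i / m
    end
    z
end

# Hypothetical wrapper (not from the thread): the argument is a literal,
# so the compiler is free to evaluate the call ahead of time.
h() = foo2(10)

@code_llvm foo2(10)   # IR for the generic foo2(::Int64) method, loop included
@code_llvm h()        # if the call was folded, this body should be much shorter
```

Comparing the two listings side by side is usually enough to tell whether the loop survived; @code_native can be substituted in the same way to see the machine code.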
Since you use constants, the compiler is smart enough to constant-fold the computation and simply return the precomputed result at runtime, up to a certain threshold. If constant folding is suppressed, I get
julia> for i in 17:23
           println(i); @btime foo2($i)
       end
17
25.539 ns (0 allocations: 0 bytes)
18
26.766 ns (0 allocations: 0 bytes)
19
27.859 ns (0 allocations: 0 bytes)
20
29.131 ns (0 allocations: 0 bytes)
21
30.288 ns (0 allocations: 0 bytes)
22
31.514 ns (0 allocations: 0 bytes)
23
32.530 ns (0 allocations: 0 bytes)
I have difficulty in understanding the concept of constant folding.
You mean to say that prefixing the argument to foo2() with $ suppresses the constant folding? Since I have not knowingly used constant folding, something must be happening at foo2(22) that leads to the sudden jump in runtime. Is it related to the architecture of the compiler or to the machine I am using (Win10, i7)? Can it be altered by the user?
I am yet to learn how to interpret the code_llvm or code_native output. Is there any simple and easy documentation around this?
Some more results. Why does foo take over?
@btime foo(45)
@btime foo2(45)
27.487 μs (104 allocations: 4.66 KiB)
44.717 ns (0 allocations: 0 bytes)
PS: Sorry, it is not. I misunderstood the timing.
Constant folding works something like this: say there is a hypothetical function g(x) = x*foo2(10). When you call g(...), the compiler is smart enough to replace foo2(10) with the value itself. With @btime foo2(10) vs x=10; @btime foo2($x), the former just uses the precomputed result of foo2(10) at runtime, but the latter computes foo2(x) each time it is called.
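A minimal sketch of that comparison, assuming BenchmarkTools is installed. g is the hypothetical function from the explanation above, and the Ref variant is an extra idiom from the BenchmarkTools manual rather than something used earlier in this thread. No timings are shown because they depend on the machine and the Julia version.

```julia
using BenchmarkTools

function foo2(m::Int64)
    z::Float64 = 0.0
    for i = 1:m
        z += i / m
    end
    z
end

# Hypothetical function from the explanation: foo2(10) has a literal argument,
# so the compiler may replace the call with its value.
g(x) = x * foo2(10)

x = 10

@btime foo2(10)            # literal argument: the call may be folded away
@btime foo2($x)            # interpolated value: passed in as a runtime argument
@btime foo2($(Ref(x))[])   # Ref trick: hides the value behind a dereference
```

If the first timing is around a nanosecond while the other two grow with the argument, the first one is most likely measuring a folded constant rather than the loop.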
@tomaklutfu: Thanks for explaining.