Memory allocation

Can someone explain why the memory allocation differs between foo and foo2, and why the run time jumps between foo2(21) and foo2(22)?

using Distributed,SharedArrays,BenchmarkTools,StatsBase
function foo(m::Int64)
   z = SharedArray{Int64, 1}(m)
   for i = 1:m
       z[i] = i
   end
   StatsBase.mean(z)
end
foo (generic function with 1 method)
function foo2(m::Int64)
   z::Float64 = 0.0
   for i = 1:m
       z += i / m
   end
   z
end
foo2 (generic function with 1 method)
# function foo3(m::Int64)
#     z=0.0
#     for i = 1:m
#         z +=  i/m
#    end
#    z
# end
@btime foo(10)
@btime foo(21)
@btime foo(22)
@btime foo2(10)
@btime foo2(21)
@btime foo2(22)
  27.077 μs (104 allocations: 4.66 KiB)
  27.077 μs (104 allocations: 4.66 KiB)
  27.897 μs (104 allocations: 4.66 KiB)
  1.230 ns (0 allocations: 0 bytes)
  1.230 ns (0 allocations: 0 bytes)
  21.743 ns (0 allocations: 0 bytes)
11.5
@btime foo2(100)
  100.228 ns (0 allocations: 0 bytes)
50.5
foo2(10)
5.5

1.2 ns is so short that the entire loop was probably compiled away. Beyond a certain loop length, it seems the compiler no longer does this?

Have you inspected the output of @code_llvm or @code_native?

Since you pass literal constants, the compiler is smart enough to constant-fold the call and simply hand back the precomputed result at run time, up to some threshold on the loop length. If constant folding is suppressed by interpolating the argument with $, I get

julia> for i in 17:23
       println(i); @btime foo2($i)
       end
17
  25.539 ns (0 allocations: 0 bytes)
18
  26.766 ns (0 allocations: 0 bytes)
19
  27.859 ns (0 allocations: 0 bytes)
20
  29.131 ns (0 allocations: 0 bytes)
21
  30.288 ns (0 allocations: 0 bytes)
22
  31.514 ns (0 allocations: 0 bytes)
23
  32.530 ns (0 allocations: 0 bytes)
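
For anyone who wants to reproduce the comparison, here is a minimal sketch, assuming BenchmarkTools is installed. foo2 is copied from the original post; the variable x is only there for illustration. The $(Ref(x))[] form is the pattern the BenchmarkTools manual suggests when you want to be sure the compiler cannot see the value. Timings are machine-dependent, so none are shown.

```julia
using BenchmarkTools

# Same function as in the original post.
function foo2(m::Int64)
    z::Float64 = 0.0
    for i = 1:m
        z += i / m
    end
    z
end

# Literal argument: the compiler can see the constant 10 and may fold
# the whole call into a precomputed result.
@btime foo2(10)

# Interpolated argument: the value is handed to the benchmark as data,
# which in this thread was enough to suppress the folding.
x = 10
@btime foo2($x)

# Ref pattern from the BenchmarkTools manual: the value is read through
# a reference at run time, so it cannot be treated as a known constant.
@btime foo2($(Ref(x))[])
```

If the first timing is around a nanosecond while the other two grow with the argument, the literal call was most likely folded.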

I have difficulty in understanding the concept of constant folding.
Do you mean that prefixing the argument to foo2() with $ suppresses constant folding? Since I have not knowingly used constant folding, there must be something happening at foo2(22) that leads to a sudden jump in run time. Is it related to the architecture of the compiler or the machine I am using (Win10, i7)? Can it be altered by the user?

I am yet to learn how to interpret the code_llvm or code_native output. Is there any simple and easy documentation around this?
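
As a starting point, here is a minimal sketch of how the two macros can be called on foo2 from the original post. The debuginfo=:none option only hides the source-location comments so the output is shorter. One thing to keep in mind: both macros show the code generated for the argument type (Int64 here), not for the particular value 22, so the loop should be visible in this output regardless of which number is passed.

```julia
using InteractiveUtils  # provides @code_llvm and @code_native outside the REPL

# Same function as in the original post.
function foo2(m::Int64)
    z::Float64 = 0.0
    for i = 1:m
        z += i / m
    end
    z
end

# LLVM intermediate representation for foo2(::Int64).
# Look for a loop: labelled blocks with a branch jumping back to an earlier label.
@code_llvm debuginfo=:none foo2(22)

# Machine code for foo2(::Int64) on the current CPU.
@code_native debuginfo=:none foo2(22)
```

The LLVM output is usually the easier of the two to read, since it is independent of the CPU and keeps names like the floating-point add and divide instructions visible.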

Some more results. Why does foo take over?

@btime foo(45)
@btime foo2(45)

  27.487 μs (104 allocations: 4.66 KiB)
  44.717 ns (0 allocations: 0 bytes)

PS: Sorry, it does not. I misunderstood the timing.

Constant folding works roughly like this. Say there is a hypothetical function g(x) = x*foo2(10). When g is compiled, the compiler can evaluate foo2(10) ahead of time and replace the call with its value, so calling g(...) only performs the multiplication at run time. Comparing @btime foo2(10) with x=10; @btime foo2($x): the former just uses the precomputed result of foo2(10) at run time, while the latter actually computes foo2(x) each time it is called.
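
To make the idea concrete, here is a small sketch. g is the hypothetical function from the explanation above, and g_folded is a hand-written version (my own name, purely illustrative) of what the compiler is allowed to turn g into, using the value 5.5 that foo2(10) returned earlier in the thread. Whether the compiler actually folds g on a given Julia version is not guaranteed, so compare the two @code_llvm outputs on your own machine.

```julia
using InteractiveUtils  # provides @code_llvm outside the REPL

# Same function as in the original post.
function foo2(m::Int64)
    z::Float64 = 0.0
    for i = 1:m
        z += i / m
    end
    z
end

# Hypothetical function from the explanation: foo2 is called with a literal.
g(x) = x * foo2(10)

# Hand-folded equivalent: foo2(10) replaced by its value.
g_folded(x) = x * 5.5

# If the compiler folded foo2(10), the two listings should look alike:
# a single floating-point multiply and no loop.
@code_llvm debuginfo=:none g(2.0)
@code_llvm debuginfo=:none g_folded(2.0)
```

Both functions return the same result for any x; the only possible difference is whether the loop inside foo2 still runs on every call.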

@tomaklutfu: Thanks for explaining.