Performance of zeros() vs. Array{T}()?

In the following function, I see a 1.56x speedup when the input array is created with zeros() instead of Array{T}(), which seems odd to me. When I use @btime, though, I get the same timings.

julia> function vander(v, x, N::Int)
         M = length(x)
         if N > 0
           v[:,1] .= 1
         end
         if N > 1
           for i = 2:N
             v[:,i] = x
           end
           accumulate(v, v)
         end
         return v
       end
vander (generic function with 1 method)

julia> function accumulate(input, output)
         M, N = size(input)
         for i = 2:N
           for j = 1:M
             output[j,i] *= input[j,i-1]
           end
         end
       end
accumulate (generic function with 1 method)

Now compare the timings of f() and g():

julia> function f()
         M, N = 10^8, 4
         x = rand(M)
         v = Array{Float64}(undef,M,N) # <-----
         t = @elapsed vander(v, x, N)
       end
f (generic function with 1 method)

julia> f()

julia> function g()
         M, N = 10^8, 4
         x = rand(M)
         v = zeros(M,N)  # <-----
         t = @elapsed vander(v, x, N)
       end
g (generic function with 1 method)

julia> g()

The overall function timing tells you the whole story here.

For large uninitialized arrays, the operating system will sometimes lie to you and give you back a pointer to some space without having done any of the dirty work of actually allocating it. zeros() pays that cost up front, because it writes zero to every element. The uninitialized Array constructor can sometimes defer that cost to the first time you write to the array (or even to the first write to each page). See, e.g.:
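
A minimal sketch of how one might observe this deferred cost directly; it is not from the original thread. The function name first_touch_demo and the array size are made up for illustration. It times the first and the second write to an uninitialized array and to a zeros() array. If the operating system defers the mapping of pages, the first write to the uninitialized array would be expected to be the slow one, but the numbers depend on the platform, the allocator, and the Julia version, so treat the result as an indication only.

```julia
# Sketch: time the first and second write to freshly created arrays.
function first_touch_demo(n::Int)
    a = Array{Float64}(undef, n)           # uninitialized: pages may not be mapped yet
    undef_first  = @elapsed fill!(a, 1.0)  # first write may absorb the page-mapping cost
    undef_second = @elapsed fill!(a, 2.0)  # pages were already touched by the first write

    b = zeros(n)                           # zero-filled at creation
    zeros_first  = @elapsed fill!(b, 1.0)
    zeros_second = @elapsed fill!(b, 2.0)

    return (undef_first = undef_first, undef_second = undef_second,
            zeros_first = zeros_first, zeros_second = zeros_second)
end

first_touch_demo(10)                       # warm-up call so compilation is not timed
println(first_touch_demo(10^7))
```

Small arrays are often served from memory the process already owns, so the effect, if any, is more likely to show up with large sizes such as the 10^8 rows used in the question.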


Many thanks, I was thinking about something like this.

Am I reading this wrong? It seems to me that the timings indicate that the code was faster when
the array v was created with zeros. Only the function vander was timed. So why was it faster?
vander initializes v. Why would it matter how v was initialized (or not) before it was passed to vander?

EDIT: by the way, I am getting a speedup of 2.5x from using an array initialized to 0.0 instead of an uninitialized one.

Julia Version 0.7.0
Commit a4cb80f3ed (2018-08-08 06:46 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-6650U CPU @ 2.20GHz
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.0 (ORCJIT, skylake)

Sorry, I just got it. @mbauman is saying that the time is spent either up-front (initialized array), or later (uninitialized array). My timings of the entire f() or g() confirm that. Interesting…
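
For anyone who wants to repeat that comparison, here is a rough sketch that times the allocation and the fill separately and then adds them up. The names powers!, time_total, alloc_undef and alloc_zeros are hypothetical; powers! is a compact stand-in that produces the same values as vander above (column i holds x.^(i-1)), not the original code. I would expect the totals for the two allocators to come out close to each other even when the split between the two phases differs, but that is an assumption to check on your own machine.

```julia
# Hypothetical stand-in for vander: column i of v holds x.^(i-1).
function powers!(v, x)
    M, N = size(v)
    for i in 1:N, j in 1:M
        v[j, i] = x[j]^(i - 1)
    end
    return v
end

alloc_undef(M, N) = Array{Float64}(undef, M, N)  # cost may be deferred to first write
alloc_zeros(M, N) = zeros(M, N)                  # cost paid when the array is created

# Time allocation and fill separately, and report their sum.
function time_total(alloc, M, N)
    x = rand(M)
    t_alloc = @elapsed (v = alloc(M, N))
    t_work  = @elapsed powers!(v, x)
    return (alloc = t_alloc, work = t_work, total = t_alloc + t_work)
end

# Warm-up calls so compilation is not included in the timings.
time_total(alloc_undef, 10, 4)
time_total(alloc_zeros, 10, 4)

println("undef: ", time_total(alloc_undef, 10^7, 4))
println("zeros: ", time_total(alloc_zeros, 10^7, 4))
```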