Performance of zeros() vs. Array{T}()?

In the following function, I see a 1.56x speedup when one input array is created with zeros() instead of Array{T}(), which seems odd to me. When I use @btime, though, I get the same timings.

julia> function vander(v, x, N::Int)
         M = length(x)
         if N > 0
           v[:,1] .= 1
         end
         if N > 1
           for i = 2:N
             v[:,i] = x
           end
           accumulate(v, v)
         end
         return v
       end
vander (generic function with 1 method)

julia> function accumulate(input, output)
         M, N = size(input)
         for i = 2:N
           for j = 1:M
             output[j,i] *= input[j,i-1]
           end
         end
       end
accumulate (generic function with 1 method)

Now compare the timings of f() and g():

julia> function f()
         M, N = 10^8, 4
         x = rand(M)
         v = Array{Float64}(undef,M,N) # <-----
         t = @elapsed vander(v, x, N)
       end
f (generic function with 1 method)

julia> f()
1.2380960570000001

julia> function g()
         M, N = 10^8, 4
         x = rand(M)
         v = zeros(M,N)  # <-----
         t = @elapsed vander(v, x, N)
       end
g (generic function with 1 method)

julia> g()
0.77949281

The overall function timing tells the whole story here:

For large uninitialized arrays, the operating system will sometimes lie to you and give you back a pointer to some space without actually having done any of the dirty work of allocating it. zeros() pays that cost up front, when it writes zero to every element. The uninitialized Array constructor can sometimes defer that cost to the first time you write to the array (or even to the first write to each page).
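
To make the deferred cost visible, here is a minimal sketch that times the first and second full write passes over an uninitialized array and compares them with a write pass over an array from zeros(). The helper name `fill_time!` and the size `n` are made up for illustration and are not from the original post. Whether the first pass is actually slower depends on the OS, the allocator, and the Julia version, so treat the printed numbers as something to inspect rather than a guaranteed result.

```julia
# Hypothetical helper (not from the thread): time one full write pass over `a`.
fill_time!(a, val) = @elapsed fill!(a, val)

# Warm-up call so that compilation is not part of the measurements below.
fill_time!(zeros(1), 0.0)

n = 10^7  # assumed size, smaller than the 10^8 x 4 array in the question

a = Array{Float64}(undef, n)    # memory may not be backed by physical pages yet
t_first  = fill_time!(a, 1.0)   # first touch: may include the deferred allocation cost
t_second = fill_time!(a, 2.0)   # same array again: pages have already been written once

z = zeros(n)                    # already written once while being zeroed
t_zeros = fill_time!(z, 1.0)

println("undef, first write:  ", t_first, " s")
println("undef, second write: ", t_second, " s")
println("zeros, first write:  ", t_zeros, " s")
```

If the first write to the undef array takes noticeably longer than the second, that difference is the cost that zeros() would have paid before the timed region started.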

Many thanks, I was thinking about something like this.

Am I reading this wrong? It seems to me that the timing indicates the code was faster when the array v was created with zeros. Only the function vander was timed, so why was it faster? vander initializes v, so why would it matter how v was initialized (or not) before it was passed to vander?

EDIT: By the way, I am getting a speedup of 2.5x when using an array initialized to 0.0 instead of an uninitialized one.

Julia Version 0.7.0
Commit a4cb80f3ed (2018-08-08 06:46 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-6650U CPU @ 2.20GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.0 (ORCJIT, skylake)

Sorry, I just got it. @mbauman is saying that the time is spent either up front (initialized array) or later (uninitialized array). My timings of the entire f() or g() confirm that. Interesting…
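
For anyone who wants to repeat that check, below is a rough sketch that puts the allocation inside the timed region, so both variants are charged for their memory whenever it is actually paid. The names `vander_sketch!`, `total_time`, `alloc_undef`, and `alloc_zeros` are made up for this example. `vander_sketch!` is a condensed stand-in that produces the same result as vander plus accumulate from the question, and M is reduced from 10^8 to 10^6 to keep it quick.

```julia
# Condensed stand-in for vander/accumulate from the question (hypothetical name):
# column 1 is all ones, and column i is x .* (column i-1), i.e. x.^(i-1).
function vander_sketch!(v, x)
    M, N = size(v)
    N > 0 && (v[:, 1] .= 1)
    for i in 2:N, j in 1:M
        v[j, i] = x[j] * v[j, i-1]
    end
    return v
end

# Time allocation and computation together, so a deferred allocation cost
# cannot fall outside the measurement.
function total_time(make_array, M, N)
    x = rand(M)
    return @elapsed begin
        v = make_array(M, N)
        vander_sketch!(v, x)
    end
end

alloc_undef(M, N) = Array{Float64}(undef, M, N)
alloc_zeros(M, N) = zeros(M, N)

# Warm-up calls so that compilation is not included in the timings.
total_time(alloc_undef, 10, 4)
total_time(alloc_zeros, 10, 4)

M, N = 10^6, 4  # assumed sizes for a quick run
t_undef = total_time(alloc_undef, M, N)
t_zeros = total_time(alloc_zeros, M, N)

println("undef + fill: ", t_undef, " s")
println("zeros + fill: ", t_zeros, " s")
```

With the allocation included, the two totals should be much closer to each other than the vander-only timings of f() and g() above, although the exact numbers will vary by machine.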