How does Vector{Int64}(undef, 20) differ in a multi-threaded environment?

finb · May 27, 2022, 5:02pm

I’m running a program that allocates an array of 20 integers, which I want to be initialized with zeros. So to start off with, I did this:

myfunc(Vector{Int64}(undef, 20))

This appeared to work, since the array came back filled with zeros. But when I did the following, the arrays were allocated with arbitrary numbers, which led to overflows, even when the thread count was 1:

Threads.@threads for i = 1:Threads.nthreads()
  myfunc(Vector{Int64}(undef, 20))
end

I now know that I should use zeros(Int64, 20) instead of allocating the array directly. But just for my information, how does multithreading affect whether calling the Vector constructor allocates zeros vs allocating garbage?

StefanKarpinski · May 27, 2022, 5:11pm

It doesn’t, except that what happens to be in memory might be different. Array initialization with undef gives you whatever happens to be in memory, which could be zeros but isn’t guaranteed to be in either case. If you keep doing that in either situation you’ll get non-zeros at some point.
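As a rough sketch of the threaded loop with a guaranteed starting state, the buffer can be allocated with zeros inside the loop body. The function accumulate_counts! below is a hypothetical stand-in for myfunc (the original was not posted); it is only assumed to rely on the buffer starting at zero.

```julia
# Hypothetical stand-in for the poster's `myfunc`: it adds into the buffer,
# so the result is only correct if the buffer starts out as all zeros.
function accumulate_counts!(buf::Vector{Int64})
    for i in eachindex(buf)
        buf[i] += i
    end
    return buf
end

# One result slot per loop iteration; each iteration writes only its own slot.
results = Vector{Vector{Int64}}(undef, Threads.nthreads())

Threads.@threads for t in 1:Threads.nthreads()
    # zeros guarantees the initial contents; undef would not.
    results[t] = accumulate_counts!(zeros(Int64, 20))
end
```

With zeros, every entry of results should equal 1:20 regardless of the thread count or of what was previously in memory.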

StefanKarpinski · May 27, 2022, 5:12pm

For example, I just did that on my machine with a single thread and got this:

julia> Vector{Int64}(undef, 20)
20-element Vector{Int64}:
  2
 21
 23
 25
 26
 40
 43
 44
 46
 48
 49
 54
 55
 60
 61
 80
 82
 84
 85
 86

Then I did it again and got this:

julia> v = Vector{Int64}(undef, 20)
20-element Vector{Int64}:
 5337514192
 4717709280
 4717709280
 4356005896
 4356005896
 4356005896
 4717709280
 4717709280
 5337857376
 5337857664
 5337856848
 4717709280
 4717709280
 4356005896
 4356005896
 4356005896
 4356005896
 4356005896
 4356005896
 4356005896

There happen to be large chunks of memory that are all zeros, but getting one of them is just luck.
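For contrast, here is a minimal sketch of a case where undef is generally fine: every element is written before anything reads it, so the initial contents never matter. The function name squares is made up for illustration.

```julia
# Illustrative helper (not from the thread): fills an uninitialized buffer.
function squares(n::Integer)
    out = Vector{Int64}(undef, n)   # contents are arbitrary at this point
    for i in 1:n
        out[i] = i^2                # every slot is written before it is read
    end
    return out
end

sq = squares(5)
```

The problem in the original question only arises when code reads from, or accumulates into, the array before assigning to it.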

Elrod · May 27, 2022, 6:48pm

To be clear, undef is an UndefInitializer():

julia> undef
UndefInitializer(): array initializer with undefined values

By creating an undef array, you are explicitly asking for an array filled with junk values: whatever happens to be in memory. Sometimes zeros may be what just happens to be there.
If you want zeros, use zeros.

julia> zeros(Int, 20)
20-element Vector{Int64}:
 0
 0
 0
 0
 0
 0
 0
 0
 0
 0
 0
 0
 0
 0
 0
 0
 0
 0
 0
 0
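A short sketch of a few equivalent ways to get a zero-filled Int64 vector, in case one of them fits existing code better. The variable names a, b, c are just for illustration.

```julia
a = zeros(Int64, 20)                      # allocate and zero in one call
b = fill(zero(Int64), 20)                 # same idea, works for any fill value
c = fill!(Vector{Int64}(undef, 20), 0)    # allocate uninitialized, then overwrite every element
```

All three should produce the same 20-element Vector{Int64} of zeros. The fill! form can also be used to reset an existing buffer between uses.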

johnmyleswhite · May 27, 2022, 8:45pm

I did something like the following long ago and found it instructive about how often you can expect to get all zero memory when using undef for Int64 arrays:

function sim(n = 10, n_reps = 1_000_000)
    all_zeros = 0
    sum_zeros = 0

    for _ in 1:n_reps
        x = Array{Int}(undef, n)
        count_zeros = 0
        for i in 1:n
            count_zeros += (x[i] == 0)
        end
        all_zeros += (count_zeros == n)
        sum_zeros += count_zeros
    end

    (all_zeros / n_reps, sum_zeros / (n_reps * n))
end

all_zeros_freq, elementwise_zero_freq = sim()

On my laptop, opening a REPL and then running that whole block (including redefining the function) many times gradually drops the frequency of all-zero memory as more and more dirty memory gets reused. The first round produces something like (0.997094, 0.9973374), but I can get it down below 90% if I keep copying and pasting the same code.
