Array{Float64}(undef, X) gives wrong results

I am writing a complex hydrological model in Julia.

https://github.com/joseph-pollacco/HyCatch-1D/tree/master/JULIA

The problem is in the file WaterBalance.jl.

When I run the model from the Julia REPL, I only rarely get correct results; most of the time the results do not make sense (e.g. NaN in the computed Array).

I found that the problem was caused by allocating an array with undef, such as:

WaterBalance = Array{Float64}(undef, 50000)

I needed to perform a simple loop.

∑Evaporation = Array{Float64}(undef, 50000)
function GLOBAL()
    for iT = 2:N_iT
        …
        ∑Evaporation[iT] = ∑Evaporation[iT-1] + discret.ΔZ[N_iZevapo] * Qevaporation[iT] * ΔT[iT]
    end
end

I required that

∑Evaporation[1] = 0.0

For reasons I do not understand, the solution I found was to initialize the Array with zeros:

WaterBalance = zeros(Float64, 50000)

So my understanding is that there is a bug in Array.

While that is theoretically possible, it is much more likely that there is a bug in your code: you fail to overwrite some elements, and an Array{T}(undef, ...) can have random memory contents.

Isolating an MWE (minimal working example) would be helpful. Try to simplify your problem so that you can post it here.
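One way to check whether some elements are never overwritten is to fill the array with a NaN sentinel instead of undef and inspect it after the loop. The sketch below is only an illustration: `accumulate!`, `n_alloc`, `n_steps` and the input of ones are hypothetical stand-ins for the real model, and it assumes the loop covers fewer steps than were allocated.

```julia
# Hypothetical stand-in for the real model loop: it writes elements 2:n_steps only.
function accumulate!(total::Vector{Float64}, increments::Vector{Float64}, n_steps::Int)
    for i in 2:n_steps
        total[i] = total[i-1] + increments[i]
    end
    return total
end

n_alloc = 10   # allocated length (50000 in the original post)
n_steps = 6    # steps actually computed (assumed smaller than n_alloc here)

total = fill(NaN, n_alloc)   # sentinel instead of undef: untouched slots stay NaN
total[1] = 0.0               # starting value that the recurrence reads
accumulate!(total, ones(n_alloc), n_steps)

never_written = findall(isnan, total)
println(never_written)       # prints [7, 8, 9, 10]
```

Any index reported here was never assigned by the loop, so with undef it would have held arbitrary memory contents.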


Array{Float64}(undef, 50000) does not initialise the elements. That’s the undef part 🙂.


Just put exactly this (∑Evaporation[1] = 0.0) before your loop?

Exactly. Your routine starts with iT = 2 (from iT=2:N_iT) and then accesses the first element in ∑Evaporation[iT-1].

As @Tamas_Papp and @greg_plowman pointed out, undef will not initialise anything in your array, so you will get a random piece of memory.
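To make the point about the first element concrete, here is a minimal sketch of the same kind of recurrence. The function name `cumulative` and the inputs `rate` and `dt` are hypothetical, not taken from the model; the only thing it is meant to show is that the slot read at i == 2 has to be assigned explicitly when the array comes from undef.

```julia
# Cumulative sum by recurrence; `rate` and `dt` are hypothetical inputs.
function cumulative(rate::Vector{Float64}, dt::Vector{Float64})
    n = length(rate)
    total = Array{Float64}(undef, n)  # contents are arbitrary at this point
    total[1] = 0.0                    # must be set: the loop reads total[1] when i == 2
    for i in 2:n
        total[i] = total[i-1] + rate[i] * dt[i]
    end
    return total
end

result = cumulative([0.0, 1.0, 2.0, 3.0], [1.0, 0.5, 0.5, 1.0])
println(result)   # prints [0.0, 0.5, 1.5, 4.5]
```

Because every element from 1 to n is assigned before it is read, the result does not depend on what the memory held beforehand.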

In fact, you can see below that on my machine about 6% of the elements of an undef Array of length 100000 happen to be zero (YMMV, and memory management makes this hard to predict), so I guess that roughly reflects your observations:

I was finding that when I run the model from the Julia REPL that rarely I get the correct results and other most of the times the results did not make sense ( e.g. NaN in the computed Array ).

julia> sum(iszero.(Array{Float64}(undef, 100000)))  # in a fresh REPL session
6325

Note that executing this again indicates some kind of caching:

julia> sum(iszero.(Array{Float64}(undef, 100000)))
100000

julia> sum(iszero.(Array{Float64}(undef, 100000)))
100000

Thanks for your answer. I did initialize before the loop.

∑Evaporation[1] = 0.0

But it did not sort out the problem. Sometimes the code runs fine and other times it does not. The only solution was to initialize with zeros.

Thanks for providing a valid explanation of what I observed. So what are the advantages of using:

Array{Float64}(undef, 50000)

compared to the traditional:

WaterBalance = zeros(Float64, 50000)

The first is faster if you are going to start with some nonzero value.

If you create an array with undef “elements”, you basically instruct your computer to allocate that amount of memory without caring what’s inside it. If you use zeros, however, you tell it to fill the memory with zeros after the allocation (you initialise it).

Initialisation to a specific value takes an extra amount of time and there are cases where this might be performance relevant.

It depends on your algorithm whether you explicitly need initial values or not. If you only use the array to dump values into, and you are sure that by design you will never read an element before it has been written (which would lead to unexpected values/behaviour), you can go with undef and squeeze a bit more performance out of the computer. This, however, is clearly not true for your routine above.
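The distinction can be sketched with two small functions. Both are hypothetical examples rather than code from the model: `squares` only ever writes to its output, so undef is fine, while `histogram` uses `+=`, which reads the old value first and therefore needs initialised storage.

```julia
# Safe with undef: every element is assigned before anything reads it.
function squares(n::Int)
    out = Array{Float64}(undef, n)
    for i in 1:n
        out[i] = i^2          # pure write, the old contents are never read
    end
    return out
end

# Needs zeros: `+=` reads the old value before writing the new one.
function histogram(values::Vector{Int}, n_bins::Int)
    counts = zeros(Int, n_bins)
    for v in values
        counts[v] += 1        # assumes 1 <= v <= n_bins
    end
    return counts
end

println(squares(4))                      # prints [1.0, 4.0, 9.0, 16.0]
println(histogram([1, 3, 3, 2, 3], 3))   # prints [1, 1, 3]
```

A recurrence such as total[i] = total[i-1] + x falls into the second group for its first element, since total[i-1] is read before the loop has written it.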

Here are some measurements with BenchmarkTools:

julia> using BenchmarkTools

julia> @btime Array{Float64}(undef, 50000);
  1.318 μs (2 allocations: 390.70 KiB)

julia> @btime zeros(Float64, 50000);
  11.076 μs (2 allocations: 390.70 KiB)

Thanks Tamas, you beautifully explained the cons/pros of using undef.