Is there any reason to use NaN instead of missing?

I know that NaN isn’t going away because of the IEEE 754 standard on floating point behavior, but is there ever a reason to use NaN instead of converting to missing and using that as much as possible? Both are propagating non-values, but NaN seems to be worse at it, e.g. NaN^0 evaluates to 1.0. The extra bit of type instability from Missing unions seems well-contained by missing propagation and optimized by small-Union optimizations, so checking ismissing never seemed different from checking isnan.

I’ve also heard about two kinds of NaNs in the IEEE standard, quiet vs signalling, but I haven’t seen this distinction in Julia yet. Is that important for anything that missing doesn’t cover?


I think the type instability is relevant in many settings, for example

using BenchmarkTools

x = rand(1000)
y = rand(1000)

y[500] = NaN
z = vcat(y[1:499], missing, y[501:end])

function test_add(x,y)
    for i in 1:1000 
        y[i] = x[i] + y[i] 
    end
    y
end

@btime test_add($x, $y)   # 723.846 ns (0 allocations: 0 bytes)
@btime test_add($x, $z)   # 966.667 ns (0 allocations: 0 bytes)
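
As a rough sketch of where the two cases differ at the type level (toy vectors, not the benchmark data above): the NaN version keeps a plain Float64 element type, while the missing version becomes a small Union that each element access has to branch on.

```julia
y = [1.0, NaN, 3.0]        # NaN is itself a Float64
z = [1.0, missing, 3.0]    # the literal promotes to a Union element type

@show eltype(y)            # Float64
@show eltype(z)            # Union{Missing, Float64}

# Both values propagate through a reduction, but the results have different types.
@show sum(y)               # NaN
@show sum(z)               # missing
```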

Yes, there are lots of performance (type stability) reasons to use NaN!
The overhead with missing can be more than an order of magnitude even in this simple example:

julia> x_nan = [([1., 2., 3.],), ([1., 2., 3., NaN],)]

julia> x_missing = [([1., 2., 3.],), ([1., 2., 3., missing],)]

julia> @btime map(x -> sum.(x), $x_nan)
  35.578 ns (1 allocation: 80 bytes)

julia> @btime map(x -> sum.(x), $x_missing)
  378.745 ns (8 allocations: 240 bytes)
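
Part of that gap may come from the outer container rather than from missing itself. A quick check, using the same two arrays, of whether the outer element type is concrete:

```julia
x_nan = [([1., 2., 3.],), ([1., 2., 3., NaN],)]
x_missing = [([1., 2., 3.],), ([1., 2., 3., missing],)]

# Both tuples in x_nan have the same concrete type.
@show isconcretetype(eltype(x_nan))       # true

# The tuples in x_missing wrap differently typed vectors,
# so the outer element type is abstract and map has to dispatch dynamically.
@show isconcretetype(eltype(x_missing))   # false
```

If every inner vector allowed missing, the outer element type would be concrete again, and the comparison would isolate the cost of the Union itself.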

NaN and missing denote semantically different things. Idiomatically, NaN is an indication of an arithmetic failure (e.g. 0.0*Inf) or is sometimes used to poison computations. The math rules for NaN mostly make sense if one interprets NaN as “this is a number, but I don’t know which number it should be” (hence NaN^0.0==1.0, since any number to the zero power is one, although 0^0 is sometimes debated). missing refers to a value that never existed in the first place or was derived from such a value.
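
A small side-by-side sketch of that difference, using only Base operations:

```julia
# NaN follows IEEE 754 rules: comparisons are false, and pow(x, 0) is 1.
@show NaN == NaN           # false
@show NaN > 0              # false
@show NaN^0                # 1.0

# missing uses three-valued logic and propagates through all of these.
@show missing == missing   # missing
@show missing > 0          # missing
@show missing^0            # missing
@show 1^missing            # missing
```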

Sometimes, one doesn’t need this semantic distinction, in which case NaN can often serve the same role as missing (mostly as a poison value).

There will rarely be a performance benefit to using missing, but it can make the code much more sensible in places. That said, the performance penalty is usually not huge (although it can be significant in micro-benchmarks). I wouldn’t avoid missing for performance reasons if you’re actually trying to represent missing values.

As for quiet qNaN versus signaling sNaN, Julia does not make any distinction between the two. As far as I understand, Julia will never produce a sNaN in typical operation. Calling reinterpret is the primary way I’d expect them to arise in Julia. That said, most arithmetic operations on a sNaN will produce a qNaN, so they don’t tend to propagate very far.
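
For illustration, a sketch of building a signaling NaN by hand with reinterpret. The bit pattern below is my own choice (exponent all ones, quiet bit clear, nonzero payload), and the quiet-bit position assumes the usual IEEE 754-2008 binary64 layout:

```julia
snan = reinterpret(Float64, 0x7ff0000000000001)
quiet_bit = 0x0008000000000000   # top bit of the 52-bit significand

# isnan does not distinguish the two kinds.
@show isnan(snan)                                      # true

# The hand-built value has the quiet bit clear; the NaN constant has it set.
@show (reinterpret(UInt64, snan) & quiet_bit) == 0     # true
@show (reinterpret(UInt64, NaN) & quiet_bit) != 0      # true

# Arithmetic on it still yields a NaN; typical hardware returns a quieted one.
@show isnan(snan + 1.0)                                # true
```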


Just to summarize this thread.
NaN represents a real number that is unknown from the input expression. NaN does not intend to represent missing data that always propagates in operations, or even all the indeterminate forms arising from limits:

julia> NaN^0, 1^NaN, NaN > 0 # nonpropagating cases
(1.0, 1.0, false)

julia> 0/0, Inf/Inf, 0*Inf, Inf-Inf, 0^0, 1^Inf, Inf^0 # indeterminate forms
(NaN, NaN, NaN, NaN, 1, 1.0, 1.0)

I should use missing and skipmissing when it’s proper and accept some performance hits as necessary. And if it helps performance to condense non-missing data, I could use collect or append! on Base.SkipMissing instances.
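
A minimal sketch of that workflow on a toy vector:

```julia
data = [1.0, missing, 3.0, missing, 5.0]

# Reduce lazily without materializing a copy.
@show sum(skipmissing(data))        # 9.0

# Materialize the non-missing values into a dense Vector{Float64}.
dense = collect(skipmissing(data))
@show dense                         # [1.0, 3.0, 5.0]
@show eltype(dense)                 # Float64

# append! also accepts the lazy iterator.
buffer = Float64[]
append!(buffer, skipmissing(data))
```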


NaNs are very often used to represent missing data in grids. For grids, it makes no sense to have them replaced with missing, especially if one wants to save those grids in disk files, where libraries like netCDF have no idea what a missing is.


IMO, this makes no sense. NaN stands for Not a Number, so NaN^0 should be NaN, not 1.

In Matlab

>> NaN^0

ans =

   NaN

According to what Wikipedia says about IEEE 754-2008, there are several different versions of the power function with different NaN behavior. One of them specifies that qNaN^0.0 should return 1.0.


Quoting section 9.2.1 of IEEE 754-2008: “pow(x, ±0) is 1 for any x (even a zero, quiet NaN, or infinity)”. Ditto for “pown(x, 0)”, which is for integer exponents. Maybe one could consider IEEE’s interpretation of “not a number” to be more like “not 1 particular number”.

Adherence to IEEE 754 isn’t absolute. For another example, “squareRoot if the operand is less than zero” is listed as an operation that should return a quiet NaN by default. Julia throws a DomainError, Python’s math.sqrt raises a ValueError, and NumPy’s sqrt prints a RuntimeWarning the first time and returns nan.
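
A quick sketch of the Julia side of that comparison:

```julia
# sqrt of a negative real throws instead of returning a quiet NaN.
err = try
    sqrt(-1.0)
catch e
    e
end
@show err isa DomainError    # true

# A NaN input still passes through without an error.
@show isnan(sqrt(NaN))       # true

# Opting into complex arithmetic gives a result instead of an error.
@show sqrt(complex(-1.0))    # 0.0 + 1.0im
```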

All this said, if your data are naturally all floating point numbers and for any reason you want to use NaN like one would normally use missing, you certainly can do so (e.g. brenhinkeller/NaNStatistics.jl on GitHub: fast summary statistics, histograms, and binning, ignoring NaNs). In that particular case it has some potential performance benefits (doesn’t require an extra mask, can be SIMD’d more easily, etc.).


Makes me wonder if Union{<:AbstractFloat, Missing} elements could be implemented with NaNs instead of UInt8 type tags somehow. However, a particular concrete type not sharing the structure of the parametric supertype would be a weird outlier, and I can’t imagine how to make indexing a missing element from a float array consistent with doing so from any other array. NaNStatistics.jl doesn’t deal with missings at all, so it gets to use Aᵢ==Aᵢ as a !isnan for all Numbers and to skip float NaNs like missings (although I think I can break this with a custom non-float Number where == is weirdly false).
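
To make the self-equality trick concrete, a sketch with a helper I am calling notnan (my own name, not the package’s API) and a hypothetical Weird type that breaks it:

```julia
# x == x is false only for values that are unequal to themselves, such as NaN.
notnan(x) = x == x

@show notnan(1.0)        # true
@show notnan(NaN)        # false
@show notnan(missing)    # missing, not a Bool

# Hypothetical number type whose == is always false:
# it is not a NaN, but it fails the check anyway.
struct Weird <: Number end
Base.:(==)(::Weird, ::Weird) = false
@show notnan(Weird())    # false
```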


In principle you certainly could, because there’s something like 48 bits of payload in Float64 NaNs which can take arbitrary values, and apparently people do all sorts of wild things with this. As I learned from this Slack thread about tinylisp, modern x86 64-bit pointers only actually use 48 bits, so you can fit a whole pointer in the payload of a NaN (which is apparently called “NaN boxing”).
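
A minimal sketch of the idea, with box and unbox as names I made up for illustration. It assumes a quiet-NaN prefix of 0x7ff8 and keeps only the low 48 bits as payload:

```julia
const QNAN_BITS    = 0x7ff8000000000000   # exponent all ones plus the quiet bit
const PAYLOAD_MASK = 0x0000ffffffffffff   # low 48 bits

# Pack a 48-bit payload into the significand of a quiet NaN.
box(p::UInt64) = reinterpret(Float64, QNAN_BITS | (p & PAYLOAD_MASK))

# Recover the payload from the raw bits.
unbox(x::Float64) = reinterpret(UInt64, x) & PAYLOAD_MASK

b = box(0x0000123456789abc)
@show isnan(b)                          # true: it still looks like a NaN
@show unbox(b) == 0x0000123456789abc    # true: the payload survives the round trip
```

This only moves bits around with reinterpret; arithmetic on the boxed value is not guaranteed to preserve the payload.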
