Hello everyone! I am pleased to announce the release of a new package: MissingsAsFalse.jl. As the name implies, it helps you write code where comparisons with missing are treated as false, which is often what is needed in data-cleaning operations.
MissingsAsFalse.jl provides a single macro, @mfalse, which treats missing values as false in select operations. These include
- Comparisons, ==, >, <, >=, and <=, as well as their broadcasted equivalents for arrays.
- Control flow, meaning if and elseif, as well as ternary commands, a ? b : c
- Short-circuiting comparisons, && and ||
- Boolean indexing, y[x] where x is a boolean array which contains missing.
In Julia, missing values are represented by missing. This is equivalent to NA in R and . in Stata. In Julia, Boolean operations with missing also propagate missing-ness.
julia> missing == 1
missing
This is philosophically satisfying. Because missing values represent what we “don’t know”, it makes sense that we “don’t know” the outcome of a comparison between missing and a known object. But this propagation becomes increasingly burdensome when writing complicated code. In particular, control flow constructs in Julia, like if statements, throw an error on missing values.
julia> if missing == 1
           println("Hello, Earth")
       else
           println("Hello, Mars")
       end
ERROR: TypeError: non-boolean (Missing) used in boolean context
Stacktrace:
[1] top-level scope
@ REPL[3]:1
In almost all cases, we want the above statement to print "Hello, Mars", instead of throwing an error.
Proper handling of missing values requires people to use isequal(missing, 1) instead of missing == 1. The former will return false while the latter will return missing, as shown above. But writing isequal everywhere is burdensome. To help, MissingsAsFalse.jl provides the macro @mfalse. Inside code affected by @mfalse, the Boolean comparisons which normally return missing instead return false.
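For comparison, here is roughly what the manual alternatives look like in plain Base Julia. This is only a sketch of what one would write without the macro; it is not meant to show how @mfalse is implemented internally.

```julia
# Plain Base Julia, no packages needed.
x = missing

# `==` propagates missing, so the result cannot be used directly in an `if`.
r_eq = (x == 1)

# `isequal` always returns a Bool.
r_isequal = isequal(x, 1)

# `coalesce` returns its first non-missing argument, so it works for
# any comparison, including `>` and `<`.
r_gt = coalesce(x > 100, false)

println(ismissing(r_eq))   # is the `==` result missing?
println(r_isequal)
println(r_gt)
```

Wrapping every comparison like this works, but it gets noisy quickly, which is the motivation for the macro.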
This works on control flow
julia> @mfalse if missing == 1
           println("Hello, world")
       else
           println("Hello, Mars")
       end
Hello, Mars
Greater than and less than comparisons
julia> x = missing;
julia> @mfalse if x > 100
           1
       else
           100
       end
100
This also works for vectorized comparisons such as .==, .>, and .<. Note: these allocate a new array. I would appreciate help on how to accomplish this without too many extra allocations.
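To make the vectorized case concrete, one possible manual equivalent in Base Julia is a broadcasted coalesce. This is a sketch under the assumption that replacing missing with false element-wise is the desired behavior; it is not necessarily what @mfalse expands to.

```julia
x = [1, missing, 3]

# Without any handling, the result still contains missing.
raw = x .> 1

# The two dotted calls fuse into a single broadcast, so no intermediate
# array is created for `x .> 1`; only the final result is allocated.
cleaned = coalesce.(x .> 1, false)

println(any(ismissing, raw))
println(cleaned)
```

Because of dot fusion, the comparison and the coalesce run in one pass over x, which may be one way to keep allocations down.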
Short-circuiting
julia> @mfalse x == 1 && true
false
julia> @mfalse x == 1 || true
true
and boolean indexing
julia> y = [1, 2, 3, 4];
julia> inds = [true, false, true, missing];
julia> @mfalse y[inds]
2-element Vector{Int64}:
1
3
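For reference, a hand-written version of the same indexing in Base Julia could look like the following. Again, this is a sketch of the manual approach, with the variable names mask and selected chosen here only for illustration.

```julia
y = [1, 2, 3, 4]
inds = [true, false, true, missing]

# Replace missing with false to get a purely Boolean mask.
mask = coalesce.(inds, false)

# Logical indexing keeps the elements where the mask is true.
selected = y[mask]

println(selected)
```

The element whose index is missing is dropped, the same as if it had been false.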
For complete documentation, see julia> ? @mfalse.