Why are missing values not ignored by default?

mkitti · November 28, 2023, 7:06pm

There is a difference between changing the API at its root for everyone and changing the API to suit one’s own preferences.

I’m still very confused about why most of the conversation here is about changing an established API, when Julia gives users many facilities to customize the API, both through packages and at the user level.

I’m particularly concerned about these comments. The Julia 2 reference is really a fanciful distraction. In reality, Julia 2 is probably quite far away, and it will likely not contain the changes that you want. I expect Julia 2 to introduce only a few minor but important breaking changes while leaving the large majority of Julia 1 code working as is.

More importantly, I’m not particularly convinced that this requires a breaking change to Julia or a fundamental change to how any of the packages mentioned work.

There are two approaches to customizing the API.

1. Shadow the functions
2. Introduce new types

Introducing new functions to shadow existing APIs

I’ve shown examples above of how methods can be shadowed. Essentially, this is reminiscent of how things work in Python: we emulate APIs by placing similarly named functions in distinct namespaces. This is often discouraged in Julia because, on its own, it does not compose as well. However, it is significantly simpler and carries less risk of compilation-related issues such as invalidation. We can actually have it both ways in Julia: core packages can implement their own APIs in separate namespaces, while separate packages overload Base methods and forward to the namespaced versions.
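A minimal sketch of the “have it both ways” idea, with hypothetical names throughout (`SkipMissingCore` stands in for a core package, `MissingSkipper` for a wrapper type owned by a separate glue package). The core module shadows `mean` only inside its own namespace; the glue code extends the generic `Statistics.mean` for its own type and forwards to the namespaced version.

```julia
using Statistics

# Hypothetical core package: its `mean` lives in its own namespace and
# shadows Statistics.mean only there. This module adds no methods to any
# function from Base or Statistics.
module SkipMissingCore
import Statistics
mean(x) = Statistics.mean(skipmissing(x))
end

# Hypothetical glue package: it owns a wrapper type, extends the generic
# Statistics.mean for that type only, and forwards to the namespaced version.
struct MissingSkipper{V}
    data::V
end
Statistics.mean(m::MissingSkipper) = SkipMissingCore.mean(m.data)

x = [1, missing, 3]
mean(x)                    # missing: the default behavior is unchanged
SkipMissingCore.mean(x)    # 2.0
mean(MissingSkipper(x))    # 2.0
```

The shadowing module never extends an existing function; only the glue layer does, and only for a type it owns, so neither piece commits type piracy.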

Instead of mangling the names, why not just scope them into a module? They could be sm.mean and sm.sum. We could actually do both: one submodule could provide the scoped names and another the prefixed names, both pointing to the same underlying implementation.
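Here is one way the two naming styles could coexist. Every name in this sketch is made up for illustration: `SkipMissingStats`, its `Scoped` and `Prefixed` submodules, and the `skipmean`/`skipsum` prefixes. Both submodules pull the same private implementation from their parent through relative imports.

```julia
module SkipMissingStats  # hypothetical package name
import Statistics

# One shared implementation
_mean(x) = Statistics.mean(skipmissing(x))
_sum(x) = Base.sum(skipmissing(x))

# Scoped names, meant to be called qualified, e.g. `sm.mean(x)`.
# Defining `mean` and `sum` here creates new functions local to this
# submodule; Base.sum and Statistics.mean are not touched.
module Scoped
import .._mean, .._sum
mean(x) = _mean(x)
sum(x) = _sum(x)
end

# Prefixed names, meant to be brought in with `using`
module Prefixed
import .._mean, .._sum
export skipmean, skipsum
skipmean(x) = _mean(x)
skipsum(x) = _sum(x)
end

end # module SkipMissingStats

const sm = SkipMissingStats.Scoped
using .SkipMissingStats.Prefixed

x = [1, missing, 3]
sm.mean(x)     # 2.0
skipsum(x)     # 4
sum(x)         # missing: Base.sum is unchanged
```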

Using Types to Modify the API

There are a few ways to introduce new types in order to customize an API. The one that seems to have been discussed above is to introduce a new kind of Missing. For UnsafeMissing, I just want to point out that it is entirely possible to implement it in a package.
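To illustrate that a Missing-like type can live entirely in a package, here is a minimal sketch. Its semantics are an assumption made only for this example: `UnsafeMissing` is dropped by addition instead of propagating like `missing`, so `sum` ignores it. The module name `UnsafeMissings` and the constant `unsafemissing` are hypothetical.

```julia
module UnsafeMissings  # hypothetical package name

export UnsafeMissing, unsafemissing

# A singleton analogous to Base.Missing, defined entirely in a package.
# Assumed semantics for this sketch only: addition drops the value
# instead of propagating it.
struct UnsafeMissing end
const unsafemissing = UnsafeMissing()

Base.:+(x::Number, ::UnsafeMissing) = x
Base.:+(::UnsafeMissing, x::Number) = x
Base.:+(::UnsafeMissing, ::UnsafeMissing) = unsafemissing
Base.show(io::IO, ::UnsafeMissing) = print(io, "unsafemissing")

end # module UnsafeMissings

using .UnsafeMissings

# An Any-typed vector, since this sketch defines no promotion rules
v = Any[1, unsafemissing, 3]
sum(v)    # 4
```

A real implementation would need many more methods (comparisons, multiplication, promotion rules, and so on); for instance, a `mean` computed from `sum` and `length` would still count the dropped entries in the denominator.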

Another approach for introducing new types is to create wrappers. In this example, we could wrap a DataFrame in order to change its behavior. The wrapper in turn could return wrapped columns, which return wrapped or replaced missing values when indexed. This would let us effectively overlay our API preferences on top of the existing API.

Here’s a lightweight example of this.

Setup Code

julia> using CSV, DataFrames, Statistics


julia> struct SkipMissingDataFrame
           parent::DataFrame
       end

julia> Base.parent(smdf::SkipMissingDataFrame) = getfield(smdf, :parent)

julia> Base.getproperty(smdf::SkipMissingDataFrame, sym::Symbol) = skipmissing(Base.getproperty(parent(smdf), sym))

julia> write("blah.csv","""
       "col1", "col2"
       "5", "6"
       "1", "2"
       "30", "31"
       "22", "23"
       "NA"
       "50"
       """)
65

julia> df = CSV.read("blah.csv", DataFrame; silencewarnings=true);
julia> smdf = SkipMissingDataFrame(df)
SkipMissingDataFrame(6×2 DataFrame
 Row │ col1     col2    
     │ String3  Int64?  
─────┼──────────────────
   1 │ 5              6
   2 │ 1              2
   3 │ 30            31
   4 │ 22            23
   5 │ NA       missing 
   6 │ 50       missing )

julia> smdf.col2 |> mean
15.5

julia> smdf.col2 |> x->Iterators.filter(>(10),x) |> mean
27.0
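The session above wraps columns with skipmissing. The other variant, columns that replace missing values when indexed, could look roughly like the sketch below. `ReplaceMissingVector` is a hypothetical name; a getproperty method like the one above could return `ReplaceMissingVector(getproperty(parent(smdf), sym), 0)` instead of calling skipmissing. The sketch runs on a plain vector holding the same values as col2, so it needs neither CSV.jl nor DataFrames.jl.

```julia
# Hypothetical column wrapper: indexing returns `replacement` wherever the
# parent vector holds `missing`, so downstream code never sees `missing`.
struct ReplaceMissingVector{T,V<:AbstractVector} <: AbstractVector{T}
    parent::V
    replacement::T
    function ReplaceMissingVector(v::AbstractVector, replacement)
        # Element type: the parent's non-missing type promoted with the
        # replacement's type.
        T = promote_type(nonmissingtype(eltype(v)), typeof(replacement))
        return new{T,typeof(v)}(v, convert(T, replacement))
    end
end

Base.IndexStyle(::Type{<:ReplaceMissingVector}) = IndexLinear()
Base.size(r::ReplaceMissingVector) = size(r.parent)
Base.getindex(r::ReplaceMissingVector, i::Int) = coalesce(r.parent[i], r.replacement)
Base.parent(r::ReplaceMissingVector) = r.parent

col2 = [6, 2, 31, 23, missing, missing]   # same values as col2 above
r = ReplaceMissingVector(col2, 0)
collect(r)    # [6, 2, 31, 23, 0, 0]
sum(r)        # 62
```

Note that replacing differs from skipping: any reduction over `r` sees the replacement values, so the choice of replacement changes results such as the mean.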
