Why are missing values not ignored by default?

There is a difference between changing the API at its root for everyone and customizing the API to suit individual preferences.

I’m still very confused why the majority of the conversation here is about changing an established API when Julia gives the user many facilities to customize the API, both through packages and at the user level.

I’m particularly concerned about these comments. The Julia 2 reference is really a fanciful distraction. In reality, it is probably quite far away and ultimately will likely not contain the changes that you want. My expectation for Julia 2 is that it will only introduce some minor but really important breaking changes but keep functioning as is for the large majority of Julia 1 code.

More importantly, I’m not particularly convinced that this requires a breaking change to Julia or a fundamental change to how any of the packages mentioned work.

There are two approaches to customizing the API.

  1. Shadow the functions
  2. Introduce new types

Introducing new functions to shadow existing APIs

I’ve shown examples above of how methods can be shadowed. Essentially, this is reminiscent of how things work in Python: we emulate APIs by placing similarly named functions in distinct namespaces. This is often discouraged in Julia because, by itself, it does not compose as well. However, it is significantly simpler and carries less risk of introducing compilation-related issues such as invalidation. We can actually have it both ways in Julia: core packages can implement their own APIs in separate namespaces, and separate packages can overload Base methods and forward to the namespaced versions.

Instead of mangling the names, why not just scope them into a module? They could be sm.mean and sm.sum. We could actually do both: there could be one submodule meant for scoped names and another with prefixed names, both pointing to the same underlying implementation.
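As a rough sketch of what the scoped approach could look like (the module name `SM` and the prefixed aliases `smmean` and `smsum` are made up for illustration; they do not come from any existing package):

```julia
# Hypothetical module: its `mean` and `sum` are NEW functions that live only in `SM`.
# They are not methods of `Statistics.mean` or `Base.sum`, so nothing in Base changes.
module SM
    import Statistics

    mean(x) = Statistics.mean(skipmissing(x))
    sum(x) = Base.sum(skipmissing(x))
end

# Hypothetical prefixed names pointing at the same underlying implementation
const smmean = SM.mean
const smsum = SM.sum

x = [1, 2, missing, 4]

SM.sum(x)   # 7, the missing entry is skipped
smsum(x)    # same function, prefixed name
SM.mean(x)  # mean of 1, 2 and 4
sum(x)      # Base.sum is untouched and still returns missing
```

Since no Base methods are added or redefined here, this should not trigger invalidations. The trade-off is the one mentioned above: generic code that calls `Base.sum` will not pick up the skipping behavior.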

Using Types to Modify the API

There are a few ways to introduce new types in order to customize an API. The one that seems to have been discussed above is to introduce a new kind of Missing. For UnsafeMissing I just want to point out that it is completely possible to implement that in a package.

Another approach for introducing new types is to create wrappers. In this example, we could wrap a DataFrame in order to change its behavior. The wrapper in turn could return wrapped columns, which return wrapped or replaced missing values when indexed. This would allow us to effectively overlay our API preferences over the existing API.

Here’s a lightweight example of this.

Setup Code
julia> using CSV, DataFrames, Statistics


julia> struct SkipMissingDataFrame
           parent::DataFrame
       end

julia> Base.parent(smdf::SkipMissingDataFrame) = getfield(smdf, :parent)

julia> Base.getproperty(smdf::SkipMissingDataFrame, sym::Symbol) = skipmissing(Base.getproperty(parent(smdf), sym))

julia> write("blah.csv","""
       "col1", "col2"
       "5", "6"
       "1", "2"
       "30", "31"
       "22", "23"
       "NA"
       "50"
       """)
65

julia> df = CSV.read("blah.csv", DataFrame; silencewarnings=true);

julia> smdf = SkipMissingDataFrame(df)
SkipMissingDataFrame(6×2 DataFrame
 Row │ col1     col2    
     │ String3  Int64?  
─────┼──────────────────
   1 │ 5              6
   2 │ 1              2
   3 │ 30            31
   4 │ 22            23
   5 │ NA       missing 
   6 │ 50       missing )

julia> smdf.col2 |> mean
15.5

julia> smdf.col2 |> x->Iterators.filter(>(10),x) |> mean
27.0
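The wrapper above only intercepts property access. A slightly fuller sketch might also forward `propertynames` and `!` indexing to the parent. These two extra methods are my own additions for illustration, and the data frame is built inline instead of being read from the CSV file:

```julia
using DataFrames, Statistics

struct SkipMissingDataFrame
    parent::DataFrame
end

Base.parent(smdf::SkipMissingDataFrame) = getfield(smdf, :parent)

# Property access returns the column wrapped in skipmissing, as in the example above
Base.getproperty(smdf::SkipMissingDataFrame, sym::Symbol) =
    skipmissing(getproperty(parent(smdf), sym))

# Assumed additions: forward the column names, and wrap `smdf[!, col]` as well
Base.propertynames(smdf::SkipMissingDataFrame) = propertynames(parent(smdf))
Base.getindex(smdf::SkipMissingDataFrame, ::typeof(!), col) =
    skipmissing(parent(smdf)[!, col])

df = DataFrame(col2 = [6, 2, 31, 23, missing, missing])
smdf = SkipMissingDataFrame(df)

mean(smdf.col2)        # 15.5
mean(smdf[!, :col2])   # 15.5
mean(df.col2)          # missing, the parent DataFrame behaves as before
```

Only the methods defined here are overlaid. Anything else you call on the wrapper would need its own forwarding method.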