Assignment of a `missing` value fails in DataFrames 0.11.1

I just mean that AFAIK there’s no situation in Julia where an assignment can change the type of the container. When you assign a `String` to a `Vector{Int}`, the assignment throws an error; the vector doesn’t turn into a `Vector{Any}`.
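Here is a small sketch in plain Julia that makes this concrete. It needs no packages, and `v`, `w`, and `caught` are just illustrative names. Assigning `missing` into a `Vector{Int}` fails the same way assigning a `String` does: the conversion to `Int` raises a `MethodError`, and the vector keeps its element type. Accepting `missing` requires explicitly building a vector with a wider element type.

```julia
v = [1, 2, 3]    # element type Int (Int64 on 64-bit systems)

# setindex! tries convert(Int, missing), which has no method, so it throws
# instead of widening v's element type.
caught = try
    v[1] = missing
    nothing
catch e
    e
end
println(typeof(caught))   # the exception type
println(eltype(v))        # element type of v is unchanged

# Allowing missing is an explicit opt-in: copy into a wider element type.
w = convert(Vector{Union{Missing, Int}}, v)
w[1] = missing            # now allowed; v itself is untouched
println(w)
```

The copy is deliberate: widening `v` in place would change the type of a binding other code may already rely on, which is the type-stability concern raised in this thread.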

I think that at the very least this thread shows that there is a contingent that would appreciate automatic promotion.

The purpose of a library like DataFrames is to abstract away these behaviors, and I don’t think users would be concerned with this aspect of how columns behave.

Thanks for all the hard work on DataFrames! I’m excited to start exploring its new features.

Right, right – that’d probably wreak havoc on type stability.

Well, I’m still in favor of defaulting to `Union{Missing, T}` column types in DataFrames (especially if @StefanKarpinski’s suggestion of using the same memory is workable), but I’ll admit that learning about the `allowmissing!(df)` function reduces my sense of its importance.

(Side note: as @pdeffebach pointed out, you DataFrames developers are doing amazing work. Thanks so much for everything you’re doing!)

I for one am absolutely concerned with the behavior of columns. Any time you want to plug the column into a function or do anything at all with it, you potentially have to worry about this.

Now, if not performing the promotion interfered with the “normal” operation of the data frame, such as `groupby`, `join`, or stacking, I’d agree something is wrong; but one of the greatest features of Julia data frames is that you don’t need specialized data structures for these operations. Nothing stops you from putting any type of vector you want into a data frame, but the default behavior should be to leave column types alone.

You may find that after spending more time using DataFrames you come around to this way of thinking. When I first started using it, I’m not sure I had an opinion on this, but after having survived pandas, DataArrays, and NullableArrays, I am completely convinced that DataFrames’ ability to use any `AbstractVector` object, and to use simple `Vector`s by default, is one of its “killer” features.

I would also like to add that the addition of allowmissing!(df) is great and solves all of these problems. With piping it is super easy to add that at the top of your code.

Once keyword arguments are finalized, maybe adding an `AllowMissings` option to the `DataFrame` constructor would be great, although this would probably also require importing more functions.

Regardless, thanks for the new function. This makes everything much easier.

Is there a `notallowmissing!(df; cols)` or something similar in the works? I had written my own `allowmissing!`, which I’m happy to replace with a call to the DataFrames implementation. However, I would like to have the opposite as well… especially after calling `dropmissing!`.

The name should probably be disallowmissing! for such a function.

The name is getting kind of long, but if somebody wants to make a PR to add `disallowmissing!`, why not (it should be quite easy). Also, if somebody has an idea for a shorter/better replacement for the term “nullable”… :slight_smile:
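For readers on a current release: recent DataFrames.jl versions (1.x) ship both `allowmissing!` and `disallowmissing!`. The sketch below assumes such a release. The `df.a` property syntax and these exact call forms may not match DataFrames 0.11.1, and the column names `a` and `b` are just for illustration. It shows the round trip discussed above: widen every column so `missing` can be assigned, then narrow the columns back once no `missing` values remain.

```julia
using DataFrames

df = DataFrame(a = [1, 2, 3], b = ["x", "y", "z"])

# Columns start as plain Vector{Int64} / Vector{String}, so this would throw:
# df.a[1] = missing

allowmissing!(df)      # widen every column's eltype to Union{Missing, T}
df.a[1] = missing      # now allowed

df.a[1] = 1            # remove the missing value again
disallowmissing!(df)   # narrow eltypes back; by default errors if a column still holds missing
```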

This is an absolute lifesaver. Thanks a lot @nalimilan for this fix :+1: