Wouldn’t it be desirable to have the first filter call work? It could potentially just lower to the second one, though that is probably not ideal, as the small benchmark below suggests; likely better can be done:
julia> λ(e) = (e>0);
julia> filter_skipmissing(λ, x) = filter(e->(!ismissing(e) && λ(e)), x)
julia> x = [(rand()<0.3) ? missing : randn() for i = 1:500_000];
julia> @btime filter($λ, collect(skipmissing($x)));
8.071 ms (38 allocations: 8.00 MiB)
julia> @btime filter_skipmissing($λ, $x);
5.984 ms (18 allocations: 3.38 MiB)
I guess more generally something like this could be used:
julia> filter_itr(λ, itr) = [e for e ∈ itr if λ(e)];
julia> @btime filter_itr($λ, skipmissing($x));
5.672 ms (22 allocations: 3.00 MiB)
filter is for filtering arrays. Since skipmissing returns an iterable, you can use Iterators.filter instead. An advantage of this approach is that it preserves the laziness of the iterable; if you do need to materialize the output, call collect on the result.
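A minimal sketch of that approach, reusing the λ and x definitions from the benchmark above:

```julia
julia> y = Iterators.filter(λ, skipmissing(x));  # lazy; nothing is computed yet

julia> collect(y);  # materialize only when needed; eltype is narrowed to Float64
```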
As a side note, I rarely find Base.filter useful in my use cases. Is there a good example that motivates the existence of the non-lazy version in Base, or is it just user-friendliness to avoid explicit calls to collect?
I think the interface accumulated historically. There are also some tricky questions, e.g. should filter narrow the element type like collect does? Currently it doesn’t, and doing so would make it type-unstable.
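To illustrate the narrowing point, a small sketch with a Union-eltype vector (a made-up example, not from the thread):

```julia
julia> v = Union{Missing,Float64}[1.0, missing, -2.0];

julia> eltype(filter(e -> !ismissing(e) && e > 0, v))  # filter keeps the eltype
Union{Missing, Float64}

julia> eltype(collect(skipmissing(v)))  # collect narrows it
Float64
```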
This is also going to cause problems if you initially write code for a subset of the data that doesn’t have missing values: once you realize the data actually does contain missing values, you can’t just sprinkle skipmissing over all your function calls.
In addition to what Tamas mentioned, there’s also filter!, which can be very useful. I think it’d be confusing if filter returned an iterable while filter! modified an array.
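For example, a sketch of the in-place variant (which only makes sense on an actual array, not a lazy iterable):

```julia
julia> v = [1, -2, 3];

julia> filter!(>(0), v) === v  # mutates v in place and returns the same array
true
```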