The filter function is non-intuitive

TheLateKronos · August 28, 2020, 10:46am

So this is the current behaviour of the filter function, when given a function that returns a boolean and a collection (taken from the documentation):

julia> a = 1:10
1:10

julia> filter(isodd, a)
5-element Array{Int64,1}:
 1
 3
 5
 7
 9

I would argue that this is counter-intuitive, at least to me, and I would like to discuss why to see what others think.

So my problem with it is that a filter removes, ~~by definition~~ in my intuition. Something is filtered out from the whole. So to my mind, the function call filter(isodd, a) is a filter that catches the elements of a that are odd. This means that when the odd numbers are filtered out, the function should, in my mind, return the even numbers.

Am I alone in this opinion? And since it would be a breaking change, is it even worth discussing? I am ~~thinking it could be changed for 2.0, and that it is therefore a worthwhile discussion~~ interested in hearing other opinions on this.

GunnarFarneback · August 28, 2020, 11:11am

This change would be massively breaking and contrary to every other programming language with a filter function, but in no way improve the functional power of the function. No, this will not be considered.

pkofod · August 28, 2020, 11:14am

Sieve would probably be a better name, but… yeah, as stated above, you’d really have to argue that it’s worth it.

Tamas_Papp · August 28, 2020, 11:19am

Not really. A filter usually separates something into two parts. It is up to the user to decide what is kept (it could be both parts, which just needed to be separated). E.g., an aggregate grading sieve keeps all parts, all of which are needed for the analysis.
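To make the "two parts" reading concrete, here is a minimal sketch of a helper that keeps both sides of the split. The name splitby is hypothetical (it is not part of Base); it simply runs filter twice, once with the predicate and once with its negation.

```julia
# Hypothetical helper (not in Base): split a collection into the
# elements that satisfy `f` and those that do not, keeping both parts.
function splitby(f, xs)
    kept = filter(f, xs)             # elements where f(x) is true
    rest = filter(x -> !f(x), xs)    # elements where f(x) is false
    return kept, rest
end

odds, evens = splitby(isodd, 1:10)
println(odds)   # [1, 3, 5, 7, 9]
println(evens)  # [2, 4, 6, 8, 10]
```

This walks the collection twice for simplicity; a single-pass loop pushing into two vectors would avoid that if performance mattered.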

The lesson is that one should not rely on intuition for these things — just read the docs if you are unsure.

Probably not, see

pfitzseb · August 28, 2020, 11:32am

Also, fwiw, all filter implementations I’ve ever seen work this way and return an array/iterable with elements for which the predicate returns true.
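As a rough illustration of that convention, using only Base Julia, filter's result matches an array comprehension with an if guard: both keep exactly the elements for which the predicate returns true.

```julia
a = 1:10

# filter keeps the elements for which the predicate returns true ...
kept = filter(isodd, a)

# ... which matches a comprehension with an `if` guard.
same = [x for x in a if isodd(x)]

println(kept == same)  # true
```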

StefanKarpinski · August 28, 2020, 11:33am

This trips me up often as well because I think of the function filtering things out, but as others have said, there’s already a traditional meaning of this higher order function in other languages to consider. What might be more plausible is introducing a reject verb and maybe a corresponding select, although that already has many other meanings.

anon37204545 · August 28, 2020, 12:21pm

You can define something like this

filterout(f, a) = filter(!f, a)

to ease your pain (the name comes from the phrase "filter something out").
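A short usage sketch of that helper follows. The in-place companion filterout! is a hypothetical addition built on Base's filter!, which only works on mutable collections such as a Vector (not on a range like 1:10). Note that !f relies on f being a Function; for arbitrary callables, x -> !f(x) is the more general spelling.

```julia
# Suggested helper: keep the elements for which f is false,
# i.e. "filter out" whatever f matches.
filterout(f, a) = filter(!f, a)

# Hypothetical in-place companion built on Base.filter!
# (requires a mutable collection such as a Vector).
filterout!(f, a) = filter!(!f, a)

a = collect(1:10)
println(filterout(isodd, a))  # [2, 4, 6, 8, 10]
filterout!(isodd, a)          # mutates `a`
println(a)                    # [2, 4, 6, 8, 10]
```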

To me it makes more sense to apply a condition directly (i.e. the current filter approach), instead of negating it. Something akin to

if condition
    A
else
    B
end

instead of

if !condition
    B
else
    A
end

It simply requires less mental gymnastics.

xiaodai · August 28, 2020, 1:01pm

where is less ambiguous. filter is the name dplyr uses, and dplyr also uses odd verbs like arrange and mutate.

mbaz · August 28, 2020, 1:36pm

Depends on the context. In signal processing, a filter rejects part of the signal – there’s no way to get it back. One says, for example, “I need to filter this interference” to mean “reject the interference”. You can use filters to decompose a signal into its components, but you need more than one filter (for example, in a sound equalizer).

Coming from an EE, not computing, background, the behavior of filter in programming languages is counter-intuitive to me too, but I’ve gotten used to it.

ianfiske · August 28, 2020, 1:46pm

Although I completely agree with the sentiment that this should not be changed, for all the reasons mentioned above, it’s worth noting that even in common use like “coffee filter”, not just in EE, “filter” often implies removal. So the confusion is understandable for relatively new programmers, though they’d run into it regardless of language choice.

But it’s just something to get used to since, like others have said, this is the convention in all programming languages.

edit: also see Filter (higher-order function) - Wikipedia

non-Jedi · August 28, 2020, 1:55pm

For what it’s worth, in the domain of manufacturing filtration is an operation which separates a mixed liquid/solid stream into a liquid stream and solid stream. We call the liquid stream the “filtrate”, and the solids form a “cake”. Usually the filtrate is the product you’re most interested in but not always. If we treat the filter function as an analogy to this kind of filtration, it might make sense to have a keyword argument that controls which stream is returned (this would also allow getting both if wanted). Something like product=:filtrate, product=:cake, or product=:both.
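A minimal sketch of what that keyword idea might look like. Both the function name filtration and the product keyword are illustrative only; nothing like this exists in Julia. The default :filtrate reproduces Base filter, :cake returns what is held back, and :both returns the two parts as a tuple.

```julia
# Hypothetical sketch: `filtration` and its `product` keyword are
# illustrative names, not part of Julia.
function filtration(f, xs; product::Symbol = :filtrate)
    if product === :filtrate
        return filter(f, xs)                    # what passes through
    elseif product === :cake
        return filter(x -> !f(x), xs)           # what is held back
    elseif product === :both
        return (filter(f, xs), filter(x -> !f(x), xs))
    else
        throw(ArgumentError("product must be :filtrate, :cake or :both"))
    end
end

println(filtration(isodd, 1:10))                   # [1, 3, 5, 7, 9]
println(filtration(isodd, 1:10; product = :cake))  # [2, 4, 6, 8, 10]
```

Dispatching on a Symbol keyword keeps a single entry point, at the cost of a runtime check; separate functions would be the more conventional Julia design.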

Personally, I don’t think this is very beneficial, but I thought I would throw it out there.

Tamas_Papp · August 28, 2020, 2:23pm

Sure, there are lots of intuitive interpretations that go either way. The point is that there isn’t a single one, so people should just read the docs.

The only unambiguous approach I can imagine is to spell it out, cf COMMON-LISP:REMOVE-IF-NOT & friends.
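For comparison, here is a sketch of what spelled-out names in the Common Lisp style could look like in Julia. remove_if and remove_if_not are hypothetical names; the point is that Base filter already behaves like REMOVE-IF-NOT, since it removes the elements for which the predicate is false.

```julia
# Hypothetical Julia spellings of Common Lisp's REMOVE-IF and REMOVE-IF-NOT.
# Base.filter already behaves like REMOVE-IF-NOT: it drops the elements
# for which the predicate is false.
remove_if(f, xs) = filter(x -> !f(x), xs)
remove_if_not(f, xs) = filter(f, xs)

println(remove_if(isodd, 1:10))      # [2, 4, 6, 8, 10]
println(remove_if_not(isodd, 1:10))  # [1, 3, 5, 7, 9]
```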

The Wikipedia page for filter has a nice table summarizing syntax in various languages. I guess that filter in particular comes from the ML family historically, but I am not sure. In any case, Julia’s usage seems to be the common one in programming.

Zach_Christensen · August 28, 2020, 2:32pm

I have an idea. It might be a bit radical but I think it could solve this problem without any new functions or breaking changes.

filter(!isodd, 1:10)
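Spelled out with its result: applying ! to a function in Julia builds a new function that negates it, so keeping the even numbers needs no extra definitions. An explicit anonymous function gives the same result.

```julia
# `!isodd` builds a new function that negates isodd, so no extra
# definitions are needed to keep the even numbers.
evens = filter(!isodd, 1:10)
println(evens)  # [2, 4, 6, 8, 10]

# The same negation with an explicit anonymous function:
println(filter(x -> !isodd(x), 1:10) == evens)  # true
```

For this particular case, filter(iseven, 1:10) is of course the most direct spelling.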

GunnarFarneback · August 28, 2020, 2:42pm

And as it happens some of the most common filters like the lowpass filters specify by the name what is to be kept, just like programming languages do with their predicate.

TheLateKronos · August 28, 2020, 3:16pm

Thanks for the enlightening replies. So it seems I am not alone in my intuition, but since the change would be massively breaking and there is a lot of precedent for this implementation, things are fine for now.

If I keep having to think hard to make sense of this, I will implement my own filterout function, as suggested by @anon37204545. Thanks for everyone’s time and opinions.

pauljurczak · August 28, 2020, 3:34pm

How about keep and reject?

mbaz · August 28, 2020, 3:49pm

Indeed – and the filter’s output is “the filtered signal”. I guess that’s why I never had as much trouble with filter as defined in programming as the OP. The point I was trying to make is that, in many contexts, a filter does not “separate” something into two parts which you get to keep, which was @Tamas_Papp’s definition.

TheLateKronos · August 28, 2020, 8:00pm

That is a fine band-aid, and I have considered it. The original problem, however, was that I currently feel the need to do mental gymnastics to understand the code I write, and this just feels like adding a flip to those gymnastics… Then it is better to just accept the way the function currently works, IMO.

TheLateKronos · August 29, 2020, 8:55am

Huge fan of this idea. Really like the lack of ambiguity.

I personally like keep/discard or reject/select, as they feel like natural conceptual pairs.

DNF · August 29, 2020, 11:30am

A filter can remove, and a filter can keep. A dust filter removes dust, an air filter lets through air. A bandpass filter passes through some frequency bands. A bandstop filter removes them.

The specification depends on whether it’s easier to enumerate the removed or kept parts. Neither is more intuitive or obvious.

What does a coffee filter let through? Coffee!
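As a loose analogy only, treating a numeric range as a "band": a bandpass-style predicate names what is kept, a bandstop-style predicate names what is removed, and both are ordinary calls to filter. The names bandpass and bandstop here are illustrative helpers, not DSP functions.

```julia
# Illustrative analogy only: treat a numeric interval as a "band".
# bandpass names what is kept; bandstop names what is removed.
bandpass(xs, lo, hi) = filter(x -> lo <= x <= hi, xs)
bandstop(xs, lo, hi) = filter(x -> !(lo <= x <= hi), xs)

xs = 1:10
println(bandpass(xs, 3, 7))  # [3, 4, 5, 6, 7]
println(bandstop(xs, 3, 7))  # [1, 2, 8, 9, 10]
```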
