The filter function is non-intuitive

This is the current behaviour of the filter function when given a function that returns a boolean and a collection (taken from the documentation):

julia> a = 1:10
1:10

julia> filter(isodd, a)
5-element Array{Int64,1}:
 1
 3
 5
 7
 9

I would argue that this is counter-intuitive, at least to me, and I would like to explain why and see what others think.

My problem with it is that, by my intuition, a filter removes things: something is filtered out from the whole. So to my mind, the call filter(isodd, a) is a filter that applies to the elements of a that are odd. This means that when the odd numbers are filtered out, the function should return the even numbers.

Am I alone in this opinion? And since it would be a breaking change, is it even worth discussing? I think it could be changed for 2.0, so I believe it is a worthwhile discussion, and I am interested in hearing other opinions on this.

This change would be massively breaking and contrary to every other programming language with a filter function, but in no way improve the functional power of the function. No, this will not be considered.

Sieve would probably be a better name but… yeah as stated above you’d have to really argue that it’s worth it :slight_smile:

Not really. A filter usually separates something into two parts. It is up to the user to decide what is kept (it could be both; the parts just need to be separated). E.g., an aggregate grading sieve would keep all the parts, all of which are needed for the analysis.

The lesson is that one should not rely on intuition for these things — just read the docs if you are unsure.

Probably not, see

Also, fwiw, all filter implementations I’ve ever seen work this way and return an array/iterable with elements for which the predicate returns true.
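
A quick illustration with Julia's own filter: the same "keep the elements for which the predicate returns true" rule holds across collection types. For a Dict the predicate receives each key => value pair, and for a String it receives each Char. The variable names below are just for illustration.

```julia
# filter keeps the elements for which the predicate returns true,
# regardless of the collection type.
evens  = filter(iseven, 1:10)                # a Vector{Int} of the even numbers
upper  = filter(isuppercase, "HeLLo WoRld")  # a String of the uppercase Chars
counts = Dict("a" => 1, "b" => 5, "c" => 3)
big    = filter(p -> p.second > 2, counts)   # a Dict of the pairs whose value exceeds 2

println(evens)
println(upper)
println(sort(collect(keys(big))))
```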

This trips me up often as well because I think of the function filtering things out, but as others have said, there’s already a traditional meaning of this higher order function in other languages to consider. What might be more plausible is introducing a reject verb and maybe a corresponding select, although that already has many other meanings.

You can define something like this

filterout(f, a) = filter(!f, a)

to remedy your pain (from the phrase filter something out).
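
A short usage sketch of that one-liner. Note that filterout is just the name suggested here, not a Base function. It relies on !f, which builds the negated predicate x -> !f(x), so filterout(isodd, a) gives the same result as filter(!isodd, a).

```julia
# filterout drops the elements for which f is true
# ("filters them out") and keeps the rest.
filterout(f, a) = filter(!f, a)

a = 1:10
odds  = filter(isodd, a)     # keeps 1, 3, 5, 7, 9
evens = filterout(isodd, a)  # keeps 2, 4, 6, 8, 10

# Together the two calls split the input into two disjoint parts.
println(sort(vcat(odds, evens)) == collect(a))
println(evens)
```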

To me it makes more sense to apply a condition directly (i.e. the current filter approach), instead of negating it. Something akin to

if condition
    A
else
    B
end

instead of

if !condition
    B
else
    A
end

It simply requires less mental gymnastics.

A verb like where is less ambiguous. filter was used by dplyr, which uses odd verbs like arrange and mutate.

Depends on the context. In signal processing, a filter rejects part of the signal – there’s no way to get it back. One says, for example, “I need to filter this interference” to mean “reject the interference”. You can use filters to decompose a signal into its components, but you need more than one filter (for example, in a sound equalizer).

Coming from an EE, not computing, background, the behavior of filter in programming languages is counter-intuitive to me too, but I’ve gotten used to it.

Although I completely agree with the sentiment that this should not be changed, for all the reasons mentioned above, it’s worth noting that even in common usage like “coffee filter”, not just in EE, “filter” often implies removal. So confusion is understandable for relatively new programmers, though they’d run into it regardless of language choice.

But it’s just something to get used to since, like others have said, this is the convention in all programming languages.

edit: also see Filter (higher-order function) - Wikipedia

For what it’s worth, in manufacturing, filtration is an operation that separates a mixed liquid/solid stream into a liquid stream and a solid stream. We call the liquid stream the “filtrate”, and the solids form a “cake”. Usually the filtrate is the product you’re most interested in, but not always. If we treat the filter function as an analogy to this kind of filtration, it might make sense to have a keyword argument that controls which stream is returned (this would also allow getting both if wanted), something like product=:filtrate, product=:cake, or product=:both.
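
A rough sketch of what that keyword could look like. The function name filtration and the product keyword are hypothetical and not part of Julia Base; the sketch only wraps the existing filter and its negated form.

```julia
# Hypothetical: `filtration` and its `product` keyword do not exist in Base.
# :filtrate -> elements that pass (predicate true)
# :cake     -> elements that are held back (predicate false)
# :both     -> a (filtrate, cake) tuple
function filtration(f, a; product::Symbol = :filtrate)
    product === :filtrate && return filter(f, a)
    product === :cake     && return filter(!f, a)
    product === :both     && return (filter(f, a), filter(!f, a))
    throw(ArgumentError("product must be :filtrate, :cake or :both, got $(repr(product))"))
end

println(filtration(isodd, 1:10))                    # the odd numbers
println(filtration(isodd, 1:10; product = :cake))   # the even numbers
```

For simplicity, product = :both evaluates the predicate twice per element. A single-pass version would avoid that, at the cost of a little more code.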

Personally, I don’t think this is very beneficial, but I thought I would throw it out there.

Sure, there are lots of intuitive interpretations that go either way. The point is that there isn’t a single one, so people should just read the docs.

The only unambiguous approach I can imagine is to spell it out, cf COMMON-LISP:REMOVE-IF-NOT & friends.

The Wikipedia page for filter has a nice table summarizing syntax in various languages. I guess that filter in particular comes from the ML family historically, but I am not sure. In any case, Julia’s usage seems to be the common one in programming.

I have an idea. It might be a bit radical but I think it could solve this problem without any new functions or breaking changes.

filter(!isodd, 1:10)

:smiley:

And as it happens, some of the most common filters, such as lowpass filters, specify by their name what is kept, just as programming languages do with their predicate.

Thanks for the enlightening replies. So it seems I am not alone in my intuition, but since the change would be massively breaking and there is a lot of precedent for this implementation, things are fine for now.

If I keep having to think hard to have this make sense, I will implement my own filterout function, as suggested by @anon37204545. Thanks for everyone’s time and opinions :slight_smile:

How about keep and reject?

Indeed – and the filter’s output is “the filtered signal”. I guess that’s why I never had as much trouble with filter as defined in programming as the OP. The point I was trying to make is that, in many contexts, a filter does not “separate” something into two parts which you get to keep, which was @Tamas_Papp’s definition.

That is a fine band-aid, and I have considered it. The original problem, however, was that I currently feel the need to do mental gymnastics to understand the code I write, and this just feels like adding a flip to those gymnastics… Then it is better to just accept the current way the function works, IMO.

Huge fan of this idea. Really like the lack of ambiguity.

I personally like keep/discard or reject/select, as they feel like natural conceptual pairs.
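
If you want such a pair today, a sketch is a handful of one-line wrappers around filter. All four names below are hypothetical user-level definitions, not functions exported by Julia Base.

```julia
# Hypothetical verb pairs defined on top of Base's filter.
keep(f, a)    = filter(f, a)    # keep the elements for which f is true
discard(f, a) = filter(!f, a)   # drop the elements for which f is true

# Aliases for the select/reject naming.
select(f, a) = keep(f, a)
reject(f, a) = discard(f, a)

println(keep(isodd, 1:10))     # the odd numbers
println(discard(isodd, 1:10))  # the even numbers
```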

A filter can remove, and a filter can keep. A dust filter removes dust, an air filter lets through air. A bandpass filter passes through some frequency bands. A bandstop filter removes them.

The specification depends on whether it’s easier to enumerate the removed or kept parts. Neither is more intuitive or obvious.
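
With a predicate, the same choice shows up in which condition you write. The helpers lowpass, bandpass, and bandstop below are hypothetical and act on a plain list of numbers (illustrative frequencies in Hz), not on a signal. Each one returns a predicate that is true for what should be kept. bandstop simply describes the removed band and negates it.

```julia
# Hypothetical predicate builders; each returns a function x -> Bool.
lowpass(fc)      = x -> x <= fc               # keep everything up to fc
bandpass(lo, hi) = x -> lo <= x <= hi         # keep the band [lo, hi]
bandstop(lo, hi) = x -> !(lo <= x <= hi)      # remove the band [lo, hi]

freqs = [50, 440, 1000, 4000, 12000]  # illustrative values

println(filter(lowpass(1000), freqs))
println(filter(bandpass(300, 3400), freqs))
println(filter(bandstop(300, 3400), freqs))
```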

What does a coffee filter let through? Coffee! :wink:
