Filter by array of tuples


#1

I found some weird behavior when destructuring tuples into variables. I need to get the indices of elements that match a rule.

This code works fine:

names = ["aa", "ab", "ca"]
filter((i) -> startswith(i[2], "a"), collect(enumerate(names)))

Here I bind the whole tuple to a single variable and then take the value by index, i[2]. That isn't very nice.

But the code:

names = ["aa", "ab", "ca"]
filter((i, name) -> startswith(name, "a"), collect(enumerate(names)))

doesn’t work. I get this error:

ERROR: MethodError: no method matching (::getfield(Main, Symbol("##143#144")))(::Tuple{Int64,Symbol})
Closest candidates are:
  #143(::Any, ::Any) at none:1
Stacktrace:
 [1] mapfilter(::getfield(Main, Symbol("##143#144")), ::typeof(push!), ::Array{Tuple{Int64,Symbol},1}, ::Array{Tuple{Int64,Symbol},1}) at ./abstractset.jl:336
 [2] filter(::Function, ::Array{Tuple{Int64,Symbol},1}) at ./array.jl:2352
 [3] top-level scope at none:0

At the same time, the following code works fine:

names = ["aa", "ab", "ca"]
@show arr = collect(enumerate(names))
(i, name) = arr[1]
@show i
@show name

I get:

arr = collect(enumerate(names)) = Tuple{Int64,String}[(1, "aa"), (2, "ab"), (3, "ca")]
i = 1
name = "aa"

Is this a bug, or am I doing something wrong?
Julia 1.0.3
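The same destructuring that works in the assignment above also works in a for loop header, so a plain loop is one way to collect the matching indices without filter. A minimal sketch; the names strs and idx are illustrative, not from the original post:

```julia
strs = ["aa", "ab", "ca"]
idx = Int[]
# (i, name) destructures each (index, value) tuple produced by enumerate
for (i, name) in enumerate(strs)
    if startswith(name, "a")
        push!(idx, i)
    end
end
println(idx)  # prints [1, 2]
```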


#2

As you probably know, the syntax

i -> startswith(i[2], "a")

creates an anonymous function similar to

function __anonymous__(i)
      startswith(i[2], "a")
end

Now the syntax

(i, name) -> startswith(name, "a")

creates an anonymous function taking two arguments (as opposed to a function taking one argument which is a tuple of two values, as in the first, working, case):

function __anonymous__(i, name)
      startswith(name, "a")
end

Hence the error message telling you that no matching method was found: since enumerate produces a collection of Tuple{Int64,String}, your function has to accept a single argument of that type.
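The mismatch can be checked directly. The sketch below (variable names are illustrative) catches the MethodError and uses Base's applicable to show that the two-argument lambda accepts two separate values but not one tuple:

```julia
strs = ["aa", "ab", "ca"]
tups = collect(enumerate(strs))            # Vector{Tuple{Int64, String}}
two_args = (i, name) -> startswith(name, "a")

# filter calls the predicate with ONE argument per element (a tuple),
# but two_args only has a method for two arguments, so a MethodError is thrown
err = try
    filter(two_args, tups)
    nothing
catch e
    e
end
println(err isa MethodError)                # prints true

println(applicable(two_args, tups[1]))      # prints false: one tuple argument
println(applicable(two_args, tups[1]...))   # prints true: splatted into two arguments
```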


So if you want to “deconstruct” the tuple, you have to do it inside the function (at least I don’t know of any other way):

julia> filter(tuple-> let (i, name)=tuple; startswith(name, "a") end,
              collect(enumerate(["aa", "ab", "ca"])))
2-element Array{Tuple{Int64,String},1}:
 (1, "aa")
 (2, "ab")
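Another option, not suggested in the thread, is to keep the two-argument function unchanged and adapt it by splatting each tuple into it. A sketch; strs, pred and matches are illustrative names:

```julia
strs = ["aa", "ab", "ca"]
pred = (i, name) -> startswith(name, "a")   # the original two-argument predicate

# t... splats the (index, name) tuple into pred's two positional arguments
matches = filter(t -> pred(t...), collect(enumerate(strs)))
println(matches)
```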

BTW, using names as the name of your test variable is probably not a good idea, as it collides with Base.names.


#3

OK, thanks. Explicit assignment is a somewhat non-obvious way to unpack the arguments of a lambda. Too much code…

Regarding names, sure. It was a fragment of code for analyzing DataFrame column names, hence names.

And regarding the error message, it is really hard to tell that (::getfield(Main, Symbol("##143#144"))) indicates a wrong number of arguments…


#4

You can create an anonymous function that does the tuple destructuring if you want; it just requires an extra trailing comma in the argument list:

julia> f = ((index, item),) -> println("index: $index, item: $item")
#9 (generic function with 1 method)

julia> map(f, enumerate(["a", "b", "c"]))
index: 1, item: a
index: 2, item: b
index: 3, item: c

You need the comma to differentiate ((a, b),) -> ..., which is a function taking a single argument (a tuple), from (a, b) -> ..., which is a function taking two arguments, as mentioned above.
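Applied to the original question, the trailing-comma form can be passed straight to filter, and the indices pulled out afterwards. A minimal sketch with illustrative names:

```julia
strs = ["aa", "ab", "ca"]

# ((i, name),) -> ... takes ONE argument and destructures it into i and name
matches = filter(((i, name),) -> startswith(name, "a"), collect(enumerate(strs)))

# keep only the index part of each matching tuple
idx = map(first, matches)
println(idx)  # prints [1, 2]
```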


#5

Yes, I understand this message might be confusing. Here is how to read it:

no method matching (::getfield(Main, Symbol("##143#144")))(::Tuple{Int64,Symbol})
#                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^  ^^^^^^^^^^^^^^^^^^^^^
#                       this part is the "anonymous"         this is the argument
#                       function's "name": not               type that Julia was
#                       interesting                          expecting: a tuple
Closest candidates are:
# This is what Julia found:
  #143(::Any, ::Any) at none:1
# ^^^^  ^^^^^^^^^^^
# name   two args
#        instead of a tuple

But I agree that it is hard to read…


#6

That’s a really nice annotation. I bet it would be possible to modify Julia to produce something like that by default. The Rust compiler does a great job of using a little bit of annotation and ASCII art to make its error messages much friendlier, and I bet we could do the same.


#7

Thank you for the answers.
((index, item),) -> ... is much better than tuple -> let (index, item) = tuple; ...

It is still confusing syntax compared with my experience in other languages.

It would be nice to simply distinguish between:
tuple -> something(tuple[2]) ...
and
(index, item) -> ...
In either case, the two are visually distinguishable by the parentheses.

And syntax like ((index, item)) -> ... or tuple(index, item) -> ... would already be clearer than ((index, item),) -> ... with its additional comma… But OK. Maybe it is just because I misread the error messages…

ERROR: MethodError: no method matching (::getfield(Main, Symbol("##143#144")))(::Tuple{Int64,Symbol})
Closest candidates are:
#143(::Any, ::Any) at none:1

I get it now. So Julia tried to find a method taking a (::Tuple{Int64,Symbol}) argument. The next message, Closest candidates: #143(::Any, ::Any), is really written from Julia’s point of view, not mine… because that is the function I provided, and the candidates are selected by Julia, not by me…

So here a message like Expected (::Tuple{Int64,Symbol}) but found arguments (::Any, ::Any) would be better. I have no idea how hard that would be to fix in the compiler, but the signs for detecting this case are an internal function name with the # prefix and a single candidate method.


#8

In that case, it is probably best to avoid tuple as a variable name, too, since it shadows Base.tuple. I guess it does not cause any problems in practice, because it goes out of scope right after the function body, but it’s quite confusing to read.


#9

Are you particularly interested in the tuple issue, or are you mainly looking for a way to find the indices? In the latter case, you can forget all about the filter and the enumerate and the tuples, and just use findall.


#10

Yes, thanks. findall is really what I was looking for.

findall(name -> startswith(name, "a"),  ["aa", "ab", "ca"])

looks much simpler.

But the topic of tuple-destructuring syntax and diagnosing these errors is really non-obvious, so the explanations may be useful for other cases too.
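findall also accepts a Bool array and returns the indices of its true entries, so broadcasting startswith over the vector is another way to write the same query. A minimal sketch; idx1 and idx2 are illustrative names:

```julia
strs = ["aa", "ab", "ca"]

# predicate form: findall calls the function on each element
idx1 = findall(name -> startswith(name, "a"), strs)

# mask form: startswith. broadcasts over strs (the "a" is treated as a scalar),
# producing a Bool vector whose true positions findall returns
idx2 = findall(startswith.(strs, "a"))

println(idx1 == idx2)  # prints true
```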


#11

If your matching string is just a single character, this is actually significantly faster, especially for long arrays of strings:

findall(name -> startswith(name, 'a'), ["aa", "ab", "ca"])

That is, use a char, 'a', instead of a string, "a".
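Both forms select the same elements; the Char version only changes which startswith method is called. A quick equivalence check (no timing is shown here, since the speedup depends on the machine and data):

```julia
strs = ["aa", "ab", "ca"]
with_string = findall(name -> startswith(name, "a"), strs)  # String prefix
with_char   = findall(name -> startswith(name, 'a'), strs)  # Char prefix
println(with_string == with_char)  # prints true
```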


#12

If your matching string is just a single character,

Thanks. In my actual case it looks like:

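using DataFrames

# return the positions of the columns whose names start with prefix
function get_indices_for_prefix(df::DataFrame, prefix::String)
    # names(df) may return Symbols or Strings depending on the DataFrames version; string() handles both
    findall(name -> startswith(string(name), prefix), names(df))
end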
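To try the same logic without installing DataFrames, it can be applied to a plain vector of column names. A sketch; indices_for_prefix and the sample column names are hypothetical, not part of the original code:

```julia
# hypothetical DataFrames-free variant: works on any vector of Symbols or Strings
function indices_for_prefix(colnames::AbstractVector, prefix::AbstractString)
    # string(name) handles both Symbol and String column names
    findall(name -> startswith(string(name), prefix), colnames)
end

cols = [:age_min, :age_max, :height]       # made-up column names
println(indices_for_prefix(cols, "age"))   # prints [1, 2]
```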