Filter by array of tuples

I found some odd behavior when unpacking tuples into lambda arguments. I need to get the indices of the elements that match a rule.

This code works fine:

names = ["aa", "ab", "ca"]
filter((i) -> startswith(i[2], "a"), collect(enumerate(names)))

Here I use a single variable for the tuple and then take the value by index, i[2], which does not read well.

But the code:

names = ["aa", "ab", "ca"]
filter((i, name) -> startswith(name, "a"), collect(enumerate(names)))

doesn’t work. I get this error:

ERROR: MethodError: no method matching (::getfield(Main, Symbol("##143#144")))(::Tuple{Int64,Symbol})
Closest candidates are:
  #143(::Any, ::Any) at none:1
Stacktrace:
 [1] mapfilter(::getfield(Main, Symbol("##143#144")), ::typeof(push!), ::Array{Tuple{Int64,Symbol},1}, ::Array{Tuple{Int64,Symbol},1}) at ./abstractset.jl:336
 [2] filter(::Function, ::Array{Tuple{Int64,Symbol},1}) at ./array.jl:2352
 [3] top-level scope at none:0

At the same time, the following code works fine:

names = ["aa", "ab", "ca"]
@show arr = collect(enumerate(names))
(i, name) = arr[1]
@show i
@show name

I’m getting:

arr = collect(enumerate(names)) = Tuple{Int64,String}[(1, "aa"), (2, "ab"), (3, "ca")]
i = 1
name = "aa"

Is it a bug, or am I doing something wrong?
Julia 1.0.3

As you probably know, the syntax

i -> startswith(i[2], "a")

creates an anonymous function similar to

function __anonymous__(i)
      startswith(i[2], "a")
end

Now the syntax

(i, name) -> startswith(name, "a")

creates an anonymous function taking two arguments (as opposed to a function taking one argument which is a tuple of two values, as in the first, working, case):

function __anonymous__(i, name)
      startswith(name, "a")
end

Hence the error message telling you that the correct method is not found: since enumerate produces a collection of Tuple{Int64,String}, this is what your function has to take as argument.
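
To see the difference for yourself, here is a minimal sketch using applicable from Base, which reports whether a function has a method for the given arguments. The names one_arg, two_args and item are only for illustration:

```julia
# One argument (expected to be a tuple) versus two separate arguments.
one_arg = t -> startswith(t[2], "a")
two_args = (i, name) -> startswith(name, "a")

item = (1, "aa")  # same shape as the elements produced by enumerate

# applicable(f, args...) reports whether f has a method for these arguments.
@show applicable(one_arg, item)      # called with one tuple
@show applicable(two_args, item)     # called with one tuple
@show applicable(two_args, item...)  # tuple splatted into two arguments
```

Only the second call should report false: filter passes each element as a single tuple, which the two-argument function cannot accept.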


So if you want to “deconstruct” the tuple, you have to do it inside the function (at least I don’t know of any other way):

julia> filter(tuple-> let (i, name)=tuple; startswith(name, "a") end,
              collect(enumerate(["aa", "ab", "ca"])))
2-element Array{Tuple{Int64,String},1}:
 (1, "aa")
 (2, "ab")

BTW, using names as the name of your test variable is probably not a good idea, as it collides with Base.names.

OK, thanks. Explicit assignment is a somewhat non-obvious way to unpack the arguments of a lambda. Too much code…

Regarding names, sure. It was a fragment of code for analyzing DataFrame column names, which is why it was called names.

And regarding the error message, it is really hard to work out that (::getfield(Main, Symbol("##143#144"))) means the wrong number of arguments…

You can create an anonymous function that does the tuple destructuring if you want; it just requires an extra trailing comma in the arguments:

julia> f = ((index, item),) -> println("index: $index, item: $item")
#9 (generic function with 1 method)

julia> map(f, enumerate(["a", "b", "c"]))
index: 1, item: a
index: 2, item: b
index: 3, item: c

You need the comma to differentiate ((a, b),) -> ..., which is a function taking a single argument (a tuple) from (a, b) -> ..., which is a function taking two arguments, as mentioned above.
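
Applied to the original question, this could look roughly like the sketch below. The variable is called words here (my choice, to avoid the Base.names collision mentioned earlier), and first.(…) is just one way to keep only the indices:

```julia
words = ["aa", "ab", "ca"]  # renamed from `names` to avoid colliding with Base.names

# The trailing comma makes this a one-argument function that unpacks a tuple.
matches = filter(((i, word),) -> startswith(word, "a"), collect(enumerate(words)))

# Keep only the indices, which is what the original question asked for.
indices = first.(matches)
@show matches
@show indices
```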

Yes, I understand this message might be confusing. Here is how to read it:

no method matching (::getfield(Main, Symbol("##143#144")))(::Tuple{Int64,Symbol})
#                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^  ^^^^^^^^^^^^^^^^^^^^^
#                       this part is the "anonymous"         this is the argument
#                       function's "name": not               type that Julia was
#                       interesting                          expecting: a tuple
Closest candidates are:
# This is what Julia found:
  #143(::Any, ::Any) at none:1
# ^^^^  ^^^^^^^^^^^
# name   two args
#        instead of a tuple

But I agree that it is hard to read…

That’s a really nice annotation. I bet it would be possible to modify Julia to produce something like that by default. The Rust compiler does a great job of using a little bit of annotation and ASCII art to make its error messages much friendlier, and I bet we could do the same.

Thank you for the answers.
((index, item),) -> ... is much better than tuple -> let (index, item) = tuple; ...

The syntax is still confusing compared with what I know from other languages.

It would be nice to only have to distinguish between:
tuple -> something(tuple[2]) ...
and
(index, item) -> ...
since the two are already visually distinguishable by the parentheses.

Even a syntax like ((index, item)) -> ... or tuple(index, item) -> ... would be clearer than ((index, item),) -> ... with its additional comma. But OK. Maybe it is just because I misread the error message…

ERROR: MethodError: no method matching (::getfield(Main, Symbol("##143#144")))(::Tuple{Int64,Symbol})
Closest candidates are:
#143(::Any, ::Any) at none:1

I got it. So Julia tried to find a method taking a (::Tuple{Int64,Symbol}) argument. And the following message about Closest candidates: #143(::Any, ::Any) makes sense from Julia's point of view but not from mine, because that is the function I provided myself. Candidates are something Julia should pick, not me…

So here a message like Expected (::Tuple{Int64,Symbol}) but found arguments (::Any, ::Any) would be better. I have no idea how hard that would be to fix in the compiler, but the signs to detect this case are the internal function name with the # prefix and there being only one candidate method.
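
In the meantime, the same information can be pulled out of the exception by hand. This is only a rough sketch: describe is a hypothetical helper of mine, not anything Julia provides, and it relies on the f and args fields of MethodError:

```julia
f = (i, name) -> startswith(name, "a")

# Hypothetical helper: summarize a MethodError in terms of what was passed.
function describe(err::MethodError)
    given = length(err.args)  # number of arguments in the failing call
    "called with $given argument(s) of type $(typeof(err.args)); " *
    "number of defined methods: $(length(methods(err.f)))"
end

msg = try
    f((1, "aa"))  # one tuple argument, but f takes two arguments
catch err
    err isa MethodError ? describe(err) : rethrow()
end
println(msg)
```

It does not say "expected two arguments" outright, but seeing that the call had a single argument next to a single two-argument candidate makes the mismatch easier to spot.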

In that case, it is probably best to avoid tuple as a variable name, too. I guess it does not cause any problems in practice, when it goes out of scope, but it’s quite confusing to read.

Are you particularly interested in the tuple issue, or are you mainly looking for a way to find the indices? In the latter case, you can forget all about the filter and the enumerate and the tuples, and just use findall.

yes, thanks. I was really looking for findall.

findall(name -> startswith(name, "a"),  ["aa", "ab", "ca"])

looks much simpler.

But the tuple syntax and how to read the resulting errors are really not obvious, so the explanations here might be useful in other situations.

If your matching string is just a single character, this is actually significantly faster, especially for long arrays of strings:

findall(name -> startswith(name, 'a'), ["aa", "ab", "ca"])

That is, use a char, 'a', instead of a string, "a".
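
A quick sanity check that both forms select the same elements (a small sketch only; it does not measure the speed difference, and words is just an illustrative name):

```julia
words = ["aa", "ab", "ca"]

by_string = findall(w -> startswith(w, "a"), words)  # String prefix
by_char = findall(w -> startswith(w, 'a'), words)    # Char prefix

@show by_string
@show by_char
```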


Thanks. In my actual case it looks like:

function get_indices_for_prefix(df::DataFrame, prefix::String)
    findall(name -> startswith(string(name), prefix), names(df))
end
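
For anyone who wants to try the same idea without loading DataFrames, here is a sketch that works on a plain collection of column names. indices_for_prefix is a hypothetical name, and I am assuming the names may be either Symbols (as in the error message above) or Strings, which is why string(name) is kept:

```julia
# Hypothetical helper: works on any collection of column names.
function indices_for_prefix(colnames, prefix::AbstractString)
    # string(name) makes Symbol and String names behave the same way
    findall(name -> startswith(string(name), prefix), colnames)
end

from_symbols = indices_for_prefix([:aa, :ab, :ca], "a")
from_strings = indices_for_prefix(["aa", "ab", "ca"], "a")
@show from_symbols
@show from_strings
```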