Findfirst and eachline

The following one-liner returns the line number at which the word “Stack” first appears in a text file:

findfirst(r -> contains(r, "Stack"), readlines(file))

However, the following does not, perhaps surprisingly:

findfirst(r -> contains(r, "Stack"), eachline(file))

Instead, it throws:

ERROR: MethodError: no method matching keys(::Base.EachLine{IOStream})
The function `keys` exists, but no method is defined for this combination of argument types.

Any clues?

findfirst relies on keys (it returns the key of the first match), but the iterator returned by eachline doesn’t define keys. You can probably collect it first.

collect(eachline(file)) gives the same result as readlines(file), so that gains nothing; it would be more interesting not to have to read all the lines.
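
To make the two points above concrete, here is a minimal sketch. The temporary file and its contents are made up for illustration, and the error branch assumes a Julia version that behaves like the one in the original post:

```julia
# Hypothetical sample file so the example is self-contained.
path, io = mktemp()
write(io, "Queue\nHeap\nStack\nStack again\n")
close(io)

# The lazy iterator: findfirst asks for keys/pairs, which EachLine does not define.
err = try
    findfirst(contains("Stack"), eachline(path))
catch e
    e   # a MethodError, per the message quoted in the question
end

# Collecting first works, at the cost of reading every line into memory.
idx = findfirst(contains("Stack"), collect(eachline(path)))
```

With this sample file, idx is 3, since the third line is the first one containing “Stack”.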

FWIW, the following options avoid collect() (the loop is meant as the body of a function, since it uses return):

for (i, r) in enumerate(eachline(file))
   contains(r, "Stack") && return i
end

or:

first(first(Iterators.filter(x -> contains(x[2], "Stack"), enumerate(eachline(file)))))

but neither is as simple or nice as the syntax wished for.

The latter can be rewritten in a nicer way with a comprehension:

first(ind for (ind, str) in enumerate(eachline(file)) if contains(str, "Stack"))

I think this is a good example of where traits or interfaces would make things nicer. There could be separate implementations of findfirst for simple iterable sequences and ones that are maps of key=>value pairs (e.g. IsIterable and IsMap).

It’s not clear what findfirst should even mean for non-indexable collections, in general. For example, what should it do for an unordered collection where iteration order is arbitrary? Like:

findfirst(iszero, Set(0:10))

I think it is better in such cases for the caller to say explicitly what they want, e.g. have an Iterators.Enumerable(itr) wrapper that defines pairs for any collection in terms of enumeration, like:

struct Enumerable{I}
    itr::I
end
Base.pairs(e::Enumerable) = enumerate(e.itr)

# plus other methods to make iterate(e::Enumerable) --> iterate(e.itr), etcetera

in which case you can do

julia> findfirst(iszero, Enumerable(Set(0:10)))
5
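
For what it’s worth, one way to fill in the “other methods” might look like the sketch below. Enumerable is the hypothetical wrapper proposed above, not something that exists in Base, and the forwarding methods are my assumption about what a minimal version would need:

```julia
# Hypothetical wrapper: iterates like the wrapped iterator,
# but exposes pairs as (position, element) via enumerate.
struct Enumerable{I}
    itr::I
end

Base.pairs(e::Enumerable) = enumerate(e.itr)

# Forward iteration to the wrapped iterator.
Base.iterate(e::Enumerable, state...) = iterate(e.itr, state...)
Base.eltype(::Type{Enumerable{I}}) where {I} = eltype(I)

# Conservative choice: never promise a length, so lazy sources like eachline work.
Base.IteratorSize(::Type{<:Enumerable}) = Base.SizeUnknown()

# Made-up sample file to try it on the original problem.
path, io = mktemp()
write(io, "Queue\nHeap\nStack\nStack again\n")
close(io)

lineno = findfirst(contains("Stack"), Enumerable(eachline(path)))
```

Here lineno is 3 for the sample file, and the file is read lazily, so lines after the first match are not read into the result. The caller opts in to “position in iteration order” explicitly by writing Enumerable(...).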

The orderedness question is not specific to non-indexable collections. Maps can be unordered too, such as Dict. Currently findfirst doesn’t care whether the order of the keys has any significance; it just returns the first match when iterating over the keys. Unless there is motivation to change that, the same behavior could be extended to non-indexable collections.

The invariant we have right now is

k = findfirst(predicate, A)
@assert predicate(A[k])

That invariant is lost for non-indexable A.
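
A small illustration of that invariant with two indexable collections (the sample values are made up):

```julia
# Array: the returned key is an integer index into A.
A = [3, 0, 5, 0]
k = findfirst(iszero, A)
@assert iszero(A[k])

# Dict: the returned key is a dictionary key, and indexing with it still works.
D = Dict(:a => 1, :b => 0)
kd = findfirst(iszero, D)
@assert iszero(D[kd])
```

For something like eachline(file) there is no A[k] to check against, which is the point being made here.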

Admittedly, since findfirst returns an index, it perhaps doesn’t make a lot of sense to use it on a non-indexable collection.

Sometimes it’d be more convenient to have a function that returns the first element that satisfies the predicate (instead of its index). In that case, indexability would not be a requirement.
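
A minimal sketch of such a function, assuming we want nothing when there is no match (the name firstmatch is hypothetical, not a Base function):

```julia
# Return the first element of itr for which pred is true, or nothing.
# Works for any iterable, indexable or not, and stops at the first match.
function firstmatch(pred, itr)
    for x in itr
        pred(x) && return x
    end
    return nothing
end

# Made-up sample file for illustration.
path, io = mktemp()
write(io, "Queue\nHeap\nStack overflow\nStack again\n")
close(io)

hit  = firstmatch(contains("Stack"), eachline(path))
miss = firstmatch(contains("Deque"), eachline(path))
```

With the sample file, hit is "Stack overflow" and miss is nothing.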

Maybe Iterators.filter + first does what you want?

julia> k = first(Iterators.filter(x->startswith(x,"git restore"), eachline(".bash_history")))
"git restore Presentations/"

According to the docs there should be no unnecessary allocations, but I have not verified this.

@HSt, note that the code shown returns the line, not its number.

With first(Iterators.filter(...)), you can just throw an enumerate at the eachline:

julia> first(Iterators.filter(((_,x),)->contains(x,"git restore"), enumerate(eachline("/Users/mbauman/.zsh_history"))))
(1289, ": 1691431753:0;git restore --staged base/abstractarray.jl")

(now you know what I was doing at 2pm on August 7, 2023)

That looks similar to the post a bit further above. :slight_smile:
Note that one additional first() is required to extract the line number.
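
As an alternative to the extra first(), the (number, line) tuple can be destructured. A small sketch, with a made-up history file standing in for the real one:

```julia
# Hypothetical stand-in for a shell history file.
path, io = mktemp()
write(io, "ls\ngit status\ngit restore --staged file.jl\n")
close(io)

# Lazily keep only the (number, line) pairs whose line matches.
matches = Iterators.filter(((_, line),) -> contains(line, "git restore"),
                           enumerate(eachline(path)))

# Destructure the first match; first throws an ArgumentError if nothing matches.
lineno, line = first(matches)
```

For the sample file this gives lineno == 3 together with the matching line itself.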

Simplicity here is largely a matter of personal taste. Like the loop in the 4th post, but without enumerate, this is IMHO readable and reusable:

function firstline(pred, file)
    i = 0
    for r in eachline(file)
        i += 1
        pred(r) && return i
    end
    # not found
    nothing
end

Here is the one-liner :face_with_open_eyes_and_hand_over_mouth:

julia> firstline(startswith("git restore"), ".bash_history")
34
