Channel iterators rely on `done` being called before `next`


#1

I am trying to use product() from IterTools with several channel iterators. A simple example:

using IterTools

mkfunc(lst) =
    function(c)
        for elem in lst
            println("producing val $elem")
            put!(c, elem)
            println("returned from put for $elem")
        end
    end

channel1 = Channel(mkfunc([1, 2]))
channel2 = Channel(mkfunc([3, 4]))

for (v1, v2) in product(channel1, channel2)
    println("$v1, $v2")
end

This code fails with

ERROR: LoadError: UndefRefError: access to undefined reference

in the middle of the iteration.

The reason, as far as I understand it, is that the Channel's next() relies on done() (and, consequently, take!()) having been called first to populate the state's val field. A for loop normally does exactly that. But product() calls next() right after start(), so the state's value is still undefined.
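To make the suspected mechanism concrete, here is a minimal toy sketch of an iterator whose "done" fetches the value and whose "next" only reads it. All names (ToySource, ToyState, toy_start, toy_done, toy_next, drain, next_right_after_start) are hypothetical and made up for illustration; this is not the actual Base.Channel code, only a model of the behavior described above.

```julia
# Toy model of a "fetch in done, read in next" iterator.
# Every name below is hypothetical; this is not the Base.Channel source.

struct ToySource
    items::Vector{Int}
end

mutable struct ToyState
    pos::Int
    hasval::Bool
    val::Any
    ToyState() = new(0, false)  # val is deliberately left undefined
end

toy_start(src::ToySource) = ToyState()

# "done" has a side effect: it fetches the next value into the state.
function toy_done(src::ToySource, st::ToyState)
    st.hasval && return false
    st.pos >= length(src.items) && return true
    st.pos += 1
    st.val = src.items[st.pos]
    st.hasval = true
    return false
end

# "next" only reads what "done" stored; it never fetches by itself.
function toy_next(src::ToySource, st::ToyState)
    v = st.val          # throws UndefRefError if done was never called
    st.hasval = false
    return v, st
end

# The order a for loop uses: done first, then next.
function drain(src::ToySource)
    st = toy_start(src)
    out = Int[]
    while !toy_done(src, st)
        v, st = toy_next(src, st)
        push!(out, v)
    end
    return out
end

# The order I suspect product() uses somewhere: next right after start.
function next_right_after_start(src::ToySource)
    st = toy_start(src)
    try
        toy_next(src, st)
        return :ok
    catch err
        return err isa UndefRefError ? :undef_ref : :other
    end
end
```

With this model, drain(ToySource([1, 2])) walks through both elements, while next_right_after_start(ToySource([1, 2])) hits the undefined field, which is the same kind of error as in my example.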

If my reasoning is correct, who is at fault here?

  1. The Channel's next() implementation relying on the done() call?
  2. IterTools's product() implementation not calling done()?
  3. The iteration protocol documentation not specifying that done() should be called before next()?

#2

Maybe of interest:
https://github.com/JuliaLang/julia/issues/18823


#3

Which version of Julia are you trying to do this in? I am not sure about your exact problem, but I’ll just note that in 0.6 the Channel iterators have had many changes to make them more resilient compared to 0.5.


#4

v0.6. My link in the opening post (Channel's next()) points to the master branch, and what I see there does not seem too resilient to me. next() can fail even if there is something in the channel to take, and both next() and done() very unexpectedly mutate the iterator state.


#5

Thank you. I guess the takeaway from that thread is that the problem is known, and it may be solved in v1.0. Although I would argue that, perhaps, it is worth fixing the Channel iterator before that (or would it result in a noticeable performance regression?).


#6

AFAICT product calls done. It’s not correct to call next without checking done first.

I guess the issue is different: maybe it’s that you can’t iterate repeatedly over a Channel, which is what product is doing? Anyway it looks like Channel should print a more explicit error.
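A quick way to check the single-pass behavior is the sketch below. It assumes only Channel(func), put!, and collect; the variable names are arbitrary. As far as I can tell, the second pass sees an already closed, empty channel.

```julia
# Build a channel fed by a task that puts two values and then finishes,
# which closes the channel.
ch = Channel(c -> foreach(x -> put!(c, x), [1, 2]))

first_pass = collect(ch)    # consumes everything the task produced
second_pass = collect(ch)   # the channel is closed by now, nothing is left
```

So anything that needs to restart the inner iterator, as product has to, cannot get the values back from a plain Channel.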


#7

fair enough, thanks for clarifying.


#8

It does in the line you quoted, but a few lines down there’s start() immediately followed by next(). That’s where it fails.

Is it the official position? Perhaps it is worth stating it explicitly in the docs. I personally expected next() to fail only if there is nothing to take from the iterator, and done() not to affect the iterator state at all, regardless of its result. It seems to me that the author of IterTools assumed that the provided iterators are deterministic, so if they returned a nonzero number of values the first time they were used, there would be at least one value to take every time an iteration starts.

Yes, you are right, a basic Channel cannot be iterated over several times (although it still does not justify the “undefined reference” error). What I have in my code is a thin wrapper that recreates a Channel every time start() is called. This can be iterated over several times, but product() still fails, for the reasons mentioned in the opening post:

using IterTools

mkfunc(lst) =
    function(c)
        for elem in lst
            println("producing val $elem")
            put!(c, elem)
            println("returned from put for $elem")
        end
    end

struct MyChannel
    func
end

function Base.start(mc::MyChannel)
    c = Channel(mc.func)
    c_state = start(c)
    c, c_state
end

function Base.next(mc::MyChannel, state)
    c, c_state = state
    val, new_state = next(c, c_state)
    val, (c, new_state)
end

function Base.done(mc::MyChannel, state)
    c, c_state = state
    done(c, c_state)
end

channel1 = MyChannel(mkfunc([1, 2]))
channel2 = MyChannel(mkfunc([3, 4]))

for v in channel1
    println(v)
end
for v in channel1
    println(v)
end
for (v1, v2) in product(channel1, channel2)
    println("$v1, $v2")
end

#9

I see. I’m not sure there’s an official position on that, but we should certainly have a clear policy about it. If nobody else comments, it would be worth filing an issue in GitHub.