How to leverage internals of `collect`

As part of on API I am working on, one of the higher-level functions collects results from an iterative algorithm. The results use user-defined functions, which are ideally type stable, but don’t have to be. Code could look like

"""
    get_result(state)

Return the result from state. Defined by the user.
"""
function get_result end

"""
    update_state(rng, state)

Return a new state updated randomly. Uses user-defined components.
"""
function update_state(rng, state) end

function collect_results(state, N, rng = Base.Random.GLOBAL_RNG)
    results = FIXME_initialize(...)
    for _ in 1:N
        FIXME_add!(results, get_result(state))
        state = update_state(rng, state)
    end
    results
end

Lines with FIXME is where I need help.

I could somehow cram this into a form for collect with closures, but I would prefer to keep the loop. Looking at the sources, the internals of collect have mechanisms for initializing a container with a type, then dealing with the possibility of getting something else instead. Is there a chance of using those? An example would help.

Ideally, my hope is that the compiler can sometimes figure out types for get_result, then it would be nice to get efficient code, but that’s a best case scenario I would not want to rely on, and want the right container type even if that does not work.

Could you provide more info on why you prefer the loop?

I prefer a loop because I have a state, and a somewhat complicated update mechanism for it (naturally, not shown in the MWE). As I said, I could shoehorn it into a collect with a closure, but that would not be the natural idiom. Loops are natural for many problems in Julia, now I just want to collect the result.

I suppose whether or not something is more natural is a bit subjective. I would argue that digging into the internals of collect is perhaps less natural.

How about the map do syntax? See stevengj’s suggestion for a similar problem.

2 Likes

Thanks, but my primary intention here is not to debate whether this is “natural”, but access this mechanism. It is of course possible that the interface is yet not ready for use outside Base, in which case I will just open an issue after thinking more about this.

As suggested by stevegj in that thread linked above, I’ve found the map-do construction pretty useful for this kind of situation. In your case, that might look like:

function collect_results(state, N, rng = Base.Random.GLOBAL_RNG)
    map(1:N) do _
        get_result(update_state(rng, state))
    end
end

which keeps the loop structure you’re looking for while also letting you simple omit both FIXMEs.

2 Likes

Thanks. Thinking about this, perhaps the only thing I cannot do with a map is break or return early. In this particular case I don’t need it though.

1 Like

Ah, yes, I agree, and I have wanted to be able to break inside a map as well.

I looked at Base.collect more carefully and came to the conclusion that what I want cannot be done in a way that fits well with the features that make Julia performant.

Specifically, Base.collect_to! is how one of the “worst case” scenarios is handled: when it finds that it can’t store an element because of type restrictions, it widens, copies, and calls itself recursively.

In order to achieve something similar in a loop with a mutable structure, I can imagine a

mutable struct Container
    elements::Vector # note the unspecified element type
end

and a variant of push! that tries pushing to .elements, and widens and copies if necessary. This way we keep the same Container, but its .element does not have a concrete type.

Or alternatively, I can envision a

container = store(container, new_element)

which may or may not return its first argument, if widening and copying if necessary. This runs into changing variable types, again problematic.

Anyhow, this is not a critique of Julia, just my notes on what I understood in connection with this question. I just have to organize my code differently.