New package: Collects.jl - meant to improve upon and generalize the interface of collect

Just requested the registration of the new package, Collects.jl, meaning it should be registered after three days pass.

Basically it’s meant to replace collect, especially when the desired output container type is not Array, as the new package’s interface takes the container type as a method argument. Another generalization with respect to collect is that the user now has control over how empty iterators are handled. In particular, type inference is not used by default, which is friendlier to the compiler’s optimizer and to the correctness of the package ecosystem. The default behavior is to throw when the element type can’t be deduced from any of eltype, the elements of the iterator, or the output type argument.
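
To make the idea concrete without depending on the package, here is a rough Base-only sketch of the two behaviors described above: the container type comes first, and an empty iterator throws when nothing pins down the element type. The name `toy_collect_as` is made up for illustration and this is not how Collects.jl is implemented. As a simplification, the toy ignores `eltype` altogether and only supports `Vector` outputs.

```julia
# Hypothetical sketch using only Base; `toy_collect_as` is a made-up name,
# not the Collects.jl implementation.

# Element type fixed by the output type: an empty iterator is unproblematic.
function toy_collect_as(::Type{Vector{T}}, iter) where {T}
    out = T[]
    for x in iter
        push!(out, x)  # `push!` converts each element to `T`
    end
    return out
end

# Element type left open: derive it from the elements, never from inference.
function toy_collect_as(::Type{Vector}, iter)
    elems = Any[]
    for x in iter
        push!(elems, x)
    end
    if isempty(elems)
        throw(ArgumentError("cannot determine the element type of an empty iterator"))
    end
    T = mapreduce(typeof, typejoin, elems)  # narrowest common supertype
    return Vector{T}(elems)
end
```

With this toy, `toy_collect_as(Vector{Float64}, 1:0)` returns an empty `Vector{Float64}`, while `toy_collect_as(Vector, (x for x in Int[]))` throws instead of asking the compiler to guess a type.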

The main exported names are Collect and collect_as. The former is meant for package authors implementing the Collects.jl interface, while the latter should be more convenient for users.

The collect_as function behaves similarly to collect, except that the first argument is the container type instead of the element type. Apart from the two positional arguments, collect_as also takes an optional keyword argument that controls how the element type of empty iterators is handled. See the Readme for more information.

Some notes:

  • Collects.jl implements its interface for (some) collection types from Base. Otherwise it’s an interface package, meaning that its interface is meant to be implemented in other packages, each for the types that it owns.

  • Unlike collect, Collects.jl does not use Base.IteratorEltype. Example where Base.IteratorEltype matters for collect:

    julia> i = Any[3, 3.0];
    
    julia> o = Iterators.map(identity, i);
    
    julia> eltype(collect(i))
    Any
    
    julia> eltype(collect(o))
    Real
    

    Collects.jl basically assumes Base.IteratorEltype(iterator) === Base.EltypeUnknown(). My rationale for this choice is that someone who wants Base.HasEltype() can easily do collect_as(Vector{eltype(iter)}, iter):

    julia> eltype(collect_as(Vector, i))
    Real
    
    julia> eltype(collect_as(Vector, o))
    Real
    
    julia> eltype(collect_as(Vector{eltype(i)}, i))
    Any
    
    julia> eltype(collect_as(Vector{eltype(o)}, o))
    Any
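
For anyone wondering why collect treats i and o differently in the first place, the traits can be queried directly. This only uses Base and restates the two iterators from the example above:

```julia
i = Any[3, 3.0]
o = Iterators.map(identity, i)

# An Array advertises its element type, a generator does not.
trait_i = Base.IteratorEltype(i)  # Base.HasEltype()
trait_o = Base.IteratorEltype(o)  # Base.EltypeUnknown()

# `eltype` itself falls back to Any for the generator.
eltypes = (eltype(i), eltype(o))  # (Any, Any)
```

So collect trusts eltype for i, and for o it looks at the actual elements and widens as it goes, which is how Real comes out.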
    
32 Likes

I know this is going to be against the stream of “everything should be externalized”, but it feels like a good function to be located in Base. I’m quite sure I would use this function if it were in Base, but am equally sure that I will not add Collects.jl as a requirement to any of my packages. There is just too much overhead for a single and simple function.

4 Likes

Sorry for the last minute bikeshedding, but I think fromiter would be a good name for this. It reads a little more naturally with the argument order, i.e. fromiter(Set, x) is like

Set `fromiter` x

which is similar to how contains(haystack, needle) can be read as

haystack `contains` needle

and occursin(needle, haystack) can be read as

needle `occursin` haystack

There is also some precedent for fromiter or from_iter in other languages, e.g.,

1 Like

Opening a poll for voting on alternative names, but I’m not willing to give up collect in the name, unless there’s a really good alternative. EDIT: uhh the package just got registered. I’ll make a v2 ASAP if necessary.

Sure, but there’s even more precedent for collect, I think.

Poll

What should the name of the user-facing function be?
  • collect_as
  • collectas
  • collect_to
  • collectto
  • collect_into
  • collectinto
  • collect_from
  • collectfrom
0 voters

collect already is in Base, though. If a PR is going to be made against JuliaLang/julia to add a replacement for collect, the replacement had better be really good, otherwise there’s going to be yet another replacement added to Base two years later. And the only way to be sure is to test it out in the package ecosystem.

I’m sure you’re underestimating the choices that went into both the design of the interface and into the implementation.

Case in point: even though collect_as existed for some time as a function in FixedSizeArrays.jl, there have been significant changes after creating this standalone package, and even after starting the registration process.

Furthermore, what you’re saying doesn’t really make sense: this is an interface package, meant to have its interface implemented by an arbitrary number of other packages. So you can’t just reimplement it or vendor it locally.

Regarding the poll, is there someone able to see the results? I just get this meaningless attempt at a visualization:

The poll should be configured so that each vote is public, but I don’t know whether that information is actually available anywhere.

Seems like the ranked choice voting is supposed to go in rounds and in the end have quite different visuals: the third image in this original PR and a bunch of images in this thread show it. Not sure why this one displays the results in this useless way. Perhaps worth creating a thread in that meta.discourse website after getting info about the Discourse version being used here from the admins/mods.

1 Like