# Implement iterator over subsets of array

Suppose I have an array like

``````x = [ 1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4 ]
``````

I want to loop over the runs of equal numbers, using a syntax like:

``````for set in eachset(x)
    ...
end
``````

Where `eachset(x)` should behave as an array of sets.

If I define

``````eachset(x) = [ findall(isequal(i),x) for i in unique(x) ]
``````

I get the indices of the elements of each set, and I could use that (although I really only need the ranges).

But I understand that I should not need to allocate that array just to iterate over its elements. What do I have to implement to get an iterator instead of an array?


Is `x` always sorted? If so, you only need to implement `iterate`. It is quite simple; see this tutorial, for example: Writing Iterators in Julia 0.7
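To make that concrete, here is a minimal sketch of what implementing `iterate` could look like for the sorted array in the question. The wrapper type `EachSet` is a name made up for this illustration, and the sketch assumes that equal values are contiguous in `x`. It yields index ranges rather than index vectors.

```julia
# Hypothetical wrapper type (the name `EachSet` is an assumption, not an existing API).
struct EachSet{T<:AbstractVector}
    x::T
end

eachset(x) = EachSet(x)

# The state is the index at which the next run starts.
function Base.iterate(it::EachSet, start = firstindex(it.x))
    start > lastindex(it.x) && return nothing
    stop = start
    # Extend the run while the next element equals the first one of the run.
    while stop < lastindex(it.x) && it.x[stop + 1] == it.x[start]
        stop += 1
    end
    return (start:stop, stop + 1)
end

# The number of runs is unknown without scanning, so declare that explicitly;
# this is what lets `collect` work without a `length` method.
Base.IteratorSize(::Type{<:EachSet}) = Base.SizeUnknown()
Base.eltype(::Type{<:EachSet}) = UnitRange{Int}

x = [1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4]
for set in eachset(x)
    println(set, " => ", x[set])
end
```

Only `iterate` is needed for the `for` loop itself; the two trait methods are optional extras for `collect` and friends.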


It is. It is slightly more complicated than that, because the vector is not simply a vector of numbers but a vector of structs, each containing the counter. The sets are still contiguous in the original vector, though.

Thanks for the link. If anything changed from 0.7 please let me know.

No changes that I know of. It works the same on 1.7; at least my last iterator did.


Since the topic title says `implement`, I forgot that there are other options (such as using an existing package). Maybe this one will be useful: Introduction · IterTools


You can use a generator instead of an array comprehension to avoid allocating the outer array:

``````( findall(isequal(i),x) for i in unique(x) )
``````

instead of

``````[ findall(isequal(i),x) for i in unique(x) ]
``````

but that's still quite inefficient compared to a hand-crafted iterator…
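Since only the ranges are needed and `x` is sorted, the generator idea can also be sketched so that it yields index ranges rather than index vectors. The helper name `eachset_lazy` is made up for this illustration. It still calls `unique(x)` (which allocates) and rescans `x` for every distinct value, so it shares the inefficiency mentioned above.

```julia
# Hypothetical helper: lazily yields the index range of each distinct value.
# Assumes `x` is sorted, so that equal values are contiguous.
eachset_lazy(x) = (findfirst(isequal(v), x):findlast(isequal(v), x) for v in unique(x))

x = [1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4]
for r in eachset_lazy(x)
    println(r, " => ", x[r])
end
```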

Maybe if I provide some more information on the problem it gets clearer:

I have a `struct` named `Atom`, which contains the information about the atoms of my system. To simplify, let us suppose that it has 2 fields, the `name` and the molecule to which it belongs, i.e.:

``````struct Atom
    name::String
    molecule::Int
end
``````

Now I have a vector of "atoms", for example with 2 water molecules:

``````julia> atoms = [ Atom("O",1), Atom("H",1), Atom("H",1),
                Atom("O",2), Atom("H",2), Atom("H",2) ]
6-element Vector{Atom}:
 Atom("O", 1)
 Atom("H", 1)
 Atom("H", 1)
 Atom("O", 2)
 Atom("H", 2)
 Atom("H", 2)

``````

The atoms of each molecule are always contiguous in the vector, and each molecule is identified by its `molecule` field (the molecule numbers are not necessarily consecutive, but they are unique for each molecule). What I want is to iterate over the molecules, with:

``````for molecule in eachmolecule(atoms)
    ...
end
``````

So I have to implement the `eachmolecule` function that generates the iterator.

One thing I still have to decide is what a molecule should be. It may be a vector of atoms, a view of a vector of atoms, or another struct holding the range of atoms in the original vector (the option I am leaning towards).
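For illustration, the last two options could be sketched as follows. The `Molecule` type, its field names, and the `asview` helper are all hypothetical names chosen for this sketch, not part of any existing package.

```julia
struct Atom
    name::String
    molecule::Int
end

# Hypothetical lightweight molecule: a reference to the parent vector plus
# the range of indices that the molecule occupies in it. No atoms are copied.
struct Molecule
    atoms::Vector{Atom}
    range::UnitRange{Int}
end

Base.length(m::Molecule) = length(m.range)
Base.getindex(m::Molecule, i::Int) = m.atoms[m.range[i]]

# The "view" option: a SubArray over the same range.
asview(m::Molecule) = view(m.atoms, m.range)

atoms = [ Atom("O",1), Atom("H",1), Atom("H",1),
          Atom("O",2), Atom("H",2), Atom("H",2) ]

second = Molecule(atoms, 4:6)
println(length(second), " atoms, first is ", second[1].name)
```

The struct keeps the range accessible (useful if other arrays, such as coordinates, share the same indexing), while the view behaves like an ordinary `AbstractVector` of atoms.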

Working on it from the information provided. Thanks!

If you want to follow the example of `Iterators.partition`, it uses `SubArray`.

``````help?> Iterators.partition
  partition(collection, n)

  Iterate over a collection n elements at a time.

  Examples
  ≡≡≡≡≡≡≡≡≡≡

  julia> collect(Iterators.partition([1,2,3,4,5], 2))
  3-element Array{SubArray{Int64,1,Array{Int64,1},Tuple{UnitRange{Int64}},true},1}:
   [1, 2]
   [3, 4]
   [5]
``````

I think following the trail given by `@edit Iterators.partition([1, 2, 3, 4], 2)` may give you an example of how to implement the iterator you want, as `Iterators.partition` (or better, the `iterate` for `PartitionIterator`) is probably the code in `Base` that most closely resembles what you want.


``````struct Atom
    name::String
    molecule::Int
end

atoms = [ Atom("O",1), Atom("H",1), Atom("H",1),
          Atom("O",2), Atom("H",2), Atom("H",2) ]

# Thin wrapper so that `iterate` can dispatch on it
struct EachMolecule
    atoms::Vector{Atom}
end

eachmolecule(atoms) = EachMolecule(atoms)

# The state is the index of the first atom of the next molecule
function Base.iterate(em::EachMolecule, state = 1)
    r0 = state
    r0 > length(em.atoms) && return nothing
    m0 = em.atoms[r0].molecule
    r1 = r0
    # Advance until the molecule number changes or the vector ends
    while r1 <= length(em.atoms)
        em.atoms[r1].molecule != m0 && return (r0:r1 - 1, r1)
        r1 += 1
    end

    return (r0:r1 - 1, r1)
end
``````
``````julia> for molecule in eachmolecule(atoms)
           println(molecule)
       end
1:3
4:6
``````
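One caveat: the `for` loop only needs `iterate`, but `collect(eachmolecule(atoms))` also asks for the iterator's size, and by default Julia assumes a `length` method exists. A possible extension, sketched here as a suggestion rather than as part of the code above, is to declare the size as unknown and state the element type. The type and `iterate` method are repeated (with the loop slightly condensed) so that the block runs on its own.

```julia
struct Atom
    name::String
    molecule::Int
end

struct EachMolecule
    atoms::Vector{Atom}
end

eachmolecule(atoms) = EachMolecule(atoms)

# Same scan as above: the state is the index where the next molecule starts.
function Base.iterate(em::EachMolecule, state = 1)
    r0 = state
    r0 > length(em.atoms) && return nothing
    m0 = em.atoms[r0].molecule
    r1 = r0
    while r1 <= length(em.atoms) && em.atoms[r1].molecule == m0
        r1 += 1
    end
    return (r0:r1 - 1, r1)
end

# Suggested additions: the number of molecules is not known in advance,
# and every element is an index range.
Base.IteratorSize(::Type{EachMolecule}) = Base.SizeUnknown()
Base.eltype(::Type{EachMolecule}) = UnitRange{Int}

atoms = [ Atom("O",1), Atom("H",1), Atom("H",1),
          Atom("O",2), Atom("H",2), Atom("H",2) ]

ranges = collect(eachmolecule(atoms))
```

If the number of molecules were cheap to compute, defining `Base.length` instead would let `collect` preallocate its result.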

That was the part over which I was beating my brains out.

This is the second major contribution you have made to that package in terms of how things are done! I promise that when it becomes something useful you will be properly acknowledged. Thank you very much!
