A recent blog post highlighted the use of Base.Iterators.partition for correctly managing state in multithreaded code. I checked how this function behaves on Dict and Set and found that it copies the data into a Vector for each chunk (see trace below) instead of creating a lazy structure that iterates directly over the entries of the dictionary or set. I suppose that a lazy structure would be more performant for most applications, since it would avoid the allocations. In the future, I may also implement efficient Base.Iterators.partition methods for SortedDict, SortedSet, and SortedMultiDict in DataStructures.jl. Therefore, I am wondering:
- Why does Base copy the data for this operation on Dict and Set?
- Would it be a breaking change to reimplement Base.Iterators.partition lazily for Dict and Set instead of copying? For most uses, the change would be invisible, but in some odd cases, such as mutating the data structure while iterating over it, this change could break a user’s code.
julia> s = Set(1:9);
julia> u = Base.Iterators.partition(s,4);
julia> for i in u
           println(i, " ", typeof(i))
end
[5, 4, 6, 7] Vector{Int64}
[2, 9, 8, 3] Vector{Int64}
[1] Vector{Int64}
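
To make the question concrete, here is a rough sketch of what I mean by a lazy structure. The names LazyPartition and Chunk are hypothetical and not part of Base or DataStructures.jl. The sketch relies only on the generic iterate protocol, so each chunk stores the iteration state at which it starts rather than a copy of its elements. One known cost of this approach: the outer iterator walks the collection once to find chunk boundaries, and each chunk walks its own range again when consumed.

```julia
# Hypothetical types for illustration only; not Base's implementation.
struct LazyPartition{C}
    coll::C
    n::Int          # maximum chunk size
end

struct Chunk{C,T}
    coll::C
    head::T         # (element, state) tuple from iterate for the chunk's first element
    len::Int        # number of elements in this chunk
end

Base.length(p::LazyPartition) = cld(length(p.coll), p.n)
Base.length(c::Chunk) = c.len
Base.eltype(::Type{<:Chunk{C}}) where {C} = eltype(C)

# Outer iteration: the state is the (element, state) tuple that starts the
# next chunk, or nothing once the collection is exhausted.
function Base.iterate(p::LazyPartition, next = iterate(p.coll))
    next === nothing && return nothing
    head = next
    len = 0
    # Skip ahead up to n elements to find where the following chunk starts.
    while next !== nothing && len < p.n
        len += 1
        next = iterate(p.coll, next[2])
    end
    return (Chunk(p.coll, head, len), next)
end

# Inner iteration: replay the underlying collection from the stored state.
Base.iterate(c::Chunk) = (c.head[1], (c.head[2], 1))
function Base.iterate(c::Chunk, (state, k))
    k >= c.len && return nothing
    next = iterate(c.coll, state)
    next === nothing && return nothing
    return (next[1], (next[2], k + 1))
end

s = Set(1:9)
for chunk in LazyPartition(s, 4)
    println(collect(chunk), " ", typeof(chunk))
end
```

Because a Chunk only holds a reference to the original collection plus an iteration state, mutating the Set between creating a chunk and consuming it can change or invalidate what the chunk yields. That is exactly the behavioral difference from the current copying implementation that the second question is about.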