Iterator Invalidation

I wrote some code that looks like

for val in values  # values is a Vector
  idx = findfirst(values, val)
  values[idx] = 0
  if too many zeros in values
    ... move nonzero values to beginning of vector
    resize!(values, number of non zeros)
  end
end

What does this code do when values gets resized? I looked at the docs for information on what operations invalidate iterators but can’t find it.

1 Like

In case of arrays, the iteration seems to use indexes. This code:

a=collect(10:20)
for v in a
    println(v)
    deleteat!(v, 1)
end

gives this as output:

10
12
14
16
18
20

Not sure about the other collections.

I would say don’t write code that changes the size of the array that you are iterating over. Even if you get it to work correctly, the code will be very hard to read afterwards. (The statement val in values will no longer mean what the reader thinks it means.)

I’d like to give an example of a better way to do the above loop, but I’m not sure what the goal is here? It seems on the first iteration idx will be 1, so values[1] will be set to 0. On the second iteration, either values[2] is already 0, or it will be set to 0 because it is the first element equal to val. So in the end all of values will be zero?

Edit: Maybe this is what you want:

while !isempty(values)
    val = popfirst!(values)
    # do something
end

Iterators for v in x are just syntactic sugar for the iteration interface, most importantly iterate. Just imagine your code expanded as described there.

If you happen to change the original object you are iterating over, generally consequences are undefined, So don’t do it :wink:

Also, don’t use findfirst, since it will be spurious for duplicates. Consider

for (idx, val) in enumerate(values)
    ...
end

Iterators in C++ are, essentially, just raw pointers that you hope will point into the memory owned by your container. Modifying that container might break the assumption that those pointers belong to your container, so in C++ we have to worry a lot about iterator invalidation. In Julia, we don’t usually deal with raw pointers, and instead with have a nice iterate interface that results in, roughly, the behavior of a C++ loop like for (int i = 0; i < v.size(); i++) { do something with v[i]...}, but without having to keep track of the index variable manually.

So, for example, iterating over a vector while appending to it in Julia is fine:

julia> v = [1, 2, 3]
f3-element Array{Int64,1}:
 1
 2
 3

julia> for x in v
         @show x
         if x == 1
           push!(v, 4)
         end
       end
x = 1
x = 2
x = 3
x = 4

because the code expands to:

julia> let
         next = iterate(v)
         while next !== nothing
           x, state = next
           @show x
           if x == 1
             push!(v, 4)
           end
           next = iterate(v, state)
         end
       end
x = 1
x = 2
x = 3
x = 4

which still checks (by checking if the result of iterate(v, state) is nothing ) if we’ve reached the end of v, even after modification of v. In essence, any questions about what is safe or valid to do with an iterator in Julia should be answerable by falling back to that while loop construct, as documented in Interfaces · The Julia Language

The equivalent C++ code, on the other hand, produces garbage because modifying v causes the iterators to silently start pointing to completely invalid memory:

C++ > #include <iostream>
true

C++ > std::vector<int> v = {1, 2, 3};

C++ > for (auto iter = v.begin(); iter != v.end(); ++iter) {
        std::cout << "x: " << *iter << std::endl;
        if (*iter == 1) {
          v.push_back(4);  // oops, `iter` now points to garbage. All bets are off. 
        }
      }
x: 1
x: 0
x: 3
x: 0
x: 956330853
x: 13616
x: 513
x: 0
x: 2020515472

Some iterators hoist the computation of e.g. length which could give similar problems in Julia (not sure if any iterator that would be susceptible to this actually does this hoisting though).