Any shortcut to 1:length(myVector)?

If I were a beginner looking at this thread for a canonical way to do this, what would I take away? … eachindex?

Probably 1:length(myVector) :smiley: even if eachindex is quite elegant.

Along the same lines, I often have to run many iterations to average my results, like

mcRuns = 10000;
for i = 1 : 1 : mcRuns 
   # Do some stupid stuff, 
   # without having to take i into account
end

Is there a way to do this elegantly like

do mcRuns times 
   # Do some stupid stuff, more elegantly 
end

IMHO, yes. eachindex is not just elegant but efficient. If you also need the values, go with pairs.

If you really want/need integers (even though more efficient indexing may be possible), I'd go with 1:length(v) (or enumerate if I want the values as well).

And of course, if you only need the values you can simply write for val in myVector.
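A minimal sketch of the idioms mentioned above, in plain Julia (the function names are hypothetical, chosen only for illustration). For an ordinary Vector, pairs and enumerate happen to produce the same numbers; they only differ for arrays whose indices do not start at 1, where enumerate still counts from 1 while pairs reports the array's own indices.

```julia
# Values only: no index needed at all.
function sum_values(v)
    s = zero(eltype(v))
    for val in v
        s += val
    end
    return s
end

# Indices only: eachindex yields the array's own valid indices.
function indices_of(v)
    idx = Int[]
    for i in eachindex(v)
        push!(idx, i)
    end
    return idx
end

# Index and value together: pairs yields index => value.
function index_value_pairs(v)
    out = Tuple{Int,eltype(v)}[]
    for (i, val) in pairs(v)
        push!(out, (i, val))
    end
    return out
end

# enumerate always counts 1, 2, 3, ... regardless of the array's indices.
function counted_values(v)
    out = Tuple{Int,eltype(v)}[]
    for (n, val) in enumerate(v)
        push!(out, (n, val))
    end
    return out
end

v = [10, 20, 30]
println(sum_values(v))          # 60
println(indices_of(v))          # [1, 2, 3]
println(index_value_pairs(v))   # [(1, 10), (2, 20), (3, 30)]
println(counted_values(v))      # [(1, 10), (2, 20), (3, 30)]
```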

1:length(v) will fail for OffsetArrays. Use LinearIndices(v) instead if you want scalar indexing.
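To see the failure without installing OffsetArrays.jl, this sketch assumes Base.IdentityUnitRange (an unexported Base type) as a stand-in for a vector whose indices are 0:3; its element at index i is simply i. The helper names are hypothetical. The eachindex loop visits every valid index, while 1:length(v) skips index 0 and then runs past the end.

```julia
# Stand-in for an offset array (assumption): indices 0:3, and v[i] == i.
v = Base.IdentityUnitRange(0:3)

# Offset-aware: eachindex yields exactly the valid indices 0, 1, 2, 3.
function via_eachindex(v)
    seen = Int[]
    for i in eachindex(v)
        push!(seen, v[i])
    end
    return seen
end

# Assumes 1-based indexing: reads v[1], v[2], v[3], then fails on v[4].
function via_one_to_length(v)
    seen = Int[]
    for i in 1:length(v)
        push!(seen, v[i])
    end
    return seen
end

println(firstindex(v), " ", lastindex(v))   # 0 3
println(via_eachindex(v))                   # [0, 1, 2, 3]

try
    via_one_to_length(v)
catch err
    println(typeof(err))                    # BoundsError
end
```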

No, this advice is wrong. There are examples in this topic that show where this approach fails.

You can say

for _ in 1:nRuns
    # ...
end
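A sketch of the Monte Carlo averaging loop from the question written this way. run_experiment is a hypothetical placeholder for "some stupid stuff" (here it just draws one uniform random number), and the seed 42 is arbitrary; the underscore signals that the loop counter is intentionally unused.

```julia
using Random

# Hypothetical experiment: one noisy sample per run.
run_experiment(rng) = rand(rng)

function monte_carlo_mean(nRuns; rng = Random.default_rng())
    acc = 0.0
    for _ in 1:nRuns          # counter deliberately unused
        acc += run_experiment(rng)
    end
    return acc / nRuns
end

rng = MersenneTwister(42)
m = monte_carlo_mean(10_000; rng = rng)
println(m)   # close to 0.5, the mean of a uniform sample on [0, 1)
```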

Just go with:

for value in container – when you only care about iterating over the values.
for index in eachindex(container) – when you are only going to use the indices, or use the values very infrequently. EDIT: see below [1].
for (index, value) in pairs(container) – when you need both in every iteration, or when you are using a more complex structure such as a Dict or a tree, where this may save the effort of hashing the keys (or searching the tree) again.

[1] To complement: for simple code over Vectors there will probably be no performance difference between eachindex + container[index] and pairs. If you want to apply SSOT (Single Source of Truth) and query the container every time instead of keeping bindings with cached values, go for it. That can be better than pairs when, within a single iteration, the values inside the container may change and you always want the most recent value. But, in theory, whenever both index and value are used, pairs will always perform the same as or better than eachindex + container[index].
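A small sketch of the stale-value case from [1] (function names are hypothetical): each loop updates the current element in place and then records "the element". With pairs, value was bound before the update, so the old number is recorded; re-reading container[index] picks up the new one.

```julia
# pairs: `val` is bound before the in-place update, so it holds the old value.
function record_with_pairs(v)
    w = copy(v)
    out = similar(w)
    for (i, val) in pairs(w)
        w[i] += 1
        out[i] = val
    end
    return out
end

# eachindex: re-reading w[i] after the update gives the new value.
function record_with_eachindex(v)
    w = copy(v)
    out = similar(w)
    for i in eachindex(w)
        w[i] += 1
        out[i] = w[i]
    end
    return out
end

v = [1, 2, 3]
println(record_with_pairs(v))      # [1, 2, 3]
println(record_with_eachindex(v))  # [2, 3, 4]
```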

Do you think one should avoid, for performance reasons,

x = rand(1000)
y = rand(1000)
for i in eachindex(x)
   # do stuff with x[i]
   for j in eachindex(y)
      # do stuff with x[i] and y[j]
   end
end

in favor of

for vx in x
   # do stuff with vx
   for vy in y
      # do stuff with vx and vy
   end
end

I tested a small example here and, although the two versions do not lower to the same code (I could imagine that they could), and the first one makes repeated getindex(x,i) calls, I could not build an example where I could measure any performance difference at all.

Perhaps the advice applies more when the values are going to be used many, many times rather than very infrequently? (Or, of course, when getindex is expensive for the structure for some reason.)

Or maybe this advice surprises me because in my work I only use vectors, matrices, etc., rather than Dicts or other data structures that require more sophisticated indexing?
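One way to set up such a comparison, as a sketch: the two hypothetical functions below compute the same sum over all products x[i]*y[j], one through indexing and one through the values directly. The timing line assumes the BenchmarkTools.jl package and is left as a comment, since no measurements are claimed here.

```julia
# Indexing form: reads x[i] and y[j] through getindex.
function pair_sum_indexed(x, y)
    s = zero(promote_type(eltype(x), eltype(y)))
    for i in eachindex(x)
        for j in eachindex(y)
            s += x[i] * y[j]
        end
    end
    return s
end

# Value form: iterates over the elements themselves.
function pair_sum_values(x, y)
    s = zero(promote_type(eltype(x), eltype(y)))
    for vx in x
        for vy in y
            s += vx * vy
        end
    end
    return s
end

x = rand(1000)
y = rand(1000)
println(pair_sum_indexed(x, y) ≈ pair_sum_values(x, y))   # true
# Timing (assumes BenchmarkTools.jl is installed):
# using BenchmarkTools; @btime pair_sum_values($x, $y)
```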

Especially since you see a negligible performance difference: if you only use i for x[i] and j for y[j], then I'd greatly prefer the second example for simplicity and clarity.

If your container is a Vector and each element is indexed a single time, then there will probably be no difference in performance. Sometimes there is none even if it is indexed multiple times (when the compiler can prove the value does not change).

I would go with pairs in this case anyway, because I believe the code may change and eachindex may become the worse decision, but I do not see the opposite happening (i.e., I do not see eachindex becoming a better decision than pairs in code that uses both index and value). But if the code is guaranteed not to change much (i.e., it will keep using Vectors and not making many accesses), then the decision ends up being mostly stylistic. I would only recommend eachindex if you really are semantically iterating over positions and may change the value at the indexed position inside the loop before accessing it again; in that case, using pairs can lead to a subtle bug where you use the old value stored in value instead of the new one at the indexed position.

EDIT: updated my first answer to address some cases I thought of only after this question.

Just for completeness: in the end there is no alternative for a loop like this, is there (except for firstindex and lastindex, which would support offset arrays)?

N = length(x)
for i in 1:N-1
   for j in i+1:N
      # do stuff with x[i] and x[j]
   end
end

N = length(x)
I = LinearIndices(x)
for i in I[1:N-1]
   for j in I[i+1:N]
      # do stuff with x[i] and x[j]
   end
end
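For the record, the firstindex/lastindex variant mentioned in the question can be sketched like this (unordered_pairs is a hypothetical name). To show the offset case without a package, it assumes Base.IdentityUnitRange (an unexported Base type whose element at index i is i) as a stand-in for a vector with indices 0:2.

```julia
# Collect every unordered pair (x[i], x[j]) with i < j, without assuming
# that the indices start at 1.
function unordered_pairs(x)
    out = Tuple{eltype(x),eltype(x)}[]
    for i in firstindex(x):lastindex(x)-1
        for j in i+1:lastindex(x)
            push!(out, (x[i], x[j]))
        end
    end
    return out
end

println(unordered_pairs([1, 2, 3]))   # [(1, 2), (1, 3), (2, 3)]

# Stand-in for a 0-based vector (assumption): indices 0:2, z[i] == i.
z = Base.IdentityUnitRange(0:2)
println(unordered_pairs(z))           # [(0, 1), (0, 2), (1, 2)]
```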

I really do think axes is the “canonical” way to do these things, e.g. you could do:

for i in axes(x,1)[begin:end-1]
   for j in i+1:axes(x,1)[end]
      # do stuff with x[i] and x[j]
   end
end

and it will work with offset arrays.

I think that approach won’t work with a Matrix, since you are iterating only over the first axis:

julia> x = [1 2; 3 4]
2×2 Array{Int64,2}:
 1  2
 3  4

julia> for i in axes(x,1)[begin:end-1]
          for j in i+1:axes(x,1)[end]
             @show x[i] x[j]
          end
       end
x[i] = 1
x[j] = 3
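To make the difference concrete, a sketch comparing two loops on the same 2×2 matrix (both function names are hypothetical). The axes(x, 1) version only reaches x[1] and x[2], i.e. the first column, while looping over firstindex(x):lastindex(x) visits all four elements through linear (column-major) indexing.

```julia
x = [1 2; 3 4]

# Loop over the first axis only (the approach above).
function pairs_first_axis(x)
    out = Tuple{eltype(x),eltype(x)}[]
    for i in axes(x, 1)[begin:end-1]
        for j in i+1:axes(x, 1)[end]
            push!(out, (x[i], x[j]))
        end
    end
    return out
end

# Loop over all linear indices of the array.
function pairs_all_elements(x)
    out = Tuple{eltype(x),eltype(x)}[]
    for i in firstindex(x):lastindex(x)-1
        for j in i+1:lastindex(x)
            push!(out, (x[i], x[j]))
        end
    end
    return out
end

println(pairs_first_axis(x))     # [(1, 3)]
println(pairs_all_elements(x))   # [(1, 3), (1, 2), (1, 4), (3, 2), (3, 4), (2, 4)]
```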

Why would a Matrix be named myVector??

Hopefully it wouldn’t be! :smile: