Iterating over several arrays

Ribeiro · March 24, 2021, 10:40am

Hi!
Say I have 2 arrays:

a=rand(3);
b=rand(4);

and I want to iterate over both of them

for i in 1:7
  # do stuff on ab[i]
end

what’s a good way of doing that? The obvious way is ab = vcat(a,b), but that creates a new array (my example is small, but my real case is large).
Any clever way of doing it without having to create a new array and without a bunch of if statements to switch from a to b inside the loop (and, obviously, without 2 for loops)?
Thanks!

LaurentPlagne · March 24, 2021, 10:44am

I do not understand the question. Could you elaborate on the "do stuff" part or provide an MWE?

FPGro · March 24, 2021, 10:49am

for thing in Iterators.flatten([a, b])
    # do something with thing
end

Ribeiro · March 24, 2021, 10:58am

Thanks for the answer.
Any way to do this with no allocations? Also, is there the option to output the index as well? I can always build the index myself, of course.
Here’s an example:

a=[1,2,3]
b=[4,5]
function tmpfun(a,b)
  acc=0
  for element in Iterators.flatten([a,b])
    acc+=element
  end
  return acc
end

Thanks a lot!

LaurentPlagne · March 24, 2021, 11:03am

function tmpfun2(a,b)
    acc=0
    for v in (a,b)
        for e in v
            acc+=e
        end
    end
    acc
end

DNF · March 24, 2021, 11:03am

I think there is something in the Iterators module, chain or something.

sijo · March 24, 2021, 11:13am

You can avoid the allocation using Iterators.flatten((a, b)) instead of Iterators.flatten([a, b]).
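
As a minimal sketch of that change applied to the accumulator example above: `[a, b]` builds a new vector holding the two arrays, while the tuple `(a, b)` only groups the existing references, which is why the tuple form can usually avoid the allocation. The name `sum_both` is made up for this illustration, and the accumulator is initialised from the element types on the assumption that the elements are numeric.

```julia
# Sketch: sum the elements of two arrays without concatenating them.
# `sum_both` is a hypothetical name used only for illustration.
function sum_both(a, b)
    # Start from a zero of the promoted element type (assumes numeric elements).
    acc = zero(promote_type(eltype(a), eltype(b)))
    for x in Iterators.flatten((a, b))   # tuple (a, b), not the vector [a, b]
        acc += x
    end
    return acc
end

sum_both([1, 2, 3], [4, 5])  # 15
```

Whether the loop is fully allocation-free on a given Julia version is best checked with @btime, as in the benchmarks in this thread.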

FPGro · March 24, 2021, 11:15am

My bad, use a tuple literal instead of a vector:

julia> @benchmark sum(thing for thing in Iterators.flatten((A,B))) setup = begin A = rand(10); B = rand(20) end
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     14.113 ns (0.00% GC)
  median time:      14.315 ns (0.00% GC)
  mean time:        14.414 ns (0.00% GC)
  maximum time:     38.939 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     999

julia> @benchmark sum(thing for thing in Iterators.flatten([A,B])) setup = begin A = rand(10); B = rand(20) end
BenchmarkTools.Trial:
  memory estimate:  96 bytes
  allocs estimate:  1
  --------------
  minimum time:     41.994 ns (0.00% GC)
  median time:      44.209 ns (0.00% GC)
  mean time:        47.920 ns (4.97% GC)
  maximum time:     1.501 μs (96.88% GC)
  --------------
  samples:          10000
  evals/sample:     993

Edit: sorry, sudete already answered this

Edit2:

Also, is there the option to output the index as well?

Just the running number? You can wrap an enumerate around it. Or do you mean the indices into a and b? ~~That would be a tiny bit more code, but you probably wouldn’t want to concatenate them in that case.~~ sudete has already solved that one neatly. I’d say we have a draw here, sudete.

julia> for (i,thing) in enumerate(Iterators.flatten((a,b)))
           # do something useful here
           print(i," ",thing," ")
       end
1 0.11576933628769859 2 0.5420240722242764 3 0.2224035318833757 [...]

sijo · March 24, 2021, 11:23am

Yep, barely.

If you want a single index sequence from 1 to the total number, you can use for (i, x) in enumerate(Iterators.flatten((a, b))).

If you want the original indices from the arrays, you can do

function f(a,b)
    for (i, x) in Iterators.flatten((pairs(a), pairs(b)))
        println("$i: $x")
    end
end

julia> f([1,2,3], [40,50])
1: 1
2: 2
3: 3
1: 40
2: 50

This also works with non-standard arrays with indices starting at 0 or whatever, and it doesn’t allocate:

function f(a,b)
    acc_i, acc_x = 0, 0
    for (i, x) in Iterators.flatten((pairs(a), pairs(b)))
        acc_i += i
        acc_x += x
    end
    return (acc_i, acc_x)
end

julia> @btime f($[1,2,3], $[40, 50])
  15.846 ns (0 allocations: 0 bytes)

Edit: This time @FPGro was faster for enumerate
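
The two approaches can also be combined when both the running index and the original per-array index are needed. The following is only a sketch: `indexed_triples` is a made-up name, it assumes the arrays have integer indices, and it collects the results into a vector purely so they can be displayed (that vector does allocate; a real loop would use the values directly).

```julia
# Sketch: get the global index, the per-array index and the value in one loop.
# `indexed_triples` is a hypothetical helper, not part of Base.
function indexed_triples(a, b)
    T = promote_type(eltype(a), eltype(b))
    out = Tuple{Int,Int,T}[]   # only for showing the result; this allocates
    # enumerate supplies the global counter g; pairs supplies (index => value).
    for (g, (i, x)) in enumerate(Iterators.flatten((pairs(a), pairs(b))))
        push!(out, (g, i, x))
    end
    return out
end

indexed_triples([1, 2, 3], [40, 50])
# [(1, 1, 1), (2, 2, 2), (3, 3, 3), (4, 1, 40), (5, 2, 50)]
```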

Ribeiro · March 24, 2021, 12:21pm

Thanks a lot, folks. This is what I ended up using:

function tmpfun(a,b)
  acc=0;
  acc2=0
  for (i,ii) in enumerate(Iterators.flatten((a,b)))
    acc+=i;
    acc2+=ii
  end
  return acc,acc2
end

which does exactly what I needed (global index, no allocations).
Thanks again!

Ribeiro · March 24, 2021, 12:33pm

And in case anyone is curious, I compared tmpfun to

function tmpfun2(a,b)
  acc=0;
  acc2=0
  for i in 1:length(a)
    acc+=i;
    acc2+=a[i];
  end
  for i in 1:length(b)
    acc+=i+length(a);
    acc2+=b[i];
  end
  return acc,acc2
end

and:

a=rand(10000);
b=rand(5000);

julia> @btime tmpfun($a,$b)
  18.399 μs (3 allocations: 64 bytes)
(112507500, 7565.072862093092)

julia> @btime tmpfun2($a,$b)
  17.999 μs (3 allocations: 64 bytes)
(112507500, 7565.072862093092)

So it seems like there is no performance penalty!
The allocations happen when I return acc,acc2. If I return acc+acc2 there are no allocations. So I guess it’s allocating the tuple that gets returned (no clue why it reports 3 allocations, but that’s not really an issue).
Thanks again!

sijo · March 24, 2021, 1:05pm

These allocations are due to a type instability: acc2 starts as an integer but is later assigned a float. You can see this by running @code_warntype tmpfun2(a,b).

To fix it you can initialize acc2=0.0, or to work with any element type: acc2=zero(eltype(a)).

If a and b can have different element types, you would need zero(promote_type(eltype(a), eltype(b))) but you would get lots of allocations from Iterators.flatten anyway…
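
A sketch of the fix applied to the enumerate version from earlier in the thread. The name `tmpfun_stable` is invented here, and the sketch assumes numeric element types; when a and b share an element type, acc2 keeps that type for the whole loop.

```julia
# Sketch of a type-stable variant of tmpfun; `tmpfun_stable` is a hypothetical name.
function tmpfun_stable(a, b)
    acc = 0                                           # sum of global indices, always Int
    acc2 = zero(promote_type(eltype(a), eltype(b)))   # zero of the element type, not the Int 0
    for (i, x) in enumerate(Iterators.flatten((a, b)))
        acc += i
        acc2 += x
    end
    return acc, acc2
end

tmpfun_stable([0.5, 1.5], [2.0])  # (6, 4.0)
```

Running @code_warntype on this version is a quick way to confirm that acc2 is no longer inferred as a Union.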

Ribeiro · March 24, 2021, 1:15pm

Ah, I see. Thanks for the tips!

FPGro · March 24, 2021, 1:53pm

If you just add up the indices from 1 to n, you don’t even need to keep track of them explicitly. There’s a simple formula:

function bar(iters)
    s = sum(length, iters)
    return (s*(s+1)÷2, sum(Iterators.flatten(iters)))
end

Which is faster, more general and free of allocations ^^

julia> @benchmark tmpfun(A,B) setup = begin A = rand(10000); B = rand(5000) end
BenchmarkTools.Trial:
  memory estimate:  64 bytes
  allocs estimate:  3
  --------------
  minimum time:     13.699 μs (0.00% GC)
  median time:      13.900 μs (0.00% GC)
  mean time:        14.047 μs (0.00% GC)
  maximum time:     65.700 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

julia> @benchmark bar((A,B)) setup = begin A = rand(10000); B = rand(5000) end
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     10.199 μs (0.00% GC)
  median time:      10.300 μs (0.00% GC)
  mean time:        10.396 μs (0.00% GC)
  maximum time:     44.699 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

Ribeiro · March 24, 2021, 2:02pm

Yep, I’m familiar with Gauss’ formula. The accumulator was just a demo. My real code is way more complicated, but requires the indices as well (to access other vectors that are as long as a and b combined).
Thanks!
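
For readers with the same use case, here is a rough sketch of how the global index can be used to read from another vector. Everything in it is hypothetical: `weighted_sum` and `w` are illustrative names, and `w` is assumed to be a 1-based vector whose length is length(a) + length(b).

```julia
# Sketch: use the global index from enumerate to index a third vector `w`.
# `weighted_sum` and `w` are hypothetical names; `w` is assumed to be 1-based.
function weighted_sum(a, b, w)
    length(w) == length(a) + length(b) ||
        throw(DimensionMismatch("w must be as long as a and b combined"))
    acc = zero(promote_type(eltype(a), eltype(b), eltype(w)))
    for (i, x) in enumerate(Iterators.flatten((a, b)))
        acc += x * w[i]   # i runs from 1 to length(a) + length(b)
    end
    return acc
end

weighted_sum([1, 2, 3], [4, 5], [10, 20, 30, 40, 50])  # 550
```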
