# Combining two arrays with alternating elements

I wish to combine two arrays, let's say a = [1, 2, 3, …] and b = [4, 5, 6, …], so that the elements alternate between them like so: c = [1, 4, 2, 5, 3, 6, …]. My first attempt was to make a `my_combine` object like so:

``````
julia> using Base

julia> struct my_combine{T <: Tuple} <: AbstractArray{eltype(T), 1}
           _x::T
           _size::Int
           my_combine(args...) = new{typeof(args)}(args, sum(length(i) for i in args))
       end

julia> Base.getindex(a::my_combine, i::Int) = a._x[(i-1)%length(a._x)+1][div(i-1, length(a._x))+1]

julia> Base.size(a::my_combine) = (a._size,)
``````

Which gives the desired effect. But, when benchmarked, it seems to be a bit slow compared to other combining functions like `zip`:

``````
julia> r1 = rand(1000); r2 = rand(1000);

julia> @time for a in my_combine(r1, r2)
           a
       end
  0.000117 seconds (4.00 k allocations: 125.109 KiB)
``````

`my_combine` seems to be twice as slow as `zip`, and uses twice the memory:

``````
julia> @time for (a, b) in zip(r1, r2)
           (a, b)
       end
  0.000053 seconds (2.00 k allocations: 78.156 KiB)
``````

I’m still a bit new to Julia, so I’m sure I’m missing some performance tricks. Any thoughts?

1 Like

Doing vectorized indexing is not generally the fastest way to do things in Julia. It's just the way you're forced to do things in "vectorized languages" in order to not be terribly slow. You can do that in Julia and it will be just as fast, but the simplest thing you could possibly do is often far faster. In this case, the simplest possible thing is just using a `for` loop:

``````
function interleave(a::Vector{T}, b::Vector{T}) where T
    length(a) == length(b) || error("length mismatch")
    c = Vector{T}(undef, 2*length(a))
    i = 0
    for (x, y) in zip(a, b)
        c[i += 1] = x
        c[i += 1] = y
    end
    return c
end
``````

I threw a `zip` in there, which is a potentially problematic abstraction, but the optimizer is quite good at optimizing that despite the abstraction. On my machine, this is 12x faster than `my_combine`. It’s unclear that the `zip` version is doing much since it doesn’t actually collect the zipped pairs. The fact that it’s slow indicates that the optimizer isn’t getting rid of the for loop altogether, but it could do so since it just throws the tuples away.

Also, `using Base` is unnecessary since that’s done by default in all normal modules, including the interactive prompt (REPL).
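A quick correctness check of the loop version on small inputs (the function is restated here so the snippet runs on its own):

``````
function interleave(a::Vector{T}, b::Vector{T}) where T
    length(a) == length(b) || error("length mismatch")
    c = Vector{T}(undef, 2*length(a))
    i = 0
    for (x, y) in zip(a, b)
        c[i += 1] = x
        c[i += 1] = y
    end
    return c
end

interleave([1, 2, 3], [4, 5, 6])  # returns [1, 4, 2, 5, 3, 6]
``````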

3 Likes

As a comparison, the plain

``````
collect(Iterators.flatten(zip(a, b)))
``````

is somewhat faster than yours but has some overhead compared to Stefan's. Maybe even the lazy `Iterators.flatten(zip(a, b))` could be a solution, depending on what you want to do with the result.
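On a small example, both the lazy iterator and its collected form produce the alternating order:

``````
a = [1, 2, 3]
b = [4, 5, 6]

# Lazy version: nothing is allocated for the result until collect is called.
lazy = Iterators.flatten(zip(a, b))

collect(lazy)  # returns [1, 4, 2, 5, 3, 6]
``````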

6 Likes

It seems to me that you want to create a wrapper around the two arrays, instead of allocating new array built from the other two, is that right?

I think you are losing a lot of performance due to the `div` and `%` operations, which are known to be slow, not just in Julia but on computers in general.
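For the special case of exactly two arrays, the `% 2` and `div(…, 2)` can be written as bit operations, which are cheap (though for a constant divisor of 2 the compiler will often do this rewrite itself). A sketch, using a hypothetical `TwoCombine` wrapper rather than the original struct:

``````
# Hypothetical two-array wrapper; not the struct from the original post.
struct TwoCombine{T} <: AbstractVector{T}
    a::Vector{T}
    b::Vector{T}
end

Base.size(c::TwoCombine) = (length(c.a) + length(c.b),)

# (i-1) & 1 replaces (i-1) % 2, and (i-1) >> 1 replaces div(i-1, 2).
Base.getindex(c::TwoCombine, i::Int) =
    iszero((i - 1) & 1) ? c.a[((i - 1) >> 1) + 1] : c.b[((i - 1) >> 1) + 1]

collect(TwoCombine([1, 2, 3], [4, 5, 6]))  # returns [1, 4, 2, 5, 3, 6]
``````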

1 Like

This is quite interesting. Apparently, when I use the `collect` function on the `Iterators.flatten(...)` and `my_combine` methods, they seem to become more efficient. I posted the three methods' benchmarks (yours, `Iterators.flatten(...)`; mine, `my_combine`; and Stefan's, `interleave`), with and without `collect`, below (for my machine). For some reason, for `my_combine`, time and memory usage improve by a factor of three when using `collect`! To me this seems a bit counter-intuitive. Naively, I would think that `collect` would create a new Array object and thus be less efficient, since additional memory would be allocated. Unless, of course, all `collect` does is wrap an AbstractArray in an Array-like object. This last explanation seems (to me, at this hour of the night) to make some sense, because `my_combine` would have the most to gain from avoiding memory allocation for the full array (since it doesn't allocate memory for the full array when the `my_combine` object is created). Or perhaps this is only a fluke on my machine – weirder things have happened. Thoughts?

``````
julia> r1 = rand(10000000); r2 = rand(10000000);

julia> @btime for a in my_combine(r1, r2)
           a
       end
  748.308 ms (40000005 allocations: 1.19 GiB)

julia> @btime for a in collect(my_combine(r1, r2))
           a
       end
  256.029 ms (20000007 allocations: 457.76 MiB)

julia> @btime for a in interleave(r1, r2)
           a
       end
  564.708 ms (39999491 allocations: 762.93 MiB)

julia> @btime for a in collect(interleave(r1, r2))
           a
       end
  616.393 ms (39999493 allocations: 915.52 MiB)

julia> @btime for a in Iterators.flatten(zip(r1, r2))
           a
       end
  913.383 ms (40000002 allocations: 2.09 GiB)

julia> @btime for a in collect(Iterators.flatten(zip(r1, r2)))
           a
       end
  569.172 ms (39999496 allocations: 762.93 MiB)
``````

I think this is an artefact of the way you test them. Avoiding globals and wrapping the test in a function avoids the excessive allocation:

``````
function f(r1, r2)
    b = 0.0
    for a in Iterators.flatten(zip(r1, r2))
        b += a
    end
    b # return something so the function is not optimised away
end
``````
``````
julia> @btime f($r1, $r2)
  2.525 μs (0 allocations: 0 bytes)
``````
1 Like

If you define

``````
Base.eltype(a::my_combine) = eltype(first(a._x))
``````

then collecting will yield comparable performance.

Before:

``````
julia> f() = collect(my_combine(ra1, ra2))
f (generic function with 2 methods)

julia> @btime f()
1.659 s (20000007 allocations: 457.76 MiB)
``````

after:

``````
julia> @btime f()
57.812 ms (7 allocations: 152.59 MiB)
``````

Boy, what a difference globals make:

``````
julia> @btime f($r1, $r2) # my_combine
  23.978 ms (2 allocations: 48 bytes)

julia> @btime f($r1, $r2) # collect(my_combine)
  3.600 s (40000005 allocations: 762.94 MiB)

julia> @btime f($r1, $r2) # collect(my_combine) with laborg's suggestion
  90.611 ms (5 allocations: 152.59 MiB)

julia> @btime f($r1, $r2) # interleave
  78.977 ms (2 allocations: 152.59 MiB)

julia> @btime f($r1, $r2) # collect(interleave)
  142.290 ms (4 allocations: 305.18 MiB)

julia> @btime f($r1, $r2) # Iterators.flatten
  23.837 ms (0 allocations: 0 bytes)

julia> @btime f($r1, $r2) # collect(Iterators.flatten)
  91.188 ms (6 allocations: 152.59 MiB)
``````

It seems, for me, that "flattening" and `my_combine` are on par with each other. And on a side note, why does specifying

``````
Base.eltype(a::my_combine) = eltype(first(a._x))
``````

improve the performance of `collect`?

Oh yes, thanks everyone for tolerating my curiosity!

The way you defined your `my_combine` struct, the element type isn't inferred, and so the collect machinery has to handle the contents of your array tuple as type `Any`. Defining the eltype for your struct helps `collect` overcome this.
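To make this concrete, here is a sketch (restating the struct from the first post so it runs standalone): the `eltype(T)` in the supertype declaration is evaluated while `T` is still an unbound parameter, so it falls back to `Any`, and `collect` builds a `Vector{Any}` of boxed elements until the extra method is defined:

``````
struct my_combine{T <: Tuple} <: AbstractArray{eltype(T), 1}
    _x::T
    _size::Int
    my_combine(args...) = new{typeof(args)}(args, sum(length(i) for i in args))
end
Base.getindex(a::my_combine, i::Int) = a._x[(i-1)%length(a._x)+1][div(i-1, length(a._x))+1]
Base.size(a::my_combine) = (a._size,)

c = my_combine([1.0, 2.0], [3.0, 4.0])
eltype(c)           # Any: the supertype declaration couldn't see the vectors' element type
eltype(collect(c))  # Any as well, so every element is boxed

Base.eltype(a::my_combine) = eltype(first(a._x))
eltype(c)           # Float64
``````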

I guess there might be a better way of defining `my_combine` using `NTuple{N,T}`, so that the extra eltype definition isn't necessary, but this is a thing others can do better than I.

With this, the eltype definition isn't necessary anymore:

``````
struct my_combine_{N,T} <: AbstractVector{T}
    _x::NTuple{N,Vector{T}}
    _size::Int
    my_combine_(args...) = new{length(args),eltype(args[1])}(args, sum(length(i) for i in args))
end
Base.size(a::my_combine_) = (a._size,)
Base.getindex(a::my_combine_, i::Int) = a._x[(i-1)%length(a._x)+1][div(i-1, length(a._x))+1]
``````

I personally would use the simpler versions presented here (collect + flat iterator).

1 Like


The iterator solution is actually almost optimal if you compare it to an explicit loop that does the same thing:

``````
function interleave1(args::Vector{T}...) where T
    n = minimum(length.(args))
    c = Vector{T}(undef, length(args)*n)
    i = 0
    for x in zip(args...)
        for y in x
            c[i += 1] = y
        end
    end
    return c
end

interleave2(args...) = collect(Iterators.flatten(zip(args...)))

@btime f($r1, $r2, interleave1)
@btime f($r1, $r2, interleave2)
``````
``````
julia> @btime f($r1, $r2, interleave2)
  83.923 μs (6 allocations: 156.42 KiB)
10009.221967441981

julia> @btime f($r1, $r2, interleave1)
  66.413 μs (2 allocations: 156.33 KiB)
10009.221967441981
``````
2 Likes