Combining two arrays with alternating elements

I wish to combine two arrays, let’s say a = [1, 2, 3, …] and b = [4, 5, 6, …], so that the elements alternate between them like so: c = [1, 4, 2, 5, 3, 6, …]. My first attempt was to make a “my_combine” type like so:

julia> using Base

julia> struct my_combine{T <: Tuple} <: AbstractArray{eltype(T), 1}
       _x::T
       _size::Int
       my_combine(args...) = new{typeof(args)}(args, sum(length(i) for i in args))
       end

julia> Base.getindex(a::my_combine, i::Int) = a._x[(i-1)%length(a._x)+1][div(i-1, length(a._x))+1]

julia> Base.size(a::my_combine) = (a._size,)
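As a quick sanity check (the definitions above repeated so this snippet runs on its own):

```julia
# Same definitions as above, repeated so the snippet is standalone.
struct my_combine{T <: Tuple} <: AbstractArray{eltype(T), 1}
    _x::T
    _size::Int
    my_combine(args...) = new{typeof(args)}(args, sum(length(i) for i in args))
end
Base.getindex(a::my_combine, i::Int) = a._x[(i-1)%length(a._x)+1][div(i-1, length(a._x))+1]
Base.size(a::my_combine) = (a._size,)

c = my_combine([1, 2, 3], [4, 5, 6])
collect(c)  # [1, 4, 2, 5, 3, 6]
```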

Which gives the desired effect. But when benchmarked, it seems to be a bit slow compared to other combining functions like zip:

julia> r1 = rand(1000); r2 = rand(1000);

julia> @time for a in my_combine(r1, r2)
       a
       end
  0.000117 seconds (4.00 k allocations: 125.109 KiB)

“my_combine” seems to be about twice as slow as zip, and uses considerably more memory:

julia> @time for (a,b) in zip(r1, r2)
       (a,b)
       end
  0.000053 seconds (2.00 k allocations: 78.156 KiB)

I’m still a bit new to Julia, so I’m sure I’m missing some performance tricks. Any thoughts?

1 Like

Doing vectorized indexing is not generally the fastest way to do things in Julia. It’s just the way you’re forced to do things in “vectorized languages” in order to not be terribly slow. You can do that in Julia and it will be just as fast, but the simplest thing you could possibly do is often far faster. In this case, the simplest possible thing is just using a for loop:

function interleave(a::Vector{T}, b::Vector{T}) where T
    length(a) == length(b) || error("length mismatch")
    c = Vector{T}(undef, 2*length(a))
    i = 0
    for (x, y) in zip(a, b)
        c[i += 1] = x
        c[i += 1] = y
    end
    return c
end
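A quick check of the output order (the function repeated so the snippet runs standalone):

```julia
# interleave as defined above, repeated so this runs on its own.
function interleave(a::Vector{T}, b::Vector{T}) where T
    length(a) == length(b) || error("length mismatch")
    c = Vector{T}(undef, 2*length(a))
    i = 0
    for (x, y) in zip(a, b)
        c[i += 1] = x
        c[i += 1] = y
    end
    return c
end

interleave([1, 2, 3], [4, 5, 6])  # [1, 4, 2, 5, 3, 6]
```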

I threw a zip in there, which is a potentially problematic abstraction, but the optimizer is quite good at optimizing that despite the abstraction. On my machine, this is 12x faster than my_combine. It’s unclear that the zip version is doing much since it doesn’t actually collect the zipped pairs. The fact that it’s slow indicates that the optimizer isn’t getting rid of the for loop altogether, but it could do so since it just throws the tuples away.

Also, using Base is unnecessary since that’s done by default in all normal modules, including the interactive prompt (REPL).

3 Likes

As a comparison, the plain

collect(Iterators.flatten(zip(a,b)))

is somewhat faster than yours but has some overhead compared to Stefan’s. Maybe even the lazy Iterators.flatten(zip(a,b)) could be a solution, depending on what you want to do with the result.
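For example, the lazy iterator can be consumed directly, without ever materializing the combined array (a small sketch):

```julia
a = [1, 2, 3]; b = [4, 5, 6]

# Eager: materialize the interleaved array.
collect(Iterators.flatten(zip(a, b)))  # [1, 4, 2, 5, 3, 6]

# Lazy: consume the iterator without allocating the combined array.
sum(Iterators.flatten(zip(a, b)))      # 21
```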

6 Likes

It seems to me that you want to create a wrapper around the two arrays, instead of allocating a new array built from the other two, is that right?

I think you are losing a lot of performance due to the div and % operations, which are known to be slow, not just in Julia, but for all computers in general.
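To illustrate: in my_combine the divisor length(a._x) is not a compile-time constant, so the div/% can’t be strength-reduced to shifts. For a fixed two-array case they can be written as a right shift and a parity test. A hypothetical sketch (interleaved_get is not from the thread, just an illustration):

```julia
# Hypothetical two-array lookup: isodd and >> replace the generic % and div.
interleaved_get(a, b, i) = isodd(i) ? a[(i + 1) >> 1] : b[i >> 1]

a = [1, 2, 3]; b = [4, 5, 6]
[interleaved_get(a, b, i) for i in 1:6]  # [1, 4, 2, 5, 3, 6]
```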

1 Like

This is quite interesting. Apparently, when I use the collect function on the “Iterators.flatten(...)” and “my_combine” methods, they seem to become more efficient. I posted the three methods’ benchmarks (yours, Iterators.flatten(...); mine, my_combine; and Stefan’s, interleave), with and without collect, below (for my machine). For some reason, my_combine’s time and memory usage improve by a factor of three when using collect! To me this seems a bit counter-intuitive. Naively, I would think that collect would create a new Array object and thus be less efficient, since additional memory would be allocated. Unless, of course, all collect does is wrap an AbstractArray in an Array-like object. That explanation seems (to me at this hour of the night) to make some sense, because my_combine would have the most to gain from avoiding allocation of the full array (since it doesn’t allocate one when the my_combine object is created). Or perhaps this is only a fluke on my machine – weirder things have happened. Thoughts?

julia> r1 = rand(10000000); r2 = rand(10000000);

julia> @btime for a in my_combine(r1, r2)
       a
       end
  748.308 ms (40000005 allocations: 1.19 GiB)

julia> @btime for a in collect(my_combine(r1, r2))
       a
       end
  256.029 ms (20000007 allocations: 457.76 MiB)

julia> @btime for a in interleave(r1, r2)
       a
       end
  564.708 ms (39999491 allocations: 762.93 MiB)

julia> @btime for a in collect(interleave(r1, r2))
       a
       end
  616.393 ms (39999493 allocations: 915.52 MiB)

julia> @btime for a in Iterators.flatten(zip(r1, r2))
       a
       end
  913.383 ms (40000002 allocations: 2.09 GiB)

julia> @btime for a in collect(Iterators.flatten(zip(r1, r2)))
       a
       end
  569.172 ms (39999496 allocations: 762.93 MiB)

I think this is an artefact of the way you test them. Avoiding globals and wrapping the test in a function avoids the excessive allocation:

function f(r1, r2) 
    b = 0.0;
    for a in Iterators.flatten(zip(r1, r2))
        b += a
    end
    b # return something so the function is not optimised away
end
julia> @btime f($r1,$r2)
  2.525 μs (0 allocations: 0 bytes)

See https://docs.julialang.org/en/v1/manual/performance-tips/index.html#Avoid-global-variables-1

1 Like

If you additionally define:

Base.eltype(a::my_combine) = eltype(first(a._x))

collecting will yield comparable performance.

Before:

julia> f() = collect(my_combine(ra1,ra2))
f (generic function with 2 methods)

julia> @btime f()
  1.659 s (20000007 allocations: 457.76 MiB)

after:

julia> @btime f()
  57.812 ms (7 allocations: 152.59 MiB)

Boy, what a difference globals make:

julia> @btime f($r1, $r2) # my_combine
  23.978 ms (2 allocations: 48 bytes)

julia> @btime f($r1, $r2) # collect(my_combine)
  3.600 s (40000005 allocations: 762.94 MiB)

julia> @btime f($r1, $r2) # collect(my_combine) with laborg's suggestion
  90.611 ms (5 allocations: 152.59 MiB)

julia> @btime f($r1, $r2) # interleave
  78.977 ms (2 allocations: 152.59 MiB)

julia> @btime f($r1, $r2) # collect(interleave)
  142.290 ms (4 allocations: 305.18 MiB)

julia> @btime f($r1, $r2) # Iterators.flatten
  23.837 ms (0 allocations: 0 bytes)

julia> @btime f($r1, $r2) # collect(Iterators.flatten)
  91.188 ms (6 allocations: 152.59 MiB)

It seems, for me, that “flattening” and my_combine are on par with each other. And on a side note, why does specifying

Base.eltype(a::my_combine) = eltype(first(a._x))

improve the performance of collect?

– Oh yes, thanks everyone for tolerating my curiosity!

In the way you defined your my_combine struct, the element type isn’t inferred, so the collect machinery has to handle the contents of your array tuple as type Any. Defining eltype for your struct helps collect overcome this.

I guessed there might be a better way of defining my_combine using NTuple{N,T}, so that the extra eltype definition isn’t necessary, and indeed there is. With the following definition, the eltype method isn’t needed anymore:

struct my_combine_{N,T} <: AbstractVector{T}
       _x::NTuple{N,Vector{T}}
       _size::Int
       my_combine_(args...) = new{length(args),eltype(args[1])}(args, sum(length(i) for i in args))
end
Base.size(a::my_combine_) = (a._size,)
Base.getindex(a::my_combine_, i::Int) = a._x[(i-1)%length(a._x)+1][div(i-1, length(a._x))+1]
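A quick check that the element type is now concrete (definitions repeated so this runs standalone):

```julia
# Parametric version: the element type T is carried in the type itself.
struct my_combine_{N,T} <: AbstractVector{T}
    _x::NTuple{N,Vector{T}}
    _size::Int
    my_combine_(args...) = new{length(args),eltype(args[1])}(args, sum(length(i) for i in args))
end
Base.size(a::my_combine_) = (a._size,)
Base.getindex(a::my_combine_, i::Int) = a._x[(i-1)%length(a._x)+1][div(i-1, length(a._x))+1]

c = my_combine_([1.0, 2.0], [3.0, 4.0])
eltype(c)   # Float64, read straight off the type parameter
collect(c)  # a Vector{Float64}: [1.0, 3.0, 2.0, 4.0]
```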

I personally would use the simpler versions presented here (collect + flat iterator).

1 Like

It’s welcomed, not tolerated! Please don’t hesitate to ask questions about performance or anything else!

5 Likes

The iterator solution is actually almost optimal if you compare it to a hand-written version of what it does:

function interleave1(args::Vector{T}...) where T
    n = minimum(length.(args))
    c = Vector{T}(undef, length(args)*n)
    i = 0
    for x in zip(args...)
        for y in x
            c[i += 1] = y
        end
    end
    return c
end

interleave2(args...) = collect(Iterators.flatten(zip(args...)))
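Since zip accepts any number of iterators and truncates at the shortest one, interleave2 generalizes beyond two arrays of equal length, for example:

```julia
interleave2(args...) = collect(Iterators.flatten(zip(args...)))  # as defined above

interleave2([1, 2], [3, 4], [5, 6])  # [1, 3, 5, 2, 4, 6]
interleave2([1, 2, 3], [4, 5])       # [1, 4, 2, 5], truncated at the shorter input
```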

julia> @btime f($r1,$r2, interleave2)
  83.923 μs (6 allocations: 156.42 KiB)
10009.221967441981

julia> @btime f($r1,$r2, interleave1)
  66.413 μs (2 allocations: 156.33 KiB)
10009.221967441981
2 Likes