Combining two arrays with alternating elements

I wish to combine two arrays, let’s say a = [1, 2, 3, …] and b = [4, 5, 6, …], so that the elements alternate between them like so: c = [1, 4, 2, 5, 3, 6, …]. My first attempt was to make a “my_combine” type like so:

julia> using Base

julia> struct my_combine{T <: Tuple} <: AbstractArray{eltype(T), 1}
       _x::T
       _size::Int
       my_combine(args...) = new{typeof(args)}(args, sum(length(i) for i in args))
       end

julia> Base.getindex(a::my_combine, i::Int) = a._x[(i-1)%length(a._x)+1][div(i-1, length(a._x))+1]

julia> Base.size(a::my_combine) = (a._size,)
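As a quick sanity check (the definitions above repeated so this snippet runs on its own):

```julia
# Same definitions as above, repeated so the snippet is standalone.
struct my_combine{T <: Tuple} <: AbstractArray{eltype(T), 1}
    _x::T
    _size::Int
    my_combine(args...) = new{typeof(args)}(args, sum(length(i) for i in args))
end
Base.getindex(a::my_combine, i::Int) = a._x[(i-1)%length(a._x)+1][div(i-1, length(a._x))+1]
Base.size(a::my_combine) = (a._size,)

c = my_combine([1, 2, 3], [4, 5, 6])
collect(c)  # [1, 4, 2, 5, 3, 6]
```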

Which gives the desired effect. But when benchmarked, it seems to be a bit slow compared to other combining functions like zip:

julia> r1 = rand(1000); r2 = rand(1000);

julia> @time for a in my_combine(r1, r2)
       a
       end
  0.000117 seconds (4.00 k allocations: 125.109 KiB)

“my_combine” seems to be about twice as slow as zip, and uses considerably more memory:

julia> @time for (a,b) in zip(r1, r2)
       (a,b)
       end
  0.000053 seconds (2.00 k allocations: 78.156 KiB)

I’m still a bit new to Julia, so I’m sure I’m missing some performance tricks. Any thoughts?

1 Like

Doing vectorized indexing is not generally the fastest way to do things in Julia. It’s just the way you’re forced to do things in “vectorized languages” in order to not be terribly slow. You can do that in Julia and it will be just as fast, but the simplest thing you could possibly do is often far faster. In this case, the simplest possible thing is just using a for loop:

function interleave(a::Vector{T}, b::Vector{T}) where T
    length(a) == length(b) || error("length mismatch")
    c = Vector{T}(undef, 2*length(a))
    i = 0
    for (x, y) in zip(a, b)
        c[i += 1] = x
        c[i += 1] = y
    end
    return c
end
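A quick check of the output order (the function repeated so the snippet runs standalone):

```julia
# interleave as defined above, repeated so this runs on its own.
function interleave(a::Vector{T}, b::Vector{T}) where T
    length(a) == length(b) || error("length mismatch")
    c = Vector{T}(undef, 2*length(a))
    i = 0
    for (x, y) in zip(a, b)
        c[i += 1] = x
        c[i += 1] = y
    end
    return c
end

interleave([1, 2, 3], [4, 5, 6])  # [1, 4, 2, 5, 3, 6]
```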

I threw a zip in there, which is a potentially problematic abstraction, but the optimizer is quite good at optimizing that despite the abstraction. On my machine, this is 12x faster than my_combine. It’s unclear that the zip version is doing much since it doesn’t actually collect the zipped pairs. The fact that it’s slow indicates that the optimizer isn’t getting rid of the for loop altogether, but it could do so since it just throws the tuples away.

Also, using Base is unnecessary since that’s done by default in all normal modules, including the interactive prompt (REPL).

3 Likes

As a comparison, the plain

collect(Iterators.flatten(zip(a,b)))

is somewhat faster than yours but has some overhead compared to Stefan’s. Maybe even the lazy Iterators.flatten(zip(a,b)) could be a solution, depending on what you want to do with the result.
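For example, the lazy iterator can be consumed directly, without ever materializing the combined array (a small sketch):

```julia
a = [1, 2, 3]; b = [4, 5, 6]

# Eager: materialize the interleaved array.
collect(Iterators.flatten(zip(a, b)))  # [1, 4, 2, 5, 3, 6]

# Lazy: consume the iterator without allocating the combined array.
sum(Iterators.flatten(zip(a, b)))      # 21
```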

6 Likes

It seems to me that you want to create a wrapper around the two arrays, instead of allocating a new array built from the other two, is that right?

I think you are losing a lot of performance due to the div and % operations, which are known to be slow, not just in Julia, but for all computers in general.
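To illustrate: in my_combine the divisor length(a._x) is not a compile-time constant, so the div/% can’t be strength-reduced to shifts. For a fixed two-array case they can be written as a right shift and a parity test. A hypothetical sketch (interleaved_get is not from the thread, just an illustration):

```julia
# Hypothetical two-array lookup: isodd and >> replace the generic % and div.
interleaved_get(a, b, i) = isodd(i) ? a[(i + 1) >> 1] : b[i >> 1]

a = [1, 2, 3]; b = [4, 5, 6]
[interleaved_get(a, b, i) for i in 1:6]  # [1, 4, 2, 5, 3, 6]
```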

1 Like

This is quite interesting. Apparently, when I use the collect function on the “Iterators.flatten(...)” and “my_combine” methods, they seem to become more efficient. I posted the three methods’ benchmarks (yours, Iterators.flatten(...); mine, my_combine; and Stefan’s, interleave), with and without collect, below (for my machine). For some reason, my_combine’s time and memory usage improve by a factor of three when using collect! To me this seems a bit counter-intuitive. Naively, I would think that collect would create a new Array object and thus be less efficient, since additional memory would be allocated. Unless, of course, all collect does is wrap an AbstractArray in an Array-like object. That explanation seems (to me at this hour of the night) to make some sense, because my_combine would have the most to gain from avoiding allocation of the full array (since it doesn’t allocate one when the my_combine object is created). Or perhaps this is only a fluke on my machine – weirder things have happened. Thoughts?

julia> r1 = rand(10000000); r2 = rand(10000000);

julia> @btime for a in my_combine(r1, r2)
       a
       end
  748.308 ms (40000005 allocations: 1.19 GiB)

julia> @btime for a in collect(my_combine(r1, r2))
       a
       end
  256.029 ms (20000007 allocations: 457.76 MiB)

julia> @btime for a in interleave(r1, r2)
       a
       end
  564.708 ms (39999491 allocations: 762.93 MiB)

julia> @btime for a in collect(interleave(r1, r2))
       a
       end
  616.393 ms (39999493 allocations: 915.52 MiB)

julia> @btime for a in Iterators.flatten(zip(r1, r2))
       a
       end
  913.383 ms (40000002 allocations: 2.09 GiB)

julia> @btime for a in collect(Iterators.flatten(zip(r1, r2)))
       a
       end
  569.172 ms (39999496 allocations: 762.93 MiB)

I think this is an artefact of the way you test them. Avoiding globals and wrapping the test in a function avoids the excessive allocation:

function f(r1, r2) 
    b = 0.0;
    for a in Iterators.flatten(zip(r1, r2))
        b += a
    end
    b # return something so the function is not optimised away
end
julia> @btime f($r1,$r2)
  2.525 μs (0 allocations: 0 bytes)

See https://docs.julialang.org/en/v1/manual/performance-tips/index.html#Avoid-global-variables-1

1 Like

If you additionally define:

Base.eltype(a::my_combine) = eltype(first(a._x))

collecting will yield comparable performance.

Before:

julia> f() = collect(my_combine(ra1,ra2))
f (generic function with 2 methods)

julia> @btime f()
  1.659 s (20000007 allocations: 457.76 MiB)

after:

julia> @btime f()
  57.812 ms (7 allocations: 152.59 MiB)

Boy, what a difference globals make:

julia> @btime f($r1, $r2) # my_combine
  23.978 ms (2 allocations: 48 bytes)

julia> @btime f($r1, $r2) # collect(my_combine)
  3.600 s (40000005 allocations: 762.94 MiB)

julia> @btime f($r1, $r2) # collect(my_combine) with laborg's suggestion
  90.611 ms (5 allocations: 152.59 MiB)

julia> @btime f($r1, $r2) # interleave
  78.977 ms (2 allocations: 152.59 MiB)

julia> @btime f($r1, $r2) # collect(interleave)
  142.290 ms (4 allocations: 305.18 MiB)

julia> @btime f($r1, $r2) # Iterators.flatten
  23.837 ms (0 allocations: 0 bytes)

julia> @btime f($r1, $r2) # collect(Iterators.flatten)
  91.188 ms (6 allocations: 152.59 MiB)

It seems, for me, that “flattening” and my_combine are on par with each other. And on a side note, why does specifying

Base.eltype(a::my_combine) = eltype(first(a._x))

improve the performance of collect?

– Oh yes, thanks everyone for tolerating my curiosity!

In the way you defined your my_combine struct, the element type isn’t inferred, so the collect machinery has to handle the contents of your array tuple as type Any. Defining eltype for your struct helps collect overcome this.

I guessed there might be a better way of defining my_combine using NTuple{N,T}, so that the extra eltype definition isn’t necessary, and indeed there is. With the following definition, the eltype method isn’t needed anymore:

struct my_combine_{N,T} <: AbstractVector{T}
       _x::NTuple{N,Vector{T}}
       _size::Int
       my_combine_(args...) = new{length(args),eltype(args[1])}(args, sum(length(i) for i in args))
end
Base.size(a::my_combine_) = (a._size,)
Base.getindex(a::my_combine_, i::Int) = a._x[(i-1)%length(a._x)+1][div(i-1, length(a._x))+1]
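A quick check that the element type is now concrete (definitions repeated so this runs standalone):

```julia
# Parametric version: the element type T is carried in the type itself.
struct my_combine_{N,T} <: AbstractVector{T}
    _x::NTuple{N,Vector{T}}
    _size::Int
    my_combine_(args...) = new{length(args),eltype(args[1])}(args, sum(length(i) for i in args))
end
Base.size(a::my_combine_) = (a._size,)
Base.getindex(a::my_combine_, i::Int) = a._x[(i-1)%length(a._x)+1][div(i-1, length(a._x))+1]

c = my_combine_([1.0, 2.0], [3.0, 4.0])
eltype(c)   # Float64, read straight off the type parameter
collect(c)  # a Vector{Float64}: [1.0, 3.0, 2.0, 4.0]
```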

I personally would use the simpler versions presented here (collect + flat iterator).

1 Like

It’s welcomed, not tolerated! Please don’t hesitate to ask questions about performance or anything else!

5 Likes

The iterator solution is actually almost optimal if you compare it to a hand-written version of what it does:

function interleave1(args::Vector{T}...) where T
    n = minimum(length.(args))
    c = Vector{T}(undef, length(args)*n)
    i = 0
    for x in zip(args...)
        for y in x
            c[i += 1] = y
        end
    end
    return c
end

interleave2(args...) = collect(Iterators.flatten(zip(args...)))
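Since zip accepts any number of iterators and truncates at the shortest one, interleave2 generalizes beyond two arrays of equal length, for example:

```julia
interleave2(args...) = collect(Iterators.flatten(zip(args...)))  # as defined above

interleave2([1, 2], [3, 4], [5, 6])  # [1, 3, 5, 2, 4, 6]
interleave2([1, 2, 3], [4, 5])       # [1, 4, 2, 5], truncated at the shorter input
```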

julia> @btime f($r1,$r2, interleave2)
  83.923 μs (6 allocations: 156.42 KiB)
10009.221967441981

julia> @btime f($r1,$r2, interleave1)
  66.413 μs (2 allocations: 156.33 KiB)
10009.221967441981
2 Likes