Efficient tuple concatenation


#1

I was needing a way to concatenate an undefined number of tuples. I came up with

tuplejoin(t1::Tuple, t2::Tuple, t3...) = tuplejoin((t1..., t2...), t3...)
tuplejoin(t::Tuple) = t

This works

julia> tuplejoin((1,2),(3,4),(5,6))
(1, 2, 3, 4, 5, 6)

However, while it is fast for two or three tuples, the time it takes grows very fast with additional tuples

julia> using BenchmarkTools

julia> @btime tuplejoin((1,2),(1,2));
  1.609 ns (0 allocations: 0 bytes)

julia> @btime tuplejoin((1,2),(1,2),(1,2));
  6.962 ns (1 allocation: 64 bytes)

julia> @btime tuplejoin((1,2),(1,2),(1,2),(1,2));
  277.876 ns (6 allocations: 320 bytes)

julia> @btime tuplejoin((1,2),(1,2),(1,2),(1,2),(1,2));
  730.792 ns (11 allocations: 608 bytes)

julia> @btime tuplejoin((1,2),(1,2),(1,2),(1,2),(1,2),(1,2));
  1.319 μs (17 allocations: 976 bytes)

Is there a better way to do this so it scales well with the number of tuples?


#2

Something like this?

julia> using BenchmarkTools

julia> @inline tuplejoin(x) = x
       @inline tuplejoin(x, y) = (x..., y...)
       @inline tuplejoin(x, y, z...) = tuplejoin(tuplejoin(x, y), z...)
tuplejoin (generic function with 3 methods)

julia> @btime tuplejoin((1,2),(1,2));
  2.374 ns (0 allocations: 0 bytes)

julia> @btime tuplejoin((1,2),(1,2),(1,2),(1,2),(1,2),(1,2));
  4.260 ns (0 allocations: 0 bytes)

#3

Thanks Michael! This indeed works beautifully in Julia 0.6. However I would have thought it would be equivalent to my solution (which I think is inlined automatically). Apparently, however, there is some subtle difference, although I’m not sure where.

What is puzzling is that under current master both solutions are indeed the same… and slow. I guess this is a regression?

julia> using BenchmarkTools

julia> @inline tuplejoin(x) = x
tuplejoin (generic function with 1 method)

julia> @inline tuplejoin(x, y) = (x..., y...)
tuplejoin (generic function with 2 methods)

julia> @inline tuplejoin(x, y, z...) = tuplejoin(tuplejoin(x, y), z...)
tuplejoin (generic function with 3 methods)

julia> @btime tuplejoin((1,2),(1,2),(1,2),(1,2),(1,2),(1,2));
  1.157 μs (17 allocations: 976 bytes)

julia> versioninfo()
Julia Version 0.7.0-DEV.1165
Commit 1a43098cf7 (2017-07-31 03:33 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin13.4.0)
  CPU: Intel(R) Core(TM) i7-5775R CPU @ 3.30GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, broadwell)
Environment:

EDIT: for comparison, v0.6

julia> using BenchmarkTools

julia> @inline tuplejoin(x) = x
tuplejoin (generic function with 1 method)

julia> @inline tuplejoin(x, y) = (x..., y...)
tuplejoin (generic function with 2 methods)

julia> @inline tuplejoin(x, y, z...) = tuplejoin(tuplejoin(x, y), z...)
tuplejoin (generic function with 3 methods)

julia> @btime tuplejoin((1,2),(1,2),(1,2),(1,2),(1,2),(1,2));
  3.720 ns (0 allocations: 0 bytes)

julia> versioninfo()
Julia Version 0.6.0
Commit 903644385b (2017-06-19 13:05 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin13.4.0)
  CPU: Intel(R) Core(TM) i7-5775R CPU @ 3.30GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, broadwell)

#4

particularly impressive is the 0 allocations


#5

Absolutely! I wish we can get it back in 0.7!


#6

Then you should file an issue.


#7

Jeffrey already did it for me


#8

This base case is backwards. It should be:

julia> @inline tuplejoin(x, y, z...) = (x..., tuplejoin(y, z...)...)

#9

WOW! Indeed!

julia> using BenchmarkTools

julia> @inline tuplejoin(x) = x
tuplejoin (generic function with 1 method)

julia> @inline tuplejoin(x, y) = (x..., y...)
tuplejoin (generic function with 2 methods)

julia> @inline tuplejoin(x, y, z...) = (x..., tuplejoin(y, z...)...)
tuplejoin (generic function with 3 methods)

julia> @btime tuplejoin((1,2),(1,2),(1,2),(1,2),(1,2),(1,2));
  3.719 ns (0 allocations: 0 bytes)

julia> versioninfo()
Julia Version 0.7.0-DEV.1165
Commit 1a43098cf7 (2017-07-31 03:33 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin13.4.0)
  CPU: Intel(R) Core(TM) i7-5775R CPU @ 3.30GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, broadwell)
Environment:

But why does this work?? I think there is some important lesson hiding here. Could you ellaborate?


#10

… But not quite. It’s indeed non-allocating in v0.7, until I concatenate tuples of different length

In v0.7

julia> @btime tuplejoin((1,2),(1,2,3),(1,2,4,5),(1,2),(1,2),(1,2));
  253.331 ns (1 allocation: 128 bytes)

While in v0.6

julia> @btime tuplejoin((1,2),(1,2,3),(1,2,4,5),(1,2),(1,2),(1,2));
  4.512 ns (0 allocations: 0 bytes)

#11

It doesn’t call itself recursively on new values, only existing ones. This allows inference to trivially prove that it won’t need to solve the halting problem in order to do constant propagation.


#12

Thanks Jameson. I’ll write this down somewhere so that a future, smarter me can revisit it some day. :yum:


#13

@lekland, the allocation with your second example is not because tuples of different length are involved, but because the length of the final output tuple is 15. For some reason, the type is only inferrable up to a total tuple length of 14 in v0.7 and up to length 15 in v0.6. Adding one more element in one of the tuples, or one more non-empty tuple, will also result in allocations in v0.6.

There is a constant tupletype_len in inference which controls the maximal length of tuples that inference can handle and whose value is set to 15. But apparently something changed in the type inference algorithm by which this particular typejoin function can only be inferred up to tupletype_len - 1.


#14

Confirmed. Well, that was a coincidence! I had read about this cutoff in GitHub, but didn’t think to check.

Just to be clear, my real-world code does not need such long concatenations. My upper cutoff is typically 9 for the final length of the tuple.