Efficient tuple concatenation

pablosanjose · August 15, 2017, 9:03pm

I needed a way to concatenate an arbitrary number of tuples. I came up with:

tuplejoin(t1::Tuple, t2::Tuple, t3...) = tuplejoin((t1..., t2...), t3...)
tuplejoin(t::Tuple) = t

This works

julia> tuplejoin((1,2),(3,4),(5,6))
(1, 2, 3, 4, 5, 6)

However, while it is fast for two or three tuples, both the run time and the number of allocations grow quickly with each additional tuple:

julia> using BenchmarkTools

julia> @btime tuplejoin((1,2),(1,2));
  1.609 ns (0 allocations: 0 bytes)

julia> @btime tuplejoin((1,2),(1,2),(1,2));
  6.962 ns (1 allocation: 64 bytes)

julia> @btime tuplejoin((1,2),(1,2),(1,2),(1,2));
  277.876 ns (6 allocations: 320 bytes)

julia> @btime tuplejoin((1,2),(1,2),(1,2),(1,2),(1,2));
  730.792 ns (11 allocations: 608 bytes)

julia> @btime tuplejoin((1,2),(1,2),(1,2),(1,2),(1,2),(1,2));
  1.319 μs (17 allocations: 976 bytes)

Is there a better way to do this so it scales well with the number of tuples?

Michael_Eastwood · August 15, 2017, 10:12pm

Something like this?

julia> using BenchmarkTools

julia> @inline tuplejoin(x) = x
       @inline tuplejoin(x, y) = (x..., y...)
       @inline tuplejoin(x, y, z...) = tuplejoin(tuplejoin(x, y), z...)
tuplejoin (generic function with 3 methods)

julia> @btime tuplejoin((1,2),(1,2));
  2.374 ns (0 allocations: 0 bytes)

julia> @btime tuplejoin((1,2),(1,2),(1,2),(1,2),(1,2),(1,2));
  4.260 ns (0 allocations: 0 bytes)

pablosanjose · August 15, 2017, 11:00pm

Thanks Michael! This indeed works beautifully in Julia 0.6. However I would have thought it would be equivalent to my solution (which I think is inlined automatically). Apparently, however, there is some subtle difference, although I’m not sure where.

What is puzzling is that under current master both solutions are indeed the same… and slow. I guess this is a regression?

julia> using BenchmarkTools

julia> @inline tuplejoin(x) = x
tuplejoin (generic function with 1 method)

julia> @inline tuplejoin(x, y) = (x..., y...)
tuplejoin (generic function with 2 methods)

julia> @inline tuplejoin(x, y, z...) = tuplejoin(tuplejoin(x, y), z...)
tuplejoin (generic function with 3 methods)

julia> @btime tuplejoin((1,2),(1,2),(1,2),(1,2),(1,2),(1,2));
  1.157 μs (17 allocations: 976 bytes)

julia> versioninfo()
Julia Version 0.7.0-DEV.1165
Commit 1a43098cf7 (2017-07-31 03:33 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin13.4.0)
  CPU: Intel(R) Core(TM) i7-5775R CPU @ 3.30GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, broadwell)
Environment:

EDIT: for comparison, v0.6

julia> using BenchmarkTools

julia> @inline tuplejoin(x) = x
tuplejoin (generic function with 1 method)

julia> @inline tuplejoin(x, y) = (x..., y...)
tuplejoin (generic function with 2 methods)

julia> @inline tuplejoin(x, y, z...) = tuplejoin(tuplejoin(x, y), z...)
tuplejoin (generic function with 3 methods)

julia> @btime tuplejoin((1,2),(1,2),(1,2),(1,2),(1,2),(1,2));
  3.720 ns (0 allocations: 0 bytes)

julia> versioninfo()
Julia Version 0.6.0
Commit 903644385b (2017-06-19 13:05 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin13.4.0)
  CPU: Intel(R) Core(TM) i7-5775R CPU @ 3.30GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, broadwell)

JeffreySarnoff · August 15, 2017, 11:18pm

particularly impressive is the 0 allocations

pablosanjose · August 15, 2017, 11:44pm

Absolutely! I hope we can get it back in 0.7!

mauro3 · August 16, 2017, 9:05am

Then you should file an issue.

pablosanjose · August 16, 2017, 9:29am

Jeffrey already did it for me

https://github.com/JuliaLang/julia/issues/23277

jameson · August 16, 2017, 3:44pm

This base case is backwards. It should be:

julia> @inline tuplejoin(x, y, z...) = (x..., tuplejoin(y, z...)...)

pablosanjose · August 16, 2017, 4:24pm

WOW! Indeed!

julia> using BenchmarkTools

julia> @inline tuplejoin(x) = x
tuplejoin (generic function with 1 method)

julia> @inline tuplejoin(x, y) = (x..., y...)
tuplejoin (generic function with 2 methods)

julia> @inline tuplejoin(x, y, z...) = (x..., tuplejoin(y, z...)...)
tuplejoin (generic function with 3 methods)

julia> @btime tuplejoin((1,2),(1,2),(1,2),(1,2),(1,2),(1,2));
  3.719 ns (0 allocations: 0 bytes)

julia> versioninfo()
Julia Version 0.7.0-DEV.1165
Commit 1a43098cf7 (2017-07-31 03:33 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin13.4.0)
  CPU: Intel(R) Core(TM) i7-5775R CPU @ 3.30GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, broadwell)
Environment:

But why does this work? I think there is some important lesson hiding here. Could you elaborate?

pablosanjose · August 16, 2017, 4:32pm

… But not quite. It’s indeed non-allocating in v0.7, until I concatenate tuples of different length

In v0.7

julia> @btime tuplejoin((1,2),(1,2,3),(1,2,4,5),(1,2),(1,2),(1,2));
  253.331 ns (1 allocation: 128 bytes)

While in v0.6

julia> @btime tuplejoin((1,2),(1,2,3),(1,2,4,5),(1,2),(1,2),(1,2));
  4.512 ns (0 allocations: 0 bytes)

jameson · August 16, 2017, 4:44pm

It doesn’t call itself recursively on new values, only existing ones. This allows inference to trivially prove that it won’t need to solve the halting problem in order to do constant propagation.
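
To make the difference concrete, here is a minimal sketch of the two recursion shapes side by side. The names joinleft and joinright are hypothetical, chosen only for this illustration; the method bodies are the ones posted above. In the left-nested form, every recursive call receives a tuple that was just built by the previous step, so its type is new at each level. In the right-nested form, every recursive call only sees a suffix of the original arguments.

```julia
# Left-nested form: the recursive call is made on a freshly built tuple,
# so each level of recursion sees a new, longer argument type.
joinleft(x) = x
joinleft(x, y) = (x..., y...)
joinleft(x, y, z...) = joinleft(joinleft(x, y), z...)

# Right-nested form: the recursive call is made only on the remaining
# original arguments, so the argument list shrinks at every level.
joinright(x) = x
joinright(x, y) = (x..., y...)
joinright(x, y, z...) = (x..., joinright(y, z...)...)

a = joinleft((1, 2), (3, 4), (5, 6))
b = joinright((1, 2), (3, 4), (5, 6))
```

Both forms return the same tuple; they differ only in what the compiler has to reason about while inferring the result type. How much that matters in practice depends on the Julia version.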

pablosanjose · August 16, 2017, 9:52pm

Thanks Jameson. I’ll write this down somewhere so that a future, smarter me can revisit it some day.

juthohaegeman · August 17, 2017, 5:24am

@lekland, the allocation with your second example is not because tuples of different length are involved, but because the length of the final output tuple is 15. For some reason, the type is only inferrable up to a total tuple length of 14 in v0.7 and up to length 15 in v0.6. Adding one more element in one of the tuples, or one more non-empty tuple, will also result in allocations in v0.6.

There is a constant tupletype_len in inference which controls the maximal length of tuples that inference can handle and whose value is set to 15. But apparently something changed in the type inference algorithm, so that this particular tuplejoin function can only be inferred up to tupletype_len - 1.
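
As a quick check of the arithmetic, the following sketch counts the elements in the mixed-length example and then asks the compiler which return type it infers. It assumes the right-nested tuplejoin definition from earlier in the thread. Base.return_types is a reflection helper whose result depends on the Julia version, so no particular inference result is claimed here.

```julia
# Right-nested definition from earlier in the thread.
@inline tuplejoin(x) = x
@inline tuplejoin(x, y) = (x..., y...)
@inline tuplejoin(x, y, z...) = (x..., tuplejoin(y, z...)...)

# The mixed-length example from the thread.
ts = ((1, 2), (1, 2, 3), (1, 2, 4, 5), (1, 2), (1, 2), (1, 2))

# Total length of the concatenated tuple: 2 + 3 + 4 + 2 + 2 + 2.
total = sum(length, ts)

joined = tuplejoin(ts...)

# Ask inference for the return type given these argument types.
# The result is version dependent, so it is only stored, not checked.
inferred = Base.return_types(tuplejoin, typeof(ts))
```

The total comes out to 15, which is the cutoff value mentioned above, rather than anything specific to the tuples having different lengths.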

pablosanjose · August 17, 2017, 12:59pm

Confirmed. Well, that was a coincidence! I had read about this cutoff in GitHub, but didn’t think to check.

Just to be clear, my real-world code does not need such long concatenations. My upper cutoff is typically 9 for the final length of the tuple.

djturizo · March 18, 2024, 3:24am

I was unsure whether to revive this topic, but I think it’s worth mentioning that as of Julia 1.10, the function Base.IteratorsMD.flatten reduces a tuple of tuples into a single concatenated tuple. For example:

julia> Base.IteratorsMD.flatten(((1,2),(3,4),(5,6)))
(1, 2, 3, 4, 5, 6)

The function does not work with variable tuple arguments, but I think it’s more common to find a tuple of tuples instead (like the Vararg type, for example). Even so, I think it’s worth knowing that there already is a Julia utility that handles arbitrary tuple concatenation. I do not know how performant it is compared to OP’s custom function though.

nsajko · March 18, 2024, 4:58am

Base.IteratorsMD.flatten is a Julia-internal implementation detail, i.e., it’s not a part of a public API and not meant to be used outside of the Julia implementation.
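
For code that should not rely on internals, one possible sketch is to splat a tuple of tuples into the variadic tuplejoin from earlier in the thread. This uses only the definitions already posted; the variable names tt and flat are illustrative.

```julia
# Right-nested definition from earlier in the thread.
@inline tuplejoin(x) = x
@inline tuplejoin(x, y) = (x..., y...)
@inline tuplejoin(x, y, z...) = (x..., tuplejoin(y, z...)...)

# A tuple of tuples, as in the flatten example.
tt = ((1, 2), (3, 4), (5, 6))

# Splatting turns the outer tuple into separate arguments.
flat = tuplejoin(tt...)
```

Note that with these three methods an empty outer tuple has no matching method, so tuplejoin() would throw a MethodError unless a zero-argument method is added.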

djturizo · January 17, 2025, 7:42pm

Let this be a reason to add it to the public API then. I find this function pretty convenient; it’s a shame that I had to dig through Julia’s source code to find it.
