Type conversion for `Tuple` without `memcpy`

Consider the following:

```julia
t = (1, 2)  # a Tuple{Int64, Int64}
@code_llvm identity(t)
@code_llvm Tuple{Int64, Int64}(t)
@code_llvm convert(Tuple{Int64, Int64}, t)
```

The LLVM for all of these is the same (apart from the definition name):

```llvm
define void @jlsys_convert_61082([2 x i64] addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)*, [2 x i64] addrspace(11)* nocapture readonly dereferenceable(16)) #0 !dbg !5 {
top:
  %3 = bitcast [2 x i64] addrspace(11)* %2 to i8 addrspace(11)*
  %4 = bitcast [2 x i64] addrspace(11)* %0 to i8 addrspace(11)*
  call void @llvm.memcpy.p11i8.p11i8.i32(i8 addrspace(11)* %4, i8 addrspace(11)* %3, i32 16, i32 1, i1 false)
  ret void
}
```

I would have expected this to be a no-op, as it is for `Float64`, `Array`, etc., but instead there's an `llvm.memcpy`. Is this a performance bug, or is the copy somehow necessary? If the latter (which would be strange to me, since `Tuple`s are immutable), is there a way to elide the copy in `convert` calls?
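
For comparison, this is the kind of no-op I had in mind (illustrative only; the exact IR depends on the Julia version):

```julia
# For a plain bitstype, the same-type convert is the identity and
# lowers to a bare return, with no memcpy:
@code_llvm convert(Float64, 1.0)
```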

The native code (which is what actually runs) is just two moves. And if you actually use the tuple for something, even those moves are elided:

```julia
julia> getfirst(t) = (a = identity(t); a[1])
getfirst (generic function with 1 method)

julia> @code_llvm getfirst((1,2))

define i64 @julia_getfirst_60706([2 x i64]* nocapture readonly dereferenceable(16)) #0 !dbg !5 {
top:
  %1 = getelementptr inbounds [2 x i64], [2 x i64]* %0, i64 0, i64 0
  %2 = load i64, i64* %1, align 8
  ret i64 %2
}
```
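
You can check the two moves yourself (output omitted here, since the exact instructions depend on the Julia/LLVM version and target CPU):

```julia
# Inspect the generated assembly for the same call locally:
@code_native identity((1, 2))
```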

I see, thank you. That explains these surprising benchmark results:

```julia
using StaticArrays, BenchmarkTools
@benchmark 1 * identity(x) setup = (x = rand(SMatrix{3, 3}))
@benchmark identity(x) setup = (x = rand(SMatrix{3, 3}))
```

which gave 3.271 ns for the first and 9.493 ns for the second, even though the first one does strictly more work. So I guess it just doesn't make sense to benchmark or analyze methods that are essentially `identity` by themselves, out of context.
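
A more meaningful measurement, I suppose, is to make the benchmark actually consume the result, along the lines of the `getfirst` example above (a sketch; numbers will vary):

```julia
using StaticArrays, BenchmarkTools

# Using the value (here via `sum`) lets the compiler elide the copy,
# just as in `getfirst` above:
@benchmark sum(identity(x)) setup = (x = rand(SMatrix{3, 3}))
```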