Type conversion for `Tuple` without `memcpy`

question
performance
llvm
tuple

#1

Consider the following:

t = (1, 2) # a Tuple{Int64, Int64}
@code_llvm identity(t)
@code_llvm Tuple{Int64, Int64}(t)
@code_llvm convert(Tuple{Int64, Int64}, t)

The LLVM IR for all of these is the same (apart from the function name):

define void @jlsys_convert_61082([2 x i64] addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)*, [2 x i64] addrspace(11)* nocapture readonly dereferenceable(16)) #0 !dbg !5 {
top:
  %3 = bitcast [2 x i64] addrspace(11)* %2 to i8 addrspace(11)*
  %4 = bitcast [2 x i64] addrspace(11)* %0 to i8 addrspace(11)*
  call void @llvm.memcpy.p11i8.p11i8.i32(i8 addrspace(11)* %4, i8 addrspace(11)* %3, i32 16, i32 1, i1 false)
  ret void
}

I would have expected this to be a no-op, as it is for Float64, Array, etc., but instead there’s an llvm.memcpy. Is this a performance bug, or is the copy somehow necessary? If it is necessary (which would be strange to me, since Tuples are immutable), is there a way to elide the copy in convert calls?
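
For reference, this is the no-op I mean in the Float64 case (a sketch; the exact function name and IR metadata vary by Julia version):

julia> @code_llvm identity(1.0)

define double @julia_identity(double) {
top:
  ret double %0
}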


#2

The llvm.memcpy is just how returning the aggregate is expressed at the IR level; the native code (which is what actually runs) does just two moves. Also, if you use the tuple for something, these moves are elided entirely:

julia> getfirst(t) = (a = identity(t); a[1])
getfirst (generic function with 1 method)


julia> @code_llvm getfirst((1,2))

define i64 @julia_getfirst_60706([2 x i64]* nocapture readonly dereferenceable(16)) #0 !dbg !5 {
top:
  %1 = getelementptr inbounds [2 x i64], [2 x i64]* %0, i64 0, i64 0
  %2 = load i64, i64* %1, align 8
  ret i64 %2
}
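
You can confirm the moves with @code_native. Roughly, on x86-64 it looks like this (a sketch; the exact registers, metadata, and whether it uses one 16-byte vector pair or two scalar pairs vary by CPU and Julia version):

julia> @code_native identity((1, 2))
	movups	(%rsi), %xmm0      # load the whole 16-byte tuple
	movups	%xmm0, (%rdi)      # store it into the caller-provided return slot
	retq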

#3

I see, thank you. That explains these surprising benchmark results:

using StaticArrays, BenchmarkTools
@benchmark 1 * identity(x) setup = (x = rand(SMatrix{3, 3}))
@benchmark identity(x) setup = (x = rand(SMatrix{3, 3}))

which gave 3.271 ns for the first and 9.493 ns for the second: doing extra work with the value is faster than the bare identity. So I guess it just doesn’t make sense to benchmark/analyze methods that are essentially identity by themselves, out of context.
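
In the same spirit, a fairer micro-benchmark lets the result feed into further work so the copy can be elided (a sketch; sum is just an arbitrary consumer):

using StaticArrays, BenchmarkTools
# Consuming the result gives the compiler context to elide the copy:
@benchmark sum(identity(x)) setup = (x = rand(SMatrix{3, 3}))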