I have a function “complement” that takes two tuples of ints and compares them element-wise to produce a third tuple. I would prefer to write it so that it works for tuples of any length, but the generic version is about 10x slower. The problem seems to be that, with a loop, an array is created first and then converted into a tuple.
Preferred implementation:
ac(a, b) = a == b ? a : 6 - a - b
function complement1(a, b)
    return Tuple(ac(a[k], b[k]) for k in 1:length(a))
end
julia> a = (1, 2, 3, 2); b = (2, 1, 3, 2);
julia> @btime complement1(a, b)
212.952 ns (2 allocations: 160 bytes)
(3, 3, 3, 2)
Fast implementation:
function complement4(a, b)
    return (ac(a[1], b[1]), ac(a[2], b[2]), ac(a[3], b[3]), ac(a[4], b[4]))
end
julia> @btime complement4(a, b)
22.970 ns (1 allocation: 48 bytes)
(3, 3, 3, 2)
Does anyone know a trick to get the compiler to unroll the loop or build the tuple directly, so that I can write the general version and still get close to the speed of the 4-tuple version?
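To make the question concrete, here is the kind of thing I have in mind: a minimal sketch that takes the length N from the NTuple{N} argument type and passes Val(N) to ntuple, so that the length is known at compile time. The name complement_ntuple is made up for illustration, and whether this actually matches the hand-written version is an assumption that has not been benchmarked here.

```julia
# Same element-wise rule as above.
ac(a, b) = a == b ? a : 6 - a - b

# Hypothetical name. N comes from the argument types, so inside the method
# the tuple length is a compile-time constant rather than a runtime value.
function complement_ntuple(a::NTuple{N,Int}, b::NTuple{N,Int}) where {N}
    # ntuple with a Val argument builds the tuple directly, with no array.
    return ntuple(k -> ac(a[k], b[k]), Val(N))
end

a = (1, 2, 3, 2); b = (2, 1, 3, 2)
complement_ntuple(a, b)  # (3, 3, 3, 2)
```

Timings would still need to be checked with interpolated arguments, e.g. @btime complement_ntuple($a, $b).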
By the way, I tried a few alternatives to the first version, but they didn’t help:
# adding type hints
function complement1a(a::NTuple{4, Int64}, b::NTuple{4, Int64})::NTuple{4, Int64}
    return Tuple(ac(a[k], b[k]) for k in 1:4)
end
# splatting the array before turning it into a tuple
function complement2(a, b)
    return tuple([ac(a[k], b[k]) for k in 1:4]...)
end
# zipping the inputs together first
ac2(a) = ac(a[1], a[2])
function complement3(a, b)
    return Tuple(ac2.(zip(a, b)))
end
julia> @btime complement1a($a, $b)
212.000 ns (2 allocations: 160 bytes)
(3, 3, 3, 2)
julia> @btime complement2($a, $b)
216.601 ns (2 allocations: 160 bytes)
(3, 3, 3, 2)
julia> @btime complement3($a, $b)
269.972 ns (4 allocations: 416 bytes)
(3, 3, 3, 2)
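Two further variants skip explicit indexing altogether, since both map and dot-broadcasting return a tuple when given tuples. This is only a sketch: the names complement_map and complement_bcast are hypothetical, and neither has been benchmarked here.

```julia
# Same element-wise rule as above.
ac(a, b) = a == b ? a : 6 - a - b

# map over two tuples applies ac pairwise and returns a tuple.
complement_map(a, b) = map(ac, a, b)

# Broadcasting ac over two tuples also returns a tuple.
complement_bcast(a, b) = ac.(a, b)

a = (1, 2, 3, 2); b = (2, 1, 3, 2)
complement_map(a, b)    # (3, 3, 3, 2)
complement_bcast(a, b)  # (3, 3, 3, 2)
```

Both work for tuples of any length without a type annotation, which is the generality the first version was after.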
(Using Julia 1.5.1 on Windows.)