tldr: I’m trying to find the most performant way to index `Tuple.parameters` in a struct similar to `SArray{S<:Tuple}`, where I want `f(::SArray{S}, i::Int) where S = S.parameters[i]`. All my current solutions require a lot of boilerplate and don’t scale well to more dimensions.
Potential Solutions
- Stick with `Tuple.parameters[i]`.
- Use `Val` to map each `Tuple` element. This doesn’t seem to allow much flexibility in terms of iteration, because even compile-time-known sequences of iterations have to use dynamic dispatch to create the `Val(i)`. Also, this seems really expensive to compile, because it takes a while the first time around on my computer.
- Create a jump table using `if`/`else` statements to map each `Tuple` element. This seems performant, but I find it harder to metaprogram for multiple dimensions than the `Val` solution, because I have to edit quote blocks, get into the gritty details of expressions, and write out each level of dimensionality by hand (e.g., a separate function for the 1-, 2-, 3-… dimensional cases). It seems like there should be a macro to help with this, but I would still need to compose the function head with the contents of the tuple.
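On the third option: a `@generated` function may be able to build the `if`/`else` jump table from the type parameters automatically, so no per-dimension version has to be written by hand. A minimal sketch, assuming a toy wrapper type like the `TmpType` used below (`jumpfunc` is a hypothetical name, not anything from the original code):

```julia
# Toy type whose single parameter carries a Tuple of integers.
struct TmpType{T} end

# Sketch: generate the if/else chain `i == 1 ? p1 : i == 2 ? p2 : ...`
# directly from S.parameters at compile time, once per concrete type.
@generated function jumpfunc(::TmpType{S}, i::Int) where S
    ex = :(throw(BoundsError()))                 # fallthrough for out-of-range i
    for k in length(S.parameters):-1:1
        ex = :(i == $k ? $(S.parameters[k]) : $ex)
    end
    return ex
end
```

This is only a sketch of the idea, not a benchmarked solution; it trades the quote-block editing for one generated body that scales with the tuple length.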
Progress of Current Investigation
When I had my new friend Traceur take a look at things, I found warnings (*gasp*).
struct TmpType{T} end
t = TmpType{Tuple{1,2,3}}()
julia> @inline dumbfunc(t::TmpType{T}, i::Int) where T = T.parameters[i]
dumbfunc (generic function with 1 method)
julia> @trace dumbfunc(t, 2)
┌ Warning: unsafe_pointer_to_objref returns Any
└ @ pointer.jl:128
┌ Warning: getindex returns Any
└ @ essentials.jl:549
┌ Warning: dumbfunc returns Any
└ @ REPL[9]:1
2
julia> @benchmark dumbfunc(t, 2)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 37.154 ns (0.00% GC)
median time: 37.900 ns (0.00% GC)
mean time: 38.525 ns (0.00% GC)
maximum time: 170.946 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 992
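One caveat about these numbers: `t` is a non-const global, so `@benchmark dumbfunc(t, 2)` also pays the lookup and dynamic dispatch on that global, which likely accounts for a chunk of the ~37 ns. A sketch of the interpolated form (self-contained; assumes BenchmarkTools is installed):

```julia
using BenchmarkTools

struct TmpType{T} end
dumbfunc(::TmpType{T}, i::Int) where T = T.parameters[i]
t = TmpType{Tuple{1,2,3}}()

# `$t` splices the value in as a local, so the benchmark measures the
# call itself rather than lookup/dispatch on the non-const global `t`.
@benchmark dumbfunc($t, 2)
```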
Alternatively, using `Val`…
julia> dumbfunc2(t::TmpType{T}, ::Val{I}) where {T,I} = T.parameters[I]
dumbfunc2 (generic function with 1 method)
julia> i = Val(2)
Val{2}()
julia> @trace dumbfunc2(t, i)
┌ Warning: unsafe_pointer_to_objref returns Any
└ @ pointer.jl:128
┌ Warning: getindex returns Any
└ @ essentials.jl:549
2
julia> @benchmark dumbfunc2(t, i)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 38.140 ns (0.00% GC)
median time: 38.990 ns (0.00% GC)
mean time: 41.482 ns (0.00% GC)
maximum time: 140.588 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 992
So that's roughly the same. But then I had a crazy idea for handling the `unsafe_pointer_to_objref` warning.
for i in 1:3
    eval(:(dumbfunc3(::TmpType{Tuple{I1,I2,I3}}, ::Val{$i}) where {I1,I2,I3} = $(Symbol("I",i))::Int))
end
julia> @trace dumbfunc3(t, i)
2
julia> @benchmark dumbfunc3(t, i)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 18.130 ns (0.00% GC)
median time: 18.368 ns (0.00% GC)
mean time: 18.408 ns (0.00% GC)
maximum time: 53.923 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 997
This finally gets us some speed, because no indexing has to occur at runtime. But it is SUPER restrictive, and all of that performance is lost once I try iterating through each value.
@inline function itr_with_index(x::TmpType)
    for idx in 1:3
        dumbfunc(x, idx)
    end
end

@inline function itr_with_val(x::TmpType)
    for idx in 1:3
        dumbfunc3(x, Val(idx))
    end
end
julia> @benchmark itr_with_index(t)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 127.024 ns (0.00% GC)
median time: 128.688 ns (0.00% GC)
mean time: 130.207 ns (0.00% GC)
maximum time: 325.132 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 891
julia> @benchmark itr_with_val(t)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 7.782 μs (0.00% GC)
median time: 7.868 μs (0.00% GC)
mean time: 7.972 μs (0.00% GC)
maximum time: 41.136 μs (0.00% GC)
--------------
samples: 10000
evals/sample: 4
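The slowdown here comes from building `Val(idx)` out of a runtime `idx`, which forces dynamic dispatch on every call. When the trip count is itself a compile-time constant, one workaround is to recurse on the `Val` instead of looping, so every dispatch stays static. A sketch using the same toy definitions (`itr_val_static` and `_itr` are hypothetical names):

```julia
struct TmpType{T} end
dumbfunc2(::TmpType{T}, ::Val{I}) where {T,I} = T.parameters[I]

# Recurse on Val: each Val(I + 1) is built from the compile-time
# constant I, so the compiler can resolve (and unroll) every call.
itr_val_static(x::TmpType) = _itr(x, Val(1))
_itr(x::TmpType, ::Val{4}) = nothing              # base case: stop after index 3
function _itr(x::TmpType, ::Val{I}) where I
    dumbfunc2(x, Val(I))
    _itr(x, Val(I + 1))
end
```

I haven't benchmarked this variant here; it is only a sketch of how the dynamic `Val(idx)` construction can be avoided.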
So maybe I should use `if`/`else` instead.
Base.@pure function dumbfunc4(::TmpType{Tuple{I1,I2,I3}}, i::Int) where {I1,I2,I3}
    if i == 1
        I1::Int
    elseif i == 2
        I2::Int
    elseif i == 3
        I3::Int
    end
end

@inline function itr_with_control_flow(x::TmpType)
    for idx in 1:3
        dumbfunc4(x, idx)
    end
end
julia> @benchmark itr_with_index(t)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 126.215 ns (0.00% GC)
median time: 127.884 ns (0.00% GC)
mean time: 134.846 ns (0.00% GC)
maximum time: 285.840 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 896
julia> @benchmark itr_with_control_flow(t)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 70.126 ns (0.00% GC)
median time: 70.339 ns (0.00% GC)
mean time: 71.062 ns (0.00% GC)
maximum time: 205.784 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 975
This is the best option for iteration, and single-value indexing is comparable.
julia> @benchmark dumbfunc3(t, i)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 19.689 ns (0.00% GC)
median time: 19.742 ns (0.00% GC)
mean time: 20.376 ns (0.00% GC)
maximum time: 87.241 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 997
julia> @benchmark dumbfunc4(t, 2)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 19.716 ns (0.00% GC)
median time: 20.045 ns (0.00% GC)
mean time: 20.127 ns (0.00% GC)
maximum time: 71.109 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 997
Now let's see how this scales.
function big_itr_with_index(x::TmpType)
    for i in 1:1_000_000
        itr_with_index(x)
    end
end

function big_itr_with_control_flow(x::TmpType)
    for i in 1:1_000_000
        itr_with_control_flow(x)
    end
end
julia> @benchmark big_itr_with_index(t)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 106.994 ms (0.00% GC)
median time: 107.456 ms (0.00% GC)
mean time: 107.868 ms (0.00% GC)
maximum time: 114.997 ms (0.00% GC)
--------------
samples: 47
evals/sample: 1
julia> @benchmark big_itr_with_control_flow(t)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 54.505 ms (0.00% GC)
median time: 55.100 ms (0.00% GC)
mean time: 55.156 ms (0.00% GC)
maximum time: 57.452 ms (0.00% GC)
--------------
samples: 91
evals/sample: 1
So the `if`/`else` jump table is roughly twice as fast.
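For completeness, one more direction that might scale to any number of dimensions without hand-written jump tables: materialize the type parameters as a value tuple once per type, then use ordinary tuple indexing. A sketch (`astuple` and `getparam` are hypothetical names, not part of the code above):

```julia
struct TmpType{T} end

# Generates the literal tuple (1, 2, 3) for TmpType{Tuple{1,2,3}}:
# the parameters are spliced into a tuple expression at compile time.
@generated astuple(::TmpType{T}) where T = Expr(:tuple, T.parameters...)

# Ordinary, bounds-checked tuple indexing on the materialized tuple.
getparam(x::TmpType, i::Int) = astuple(x)[i]
```

I haven't benchmarked this against `dumbfunc4`, but it avoids writing a separate method per dimensionality.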