TLDR
High level iterators (values(dict), etc.) are potentially >3x slower than they could be… Why and how could they be improved?
Let me lead with the real near term question:
Why is the “ideal” iterator >3x slower than the “entitlement” iterator:
using BenchmarkTools
function functional_value_iterate(s::Int,L::Int,slots,vals)
@assert L < typemax(Int)
if L != typemax(Int)
for i=s:L
if @inbounds slots[i] == 0x1
return (@inbounds vals[i], i+1)
end
end
end
nothing
end
function ideal_iterator_sum(iter::Base.ValueIterator{<:Dict})
ret=0
# Start
d=iter.dict
slots=d.slots
vals=d.vals
L=length(slots)
state=d.idxfloor
next=nothing
@goto iterate
while true
(v, state) = next
ret+=v
@label iterate
next = functional_value_iterate(state,L,slots,vals)
if next === nothing
break
end
end
return ret
end
function entitlement_iterator_sum(iter::Base.ValueIterator{<:Dict})
ret=0
d=iter.dict
slots=d.slots
vals=d.vals
L=length(slots)
v=valtype(d)(0)
for state = d.idxfloor:L
if @inbounds !(slots[state]==0x1)
continue
end
v=@inbounds vals[state]
ret += v
end
return ret
end
Results:
julia> @benchmark ideal_iterator_sum(values(d))
BenchmarkTools.Trial:
memory estimate: 32 bytes
allocs estimate: 2
--------------
minimum time: 2.699 μs (0.00% GC)
median time: 2.916 μs (0.00% GC)
mean time: 3.469 μs (0.00% GC)
maximum time: 59.679 μs (0.00% GC)
--------------
samples: 10000
evals/sample: 9
julia> @benchmark entitlement_iterator_sum(values(d))
BenchmarkTools.Trial:
memory estimate: 32 bytes
allocs estimate: 2
--------------
minimum time: 794.000 ns (0.00% GC)
median time: 860.106 ns (0.00% GC)
mean time: 968.261 ns (1.71% GC)
maximum time: 166.960 μs (99.02% GC)
--------------
samples: 10000
evals/sample: 90
julia>
Background idea:
So I have had a strange obsession recently with the idea that many higher level iterators could be on the order of >6x slower than what their entitlement loops could be.
One thought was could there be an iterator paradigm that with a little magic be easier for the JIT to optimize. Mainly by simplifying the iterate method of higher order iterators to remove any prework that they would have to do.
A solution I thought up is what I am calling a “Functional State” Iterator, with the idea being that any prework that needs to be done in the iterator is done by the initial iterate call and an anonymous function f(state)
is created to tell the compiler that these values don’t change. This can work with the existing two argument iterate paradigm by being totally encapsulated in the iterator state as a tuple (f,state)
. With the reuse of the same anonymous function signaling to the compiler, certain optimizations could happen.
So with inlining and magic:
next = iterate(iter)
would become
boxed_vars=.... ; next = f(initial_state,boxed_vars...)
and next = iterate(iter,state )
becoming next = f(state,boxed_vars...)
This simplifies the following transformation:
next=iterate(iter)
while next !== nothing
v,state = next
....
next = iterate(iter,state)
end
can be transformed to:
boxed_vars = ...
state = initial_state
@goto iterate
while true
v,state = next
....
@label iterate
next=f(state,boxed_vars...)
if next === nothing
break
end
end
The following is the basic implementation:
#---------------
struct FunctionalState{I,F<:Function} <: Function
f::F
end
FunctionalState(iter::I, f::F) where {I, F <: Function} = FunctionalState{I,F}(f)
(fs::FunctionalState)(state) = fs.f(state)
# This is the iterate function for every iterator the produces a Functional State.
function Base.iterate(::I,next::Tuple{FunctionalState{I,F},S}) where { I,S, F<:Function}
fs, state = next
state === nothing && return nothing
next = fs.f(state)
next === nothing && return nothing
(val,state) = next
return ( val,( fs, state))
end
struct FunctionalStateValueIterator{T<:AbstractDict}
dict::T
end
Base.length(v::FunctionalStateValueIterator) = length(v.dict)
Base.eltype(v::FunctionalStateValueIterator) = eltype(v.dict)
function functional_state_values(d::Dict)
FunctionalStateValueIterator(d)
end
function functional_value_iterate(s::Int,L::Int,slots,vals)
@assert L < typemax(Int)
if L != typemax(Int)
for i=s:L
if @inbounds slots[i] == 0x1
return (@inbounds vals[i], i+1)
end
end
end
nothing
end
function Base.iterate(iter::FunctionalStateValueIterator{<:Dict})
d=iter.dict
slots=d.slots
vals=d.vals
L=length(slots)
f(i::Int)=functional_value_iterate(i,L,slots,vals)
fs=FunctionalState(iter,f)
iterate(iter,(fs,d.idxfloor))
end
This works and is a little faster than the current iterator but I want it to get closer to entitlement.