Hi,
Coming in late, but a brief comment is that “type hiding” is a way to trade off
- “having to recompile the whole function chain with every new type”
vs.
- “having to compile almost everything once and a very small function for every new type”, at the cost of boxing and potentially dynamic dispatch (I write “potentially” as it depends on how many types you actually invoke the function with; the Julia compiler has an adaptive rule here).
So essentially, when you have code like
function f(var)
    # a lot of preprocessing
    # core operations that are expensive
end
you change it into
f1(@nospecialize var) = f2(wrap(var))
function f2(wrapped_var)
    # a lot of preprocessing
    f3(unwrap(wrapped_var))
end
function f3(var)
    # core operations that are expensive
end
And in this way, whenever you get a new type of var:
- the original code had to recompile the whole f each time
- the changed code compiles f1 and f2 only once, but f3 each time it hits a new type; since f3 is small, it will compile fast. The cost is that you have to call wrap and unwrap, and the call to f3 will use dynamic dispatch.
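To make the pattern concrete, here is a minimal sketch. The Wrapped struct, the definitions of wrap and unwrap, and the bodies of f2 and f3 are assumptions made up for illustration; the post itself only names these functions and does not say how they are implemented.

```julia
# Hypothetical wrapper: the field is typed `Any`, so the concrete type of the
# payload is hidden from the compiler (the payload is boxed).
struct Wrapped
    value::Any
end

wrap(@nospecialize(x)) = Wrapped(x)
unwrap(w::Wrapped) = w.value

# Entry point: @nospecialize asks the compiler not to specialize on `var`.
f1(@nospecialize(var)) = f2(wrap(var))

# Always receives a `Wrapped`, so one compiled version serves every payload type.
function f2(w::Wrapped)
    label = "processed"        # stand-in for "a lot of preprocessing"
    result = f3(unwrap(w))     # `unwrap` returns `Any`, so this call is a dynamic dispatch
    return (label, result)
end

# Small kernel: specialized (and compiled) once per concrete payload type.
f3(var) = sum(abs2, var)

f1([1, 2, 3])      # f3 is compiled for Vector{Int}
f1((1.0, 2.0))     # f3 is compiled again for a tuple of Float64; f2 is reused
```

In this sketch only f3 is recompiled when a new payload type shows up, which is the trade-off described above: less compilation overall, paid for with boxing and one dynamically dispatched call.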
Now regarding eachrow(df) in DataFrames.jl: this is a type-unstable iterator and will be slow. If you need a type-stable iterator, use Tables.namedtupleiterator(df[!, columns_you_are_going_to_use]) together with a function barrier (this call is a kind of unwrap operation from my example above, and df is the wrapped container var). The thing is that eachrow will be compiled once and fast, while with Tables.namedtupleiterator and a function barrier you get a compilation each time you call it with new columns, and the compilation cost goes up with the number of columns you pass.
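A minimal sketch of the two approaches, assuming DataFrames.jl and Tables.jl are both available in the active environment. The column names a and b, the data, and the function names are made up for illustration.

```julia
using DataFrames
import Tables

# Type-unstable version: the column types are not part of the type of `df`,
# so `row.a` and `row.b` cannot be inferred inside this loop.
function sum_products_eachrow(df)
    total = 0.0
    for row in eachrow(df)
        total += row.a * row.b
    end
    return total
end

# Kernel behind the function barrier: it is specialized on the concrete
# iterator type, which carries the column names and element types.
function sum_products_kernel(rows)
    total = 0.0
    for row in rows
        total += row.a * row.b
    end
    return total
end

# The "unwrap" step: select only the needed columns and build a typed iterator.
# A different set of columns (or column types) triggers a new kernel compilation.
sum_products_barrier(df) =
    sum_products_kernel(Tables.namedtupleiterator(df[!, [:a, :b]]))

df = DataFrame(a = 1:3, b = [0.5, 1.5, 2.5], c = ["x", "y", "z"])

sum_products_eachrow(df)
sum_products_barrier(df)
```

Both functions return the same value; the difference is where the cost lands. The eachrow version compiles once and runs slowly, while the barrier version runs fast but is recompiled for each new combination of columns.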