Yes! That’s how you get a gut feeling. But at the moment of writing some code, you normally don’t have all the context worked out (i.e. you don’t have all the callers written before designing the callee). Nor do you necessarily know the types you will use. And context matters!
Something that took me embarrasingly long is to figure out how to combine stack-allocation and mutability (like stack-allocated C arrays).
The right API is to use immutable tuples, and for modification do the tup = modify(tup, params)
dance, where
function modify(tup, params)
rtup = Ref(tup)
ptr = pointer_from_objref(rtup)
#use unsafe_store! to modify rtup, depending on params
return rtup[]
end
This works reasonably well. Alternatively, one can pass a reference to modify
; if this gets inlined, then the allocation can be elided. So one has the choice whether the caller or callee creates the Ref
; if the caller creates the mutable container then everything depends on inline-heuristics for performance.
Something that still annoys me is the following: Often, one wants wrapper-types. That is,
struct MyWrap{T}
parent::T
end
in order to override some dispatches of the parent. Overriding iterate
is especially important: for
-loops are nicer syntax than the iteration protocol (the while
-dance with tuple unpacking and benign type-instability). Also, iterate
, getfield
and getproperty
are afaik treated specially (special-cased inlining / inference).
But if parent
is not bitstype, then this abstraction is not zero-cost: It creates an extra pointer indirection and allocation (just like views).
Hence, MyWrap
should never be stored in an array or structure, and only be passed to @inline
or very expensive functions (these can hoist the indirection out of loops), and the caller needs to think very hard about two possible strategies: Create MyWrap
early and reuse it, or create it on-demand. But for efficient reuse, it should be mutable instead of immutable!
Therefore, the entire API and the kind of types depend massively on available compiler optimizations and inline heuristics.
It would be really nice if I could write efficient code / APIs that do not unpack and repack the MyWrap
all the time. That is, if I could actually abstract away the fact that MyWrap
has a parent
, instead of having to do some mix & mash between passing parent
and MyWrap
around, and having to test every single use for whether inline heuristics and escape analysis succeed.
So basically the old problem that immutable types are not inlined in arrays, structures and tuples, i.e. the fact that e.g. String
and struct foo a::String end
have different layout (first is char *
, second is char**
). This is a terrible problem that makes a lot of abstractions very expensive, and is only partially alleviated by escape analysis (the escape analysis is really cool! But allocation-avoidance based on that is a semi-reliable compiler optimization, instead of a property of the data layout).
One specific example that I gave up on:
I want to store a densish graph in a bitmap. Small graphs work fine with something like Vector{NTuple{N, UInt64}}
(i.e. size of graph rounded to multiple of 64 is part of the type). For larger graphs, this does not fly anymore. So neighbors(g, i)
needs to return some kind of view into the backing contiguous Matrix{UInt64}
. And iterate(neighbors(g, i))
needs to iterate over set bits. Next, many graph algorithms need to iterate over something like neighbors(g, i) & neighbors(g, j) & ~neighbors(g, k)
. And now we have a problem: This needs to run without allocation of temporaries. The broadcast machinery is good at that, so lazy.(neighbors(g, i) .& neighbors(g, j) .& (~).(neighbors(g, k)))
is a sensible syntax (cf related discussion, with awesome future @:
syntax by @tkf ).
I am too stupid to get the resulting Broadcasted
-view-construction in a way that avoids allocations for typical operations (all
, any
, iterate
, count
). The view
is a custom type, not a standard view (because it needs to use the logical indexing code for iteration).