Julep: Efficient Hierarchical Mutable Data

Why not guarantee this to be inlined without any new keyword asking for it?
Wouldn’t that solve your problem, or would you then want a way to opt out of this?

Indeed, both inlining and non-inlining need to be available, for C compatibility as well as for performance. Consider the following C code:

struct part {
    int c;
    char d;
};

struct whole_inline {
    int a;
    struct part p;
};

struct whole_non_inline {
    int a;
    struct part* p;
};

Both whole_inline and whole_non_inline can exist and need to be supported for C compatibility.
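To make the difference concrete, here is a small sketch built on the structs above. The helper names (inline_part_offset, read_c_inline, read_c_non_inline) are made up for illustration, and the actual sizes and offsets depend on the platform ABI.

```c
#include <assert.h>
#include <stddef.h>

struct part {
    int c;
    char d;
};

struct whole_inline {
    int a;
    struct part p;      /* the part is stored inside the parent */
};

struct whole_non_inline {
    int a;
    struct part *p;     /* the parent stores only the part's address */
};

/* Byte offset of the embedded part within its parent (hypothetical helper). */
size_t inline_part_offset(void) {
    return offsetof(struct whole_inline, p);
}

/* Inline: the field sits at a fixed offset from the parent. */
int read_c_inline(const struct whole_inline *w) {
    assert(w != NULL);
    return w->p.c;
}

/* Non-inline: one extra pointer load before the field can be read. */
int read_c_non_inline(const struct whole_non_inline *w) {
    assert(w != NULL && w->p != NULL);
    return w->p->c;
}
```

A Julia struct that is meant to mirror one of these C structs has to pick the matching layout, which is why both forms need to be expressible.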

Regarding performance: if part is read as part of a contiguous memory access pattern, inlining helps. If it is not (and part is sufficiently large), it can be better to store only the pointer and keep part's memory out of the way of that access pattern: a gap of 4 or 8 bytes for the pointer is better than a gap the size of part.
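As a rough illustration of the "gap" argument, the sketch below uses a deliberately large part. The names big_part, item_inline, item_pointer and the 256-byte payload are assumptions chosen for the example, not anything from the proposal.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical large part; 256 bytes picked only for illustration. */
struct big_part {
    char payload[256];
};

struct item_inline {
    int a;
    struct big_part p;   /* every array element carries the full payload */
};

struct item_pointer {
    int a;
    struct big_part *p;  /* every array element carries only an address */
};

/* Sum the a fields; consecutive reads are sizeof(struct item_inline) bytes apart. */
long sum_a_inline(const struct item_inline *items, size_t n) {
    assert(items != NULL || n == 0);
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        total += items[i].a;
    }
    return total;
}

/* Same loop; consecutive reads are only sizeof(struct item_pointer) bytes apart. */
long sum_a_pointer(const struct item_pointer *items, size_t n) {
    assert(items != NULL || n == 0);
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        total += items[i].a;
    }
    return total;
}
```

A loop that only touches a steps over the whole payload in the inline layout, so far fewer a values fit in each cache line. With the small part from the earlier example the picture can flip: on many 64-bit ABIs the pointer is as large as part itself, so inlining costs nothing extra.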

I can see the status quo might be easier on the compiler and this would make for longer compilation times

The proposal might be implemented by dispatching/specializing each constructor on Union{Ptr{Part}, Inline, NoInline}.

  • Inline: Allocate a larger memory region (as in Base.summarysize). This logic should not be slower than the existing logic for inlining mutable structs. Some additional accounting might be needed, though.
  • Ptr{Part}: Do not allocate but reuse the provided memory pointed to by Ptr{Part}. Some accounting might be different or might be obsolete.
  • NoInline: Same behavior as now.
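The three cases can be sketched in C terms as follows. This is only an analogy for the proposed constructor specializations, not how Julia would implement them; all function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

struct part { int c; char d; };
struct whole_inline { int a; struct part p; };
struct whole_non_inline { int a; struct part *p; };

/* Ptr{Part} analogue: build the part in memory the caller already owns. */
struct part *part_init_at(struct part *storage, int c, char d) {
    assert(storage != NULL);
    storage->c = c;
    storage->d = d;
    return storage;
}

/* NoInline analogue: the part gets its own allocation. */
struct part *part_new(int c, char d) {
    struct part *p = malloc(sizeof *p);
    return p ? part_init_at(p, c, d) : NULL;
}

/* Inline analogue: one larger allocation, then the part is built in place. */
struct whole_inline *whole_inline_new(int a, int c, char d) {
    struct whole_inline *w = malloc(sizeof *w);
    if (w == NULL) return NULL;
    w->a = a;
    part_init_at(&w->p, c, d);   /* reuses the parent's memory */
    return w;
}

/* Parent with a separately allocated part. */
struct whole_non_inline *whole_non_inline_new(int a, int c, char d) {
    struct whole_non_inline *w = malloc(sizeof *w);
    if (w == NULL) return NULL;
    w->p = part_new(c, d);
    if (w->p == NULL) { free(w); return NULL; }
    w->a = a;
    return w;
}

/* Two allocations were made, so two must be released. */
void whole_non_inline_free(struct whole_non_inline *w) {
    if (w != NULL) { free(w->p); free(w); }
}
```

Note that the inline path is just the pointer path applied to a slot inside the parent's allocation, which is why the two share part_init_at here.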

Therefore, if inlining is not used, compilation speed should be mostly unaffected. If it is used, especially when inlining and non-inlining are mixed, more constructors need to be compiled, and that takes additional time (as always when more dispatch/specialization is involved).

I think this is a valid tradeoff: more runtime optimization can cost more compilation time.

I can foresee we might want to allow the compiler to change order of (those super) structs, as is allowed by Rust

This is a compelling idea. However, for C compatibility (and possibly for other reasons) it would need to be optional, and it would indeed be a bigger, independent change.
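For a sense of what reordering can buy, here is a minimal C sketch with the fields sorted by hand. C itself keeps members in declaration order, so the second struct stands in for what a reordering compiler might choose; the struct and function names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Fields in the order the programmer wrote them; C keeps this order. */
struct declared_order {
    char tag;
    int value;
    char flag;
};

/* Same fields, hand-sorted by decreasing alignment. */
struct reordered {
    int value;
    char tag;
    char flag;
};

/* Bytes saved by the reordering on the current platform (hypothetical helper). */
size_t bytes_saved_by_reordering(void) {
    assert(sizeof(struct reordered) <= sizeof(struct declared_order));
    return sizeof(struct declared_order) - sizeof(struct reordered);
}
```

On common ABIs where int is 4-byte aligned, the first layout needs padding after each char and the second does not. Rust's default layout leaves the compiler free to do this kind of reordering, and #[repr(C)] is the opt-out that fixes the declared order, which is the same kind of switch an optional Julia feature would need.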