TL;DR
This proposal outlines a method to achieve space and time-efficient writable substructures in Julia. The memory layout is identical to that of immutable structs within mutable structs. This approach allows to express that two objects with object identity have identical lifetime. It integrates seamlessly with Julia’s existing language mechanisms.
The proposal requires the introduction of one additional (or reused) keyword before fields and an extension of the new
syntax semantics. The proposal’s effectiveness is further enhanced with the introduction of a new Memory
type.
I apologize for the length of this post. If this is not the appropriate place to store the proposal, please direct me to the correct location.
Goal
I would appreciate any feedback and improvement suggestions. Since this proposal likely affects core parts of Julia, only a few people might be able to drive this change. If there are no major obstacles and the idea seems worthwhile, I am interested in knowing whether one of the core develops could consider pursuing this idea.
Motivation
My team uses Julia for real-time signal processing. We love using Julia to quickly prototype ideas and then gradually refine the successful ones until they are real-time capable, without needing to port them to another language.
Our main data structures often consist of multiple levels of nested structs, with some fields (both leaf and non-leaf) being arrays. These data structures are typically created at initialization time, i.e., after system startup and before entering the real-time phase. During the real-time phase, efficiency is key. This includes having an optimized memory layout for cache locality and SIMD, efficient interoperability with C
, and avoiding allocations.
Current Status
For isbits types, StaticArrays
is great. However, most of the values in our data structures need to be modified during the real-time phase. This works well when StructArrays
is appropriate, for small immutable leaf structures by creating a new instance, and for slightly larger types with Accessors
.
However, performance degrades significantly when the data structures are larger, even with Accessors
. Using a mutable struct
or an Array
is often not an alternative, as it adds an indirection via the reference, severely reducing cache locality. Additionally, it often disrupts C
interoperability by changing the memory layout and not inlining the (sub)structure or array into the parent struct.
Earlier Attempts
@andyferris implemented modifying non-isbits types in MArray
to allow modification of an array where the references are stored inside of a struct. However, he had to remove the implementation, as the conclusion was that there is no way to implement it correctly.
@jameson’s Julep to modify immutables when part of mutables would have helped, but he closed it as not planned
. While I got the impression that the language semantics would be sound by effectively making a previously immutable part of a mutable struct, I am not yet settled on whether it is a good idea to effectively make an immutable type mutable in this context. This is because immutables types are not designed with mutability in mind, which might result in missing modification methods like setindex
or by violations of constructor invariants.
Semantics
Julia provides ease and expressiveness, not just good performance. Instead of looking for an efficient implementation, let’s provide the means to adequately describe the problem so that the compiler can create efficient code.
The earlier attempts seem to have focused on the ability to modify immutables (see above). Yet, the defining property of mutable structures is that they have object identity. Many of the larger types (as in Base.summarysize
) have object identity. I suggest first focusing on mutable composite types and then checking later whether there remains a need to extend immutables inside mutable types.
Problem Statement
Let’s define two types Part
with object p
and Whole
with object w
. Both types have object identity (= mutable struct
). Part
makes sense on its own and as part of other types. One of these other types is Whole
. We want to express that p
is coupled with w
for their whole lifetimes. Currently we cannot express this.
With
mutable struct Whole
part::Part
end
we can express that p
and w
- are created in any order
- are (weakly) coupled temporarily when
p
is referenced byw
, beginning either whenw
is constructed or later bysetfield!(w, :part, p)
, and - finalized in any order, either when still (weakly) coupled or after a call of
setfield!(w, :part, p_other)
, withp_other
being a different object ofPart
, and both objects (happily) living their object life.
By adding const
as in
mutable struct Whole
const part::Part
end
we can express a stronger coupling, i.e. that
p
is created beforew
,p
andw
are still coupled at the end of the life ofw
andp
can outlivew
.
While I am happy for p
, this does only help so much in expressing our problem.
Proposal
So we don’t tell the compiler the exact problem and wonder why it does not find the best solution? Let’s fix it and define a keyword which expresses that the lifetimes of p
and w
are fully coupled, i.e.
p
andw
are created at the same time,p
andw
are always coupled during both of their lifetimes andp
andw
are finalized at the same time.
I can imagine this keyword to be coupled
, static
, inline
, assimilate
, or something similar. If it needs to be an already reserved keyword, local
and import
seem to be possible choices, though ignoring their usual meaning might not be the best idea. Let’s use inline
as a placeholder in the following examples. So this would look like
mutable struct Whole
inline part::Part
end
That wasn’t too difficult, was it? But what does it mean? Just adding a keyword, i.e., increasing code length, doesn’t bring much benefit unless you are paid by the number of characters in your code.
Memory Layout
As the lifetimes are now fully coupled, p
and w
do not need two different memory regions with a reference connecting them. So instead of two smaller allocations we guarantee to have one larger allocation, which is more efficient. Nice.
But we should not stop here. As p
and w
are now part of the same memory region anyway, the memory of p
can be put inside of the memory of w
exactly the same as it would be done if p
were immutable.
This is how this would look like:
mutable struct Part
b::UInt16
c::Int16
end
mutable struct Whole
a::Int32
inline part::Part
end
As this is guaranteed to be inlined, sizeof(Whole)
will return 8
(instead of 16
on x86-64-bit due to the pointer and alignment).
Object identity
But mutable composite types also need to have stable memory addresses. Does p
have a stable memory address? Sure. It can be easily calculated by
pointer_from_objref(w) + fieldoffset(Whole, :part)
using
Base.fieldoffset(x::DataType, sym::Symbol) = fieldoffset(x, findfirst(:part |> ==, Whole |> fieldnames))
As fieldoffset
is constant per type, it follows, that object identity of Part
is preserved as for Whole
objects w1
and w2
the following is valid:
addr(w1.part) === addr(w2.part) ⟺ addr(w1) === addr(w2)
with
const addr = pointer_from_objref
Lifetime
Additionally, tracking the lifetime of objects is crucial. Conceptually, this should be easy, as either every usage of p
counts as a usage of w
and lifetime ends when w
can no longer be reached, or usage of p
and w
is attributed separately and lifetime ends when both p
and w
can no longer be reached.
Abstract types
Obviously, inline
will only be defined if Part
is sufficiently concrete. I don’t know where to draw the line exactly, but allowing everything which has sizeof
defined might make sense. Disallowing isbits
types could make sense, as the inline
would be ineffective and we probably want to avoid cargo cult programming which puts inline
everywhere.
Relationship to const
As shown above, inline
is a stronger const
and in that sense, they are mutually exclusive. On the other hand, const
with immutable structs makes the fields of that struct constant in the context of the defining struct, which is a useful property. I see no reason why it should be a different case for mutable structs, therefore
mutable struct Whole
inline const part::Part
end
should behave like the const immutable struct pendant.
Recursive Definition?
In
mutable struct C
d::Int
end
mutable struct B
c::C
end
mutable struct A
inline b::B
end
should c
also be inlined in a = 42 |> C |> B |> A
, i.e. should the keyword work recursively, i.e. should c
(and therefore d
) also be inlined into the memory region of b
(and therefore a
)?
Coupling lifetimes of a
and b
does not logically couple lifetimes of a
and c
(or d
for that matter). Staying with the tradition that the answer of a question in a header is always “no”, that’s why inline
should not work recursively. Additionally, const
does not work recursively (which you can easily try yourself), so this is the consistent solution. Even though deep inlining is probably wanted in the majority of cases when using inlining, it can easily be created by defining additional types (as done with const), e.g.,
mutable struct StaticB
inline c::C
end
and use that one instead in A
. All methods of B
can easily be reused by StaticB
if they are defined for an abstract super type which B
and StaticB
inherit from. If the definition of both types gets tedious, a macro can create the non-static type based on the definition of the static type.
Object creation
Up to now, we have assumed that the objects just exist. Now let’s examine how to construct them. All objects ultimately begin their existence with a call to new
, as even an outer constructor ultimately calls an inner constructor, which might be the default constructor.
Mutable Type
If Whole
contains a (weakly) coupled Part
as in
mutable struct Whole
part::Part
end
p
will be created sometime before w
is actually created. This is true even if the creation of p
and w
are packed as tightly as possible:
Whole() = new(Part())
Although p
is created immediately before w
, technically the creation of p
somewhere in memory is completed first before the creation of w
begins somewhere else in memory, independently of p
. This approach cannot work for our proposal as it does not align with the memory model.
Immutable Type
If the existing construction of mutable types cannot be reused for our proposal, let’s consider whether we can reuse the approach taken for immutable types. Let’s temporarily use an object i
of an immutable type Immutable
in Whole
:
mutable struct Whole
immutable::Immutable
end
Regardless of whether the i
to be saved in immutable
is constructed before the call of the Whole
constructor, inside of any inner or outer constructor, or inside the new
call, i
is conceptually created before w
. However, in this case, i
is copied into w
when w
is constructed, as w
lacks object identity.
Since we want to address a w
with object identity, we cannot reuse this approach for our proposal.
New Concept
The concrete types of all inline
fields are known. However, there can be multiple inline
fields with the same type. We need to ensure that the coupled object creation time is managed correctly and that we do not alter object identity implicitly.
There are several options for how the user interface for creating mutable structs
with inline
fields could look like. Let’s group them by control flow.
Defining p
before creating w
We can define the creation of p
before the creation of w
if we either explicitly or implicitly delay the creation of p
until w
is created. The compiler needs to ensure that p
is not used before w
is created. This can be done implicitly by backtracking the arguments to new
that correspond to inline
fields until their creation. Alternatively, it can be done by annotating the call that creates p
and forward tracking the inaccessible variable. A more Julian approach might be to handle this explicitly,m either using a closure or by creating a special type like LazyConstructor
that wraps the constructor call of p
.
All the implicit solutions would need to be limited in scope, e.g. limited to the currently active call of the Whole
constructor or to any constructor call of Whole
.
All approaches allow for quite some flexibility. But the question is whether the resulting complexity is worth it.
Defining p
when creating w
By only allowing the constructor call of p
inside of the new
call of Whole
we lose some flexibility, but the logic becomes simpler. The compiler can request the additional memory for Part
in the new
call of Whole
, then follow the constructor call and handle the new
call of Part
specially by not requesting additional memory but using the already allocated memory.
As the inlined field’s type is specific, it would be possible to make this changed behavior explicit by, for example, using new
instead of Part
or just providing the Tuple
of the arguments. However, this approach might be more confusing than helpful.
A constructor can call new
more than once.
mutable struct Part
a::Int
Part() = (new(41) |> println; new(42))
end
It needs to be clarified whether this should be forbidden when used as an inlined field or whether the compiler needs to track the call of the constructor’s return object.
In any case, it is important to ensure that the construction of p
remains efficient. Would some kind of implicit dispatch/specialization on inline
versus non-inline
in the constructors and/or the new
call work? This could involve either an additional keyword or just a different function name.
Defining p
after creating w
It is also possible to avoid defining p
inside the new
call in Whole
by either leaving the argument out or using _
for the inline field argument. In this case, w.p
would need to be assigned by a constructor call (or optionally by some kind of copy
) before the constructor ends. This assignment can occur either in the current constructor call or in any constructor call of Whole
on the stack.
Suggestion
It might be best to ensure that p
is created by a call within the new call of Whole
. If the user attempts something too complex, such as creating multiple objects in the constructor of Part
, Julia can generate an error indicating the issue.
Arrays
The above functionality is already excellent and works on its own, but it becomes even more powerful when arrays support the new mechanism. To achieve this, we likely need a new type StaticMemory{L, T}
(temporary name) as the static version of Memory{T}
, where L
represents number of elements and T
represents the type.
Memory
has a length
field per object, while StaticMemory
stores its length in the type. You can think of Memory
and StaticMemory{L, T}
being defined as follows:
mutable struct Memory{T}
const length::Int
const ptr::T[] # pointing at the position after the object
end
mutable struct StaticMemory{L, T}
inline data::T[]
end
This pseudo-code uses pointer/array syntax. StaticMemory
does not point to an object of type T
; instead, it includes them in its own memory.
Building on this type, other types like WVector
(a writable vector similar to MVector
) that support modifying non-isbits types can inline a StaticMemory
object on their own. This is possible because they all have object identity and are thus covered by the proposal.
As a beneficial side effect, you can create an Array type that requires only one allocation, compared to the typical two allocations of, for example, Vector
. However, the type is limited to a fixed size or, at least, cannot grow beyond a fixed size.
In this context, the proposal sits between FixedSizeArray
(which builds on Memory
) and MArray
(which builds on Tuple
). With this additional proposal, there are three basic memory types in Julia:
Memory Types | Length | Mutable |
---|---|---|
Memory |
Run-time | Yes |
StaticMemory |
Compile-time | Yes |
Tuple |
Compile-time | (mostly) No |