Clarification about memory management of immutable and mutable struct

Hello,

I’ve been playing around with Julia for awhile and am a big fan of the language overall. Recently, I’ve been doing a bit of work to construct an immutable collections and functional utilities library (work in progress, but source is here). A few core pieces are finished at this point so I’ve been testing the performance. Without going into details, what I’ve found is that my persistent dictionary (PDict) has a performance on par with the Julia native Dict type, with respect to haskey, in, and get operations (PDict queries take about 2.5 times as long as Dict queries), but is 2000 times slower for operations like insert. While I’m not surprised that the persistent operations are slower, I am surprised that they are this many orders of magnitude slower (for reference, clojure’s persistent map performs inserts in about 4x the time required for Julia’s mutable dicts, in my tests). I don’t claim to have a perfect implementation of the insert operation, but the size of the difference got me wondering about implementation details, and, long story short, I have a few questions about how structs and mutable structs actually work under the hood.

In particular, I’m wondering how the memory for structs gets handled. I’m not completely sure how Julia handles the movement and/or allocation of data structures on the heap and stack, but some of the documents and questions/answers I’ve read seem to indicate that immutable structs are allocated on the stack and are always passed by value rather than reference. If this is really the case, then many of the performance advantages of objects like immutable dictionaries will be entirely wiped away—a big advantage of immutable dictionaries is that you need never copy them (it will never change so just pass it by reference). If this is the case, then it would be more efficient to declare them as mutable structs so that they can live on the heap.

Similarly, if I declare a struct that itself contains a field that is of a concrete immutable type, does this type get included in the structure by reference or by value? If by value, then can I work around this by declaring the field to be of an intermediate abstract type? Would this even be useful?

Is there any documentation of these issues? I would also be happy to get a pointer to where in julia/src this gets hammered out.

Thanks in advance.

4 Likes

My understanding is that the placement of immutable onto the stack (or heap) is generally viewed as an optimization: meaning that whether it is or not depends on the current state of the optimizations applied by the compiler.

If you want to avoid making copies of immutable structs you could use Ref

1 Like

As soon as you have mutable fields in your immutable it will be heap allocated, at that point you can check for yourself what’s going on as you pass it through functions by looking at pointers I think. Base.pointer_from_objref should help you out.

From what I understand everything that’s always allocated on the heap will be passed as a reference rather than by value, since how would it know exactly how to make those copies. Please correct me if I’m wrong, this is partly from my own experience testing some behavior that was discussed here: Memory layout of structs with mutable fields.

AFAICT you are using abstract types. That’s the most likely issue. You seems to be encoding size in the type which means any insertion is guaranteed to change type so doing so in a loop is guaranteed to be type unstable no matter how you implement it. The only difference the implementation can make is to control where the type instability is.

No. They can be allowed anywhere or not at all and they can be passed by reference. (This is about machine code level. They are always passed by reference at semantics level).

The object layout is indeed something that makes sense to take about (unlike “where is an object allocated”). Field of isbits types will be stored inline. They are still semantically stored by reference though.

Which is exactly who you can kill preformance from type instability.

Exactly.

This could help in some cases yes.

No.

Doing that is illegal. You can never call pointer_from_objref on a immutable object and expect a valid result. The compiler might feel like giving you something that seems like the correct result but it’s only because it doesn’t want to spend the extra effort to give you garbage.

This is partially correct as of the current implementation if you replace “always allocated on the heap” with non-isbits. The reverse is not true.

8 Likes

Yea of course, hence the “at that point”, i.e. starting from the assumption it’s always heap allocated. But thanks a lot for the helpful

without further explanation. Would be great if you could at least give a counterexample.

Contrived example:

julia> struct X
         y::Base.RefValue{Int}
       end

julia> f(i) = X(Ref(i)).y[]

julia> @code_llvm f(1)

;  @ REPL[6]:1 within `f'
define i64 @julia_f_15996(i64) {
top:
  ret i64 %0
}
2 Likes

Nice thanks a lot for this! However, if I look at a vector of these, making it a RefValue{Tuple{Int,Int}} I see that they are not stored contiguously (only pointers/refs are there), so where are the actual tuples being allocated?

Also in your example, making the X mutable doesn’t change anything (still on the stack I guess?).

The Vector is allocated on the heap and the Ref has to also go onto the heap. The issue is that when you pass the Ref to some other piece of code it may escape so both the Vector and the Ref need to be able to be tracked independently.

Yea that makes sense. So looking at the above my statement was wrong, I clearly need to reread some of the previous explanations, sorry about that.

Thanks everyone, this is very useful! I do have a few followup questions based on the discussion here, though.

AFAICT you are using abstract types. That’s the most likely issue.

This is correct, but are you saying that declaring a field to be an abstract type with a handful of concrete descendants should result in a 200x slowdown? (By the way, I miscounted the number of decimal places in my previous profiling–200x slower not 2000). Clojure (on the JVM) uses abstract types in its implementation of persistent maps (my implementation is closely based on clojure’s and should in theory have the same overall complexity for insertion). Further, clojure’s type hierarchy, on which method dispatch occurs, allows for much more complexity than Julia’s. Is this really something that the JVM is ~100 times faster at than llvm/Julia? Or is this likely just a fraction of the performance drop I’m seeing? Can anyone estimate this?

You seems to be encoding size in the type which means any insertion is guaranteed to change type…

This is true for the PVec type, but I’m profiling the PHashDict type, which only uses chunked vectors of size 32 (so the PVec32 type, which is the type assigned to the appropriate field in the PMap32 type, which in turn is used to map hash values to key-value pairs by PDict). There is also an abstract type (PMapNode) that stores the key-value pairs in the leaves; using an abstraction here wasn’t the design I wanted, but it was a necessity given that a persistent dictionary (implemented as a hash array mapped trie, here) needs to be able to store empty nodes. An alternative would be to declare that a node in a PMap{T} is of type Union{Nothing, T}, but I assumed that using Julia’s dispatch system on an abstract type with two or three concrete descendants would be faster than writing my own switch on a union type? Does anyone have suggestions for how to implement such a system without a substantial performance loss?

With regards to the PVec type, it’s true that the number of elements for small vectors (length <= 32) is encapsulated in the type (though larger vectors are made up of vectors of smaller vectors so that updates can be more efficient than O(n)). Again, this isn’t the implementation I wanted, but AFAIK there are no other options in Julia aside from giving up immutability. Are there any immutable collections in Julia that don’t require that the number of elements be declared at compile time? Tuple and NTuple require that the number of elements be part of the type parameters, and an immutable struct has a constant number of fields. I suppose I could use Tuple without type parameters, but would this be any more efficient than using an abstract type? There isn’t an immutable array type I’ve missed is there? Or is this the kind of thing that can only be added to the language with a C module?

As an addendum, I think Julia’s type system and method dispatch are both awesome–Julia seems like one of the few modern languages whose design enables and could really benefit from the kind of immutable collection libraries clojure, scala, haskell, and a few others have employed to great effect, especially for multi-threaded programming. I’m curious if this is something the Julia core developers intended or see as desirable?

That amount of slow down is very typical for runtime dynamic dispatch.

Iirc your abstract types have parametric subtypes so they have infinite many subtypes. Having a finite number of subtype should allow faster dynamic dispatch and a union is the simplest way to tell the compiler about it. You don’t have to do any dispatch yourself. You just need to change the field type if you know it will be only of a finite number of concrete types.

1 Like

Iirc your abstract types have parametric subtypes so they have infinite many subtypes. Having a finite number of subtype should allow faster dynamic dispatch and a union is the simplest way to tell the compiler about it.

Oh, hmm, I hadn’t thought of this… is this true even though my nodes in PMap{T} are declared to be of type PMapNode{T} (i.e., with the type parameter included)? So a PMap{Int} contains PMapNode{Int} nodes (using abstract type PMapNode{T} with, e.g., PMapEmptyNode{Int}() being one of the possible values a node could store).

Thanks!

If none of the subtypes have type parameters that are not fixed it should be fine, though it will be harder for the compiler to figure that out and generate code for.