Control where mutables are allocated in a very special case: a large number of mutables of the same type

zsunberg · July 3, 2021, 3:17am

In my research, I often find myself incrementally creating large graphs or trees where each node has some properties that might get updated later.

Semantically, it makes the most sense for each node to be represented by a mutable object. But, since mutables are heap-allocated, this is very slow. What I want to do is to set aside a big chunk of memory for these nodes (essentially another stack). Then, when I create a new one, it should be trivial to find space to allocate it at the end of the occupied portion of this chunk (also perhaps there should be special gc behavior).

Is this possible now? Is it a change that could be added? (As far as I can tell, it would not change any semantics, except right where the object is constructed)

Another way of saying this is that I want a single special source vector, where the mutable objects are actually stored contiguously in memory like immutables are, but any other reference to them is just a pointer.

(Currently I use an Int to represent each node and a vector for each of the properties (IIUC what MetaGraphs does), but this results in much harder-to-read code)

anon56330260 · July 3, 2021, 4:23am

What you request is basically inline-stored mutable struct, which is currently not officially supported by Julia.

Before you try to design your own memory pool in Julia, note that Julia already has it’s own memory pool, so if you believe that having your own pool might make allocation (much) faster, you might be disappointed, unless your allocation has some special patterns.

But if you just want inline-stored mutable, you can twist pointers in Julia, just like in C++/C (Warning: this is as dangerous as in C++, before you copy the following codes, wall through them carefully). Let’s assume that you will to implement a linked list, which is essentially:

mutable struct Node
    x::Int
    next::Node
end

To store it in a contiguously, try following code:
Firstly we define a immutable struct Node. It will be stored inline in the array. Then we define a MNode, which is essentially a pointer to Node.

# wrap Node as a mutable struct
struct MNode
    ptr::Ptr{Nothing}
end

struct Node
    x::Int
    next::MNode
end

then define getter and setter to make life easier. So when we mutate or get the field of MNode, we are mutating or get the field of the content (that is, Node) it points to.

# delegate the getproperty to underlying Node
function Base.getproperty(p::MNode,i::Symbol) 
    n = Base.unsafe_load(Base.unsafe_convert(Ptr{Node},getfield(p,:ptr)))
    getproperty(n,i)
end

# delegate the setproperty to underlying Node
@generated function _setproperty!(p::MNode,::Val{N},x::V) where {N,V}
    T = Node
    for i = 1:fieldcount(T)
        if fieldname(T,i) == N
            if fieldtype(T,i) != V
                return :(error("Type $T has a field named $N, but type of it is not $V"))
            else
                off = fieldoffset(T,i)
                return :(Base.unsafe_store!(Base.unsafe_convert(Ptr{$V},getfield(p,:ptr)+$off),x,1))
            end
        end
    end
    :(error("Type $T has no field named $N"))
end

function Base.setproperty!(p::MNode,name::Symbol,i) where T
    _setproperty!(p,Val(name),i)
end

Finally, build an allocator, which is essentially a stack of Node. When we allocate from the allocator, we return a MNode which points to the Node. Be careful of deallocate!, it does nothing more than substrate the stack pointer! (Note, if you deallocate element in arbitrary order, you need to implement your own memory allocator)

mutable struct NodeFactor
    # don't resize v! since it will cause element reallocating
    v::Vector{Node}
    stackhead::Int
    size::Int
    ptr::Ptr{Node}
    function NodeFactor(c::Int)
        # create a large vector
        v = Vector{Node}(undef,c)
        # get the underlying pointer of vector
        ptr = Base.unsafe_convert(Ptr{Node},v)
        return new(v,0,c,ptr)
    end
end
  
function allocate!(n::NodeFactor)
    if n.stackhead > n.size
        error("run out of space") 
    end
    # return new pointer
    newptr = n.ptr
    # move stack pointer to next place
    n.stackhead += 1
    # update pointer to next place
    n.ptr += sizeof(Node)
    return MNode(newptr)
end

function deallocate!(n::NodeFactor)
    if n.stackhead == 0
        error("double free!") 
    end
    n.stackhead -= 1
    n.ptr -= sizeof(Node)
    return
end

Then you can now test the allocator!

nf = NodeFactor(100)
# allocate three nodes
n1 = allocate!(nf)
n2 = allocate!(nf)
n3 = allocate!(nf)
n1.x = 1
n2.x = 2
n3.x = 18
n1.next = n2
n2.next = n3
println(n1.next.next.x == n3.x)

You should get true here.

lmiq · July 3, 2021, 10:34am

All that is very interesting and instructive, thanks. But maybe it is important to mention to the OP that in this example, an perhaps in his example as well, it is much easier to just let this Node to be immutable, store an array of those and just replace the entire node instead of mutating it.

I would suggest posting more details about the structure of the problem, to obtain more specific suggestions.

An immutable struct containing a mutable-static vector maybe, with overloaded getindex, or getter functions can be practical, maybe.

struct Node
  index::Int
  properties::MVector{2, Float64}
end
myfirstprop(n::Node) = n.properties[1]
etc

Then store an array of those.

anon56330260 · July 3, 2021, 3:20pm

Then MVetor is again allocated on heap. This is not working.
The core problem here is that, each mutable struct has their own identity, so if their lifetime is undetermined, that is, longer than a function call, they must be managed by GC and allocated on heap individually, or you manage memory manually.

lmiq · July 3, 2021, 3:33pm

Trying to learn here. What would be the disadvantage of a vector of immutable structs? Only if the number of fields is very large I see a possible problem, otherwise just replacing the “nodes” in the vector by new ones seems good enough, isn’t it?

(There are some packages like StrideArrays.jl that appear to aid the management of large mutable arrays in the heap, but I don’t know much about them)

zsunberg · July 3, 2021, 4:19pm

@ChenNingCong , @lmiq Thanks! However I am not quite satisfied, and greed is an important julian value

ChenNingCong:

nf = NodeFactor(100)
# allocate three nodes
n1 = allocate!(nf)
n2 = allocate!(nf)
n3 = allocate!(nf)
n1.x = 1
n2.x = 2
n3.x = 18
n1.next = n2
n2.next = n3
println(n1.next.next.x == n3.x)

Syntactically, this is very close to what I am looking for - thank you! But I think overloading setproperty is somewhat misleading and there is a lot of overhead for creating new types. If I allow myself to overload setproperty, there are other solutions that don’t involve pointers. What I really want is something to work like this:

mutable struct MyNode
    x::Int
    next::MyNode
end

p = SpecialPool(MyNode, 1000)

n = @alloc_hint p MyNode(1)

Yeah, you may be right. It seems like a big block of objects of the same type that all get deleted at the same time would be a pattern that could benefit. I do not know enough about garbage collection to know if this would actually help though.

Thanks! I am aware of this technique . Again it requires a lot more awkward code than the @alloc_hint syntax above. The linked-list is a good enough example for discussion. A real example is here: GitHub - JuliaPOMDP/ARDESPOT.jl: Implementation of the AR-DESPOT POMDP algorithm

zsunberg · July 3, 2021, 4:24pm

With SetField.jl it would look something like this:

nodes[id] = @set nodes[id].a = 3

You have to keep track of node id’s. It just results in more convoluted code.

jzr · July 3, 2021, 6:35pm

Is this what you’re after?

zsunberg · July 3, 2021, 10:04pm

@jzr thanks!! that is basically what I’m after! It has a few rough edges, but I think it will work. Time to get down to benchmarking

anon56330260 · July 4, 2021, 1:28am

Dose this package actually work for mutable types? I don’t think so. Mutable types have different memory layouts. Use Ptr{MutableTypes} won’t work as you might expected…
Also, grow is implemented incorrectly. It doesn’t even try to free the memory. As long as you try to implement your own grow function, you can quickly see why mutable types should not be placed inline in array. If they are placed inline, you can’t grow the array by simply replacing the old one by allocating a new one, because you can’t move the mutable types to the new array (unless there is no other values holding the pointer to them, but how can you know that). You can copy immutable values but not mutable ones.
Instead, you should keep pages of memory. When you grow the allocators, you allocate a new page if all pages are occupied. However this will also cause problems, because you need to scan the pages to determine whether there are free pages. How to implement a fast and correct one allocators is not easy…

jzr · July 4, 2021, 2:39am

I haven’t used the package, I only know about it from reading this old benchmark thread Julia programs now shown on benchmarks game website - #54 by Orbots

jzr · July 4, 2021, 5:41am

There is also GitHub - RelationalAI-oss/Blobs.jl: Binary blobs with on-the-fly pointer patching. I’d be interested to hear if you have an assessment of this one as well.

Topic		Replies	Views
Why mutable structs are allocated on the heap? General Usage question	36	7945	September 26, 2024
How is memory allocated for a vector of structs? Performance question	9	422	June 7, 2024
Continguous memory allocation for numerous instances of compound types General Usage	8	790	June 26, 2017
Container of mutable structs without 100x slowdown? Performance struct , accessors	7	1209	September 1, 2022
Initializing an array of mutable objects is surprisingly slow Performance performance	8	2250	November 7, 2019

Control where mutables are allocated in a very special case: a large number of mutables of the same type

Related topics