Updating an array of immutables -- unnecessary (?) memory allocation

I’m trying to update an array of a simple immutable type using the @set! macro from Setfield.jl; see the following simple example:

using Setfield
struct A
    a::Int64
    b::Int64
end
a_array=[A(2,2) for i in range(1,10)]
function update_me(a_array)
    len_a=length(a_array)
    for i in range(1,100000)
        pos=rand(1:len_a)
        new_a=rand(1:10)
        @set! a_array[pos].a=new_a
    end
end
@time update_me(a_array)
@time update_me(a_array)
0.036304 seconds (303.41 k allocations: 6.284 MiB)
0.025393 seconds (300.00 k allocations: 6.104 MiB, 25.07% gc time)

It appears that @set! allocates a considerable amount of memory. Is that expected?
I can switch to a mutable type, but it has a larger memory footprint.

Is there a way to update the above array without a memory-allocation penalty? Or, alternatively, to reduce the amount of memory allocated for a mutable type?

The allocations are coming from Setfield.jl itself because it’s trying to be a bit too clever. Inside the function called by @set!, it does:

    if applicable(setindex!, obj, val, l.indices)
        setindex!(obj, val, l.indices...)
    else
        set(l, obj, val, ForbidMutation())
    end

and that applicable() check allocates memory:

julia> @btime applicable($setindex!, $a_array, 6, $(5,))
  33.843 ns (2 allocations: 32 bytes)
true

This kind of reflection inside the inner loop of your algorithm is going to be pretty expensive.

The right solution would be to fix Setfield.jl to not allocate memory just trying to decide what algorithm to apply. Or in the simple case, you could manually do what Setfield.jl is supposed to be doing:

julia> function update_me(a_array)
           len_a=length(a_array)
           for i in range(1,100000)
               pos=rand(1:len_a)
               new_a=rand(1:10)
               a_array[pos] = A(new_a, a_array[pos].b)
           end
       end
update_me (generic function with 1 method)

julia> @btime update_me($a_array)
  4.722 ms (0 allocations: 0 bytes)

(using @btime from BenchmarkTools.jl for more accurate timing without having to run the macro twice).

I don’t know what @set! does, but the canonical solution for single-threaded code / simple structs is a_array[pos] = A(new_a, a_array[pos].b)

The canonical solution for very large structs, or multi-threaded code where you absolutely must not load-store the b field for consistency reasons, is

apt=pointer(a_array,pos)
unsafe_store!(convert(Ptr{Int64},apt), new_a,1)

Under most circumstances, both will compile to exactly the same thing. The second variant only really becomes simpler if your struct contains a very long tuple, and you compute the field-index for the update.

Thanks for the quick and informative replies.

  1. Do you see a quick fix for Setfield.jl?
  2. If one considers a more complicated struct that contains many fields (a,b,c,d…), is there a canonical way to change a specific field? If I understand your answers correctly, it appears that one must write a dedicated function for each field.

Another related question – what is the logic behind not allowing fields of an immutable object to be updated directly? It seems this is possible in the end, just through a rather complicated mechanism.

Setfield.jl would need to be @generated to get the pointer offsets; in other words, it would need to be almost completely rewritten. On 0.6.2:

julia> using Setfield
julia> using BenchmarkTools

julia> mutable struct foo{N}
       vals::NTuple{N,Int64}
       end
julia> function upd(t,v)
       @set! t.vals[50]=v
       nothing
       end
julia> t=foo((collect(1:1_000)...,));
julia> @btime upd($t,$1);
  48.888 ms (495421 allocations: 30.24 MiB)
julia> function upd_pointer(t::foo{N},v) where N
       assert(N >= 50)
       ptr=convert(Ptr{Int64},pointer_from_objref(t))
       unsafe_store!(ptr, v, 50)
       nothing
       end;
julia> @btime upd_pointer($t,$1);
  6.127 ns (0 allocations: 0 bytes)

To generalize your solution, is it possible to detect the memory offset given a field name?

Yes, using fieldnames and fieldoffset (in @generated functions). But getting all cases correct is kinda lengthy and bug-prone, especially if you want to avoid the quadratic runtime of many tuple operations. Also, I still don’t understand the new pointer-tbaa rules in 0.7.

For convenience, I’m using

julia> @generated function getfieldoffset(::Type{T}, ::Val{fieldname}) where {T,fieldname}
       fns=fieldnames(T)
       for i=1:length(fns)
       if fns[i]==fieldname
       return :( $(fieldoffset(T,i)))
       end
       end
       error("$(T) has no field named $(fieldname)")
       end;
julia> getfieldoffset(typeof(1:5), Val{:stop}())
0x0000000000000008

julia> getfieldoffset(typeof(1:5), Val{:step}())
ERROR: UnitRange{Int64} has no field named step
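
For completeness, the offset lookup and the pointer store can be combined. Here is a sketch (offset_of and setfield_at! are made-up names, not from any package; it uses a plain function instead of the @generated version, trusting the compiler to constant-fold the lookup) that mutates a named field of an immutable, isbits element in place:

```julia
struct A
    a::Int64
    b::Int64
end

# Byte offset of a named field within T (hypothetical helper).
offset_of(::Type{T}, name::Symbol) where {T} =
    fieldoffset(T, findfirst(==(name), fieldnames(T)))

# Overwrite one field of the i-th element of a vector of isbits immutables.
function setfield_at!(arr::Vector{T}, i::Integer, name::Symbol, v) where {T}
    ptr = pointer(arr, i) + offset_of(T, name)
    GC.@preserve arr unsafe_store!(convert(Ptr{typeof(v)}, ptr), v)
    return arr
end

a_array = [A(2, 2) for _ in 1:10]
setfield_at!(a_array, 4, :b, 99)
a_array[4]  # → A(2, 99)
```

As with all unsafe_store! tricks, this is only valid for isbits element types, and the store must match the field's type exactly.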

Thanks for the detailed reply.
I might be missing something fundamental – I don’t understand why should it take a herculean effort to achieve something quite basic that C/C++/Fortran do very well “out of the box”. Are mutable types the preferred choice in Julia, despite their larger memory footprint?

No, immutables are preferred. This is why starting in 0.6 they are called struct and mutable struct instead of immutable and type as they were in 0.5 and earlier.

I don’t know or understand the details or internals, but those who do have said the compiler has an easier time optimizing structs. Structs whose fields are all isbits are themselves isbits, and allocate no memory when you create them:

julia> struct tuplewrap{N}
           floats::NTuple{N,Float64}
           ints::NTuple{N,Int}
       end

julia> using BenchmarkTools

julia> @benchmark tuplewrap(ntuple(i -> 0.0, Val(20)), ntuple(i -> 0, Val(20)))
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     57.336 ns (0.00% GC)
  median time:      57.337 ns (0.00% GC)
  mean time:        58.824 ns (0.00% GC)
  maximum time:     95.378 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     983

julia> isbits(tuplewrap(ntuple(i -> 0.0, Val(20)), ntuple(i -> 0, Val(20))))
true

which is also really convenient.

And you don’t always necessarily want to hold them in an array / RefValue / field of a mutable struct.
But I agree. When you do, it’d be nice for it to be much easier / more convenient to mutate them.

In this case, with

struct A
    a::Int64
    b::Int64
end

the easiest thing would be to reinterpret a_array from a length-n vector of As to a 2 x n matrix of Int64.
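
On recent Julia that is spelled with reinterpret plus reshape (on 0.6, reinterpret took the dimensions directly); a sketch:

```julia
struct A
    a::Int64
    b::Int64
end

a_array = [A(2, 2) for _ in 1:10]

# View the same memory as a 2×10 Int64 matrix: row 1 is field a, row 2 is field b.
ints = reshape(reinterpret(Int64, a_array), 2, :)
ints[1, 5] = 42    # writes straight through to a_array[5].a, no allocation
a_array[5]         # → A(42, 2)
```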

I do like that much of this is transparent, which makes it easier to understand and reason about: structs go on the stack and can’t be modified; mutable structs are references, so creating them allocates memory on the heap, but we can modify them.

I did also notice that small arrays in Fortran are stack allocated too (even when they’re of unknown size, with -fstack-arrays), but you have no problem editing their contents.
On that note, when it comes to editing contents, I found that for Julia code the first thing I have to do is extract every single element from the arrays, like so (e.g. with Base.Cartesian.@nextract):

U_1 = U[1]
U_2 = U[2]
...
U_n = U[n]
# do work
U[1] = U_1
U[2] = U_2
...
U[n] = U_n

or performance would be crippled. With Fortran, it did not seem to make any difference at all. But I’m guessing all that means is that gfortran was more aggressive about creating temporaries and rearranging my operations/indexes to get the same effect.
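
To make that pattern concrete, a small sketch with @nextract/@nexprs (the rotate4! kernel is a made-up example):

```julia
using Base.Cartesian

# Rotate a length-4 buffer left by one, working on scalar locals.
function rotate4!(U)
    @nextract 4 u U                      # u_1 = U[1], …, u_4 = U[4]
    u_1, u_2, u_3, u_4 = u_2, u_3, u_4, u_1
    @nexprs 4 i -> (U[i] = u_i)          # write the locals back
    return U
end

rotate4!([1, 2, 3, 4])  # → [2, 3, 4, 1]
```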

Granted, code where you’re constantly mutating things can be ugly.

The thing that really shines in Julia is when you handle arrays of small isbits structs, or “genuine mutables” (where you would call malloc in C); you then get C speeds with Python-like ease of coding.

But yes, this is a gripe that afaik many core devs share, and there are PRs languishing to fix parts of this (e.g. https://github.com/JuliaLang/julia/pull/21912).

Until then, consider Cxx.jl and just writing parts of your stuff in C. After mutating heap-placed isbits immutables, the next missing feature (for me) is stack-allocated “mutable bitstypes”, i.e. the classic char buf[20];. The contortions you have to go through to simulate this are not funny.
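
For illustration, one such contortion on recent Julia (setbyte! is a made-up helper; Base.setindex is non-exported and returns a copy of the tuple with one slot replaced):

```julia
# Simulate a fixed-size mutable byte buffer with an immutable NTuple in a Ref.
buf = Ref(ntuple(_ -> 0x00, 20))    # roughly `char buf[20];`, but each "write"
                                    # rebuilds the whole tuple
setbyte!(r::Ref, i, v::UInt8) = (r[] = Base.setindex(r[], v, i); r)

setbyte!(buf, 3, 0x7f)
buf[][3]  # → 0x7f
```

Note that the Ref itself lives on the heap unless the compiler elides it, which is exactly why this is a contortion rather than a real stack buffer.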

Thanks again for taking the time to answer!

I guess I’m missing some basic understanding of the difference between stack vs heap allocation in general and in Julia in particular.

If I understand correctly, mutable types are allocated on the heap. Does this imply that one must allocate extra memory for the address of each array element? As far as I know, in e.g. C++, allocating memory dynamically with std::vector<"some complicated object"> just allocates a contiguous chunk of memory and does not incur extra memory for addresses.

In my particular problem, I am allocating a relatively large array (which most likely will not fit on the stack) and was hoping to eliminate the extra memory associated with the object addresses. Is that possible with a mutable object? Alternatively, can I allocate an immutable on the heap and then mutate its contents more easily?
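
For reference, the layout difference behind this question can be observed with Base.summarysize (a sketch; exact byte counts are platform-dependent): an array of immutable structs stores the elements inline, while an array of mutable structs stores one pointer per slot plus a separate heap object per element.

```julia
struct S            # immutable: elements stored inline, contiguously
    a::Int64
    b::Int64
end
mutable struct M    # mutable: the array holds pointers to heap objects
    a::Int64
    b::Int64
end

sv = [S(1, 2) for _ in 1:1000]
mv = [M(1, 2) for _ in 1:1000]

Base.summarysize(sv)  # ≈ 16 bytes per element, plus the array header
Base.summarysize(mv)  # larger: an 8-byte pointer per slot plus a heap object each
```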