Is there any advantage of directly using `Memory{T}` over a traditional Array?

Hi, everyone. Correct me if I’m wrong, please:

I understand that an Array is implemented via a MemoryRef which contains a Memory object, with supertype DenseArray.

When an array is created, some memory is allocated: the MemoryRef holds the address and the Memory holds the elements. This Memory object has a fixed size; it cannot be resized. If we resize the original array beyond its capacity, I observe that the address changes and a new Memory object is created, with extra memory allocated.
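For example (poking at the internal ref field, which is an implementation detail of Julia 1.11+, not a public API):

```julia
# Inspect the internal Memory backing a Vector. The field path
# x.ref.mem is an implementation detail and may change.
x = zeros(Int, 3)
m1 = x.ref.mem          # the Memory currently backing x
resize!(x, 10_000)      # grow well past the current capacity
m2 = x.ref.mem          # a new, larger Memory had to be allocated
println(m1 === m2)      # false: the backing Memory was replaced
```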

If I know that my arrays are of fixed size, should I use Memory directly instead of Array? Is there any performance reason to use it, or any reason to avoid it?

I haven’t found any comments about that in the documentation.

Cheers!


FixedSizeArrays.jl uses Memory while conforming to the array interface (and giving you LTS 1.10 compatibility): [ANN] FixedSizeArrays.jl: What Array probably should have been


Memory is one-dimensional; if you need multidimensional arrays, it won’t cut it.

FixedSizeArrays.jl, mentioned above, provides the linear algebra structure around Memory, similarly to Array, with the additional constraint that the shape is fixed after instantiation, which in certain cases enables more performance optimisations than Array.

That’s an interesting package, thanks!

However, I’d like to understand the difference between using Memory and Array, as I stated above. Not just use a package that does it for me, if that makes sense.

Yes, but in Julia they are stored contiguously in memory, in column-major order. I can always handle that indexing myself instead of using an N-dimensional array.

  • Memory is a one dimensional contiguous chunk of memory, lacking the linear algebra structure
  • Array is a multidimensional, resizable array built around MemoryRef (which is almost the same as Memory)
  • FixedSizeArray is a multidimensional, fixed-size array built around Memory

There are some other comparisons between multidimensional array types in the FixedSizeArrays.jl documentation.

Well, then you need to mess with the indexing yourself. Which is what FixedSizeArrays does for you.
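Concretely, treating a 1D Memory as the storage of an nrows × ncols column-major matrix means computing linear indices by hand (a sketch):

```julia
# Sketch: use a 1D Memory as column-major storage for a 2×3 matrix.
nrows, ncols = 2, 3
m = Memory{Int}(undef, nrows * ncols)

# Column-major linear index, the same layout Array uses internally.
lin(i, j) = (j - 1) * nrows + i

for j in 1:ncols, i in 1:nrows
    m[lin(i, j)] = 10i + j   # store something recognizable
end

m[lin(2, 3)]  # element at row 2, column 3, i.e. 23
```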

Memory is a very simple heap-allocated 1D buffer, used as a backend for a variety of data structures. Most of the Array API isn’t implemented for Memory, so you can’t use it as a substitute for Vector. The Memory backing a Vector doesn’t even mirror it:

julia> x = [1]
1-element Vector{Int64}:
 1

julia> x.ref.mem
1-element Memory{Int64}:
 1

julia> for i in 1:4 push!(x, 0) end

julia> x
5-element Vector{Int64}:
 1
 0
 0
 0
 0

julia> x.ref.mem
16-element Memory{Int64}:
             1
             0
             0
             0
             0
 2781483281008
 2781482320016
 2781482319984
 2781486417904
 2781486417936
 2781486417968
 2781486418000
 2781486418032
 2781486418064
 2781486417840
         16850

Look how much larger the Memory got than the higher-level Vector. push! doesn’t allocate on every single call; every so often it allocates a larger Memory so that most calls only need to quickly tweak an element. The Vector is responsible for referencing the new Memory and specifying where to start and end. You can still build data structures on top of Array, but you’re locked into how Array allocates. Working with Memory is more work and more control.

Memory is slightly friendlier to the compiler optimizer. One example is that the heap allocation can currently be replaced with a stack allocation in more cases with Memory than with Vector.

That said, hopefully in the future Julia will get more powerful escape analysis, which I suppose would make Vector and Memory equivalent in this respect.
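A hedged illustration of the kind of pattern involved: a short-lived buffer that never escapes the function. Whether the allocation is actually elided depends on the Julia version and its escape analysis; measure with @allocated rather than assuming.

```julia
# A short-lived Memory buffer that never escapes the function.
# Depending on the Julia version, the compiler may be able to
# avoid the heap allocation entirely; check with @allocated.
function sum_squares(n)
    m = Memory{Int}(undef, n)
    for i in 1:n
        m[i] = i * i
    end
    s = 0
    for x in m
        s += x
    end
    return s
end

sum_squares(4)  # 1 + 4 + 9 + 16 = 30
```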

On the other hand, if a certain API requires Array specifically, as opposed to more general AbstractArray or DenseArray, then Memory does not cut it, or may require a conversion with an allocation.


As the others said, Memory is a built-in type that julia uses as a fixed-size buffer.

Memory has an identity (i.e. like it were defined as mutable struct), but is immutable (all fields are const). The allocation underlying the Memory’s memory typically has a lifetime tied to the Memory’s lifetime.

This design is forced by the way GC works in julia: our memory management requires a julia object with identity (mutable struct), allocated on the heap, with a proper object header; but the actual memory may need its own header, managed by an allocator (e.g. glibc ptmalloc) that the julia runtime does not understand (via LD_PRELOAD you could be using an allocator we have never heard of).

This of course sucks: if you have mutable struct F mem::Memory{...} end, then accessing f.mem[idx] needs to chase one pointer too many (first load the mem field, then load the Memory object to obtain the data pointer, and finally load the element at the index).

The standard julia approach is to use “fat pointers”, aka MemoryRef. A MemoryRef contains both a pointer to the data, for quick access to the data, and a reference to the Memory instance, for memory management purposes. So it could look like struct MemoryRef{T} ptr::Ptr{T}; mem::Memory{...} end. [*]

Crucially, MemoryRef is immutable / has no identity. It is supposed to be inlined into your structs / arrays.

So, the TLDR is:
If you access data through Memory, then you are probably doing something wrong. For a measly 8 extra bytes (use the fat-pointer MemoryRef instead of the thin-pointer Memory) you could cut an entire load out of the dependency chain. If that load doesn’t hit L1 cache, this is huge (L2 latency is already bad).

Unfortunately, FixedSizeArrays does it wrong.

[*] Afaiu MemoryRef is, and needs to be, a built-in type instead of being defined in Base, because the designers decided against providing a full-powered “access data through a pointer” primitive, so unsafe_load doesn’t cut it (I think this sucks – please rip out all the special-casing for memoryref and instead expose a full-powered Base.Core.unsafe_load_for_powerusers_unstable_please_stay_away that explicitly passes things like alignment / atomicity / aliasing info, and can load object references without type-checking on pain of UB).


With Array there is an indirection, as you have noted, i.e. an additional pointer to be followed in the MemoryRef struct. In loops this indirection step is usually optimized away by the compiler, so the performance is most often the same as with Memory. However, such optimizations depend on a lot of things, and there might be cases where Memory is faster. Also, allocating a Memory block results in one allocation, whereas allocating an Array results in two (one for the underlying Memory, one for the Array “header” containing size and dimension information).
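You can see this directly with @allocated; the exact byte counts vary by platform and Julia version, so verify on your own setup:

```julia
# Compare allocation cost: a bare Memory is a single allocation,
# while an Array needs the underlying Memory plus its own header
# object, so it should report more allocated bytes.
alloc_mem() = Memory{Float64}(undef, 100)
alloc_vec() = Vector{Float64}(undef, 100)

alloc_mem(); alloc_vec()  # warm up to exclude compilation

n_mem = @allocated alloc_mem()
n_vec = @allocated alloc_vec()
println(n_vec > n_mem)    # expected true: Array carries extra header bytes
```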


Accessing data through Memory and Array has the same number of indirections / pointers to chase. Arrays do not access their data through the underlying Memory (a.ref.mem) but instead through their Memory-ref a.ref.ptr_or_offset, which directly points at the buffer.

If you don’t need to support things like resizing, then you can reduce the number of indirections by using the same “trick” as the julia Base Array implementation, i.e. by storing a MemoryRef.

Somebody should make a package with a convenient struct for that (FixedSizeArrays is not that package – it does not cut down on the number of indirections).
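For illustration, here is a minimal fixed-size wrapper over a plain Memory (a sketch, not a replacement for FixedSizeArrays.jl). Note this is the simple version: it stores the Memory itself, so element access still goes through the extra pointer load discussed above; a version that cuts that indirection would have to store a MemoryRef and use Julia's internal memoryref builtins.

```julia
# Minimal fixed-size vector backed by a Memory. Stores the Memory
# directly, so it does NOT remove the extra indirection discussed
# above; it only illustrates the fixed-size wrapper idea.
struct FixedVec{T} <: AbstractVector{T}
    mem::Memory{T}
end
FixedVec{T}(n::Integer) where {T} = FixedVec{T}(Memory{T}(undef, n))

Base.size(v::FixedVec) = (length(v.mem),)
Base.@propagate_inbounds Base.getindex(v::FixedVec, i::Int) = v.mem[i]
Base.@propagate_inbounds Base.setindex!(v::FixedVec, x, i::Int) = (v.mem[i] = x; v)

v = FixedVec{Int}(3)
v[1], v[2], v[3] = 10, 20, 30
sum(v)  # 60
```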

Yes, but depending on what you do to the Array and how the program is structured, that pointer (i.e. a.ref) will be reloaded even though it isn’t strictly necessary. The ptr in Memory is const.


Thanks a lot for the answers!

What I see is that Memory could be used to optimize some things, but it needs to be handled with care. For now, so as not to lose the linear algebra advantages of an Array, I’ll continue using that. I’m not in a performance-critical situation.

You mean that even though they use Memory, they are just like an array and do have two allocations? One for the MemoryRef struct and another one for the Memory object. But I guess the compiler makes use of the fixed size and one can get some improvement.

So if I really wanted to do it, should I stick to MemoryRef?


FixedSizeArray does not store a MemoryRef.

IMO foobar is exaggerating and possibly forgetting about some tradeoffs. In general, independently of the particular issues here or of what anyone says, the answer is: implement both, benchmark a representative example, compare, and repeat until the decision is obvious to you.