Are you sure? I mean it seems so, but before the allocation of memory bufffer was done in C code, and I’m thinking did it also have two allocations pre-1.11, and C code allocations, i.e. indirect one, here half of them, are just not tracked? I don’t know either way but would like to know for sure.
OP has 250% times more allocations, so even double actual allocations wouldn’t fully explain. But at least the timing shows that some x% more isn’t too bad.
With the Memory change you get:
Range |
Mean |
[-100.00%, +400.00%] |
+73.94% |
more (tracked) allocations, i.e. double sometimes (for allocations, if in fact one extra), for some operations up to 4x (why?!), and for some operations you lose all of them.
What needs to happen is one Libc.malloc call/one (heap) allocation at least, and you get a pointer back. I.e. Memory
type is like allocation in C, 1D (always) fixed-length (always), it seems. The C library keeps track of length of each allocation (in bytes, not nr. of objects, it can’t since it does NOT know the type), so you can deallocate with just providing it a pointer, not the length (in bytes), since its length must be stored on the heap. [I think, but I’m not sure, that you must do free with a pointer to the first byte, not in the middle of an allocation, since how else is it going to know where its size is stored?]
So conversly when you allocate an Array
, what must happen? You need to store the number of dimensions and size of each, e.g. (n x m x…) somehow, and I think Julia could be more clever about it.
We could have something like (hypothetical for current situation, as of 1.11):
immutable struct Array
buffer::Memory
dims::some_dimension_type
end
Wherever you point to or allocate an Array
you must store this structure, and note it must use two pointers/two allocations, while the struct is immutable both the buffer
is a mutable array (the concept not the type upper-case Array), and the dims
separately on the heap.
It would be nice if dims
didn’t need its own heap allocation. It’s a tuple of UInt64
s, and the reason it must be a pointer to a variable-sized object, is that sometimes it’s just one Int64 in case your Array
is only a Vector
, but it needs two Int64
s for a Matrix
(another alias for Array), 3 for 3D Array (which is rare), or 4 for 4D even more rare… etc.
struct
s can’t be variable-length. The only way to have variable-length in them is with indirection. There with two pointers. [You could also have just one pointer, to the buffer
, and it has dims
in its header, but again variable-length there, so you likely would have a pointer to a pointer, still two allocations and heap objects.]
My idea and one solution is: the common case is a 2D array. You can have a (UInt64, UInt64) tuple only. When you have a Vector, the latter integer could be a zero. But you have a problem for 3D or more dimensions. It’s rare enough anyway, but it could also be encoded in fixed 128-bit (even 64 bits of space). With a union hack that tuple could alternatively sometimes encode a pointer. You can get away with just one 64-bit pointer, or a bit easier with one plus one Int64.
[E.g. in many cases 2D arrays are square, not always, and for square arrays (or close to) you only need UInt32 for the dimensions size. And if you know square you don’t need to store both n and m since they are the same.]
[I’m not sure it’s worth it, also your code will not work for 1.10 (or 1.6) which will likely become the new LTS, so please don’t do that for packages. Code is not backported as a policy. For this I could see `Memory` being backported, but not really, i.e. as a simple alias for `Array`.]
It doesn’t seem the extra allocations are too bad, so best if you don’t, and then my idea redundant… With large Arrays, the allocations are huge for the Memory
buffer, but the extra allocation for the dims are always tiny. Interestingly allocations of large and small cost the same, assuming you do not initialize. The same applies for freeing. But for GC it needs to traverse all pointers, so double the allocations means about double that cost (could be way less depending on if cashing helps), at least in memory if not time.
The GC might never actually free for you. Often you just have some huge array(s) and they stick around. The indirect cost is likely more for many small arrays, then the pointers are comparably larger. Then StaticArray
s likely still make sense. And are still better alternative to Memory.
There’a also GenericMemory type (and AtomicMemory), not sure when used, and looking this up also unrelated: