Increase in allocations with Julia v1.11-beta

Is the number of allocations considered an issue? In my small tests here with my packages I’m not seeing performance regressions, but I systematically see the number of allocations increasing, even when compilation is not involved:

1.10.2:

julia> using CellListMap

julia> x, box = CellListMap.xatomic(10^6);

julia> @btime CellList($x, $box)
  244.720 ms (13742 allocations: 139.53 MiB)

1.11:

julia> using CellListMap

julia> x, box = CellListMap.xatomic(10^6);

julia> @btime CellList($x, $box)
  215.821 ms (34333 allocations: 138.90 MiB)

Note the 34k vs. 13k allocations. (performance here increased, though)

4 Likes

Creating an Array now allocates twice due to the Memory change. You can look at the benchmarks to see that the Memory PR did cause a big regression in allocation counts: Julia performance data (tealquaternion.camdvr.org).
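
If you want to check on your own machine, Base.@allocations (available since Julia 1.9) gives a quick count; on 1.11 a fresh Vector should report 2 (the Array plus its Memory), though I haven't verified this across all sizes:

julia> Base.@allocations Vector{Float64}(undef, 10^4)  # 1.11: Array + its Memory buffer
2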

6 Likes

Okay, being out of the loop, I have to ask, how come? Why does it allocate twice?

1 Like

Could I hope, maybe, for a performance improvement if the arrays I use in this code were changed to Memory? They all fall, I think, into the “I don’t need all the Array features” category.

I mean, this is an involved codebase, so making that transition will take some work. Is there a rational reason to expect that changing standard vectors to Memory objects will improve performance somewhat?

1 Like

The extra allocation is that before Memory, you would allocate just the array; now you allocate the array and its corresponding Memory. If you don’t need resizability or other methods specific to arrays, then Memory will be slightly faster.

3 Likes

Basically, a Julia Array used to hold a pointer directly to memory, sometimes even stored inline with the array metadata itself. However, this was becoming a problem: because arrays are mutable, we could not perform some compiler optimizations that depended on the memory location being constant. Also, much of the underlying machinery was in C.

To alleviate this, a new Memory type has been introduced that is immutable and contains the pointer. Now an Array contains a Memory instance rather than the pointer directly. Also more of the machinery is in Julia.

Apparently, the current implementation now allocates once for the Array itself, then again for the Memory? I’m speculating.
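
If I read the new layout right, you can see the indirection directly on 1.11 (these field names are internals and could change):

julia> fieldnames(Array)
(:ref, :size)

julia> a = [1.0, 2.0, 3.0];

julia> a.ref.mem isa Memory{Float64}  # the Array's data lives in a separate Memory
true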

10 Likes

Okay, so for Julia 1.11 and the future, should one now use Memory instead of Array in dynamically allocating code, to avoid this?

So it would be Memory{Float64}(undef, n) instead of Array{Float64,1}(undef, n), etc.?

No. Just use Array unless you’re implementing some fundamental type or are directly working with bytes of… memory.

The Julia v1.10 Array also did two allocations, but only in cases where the array was sufficiently large.
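
For reference, in case you do fall into that category: on 1.11, Memory takes a single type parameter and is always 1-D with a fixed length, e.g.:

julia> m = Memory{Float64}(undef, 3);   # one type parameter, no dimension

julia> m .= (1.0, 2.0, 3.0);            # indexing and broadcasting work; push!/resize! do not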

8 Likes

Since it has a fixed size and is immutable, is there a prospect of it being stack-allocated, given proper escape analysis?

2 Likes

Yes, allowing Array to take advantage of general Julia-language optimizations (both present and future) was definitely a motivating factor. See: Julep: Redesigning `Array` using a new lower-level container type - HackMD for more details.

3 Likes

For the general question posed here — is an increase in allocations considered an issue? — the answer is maybe. Sometimes it reflects an honest-to-goodness regression, but if everything else is better then it may reflect an intentional tradeoff that the compiler is making.

5 Likes

Note your benchmark shows that even though the number of allocations went up, the total allocated memory went down slightly and the speed improved by 10-15%.

So personally I would not worry about the number of allocations per se if my code gets faster :slight_smile: Maybe you could run a more realistic workload and see how the runtime changed.

4 Likes

Yes sure. The benchmark is realistic, though. I have tested other more complete calculations and the results, at least for my packages, seem to be of that sort: greater number of allocations, but similar or slightly improved performance.

4 Likes

There may also be more GC pressure later on, when the GC is triggered. Larger benchmarks involving GC runs would be informative.

1 Like

I think more memory allocations are a problem if the memory does not get reused. Too many allocations could result in memory fragmentation.

1 Like

Are you sure? I mean, it seems so, but before 1.11 the allocation of the memory buffer was done in C code, and I’m wondering: did it also do two allocations pre-1.11, with the C-side allocations, i.e. the indirect ones, here half of them, simply not tracked? I don’t know either way, but would like to know for sure.

The OP has about 2.5× the allocations, so even doubling the actual allocations wouldn’t fully explain it. But at least the timing shows that some x% more isn’t too bad.

With the Memory change you get, per the benchmark data above:

Range: [-100.00%, +400.00%]
Mean: +73.94%

more (tracked) allocations. That is, sometimes double (if it is in fact one extra allocation per array), for some operations up to +400% (why?!), and for some operations all of them disappear.

What needs to happen is at least one Libc.malloc call/one (heap) allocation, and you get a pointer back. I.e. the Memory type is like an allocation in C: always 1-D and always fixed-length, it seems. The C library keeps track of the length of each allocation (in bytes, not number of objects; it can’t track the latter since it does NOT know the type), so you can deallocate by providing just a pointer, not the length in bytes, since the length must be stored on the heap. [I think, but I’m not sure, that you must call free with a pointer to the first byte, not into the middle of an allocation, since how else would it know where the size is stored?]
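
From Julia, that C contract looks like this (the Libc wrappers are in Base; where exactly the allocator records the size is its own business, invisible to us):

p = Libc.malloc(100)                 # ask libc for 100 bytes; it records the size internally
unsafe_store!(Ptr{UInt8}(p), 0x2a)   # write one byte through the raw pointer
Libc.free(p)                         # free takes only the pointer, never a length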

So conversely, when you allocate an Array, what must happen? You need to store the number of dimensions and the size of each, e.g. (n × m × …), somehow, and I think Julia could be more clever about it.

We could have something like (hypothetical for the current situation, as of 1.11):

struct Array  # plain `struct`; Julia structs are immutable by default
    buffer::Memory
    dims::some_dimension_type
end

Wherever you point to or allocate an Array you must store this structure, and note that it uses two pointers/two allocations: while the struct itself is immutable, the buffer is a mutable array (the concept, not the uppercase Array type), and the dims live separately on the heap.

It would be nice if dims didn’t need its own heap allocation. It’s a tuple of UInt64s, and the reason it must be a pointer to a variable-sized object is that sometimes it’s just one Int64, when your Array is only a Vector, but it needs two Int64s for a Matrix (another alias for Array), three for a 3D Array (which is rare), four for an even rarer 4D one, etc.

structs can’t be variable-length. The only way to have variable length in them is with indirection; here, with two pointers. [You could also have just one pointer, to the buffer, with dims in its header, but that is again variable-length, so you would likely end up with a pointer to a pointer: still two allocations and two heap objects.]

My idea, and one solution, is this: the common case is a 2D array. You can have just a (UInt64, UInt64) tuple. When you have a Vector, the latter integer could be zero. You have a problem for 3 or more dimensions, but that’s rare enough anyway, and it could also be encoded in a fixed 128 bits (or even 64 bits) of space. With a union hack, that tuple could alternatively encode a pointer. You can get away with just one 64-bit pointer, or a bit more easily with one pointer plus one Int64.
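
A minimal sketch of that encoding (entirely hypothetical; all names invented here), with b == 0 marking a Vector and the ≥3-D pointer escape hatch left out:

struct PackedDims           # fits in 128 bits, no heap allocation of its own
    a::UInt64               # size of the first dimension
    b::UInt64               # size of the second dimension, or 0 for a Vector
end

packednd(d::PackedDims) = d.b == 0 ? 1 : 2                        # ignoring ≥3-D
packedlen(d::PackedDims) = d.b == 0 ? Int(d.a) : Int(d.a * d.b)   # total element count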

[E.g. in many cases 2D arrays are square, though not always, and for square (or close to square) arrays you only need a UInt32 for each dimension size. And if you know it’s square, you don’t need to store both n and m, since they are the same.]

[I’m not sure it’s worth it; also, your code will not work on 1.10 (or 1.6), and 1.10 will likely become the new LTS, so please don’t do that in packages. Features are not backported, as a policy. For this I could imagine `Memory` being backported as a simple alias for `Array`, but not really.]

It doesn’t seem the extra allocations are too bad, so best if you don’t, and then my idea is redundant… With large Arrays, the allocation for the Memory buffer is huge, but the extra allocation for the dims is always tiny. Interestingly, large and small allocations cost the same, assuming you do not initialize; the same applies to freeing. But the GC needs to traverse all pointers, so double the allocations means about double that cost (it could be way less, depending on whether caching helps), at least in memory if not in time.

The GC might never actually free anything for you. Often you just have some huge array(s) and they stick around. The indirect cost likely matters more for many small arrays, where the pointers are comparatively larger. There, StaticArrays likely still make sense, and are still a better alternative than Memory.

There’s also a GenericMemory type (and AtomicMemory); I’m not sure when those are used.

1 Like

Is this so? Array is not a concrete type; the concrete type is Array{T,N}. Couldn’t dims then be of type NTuple{N,Int}?

2 Likes

The type only stores the number of dimensions, not the size of each. So you need to store those somewhere, and NTuple{N,Int} is still variable-length across different N, so I think what I wrote still holds. I simplified a bit; yes, Array is not concrete, but I had such a type in mind.

If the struct containing the field dims is immutable, then there may be no need to store it in memory at all, just like a Tuple variable. I don’t see where the heap or any other sort of allocation comes in.

EDIT:

For fixed N (that is, any concrete type Array{T,N}) it’s fixed length.
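
And that is easy to check: for concrete N the tuple is a fixed-size bits type, so it can be stored inline:

julia> isbitstype(NTuple{2,Int})
true

julia> sizeof(NTuple{2,Int})
16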

I meant that Array now reports 2 allocations; I don’t know if there were untracked ones previously.