Increase in allocations with Julia v1.11-beta

Is the number of allocations considered an issue? In my small tests here with my packages I’m not seeing performance regressions, but I systematically see the number of allocations increasing, even when compilation is not involved:

1.10.2:

julia> using CellListMap

julia> x, box = CellListMap.xatomic(10^6);

julia> @btime CellList($x, $box)
  244.720 ms (13742 allocations: 139.53 MiB)

1.11:

julia> using CellListMap

julia> x, box = CellListMap.xatomic(10^6);

julia> @btime CellList($x, $box)
  215.821 ms (34333 allocations: 138.90 MiB)

Note the 34k vs. 13k allocations. (performance here increased, though)

4 Likes

Creating an Array now allocates twice due to the Memory change. You can look at the benchmarks to see that the Memory PR did cause a big regression in allocation counts: Julia performance data (tealquaternion.camdvr.org).

5 Likes

Okay, being out of the loop, I have to ask, how come? Why does it allocate twice?

1 Like

Could I hope, maybe, for a performance improvement if the arrays I use in this code were changed to Memory? They all fall, I think, into the category where “I don’t need all the Array features”.

I mean, this is fairly involved code, so making that transition will take some work. Is there a rational reason to expect that changing standard vectors to Memory objects will somewhat improve performance?

1 Like

The extra allocation comes from the fact that before Memory, you would allocate just an array; now you allocate an array and its corresponding memory. If you don’t need resizability or other methods specific to arrays, then Memory will be slightly faster.
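A small sketch of the difference, assuming Julia ≥ 1.11 (exact allocation counts can vary with size and context):

```julia
# A Vector is now an Array header plus a separate Memory buffer
# (two tracked allocations), while a bare Memory is a single allocation.
v = Vector{Float64}(undef, 100)   # Array header + backing Memory
m = Memory{Float64}(undef, 100)   # just the buffer

m[1] = 1.0      # Memory supports indexing and length like a Vector,
length(m)       # → 100, but it cannot be resized
```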

3 Likes

Basically a Julia Array used to directly have a pointer to memory sometimes even stored inline with the array metadata itself. However, this was becoming a problem because arrays are mutable, and thus we could not perform some compiler optimizations that depended on the memory location being constant. Also much of the underlying machinery was in C.

To alleviate this, a new Memory type has been introduced that is immutable and contains the pointer. Now an Array contains a Memory instance rather than the pointer directly. Also more of the machinery is in Julia.

Apparently, the current implementation now allocates once for the Array itself, then again for the Memory? I’m speculating.
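The two-part layout can be seen directly on Julia 1.11 (note these field names are internal implementation details and may change):

```julia
v = [1.0, 2.0, 3.0]
fieldnames(typeof(v))            # (:ref, :size)
v.ref.mem isa Memory{Float64}    # true — the separate backing buffer
v.size                           # (3,) — the dims, stored inline in the header
```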

9 Likes

Okay, so for Julia 1.11 and the future, should one now use Memory instead of Array in dynamically allocating code, to avoid this?

So it would be Memory{Float64,1}() instead of Array{Float64,1} etc.?

No. Just use Array unless you’re implementing some fundamental type or are directly working with bytes of… memory.

The Julia v1.10 Array also did two allocations, but only in cases where the array was sufficiently large.
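For the “fundamental type” case mentioned above, a sketch of what that might look like (`FixedBuffer` is a made-up name, assuming Julia ≥ 1.11):

```julia
# A minimal fixed-size buffer backed by Memory instead of Vector.
struct FixedBuffer{T}
    data::Memory{T}
    FixedBuffer{T}(n::Integer) where {T} = new{T}(Memory{T}(undef, n))
end

Base.length(b::FixedBuffer) = length(b.data)
Base.getindex(b::FixedBuffer, i::Int) = b.data[i]
Base.setindex!(b::FixedBuffer, x, i::Int) = (b.data[i] = x)

b = FixedBuffer{Float64}(8)
b[1] = 3.14
```

For ordinary code, though, Vector remains the right default.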

7 Likes

Since it has a fixed size and is immutable, is there the perspective of that being stack-allocated with proper escape analysis?

2 Likes

Yes, allowing Array to take advantage of general Julia-language optimizations (both present and future) was definitely a motivating factor. See: Julep: Redesigning `Array` using a new lower-level container type - HackMD for more details.

3 Likes

For the general question posed here — is an increase in allocations considered an issue? — the answer is maybe. Sometimes it reflects an honest-to-goodness regression, but if everything else is better then it may reflect an intentional tradeoff that the compiler is making.

5 Likes

Note your benchmark shows that even though the number of allocations went up, the total allocated memory went down slightly and the speed improved by 10-15%.

So personally I would not worry about the number of allocations per se if my codes get faster :slight_smile: Maybe you could run a more realistic workload and see how the runtime changed.

4 Likes

Yes sure. The benchmark is realistic, though. I have tested other more complete calculations and the results, at least for my packages, seem to be of that sort: greater number of allocations, but similar or slightly improved performance.

4 Likes

There may also be more GC burden later on, when GC is triggered. Larger benchmarks involving GC runs would be informative.

1 Like

I think more memory allocations are a problem if the memory does not get reused. Too many allocations could result in memory fragmentation.

1 Like

Are you sure? It seems so, but before 1.11 the allocation of the memory buffer was done in C code. Did it also perform two allocations pre-1.11, with the C-side allocations (i.e. the indirect ones, here half of them) simply not tracked? I don’t know either way, but would like to know for sure.

The OP has about 2.5x as many allocations, so even doubling the actual allocation count wouldn’t fully explain it. But at least the timing shows that some extra allocations aren’t too bad.

With the Memory change you get:

Range: [-100.00%, +400.00%]
Mean: +73.94%

more (tracked) allocations: sometimes double (if there is in fact one extra allocation), for some operations up to 4x (why?!), and for some operations all allocations disappear.

What needs to happen is at least one Libc.malloc call/one (heap) allocation, and you get a pointer back. I.e. the Memory type is like an allocation in C: always 1-D, always fixed-length. The C library keeps track of the length of each allocation (in bytes, not number of objects; it can’t know the type), so you can deallocate by providing just the pointer, not the length: the length must be stored on the heap. [I think, but I’m not sure, that you must call free with a pointer to the first byte, not into the middle of an allocation, since how else would it know where the size is stored?]
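The C-allocator behaviour described above can be seen through Julia’s Libc wrappers:

```julia
# free takes only the pointer; the allocator records each block's byte
# size internally, which is why no length argument is needed.
p = Libc.malloc(64)   # Ptr{Cvoid} to 64 raw, uninitialized bytes
p != C_NULL           # allocation succeeded
Libc.free(p)          # must be the pointer malloc returned
```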

So conversely, when you allocate an Array, what must happen? You need to store the number of dimensions and the size of each (n × m × …) somehow, and I think Julia could be more clever about it.

We could have something like (hypothetical, for the current situation as of 1.11):

struct Array   # immutable by default
    buffer::Memory
    dims::some_dimension_type
end

Wherever you point to or allocate an Array you must store this structure, and note it implies two pointers/two allocations: while the struct itself is immutable, the buffer is mutable memory (the concept, not the upper-case Array type), and the dims live separately on the heap.

It would be nice if dims didn’t need its own heap allocation. It’s a tuple of UInt64s, and the reason it must be a pointer to a variable-sized object is that sometimes it’s just one Int64, when your Array is only a Vector, but it needs two Int64s for a Matrix (another alias for Array), 3 for a 3-D Array (which is rare), 4 for a 4-D one (rarer still), etc.

structs can’t be variable-length. The only way to have variable length in them is with indirection, here with two pointers. [You could also have just one pointer, to the buffer, with dims in its header, but that header is again variable-length, so you would likely need a pointer to a pointer: still two allocations and two heap objects.]

My idea, and one solution, is: the common case is a 2-D array. You can have a (UInt64, UInt64) tuple only. When you have a Vector, the latter integer could be zero. But you have a problem for 3 or more dimensions. That is rare enough anyway, but it could also be encoded in a fixed 128 bits (or even 64 bits of space). With a union hack that tuple could alternatively encode a pointer. You could get away with just one 64-bit pointer, or a bit more easily with one pointer plus one Int64.
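A rough sketch of that encoding idea (all names made up; purely illustrative, not how Julia actually stores dims):

```julia
# Pack a Vector's or Matrix's dims into one fixed (UInt64, UInt64) tuple,
# using 0 in the second slot as the "this is 1-D" sentinel. (A genuinely
# 0-column Matrix would collide with the sentinel — one of the corner
# cases such a scheme would have to handle.)
encode_dims(n::Integer) = (UInt64(n), UInt64(0))
encode_dims(n::Integer, m::Integer) = (UInt64(n), UInt64(m))

decode_dims(d::NTuple{2,UInt64}) =
    d[2] == 0 ? (Int(d[1]),) : (Int(d[1]), Int(d[2]))

decode_dims(encode_dims(5))      # (5,)   — a Vector
decode_dims(encode_dims(3, 4))   # (3, 4) — a Matrix
```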

[E.g. in many cases 2-D arrays are square, or close to it, and for those you only need UInt32 for each dimension’s size. And if you know it’s square you don’t need to store both n and m, since they are the same.]

[I’m not sure it’s worth it; also, your code will not work on 1.10 (or 1.6), and 1.10 will likely become the new LTS, so please don’t do that for packages. Code is not backported, as a policy. I could see `Memory` being backported for this, but not really, i.e. only as a simple alias for `Array`.]

It doesn’t seem the extra allocations are too bad, so best if you don’t, and then my idea is redundant… With large Arrays the allocation for the Memory buffer is huge, but the extra allocation for the dims is always tiny. Interestingly, allocating large and small blocks costs the same, assuming you do not initialize them, and the same applies to freeing. But the GC needs to traverse all pointers, so double the allocations means roughly double that cost (it could be much less, depending on whether caching helps), at least in memory if not in time.

The GC might never actually free anything for you. Often you just have some huge array(s) and they stick around. The indirect cost is likely higher for many small arrays, where the pointers are comparatively larger. There, StaticArrays likely still make sense, and are still a better alternative than Memory.

There’s also a GenericMemory type (and AtomicMemory); I’m not sure when those are used.

1 Like

Is this so? Array is an abstract type. The concrete type is Array{T,N}. Couldn’t then dims be of type NTuple{N,Int}?

2 Likes

The type only stores the number of dimensions, but not the size of each. So you need to store that somewhere, and NTuple{N,Int} is still variable-length, so I think what I wrote still holds. I simplified a bit, yes Array is not concrete, but I had such a type in mind.

If the struct containing the field dims is immutable, then there may be no need to store it in memory at all, just like a Tuple variable. I don’t see where the heap or any other sort of allocation comes in.

EDIT:

For fixed N (that is, any concrete type Array{T,N}) it’s fixed length.
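That can be checked directly: for any concrete N the tuple is a plain bits type, so it can live inline in an immutable struct with no heap allocation of its own:

```julia
isbitstype(NTuple{1,Int})        # true
isbitstype(NTuple{3,Int})        # true
isbitstype(Tuple{Vararg{Int}})   # false — length not fixed at the type level
```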

I meant that Array now reports 2 allocations; I don’t know if there were untracked ones previously.