Increase in allocations with Julia v1.11-beta

They were previously tracked. Before, the data was in the same allocation as the header, which is sometimes better and sometimes worse (if the array is resized, it can waste a lot of memory).

4 Likes

Only Array{T,1} can actually be resized, so having Array{T,N} with N == 0 || N >= 2 also involve a mutable struct instead of a plain struct seems unnecessary.
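As a quick sketch of that point: the resizing operations are only defined for vectors, so a Matrix can't grow even though it carries the same mutable machinery.

```julia
v = [1, 2, 3]
push!(v, 4)      # a Vector can grow...
resize!(v, 2)    # ...and shrink
@assert v == [1, 2]

m = zeros(2, 2)
# resize!(m, 8)  # MethodError: resize! is only defined for vectors
```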

Would perhaps be nice if we had a distinction between non-resizable mathematical arrays and resizable linear collections, but that would be very breaking and thus is not happening.

Would be cool if there were some way to place constraints based on parameters:

struct MathArray{T,N} <: DenseArray{T,N}
    data::Memory{T}
    size::NTuple{N,Int}
end
mutable struct LinearCollection{T} <: DenseVector{T}
    data::Memory{T}
    length::Int
end

const Array{T,N} = # somehow make it `LinearCollection{T}` when `N == 1`, and `MathArray{T,N}` otherwise. 
7 Likes

On the other hand, we could permit resizing other dimensions too: we currently pay all of the cost needed to make that possible (which actually seems to be very little, except for the allocation itself) but get none of the flexibility. And except for Matrix, it isn’t generally beneficial to allocate the size inline anyway, as the struct starts to get quite large.

2 Likes

FWIW, this is the result of a larger calculation:

1.10.2:

49.666810 seconds (105.84 k allocations: 872.115 MiB)

1.11:

47.559673 seconds (229.30 k allocations: 962.446 MiB, 0.58% gc time, 11 lock conflicts)

3 Likes

Did you use multi-threaded garbage collection? I would suggest launching Julia with

julia --gcthreads=7,1

if you have 8 fast physical cores…

Just to be clear: Memory has fixed size but is not immutable.
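A minimal sketch of that distinction on 1.11: a Memory cannot be resized, but its elements can be written freely.

```julia
m = Memory{Int}(undef, 3)   # fixed-size buffer type, new in Julia 1.11
fill!(m, 0)
m[1] = 42                   # elements are mutable
@assert length(m) == 3
# push!(m, 1)               # MethodError: a Memory cannot grow
```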

2 Likes

I have (almost) no GC in those calculations (they are parallel, with 10 threads).

In any case, this is the result starting Julia with julia --gcthreads=9,1. Here are 5 successive runs (discarding the first one, which has ~5% compilation):

1.10.2:

47.999261 seconds (105.75 k allocations: 872.109 MiB)
46.349978 seconds (105.78 k allocations: 872.106 MiB, 0.14% gc time)
45.914514 seconds (105.70 k allocations: 872.104 MiB, 0.09% gc time)
49.659957 seconds (105.89 k allocations: 872.119 MiB)
49.863240 seconds (105.85 k allocations: 872.119 MiB, 0.01% gc time)

1.11:

48.109827 seconds (229.17 k allocations: 962.447 MiB, 0.56% gc time, 13 lock conflicts)
51.137695 seconds (229.35 k allocations: 962.458 MiB, 0.55% gc time, 13 lock conflicts)
47.078975 seconds (229.14 k allocations: 962.445 MiB, 0.81% gc time, 12 lock conflicts)
52.507213 seconds (229.32 k allocations: 962.457 MiB, 0.48% gc time, 14 lock conflicts)
47.022936 seconds (229.06 k allocations: 962.440 MiB, 0.85% gc time, 10 lock conflicts)

I would say that the differences are probably not really meaningful.

PS: I don’t like the “conflicts” word there; it seems to imply there’s a bug. Maybe “lock events”?

The mutable static arrays of StaticArrays aren’t either, but they have the size in the type signature, which Memory does not.

By the way, is that because it would explode compilation times, as with static arrays?

Yes, putting the size in the type would cause the compiler to try to specialize on every size. Maybe worthwhile for a few small sizes but generally a waste of effort.
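A tuple-based sketch of why (the function name here is illustrative): a length parameter in the type means one compiled specialization per length, since each NTuple length is a distinct type.

```julia
# Each NTuple length N is a distinct type, so the compiler
# generates separate native code for each length seen below.
sumtup(t::NTuple{N,Int}) where {N} = sum(t)

sumtup((1, 2))        # specializes for N = 2
sumtup((1, 2, 3))     # specializes again for N = 3
```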

2 Likes

In C++ array size is part of the type, afaiu. Don’t the C++ compilers specialize on size?

FWIW - part 2. A very comprehensive benchmark of CellListMap.jl shows the results below. Overall, small but consistent improvements in the performance of 1.11-beta1 relative to 1.10.2 (blue lines slightly below the red one). Thus, for my purposes, this will be an improvement :-).

(The benchmark consists of computing the histogram of pairwise velocities of “galaxies”, with true density, in the case of “constant density”: GitHub - lmiq/PairVelocities.)

9 Likes

std::array is statically sized (and stack allocated).

std::vector<T> is closer to Vector{T} in Julia. It is dynamically sized, and consists of two parts:

  • the data pointer containing the Ts. This is generally heap allocated (technically there is a second type parameter with a default arg, i.e. it is std::vector<T,A=std::allocator<T>>, so it’s really up to that allocator).
  • the header with that pointer, plus a size and capacity field. This is generally stack allocated.

In Julia, the capacity is with the data as part of Memory{T}, and the header gets heap allocated when it escapes (but shouldn’t otherwise).
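That capacity is observable through the public sizehint! API (a sketch; the exact growth policy is an implementation detail):

```julia
v = Int[]
sizehint!(v, 100)   # reserve capacity in the backing buffer, like C++'s reserve
for i in 1:100
    push!(v, i)     # should stay within the reserved capacity, so no reallocation
end
@assert length(v) == 100
```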

To be pedantic, boost::container::devector is actually closer to Julia’s Vector{T}, because both of them support fast pushing and popping at both ends, while std::vector<T> is only fast at the back.

Plenty of libraries provide their own types, including optionally mixing dynamic and static dimensions, like Eigen.
There are also dynamic vector types with some extra stack space they can use to avoid a heap allocation if the overall size is smaller. E.g., boost::container::small_vector.
There are also fixed-capacity vectors (in boost, plus a proposal for C++26), in case you know the maximum possible size (and that size is small enough that it’s worth always stack allocating that much instead of heap allocating however much you actually need).

3 Likes

But do you know if the C++ compilers ‘specialize’ on the size of arrays similarly to how the Julia compiler would?

Yes, they specialize. C++ compilers monomorphize, compiling separate code for every set of types, just like Julia.

Here is an example:

Note that sum inlines into each of the foos.
Each call to sum with different argument types compiles different code.
Note this works for an arbitrary number of arguments, I just did sum to keep it simple.

One of StaticArrays’ (and Tuple’s) problems is that a lot of code dealing with them is fully unrolled in Julia, which hurts compile times of these individual specializations somewhat unnecessarily.

Still, this is one of the reasons C++'s compile times aren’t good.

One cool thing about the C++ assembly is that for the length-2 arrays, they were passed in registers instead of the stack (note the lack of loading from rsp, instead immediately using registers).
Arrays of length 3 or more go back to using the stack, though.

7 Likes

It just crossed my mind that an array type that doesn’t depend on Array (thus avoiding the extra indirection and allocation costs) should be very easy to define while retaining almost all of the array functionality (because most of it is defined for abstract types). The package is here, but it can’t be registered until Julia v1.11 gets released: Neven Sajko / FixedSizeColumnMajorDenseArrays.jl · GitLab

EDIT: the size here is fixed at run time, unlike with @giordano’s FixedSizeArrays.jl, where the size seems to be stored in the type domain (thus it’s a compile-time constant).

EDIT2: actually, the size is stored as a value in both packages; it’s just that the README there is perhaps misleading.

EDIT3: just use Giordano’s package instead, I’ll probably delete my repo

2 Likes

see also GitHub - giordano/FixedSizeArrays.jl

3 Likes

Very nice!

But I would suggest to use a shorter name, like FastFixedSizeArrays or FixedSizeCMDArrays, otherwise it becomes a pain to type…

1 Like

No, they’re runtime values stored as an NTuple{N,Int}: FixedSizeArrays.jl/src/FixedSizeArrays.jl at f8b71ffcf1334c60d2cc7279cb34010964dff387 · giordano/FixedSizeArrays.jl · GitHub

2 Likes

Yeah, I got confused by the README there. Already edited my comment.

I found it a bit strange that such a significant change is not even mentioned in the release notes, and it also seems to have very little documentation.

Is there for example a canonical way to get the Memory object associated with an Array instance?