[ANN] FixedSizeArrays.jl: What Array probably should have been

This has been one year in the making, but the first release of FixedSizeArrays.jl is finally out, and you can install it in your environment with

import Pkg
Pkg.add("FixedSizeArrays")

This package can be installed on Julia v1.10 and later releases, but the improvements described in the sections below are generally available only on v1.11+, thanks to the use of the new Memory type.

@nsajko and @oscar_smith greatly contributed to FixedSizeArrays.jl.

Motivation

The Base.Array type is very convenient and flexible, but in many applications it’s too flexible: it’s a dynamic array type, which allows you to add or remove elements at both ends. However, there are many situations (most notably linear algebra applications) where this flexibility is unnecessary and actually comes with a performance cost: it prevents some compiler optimisations, since the size of the array could change, even if in practice it never does. The package FixedSizeArrays.jl provides a mutable array implementation, FixedSizeArray, with the same backend as Base.Array (this is an implementation detail!) but whose size is fixed after construction. This lets the compiler reason better about optimisations, since the size of an array is known with certainty after construction.
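As a quick illustration of the basic contract (a hedged sketch using FixedSizeVectorDefault, the default-storage alias also used in the benchmarks below): the elements are mutable, but the length is set once at construction, and resizing operations are simply not supported.

```julia
using FixedSizeArrays

v = FixedSizeVectorDefault{Float64}(undef, 3)
fill!(v, 0.0)           # in-place mutation works just like with Base.Vector
v[1] = 1.0
@assert length(v) == 3  # the length can never change after construction
# Resizing operations such as push!/resize! are not supported for this type
# (they error), so the size is an invariant the compiler can rely on.
```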

As a taster of the performance of FixedSizeArrays, allocating and filling a FixedSizeVector is (slightly) faster and uses 32 bytes less memory than an equally sized Base.Vector:

julia> using BenchmarkTools, FixedSizeArrays

julia> for n in (0, 2^0, 2^5, 2^10, 2^15, 2^20, 2^25, 2^30)
           @info "n = $(n)"
           @btime fill!(Vector{Float32}(undef, $(n)), 0.0)
           @btime fill!(FixedSizeVectorDefault{Float32}(undef, $(n)), 0.0)
       end
[ Info: n = 0
  5.028 ns (1 allocation: 32 bytes)
  2.884 ns (0 allocations: 0 bytes)
[ Info: n = 1
  8.280 ns (2 allocations: 64 bytes)
  6.991 ns (1 allocation: 32 bytes)
[ Info: n = 32
  9.534 ns (2 allocations: 192 bytes)
  7.569 ns (1 allocation: 160 bytes)
[ Info: n = 1024
  91.336 ns (3 allocations: 4.07 KiB)
  88.852 ns (2 allocations: 4.04 KiB)
[ Info: n = 32768
  1.578 μs (3 allocations: 128.07 KiB)
  1.641 μs (2 allocations: 128.04 KiB)
[ Info: n = 1048576
  55.965 μs (3 allocations: 4.00 MiB)
  35.493 μs (2 allocations: 4.00 MiB)
[ Info: n = 33554432
  42.459 ms (3 allocations: 128.00 MiB)
  42.026 ms (2 allocations: 128.00 MiB)
[ Info: n = 1073741824
  1.352 s (3 allocations: 4.00 GiB)
  1.340 s (2 allocations: 4.00 GiB)

Furthermore, to facilitate escape analysis, when possible we throw a custom exception when accessing out-of-bounds indices: this exception does not capture the entire array, which enables further optimisations, such as entirely removing memory allocations for small arrays, because the error path can be elided (note: this happens only on Julia v1.12+):

julia> using BenchmarkTools, FixedSizeArrays, Random

julia> function g(T)
           v = T(undef, 250)
           rand!(v)
           return foldl(+, v)
       end
g (generic function with 1 method)

julia> @btime g(Vector{Float64}) setup=(Random.seed!(1))
  223.163 ns (2 allocations: 2.02 KiB)
131.05489829603343

julia> @btime g(FixedSizeVectorDefault{Float64}) setup=(Random.seed!(1))
  221.137 ns (0 allocations: 0 bytes)
131.05489829603343

Comparison with other packages

See the documentation for a comparison with other fixed-size arrays in the ecosystem, but in short: this does not replace StaticArrays.jl if what you want is an immutable stack-allocated array with the size in the type domain, which you can dispatch on. FixedSizeArray really looks a lot more like Base.Array, just with a fixed size.

JuliaCon talk

If you’re going to attend JuliaCon 2025, either in-person or remotely, make sure to come to our talk FixedSizeArrays.jl: What Array probably should have been.

97 Likes

Congrats! :tada: Will need to benchmark this against StaticArrays.MVector and SmallCollections.MutableSmallVector soon.

7 Likes

I’m not familiar with SmallCollections.MutableSmallVector, but a comparison with StaticArrays.MArray is mentioned in the docs. I don’t know about performance, but the use case of FixedSizeArray vs StaticArrays.MArray is slightly different: FixedSizeArray really wants to be a “better” Base.Array for numerical applications, rather than trying to optimise small arrays (FixedSizeArray is based on Memory rather than Tuple, so it should perform equally well at all sizes) or to dispatch on the array’s size (FixedSizeArray doesn’t allow that), which is what StaticArrays.jl does best.

6 Likes

Possibly this is clarified in your upcoming talk or somewhere in the dev docs, but do you mean Vector specifically? I had assumed that the fixed-size Memory would’ve at least allowed those fixed-size optimizations for N >= 2 Arrays. If not, how does FixedSizeArray inform the compiler that you really won’t reallocate the Memory? It lacks the size information in the type parameters that StaticArrays have, so that can’t be it.

Also a couple more favorable points of comparison between FixedSizeArray and MArray:

  1. The default Memory can leverage the isbits-Union optimization; NTuple and anything based on it can’t. To elaborate: a concrete Memory whose element type is a small Union of isbits types (up to 256 of them, I think) can store each element in contiguous blocks the size of the largest type, like a C union, followed directly by contiguous bytes specifying the type of each element (in the same single Memory, unlike a struct of arrays). Concrete tuples are strictly structures of concrete types, so NTuple{n, Union{T, Nothing}} is strictly an abstract type and thus cannot use the isbits-Union optimization, which significantly hurts both the performance and the data locality of StaticArrays, despite decent type inference of indexing.
  2. MArray only supports setindex! for isbitstype elements. The underlying unsafe_store! code wouldn’t work on an abstract NTuple field, or on dense vectors with abstract element types, even with the isbits-Union storage optimization. FixedSizeArray isn’t hacking Tuples into a vector type, so it just forwards indexing to its mem field.
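To make point 1 concrete, here’s a minimal stdlib-only check of the concreteness difference that enables the isbits-Union storage:

```julia
# A dense container type with a small isbits-Union element type stays
# concrete (on Julia v1.11+, Memory{U} is concrete too), so its elements can
# be stored compactly. A tuple type with the same Union element type is
# abstract, because tuple types are covariant in their parameters.
U = Union{Float64, Nothing}
@assert isconcretetype(Vector{U})      # concrete: compact isbits-Union storage
@assert !isconcretetype(NTuple{4, U})  # abstract: no such optimization
```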

FixedSizeArray doesn’t stand in for SArray for now because Memory can’t be reliably stored inlined due to its runtime size and mutability, but that’s a different conversation for a public immutable vector backend or a tuple variant that acts like an anonymous version of structs.

5 Likes

The size is a field of an immutable struct:

This is the same layout as Base.Array, but that’s a mutable struct:
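In other words, the two layouts look roughly like this (a simplified sketch with a hypothetical name, not the literal definitions, which carry more type parameters and constraints):

```julia
# FixedSizeArray: an *immutable* struct wrapping the storage plus the size,
# so the `size` field can never be reassigned after construction.
struct MyFixedSizeArray{T,N} <: DenseArray{T,N}
    mem::Memory{T}       # the storage (itself mutable, Julia v1.11+)
    size::NTuple{N,Int}  # fixed at construction
end

# Base.Array (Julia v1.11+) has essentially the same fields, but the struct
# is *mutable*, so its reference and size can be reassigned, e.g. by push!:
#
#   mutable struct Array{T,N} <: DenseArray{T,N}
#       ref::MemoryRef{T}
#       size::NTuple{N,Int}
#   end
```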

If a function depends only on the size/length of a FixedSizeArray, that can be entirely constant-propagated, see for example FixedSizeArrays.jl Showcase · JuliaArrays/FixedSizeArrays.jl · Discussion #62 · GitHub.
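A hedged sketch of what that enables (hypothetical function name; on Julia v1.12+ the allocation itself may also be elided, so the whole call can fold to a constant):

```julia
using FixedSizeArrays

# The result depends only on the (immutable) size field, so the compiler is
# free to constant-propagate it; inspect with `@code_typed just_the_length()`.
just_the_length() = length(FixedSizeVectorDefault{Float64}(undef, 8))
```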

5 Likes

I’m not sure how that could be implemented, though, given that Vector === Array{T, 1} where {T}. That is, Vector and Matrix differ just by that one type parameter, so one can’t simplify Matrix without affecting Vector, I think.

See also this PR:

4 Likes

Ok, I took some time to benchmark StaticArrays.MVector vs FixedSizeVector, using the benchmarks I shared in the first post:

julia> using BenchmarkTools, FixedSizeArrays, StaticArrays
[ Info: Precompiling StaticArraysStatisticsExt [06c1163f-768c-55d8-b451-fddc7c07223a]

julia> for n in (0, 2^0, 2^5, 2^10, 2^15)
           @info "n = $(n)"
           @eval @btime fill!(MVector{$(n),Float32}(undef), 0.0)
           @btime fill!(FixedSizeVectorDefault{Float32}(undef, $(n)), 0.0)
       end
[ Info: n = 0
  3.756 ns (1 allocation: 8 bytes)
  2.854 ns (0 allocations: 0 bytes)
[ Info: n = 1
  3.435 ns (1 allocation: 16 bytes)
  6.389 ns (1 allocation: 32 bytes)
[ Info: n = 32
  4.406 ns (1 allocation: 144 bytes)
  7.078 ns (1 allocation: 160 bytes)
[ Info: n = 1024
  82.811 ns (1 allocation: 4.06 KiB)
  88.521 ns (2 allocations: 4.04 KiB)
[ Info: n = 32768
  2.951 μs (1 allocation: 128.06 KiB)
  2.004 μs (2 allocations: 128.04 KiB)

I stopped at 2^15 elements because allocating a 1-million-element MVector takes pretty much forever (it had been running for more than 30 minutes and using 20 GB of swap on my laptop). This is not particularly surprising: as already mentioned above, StaticArrays is really about small arrays; for longer ones it becomes unusable. It’s not meant to be a general-purpose container, which is what FixedSizeArrays wants to be.

For the other benchmark:

julia> function g(T)
           v = T(undef, 250)
           rand!(v)
           return foldl(+, v)
       end
g (generic function with 1 method)

julia> function static_g(T)
           v = T(undef)
           rand!(v)
           return foldl(+, v)
       end
static_g (generic function with 1 method)

julia> @btime g(FixedSizeVectorDefault{Float64}) setup=(Random.seed!(1))
  226.910 ns (0 allocations: 0 bytes)
131.05489829603343

julia> @btime static_g(MVector{250, Float64}) setup=(Random.seed!(1))
  368.180 ns (1 allocation: 1.98 KiB)
135.38837641231748

Besides the fact that the memory allocation for MVector is not removed, I believe the sum is slower because it’s fully unrolled and can’t be vectorised. FixedSizeVector avoids this because it’s a “normal” vector which doesn’t try to outsmart the compiler.

Let’s look at SmallCollections.MutableSmallVector now:

julia> using BenchmarkTools, FixedSizeArrays, Random, SmallCollections

julia> for n in (0, 2^0, 2^5, 2^10, 2^15)
           @info "n = $(n)"
           @eval @btime fill!(MutableSmallVector{$(n),Float32}(), 0.0)
           @btime fill!(FixedSizeVectorDefault{Float32}(undef, $(n)), 0.0)
       end
[ Info: n = 0
  3.445 ns (1 allocation: 16 bytes)
  2.864 ns (0 allocations: 0 bytes)
[ Info: n = 1
  3.906 ns (1 allocation: 16 bytes)
  6.550 ns (1 allocation: 32 bytes)
[ Info: n = 32
  4.988 ns (1 allocation: 144 bytes)
  7.499 ns (1 allocation: 160 bytes)
[ Info: n = 1024
  90.897 ns (1 allocation: 4.12 KiB)
  93.841 ns (2 allocations: 4.04 KiB)
[ Info: n = 32768
  1.962 μs (1 allocation: 128.12 KiB)
  2.066 μs (2 allocations: 128.04 KiB)

julia> function g(T)
           v = T(undef, 250)
           rand!(v)
           return foldl(+, v)
       end
g (generic function with 1 method)

julia> @btime g(FixedSizeVectorDefault{Float64}) setup=(Random.seed!(1))
  226.787 ns (0 allocations: 0 bytes)
131.05489829603343

julia> @btime g(MutableSmallVector{250, Float64}) setup=(Random.seed!(1))
  383.483 ns (1 allocation: 1.98 KiB)
135.38837641231748

Again I had to stop at 2^15 elements, because 2^20 is apparently not “small” anymore (I got an error when trying to generate that array), and the memory allocation in the g function is again not removed.

I’m sure there will be applications where MVector and MutableSmallVector would perform better, but with these examples I wanted to stress the fact that FixedSizeArrays.jl provides a simple general-purpose fixed-size array which plays nicely with the compiler, instead of trying to trick it.

13 Likes

Very helpful package. Any thoughts on how to provide a FixedSizeFixedArray, a fixed-size array with immutable elements, to gain a performance advantage (saving any checks or tests that mutable elements need)?

1 Like

With MutableFixedVector (the analog of MVector in SmallCollections.jl) it’s quick. Without a warm-up I get:

julia> @time fill!(MutableFixedVector{1048576,Float32}(undef), 0.0)
  0.002612 seconds (1 allocation: 4.000 MiB)

(I’m surprised – when I wrote the package, large vectors were not on my mind.)

Note that MutableSmallVector{n,Float32}() in the benchmark above creates an empty vector. You want to have

@eval @btime fill!(MutableSmallVector{$n,Float32}(undef, $n), 0.0)

As for the error at 2^20 elements: that’s because the length is stored as an Int16.
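The arithmetic behind that limit, as a quick stdlib-only check:

```julia
# The length is stored as an Int16, so the largest representable length is
# typemax(Int16); a 2^20-element vector exceeds that.
@assert typemax(Int16) == 32767
@assert 2^20 > typemax(Int16)
```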

4 Likes

I’m not quite sure how to do that in a way that helpfully informs the compiler that the data is immutable. The struct FixedSizeArray is already immutable, but the Memory field itself is mutable (you can change the data it points to). I don’t think simply not defining the setindex! methods is enough.

1 Like

I think @oscar_smith wrote somewhere that adding a GenericMemory variant that’s just like Memory but immutable should not be too difficult. Once something like that is added to Julia it should be easy to support in FixedSizeArrays.jl.

6 Likes

Thanks for this, it’s fantastic to finally have a package for fixed-size arrays!

I am curious regarding this point in particular:

Does this mean there is finally a convenient solution for avoiding allocations of temporary arrays?
Specifically: would you recommend using FixedSizeArrays.jl for temporary allocations, as opposed to the default of passing buffer arrays?

I suppose it will always be fastest to use preallocated buffers, but in some cases that can be a pain to do… If this gets rid of the majority of the cost (which is garbage collection, I presume) then it would be a nice alternative.

1 Like

Is there a performance gain in linear algebra ops like matrix multiplies, BLAS, etc. ? Or this is not what you meant here?

2 Likes

Merely calling BLAS functions shouldn’t see any performance improvement, and why should it? The performance depends only on the BLAS functions you’re calling, not on the container. But having a fixed-size array in your application, which does other operations besides BLAS calls, can give the compiler more optimisation opportunities than Base.Array does.

For example, the code generated for this function

function g(T)
    N = 1024
    A = T(undef, N)
    B = T(undef, N)
    rand!(A)
    rand!(B)

    return A .+ B
end

doesn’t have an error path for, say, T = FixedSizeVectorDefault{Float64}, but it does for T = Vector{Float64}. Letting the compiler elide error paths and small memory allocations can have a non-zero impact on your overall application, even if BLAS functions alone aren’t affected. The reference to “linear algebra applications” in my message was because in numerical code you most often need only fixed-size arrays, and dynamic arrays are overkill that only prevents optimisations.

2 Likes

I’d say try it out! But if you need memory buffers for in-place operations I’m not sure a fixed-size array is a replacement for that.

As mentioned already in the message above, the reason why a fixed-size array is useful is because it can enable optimisations (e.g. eliding memory allocations and error paths) that are harder to achieve with a dynamic array.

2 Likes

Does this mean there is finally a convenient solution to avoiding allocations of temporary arrays?

This (currently) depends heavily on how temporary the temporaries are. Julia’s allocation escape analysis isn’t inter-procedural, so currently, this optimization won’t always remove allocations (but it sometimes will).

5 Likes

Nice package! If I understood correctly, I should expect better performance for creating arrays and mutating their elements, but not for multiplying two matrices or summing their elements?

I thought that in order to get specialization you needed to define

function g(::Type{T}) where {T}

Otherwise, the compiler would treat the input as just a DataType. Is that not the case? Or is that for dispatch, not specialization?
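For reference, a minimal sketch of the two signatures in question (hypothetical function names):

```julia
# Julia's compiler heuristically avoids emitting specialized native code for
# a plain `Type` argument that is merely passed through to another call; the
# `::Type{T}` form forces specialization on T. Both compute the same result.
make_zero_heuristic(T) = zero(T)                 # may not get specialized codegen
make_zero_forced(::Type{T}) where {T} = zero(T)  # always specialized on T
@assert make_zero_heuristic(Int) === 0
@assert make_zero_forced(Int) === 0
```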

1 Like

This seems off-topic (can a mod split these two messages to another thread maybe?), but anyway:

Sure, but AFAIK the lack of specialization only means there will be no specialized LLVM IR. That is, the Julia compiler will still run inference with more precisely specialized argument types, if it feels like it. In particular, constant propagation is not disabled (pretty sure). Often it makes sense to avoid specialization, for example in the implementation of typejoin:

I think typejoin is implemented like that because it’s expected to constant fold anyway, so the machine code being too specialized is just wasted compiler effort/unnecessary latency.

There are other differences, too:

Anyway, none of this matters for the quoted message, so of course @giordano went with the simpler option.

Yeah, as long as the matrix multiplication is in-place, the exact same code paths (BLAS, usually) should get used for both Array and FixedSizeArray.

2 Likes