FixedSizeVectors with cache-aligned memory?

FixedSizeVectors offer a first, more performant and cleaner approach to working with (generally, numeric) vectors where, once formed, the size of the vector does not change.
AlignedAllocs provides cache-aligned memory as a Vector{T} where T is a given bitstype:

using FixedSizeArrays
using AlignedAllocs: memalign_clear, alignment  
   # memalign_clear zeros the memory block
   # also exports `memalign` for a memory block of type T that is uninitialized.
   # alignment(xs::AbstractVector) provides the byte alignment of the initial element of xs
   #                                                 and so the byte alignment of the vector

element_type = Float32
element_count = 8

cache_aligned_mem = memalign_clear(element_type, element_count)
fixed_length_mem     = FixedSizeVectorDefault{element_type}(undef, element_count)

alignment(cache_aligned_mem) >= 64
alignment(fixed_length_mem)     >= 32  # for vectors with fewer than 2^9 elements
                                                                # on my machine 

Working with cache-line-size aligned vectors is more perfomant generally and prevents the possible time hijacking of cache-line crossing data where it need not cross cache lines.
Most current pcs have a cache-line-size of 64 bytes (there are exceptions).
When using SIMD operations, aligning to 256 or 512 bytes (processor dependant) is very important to the performance of SIMD driven algorithms.

cache_aligned_mem = memalign_clear(element_type, element_count, 256)
alignment(cache_aligned_mem) >= 256

It would be great to obtain a FixedLengthVector backed by cache-aligned memory.

3 Likes

It has nothing to do with the implementation of AlignedAllocs, the FixedSizeArray type just doesn’t (yet) support directly wrapping an input DenseVector with a possibly different size. It’s just some uninitialized instantiation and collecting input AbstractArrays into new parent DenseVectors (currently defaulting to Memory), which is a bit strange because the latter seems to be what FixedSizeArrayDefault is intended for.

julia> methods(FixedSizeArray.body.body.body)
# 4 methods for type constructor:
 [1] (::Type{FSA})(::UndefInitializer, size::Tuple{Vararg{Integer}}) where {T, FSA<:(FixedSizeArray{T, N, Mem} where {N, Mem<:DenseVector{T}})}
     @ C:\Users\benm1\.julia\packages\FixedSizeArrays\22QFl\src\FixedSizeArray.jl:138
 [2] (::Type{FSA})(::UndefInitializer, size::Integer...) where {T, FSA<:(FixedSizeArray{T, N, Mem} where {N, Mem<:DenseVector{T}})}
     @ C:\Users\benm1\.julia\packages\FixedSizeArrays\22QFl\src\FixedSizeArray.jl:141
 [3] (::Type{FSA})(src::AbstractArray) where {N, FSA<:(FixedSizeArray{var"#s3", N, Mem} where {var"#s3", Mem<:DenseVector{var"#s3"}})}
     @ C:\Users\benm1\.julia\packages\FixedSizeArrays\22QFl\src\FixedSizeArray.jl:336
 [4] (::Type{FSA})(src::AbstractArray) where FSA<:FixedSizeArray
     @ C:\Users\benm1\.julia\packages\FixedSizeArrays\22QFl\src\FixedSizeArray.jl:343

We can verify with a minimal example type:

julia> begin
       # might be an incorrect subtype, but it'll serve the example
       mutable struct OneElement{T} <: DenseVector{T} value::T end
       Base.size(::OneElement) = (1,)
       Base.getindex(A::OneElement, i::Int) = if i == 1 A.value else throw(BoundsError(A, i)) end
       Base.setindex!(A::OneElement, v, i::Int) = if i == 1 A.value = v else throw(BoundsError(A, i)) end
       using FixedSizeArrays
       end

julia> FixedSizeArray(OneElement(2f0)) # input collected into parent Memory
1-element FixedSizeArray{Float32, 1, Memory{Float32}}:
 2.0

julia> FixedSizeVector(OneElement(2f0)) # of course aliases also do this
1-element FixedSizeArray{Float32, 1, Memory{Float32}}:
 2.0

There is only a private inner constructor(?) that can do this, but it’s not ideal because there isn’t an option to compute the size directly from the DenseVector and broadcasting fails at a with_stripped_type_parameters_unchecked call that is only implemented for Vector and GenericMemory.

julia> FixedSizeArrays.new_fixed_size_array(OneElement(2f0), (1,)) # no idea why the alias is printed
1-element FixedSizeVector{Float32, OneElement{Float32}}:
 2.0

julia> 1 .+ FixedSizeArrays.new_fixed_size_array(OneElement(2f0), (1,1))
ERROR: MethodError: no method matching with_stripped_type_parameters_unchecked(::FixedSizeArrays.TypeParametersElementType, ::Type{OneElement{Float32}})

One relevant feature request to Julia:

Assuming wrap were to get publicized without any changes to the interfaces, FixedSizeArrays.jl could add its method to wrap, and what you want would be possible for Julia v1.11 and up. On the other hand, if the wrap interface changes before wrap gets publicized, I suppose FixedSizeArrays.jl could provide its own function to enable wrapping functionality.

It’s not ideally documented, but you can do e.g. vv=[3]; FixedSizeArrays.new_fixed_size_array(vv.ref.mem, (3,)). As my example demonstrates, be careful, for some weird reason FixedSizeArrays decided not to check sizes.

(so you posix_memalign or analogue to allocate your stuff, then unsafe_wrap(Memory...) to get a Memory instance, and then FixedSizeArrays.new_fixed_size_array(mem, size)).

That works – thank you much.


julia> using FixedSizeArrays, AlignedAllocs

julia> n = 8;

julia> v = memalign_clear(Float32, n, 128);

julia> fsa = FixedSizeArrays.new_fixed_size_array(v, (length(v),));

julia> length(fsa), alignment(fsa), typeof(fsa)
(8, 128, FixedSizeVector{Float32, Vector{Float32}})

That’s not what you want! This wraps a FixedSizeVector around a Vector, but you want to wrap it around a Memory (otherwise the entire dance is a nonsensical waste of cycles). Unfortunately, AlignedAllocs does not return a Memory object – afaiu it predates the shift from C-based array to C-based Memory with julia-based Array wrapper. You can extract the Memory instance also on new julia versions, but AlignedAllocs attaches the finalizer to the Array object, not the Memory object, so you have the wrong lifetimes. (…only on windows, where a different library call than free is required, making this extra-fun to test and debug)

TLDR: It doesn’t fit. But since both packages are pretty trivial in complexity, you can just copy & paste the code from either AlignedAllocs or from FixedSizeArrays and adapt it, unless you need to interact with other packages that dispatch on their types.

This is what I would recommend anyways, since both packages are smaller than the “minimum viable package”, i.e. copy-pasting the code is less of a software supply chain hassle than taking them as dependency.

(never forget NPM leftpad. You can and should take on dependencies, but there is a lower limit on the reasonable size that depends on the trade-off between “copy-paste / re-implement these N lines” and “take on an additional dependency requiring a certain amount of vetting”. Where you put the cutoff depends on both the amounts of vetting your org / honor is required to do, and on your tolerance for extra code. The entire world laughed at node developers because they demonstrated a complexity threshold of literal leftpad, which says a lot about either what JS people consider “complex code”, or about their diligence in vetting their transitive dependencies)

PS. Since you @JeffreySarnoff maintain AlignedAllocs, you can also simply fix the bug and attach the finalizer to the correct object, i.e. the Memory instance. I think this is a use-after-free bug anyways, but I am hard pressed to come up with realistic scenarios where this leads to a security compromise.

In case it’s not clear, new_fixed_size_array is not public. My previous message discusses some hypothetical paths to getting a public interface for wrapping other dense vectors.

I’m no expert in C or the Julia guts, but on post-Memory Julia versions, I think unsafe_wrap(Array, ptr, dims; own=true) attaches the finalizer to the underlying Memory instance created, not the returned Array instance, so I don’t think there are any lifetime issues/bugs here. See julia/src/array.c at 760b2e5b7396f9cc0da5efce0cadd5d1974c4069 · JuliaLang/julia · GitHub (specifically, own_buffer is just passed through to jl_ptr_to_genericmemory and not otherwise used in jl_ptr_to_array{_1d}).

That said, I agree it seems sensible for AlignedAllocs.jl to return Memory rather than Vector on Julia >= 1.11.

The problem is only on windows AlignedAllocs.jl/src/AlignedAllocs.jl at eb61320f498b8d6167229d4faea6e4e60583f1c1 · JeffreySarnoff/AlignedAllocs.jl · GitHub

The memory is allocated with some msvcrt _aligned_alloc, and that must be released via _aligned_free and cannot be released via standard free. Hence, own=false and a finalizer is needed, and that is attached to the Array which is a possible UAF bug in julia after the introduction of Memory.

Ah, I see, thanks! I didn’t properly grok your parenthetical and didn’t notice in the code that the special case for Windows also has a special finalizer.

Arrgh. Is the take-away that “I cannot get there from here”?

You can easily get aligned FixedSizeArrays.jl by using non-public interfaces to wrap it around a Memory. It would be reasonable to not care whether the interface is public, or to make a PR that makes the interface public, or it would be reasonable to fork/vendor FixedSizeArrays.jl.

If you want to do the allocation via AlignedAllocs, you can do that (extract the Memory from the Array via array.ref.mem).

Just be aware that currently there is a bug with object lifetimes / finalizers in AlignedAllocs.jl on windows. It is reasonable to fix that bug upstream (it’s your package!), or to not care about windows for your use-case, or to fork/vendor AlignedAllocs.jl (ok, that would be unreasonable for you specifically, but it’s what I’d consider if upstream didn’t like my proposed fix of attaching the finalizer to the Memory in newish julia versions).

I am happy to add support within AlignedAllocs.jl.
What bug?

Here

that must be something like

finalizer(vec.ref.mem) do m
GC.@preserve ccall((:_aligned_free, "msvcrt"), Cvoid, (Ptr{T},), pointer(m))
end

Arrays don’t have property access, so vec.ref doesn’t work. You must use getfield(vec, :ref). I.e.:

    finalizer(getfield(vec, :ref).mem) do m
        @GC.preserve ptr ccall((:_aligned_free, "msvcrt"), Cvoid, (Ptr{T},), pointer(m))
    end

@JeffreySarnoff keep in mind this is only for Julia 1.11+. You actually need something like

    @static if VERSION < v"1.11-"
        finalizer(vec) do v
            @GC.preserve ptr ccall((:_aligned_free, "msvcrt"), Cvoid, (Ptr{T},), pointer(v))
        end
    else
        finalizer(getfield(vec, :ref).mem) do m
            @GC.preserve ptr ccall((:_aligned_free, "msvcrt"), Cvoid, (Ptr{T},), pointer(m))
        end
    end

@static if is resolved at parse time, so this doesn’t create a runtime branch.

2 Likes

Thank you.

Thanks for keeping up LTS Julia!

1 Like