ArrayAllocators.jl
I am happy to announce ArrayAllocators.jl, a registered package that provides new mechanisms for array allocation. ArrayAllocators.jl provides new values that can take the place of undef when constructing arrays via the Array constructor: Array{T}(allocator, n, m, ...).
Quick Start Example
For example, you can now do the following.
using ArrayAllocators
malloced_array = Array{Int}(malloc, 16, 32, 8)
calloced_array = Array{Int}(calloc, 1024)
aligned_array = Array{Int}(MemAlign(2^16), 1024, 2048)
Faster Zeros with Calloc
A few months ago I came across a few circumstances where the implementation of NumPy's zeros seemed faster than Julia's zeros in several microbenchmarks. Investigating this revealed that NumPy uses the C standard function calloc, which allocates memory and guarantees that it will be initialized to 0. calloc, as exposed to Julia via Libc.calloc, allows for fast array allocation and lazy initialization. However, this may result in slower performance when the data is eventually accessed, since the initialization cost is deferred until then.
ArrayAllocators.jl integrates calloc into the Array constructor as follows:
julia> using ArrayAllocators
julia> @time A = Array{UInt8}(undef, 1024^3);
0.001379 seconds (2 allocations: 1.000 GiB, 98.36% gc time)
julia> @time Z = zeros(UInt8, 1024^3);
0.463365 seconds (2 allocations: 1.000 GiB, 1.25% gc time)
julia> @time C = Array{UInt8}(calloc, 1024^3);
0.000026 seconds (5 allocations: 1.000 GiB)
julia> @time sum(Z)
0.226251 seconds
0x0000000000000000
julia> @time sum(C)
0.312937 seconds
0x0000000000000000
julia> @time sum(C)
0.171955 seconds
0x0000000000000000
julia> isequal(Z, C)
true
For a detailed discussion, see the earlier thread.
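For readers who want to see the underlying idea without the package, here is a minimal sketch using only Base Julia. The function name calloc_zeros is hypothetical, it handles bits types only, and it is not how ArrayAllocators.jl is implemented internally; it simply wraps memory from Libc.calloc with unsafe_wrap.

```julia
# Sketch: a zero-filled array backed by Libc.calloc (bits types only).
# `calloc_zeros` is a hypothetical helper, not part of ArrayAllocators.jl.
function calloc_zeros(::Type{T}, dims::Integer...) where {T}
    isbitstype(T) || throw(ArgumentError("$T is not a bits type"))
    dims_int = map(Int, dims)
    # Check for overflow while computing the element count.
    n = foldl(Base.checked_mul, dims_int; init = 1)
    # calloc may return NULL for a zero-sized request, so handle that separately.
    n == 0 && return Array{T}(undef, dims_int)
    ptr = Ptr{T}(Libc.calloc(n, sizeof(T)))
    ptr == C_NULL && throw(OutOfMemoryError())
    # own = true hands the buffer to Julia, which calls free on it when collected.
    return unsafe_wrap(Array, ptr, dims_int; own = true)
end

C = calloc_zeros(UInt8, 1024, 1024)
all(iszero, C)
```

The package's Array{T}(calloc, dims...) form should be preferred in practice, since it also covers the overflow checks and non-bitstype handling described later in this post.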
Aligned Memory
Aligning memory can allow certain vectorized operations to be accelerated. Julia typically allocates memory on 16-byte or 64-byte boundaries depending on the size of the array.
julia> A = Array{UInt8}(undef, 1024^2);
julia> reinterpret(UInt, pointer(A)) % 64
0x0000000000000000
@stevegj has earlier provided a mechanism to use posix_memalign to allocate aligned memory. posix_memalign allows alignment along 16-byte boundaries or any larger power of 2. Thanks to @carstenbauer for bringing this to my attention.
On Windows, I have implemented aligned memory using VirtualAlloc2, but this requires alignment on 64-kilobyte boundaries or greater. I am considering adding a version based on _aligned_malloc, which would provide more granularity, but use of the C runtime on Windows can get complicated.
With ArrayAllocators.jl, you can create aligned memory and explicitly specify the allocation via the following mechanism:
julia> using ArrayAllocators
julia> alignment = 2^16
65536
julia> memalign = MemAlign(alignment)
ArrayAllocators.POSIX.PosixMemAlign{ArrayAllocators.ByteCalculators.CheckedMulByteCalculator}(65536)
julia> aligned_array = Array{UInt8}(memalign, 1024^3);
julia> pointer(aligned_array)
Ptr{UInt8} @0x00007fbadd9c0000
julia> reinterpret(UInt, pointer(aligned_array)) % alignment
0x0000000000000000
The underlying platform-specific versions of MemAlign can also be accessed.
julia> using ArrayAllocators.POSIX
julia> posix_memalign = PosixMemAlign(32)
PosixMemAlign{ArrayAllocators.ByteCalculators.CheckedMulByteCalculator}(32)
julia> posix_aligned = Array{Int}(posix_memalign, 1024, 1024);
julia> pointer(posix_aligned)
Ptr{Int64} @0x000000000383f780
julia> reinterpret(UInt, pointer(posix_aligned)) % 32
0x0000000000000000
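To illustrate what an aligned allocation looks like underneath, the following is a rough sketch that calls posix_memalign directly through ccall. It assumes a POSIX system (it will not work on Windows), the name aligned_vector is hypothetical, and it is not the package's PosixMemAlign implementation.

```julia
# Sketch: allocate a Vector{T} aligned to `alignment` bytes via posix_memalign.
# POSIX systems only; `aligned_vector` is a hypothetical name.
function aligned_vector(::Type{T}, n::Integer, alignment::Integer) where {T}
    Sys.iswindows() && error("posix_memalign is not available on Windows")
    isbitstype(T) || throw(ArgumentError("$T is not a bits type"))
    n > 0 || throw(ArgumentError("n must be positive"))
    # posix_memalign requires a power of two that is a multiple of the pointer size.
    (ispow2(alignment) && alignment % sizeof(Ptr{Cvoid}) == 0) ||
        throw(ArgumentError("invalid alignment: $alignment"))
    nbytes = Base.checked_mul(Int(n), sizeof(T))
    p = Ref{Ptr{T}}(C_NULL)
    status = ccall(:posix_memalign, Cint, (Ref{Ptr{T}}, Csize_t, Csize_t),
                   p, alignment, nbytes)
    status == 0 || error("posix_memalign failed with status $status")
    # own = true: Julia releases the buffer with free, which is valid for posix_memalign.
    return unsafe_wrap(Array, p[], Int(n); own = true)
end

A = aligned_vector(Float64, 1024, 64)
reinterpret(UInt, pointer(A)) % 64 == 0
```

Note that memory obtained this way is uninitialized, just like Array{T}(undef, n).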
Overflow Detection
Integer overflow can occur when calculating the number of bytes needed to allocate an array, leading to erroneous results.
julia> D = typemax(Int)
9223372036854775807
julia> D * (D-2) * 300
900
ArrayAllocators.jl defaults to using Base.checked_mul to check for integer overflow via ArrayAllocators.ByteCalculators.CheckedMulByteCalculator, aliased as ArrayAllocators.DefaultByteCalculator.
julia> using ArrayAllocators
julia> Array{Int16}(calloc, D÷2, 4)
ERROR: OverflowError: The product of the dimensions results in integer overflow.
Stacktrace:
...
julia> Array{UInt8}(calloc, D, D-2, 300)
ERROR: OverflowError: The product of the dimensions results in integer overflow.
Stacktrace:
...
julia> Array{Int}(calloc, D÷2)
ERROR: OverflowError: The product of array length and element size will cause an overflow.
...
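The kind of check involved can be sketched with Base.checked_mul alone. The helper name checked_nbytes below is hypothetical and this is not the package's CheckedMulByteCalculator; it only shows how multiplying the element size by each dimension with checked arithmetic turns a silent wraparound into an OverflowError.

```julia
# Sketch: compute the byte count for an array with overflow checking.
# `checked_nbytes` is a hypothetical helper, not part of ArrayAllocators.jl.
function checked_nbytes(::Type{T}, dims::Integer...) where {T}
    nbytes = sizeof(T)
    for d in dims
        # Throws OverflowError instead of silently wrapping around.
        nbytes = Base.checked_mul(nbytes, Int(d))
    end
    return nbytes
end

checked_nbytes(Int16, 1024, 1024)   # 2 * 1024 * 1024 bytes

D = typemax(Int)
try
    checked_nbytes(UInt8, D, D - 2, 300)
catch err
    err isa OverflowError           # the unchecked product would have been 900
end
```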
The AbstractByteCalculator used is a parameter of AbstractAllocator. Alternative ways of calculating the number of bytes can be used, or the overflow detection can be unsafely disabled:
julia> using ArrayAllocators, ArrayAllocators.ByteCalculators
julia> unsafe_calloc = CallocAllocator{UnsafeByteCalculator}()
CallocAllocator{UnsafeByteCalculator}()
julia> bad_array = Array{UInt8}(unsafe_calloc, D, D-2, 300);
julia> size(bad_array)
(9223372036854775807, 9223372036854775805, 300)
julia> length(bad_array)
900
julia> bad_array
9223372036854775807×9223372036854775805×300 Array{UInt8, 3}:
[:, :, 1] =
signal (11): Segmentation fault
...
Allocating non-bitstypes
All allocators can allocate bitstypes. Non-bitstypes can only be allocated by allocators that initialize their arrays to 0, such as calloc.
julia> using ArrayAllocators
julia> mutable struct NotaBitstype
           somefield
       end
julia> isbitstype(NotaBitstype)
false
julia> ArrayAllocators.iszeroinit(typeof(malloc))
false
julia> Array{NotaBitstype}(malloc, 16);
ERROR: ArgumentError: NotaBitstype is not a bitstype
Stacktrace:
...
julia> ArrayAllocators.iszeroinit(typeof(calloc))
true
julia> Array{NotaBitstype}(calloc, 16)
16-element Vector{NotaBitstype}:
#undef
#undef
#undef
#undef
#undef
#undef
#undef
#undef
#undef
#undef
#undef
#undef
#undef
#undef
#undef
#undef
Other Allocators
ArrayAllocators.jl also provides malloc, the singleton instance of MallocAllocator, and UndefAllocator, which wraps undef.
malloc may allow the use of alternative memory allocators as discussed below.
Generally, allocators that do not require additional dependencies can be added to ArrayAllocators.jl. This includes other mechanisms in libc or native to specific operating systems. Feel free to open an issue or pull request for your favorite allocator.
Extensions
Allocators that do require additional dependencies to be cross-platform should go into dependent packages. For example, NumaAllocators.jl is a package, currently being registered, that implements allocators for Non-Uniform Memory Access (NUMA). I will post a separate package announcement for it once it is registered.
Extensions can subtype AbstractArrayAllocator, which provides abstract functions to make array construction easier. The interface currently consists of allocate, iszeroinit, and extending Base.unsafe_wrap.
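To give a feel for the general shape of such an allocator, here is a self-contained toy that uses only Base Julia. Everything in it (ToyAllocator, ToyMalloc, ToyCalloc, toy_allocate) is a hypothetical stand-in; it does not use the actual types or method signatures of ArrayAllocators.jl, which should be looked up in the package documentation. It shows an allocation step that returns a raw pointer and an Array constructor method that dispatches on the allocator and wraps the pointer with unsafe_wrap. It is restricted to bits types, so it omits the zero-initialization trait.

```julia
# Hypothetical stand-ins; NOT the types or signatures from ArrayAllocators.jl.
abstract type ToyAllocator end
struct ToyMalloc <: ToyAllocator end
struct ToyCalloc <: ToyAllocator end

# Allocation step: return a raw pointer to at least `nbytes` bytes.
toy_allocate(::ToyMalloc, nbytes::Integer) = Libc.malloc(nbytes)
toy_allocate(::ToyCalloc, nbytes::Integer) = Libc.calloc(nbytes, 1)

# Constructor method: lets a ToyAllocator take the place of `undef`.
# Dispatching on our own allocator type avoids clashing with Base methods.
function (::Type{Array{T}})(alloc::ToyAllocator, dims::Integer...) where {T}
    isbitstype(T) || throw(ArgumentError("$T is not a bits type"))
    dims_int = map(Int, dims)
    # Overflow-checked byte count: sizeof(T) times each dimension.
    nbytes = foldl(Base.checked_mul, dims_int; init = sizeof(T))
    nbytes == 0 && return Array{T}(undef, dims_int)
    ptr = Ptr{T}(toy_allocate(alloc, nbytes))
    ptr == C_NULL && throw(OutOfMemoryError())
    # own = true: Julia frees the buffer when the array is garbage collected.
    return unsafe_wrap(Array, ptr, dims_int; own = true)
end

Z = Array{Int}(ToyCalloc(), 4, 4)      # zero-initialized
M = Array{Float64}(ToyMalloc(), 8)     # uninitialized contents
```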
Summary
ArrayAllocators.jl overloads the Array constructor to provide additional allocation options. This allows for fine-grained control of memory allocation for arrays and gives easy access to specialized memory allocation procedures that can impact performance.