[ANN] ArrayAllocators.jl v0.3 composes with OffsetArrays.jl v1.12.1+ for faster zeros with offset indexing

ArrayAllocators.jl v0.3.0+ and OffsetArrays.jl v1.12.1+ now compose. This means you can construct an OffsetArray of zeros with either of the following equivalent lines of code.

julia> using OffsetArrays, ArrayAllocators

julia> OA = OffsetArray{Int}(undef, -512:511); fill!(OA, 0);

julia> OA_calloc = OffsetArray{Int}(calloc, -512:511);

julia> OA[-512] == OA_calloc[-512]
true

julia> isequal(OA_calloc, OA)
true

In some cases, this reduces the time needed to initialize the OffsetArray, as the benchmarks below illustrate.

julia> using BenchmarkTools

julia> @btime begin
           OA2 = OffsetArray{Int}(undef, -512:511, -512:511)
           fill!(OA2, 0)
       end;
  1.483 ms (2 allocations: 8.00 MiB)

julia> @btime begin
           OA2 = OffsetArray{Int}(calloc, -512:511, -512:511)
       end;
  1.089 ms (8 allocations: 8.00 MiB)

julia> @btime begin
           fill!(OA2, 1)
       end setup = (OA2 = OffsetArray{Int}(undef, -512:511, -512:511));
  1.521 ms (0 allocations: 0 bytes)

julia> @btime begin
           fill!(OA2_calloc, 1)
       end setup = (OA2_calloc = OffsetArray{Int}(calloc, -512:511, -512:511));
  1.312 ms (0 allocations: 0 bytes)

I encourage you to rigorously benchmark your application when using this, since any speedup may depend on the operating system and hardware.
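For a quick first check without extra dependencies, the following Base-only sketch times both approaches on a plain Matrix{Int}. It is a rough stand-in under stated assumptions: `calloc_matrix`, `undef_fill_matrix` and `time_min` are hypothetical helpers, the sketch wraps `Libc.calloc` directly with `unsafe_wrap` instead of going through ArrayAllocators.jl or OffsetArrays.jl, and a minimum over a few `@elapsed` runs is much cruder than BenchmarkTools.

```julia
# Base-only sketch: compare two ways of getting a zeroed Matrix{Int}.
# calloc_matrix, undef_fill_matrix and time_min are hypothetical helpers,
# not part of ArrayAllocators.jl or OffsetArrays.jl.

function calloc_matrix(::Type{T}, m::Int, n::Int) where {T}
    ptr = Libc.calloc(m * n, sizeof(T))
    ptr == C_NULL && throw(OutOfMemoryError())
    # own = true: Julia calls free() on ptr when the array is garbage collected
    return unsafe_wrap(Array, Ptr{T}(ptr), (m, n); own = true)
end

# Allocate uninitialized memory, then write zero into every element
undef_fill_matrix(::Type{T}, m::Int, n::Int) where {T} = fill!(Matrix{T}(undef, m, n), zero(T))

# Minimum wall-clock time over several runs, after one warm-up call to exclude compilation
function time_min(f; runs::Int = 5)
    f()
    return minimum(@elapsed(f()) for _ in 1:runs)
end

t_fill = time_min(() -> undef_fill_matrix(Int, 1024, 1024))
t_calloc = time_min(() -> calloc_matrix(Int, 1024, 1024))
println("undef + fill!: ", t_fill, " s")
println("calloc:        ", t_calloc, " s")
```

As the fill! benchmarks above suggest, a fast calloc timing on its own may hide cost that only shows up on the first write, so it is worth timing your real workload end to end.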

Also, with the subpackage NumaAllocators v0.2.0, you can now allocate an OffsetArray directly on a specific non-uniform memory access (NUMA) node.

julia> using OffsetArrays, NumaAllocators

julia> OA_numa_0 = OffsetArray{Int}(numa(0), -512:512, -9:9);

The composition was enabled by implementing Base.unsafe_wrap for OffsetArray. Neither package depends directly on the other, so make sure both packages are at the required versions or later.

julia> ptr = Libc.calloc(1024, 8)
Ptr{Nothing} @0x0000000004e3ae10

julia> OA3 = unsafe_wrap(OffsetArray, Ptr{Int}(ptr), 1024);
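The example above does not free `ptr`. With Base's `unsafe_wrap(Array, ptr, dims; own = false)`, the memory stays owned by the caller unless `own = true` is passed. The following is a minimal Base-only sketch of the manual-free pattern; `wrapped_sum` is a hypothetical helper, and whether the OffsetArray method accepts the same `own` keyword is an assumption worth checking in the OffsetArrays.jl documentation.

```julia
# Base-only sketch of wrapping a calloc'd pointer without handing ownership to Julia.
# wrapped_sum is a hypothetical helper for illustration.
function wrapped_sum()
    ptr = Libc.calloc(1024, sizeof(Int))      # 1024 zero-initialized Ints
    ptr == C_NULL && throw(OutOfMemoryError())
    try
        # own = false (the default): the caller remains responsible for freeing ptr
        A = unsafe_wrap(Array, Ptr{Int}(ptr), 1024; own = false)
        A[1] = 42
        return (length(A), sum(A))            # A must not escape: its memory is freed below
    finally
        Libc.free(ptr)
    end
end

wrapped_sum()
```

Freeing in `finally` guarantees the memory is released even if the body throws; the trade-off is that no reference to the wrapped array may outlive the function.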

If you missed the original ArrayAllocators.jl announcement, see below.


Another feature of ArrayAllocators.jl v0.3 is an alternative implementation of zeros based on calloc. Note that ArrayAllocators.zeros is not exported; ArrayAllocators.zeros(T, ...) is essentially just Array{T}(calloc, ...).

julia> using ArrayAllocators

julia> @time AAZ = ArrayAllocators.zeros(Int, 1024, 1024, 1024);
  0.029364 seconds (61.31 k allocations: 8.004 GiB, 99.89% compilation time)

julia> @time AAZ = ArrayAllocators.zeros(Int, 1024, 1024, 1024);
  0.000037 seconds (4 allocations: 8.000 GiB)

julia> @time BZ = Base.zeros(Int, 1024, 1024, 1024);
  4.448959 seconds (2 allocations: 8.000 GiB, 0.52% gc time)

julia> @time BZ = Base.zeros(Int, 1024, 1024, 1024);
  4.665584 seconds (2 allocations: 8.000 GiB, 2.48% gc time)
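To make "essentially just Array{T}(calloc, ...)" concrete, here is a Base-only sketch of a calloc-backed zeros. `calloc_zeros` is a hypothetical name, and this illustrates the idea rather than reproducing ArrayAllocators' actual code: it requests zeroed memory with `Libc.calloc` and hands ownership to Julia's GC via `unsafe_wrap(...; own = true)`, so `free` is called when the array is collected.

```julia
# Base-only sketch of a calloc-backed zeros; calloc_zeros is a hypothetical name,
# not ArrayAllocators' actual implementation.
function calloc_zeros(::Type{T}, dims::Integer...) where {T}
    # isbitstype rules out element types that hold references; all-zero bytes
    # equal zero(T) for common numeric types such as Int and Float64
    isbitstype(T) || throw(ArgumentError("calloc_zeros needs an isbits element type"))
    shape = map(Int, dims)
    # Request at least one element so a zero-length array never gets a NULL pointer
    ptr = Libc.calloc(max(prod(shape), 1), sizeof(T))
    ptr == C_NULL && throw(OutOfMemoryError())
    # own = true: Julia calls free() on ptr once the array is garbage collected
    return unsafe_wrap(Array, Ptr{T}(ptr), shape; own = true)
end

A = calloc_zeros(Float64, 3, 4)
```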

Note that on some operating systems, calloc may defer the actual allocation of memory until the memory is first written, so part of the cost moves to the first write.

julia> @time fill!(AAZ, 1);
  4.849603 seconds (8.17 k allocations: 491.429 KiB, 0.64% compilation time)

julia> @time fill!(AAZ, 2);
  1.634943 seconds

julia> @time fill!(BZ, 1);
  1.710879 seconds

julia> @time fill!(BZ, 2);
  1.717882 seconds
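A rough way to observe this deferral on your own machine is to time the first and second write to a calloc-backed array, as in this Base-only sketch. `first_touch_times` is a hypothetical helper; whether the first write is slower depends on the OS and the C allocator, so the sketch only reports the two timings without assuming which is larger.

```julia
# Base-only sketch: time the first and second write to a calloc-backed vector.
# first_touch_times is a hypothetical helper; any gap between the two timings
# depends on the operating system and the C allocator.
function first_touch_times(n::Int)
    ptr = Libc.calloc(n, sizeof(Int))
    ptr == C_NULL && throw(OutOfMemoryError())
    # own = true: Julia calls free() on ptr once the array is garbage collected
    A = unsafe_wrap(Array, Ptr{Int}(ptr), n; own = true)
    t_first = @elapsed fill!(A, 1)   # first write: pages may only now be backed by memory
    t_second = @elapsed fill!(A, 2)  # second write: pages are already mapped
    return t_first, t_second, A
end

t_first, t_second, A = first_touch_times(2^24)  # 2^24 Ints = 128 MiB
println("first fill!:  ", t_first, " s")
println("second fill!: ", t_second, " s")
```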

If you want to switch your code from Base.zeros to ArrayAllocators.zeros, import it before your first reference to zeros.

julia> using ArrayAllocators: zeros

julia> @time zeros(Int, 1024, 1024, 1024);
  0.000028 seconds (4 allocations: 8.000 GiB)

julia> @time Base.zeros(Int, 1024, 1024, 1024);
  4.442414 seconds (2 allocations: 8.000 GiB, 0.10% gc time)
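The import-order rule can be reproduced without ArrayAllocators by using a stand-in module. In this sketch, `MyAllocs` and `Client` are hypothetical modules, and `MyAllocs.zeros` simply delegates to `Base.zeros` rather than calling calloc.

```julia
# Stand-in for ArrayAllocators: a module with its own, unexported `zeros`.
# MyAllocs and Client are hypothetical names for this sketch.
module MyAllocs
    # Delegates to Base here; ArrayAllocators.zeros would use calloc instead
    zeros(::Type{T}, dims::Integer...) where {T} = Base.zeros(T, dims...)
end

module Client
    # Import before any use of `zeros`, as the post advises: once `zeros` has
    # been resolved to Base.zeros in this module, a later import can conflict
    using ..MyAllocs: zeros
    make() = zeros(Int, 2, 3)   # calls MyAllocs.zeros, not Base.zeros
end

Client.make()
```

Because the explicit import takes precedence over the names Base provides implicitly, every unqualified `zeros` in `Client` now refers to `MyAllocs.zeros`, while `Base.zeros` remains reachable by its qualified name.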

While this may look like a free lunch, note @Sukera’s caveats in the original post:
