Disabling allocations

It is possible in Julia:

using BenchmarkTools, LoopVectorization, PaddedMatrices # v0.2.1

@noinline function dostuff(A, B)
    C = A*B
    s = zero(eltype(C))
    @avx for i in eachindex(C)
        s += C[i]
    end
    s
end
function main_v1()
    A = @StrideArray rand(8,8);
    B = @StrideArray rand(8,8);
    dostuff(A, B)
end
function main_v2()
    A = @StrideArray rand(8,8);
    B = @StrideArray rand(8,8);
    @gc_preserve dostuff(A, B)
end
@benchmark main_v1()
@benchmark main_v2()

This works even though a StrideArray behaves like your typical mutable array:

julia> A = @StrideArray rand(2,5)
2×5 StrideMatrix{Tuple{StaticInt{2}, StaticInt{5}}, (true, true), Float64, 1, 0, (1, 2), Tuple{StaticInt{8}, StaticInt{16}}, Tuple{StaticInt{1}, StaticInt{1}}, PaddedMatrices.MemoryBuffer{10, Float64}} with indices StaticInt{1}():StaticInt{1}():StaticInt{2}()×StaticInt{1}():StaticInt{1}():StaticInt{5}():
 0.429701  0.318488  0.842704  0.0217103  0.212563
 0.82351   0.245693  0.890502  0.941539   0.626707

julia> A[1,3] = 8;

julia> A
2×5 StrideMatrix{Tuple{StaticInt{2}, StaticInt{5}}, (true, true), Float64, 1, 0, (1, 2), Tuple{StaticInt{8}, StaticInt{16}}, Tuple{StaticInt{1}, StaticInt{1}}, PaddedMatrices.MemoryBuffer{10, Float64}} with indices StaticInt{1}():StaticInt{1}():StaticInt{2}()×StaticInt{1}():StaticInt{1}():StaticInt{5}():
 0.429701  0.318488  8.0       0.0217103  0.212563
 0.82351   0.245693  0.890502  0.941539   0.626707

main_v1 of course causes allocations:

julia> @benchmark main_v1()
BenchmarkTools.Trial:
  memory estimate:  1.06 KiB
  allocs estimate:  2
  --------------
  minimum time:     71.279 ns (0.00% GC)
  median time:      91.891 ns (0.00% GC)
  mean time:        101.286 ns (10.74% GC)
  maximum time:     924.190 ns (77.74% GC)
  --------------
  samples:          10000
  evals/sample:     974

But main_v2 does not allocate on the heap; the arrays are stack allocated:

julia> @benchmark main_v2()
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     40.155 ns (0.00% GC)
  median time:      40.206 ns (0.00% GC)
  mean time:        40.346 ns (0.00% GC)
  maximum time:     57.046 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     990

Also, for good measure: immutable structs (such as StaticArrays' SMatrix) will normally be stack allocated:

using StaticArrays
function main_v3()
    A = @SMatrix rand(8,8);
    B = @SMatrix rand(8,8);
    dostuff(A, B)
end
@benchmark main_v3()

yielding

julia> @benchmark main_v3()
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     186.780 ns (0.00% GC)
  median time:      187.906 ns (0.00% GC)
  mean time:        188.042 ns (0.00% GC)
  maximum time:     203.646 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     891

The macro PaddedMatrices.@gc_preserve works by

  1. Applying GC.@preserve to all arguments of the call, and
  2. Replacing AbstractArray arguments with a PtrArray that holds a pointer to the original array, along with its size, strides, and offsets.

GC.@preserve protects the memory from being collected, and because the original array cannot escape (the callee only ever sees the PtrArray), it can still be stack allocated. Of course, you as the user have to guarantee that the PtrArray doesn’t escape, as it’s only valid for as long as GC.@preserve protects the data.

You could take a similar approach with whatever data structures you need.
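For illustration, here is a minimal sketch of the manual pattern that @gc_preserve automates, using only Base Julia (the sum_first function and the use of unsafe_load are illustrative assumptions, not PaddedMatrices internals):

```julia
# Sketch: pin the array with GC.@preserve and work through a raw pointer.
# The pointer must not outlive the preserve block.
function sum_first(A::Matrix{Float64})
    GC.@preserve A begin
        p = pointer(A)       # valid only while A is preserved
        unsafe_load(p, 1)    # reads A[1,1] through the pointer
    end
end

A = [1.0 2.0; 3.0 4.0]
sum_first(A)  # == 1.0
```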

Also, you should be able to avoid all dynamic dispatches. Try creating branches at the point where a type could take more than one value, and immediately call a continuation function from each branch. That is, instead of

function foo(args...)
    # do stuff
    if hit_metallic_object
        thing_its_reflecting_off_of = MetalType()
    elseif #...
        thing_its_reflecting_off_of = #...
    else #...
        # ...
    end
    # computations continue
end

do something like


function foo(args...)
    # do stuff
    if hit_metallic_object
        foo_continued(MetalType(), args...)
    elseif #...
        # ...
    else #...
        # ...
    end
end
function foo_continued(thing_its_reflecting_off_of, args...)
    # computations continue
end

I don’t know what your code looks like, but you should be able to restructure/organize it in a way to avoid dynamic dispatches.
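As a self-contained sketch of that restructuring (the Metal/Glass types and the reflectance function are invented for illustration, not from any real ray tracer):

```julia
# Hypothetical materials; each branch constructs a concrete type and
# immediately calls the continuation with it.
struct Metal end
struct Glass end
reflectance(::Metal) = 0.9
reflectance(::Glass) = 0.1

# The continuation specializes on the concrete material type, so the
# computation after the branch involves no dynamic dispatch.
trace_continued(material, intensity) = intensity * reflectance(material)

function trace(hit_metallic_object::Bool, intensity)
    if hit_metallic_object
        trace_continued(Metal(), intensity)
    else
        trace_continued(Glass(), intensity)
    end
end

trace(true, 2.0)   # == 1.8
```

The branch pays for dispatch exactly once; everything downstream of trace_continued compiles to type-stable code for each concrete material.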
