Reusing preallocated memory without unsafe wraps


I need to use large arrays of floats in a large number of iterations. In each iteration, the size of the arrays will be different. If memory permits, I can pre-allocate the memory needed for all iterations at once and all is well.

But what are my options if it doesn’t all fit in memory? I can allocate at each iteration, but that’s painfully slow. So suppose that I pre-allocate the maximum amount of memory used in a single iteration.

I now want to address that memory efficiently. I can reshape a view as indicated below, but that still allocates the memory. Viewing a reshape would presumably not have that problem, but the dimensions may not allow reshaping.

I found an old post that does exactly what I want (How do I reinterpret and reshape a multi-dimensional array `J` into a two-dimensional array `j` such that `typeof(j) <: Matrix` is true in Julia 0.7 - #2 by zygmuntszpak), but the unsafe_wrap business scares me and the follow up suggestions in the same post depend on the dimensions allowing reshaping.

Is there a more Julian (sanctioned) way of achieving the same goal equally fast?

Here are the times if I run with two threads:

(hungry) 209.590382 seconds (349.13 k allocations: 743.015 GiB, 10.23% gc time, 0.05% compilation time)
(stuffed) 87.106966 seconds (102.11 k allocations: 1.495 GiB, 0.04% gc time, 0.01% compilation time)

(I’m running @time only once, because I only get to run the whole exercise once in practice anyway, so compilation time is part of the story)

const T = Float64

function hungry( x, y, z )
    zeros( x, y, z )

function stuffed( pool, x, y, z )
    unsafe_wrap(Array{T,3}, pointer(pool), (x,y,z))  # *** this does what I want

# *** the rest below is for testing

function experiment( which, howoften )
    nth = Base.Threads.nthreads()
    if which == :stuffed
        pool = [ zeros( T, 100_000_000 ) for th ∈ 1:nth ]
    redundant = zeros( T, nth )
    for r ∈ 1:howoften
        @info "iteration $r"
        x = rand( 1:2000 )
        y = rand( 1:div(2_000_000,x) )
        z = rand( 1:div(100_000_000,x*y ) )
        Base.Threads.@threads for th ∈ 1:nth
            if which == :hungry
                A = hungry( x, y, z )
            elseif which == :stuffed
                A = stuffed( pool[th], x, y, z )
            for k ∈ axes( A, 3 )
                for j ∈ axes( A, 2 )
                    for i ∈ axes( A, 1 )
                        A[i,j,k] = rand(T)   # do something with the memory
            redundant[th] = sum( A ) 
    println( sum(redundant) )

function timeme( )
    @time experiment( :hungry, 1000 )
    @time experiment( :stuffed, 1000 )
    # @time experiment( :hungry, 5 )
    # @time experiment( :stuffed, 5 )


1 Like

I would create my own matrix type with the maximum possible dimensions, and define appropriate setindex! and getindex functions for it.

Something structured like this:

mutable struct MyMatrix

m = MyMatrix(3,3,zeros(10,10)) 

getindex(m::MyMatrix,i,j) = ... something with N,N and size(buff)
setindex(m::MyMatrix,i,j) = ...
size(m::MyMatrix) = (m.N,m.M)


where buff can have a greater capacity than actually required by (N,M), and the indexing is defined appropriately taking all dimensions in consideration (would have to think about the details).


I believe this is how they did it in Fortran times (and probably somewhere deep in Julia hidden, too)…

mutable struct MyMatrix

getindex(m::MyMatrix,i,j) = m.buff[i * m.N + j]

It might be a call for reshape!?

Thanks @lmiq . That could work, but would be slow.

1 Like

For future reference, this is how slow it can be compared to a standard matrix, in a matrix-vector product:

julia> include("./matrix.jl")
  360.333 ns (1 allocation: 304 bytes) # buffered matrix
  160.543 ns (1 allocation: 304 bytes) # "true" matrix
  160.121 ns (1 allocation: 304 bytes) # unsafe_wrap


using Test, BenchmarkTools

struct M{T} <: AbstractMatrix{T}

import Base: getindex, setindex!, size, length
getindex(m::M,i,j) = @inbounds m.vals[(j-1)*m.N + i]
setindex!(m::M,val,i,j) = @inbounds m.vals[(j-1)*m.N + i] = val
size(m::M) = (m.N,m.M)
length(m::M) = m.N*m.M

vals = rand(1000)

# the matrix is 30x30, using only the first 900 elements
a = M{Float64}(30,30,vals)
b = [ vals[(i-1)*30 + j] for j in 1:30, i in 1:30 ]

@test a ≈ b

x = rand(30)

@btime $a*$x
@btime $b*$x

# using unsafe_wrap
c = unsafe_wrap(Matrix{Float64}, pointer(vals), (30, 30))
@test c ≈ a
@btime $c*$x

1 Like

I may be missing something obvious, but it seems to me that a straightforward solution would be along these lines…

buffer = zeros(1_000_000)  # As large as needed for maximal case
function makemat(buffer::AbstractVector, m, n)
    mn = m*n
    mn ≤ length(buffer) || error("buffer size exceeded")
    mymat = reshape(view(buffer, 1:mn), m, n)

using BenchmarkTools
x = rand(30)
a = rand(30, 30)
b = makemat(buffer, 30, 30)
b .= a

@btime $a*$x  # 176.366 ns (1 allocation: 304 bytes)
@btime $b*$x  # 178.700 ns (1 allocation: 304 bytes)

1 Like

As far as I understand reshape was part of the original benchmark?

1 Like

Indeed, it allocates like crazy.

I’m not understanding what you’re referring to. When I test makemat as defined above, I get zero allocations:

julia> @btime x = makemat($buffer,20,20);
  2.300 ns (0 allocations: 0 bytes)

julia> @btime x = makemat($buffer,200,20);
  2.300 ns (0 allocations: 0 bytes)