Reusing preallocated memory without unsafe wraps

Joris_Pinkse · February 25, 2022, 2:01pm

Hello,

I need to use large arrays of floats in a large number of iterations. In each iteration, the size of the arrays will be different. If memory permits, I can pre-allocate the memory needed for all iterations at once and all is well.

But what are my options if it doesn’t all fit in memory? I can allocate at each iteration, but that’s painfully slow. So suppose that I pre-allocate the maximum amount of memory used in a single iteration.

I now want to address that memory efficiently. I can reshape a view as indicated below, but that still allocates the memory. Viewing a reshape would presumably not have that problem, but the dimensions may not allow reshaping.

I found an old post that does exactly what I want (How do I reinterpret and reshape a multi-dimensional array `J` into a two-dimensional array `j` such that `typeof(j) <: Matrix` is true in Julia 0.7 - #2 by zygmuntszpak), but the unsafe_wrap business scares me and the follow up suggestions in the same post depend on the dimensions allowing reshaping.

Is there a more Julian (sanctioned) way of achieving the same goal equally fast?

Here are the times if I run with two threads:

(hungry) 209.590382 seconds (349.13 k allocations: 743.015 GiB, 10.23% gc time, 0.05% compilation time)
(stuffed) 87.106966 seconds (102.11 k allocations: 1.495 GiB, 0.04% gc time, 0.01% compilation time)

(I’m running @time only once, because I only get to run the whole exercise once in practice anyway, so compilation time is part of the story)

const T = Float64

function hungry( x, y, z )
    zeros( x, y, z )
end


function stuffed( pool, x, y, z )
    unsafe_wrap(Array{T,3}, pointer(pool), (x,y,z))  # *** this does what I want
end


# *** the rest below is for testing

function experiment( which, howoften )
    nth = Base.Threads.nthreads()
    if which == :stuffed
        pool = [ zeros( T, 100_000_000 ) for th ∈ 1:nth ]
    end
    redundant = zeros( T, nth )
    for r ∈ 1:howoften
        @info "iteration $r"
        x = rand( 1:2000 )
        y = rand( 1:div(2_000_000,x) )
        z = rand( 1:div(100_000_000,x*y ) )
        Base.Threads.@threads for th ∈ 1:nth
            if which == :hungry
                A = hungry( x, y, z )
            elseif which == :stuffed
                A = stuffed( pool[th], x, y, z )
            end
            for k ∈ axes( A, 3 )
                for j ∈ axes( A, 2 )
                    for i ∈ axes( A, 1 )
                        A[i,j,k] = rand(T)   # do something with the memory
                    end
                end
            end
            redundant[th] = sum( A ) 
        end
    end
    println( sum(redundant) )
end


function timeme( )
    @time experiment( :hungry, 1000 )
    @time experiment( :stuffed, 1000 )
    # @time experiment( :hungry, 5 )
    # @time experiment( :stuffed, 5 )
end

timeme()

lmiq · February 25, 2022, 2:27pm

I would create my own matrix type with the maximum possible dimensions, and define appropriate setindex! and getindex functions for it.

Something structured like this:

mutable struct MyMatrix
       N::Int
       M::Int
       buff::Matrix{Float64}
end

m = MyMatrix(3,3,zeros(10,10)) 

getindex(m::MyMatrix,i,j) = ... something with N,N and size(buff)
setindex(m::MyMatrix,i,j) = ...
size(m::MyMatrix) = (m.N,m.M)

etc.

where buff can have a greater capacity than actually required by (N,M), and the indexing is defined appropriately taking all dimensions in consideration (would have to think about the details).

goerch · February 25, 2022, 4:38pm

I believe this is how they did it in Fortran times (and probably somewhere deep in Julia hidden, too)…

mutable struct MyMatrix
       N::Int
       M::Int
       buff::Vector{Float64}
end

getindex(m::MyMatrix,i,j) = m.buff[i * m.N + j]

It might be a call for reshape!?

Joris_Pinkse · February 25, 2022, 4:39pm

Thanks @lmiq . That could work, but would be slow.

lmiq · February 25, 2022, 5:57pm

For future reference, this is how slow it can be compared to a standard matrix, in a matrix-vector product:

julia> include("./matrix.jl")
  360.333 ns (1 allocation: 304 bytes) # buffered matrix
  160.543 ns (1 allocation: 304 bytes) # "true" matrix
  160.121 ns (1 allocation: 304 bytes) # unsafe_wrap

Code:

using Test, BenchmarkTools

struct M{T} <: AbstractMatrix{T}
    N::Int
    M::Int
    vals::Vector{T}
end

import Base: getindex, setindex!, size, length
getindex(m::M,i,j) = @inbounds m.vals[(j-1)*m.N + i]
setindex!(m::M,val,i,j) = @inbounds m.vals[(j-1)*m.N + i] = val
size(m::M) = (m.N,m.M)
length(m::M) = m.N*m.M

vals = rand(1000)

# the matrix is 30x30, using only the first 900 elements
a = M{Float64}(30,30,vals)
b = [ vals[(i-1)*30 + j] for j in 1:30, i in 1:30 ]

@test a ≈ b

x = rand(30)

@btime $a*$x
@btime $b*$x

# using unsafe_wrap
c = unsafe_wrap(Matrix{Float64}, pointer(vals), (30, 30))
@test c ≈ a
@btime $c*$x

nothing

PeterSimon · February 25, 2022, 10:05pm

I may be missing something obvious, but it seems to me that a straightforward solution would be along these lines…

buffer = zeros(1_000_000)  # As large as needed for maximal case
function makemat(buffer::AbstractVector, m, n)
    mn = m*n
    mn ≤ length(buffer) || error("buffer size exceeded")
    mymat = reshape(view(buffer, 1:mn), m, n)
end

using BenchmarkTools
x = rand(30)
a = rand(30, 30)
b = makemat(buffer, 30, 30)
b .= a

@btime $a*$x  # 176.366 ns (1 allocation: 304 bytes)
@btime $b*$x  # 178.700 ns (1 allocation: 304 bytes)

nothing

goerch · February 25, 2022, 10:12pm

As far as I understand reshape was part of the original benchmark?

Joris_Pinkse · February 27, 2022, 3:12pm

Indeed, it allocates like crazy.

PeterSimon · February 27, 2022, 8:01pm

I’m not understanding what you’re referring to. When I test makemat as defined above, I get zero allocations:

julia> @btime x = makemat($buffer,20,20);
  2.300 ns (0 allocations: 0 bytes)

julia> @btime x = makemat($buffer,200,20);
  2.300 ns (0 allocations: 0 bytes)

Topic		Replies	Views
Resize!(matrix) General Usage question	44	11218	April 22, 2018
How to pre-allocate matrices to avoid constructing new matrices during the for-loop? Performance memory-allocation	6	2923	December 20, 2017
Filling a preallocated array without allocations Performance	7	2068	July 29, 2019
Using `view` in 0.6 on a ragged array General Usage question	12	1119	July 22, 2017
Help to reduce number of allocations General Usage	23	5559	January 6, 2019

Reusing preallocated memory without unsafe wraps

Related topics