Mmap Symmetric Matricies

RobertGregg · June 18, 2021, 8:32pm

I have a matrix that is too large to fit into memory, so I’ve decided to use Mmap to store it on disk which works perfectly. However, the matrix I’m storing is symmetric and it seems like a waste to use twice the amount of storage for the duplicated values.

Is there a way to use Mmap with an UpperTriangular or sparse matrix? I’ve tried:

using Mmap, LinearAlgebra

#Make up some data
n = 1000
X = rand(n,n)

io = open("myMatrix.bin", "w+");
Mmap.mmap(io, UpperTriangular{Float64, Matrix{Float64}}, (n,n))

but it leads to a method error.

Usually I wouldn’t mind the duplicated storage but the matricies I’m working with can be extremely large (e.g. 350,000 × 350,000 which takes up ~900 Gb of space).

Is there a way to reduce disk storage while still maintaining Mmap’s functionality?

blackeneth · June 18, 2021, 9:38pm

Can’t say I’ve done it … but look into storing your triangular matrix with HDF5, then use mmap.

RobertGregg · June 18, 2021, 10:37pm

Oh would HDF5 take up less space, compared to a bin file?

Another thought I had was saving the triangular matrix to disk as a vector and reshaping it somehow after reading in the file.

I can also store everything as Float32 to use half the storage

RobertGregg · June 19, 2021, 8:30pm

Just as a follow-up, I came up with this solution:

Store the upper triangular matrix as a Float32 vector onto disk
Create a subtype of AbstractMatrix to properly index the vector

#How many data points are being stored?
n = 5050

#Open an Mmap connection
io = open("X.bin", "w+");
X = Mmap.mmap(io, Vector{Float32}, n)

#Fill with some random data
X .= rand(Float32,n)

#Sync and close the connection
Mmap.sync!(X)
close(io)

Next I read the data back in (just to be thorough)

io = open("X.bin", "r+");
Xread = Mmap.mmap(io, Vector{Float32}, n);

Finally I create an array to properly index with (i.e. the hard part):

#A symmetric upper triangular matrix
mutable struct SymUpperTri{T} <: AbstractMatrix{T}
    data::AbstractVector{T}
    dim::Int

    function SymUpperTri(data::AbstractVector{T}) where T
        #Check if data is the right size to fit into upper triangular matrix
        n = length(data)
        sq = sqrt(8*n+1) #if this is a perfect square, then n is a triangular number
        floor(sq) == ceil(sq) || error("Data is not the correct length to fit into an upper triangular matrix")

        dim = (sq - 1) ÷ 2 # solving n = dim*(dim+1)/2 with quadratic formula
        return new{T}(data, dim)
    end
end

#Julia needs to know how to size and index
Base.size(A::SymUpperTri) = (A.dim, A.dim)

function Base.getindex(A::SymUpperTri{T}, i::Int, j::Int) where T
    return i > j ? A.data[i*(i-1)÷2 + j] : A.data[j*(j-1)÷2 + i]
end

# Found this fancy piece of code from the LinearAlgebra standard library
function Base.replace_in_print_matrix(A::SymUpperTri, i::Integer, j::Integer, s::AbstractString)
    return i ≤ j ? s : Base.replace_with_centered_mark(s)
end

#Check if it works (it does!)
A = SymUpperTri(Xread)   # 100×100 SymUpperTri{Float32}:

The combination of using Float32 and storing as a vector means the data (350,000 × 350,000) takes up about 225 Gb instead of 900 Gb. Hopefully this helps someone in the future ~

Topic		Replies	Views
I need advices to solve Ax=b with Mmap New to Julia mmap	16	1579	September 13, 2019
Writing an array too large to store in memory Performance question , jld	2	2206	March 16, 2018
Use of Memory-mapped I/O General Usage memory , memory-allocation	9	2872	September 5, 2019
Storage-based array that implements @views? General Usage question , array , memory-allocation	9	871	January 1, 2021
Does Mmap.sync! from disk to array, array to disk, or both? General Usage mmap	2	70	February 17, 2025

Mmap Symmetric Matricies

Related topics