Potential Julia 1.3 memory bug. Can someone help me diagnose before I submit an issue?

I am trying to define a Vector type that holds a pointer to a compressed vector and a pointer to an uncompressed vector. To save RAM, initially only the compressed vector is stored in memory. When an element is first accessed, the compressed vector is decompressed and the uncompressed pointer is set to point to the result. To illustrate:

using Revise
using Blosc
using CmpVectors # defined below

a = rand(Int, 100_000_000)
b = Blosc.compress(a)
c = CmpVector{Int}(b)

c[1]
c[2]

c[1] # invariably crashes 1.3 on Windows 10

The above crashes Julia 1.3-rc1. I think it might be a bug in Julia, but I may not be understanding Julia’s memory model correctly. Is what I am doing OK? Sorry for the long code, but it’s as simple an MWE as I can show.

If I set a = rand(Int, 3), it works and doesn’t crash, so I think the code is at least partly correct.

The full module definition is below:

module CmpVectors

using Blosc

import Base: show, getindex, setindex!, eltype, size

export CmpVector

mutable struct CmpVector{T} <: AbstractVector{T}
    ptr_compressed::Ptr{UInt8}
    ptr_uncompressed::Ptr{T}
    inited::Bool

end

CmpVector{T}(ptr_compress::Ptr{UInt8}) where T= CmpVector{T}(ptr_compress, pointer_from_objref(T[zero(T)]), false)

CmpVector{T}(ptr_compress::Vector{UInt8}) where T= CmpVector{T}(pointer_from_objref(ptr_compress), pointer_from_objref(T[zero(T)]), false)

Base.eltype(cv::CmpVector{T}) where T = eltype(cv.ptr_uncompressed)

Base.size(cv::CmpVector{T}) where T = begin
    decomp(cv)
    size(unsafe_pointer_to_objref(cv.ptr_uncompressed))
end

decomp(cv) = begin
    if !cv.inited
        println("decompressing")
        cv.ptr_uncompressed = Blosc.decompress(
            eltype(cv),
            unsafe_pointer_to_objref(cv.ptr_compressed)
        ) |> pointer_from_objref

        cv.ptr_compressed = pointer_from_objref(UInt8[0])
        cv.inited = true
        println("decompressing: done")
    else
        println("didn't do nothing")
    end
end

Base.getindex(cv::CmpVector{T}, i...)  where T = begin
    decomp(cv)
    getindex(unsafe_pointer_to_objref(cv.ptr_uncompressed), i...)
end

Base.setindex!(cv::CmpVector{T}, i...)   where T = begin
    decomp(cv)
    setindex!(unsafe_pointer_to_objref(cv.ptr_uncompressed), i...)
end

Base.show(io::IO, A::MIME"text/plain", cv::CmpVector{T}) where T = begin
    if cv.inited
        show(io, A, unsafe_pointer_to_objref(cv.ptr_uncompressed))
    else
        print(io, "data in compressed form; not shown until first used")
    end
end

end # module
pointer_from_objref(T[zero(T)])

What would keep this array that you created alive (prevent it from being garbage collected)?

Writing code like this is very tricky to get right. Why are you using so many pointers and unsafe functions?


I slept on this. Actually, I don’t think I need the pointers. I thought I was saving some RAM by using them.

BTW, how do I mark a region of memory so that it doesn’t get garbage collected? I think I just don’t understand how the GC works.

RAM in any GC’d language is use it or lose it. The way you say you care about a piece of memory is by having a struct of some sort that holds that RAM. The reason I (on stackoverflow) and kristov asked “what are you trying to do” rather than giving a more direct answer is that if you want to later fit something into a region of RAM, you need to have allocated it already. So there’s almost no use case where storing a pointer will save memory: if that pointer is usable, you must already be using the RAM it points to.
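
To make the point above concrete, here is a minimal sketch comparing a raw pointer field with an ordinary array field. The type names PtrHolder and ArrayHolder are made up for illustration and do not come from the original module. The idea is that a normal field costs the same as a pointer, and it also tells the GC that the array is still in use.

```julia
# Hypothetical types, for illustration only.
struct PtrHolder
    p::Ptr{Int}          # raw address; the GC does not track what it points to
end

struct ArrayHolder
    data::Vector{Int}    # ordinary reference; the GC keeps the array alive
end

# Both structs are one machine word in size, so the raw pointer saves nothing.
println(sizeof(PtrHolder), " ", sizeof(ArrayHolder))

h = ArrayHolder(zeros(Int, 1_000))
GC.gc()                  # the array survives because h still refers to it
println(length(h.data))
```

In contrast, an array whose only trace is a Ptr obtained from pointer_from_objref has no owner as far as the GC is concerned, so it may be collected at any time.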


I just realised I don’t need pointers: I can just assign a single-element array to save RAM and only replace it with the large array when I need it.
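
A rough sketch of that pointer-free design might look like the following. The names LazyVector and materialize! are hypothetical, and the decompression step is passed in as a function so that the example runs without any packages. With Blosc one would presumably pass something like b -> Blosc.decompress(Int, b), matching the call used in the module above. The stand-in "compression" here is just a byte reinterpretation and does not actually compress anything.

```julia
# Hypothetical lazily decompressed vector that uses ordinary fields instead of raw pointers.
mutable struct LazyVector{T,F} <: AbstractVector{T}
    compressed::Vector{UInt8}  # owned reference, kept alive by the GC
    data::Vector{T}            # empty placeholder until first access
    decompress::F              # assumed callback: Vector{UInt8} -> Vector{T}
    inited::Bool
end

LazyVector{T}(bytes::Vector{UInt8}, decompress) where {T} =
    LazyVector{T,typeof(decompress)}(bytes, T[], decompress, false)

# Decompress on first use, then drop the compressed copy so it can be collected.
function materialize!(v::LazyVector)
    if !v.inited
        v.data = v.decompress(v.compressed)
        v.compressed = UInt8[]
        v.inited = true
    end
    return v.data
end

Base.size(v::LazyVector) = size(materialize!(v))
Base.getindex(v::LazyVector, i::Int) = materialize!(v)[i]
Base.setindex!(v::LazyVector, x, i::Int) = (materialize!(v)[i] = x)

# Stand-in for Blosc.compress / Blosc.decompress: plain byte reinterpretation.
a = collect(1:10)
bytes = collect(reinterpret(UInt8, a))
v = LazyVector{Int}(bytes, b -> collect(reinterpret(Int, b)))

first_elem = v[1]   # triggers the one-time decompression
v[2] = 42           # later accesses reuse the decompressed array
```

Because both arrays are ordinary fields, nothing needs to be preserved by hand, and replacing the compressed field with an empty array is enough to let the GC reclaim it.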

FYI, check out GC.@preserve if you need it.
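
For cases where a raw pointer really is needed (for example when calling into C), a minimal sketch of GC.@preserve could look like this. It only guarantees that x stays alive for the duration of the block; the pointer should not be used after the block ends. The variable names are illustrative.

```julia
x = [10, 20, 30]

second = GC.@preserve x begin
    p = pointer(x)       # Ptr{Int} to the first element of x
    unsafe_load(p, 2)    # read the second element through the raw pointer
end

println(second)
```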