Buffer to store data vectors

Hello, I need a buffer to store data vectors.

A pre-allocated vector used as a buffer needs to be defragmented.

A dictionary of vectors is 30-60 times slower than a plain vector because of memory allocation, and it needs a workaround to track how much memory is used.
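
For context, this is roughly the comparison I mean; a rough sketch using BenchmarkTools, where the exact factor of course depends on sizes and access patterns:

using BenchmarkTools

n   = 10^4
src = ones(Int, n)

# writing into a pre-allocated region of a plain vector: no allocation
dest = Vector{Int}(undef, 10^6)
@btime copyto!($dest, 1, $src, 1, $n)

# storing the same data in a dictionary of vectors: allocates a fresh vector each time
d = Dict{Int,Vector{Int}}()
@btime setindex!($d, copy($src), 1)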

Can you tell me of a data structure, or a package, that implements similar functionality?

struct Buffer{K,T}
    # ...
end

buffer = Buffer{Int,Int}(10^6)  # memory is allocated in advance

v1 = ones(Int,10^4)
v2 = ones(Int,10^6)

buffer[1] = v1  # -> Buffer{Int64, Int64} with 10000 entries
buffer[2] = v2  # -> Error: not enough memory

buffer[1]       # -> 10000-element view(::Vector{Int64}, 1:10000) with eltype Int64:

pop!(buffer,1)  # -> Vector{Int64} with 10000 elements
buffer[2] = v2  # -> Buffer{Int64, Int64} with 1000000 entries

Playing around:

julia> mutable struct Buffer{T}
           n::Int
           v::Vector{T}
       end

julia> Base.getindex(b::Buffer,i) = i == 1 ? @view(b.v[1:b.n]) : nothing

julia> function Base.setindex!(b::Buffer, vec, i::Int)
           if i == 1
               if length(vec) > length(b.v)
                   error("Not enough memory")
               else
                   k = 0
                   for val in vec
                       k += 1
                       b.v[k] = val
                   end
                   b.n = length(vec)
               end
           end
       end

julia> Buffer{T}(size::Int) where {T} = Buffer{T}(0,Vector{T}(undef,size))

julia> buff = Buffer{Int}(2);

julia> buff[1]
0-element view(::Vector{Int64}, 1:0) with eltype Int64

julia> buff[1] = (2 for _ in 1:2);

julia> buff[1]
2-element view(::Vector{Int64}, 1:2) with eltype Int64:
 2
 2

julia> buff[1] = (2 for _ in 1:3);
ERROR: Not enough memory


I have to admit that I do not understand your use case very well. Without a better description of what this buffer object should accomplish, it is hard to help.

Why can’t you just use a Vector of Vectors?

buffer = [ones(Int,10^4), ones(Int,10^6)] 

Edit: Hm, ok, so you have limited memory and want a vector of views into a contiguous buffer. I see.

The buffer object must allocate its memory in advance, and then support:

push!(buffer, vec, id) # write vector data to buffer, each vector has id
buffer[id] | buffer(id) # request vector data from buffer by vector id
pop!(buffer, id) # remove vector data from the buffer by id (physical removal is optional)
view(buffer, l) # get view, l is the length of the view

If there is no free space in the buffer, raise an error; if there is free space but it is scattered across the buffer, defragment it (the biggest difficulty here).

Therefore, we are interested in a data structure or a package that implements this kind of defragmentation, or a data structure designed for similar tasks.

Implementing such a buffer on top of a dictionary is not a problem, but the memory allocation and garbage collection eat up the benefit of buffering.

You can sort of implement all that by overloading getindex, setindex!, and any other functions you want to work on your buffer. You will probably need to keep an array of ranges in your struct, to indicate where each stored vector lives inside the whole buffer.

All that is feasible, I don’t know if there is any package already ready for that.
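
Something along these lines, just as a sketch of the idea: a contiguous storage vector plus a Dict of ranges, with naive compaction when the free space gets scattered. RangeBuffer and everything below is illustrative, not from a package:

struct RangeBuffer{T}
    data::Vector{T}                   # one big pre-allocated block
    ranges::Dict{Int,UnitRange{Int}}  # id => where that vector lives in data
end

RangeBuffer{T}(capacity::Int) where {T} =
    RangeBuffer{T}(Vector{T}(undef, capacity), Dict{Int,UnitRange{Int}}())

used(b::RangeBuffer) = isempty(b.ranges) ? 0 : sum(length, values(b.ranges))

# move all stored blocks to the front of data, keeping their ids
function defragment!(b::RangeBuffer)
    pos = 1
    for id in sort!(collect(keys(b.ranges)); by = id -> first(b.ranges[id]))
        r = b.ranges[id]
        newr = pos:pos+length(r)-1
        newr != r && copyto!(b.data, pos, b.data, first(r), length(r))
        b.ranges[id] = newr
        pos = last(newr) + 1
    end
    return b
end

function Base.push!(b::RangeBuffer, vec::AbstractVector, id::Int)
    n = length(vec)
    used(b) + n > length(b.data) && error("Not enough memory")
    # first free position after the last stored block
    tail = isempty(b.ranges) ? 0 : maximum(last, values(b.ranges))
    if tail + n > length(b.data)
        defragment!(b)                # space exists but is fragmented
        tail = used(b)                # blocks are now packed at the front
    end
    copyto!(b.data, tail + 1, vec, 1, n)
    b.ranges[id] = tail+1:tail+n
    return b
end

Base.getindex(b::RangeBuffer, id::Int) = view(b.data, b.ranges[id])

function Base.pop!(b::RangeBuffer, id::Int)
    out = b.data[b.ranges[id]]        # copy the data out
    delete!(b.ranges, id)             # physical removal is not needed
    return out
end

Lookup by id stays cheap; the price is the copying that defragment! does when it has to compact the storage.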

At the same time, are you sure this is not a premature optimization? If the necessary arrays are allocated before the hot parts of the code, avoiding GC there, then relying on the system defaults to distribute and manage the memory is usually fine.


I think it is not. The vectors are intermediate results of calculations, and they do not fit completely in RAM. The order of the calculations is difficult to predict, and recomputing them or saving them to disk takes too long, so I will use a buffer.

But I could be wrong.

You should probably preallocate the arrays, effectively. It is possible that a simple vector of vectors, as proposed by @DNF, may be enough. I have sometimes implemented buffers that contain arrays that do not shrink, thus preserving the allocations. Something like:

julia> mutable struct Buffer{T}
           maxmem::Int
           n::Int
           v::Vector{T}
       end

julia> Buffer{T}(maxmem::Int) where {T} = Buffer{T}(maxmem,0,Vector{T}(undef,0))

julia> Base.getindex(buff::Buffer, i) = i <= buff.n ? buff.v[i] : error("Out of bounds")

julia> Base.setindex!(buff::Buffer, val, i) = i <= buff.n ? buff.v[i] = val : error("Out of bounds")

julia> function Base.push!(buff::Buffer, val) 
           buff.n += 1
           buff.n > buff.maxmem && error("Out of buffer memory")
           if buff.n <= length(buff.v)  # reuse existing storage if available
               buff.v[buff.n] = val
           else
               push!(buff.v, val)
           end
           return buff
       end

julia> Base.empty!(buff::Buffer) = buff.n = 0

julia> buff = Buffer{Float64}(3)
Buffer{Float64}(3, 0, Float64[])

julia> push!(buff, 1.0)
Buffer{Float64}(3, 1, [1.0])

julia> buff[1]
1.0

julia> buff[1] = 2.0
2.0

julia> push!(buff, 1.0)
Buffer{Float64}(3, 2, [2.0, 1.0])

julia> push!(buff, 1.0)
Buffer{Float64}(3, 3, [2.0, 1.0, 1.0])

julia> push!(buff, 1.0)
ERROR: Out of buffer memory

julia> empty!(buff)
0

julia> buff
Buffer{Float64}(3, 0, [2.0, 1.0, 1.0])

julia> push!(buff, 5.0)
Buffer{Float64}(3, 1, [5.0, 1.0, 1.0])

julia> buff[1]
5.0

julia> buff[2]
ERROR: Out of bounds
...

How complicated that gets depends on what you actually need from the interface.


I believe they do not want to push a single value at a time, but a whole vector, and to have an id to retrieve it later.

My take is that this is premature optimization until profiling proves otherwise. There are some problems with that idea:

  1. You want to push whole vectors into it, but where do those vectors come from, if you want to avoid all allocation and use this buffer? The buffer also needs to give you unused memory regions of size n as views, otherwise all this is for nothing.
  2. Even if the memory for the data is fixed, you will either need flexibly-sized memory to store the list of views, or you will have to define a maximum number of simultaneous views and allocate them all when the buffer is created.
  3. The defragmentation needs to be written carefully so that it does not allocate, and if you use a small (but large enough) memory buffer, you will probably lose a lot of time copying memory around to defragment.

If you go down the route of preallocating everything, I suggest the following (a small sketch follows the list):

  1. For each function that needs some buffer, write a struct that holds all the memory it needs, and make the function take it as its first parameter.
  2. If this function is called by another function, and you also want this outer function not to allocate, then the struct for the outer function should have a field of the struct type associated with each function it calls that needs a buffer.
  3. Basically, you can do this recursively until you reach your main function, which will create one copy of each of those structs for the functions it calls directly, so that all allocations happen only in the main function.
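
A small sketch of that structure; all names here are just illustrative:

# each function takes its pre-allocated workspace as the first argument
struct InnerBuffers
    tmp::Vector{Float64}
end
InnerBuffers(n::Int) = InnerBuffers(Vector{Float64}(undef, n))

struct OuterBuffers
    inner::InnerBuffers      # workspace of the function it calls
    acc::Vector{Float64}
end
OuterBuffers(n::Int) = OuterBuffers(InnerBuffers(n), Vector{Float64}(undef, n))

function inner!(buf::InnerBuffers, x)
    buf.tmp .= 2 .* x        # works inside pre-allocated memory, no allocation
    return sum(buf.tmp)
end

function outer!(buf::OuterBuffers, x)
    buf.acc .= x
    return inner!(buf.inner, buf.acc)
end

# all allocations happen once, up front, at the top level
buffers = OuterBuffers(10^6)
x = rand(10^6)
outer!(buffers, x)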

The functions (functional objects) are generated according to a scheme. I stored the result in the functor, then I decided to move it to a buffer because there is not enough memory.

If you do not have enough memory to preallocate everything simultaneously, then this idea does not work, it is true. But your idea is basically to reimplement the memory management yourself, and unless that is done very well (or your case is very restricted), it probably will not be better than Julia's own memory management.


thank you