Buffer to store data vectors

Hello, I need a buffer to store data vectors.

A pre-allocated vector used as a buffer needs to be defragmented.

A dictionary of vectors is 30-60 times slower than a plain vector because of memory allocation, and it needs a workaround to track how much memory is used.
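
For context, this is roughly the comparison I mean; a rough sketch using BenchmarkTools, where the exact factor of course depends on sizes and access patterns:

using BenchmarkTools

n   = 10^4
src = ones(Int, n)

# writing into a pre-allocated region of a plain vector: no allocation
dest = Vector{Int}(undef, 10^6)
@btime copyto!($dest, 1, $src, 1, $n)

# storing the same data in a dictionary of vectors: allocates a fresh vector each time
d = Dict{Int,Vector{Int}}()
@btime setindex!($d, copy($src), 1)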

Can you tell me of a data structure, or a package, that implements similar functionality?

struct Buffer{K,T}
    # ...
end

buffer = Buffer{Int,Int}(10^6)  # memory is allocated in advance

v1 = ones(Int,10^4)
v2 = ones(Int,10^6)

buffer[1] = v1  # -> Buffer{Int64, Int64} with 10000 entries
buffer[2] = v2  # -> Error: not enough memory

buffer[1]       # -> 10000-element view(::Vector{Int64}, 1:10000) with eltype Int64:

pop!(buffer,1)  # -> Vector{Int64} with 10000 elements
buffer[2] = v2  # -> Buffer{Int64, Int64} with 1000000 entries

Playing around:

julia> mutable struct Buffer{T}
           n::Int
           v::Vector{T}
       end

julia> Base.getindex(b::Buffer,i) = i == 1 ? @view(b.v[1:b.n]) : nothing

julia> function Base.setindex!(b::Buffer, vec, i::Int)
           if i == 1
               if length(vec) > length(b.v)
                   error("Not enough memory")
               else
                   k = 0
                   for val in vec
                       k += 1
                       b.v[k] = val
                   end
                   b.n = length(vec)
               end
           end
       end

julia> Buffer{T}(size::Int) where {T} = Buffer{T}(0,Vector{T}(undef,size))

julia> buff = Buffer{Int}(2);

julia> buff[1]
0-element view(::Vector{Int64}, 1:0) with eltype Int64

julia> buff[1] = (2 for _ in 1:2);

julia> buff[1]
2-element view(::Vector{Int64}, 1:2) with eltype Int64:
 2
 2

julia> buff[1] = (2 for _ in 1:3);
ERROR: Not enough memory


I have to admit that I do not understand your use case very well. Without a better description of what this buffer object should accomplish, it is hard to help.

Why can’t you just use a Vector of Vectors?

buffer = [ones(Int,10^4), ones(Int,10^6)] 

Edit: Hm, ok, so you have limited memory and want a vector of views into a contiguous buffer. I see.

The buffer object must allocate its memory in advance, and then support:

push!(buffer, vec, id) # write vector data to buffer, each vector has id
buffer[id] | buffer(id) # request vector data from buffer by vector id
pop!(buffer, id) # remove vector data from the buffer by id (physical removal is optional)
view(buffer, l) # get view, l is the length of the view

If there is no free space in the buffer, raise an error; if there is free space but it is scattered across the buffer, defragment it (the biggest difficulty here).

Therefore, we are interested in a data structure or a package that implements this kind of defragmentation, or a data structure designed for similar tasks.

Implementing such a buffer on top of a dictionary is not a problem, but the memory allocation and garbage collection eat up the benefit of buffering.

You can sort of implement all that by overloading getindex, setindex!, and any other functions you want to work on your buffer. You will probably need to keep an array of ranges in your struct, to indicate where each stored vector lives inside the whole buffer.

All that is feasible, I don’t know if there is any package already ready for that.
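
Something along these lines, just as a sketch of the idea: a contiguous storage vector plus a Dict of ranges, with naive compaction when the free space gets scattered. RangeBuffer and everything below is illustrative, not from a package:

struct RangeBuffer{T}
    data::Vector{T}                   # one big pre-allocated block
    ranges::Dict{Int,UnitRange{Int}}  # id => where that vector lives in data
end

RangeBuffer{T}(capacity::Int) where {T} =
    RangeBuffer{T}(Vector{T}(undef, capacity), Dict{Int,UnitRange{Int}}())

used(b::RangeBuffer) = isempty(b.ranges) ? 0 : sum(length, values(b.ranges))

# move all stored blocks to the front of data, keeping their ids
function defragment!(b::RangeBuffer)
    pos = 1
    for id in sort!(collect(keys(b.ranges)); by = id -> first(b.ranges[id]))
        r = b.ranges[id]
        newr = pos:pos+length(r)-1
        newr != r && copyto!(b.data, pos, b.data, first(r), length(r))
        b.ranges[id] = newr
        pos = last(newr) + 1
    end
    return b
end

function Base.push!(b::RangeBuffer, vec::AbstractVector, id::Int)
    n = length(vec)
    used(b) + n > length(b.data) && error("Not enough memory")
    # first free position after the last stored block
    tail = isempty(b.ranges) ? 0 : maximum(last, values(b.ranges))
    if tail + n > length(b.data)
        defragment!(b)                # space exists but is fragmented
        tail = used(b)                # blocks are now packed at the front
    end
    copyto!(b.data, tail + 1, vec, 1, n)
    b.ranges[id] = tail+1:tail+n
    return b
end

Base.getindex(b::RangeBuffer, id::Int) = view(b.data, b.ranges[id])

function Base.pop!(b::RangeBuffer, id::Int)
    out = b.data[b.ranges[id]]        # copy the data out
    delete!(b.ranges, id)             # physical removal is not needed
    return out
end

Lookup by id stays cheap; the price is the copying that defragment! does when it has to compact the storage.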

At the same time, are you sure this is not a premature optimization? If the necessary arrays are allocated before the hot parts of the code, avoiding GC there, then relying on the system defaults to distribute and manage the memory is usually fine.


I think it is not. The vectors are intermediate results of calculations, and they do not fit completely in RAM. The order of the calculations is difficult to predict, and recomputing them or saving them to disk takes too long, so I will use a buffer.

But I could be wrong.

You should probably preallocate the arrays, effectively. It is possible that a simple vector of vectors, as proposed by @DNF, may be enough. I have sometimes implemented buffers that contain arrays that do not shrink, thus preserving the allocations. Something like:

julia> mutable struct Buffer{T}
           maxmem::Int
           n::Int
           v::Vector{T}
       end

julia> Buffer{T}(maxmem::Int) where {T} = Buffer{T}(maxmem,0,Vector{T}(undef,0))

julia> Base.getindex(buff::Buffer, i) = i <= buff.n ? buff.v[i] : error("Out of bounds")

julia> Base.setindex!(buff::Buffer, val, i) = i <= buff.n ? buff.v[i] = val : error("Out of bounds")

julia> function Base.push!(buff::Buffer, val) 
           buff.n += 1
           buff.n > buff.maxmem && error("Out of buffer memory")
           if buff.n <= length(buff.v)  # reuse existing storage if available
               buff.v[buff.n] = val
           else
               push!(buff.v, val)
           end
           return buff
       end

julia> Base.empty!(buff::Buffer) = buff.n = 0

julia> buff = Buffer{Float64}(3)
Buffer{Float64}(3, 0, Float64[])

julia> push!(buff, 1.0)
Buffer{Float64}(3, 1, [1.0])

julia> buff[1]
1.0

julia> buff[1] = 2.0
2.0

julia> push!(buff, 1.0)
Buffer{Float64}(3, 2, [2.0, 1.0])

julia> push!(buff, 1.0)
Buffer{Float64}(3, 3, [2.0, 1.0, 1.0])

julia> push!(buff, 1.0)
ERROR: Out of buffer memory

julia> empty!(buff)
0

julia> buff
Buffer{Float64}(3, 0, [2.0, 1.0, 1.0])

julia> push!(buff, 5.0)
Buffer{Float64}(3, 1, [5.0, 1.0, 1.0])

julia> buff[1]
5.0

julia> buff[2]
ERROR: Out of bounds
...

How complicated that gets depends on what you actually need from the interface.


I believe they do not want to push a single value at a time, but a whole vector, and to have an id to retrieve it later.

My take is that this is premature optimization until profiling proves otherwise. There are some problems with that idea:

  1. You want to push whole vectors into it, but where do those vectors come from, if you want to avoid all allocation and use this buffer? The buffer also needs to give you unused memory regions of size n as views, otherwise all this is for nothing.
  2. Even if the memory for the data is fixed, you will either need flexibly-sized memory to store the list of views, or you will have to define a maximum number of simultaneous views and allocate them all when the buffer is created.
  3. The defragmentation needs to be written carefully so that it does not allocate, and if you use a small (but large enough) memory buffer, you will probably lose a lot of time copying memory around to defragment.

If you go down the route of preallocating everything, I suggest the following (a small sketch follows the list):

  1. For each function that needs some buffer, write a struct that holds all the memory it needs, and make the function take it as its first parameter.
  2. If this function is called by another function, and you also want this outer function not to allocate, then the struct for the outer function should have a field of the struct type associated with each function it calls that needs a buffer.
  3. Basically, you can do this recursively until you reach your main function, which will create one copy of each of those structs for the functions it calls directly, so that all allocations happen only in the main function.
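
A small sketch of that structure; all names here are just illustrative:

# each function takes its pre-allocated workspace as the first argument
struct InnerBuffers
    tmp::Vector{Float64}
end
InnerBuffers(n::Int) = InnerBuffers(Vector{Float64}(undef, n))

struct OuterBuffers
    inner::InnerBuffers      # workspace of the function it calls
    acc::Vector{Float64}
end
OuterBuffers(n::Int) = OuterBuffers(InnerBuffers(n), Vector{Float64}(undef, n))

function inner!(buf::InnerBuffers, x)
    buf.tmp .= 2 .* x        # works inside pre-allocated memory, no allocation
    return sum(buf.tmp)
end

function outer!(buf::OuterBuffers, x)
    buf.acc .= x
    return inner!(buf.inner, buf.acc)
end

# all allocations happen once, up front, at the top level
buffers = OuterBuffers(10^6)
x = rand(10^6)
outer!(buffers, x)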

The functions (functional objects) are generated according to a scheme. I stored the result in the functor, then I decided to move it to a buffer because there is not enough memory.

If you do not have enough memory to preallocate everything simultaneously, then this idea does not work, it is true. But your idea is basically to reimplement the memory management yourself, and unless that is done very well (or your case is very restricted), it probably will not be better than Julia's own memory management.


thank you