docs = Vector{Tuple{Tuple{Int64, Int64}, Vector{Int32}}}(undef, 60_000_000)
for i = 1:60_000_000
    docs[i] = ((i, i), Vector{Int32}(undef, 2))
end
while true  # keep the process alive so memory use can be inspected in top
end
I reduced my problem to the code above. I expect docs to consume about 1.5 GiB of memory; however, top shows it consuming about 7.8 GiB.
1.5 GiB because each element is two Int64 (16 bytes) plus a pointer to a Vector{Int32} (8 bytes), so 60M × 24 bytes = 1440 MB, just under 1.5 GiB.
There are probably a lot of things I don't yet understand about the memory footprint of Julia objects. Could someone please help me understand why this consumes so much more memory than I expected, and perhaps suggest ways to reduce the footprint?
Vectors carry a large amount of overhead.
You might have thought that a Vector{Int32}(undef, 2) would only take a small amount of space, but in addition to the 8 bytes that store the two 32-bit integers, it takes at least another 40 bytes (on a 64-bit platform).
Do those vectors need to be growable?
If they are always of length 2, you could just use a Tuple{Int32, Int32} to store the two integer values, for example:
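A minimal sketch of that change (my illustration, assuming every element really holds exactly two values):

docs = Vector{Tuple{Tuple{Int64, Int64}, Tuple{Int32, Int32}}}(undef, 60_000_000)
for i = 1:60_000_000
    # Tuple{Int32, Int32} is an isbits type, so each pair is stored inline in
    # the outer vector: no per-element heap allocation, no pointer chasing.
    docs[i] = ((i, i), (Int32(0), Int32(0)))
end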
The vectors don't have to grow, but their contents will need to be updated frequently. Also, it's not necessarily a 2-element vector at each position; the length at each position actually depends on the input data. I used a constant here to keep the presentation simple.
Even though a tuple (or struct) is immutable, if it is small, it's frequently better just to replace the whole thing, provided it is contained in something mutable, such as a vector (see the sketch below).
Is there a small range of sizes for what you have as a vector there? If so, it might still be more efficient to use a tuple large enough for the largest.
If the lengths of the vectors tend to be large, the overhead of using a Vector won't be so noticeable.
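To illustrate the whole-tuple replacement (a hypothetical snippet of mine, not from the thread):

v = [(Int32(1), Int32(2)), (Int32(3), Int32(4))]
# The tuples themselves are immutable, but the vector slots are mutable, so
# "updating" one field just means overwriting the slot with a fresh tuple:
v[1] = (Int32(10), v[1][2])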
Thanks! I think a tuple is probably going to help, but it would be nice if there were a mutable tuple type.
The problem is that the range of vector lengths depends on the input data, and this program needs to be general. In practice, I think most vectors are small (a few elements), but there could be a few much larger ones that contain tens of elements.
I ran the code on my machine; the created Array looks like this (with N = 6):
julia> docs
6-element Array{Tuple{Tuple{Int64,Int64},Array{Int32,1}},1}:
((1, 1), Int32[132080144, 0])
((2, 2), Int32[76064064, 0])
((3, 3), Int32[76064000, 0])
((4, 4), Int32[135077488, 0])
((5, 5), Int32[143207064, 0])
((6, 6), Int32[143207064, 0])
It appears to me that the tuple (1, 1) in ((1, 1), Int32[132080144, 0]) is just the index into the array anyway (with two identical values in the tuple), so get rid of it.
What remains is this:
Int32[143207064, 0]
so this could be set up with a simple Array:
julia> a = Array{Int32}(undef, N, 2)
Then you iterate through it.
A simple example:
function iterat(N)
    ar = Array{Int32}(undef, N, 2)
    for k = 1:2
        for i in 1:N
            ar[i, k] = k * i  # example calculation
        end
    end
    return ar
end
julia> @time iterat(60_000_000)
0.492816 seconds (2.17 k allocations: 457.874 MiB, 12.88% gc time)
No doubt this could be optimized further.
A Vector has more than 40 bytes of overhead: you have the header struct, which is one cache line, plus the type tag, plus whatever bookkeeping gc/malloc/free uses.
Then, you wrap your Vectors in Tuples. This is bad: extra allocations, type tags, pointers, etc.
As far as I gathered, each doc[i] needs to store two Int64 and a variable number of Int32. The easiest change would be to store doc_meta = Matrix{Int64}(undef, 2, num_docs) (the same layout as Vector{Tuple{Int64,Int64}}; the pointers are literally interchangeable, so choose whichever is nicer for syntax or has better dispatch, or keep both views around via reinterpret) and doc_var = Vector{Vector{Int32}}(undef, num_docs).
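To make the reinterpret remark concrete (my sketch; the reinterpret(reshape, ...) form assumes Julia 1.6 or later):

doc_meta = Matrix{Int64}(undef, 2, 4)  # column j holds the two Int64s of doc j
doc_meta[:, 1] .= (10, 10)
# View the same memory as a vector of tuples; no copy is made.
as_tuples = reinterpret(reshape, Tuple{Int64, Int64}, doc_meta)
as_tuples[1]  # == (10, 10)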
This still has the bad memory access patterns and memory overhead in doc_var. Do you construct doc_var once and then never change length(doc_var[i])?
In this case I would store doc_var_elems::Vector{Int32} together with doc_var_idx::Vector{Tuple{Int64,Int64}}, where the first entry of each tuple tells us where in doc_var_elems doc_var[i] starts, and the second gives length(doc_var[i]). If you need fewer than 1 << 32 entries in doc_var_elems, feel free to use Int32 or UInt32 for the offsets. If none of the lengths are very large, feel free to use UInt8 or UInt16 for them (but then you need to store the lengths in a separate vector, because of alignment/padding). Feel free to store pointers to the data instead of offsets into the array if you need 64 bits anyway because you have so many Int32s to store, and feel free to reuse some of those 64 bits for tags.
edit: If you can construct in order, you don't need to store lengths at all; instead each doc_var_idx[i] stores just the offset of doc_var[i], and length(doc_var[i]) == doc_var_idx[i+1] - doc_var_idx[i].
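A minimal sketch of that offsets-only layout (hypothetical names; it is the same idea as the ragVec code further down):

# Flattened storage for three "docs": [1, 2, 3], [4], [5, 6]
doc_var_elems = Int32[1, 2, 3, 4, 5, 6]
doc_var_idx   = [1, 4, 5, 7]  # n + 1 offsets, built while constructing in order

getdoc(i) = view(doc_var_elems, doc_var_idx[i]:doc_var_idx[i+1]-1)
getdoc(2)  # == Int32[4]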
Thanks! But yeah, it's not a 2-element vector at every position. Also, the Tuple{Int64, Int64} is not really the array index; again, I did that for simplicity of presentation, so I can't just get rid of it.
For future readers' benefit, I did an experiment to measure the memory overhead of using Vector{Int32}, Tuple (the abstract type), or Tuple{Int32, Int32} to store the two Int32s in each doc.
Using Vector{Int32} consumed 8005 MiB, Tuple consumed 3613 MiB, and Tuple{Int32, Int32} consumed 1504 MiB. This makes sense to me.
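If you want to reproduce this kind of comparison in-process rather than via top, Base.summarysize is one option (my suggestion; the poster's exact methodology isn't shown):

n = 1_000_000
vecs = [((i, i), Int32[0, 0]) for i in 1:n]           # Vector{Int32} payload
tups = [((i, i), (Int32(0), Int32(0))) for i in 1:n]  # inline tuple payload
Base.summarysize(vecs) / n  # bytes per element, including per-vector overhead
Base.summarysize(tups) / n  # bytes per element, stored inline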
Another question: is the number of Int32s fixed once a "doc" element is allocated?
One very memory-efficient structure would be to simply have one big growable Vector{Int32} to store the data,
and then use indices into it to find the actual data.
For example:
struct MyStruct
    fixdata::Tuple{Int,Int}
    offsets::Vector{UInt32} # sized by the total number of Int32 elements; use UInt64 if more than 4 billion
    data::Vector{Int32}
end
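A hypothetical accessor for that layout (assuming offsets holds n + 1 entries, as in the construct-in-order scheme above):

# Return the i-th doc's data as a view, without copying.
getdata(s::MyStruct, i) = view(s.data, Int(s.offsets[i]):Int(s.offsets[i+1]) - 1)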
FWIW, what I’m using in my crappy code is the following:
struct uvec{T} <: AbstractVector{T}
    ptr::Ptr{T}
    len::Int64
end
Base.length(x::uvec{T}) where {T} = x.len
Base.size(x::uvec{T}) where {T} = (x.len,)
Base.IndexStyle(::uvec) = Base.IndexLinear()
Base.getindex(x::uvec, i) = unsafe_load(x.ptr, i)
Base.setindex!(x::uvec, v, i) = unsafe_store!(x.ptr, v, i)

struct ragVec{T} <: AbstractVector{uvec{T}}
    offsets::Vector{Int64}
    elems::Vector{T}
end
function ragVec(x::Vector{Vector{T}}) where T
    n = length(x)
    nelems = sum(length.(x))
    offsets = Vector{Int}(undef, n + 1)
    elems = Vector{T}(undef, nelems)
    off = 1
    for i = 1:n
        offsets[i] = off
        for j = 1:length(x[i])
            elems[off] = x[i][j]
            off += 1
        end
    end
    offsets[n+1] = off
    return ragVec(offsets, elems)
end
Base.length(x::ragVec{T}) where {T} = length(x.offsets) - 1
Base.size(x::ragVec{T}) where {T} = (length(x),)
Base.IndexStyle(::ragVec) = Base.IndexLinear()
Base.@propagate_inbounds function Base.getindex(x::ragVec{T}, i) where T
    ptr = pointer(x.elems, x.offsets[i])
    len = x.offsets[i+1] - x.offsets[i]
    return uvec(ptr, len)
end
In my use, it is about avoiding the extra indirection and about cache friendliness (I construct once, and then spend a lot of time indexing into the ragVec), not about the amount of data.
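A quick usage sketch of the above (my example, not from the original post):

rv = ragVec([Int32[1, 2, 3], Int32[4], Int32[5, 6]])
rv[2]     # a uvec pointing into the flat elems buffer: Int32[4]
rv[1][3]  # == Int32(3), read via unsafe_load, with no extra allocation
# Note: uvec holds a raw pointer, so keep the ragVec (or its elems) alive,
# e.g. with GC.@preserve, while any uvec derived from it is in use.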