Cannot understand the results from sizeof and @allocated

I know that an Int64 number requires 64 bits = 8 bytes.
But what about an Int64[2]?
And why does a naive + require 64 bytes? I thought 64 bits would be enough, roughly speaking. (I’m not from computer science.)

julia> const a = Int64[2];

julia> const b = Int64[3];

julia> sizeof(2) # expected
8

julia> sizeof(a) # why the same? it is in a Vector!
8

julia> sizeof(a + b) # only takes 8 bytes to store
8

julia> @allocated a + b # why 64 bytes? seems horrible
64

The key point is that an array requires a significant amount of metadata. Specifically, you need to store the location of the data in memory, the length of the array, and, because arrays are resizable, the length of the underlying fixed-size buffer (its capacity).
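To make those three pieces of metadata concrete, here is a minimal sketch of a hypothetical toy vector type. `ToyVector` is invented for illustration and is not how Julia's `Array` is actually laid out; it just keeps a pointer, a length, and a capacity, and doubles its buffer when it runs out of room:

```julia
# A hypothetical toy growable vector (NOT Julia's actual Array layout) that
# stores exactly the three pieces of metadata described above.
mutable struct ToyVector
    data::Ptr{Int}   # location of the element buffer in memory
    len::Int         # number of elements currently in use
    cap::Int         # number of elements the buffer can hold
    function ToyVector()
        v = new(Ptr{Int}(C_NULL), 0, 0)
        finalizer(x -> Libc.free(x.data), v)  # free the buffer when v is collected
        return v
    end
end

function Base.push!(v::ToyVector, x::Int)
    if v.len == v.cap
        newcap = max(1, 2 * v.cap)  # double the capacity when full
        # realloc(NULL, n) behaves like malloc; failure handling omitted in this toy
        v.data = Ptr{Int}(Libc.realloc(v.data, newcap * sizeof(Int)))
        v.cap = newcap
    end
    v.len += 1
    unsafe_store!(v.data, x, v.len)  # write into slot number v.len
    return v
end

v = ToyVector()
foreach(i -> push!(v, i), 1:5)
(v.len, v.cap)      # (5, 8)
sizeof(ToyVector)   # 24 bytes of metadata, before counting any elements
```

Even this stripped-down version needs 24 bytes of bookkeeping before a single element is stored, which is already three times the size of the one `Int64` being stored.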


This is a bit confusing, and I don’t really know why sizeof works like this, but for an array a, only the size of the element data (length(a) * sizeof(eltype(a))) is reported. To include the metadata as well, use Base.summarysize:

julia> Base.summarysize([2])
48

julia> Base.summarysize(Int[])  # size of an empty array
40
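A quick way to see the difference is to compare both functions on a few arrays of different lengths. `sizeof` grows by exactly 8 bytes per `Int64` element, while `Base.summarysize` additionally includes the array's own bookkeeping (the exact summarysize numbers depend on the Julia version, so they are printed rather than predicted here):

```julia
# sizeof(array) only counts the element data: length times element size.
# Base.summarysize also counts the array object and its buffer header.
for v in (Int64[], Int64[2], zeros(Int64, 10))
    println(length(v), " elements: sizeof = ", sizeof(v),
            ", length * sizeof(eltype) = ", length(v) * sizeof(eltype(v)),
            ", summarysize = ", Base.summarysize(v))
end
```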

If you have small vectors (<100 elements) and want less overhead, use SVectors:

julia> using StaticArrays

julia> a = SVector(1,2)
2-element SVector{2, Int64} with indices SOneTo(2):
 1
 2

julia> Base.summarysize(a)
16
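An SVector keeps its elements in an NTuple field, so the values are stored inline instead of behind a separately allocated, resizable buffer. A plain Base tuple shows the same effect without installing StaticArrays; the `add2` helper below is hypothetical and only for illustration:

```julia
x = (1, 2)            # NTuple{2, Int64}: two Int64s stored inline
y = (3, 4)
isbits(x)             # true: no pointers, no separate heap buffer
sizeof(x)             # 16 = 2 * 8 bytes, with no extra metadata

add2(p, q) = p .+ q   # broadcasting over tuples returns a new tuple
add2(x, y)            # (4, 6)
```

The trade-off is that the length is part of the type, so a tuple (or SVector) cannot grow with push! the way a Vector can.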

I think this still misses some Array metadata:

julia> Base.summarysize(Vector{Any}(undef, 0))
40

julia> Base.summarysize(Vector{Any}(undef, 2)) # +16 bytes = 2 64-bit pointers
56

julia> Base.summarysize(Any[1, 1.0]) # +16 bytes = 2 8-byte numbers
72

But there should also be some indication of the type of each element in order to interpret the 8-byte values properly. It might be stored in the otherwise unused bits of the pointer’s virtual address, but that seems odd for an element type with an unlimited number of subtypes. I think boxed type information generally isn’t counted:

julia> Base.summarysize(Ref{Any}(1)) # 8 byte pointer, 8 byte value
16

julia> Base.summarysize(Core.Box(1)) # internal analog
16
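That the type information must be stored somewhere is easy to confirm: each element of a `Vector{Any}` still reports its own concrete type, even though `sizeof` only sees two 8-byte pointers. For a concretely typed vector the element type is recorded once, in the array's type. Where exactly the per-box tag lives is an implementation detail this sketch does not try to show:

```julia
v = Any[1, 1.0]
sizeof(v)               # 16: two 8-byte pointers, not the boxed values
map(typeof, v)          # Int64 and Float64: each boxed element knows its type
isbitstype(eltype(v))   # false: Any elements are stored as pointers to boxes

w = [1.0, 2.0]          # Vector{Float64}: the type is stored once, in the array's type
eltype(w)               # Float64
```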

Base.summarysize also intentionally doesn’t count some things:

julia> Base.summarysize(Int)
124

julia> Base.summarysize(DataType[Int,Int]) # base 40 + 2 8-byte pointers
56

I took this example from a post arguing that it wouldn’t be accurate to add the size of the Int type for each element, but oddly, the vector’s report doesn’t count the single Int type at all.

It’s also worth pointing out that @allocated doesn’t simply measure expected memory usage. For every example in my comment, replacing Base.summarysize with @allocated reports 0. I’m not sure why; I presume the compiler figured out that the values were discarded and optimized the allocations away, which wasn’t possible for a+b.
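`@allocated` reports the bytes newly allocated while the expression runs, not the size of the result; `a + b` shows 64 because it builds a brand-new Vector every time. A minimal sketch, using the hypothetical helpers `add_into!` and `inplace_bytes`, showing that writing into a preallocated output avoids that allocation:

```julia
# Hypothetical helper: write a + b into an existing output vector c.
function add_into!(c, a, b)
    c .= a .+ b   # fused, in-place broadcast: no new array is created
    return c
end

# Measure inside a function so @allocated sees concretely typed arguments.
function inplace_bytes(a, b)
    c = similar(a)                        # the output buffer is allocated here, once
    add_into!(c, a, b)                    # first call; one-time costs land outside the measurement
    return @allocated add_into!(c, a, b)  # reuses c's existing buffer
end

inplace_bytes([2], [3])   # 0 bytes: c's buffer is reused
```

This is the usual pattern when a computation runs in a hot loop: allocate the output once, then update it in place.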


If you look at how vectors are stored, this makes sense. A Vector{Int} is really a struct:

julia> dump(Vector{Int})
mutable struct Vector{Int64} <: DenseVector{Int64}
  ref::MemoryRef{Int64}
  size::Tuple{Int64}

And a MemoryRef{Int} is also a struct:

julia> dump(MemoryRef{Int})
struct MemoryRef{Int64} <: Ref{Int64}
  ptr_or_offset::Ptr{Nothing}
  mem::Memory{Int64}

And so is Memory{Int}:

julia> dump(Memory{Int})
mutable struct Memory{Int64} <: DenseVector{Int64}
  const length::Int64
  const ptr::Ptr{Nothing}

On top of that comes the actual memory for the data, i.e. the chunk of memory pointed to by the ptr field of the Memory struct. The heap-allocated structs (the Vector and the Memory) also carry their type information, at least a pointer to a static DataType struct, which is another 8 bytes each.
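Adding up the field sizes from the dump output gives a rough byte budget. This is a back-of-the-envelope sketch, assuming Julia 1.11+ (where Vector is built on MemoryRef and Memory), an 8-byte type tag per heap object, and that a one-element Memory keeps its data inline right after its fields; `fieldbytes` is a hypothetical helper:

```julia
# Requires Julia 1.11+, where Vector is built on MemoryRef and Memory.
# Hypothetical helper: total bytes of a type's fields as laid out inline.
fieldbytes(T) = sum(sizeof, fieldtypes(T))

vec_fields = fieldbytes(Vector{Int})  # 16 (MemoryRef: pointer + Memory reference) + 8 (size tuple)
mem_fields = fieldbytes(Memory{Int})  # 8 (length) + 8 (ptr)

# Assumptions, not verified here: every heap object carries an 8-byte type tag,
# and a tiny Memory stores its elements inline right after its fields.
tag = 8
one_element = sizeof(Int)
estimate = (tag + vec_fields) + (tag + mem_fields + one_element)
```

Under these assumptions the estimate is 32 + 32 = 64 bytes, the same number @allocated reported for a + b in the question, and the field totals alone (24 + 16 = 40) match the summarysize of an empty vector shown earlier. That agreement is suggestive, but it does not prove this is exactly how the allocator counts.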

When you create a as a = [2], the data buffer only has room for one Int, but if you push more elements onto the vector, the buffer is reallocated with extra room:

julia> a = [2]
1-element Vector{Int64}:
 2
julia> a.ref.mem.length
1
julia> @allocated push!(a, 1)
96
julia> a.ref.mem.length
8
julia> @allocated push!(a, 1)
0
julia> a.ref.mem.length
8
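The same internal field can be used to watch the capacity evolve over many pushes. This sketch relies on `v.ref.mem`, which is internal to Julia 1.11+ and may change; `capacity` is a hypothetical helper, and the exact growth sequence is whatever your Julia version's policy produces:

```julia
# Relies on internal fields (Julia 1.11+); not a public API and may change.
capacity(v::Vector) = length(v.ref.mem)   # hypothetical helper

v = Int[]
caps = Int[]
for i in 1:1000
    push!(v, i)
    cap = capacity(v)
    (isempty(caps) || caps[end] != cap) && push!(caps, cap)  # record each new capacity
end
caps   # the distinct buffer sizes the vector went through
```

Because the buffer grows by more than one slot at a time, most pushes simply fill spare capacity, which is why the second push! in the session above allocated 0 bytes.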

Removing from the front of the vector just changes the ptr_or_offset in the MemoryRef:

julia> a.ref.ptr_or_offset
Ptr{Nothing}(0x00007f73096751c0)
julia> popfirst!(a)
2
julia> a.ref.ptr_or_offset
Ptr{Nothing}(0x00007f73096751c8)
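The same effect can be checked with the public `pointer` function, plus the internal `ref.mem` field (Julia 1.11+) to confirm that the underlying buffer is untouched. Note the vector needs at least two elements here, since popping the last element leaves nothing to point at:

```julia
a = [10, 20, 30]
p0 = pointer(a)                  # address of the first element
buf = a.ref.mem                  # underlying Memory (internal field, Julia 1.11+)
popfirst!(a)                     # returns 10
pointer(a) == p0 + sizeof(Int)   # true: the start just moved forward by one Int
a.ref.mem === buf                # true: no new buffer was allocated
```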

It’s also possible to get hold of the DataType pointer, which is stored in the word just before the Vector{Int} object. However, the lower 4 bits are used temporarily by the garbage collector and the rest doubles as different types of data, so this is highly unsafe and easily causes segfaults:

julia> unsafe_load(Ptr{DataType}(unsafe_load(Ptr{UInt}(pointer_from_objref(a)), 0) & ~UInt(0xf)))
Array{Int64, 1}