Understanding AbstractFloat Arrays

I’m having a hard time understanding the size (in bytes) of Arrays with AbstractFloat types because of the following:

julia> ar1 = Array{AbstractFloat,3}(rand(50,50,50));

julia> ar2 = Array{Float64,3}(rand(50,50,50));

julia> Base.summarysize(ar1)
2000056

julia> Base.summarysize(ar2)
1000056

Probably this is where I’m wrong, but I had the impression that the supertype AbstractFloat is sort of like Union{BigFloat,Float64,Float32,Float16}. So if you create an array with either of the two, you should be able to fill that array with any type of float. Since a BigFloat is 40 bytes, the size of the array should be, at most, 40*50*50*50 (+ some overhead). In my example, since rand() generates Float64s, both arrays should be 8*50*50*50 = 1,000,000 bytes (+ overhead). My first guess was that the data was being duplicated to cover every float precision, but that doesn’t make sense numerically (or at all, really), so if anyone could explain this I would really appreciate it.


For the Float64 array ar2, the size is straightforward: 50 * 50 * 50 elements, and a 64-bit float takes 8 bytes, so that’s 1 MB. Then 56 bytes of overhead for the array itself.

For the AbstractFloat array ar1, Julia cannot know the size of each element ahead of time. A user can always define a new type MyType <: AbstractFloat of whatever size they want. Therefore, Julia has to store every float individually on the heap and fill the array itself with pointers to those floats.
So, we have 50 * 50 * 50 = 125,000 floats. Each of these takes up 8 bytes on the heap (plus some per-object overhead that summarysize apparently does not count), for 1 MB total, and the array itself holds 50 * 50 * 50 8-byte pointers for another 1 MB. Then add the 56 bytes of overhead for the array.
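The accounting above can be checked piece by piece. This is only a sketch: it assumes a 64-bit machine for the 8-byte pointer figure, and the names ptr_bytes and payload are made up for illustration. It deliberately avoids relying on the exact header size that summarysize reports, since that may differ between Julia versions.

```julia
ar1 = Array{AbstractFloat,3}(rand(50, 50, 50))
ar2 = Array{Float64,3}(rand(50, 50, 50))

# Float64 is a concrete bits type, so its values can be stored inline.
isbitstype(Float64)            # true
# AbstractFloat is abstract, so the array can only hold references.
isconcretetype(AbstractFloat)  # false

# sizeof reports only the array's own data buffer.
ptr_bytes = sizeof(Ptr{Cvoid})  # 8 on a 64-bit machine (assumption)
sizeof(ar2)                     # 125_000 floats stored inline, 8 bytes each
sizeof(ar1)                     # 125_000 pointers, ptr_bytes each

# The values the pointers refer to are still ordinary Float64s.
ar1[1] isa Float64              # true
payload = sum(sizeof, ar1)      # bytes of float data living outside the buffer
```

On a 64-bit machine, sizeof(ar1) + payload comes to the 2,000,000 bytes that summarysize reports before the array overhead.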


Oh, I see. So there are two arrays, one with the data and one with the pointers to each data element. That’s why the size doubled. If ar1 were filled with, let’s say, Float32 values, then Base.summarysize(ar1) would show 1,500,000 plus the overhead. Thank you for answering.


Is there an old issue somewhere about summarysize not counting the boxing overhead of abstractly typed fields or elements? At minimum, each box has to specify its runtime concrete type.

No, there is only one array of pointers to 50^3 individually allocated Float64 objects on the heap.

(It is necessary to allocate them individually because they could all be different sizes, and this could change at any time. For example, imagine what would happen if you randomly chose 50% of the elements and replaced them with Float32 values.)
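A small sketch of that thought experiment, using a 1000-element vector instead of the 50^3 array to keep it quick. The names v, w, before, and after are hypothetical, and every other element is replaced rather than a random half, so the numbers are deterministic.

```julia
v = Array{AbstractFloat,1}(rand(1000))
before = sum(sizeof, v)   # 1000 * 8 = 8000 bytes of float data

# Replace every other element with a Float32 value.
for i in 1:2:length(v)
    v[i] = Float32(v[i])
end

typeof(v[1]), typeof(v[2])  # (Float32, Float64): both live in the same array
after = sum(sizeof, v)      # 500 * 4 + 500 * 8 = 6000 bytes of float data

# The array's own buffer of pointers does not change size.
sizeof(v) == length(v) * sizeof(Ptr{Cvoid})  # true

# For contrast, a concretely typed array converts on assignment instead.
w = rand(4)
w[1] = 1.5f0
typeof(w[1])                # Float64
```

Because the elements of v can change size one at a time like this, they cannot be packed into a single fixed-stride buffer the way the elements of w are.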


Then that’s why there’s no overhead of two arrays in Base.summarysize(ar1): summarysize is counting the bytes correctly.

Arguably it should count the bytes for the type tags (every AbstractFloat allocated on the heap also has a pointer to a type), but this doesn’t seem to be counted.
