Difference in allocations when constructing array-of-arrays vs array-of-structs


#1

I have a similar question concerning a very different use cases: the case where most of the inner vectors are actually equal and pretty large (for example, useful when working with “shifted versions” of the same Array, that can and often will share the underlying data but have different shifts). Making a large Array of views would behave similarly. The usecase is: working with DataFrames where some columns represent a time signal shifted with different shifts.

The situation is the following:

julia> using LinearAlgebra, Random, ShiftedArrays

julia> L = 1:100000
1:100000

julia> v = rand(100000);

julia> using BenchmarkTools

julia> @benchmark [ShiftedArray(v, -i) for i in L]
BenchmarkTools.Trial: 
  memory estimate:  5.33 MiB
  allocs estimate:  199493
  --------------
  minimum time:     7.076 ms (0.00% GC)
  median time:      7.264 ms (0.00% GC)
  mean time:        8.692 ms (16.24% GC)
  maximum time:     64.684 ms (88.45% GC)
  --------------
  samples:          576
  evals/sample:     1

julia> @benchmark [(v, -i) for i in L]
BenchmarkTools.Trial: 
  memory estimate:  5.33 MiB
  allocs estimate:  199493
  --------------
  minimum time:     14.498 ms (0.00% GC)
  median time:      14.728 ms (0.00% GC)
  mean time:        16.352 ms (9.47% GC)
  maximum time:     74.507 ms (79.07% GC)
  --------------
  samples:          306
  evals/sample:     1

julia> @benchmark [v for i in L]
BenchmarkTools.Trial: 
  memory estimate:  781.42 KiB
  allocs estimate:  5
  --------------
  minimum time:     156.358 μs (0.00% GC)
  median time:      162.510 μs (0.00% GC)
  mean time:        211.878 μs (17.43% GC)
  maximum time:     41.303 ms (99.60% GC)
  --------------
  samples:          10000
  evals/sample:     1

julia> @benchmark [-i for i in L]
BenchmarkTools.Trial: 
  memory estimate:  781.36 KiB
  allocs estimate:  3
  --------------
  minimum time:     46.720 μs (0.00% GC)
  median time:      53.779 μs (0.00% GC)
  mean time:        74.173 μs (25.85% GC)
  maximum time:     40.476 ms (99.87% GC)
  --------------
  samples:          10000
  evals/sample:     1

Though when doing the same with broadcasting, the situation is not so bad in terms of time, but still not ideal in terms of memory:

julia> @benchmark tuple.((v,), L)
BenchmarkTools.Trial: 
  memory estimate:  3.82 MiB
  allocs estimate:  100040
  --------------
  minimum time:     497.080 μs (0.00% GC)
  median time:      538.889 μs (0.00% GC)
  mean time:        1.456 ms (62.69% GC)
  maximum time:     58.943 ms (99.07% GC)
  --------------
  samples:          3427
  evals/sample:     1

julia> @benchmark ShiftedArray.((v,), L)
BenchmarkTools.Trial: 
  memory estimate:  3.82 MiB
  allocs estimate:  100056
  --------------
  minimum time:     559.148 μs (0.00% GC)
  median time:      645.822 μs (0.00% GC)
  mean time:        1.615 ms (59.04% GC)
  maximum time:     62.889 ms (99.00% GC)
  --------------
  samples:          3088
  evals/sample:     1

What I don’t understand is the following: the memory estimate and time are massively higher for the case where I store the vector and the shift together (rather than separately).

Does this mean that, for this use case, I need to create a custom ShiftsArray obect that stores the vectors and the shifts in two separate vectors?

In that case maybe it could also be useful to have a ViewsVector to do that with views, or even automatize the process that goes from a ArrayOfStructs to StructOfArrays (we have that for NamedTuples already in the IndexableTables package, but I imagine it could be generalized to work with any struct, not just NamedTuples.


Quick Longitudinal Access to arrays
#2

This has to do with the nitty-gritty details of how arrays store their elements. If a value isbits(), then it can be stored directly into the array, “flattened” out and arranged sequentially. Otherwise, the array is simply a list of n pointers to independently stored objects elsewhere in memory.

With v=rand(…), isbits(v) is false, so it gets stored into arrays as a pointer. [v for _ in 1:10] is just a list of pointers to an already existing object. All it needs to do is create a vector of length sizeof(Ptr)*10 and then store the same value 10 times.

isbits((v,)) and isbits(ShiftedArray(v, i)) are also false, but in the case of [(v,) for _ in 1:10] it’s actually constructing a new object each time. This means that it not only needs to store the list of pointers, but it’s also got to keep track of the “type tag” for each object independently. Note that these are only !isbits because v is a mutable array. Watch what happens when you use v=1:2 instead:

julia> v = rand(100000);

julia> isbits((v, 1))
false

julia> @btime [($v,i) for i in 1:100000];
  1.095 ms (100004 allocations: 3.81 MiB)

julia> v = 1:100000;

julia> isbits((v, 1))
true

julia> @btime [($v,i) for i in 1:100000];
  872.530 μs (2 allocations: 2.29 MiB)

#3

Thanks a lot for the detailed answer!

I think I understand the logic of storing the pointer (which in my mind should be reasonably cheap: I get sizeof(Ref(v)) == 16, I hope I’m checking the correct thing). In my use case I most definitely need to store the pointer as the actual array I’m pointing to is generally huge.

So the difference is:

  1. The element is ShiftedArray(v,1): the array element is a pointer to a ShiftedArray, which means it has information about where the ShiftedArray is and it also has information that those bits have to be read as a ShiftedArray (in turn, some of those bits will be a pointer to the actual Array)

  2. The element is v: the array element is a pointer to v, which means it has information about where the vector v is and it also has information that those bits have to be read as a Vector

So it’d be simpler if I defined a different container types that stores the pointer to the vectors in one array and the shifts in another (otherwise I have to go through multiple pointers).

I do not understand the type tag part though: is it a consideration only in terms of creating the array [(v,) for i in 1:100000] ? Once it is created, the size should be the same as [v for i in 1:100000] as both need to store the pointer and the type tag (Tuple{Vector} in one case, Vector in the other, but that shouldn’t matter I imagine).


#4

Perhaps it’s a little helpful to pull back the curtain here and see what’s actually happening:

julia> v = rand(5);

julia> A = [v for _ in 1:3]
3-element Array{Array{Float64,1},1}:
 [0.00496301, 0.892864, 0.164809, 0.542822, 0.126373]
 [0.00496301, 0.892864, 0.164809, 0.542822, 0.126373]
 [0.00496301, 0.892864, 0.164809, 0.542822, 0.126373]

julia> unsafe_wrap(Array, Ptr{UInt}(pointer(A)), (3,)) # Look at the array's data as UInts
3-element Array{UInt64,1}:
 0x000000011fad6a10
 0x000000011fad6a10
 0x000000011fad6a10

So here I’m asking Julia to look at the same data, but interpreted as a Vector{UInt} instead of an Vector{Vector{Float64}}. This lets us look at the raw pointer values instead of having Julia follow them. You can see here that all three are the same — they’re pointing to the same object. We can ask Julia to load what is at that location:

julia> unsafe_pointer_to_objref(Ptr{Any}(0x000000011fad6a10))
5-element Array{Float64,1}:
 0.00496301
 0.892864
 0.164809
 0.542822
 0.126373

Note that I didn’t need to tell Julia that it was a Ptr{Vector{Float64}} — it figured that out on its own. How’d it do that? That’s the “type tag.” Julia hides a bit of information away just before the object. We can load that, too. Let’s look at the UInt just before the object itself — we can do this by loading the 0th index relative to the pointer (out of bounds!):

julia> unsafe_load(Ptr{UInt}(0x000000011fad6a10), 0)
0x000000010f204130

Hunh, that kinda looks like a similar value. I wonder if it’s also a pointer. Let’s see if Julia knows what it is. (This is dangerous and can segfault, but I’m just exploring):

julia> unsafe_pointer_to_objref(Ptr{Any}(0x000000010f204130))
Array{Float64,1}

Oh hey! It’s the type! If you dig through the source you’ll find that this type tag could be interpreted in two other ways, but this happens to work in this case.

Ok, now you can try the same thing with tuples or your struct — you should find that it’s allocating a bunch of small objects that just hold onto the pointer to the array and that int, and you should be able to follow the pointers along the way.


#5

Of course, this all skirts around the bigger question you’re asking — which relates to isbits and what gets stored “flattened” into an array and what gets stored as pointers to other (tagged) values elsewhere. Watch what happens when I try the same trick with an array of integers:

julia> A = [i for i in 1:3]
3-element Array{Int64,1}:
 1
 2
 3

julia> unsafe_wrap(Array, Ptr{UInt}(pointer(A)), (3,)) # Look at the array's data as UInts
3-element Array{UInt64,1}:
 0x0000000000000001
 0x0000000000000002
 0x0000000000000003

Those aren’t pointers! They’re just the Ints themselves! And there’s no type tag — they’re just smashed together one after another. That’s because the Array itself is holding onto the type tag as a type parameter — it knows how to interpret each element. This is different if it’s an Array{Any}:

julia> A = Any[i for i in 1:3]
3-element Array{Any,1}:
 1
 2
 3

julia> unsafe_wrap(Array, Ptr{UInt}(pointer(A)), (3,)) # Look at the array's data as UInts
3-element Array{UInt64,1}:
 0x000000010f198090
 0x000000010f1980d0
 0x000000010f198110

julia> unsafe_pointer_to_objref(Ptr{Any}(0x000000010f198090))
1

Now we’re back to a list of pointers since the array now needs to ask each element what its type is… and it doesn’t know how big it might be. Now try this with some other immutable values like UnitRanges.

Now, theoretically, a tuple (rand(3), 3) could be considered isbits and stored flattened into an array — just alternate between the pointer and the Int. But due to considerations of how our garbage collector tracks these objects, we don’t do that right now. Hopefully someday we’ll be able to make that optimization. Check out https://github.com/JuliaLang/julia/pull/18632 and the linked issues for more info.


#6

Thanks a lot for the lengthy explanation! I’ll definitely play around with these under the hood concepts as I still need to get a better grasp of storing complex objects in Arrays and the performance implications of doing so.