Deepsimilar()?

I am writing an optimization package where users can input arrays of arrays. For simple arrays, initialization can be done using similar() without copying values. For arrays of arrays, however, I haven’t found anything other than deepcopy() to initialize them. Since I only need memory allocated with the appropriate structure, something like deepsimilar() would be cleaner and would avoid copying the values. Does it already exist? Is it too specific to be useful?

Depending on how many levels of nesting your array has, map(similar, array_of_arrays) does the trick. This assumes no two inner arrays are the same object; a case that violates the assumption is x = [...]; array_of_arrays = [x, x].
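As a minimal sketch of the one-level case (the variable names below are made up for illustration), this shows what map(similar, ...) allocates and what happens when the input contains the same inner array twice:

```julia
# Two distinct inner vectors of different lengths.
array_of_arrays = [[1.0, 2.0, 3.0], [4.0, 5.0]]

# Allocate a new outer vector whose inner vectors match the input in
# element type and size; their contents are uninitialized.
buffers = map(similar, array_of_arrays)

# Aliased input: both slots refer to the same vector `x`.
x = [1.0, 2.0]
aliased = [x, x]
fresh = map(similar, aliased)

aliased[1] === aliased[2]  # true: the input shares one inner vector
fresh[1] === fresh[2]      # false: each slot got its own allocation
```

So the call itself does not fail on aliased input; the result simply does not reproduce the sharing.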

On that note, many packages which expose specialized types for arrays of arrays also implement this for you. Here is how RecursiveArrayTools does it.

Thanks, that works! I'm always amazed to see answers come in faster than it took me to write the question.

Why does this limitation exist? Even if there is aliasing in the inner arrays, when you “deep similar” them you probably don’t want the new array to keep the aliasing structure anyway.

As you pointed out, it depends on how many levels x has. map(similar, x) works for arrays of arrays, but not for arrays of arrays of arrays (and deeper). And RecursiveArrayTools works for vectors of arrays. I think your solution can be made to work for an arbitrary number of levels this way:

deepsimilar(x) = eltype(x) == eltype(eltype(x)) ? similar(x) : map(deepsimilar, x)

By the way, the code for deepsimilar will create independent elements even when the initial arrays all pointed to the same object:

julia> deepsimilar(x) = eltype(x) == eltype(eltype(x)) ? similar(x) : map(deepsimilar, x)
deepsimilar (generic function with 1 method)

julia> x = [1,2];

julia> y = [x,x];

julia> z = [y,y]
2-element Vector{Vector{Vector{Int64}}}:
 [[1, 2], [1, 2]]
 [[1, 2], [1, 2]]

julia> deepsimilar(z)
2-element Vector{Vector{Vector{Int64}}}:
 [[234558320, 234558208], [287367152, 1976577520]]
 [[287367152, 2094960320], [287367152, 1976577520]]
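A possible alternative, sketched here under the assumption that nesting is visible in the element type: let dispatch decide whether to recurse instead of comparing eltypes. The name deepsimilar_dispatch is hypothetical. One motivation is that the eltype comparison treats any element whose eltype differs from itself as another nesting level, and eltype(String) is Char.

```julia
# Leaf case: an array whose element type is not an array type.
deepsimilar_dispatch(x::AbstractArray) = similar(x)

# Nested case: the element type is itself an array type, so recurse
# into each inner array.
deepsimilar_dispatch(x::AbstractArray{<:AbstractArray}) =
    map(deepsimilar_dispatch, x)

# Three levels of nesting with ragged inner lengths.
z = [[[1, 2], [3]], [[4, 5, 6]]]
w = deepsimilar_dispatch(z)

# An array of strings is treated as a leaf.
s = deepsimilar_dispatch(["a", "b"])
```

Here w has the same type and shape as z with uninitialized contents. An outer array with element type Any falls into the leaf method, so its inner arrays are not recreated.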

That’s what I want to know: is there any reason NOT to do this?

There are cases where you want structural sharing in a collection of arrays (e.g. tied weights in ML models), but I’m not aware of any for arrays of arrays. The note about inner arrays was mostly added for completeness.
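If structural sharing did need to survive, one way to sketch it is to remember which input arrays have already been seen, keyed by object identity. The function name deepsimilar_shared and its seen argument are hypothetical, not part of Base or of any package mentioned in this thread.

```julia
# Hypothetical helper: a recursive `similar` in which inner arrays that are
# the same object in the input map to the same new object in the output.
function deepsimilar_shared(x::AbstractArray, seen::IdDict = IdDict())
    # Reuse the allocation already made for this exact object, if any.
    haskey(seen, x) && return seen[x]
    if x isa AbstractArray{<:AbstractArray}
        y = map(xi -> deepsimilar_shared(xi, seen), x)
    else
        y = similar(x)
    end
    seen[x] = y  # IdDict compares keys with ===, i.e. by identity
    return y
end

weights = [1.0, 2.0]
tied = [weights, weights, [3.0]]
out = deepsimilar_shared(tied)

out[1] === out[2]   # true: the sharing between slots 1 and 2 is kept
out[1] === out[3]   # false
out[1] === weights  # false: the storage is newly allocated
```

The IdDict lookup adds some overhead per array, so this is only worth it when the sharing carries meaning.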
