Arrays with non-uniform dimensions

I’m looking for a data-structure that essentially works as an array-of-arrays, where each “row” can have a different number of “columns”. The point of that is being able to easily perform map or broadcast, etc, as if it were a multidimensional array. I thought this was possible with OffsetArrays but I couldn’t find a reference. Is there any module for that?

Maybe RaggedArrays.jl
https://github.com/mbauman/RaggedArrays.jl

1 Like

Just found this thread from a couple of years ago mentioning an idea that actually relates to this topic, “recursive broadcasts”…

In my opinion the best approach would be to have a data-structure that can be easily mapped to/from arrays of arrays, and then support mapping etc (isomorphism? catamorphism? I forgot the correct term). There’s an AbstractTrees module, but I’m not sure how easy is it to use that.

If you just want to apply a single function to an array of arrays, you may consider defining a function having as input an array of arrays. I think something like the following should work:

function f(x::T)::T where T <: Real
    return x^2
end

function f(x_n::Array{Array{T,1}})::Array{Array{T,1}} where T <: Real
    return [f.(x) for x in x_n]
end

Indeed, we can define such a function and get the task done. The point is to be able to do it using the same code you might write for eg arrays. If your data is stored in a regular multi-dimensional tensor, you can map or broadcast a function over the data structure, and get a new data structure containing the mapped results. If you now have this different, irregular data-structure, your code written with map, comprehensions or broadcasts cannot be used anymore.

What I’m looking for is a way to have a view of this array-of-arrays into a type, say VectorTree, and then now we could define map(x::VectorTree{T}), like you did. Now we can reuse code written with map or broadcasts, etc, and then feed this high-level code either a multidimensional array or a VectorTree, and the code still works, it does the right thing depending on what the input type of the data-structure is.

The point is Julia has great support for high-level constructs like map as long as you’re using multi-dimensional arrays. What is understandable. I stumbled upon a case of having written my code on top of arrays, and now I wanted to reuse the same code processing an irregular array-of-arrays, but I need to reuse the same map functors, etc. And perhaps the biggest question would be how to generalize whatever we implement, such as your function, to an arbitrary number of levels… Maybe just assume the data is a tree with a certain height?

Here’s one proposal for how to do this. The structure merely holds a reference to the array-of-arrays, and the type of the tree leaves. map runs recursively until the eltype matches the leaf type, and then we apply the function. This should support any tree height, trees with different depths for each branch, and also trees where the leaves are actually Arrays that should not be traversed.

import Base.map

struct ArrayTree{L,T,N}
    a::Array{T,N}
end

function Base.map(f, at::ArrayTree{L,T,N}) where {L,T,N}
    ArrayTree{L,T,N}(arraytreemap_(f, at.a, L))
end

function arraytreemap_(f, array, L)
    if eltype(array) == L
        map(f, array)
    else
        map(array) do x
            arraytreemap_(f, x, L)
        end
    end
end

myarray = [rand(3), rand(1), rand(2)]
@show myarray
aa = ArrayTree{Float64, Array{Float64, 1}, 1}(myarray)
ma = map(aa) do x x^2+100 end
@show ma.a

# myarray = [[0.27519533394619367, 0.5878159031539618, 0.3940555494438669], [0.07240825460216072], [0.43409689370057825, 0.4750862276744192]]
# ma.a = [[100.07573247182576, 100.3455275360007, 100.15527977604751], [100.00524295533454], [100.18844011312049, 100.22570692372591]]

I created a package GitHub - nlw0/ArrayTrees.jl: ArrayTrees for Julia

map and broadcast work totally fine with arrays and collections containing any types, including inner arrays. They always do the same thing: apply a function to each element of the given collection, whatever those elements are. This is crucial for generic programming!

Recursing while/till some arbitrary type is encountered can also be useful in some cases, but the specific recursion criteria should be clearly spelled or even defined explicitly by the user.
Do you want to recurse into all abstractarrays? If yes, this would go into SVectors and FieldArrays, that are often treated as atomic objects. If not, then even views of simple arrays won’t be recursed into.
Do you also want to traverse other collections like dicts, sets, and iterators? And so on.

Your example problem

can be directly solved with Accessors.jl: @modify(x -> x^2 + 100, myarray |> Elements() |> Elements()). Accessors also have the Recursive selector with arbitrary conditions.

I’m extremely interested in generic programming, that’s the whole point. map does what it does on Arrays or AbstractArrays and there’s absolutely no question of changing this. This would be ludicrous. The question is if you have eg a Vector{Vector{X}} and you would like to treat it as a different data structure, with its own semantics, where we can map over it, what can you do? Accessors seems pretty interesting, but it still does not go all the way from taking the list-of-lists and making it work with map. There’s AbstractTrees.jl as well but I’m not sure it offers a solution. The point of the solution I suggested is not whether we are capable of getting the task done, which is trivial, the point is to make it generic and working with map. We seem to achieve that. I’d love to hear ideas of other ways we might potentially take a Vector{Vector{X}} and run some code with map over it, not merely achieve the same effect with eg a double for loop or whatever, but using map specifically, which I assume could then be adapted to work with broadcasting as well.