I’m looking for a data-structure that essentially works as an array-of-arrays, where each “row” can have a different number of “columns”. The point of that is being able to easily perform map or broadcast, etc, as if it were a multidimensional array. I thought this was possible with OffsetArrays
but I couldn’t find a reference. Is there any module for that?
Maybe RaggedArrays.jl
https://github.com/mbauman/RaggedArrays.jl
Just found this thread from a couple of years ago mentioning an idea that actually relates to this topic, “recursive broadcasts”…
In my opinion the best approach would be to have a data-structure that can be easily mapped to/from arrays of arrays, and then support mapping etc (isomorphism? catamorphism? I forgot the correct term). There’s an AbstractTrees module, but I’m not sure how easy is it to use that.
If you just want to apply a single function to an array of arrays, you may consider defining a function having as input an array of arrays. I think something like the following should work:
function f(x::T)::T where T <: Real
return x^2
end
function f(x_n::Array{Array{T,1}})::Array{Array{T,1}} where T <: Real
return [f.(x) for x in x_n]
end
Indeed, we can define such a function and get the task done. The point is to be able to do it using the same code you might write for eg arrays. If your data is stored in a regular multi-dimensional tensor, you can map
or broadcast a function over the data structure, and get a new data structure containing the mapped results. If you now have this different, irregular data-structure, your code written with map
, comprehensions or broadcasts cannot be used anymore.
What I’m looking for is a way to have a view of this array-of-arrays into a type, say VectorTree
, and then now we could define map(x::VectorTree{T})
, like you did. Now we can reuse code written with map
or broadcasts, etc, and then feed this high-level code either a multidimensional array or a VectorTree
, and the code still works, it does the right thing depending on what the input type of the data-structure is.
The point is Julia has great support for high-level constructs like map
as long as you’re using multi-dimensional arrays. What is understandable. I stumbled upon a case of having written my code on top of arrays, and now I wanted to reuse the same code processing an irregular array-of-arrays, but I need to reuse the same map
functors, etc. And perhaps the biggest question would be how to generalize whatever we implement, such as your function, to an arbitrary number of levels… Maybe just assume the data is a tree with a certain height?
Here’s one proposal for how to do this. The structure merely holds a reference to the array-of-arrays, and the type of the tree leaves. map
runs recursively until the eltype
matches the leaf type, and then we apply the function. This should support any tree height, trees with different depths for each branch, and also trees where the leaves are actually Arrays
that should not be traversed.
import Base.map
struct ArrayTree{L,T,N}
a::Array{T,N}
end
function Base.map(f, at::ArrayTree{L,T,N}) where {L,T,N}
ArrayTree{L,T,N}(arraytreemap_(f, at.a, L))
end
function arraytreemap_(f, array, L)
if eltype(array) == L
map(f, array)
else
map(array) do x
arraytreemap_(f, x, L)
end
end
end
myarray = [rand(3), rand(1), rand(2)]
@show myarray
aa = ArrayTree{Float64, Array{Float64, 1}, 1}(myarray)
ma = map(aa) do x x^2+100 end
@show ma.a
# myarray = [[0.27519533394619367, 0.5878159031539618, 0.3940555494438669], [0.07240825460216072], [0.43409689370057825, 0.4750862276744192]]
# ma.a = [[100.07573247182576, 100.3455275360007, 100.15527977604751], [100.00524295533454], [100.18844011312049, 100.22570692372591]]
I created a package GitHub - nlw0/ArrayTrees.jl: ArrayTrees for Julia
map and broadcast work totally fine with arrays and collections containing any types, including inner arrays. They always do the same thing: apply a function to each element of the given collection, whatever those elements are. This is crucial for generic programming!
Recursing while/till some arbitrary type is encountered can also be useful in some cases, but the specific recursion criteria should be clearly spelled or even defined explicitly by the user.
Do you want to recurse into all abstractarrays? If yes, this would go into SVector
s and FieldArray
s, that are often treated as atomic objects. If not, then even view
s of simple arrays won’t be recursed into.
Do you also want to traverse other collections like dicts, sets, and iterators? And so on.
Your example problem
can be directly solved with Accessors.jl
: @modify(x -> x^2 + 100, myarray |> Elements() |> Elements())
. Accessors
also have the Recursive
selector with arbitrary conditions.
I’m extremely interested in generic programming, that’s the whole point. map
does what it does on Arrays
or AbstractArrays
and there’s absolutely no question of changing this. This would be ludicrous. The question is if you have eg a Vector{Vector{X}}
and you would like to treat it as a different data structure, with its own semantics, where we can map
over it, what can you do? Accessors
seems pretty interesting, but it still does not go all the way from taking the list-of-lists and making it work with map
. There’s AbstractTrees.jl as well but I’m not sure it offers a solution. The point of the solution I suggested is not whether we are capable of getting the task done, which is trivial, the point is to make it generic and working with map
. We seem to achieve that. I’d love to hear ideas of other ways we might potentially take a Vector{Vector{X}}
and run some code with map
over it, not merely achieve the same effect with eg a double for loop or whatever, but using map
specifically, which I assume could then be adapted to work with broadcasting as well.