Nested map

I’d like to apply a function to each element of a nested array. To illustrate

julia> arr = [[1,2,3],[4,5,6]]
2-element Array{Array{Int64,1},1}:
 [1, 2, 3]
 [4, 5, 6]

julia> map(arr) do ls
           map(ls) do x
               x+1
           end
       end
2-element Array{Array{Int64,1},1}:
 [2, 3, 4]
 [5, 6, 7]

Now say arr is nested 3 levels, then things start to get quite unwieldy. I thought of doing something like map(k for i in arr for j in i for k in j) but that flattens the list. Unfortunately, the syntax map(k for i in arr, j in i, k in j) doesn’t work like it does for nested arrays of fixed size. Any suggestion on how to implement an operation like this in a nicer way? Loops solve it somewhat nicely, but you are forced to iteratively push into some array.

Here’s one solution:

nest_map(f, v::Array) = map(f, v);
nest_map(f, v::Array{<:Array}) = map(vi -> nest_map(f, vi), v);
julia> arr = [[[1,2], [3, 4, 5]], [[6,],[7,8,9,10]]];

julia> nest_map(x -> x + 1, arr)
2-element Vector{Vector{Vector{Int64}}}:
 [[2, 3], [4, 5, 6]]
 [[7], [8, 9, 10, 11]]

However, I should note that having arrays of arrays is a bit of a warning sign in julia. If those nested arrays are of the same size, chances are you would be better served by using a multidimensional array instead. E.g.

[i + (j - 1)*2 + (k-1)*4 for i in 1:2, j in 1:2, k in 1:2]
2×2×2 Array{Int64, 3}:
[:, :, 1] =
 1  3
 2  4

[:, :, 2] =
 5  7
 6  8

should be preferred over

[[[i + (j - 1)*2 + (k-1)*4 for i in 1:2] for j in 1:2] for k in 1:2]
2-element Vector{Vector{Vector{Int64}}}:
 [[1, 2], [3, 4]]
 [[5, 6], [7, 8]]
5 Likes

The nested arrays are not of the same size so I’m stuck with this kind of solution, unfortunately. I guess it can’t be much nicer than nestedd map blocks.

For triple nested array you can use the following

arr_arr_arr .|> x -> x .|> x -> x .|> f 

As for recursive, it’s not very good to use isa, especially with nonconrete types. It’s better to use multiple dispatch (not sure, maybe something like that already exists in Base)

apply_function(f, x) = f(x)
apply_function(f, x::AbstractArray) = apply_function.(f, x)
2 Likes

Say you have

arr = [[[1,2], [3, 4, 5]], [[6,],[7,8,9,10]]]
myf(x) = x+0.1

The following generalizes well to n levels of nesting (just replace 3 with the appropriate value):

julia> ∘(fill(f->x->map(f, x), 3)...)(myf)(arr)
2-element Vector{Vector{Vector{Float64}}}:
 [[1.1, 2.1], [3.1, 4.1, 5.1]]
 [[6.1], [7.1, 8.1, 9.1, 10.1]]

Hard to read so I would not use it :wink:

It’s interesting that with a curried map we could just write ∘(fill(map, 3)...)(myf)(arr)!

This makes me realize that the question in this thread is a good example why a curried map would be useful. For the special case of 3-level nesting you could just write:

map(map(map(myf)))(arr)

or

(myf |> map |> map |> map)(arr)

or

arr |> (map∘map∘map)(myf)

Proof of concept:

arr = [[[1,2], [3, 4, 5]], [[6,],[7,8,9,10]]]
myf(x) = x+0.1

# Curried version of map
mapper(f) = x->map(f, x)

arr |> (mapper∘mapper∘mapper)(myf)

# Output:
2-element Vector{Vector{Vector{Float64}}}:
 [[1.1, 2.1], [3.1, 4.1, 5.1]]
 [[6.1], [7.1, 8.1, 9.1, 10.1]]
1 Like

I like how you use composition of map functions. When I think about what I would expect Julia to look like, it would use some sort of broadcasting. Indeed if the nested array was some k-dimensional array instead of nested k levels, then a simple f.(arr) would suffice. Is there a way to specify the dimension along which is broadcast? Or to only broadcast at the leaves?

An alternative is to attempt to recreate an imperative loop.

[[foo.(y) for y in x] for x in arr]

But turning this into a loop requires a lot of, say, push! calls to iteratively build an array.

julia> vec = Vector{Vector{Vector{Int}}}()
Array{Array{Int64,1},1}[]

julia> for x in arr
           vecx = Vector{Vector{Int}}()
           for y in x
               vecy = Int[]
               for z in y
                   push!(vecy, foo(z))
               end
               push!(vecx, vecy)
           end
           push!(vec, vecx)
       end

julia> vec
2-element Array{Array{Array{Int64,1},1},1}:
 [[2, 3], [4, 5, 6]]
 [[7, 9]]

Loops don’t happen to be expressions as well right (like Scala)? I think in Scala you can write

res = for x in arr
    for y in x
        for z in y
            foo(z)
        end
    end
end

What exactly is the problem with the nested_map from @mason? The only case I see in which it falls short is if you want to maps specifically k levels and your nested structure has more than k levels (in this case nested_map has no notion of a threshold/max_depth in which it should stop, it will keep recursively applying until the elements are not AbstractVectors anymore).

1 Like

Ah, I didn’t realize that aspect of nest_map, where it uses dispatch to create a base case on the type. Yes, that would address it nicely.

1 Like

Okay, so I was thinking today about how it’s kinda annoying you can’t do recursion inside a do block, but then I realized you totally can!

julia> map([[[1,2], [3, 4, 5]], [[6,],[7,8,9,10]]]) do x
           if x isa Array
               map(var"#self#", x)
           else
               x + 1
           end
       end
2-element Vector{Vector{Vector{Int64}}}:
 [[2, 3], [4, 5, 6]]
 [[7], [8, 9, 10, 11]]

I wouldn’t really recommend doing this, but it does work (assuming you’re on version 1.3 or newer where the var string macro was introduced). Just thought it was neat.

5 Likes

Please, please don’t do it in the production code (but I love the snippet :joy:).

1 Like

In case anyone is wondering why Takafumi thinks this is such a bad idea: there are various language constructs which might create closures, so writing code that depends on which function in a closure it’s written in is a bad idea. For instance, for current versions of julia, we have that this:

julia> map([[[1,2], [3, 4, 5]], [[6,],[7,8,9,10]]]) do x
           if x isa Array
               map(var"#self#", x)
           else
               x + 1
           end
       end

is equivalent to this:

julia> map([[[1,2], [3, 4, 5]], [[6,],[7,8,9,10]]]) do x
           if x isa Array
               [var"#self#"(y) for y in x]
           else
               x + 1
           end
       end

but it won’t work on version 1.7 onwards due to lowering: remove incorrect comprehension eta reduction by vtjnash · Pull Request #39139 · JuliaLang/julia · GitHub!

There are various other contexts where using self would be a bad idea too, like inside @async or Threads.@spawn and so on.

5 Likes

What if we do something like this:

map([[[1,2], [3, 4, 5]], [[6,],[7,8,9,10]]]) do x
    self = var"#self#"
    if x isa Array
        [self(y) for y in x]
    else
        x + 1
    end
end

Is this also risky? It doesn’t look different from other uses of recursion…

Yes,that will work, but the point is that it’s easy to get wrong, and makes the meaning of code highly dependant on what’s enclosing it. For instance,

f(x)

should really always be equivalent to

@sync @async f(x)

but here, if we wrote

map([[[1,2], [3, 4, 5]], [[6,],[7,8,9,10]]]) do x
    @sync @async begin
        self = var"#self#"
        if x isa Array
            [self(y) for y in x]
        else
            x + 1
        end
    end
end

it’ll blow up in your face because. It’s always possible to use self safely and carefully, but the point is just that using it can subtly violate a lot of naive expectations. It’s bad style.

2 Likes