Map a vector to multiple vectors

Is there a simple way to map a vector to multiple vectors?

vin = [1,2,3,4]
v1, v2, v3 = map(vin) do x
  return x, 2x, 3x
end

The above doesn’t work, but that shows what I mean. Usually, you’d get a vector of tuples from the above, but is there an idiomatic way to get arrays directly instead?
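To make the starting point concrete, here is a minimal sketch using only Base Julia. It shows what map actually returns here and one way to split the result afterwards. The name tuples is just for illustration, and the split makes one extra pass per output, so it is not the single-pass solution I’m after.

```julia
vin = [1, 2, 3, 4]

# map returns one vector whose elements are 3-tuples
tuples = map(x -> (x, 2x, 3x), vin)

# Split after the fact: broadcast getindex over the vector, once per position
v1, v2, v3 = ntuple(i -> getindex.(tuples, i), 3)
```

Each of v1, v2, v3 is an ordinary Vector, but the intermediate vector of tuples is still allocated first.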

I know that I could do something like:

vin = [1,2,3,4]
v1 = Vector{Int}(undef, length(vin))
v2 = Vector{Int}(undef, length(vin))
v3 = Vector{Int}(undef, length(vin))
for i in eachindex(vin)
  x = vin[i]
  v1[i] = x
  v2[i] = 2x
  v3[i] = 3x
end

so I’m asking if there is an existing more concise way.

Note: I would prefer a solution that is at least as performant as that loop. For example, doing 3 separate map/broadcast calls seems to be about 25% slower for large arrays (which is what I’m working with).

Performance test for 3 approaches
div2(x) = x / 2
times2(x) = 2*x
times3(x) = 3*x

function test1(v)
    return map(div2, v), map(times2, v), map(times3, v)
end

function test2(v)
    return div2.(v), times2.(v), times3.(v)
end

function test3(v)
    len = length(v)
    v1 = Vector{Float64}(undef, len)
    v2 = Vector{Int}(undef, len)
    v3 = Vector{Int}(undef, len)
    @inbounds for i in eachindex(v)
        x = v[i]
        v1[i] = div2(x)
        v2[i] = times2(x)
        v3[i] = times3(x)
    end
    return v1, v2, v3
end

using BenchmarkTools
function time3(n=100000)
    v = fill(17, n)
    @btime test1($v)
    @btime test2($v)
    @btime test3($v)
end

I haven’t used it, but Unzip.jl might be of use to you!

Here’s an old snippet I’ve used before for repacking an array of tuples into a tuple of arrays

function repack(x::AbstractArray{T}) where T<:Tuple
	# turn Array{Tuple{T1,T2...}} to Tuple{Array{T1},Array{T2},...}
	fT = ntuple(i->fieldtype(T,i),fieldcount(T)) # fieldtypes(T) would be the obvious choice but is type-unstable
	arrs = similar.(Ref(x),fT)
	for i in eachindex(x)
		@inbounds setindex!.(arrs,x[i],Ref(i))
	end
	return arrs
end

function repack(x::AbstractArray{NamedTuple{N,T}}) where {N,T<:Tuple}
	# turn Array{NamedTuple{N,Tuple{T1,T2...}}} to NamedTuple{N,Tuple{Array{T1},Array{T2},...}}
	fT = ntuple(i->fieldtype(T,i),fieldcount(T)) # fieldtypes(T) would be the obvious choice but is type-unstable
	arrs = similar.(Ref(x),fT)
	for i in eachindex(x)
		@inbounds setindex!.(arrs,Tuple(x[i]),Ref(i))
	end
	return NamedTuple{N}(arrs)
end

Unfortunately, it’s only doing the processing after the map is complete so does impose a little extra overhead, which may be relevant when the map is quick and performance is important. Perhaps there is a cleaner solution using StructArrays.jl?

You could also look into LazyArrays.jl or use something like

v1 = Broadcast.Broadcasted(tup -> tup[1], (array_of_tuples,))

to make lazy array-like wrappers over the single output of map, depending on your precise use case.
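As a rough sketch of what that looks like in practice (Base only; the names lazy2, total and v2 are made up for illustration), the lazy object can be reduced directly or materialized into a real Vector when one is needed. This uses Broadcast.broadcasted and Broadcast.instantiate, which are not exported from Base, so treat it as semi-internal.

```julia
array_of_tuples = map(x -> (x, 2x, 3x), [1, 2, 3, 4])

# Lazy "second component" of every tuple; nothing is computed or allocated yet
lazy2 = Broadcast.instantiate(Broadcast.broadcasted(tup -> tup[2], array_of_tuples))

# Reductions can consume the lazy object without an intermediate array
total = sum(lazy2)

# Materialize only if a real Vector is required
v2 = Broadcast.materialize(lazy2)
```

Note that lazy2 is a Broadcasted object rather than an AbstractArray, so code that dispatches on AbstractVector will not accept it without materializing.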

  • If you already did
mv = map(v) do x
  return x, 2x, 3x
end

and want separate (abstract)vectors:

using FlexiMaps
v1 = mapview(1, mv)
v2 = mapview(2, mv)

This doesn’t copy or allocate anything new.

  • If you just have v and want these three vectors, a more direct way is:
using StructArrays

mv = map(StructArray(_=vin)) do x
    x._, 2*x._, 3*x._
end

# this is free - doesn't allocate anything:
v1 = mv.:1

or simply use mv.:1 in place of v1. Unlike the first scenario with mapview, these are actual Vectors, not just AbstractVectors.

If you seek efficiency and your original vector contains some structs, not just Ints, then try making it a StructArray from the beginning.

Sorry for the too basic question, but I do not understand why not just:

v1, v2, v3 = vin, 2vin, 3vin

That was just a trivial example to illustrate. My real use case is more complex.

Thank you for the suggestions. I added the note about performance because I didn’t want to repack afterwards. I expect doing the map/broadcast separately would be faster than repacking anyway. As for the view suggestions, they would work for some cases but not others. For example, when putting the data into another data structure that reuses the memory if it is given a Vector but allocates otherwise, the view doesn’t help.

So, I’m gathering that the answer to my question is no, it doesn’t exist. Which is fine, I just wanted to know. I realized there is a challenge regarding type stability because the vector eltype isn’t known until you run the function at least once. But I think the implementation of map has a way to solve that, so one would just have to do it the same way.
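Here is a rough sketch of that idea, under simplifying assumptions: map_unzip is a hypothetical helper (not part of Base or any package), the input must be non-empty, f must return a tuple, and every call must return the same types as the first one. Base’s map handles changing element types by widening; this sketch does not, so a later value of a different type is either converted by setindex! or throws.

```julia
# Hypothetical helper: single pass over v, one output vector per tuple component
function map_unzip(f, v::AbstractVector)
    isempty(v) && throw(ArgumentError("this sketch needs a non-empty input"))
    first_out = f(first(v))   # run f once to learn the output element types
    outs = map(y -> Vector{typeof(y)}(undef, length(v)), first_out)
    for (k, x) in enumerate(v)
        ys = k == 1 ? first_out : f(x)   # reuse the first result instead of recomputing
        # map over tuples is unrolled, so each component is stored in its own vector
        map((o, y) -> setindex!(o, y, k), outs, ys)
    end
    return outs
end

a, b, c = map_unzip(x -> (x / 2, 2x, 3x), [1, 2, 3, 4])
```

I haven’t benchmarked this, so I can’t say how close it gets to the hand-written loop.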

Could you map the transformations instead:

F = [x->x, x->2x, x->3x]
vin = [1,2,3,4]
v1, v2, v3 = map(F) do f
  f.(vin)
end

I’m pretty sure your concerns about performance should be solved by the StructArrays approach.

Hard to know if that applies to your problem, but this is quite concise:

julia> map.((div2,times2,times3), Ref(vin))
([0.5, 1.0, 1.5, 2.0], [2, 4, 6, 8], [3, 6, 9, 12])

It doesn’t seem to perform worse than the other alternatives.
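For anyone reading along, a short self-contained sketch of why this works: the broadcast runs over the tuple of functions, while Ref(vin) makes the whole vector behave as a single scalar argument, so each function gets mapped over all of vin. The w1, w2, w3 names are only there to show an equivalent spelling without broadcasting. Either way this still makes one pass over vin per function.

```julia
div2(x) = x / 2
times2(x) = 2 * x
times3(x) = 3 * x

vin = [1, 2, 3, 4]

# Broadcast over the tuple of functions; Ref protects vin from being broadcast over
v1, v2, v3 = map.((div2, times2, times3), Ref(vin))

# Equivalent spelling: map over the tuple of functions
w1, w2, w3 = map(f -> map(f, vin), (div2, times2, times3))
```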

I assume that (in the actual use case) the multiple outputs of this map share calculations, which is why they’re being computed together rather than with multiple separate map calls. Although if they’re complicated enough that this matters, the repacking (or lazy views) afterward are probably of minimal cost. That’s been my experience when encountering this problem in the wild.

Note that the above suggestion using Broadcast.Broadcasted created a lazy wrapper over the array-of-tuples, rather than materializing a result. It’s basically free. The LazyArrays package is a way to do this that doesn’t rely on semi-internal functionality, if you want a more robust solution. The only reason these wouldn’t work is if you need a certain memory layout to pass these to some external library.

But I think StructArrays addresses your case directly.

Yes, these two points are important, and I should have made that clearer in my original post. That said, the replies have been very educational, and I appreciate them, especially because StructArrays does exactly what I’m looking for. The StructArray example in a previous post didn’t work for me because mapping over a StructArray returned a Vector and not a StructArray. However, there is collect_structarray. I don’t know if there is an existing single function for this, but it’s trivial to create one, so here it is:

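using StructArrays

function maparray(f, v)
    # collect the generator straight into a StructArray (one array per tuple field)
    sa = StructArrays.collect_structarray(f(x) for x in v)
    return StructArrays.components(sa)
end

function test6(v)
    return maparray(v) do x
        (div2(x), times2(x), times3(x))
    end
end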

It is a little slower than the “ideal” one (explicitly allocate and loop), but as the length of the vector gets larger, it approaches similar performance. The allocations and return types are the same.

Thank you all!

That’s strange… I just ran the code and it did return a StructArray as expected.

Sorry, it seems I have StructArray-0.5.1 because some other package is blocking it from upgrading. I ran it in a separate env with version 0.6.15 and it runs successfully. Thanks again!