Transpose vector of vectors

Is there a shorter way to write this?

n = 1000; m = 5000
xss = [repeat([j], n) for j in 1:m]


yss = [[] for _ in xss[1]]
for xs in xss
    for (x, ys) in zip(xs, yss)
        push!(ys, x)
    end
end
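For reference, the loop above builds Vector{Any} accumulators because of the bare []. A type-stable sketch of the same idea, assuming all inner vectors share one element type (transpose_vecs is a hypothetical name):

```julia
# Transpose a vector of equal-length vectors using typed accumulators.
# Giving the inner vectors a concrete element type avoids Vector{Any}.
function transpose_vecs(xss)
    T = eltype(first(xss))            # assumes every inner vector has this type
    yss = [T[] for _ in first(xss)]   # one typed accumulator per "column"
    for xs in xss
        for (x, ys) in zip(xs, yss)
            push!(ys, x)
        end
    end
    return yss
end

transpose_vecs([[1, 2, 3], [4, 5, 6]])  # → [[1, 4], [2, 5], [3, 6]]
```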

Maybe there is an Iterator in a package?

It is similar to

zip(xss...)

but zip(xss...) yields tuples instead of vectors, and splatting all m = 5000 inner vectors into zip crashes the compiler for large n, m.
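For small inputs the zip approach does work if each tuple is collected back into a vector; it is only the splatting of thousands of arguments that breaks down at scale:

```julia
# Works for small m: zip the inner vectors, then collect each tuple.
small = [[1, 2, 3], [4, 5, 6]]
map(collect, zip(small...))  # → [[1, 4], [2, 5], [3, 6]]
```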

yss = map(i -> getindex.(xss, i), 1:length(xss[1]))
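On a small input, the broadcasted getindex version behaves like this:

```julia
xss = [[1, 2, 3], [4, 5, 6]]

# For each column index i, getindex.(xss, i) gathers the i-th element
# of every inner vector into a new vector.
yss = map(i -> getindex.(xss, i), 1:length(xss[1]))
# yss == [[1, 4], [2, 5], [3, 6]]
```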

Using TensorCast’s intuitive syntax:

using TensorCast
@cast yss[j][i] := xss[i][j]
using SplitApplyCombine

yss = invert(xss)

invert is exactly the function you are looking for:

Return a new nested container by reversing the order of the nested container a, for example turning a dictionary of arrays into an array of dictionaries, such that a[i][j] === invert(a)[j][i].

Examples
≡≡≡≡≡≡≡≡≡≡

julia> invert([[1,2,3],[4,5,6]])
3-element Array{Array{Int64,1},1}:
[1, 4]
[2, 5]
[3, 6]

julia> invert((a = [1, 2, 3], b = [2.0, 4.0, 6.0]))
3-element Array{NamedTuple{(:a, :b),Tuple{Int64,Float64}},1}:
(a = 1, b = 2.0)
(a = 2, b = 4.0)
(a = 3, b = 6.0)


Is that standard terminology? This looks more like a generalized transpose than an invert.


This may be the function I’m looking for.

Hard to say where to look for “standard terminology” in this case. I don’t know the motivation for naming that function invert (@andyferris?), I’m just a happy user (: Is there any truly obvious name anyway?

I’d’ve called it insideout :slight_smile:

Or simply a comprehension?

yss = [[xss[j][i] for j=1:m] for i=1:n];
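Wrapped in a function so that m and n are not globals (transpose_comp is a hypothetical name), the comprehension might look like:

```julia
# Nested comprehension: outer loop over column indices i,
# inner loop over the original vectors j.
transpose_comp(xss) =
    [[xss[j][i] for j in eachindex(xss)] for i in eachindex(first(xss))]

transpose_comp([[1, 2, 3], [4, 5, 6]])  # → [[1, 4], [2, 5], [3, 6]]
```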

@aplavin, would you mind benchmarking invert() for OP’s data?

I see it running about 400x slower than TensorCast.
(Win11, Julia 1.7, SplitApplyCombine v1.2.0, TensorCast v0.4.3)

versioninfo()

julia> versioninfo()
Julia Version 1.7.0
Commit 3bf9d17731 (2021-11-30 12:12 UTC)
Platform Info:
OS: Windows (x86_64-w64-mingw32)
CPU: Intel(R) Core™ i7-1065G7 CPU @ 1.30GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-12.0.1 (ORCJIT, icelake-client)
Environment:
JULIA_PKG_USE_CLI_GIT = true
JULIA_STACKFRAME_FUNCTION_COLOR = blue
JULIA_WARN_COLOR = cyan
JULIA_EDITOR = code.cmd -g
JULIA_NUM_THREADS = 8

using TensorCast
function trans_tc(xss)
    @cast yss[j][i] := xss[i][j];
    return yss
end

using SplitApplyCombine
trans_sap(xss) = invert(xss)

n = 1000; m = 5000
xss = [repeat([j], n) for j in 1:m];

trans_tc(xss) == trans_sap(xss)   # true

using BenchmarkTools
@btime trans_tc($xss);   # 33.3 μs (5 allocations: 39 KiB)
@btime trans_sap($xss);  # 13.5 ms (2001 allocations: 38.2 MiB)

Yes, I see similar results. That’s the difference between lazy (@cast) and eager (invert) operations.
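The lazy/eager distinction can be illustrated with Base alone (this is not SplitApplyCombine itself, just an analogy): eager slicing copies data into fresh vectors, while views alias the parent array and cost essentially nothing to create.

```julia
A = [1 2; 3 4]

# Eager: each column is copied into a new vector.
eager = [A[:, j] for j in 1:2]

# Lazy: each "column" is a view sharing A's memory.
lazy = [view(A, :, j) for j in 1:2]

lazy[1][1] = 99    # writes through to A
A[1, 1] == 99      # true
eager[1][1] == 1   # true: the eager copy is unaffected
```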

I guess SplitApplyCombine could get a lazy invertview function if someone is interested and makes a PR. For now, there are splitdimsview + combinedimsview:

julia> splitdimsview(combinedimsview(xss), 1) == invert(xss)
true

julia> @btime splitdimsview(combinedimsview($xss), 1)
1.695 ns (0 allocations: 0 bytes)

Another 20000 times faster than @cast (:


@aplavin, I can’t help but say: this is awesome! Thank you.