Transpose vector of vectors

Is there a shorter way to write this?

n = 1000; m = 5000
xss = [repeat([j], n) for j in 1:m]


yss = [[] for _ in xss[1]]    # one output vector per inner index (note: these are Vector{Any})
for xs in xss
    for (x, ys) in zip(xs, yss)
        push!(ys, x)
    end
end

Maybe there is an iterator for this in a package?

It is similar to

zip(xss...)

but zip(...) produces tuples instead of vectors, and splatting that many arguments crashes the compiler for large n and m.
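For illustration, on a toy input (xss_small is my own example name) the rows come out as tuples; map(collect, ...) converts them back to vectors, but it still splats m arguments:

xss_small = [[1, 2, 3], [4, 5, 6]]
collect(zip(xss_small...))        # [(1, 4), (2, 5), (3, 6)] -- Tuples, not Vectors
map(collect, zip(xss_small...))   # [[1, 4], [2, 5], [3, 6]], but splatting m = 5000
                                  # arguments is still infeasible to compile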

1 Like
yss = map(i -> getindex.(xss, i), 1:length(xss[1]))   # getindex.(xss, i) collects the i-th element of every inner vector
3 Likes

Using TensorCast’s intuitive syntax:

using TensorCast
@cast yss[j][i] := xss[i][j]
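
For example, a quick sanity check on a toy input (variable names here are my own):

xss_small = [[1, 2, 3], [4, 5, 6]]
@cast yss_small[j][i] := xss_small[i][j]
yss_small == [[1, 4], [2, 5], [3, 6]]   # true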
4 Likes
using SplitApplyCombine

yss = invert(xss)

invert is exactly the function you are looking for:

Return a new nested container by reversing the order of the nested container a, for example turning a dictionary of arrays into an array of dictionaries, such that a[i][j] === invert(a)[j][i].

Examples
≡≡≡≡≡≡≡≡≡≡

julia> invert([[1,2,3],[4,5,6]])
3-element Array{Array{Int64,1},1}:
[1, 4]
[2, 5]
[3, 6]

julia> invert((a = [1, 2, 3], b = [2.0, 4.0, 6.0]))
3-element Array{NamedTuple{(:a, :b),Tuple{Int64,Float64}},1}:
(a = 1, b = 2.0)
(a = 2, b = 4.0)
(a = 3, b = 6.0)

2 Likes

Is that standard terminology? This looks more like a generalized transpose than an invert.

2 Likes

This may be the function I’m looking for.

https://github.com/JuliaLang/julia/issues/13942

Hard to say where to look for “standard terminology” in this case. I don’t know the motivation for naming that function invert (@andyferris?), I’m just a happy user (: Is there any truly obvious name anyway?

I’d’ve called it insideout 🙂

Or simply a comprehension?

yss = [[xss[j][i] for j in 1:m] for i in 1:n];
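
Or, without hard-coding the globals m and n (the same comprehension, assuming all inner vectors have equal length):

yss = [[xs[i] for xs in xss] for i in eachindex(xss[1])];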

@aplavin, would you mind benchmarking invert() for the OP’s data?

I see it run about 400× slower than TensorCast.
(Win11, Julia 1.7, SplitApplyCombine v1.2.0, TensorCast v0.4.3)


julia> versioninfo()
Julia Version 1.7.0
Commit 3bf9d17731 (2021-11-30 12:12 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-1065G7 CPU @ 1.30GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, icelake-client)
Environment:
  JULIA_PKG_USE_CLI_GIT = true
  JULIA_STACKFRAME_FUNCTION_COLOR = blue
  JULIA_WARN_COLOR = cyan
  JULIA_EDITOR = code.cmd -g
  JULIA_NUM_THREADS = 8

using TensorCast
function trans_tc(xss)
    @cast yss[j][i] := xss[i][j];
    return yss
end

using SplitApplyCombine
trans_sap(xss) = invert(xss)

n = 1000; m = 5000
xss = [repeat([j], n) for j in 1:m];

trans_tc(xss) == trans_sap(xss)   # true

using BenchmarkTools
@btime trans_tc($xss);   # 33.3 μs (5 allocations: 39 KiB)
@btime trans_sap($xss);  # 13.5 ms (2001 allocations: 38.2 MiB)
2 Likes

Yes, I see similar results. That’s the difference between lazy (@cast) and eager (invert) operations.

I guess SplitApplyCombine could get a lazy invertview function if someone is interested and makes a PR. For now, there are splitdimsview + combinedimsview:

julia> splitdimsview(combinedimsview(xss), 1) == invert(xss)
true

julia> @btime splitdimsview(combinedimsview($xss), 1)
1.695 ns (0 allocations: 0 bytes)

Another 20000 times faster than @cast (:
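
To be fair, splitdimsview/combinedimsview only wrap the original data, so the 1.7 ns measures constructing the view, not copying anything. If a plain vector of vectors is needed, the view can still be materialized, e.g. (a minimal sketch):

yss_view = splitdimsview(combinedimsview(xss), 1)
yss = map(collect, yss_view)   # copy each row out into a plain Vector
yss == invert(xss)             # true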

4 Likes

@aplavin, I can’t help but say: this is awesome! Thank you.