jzr
December 19, 2021, 1:29am
#1
Is there a shorter way to write this?
```julia
n = 1000; m = 5000
xss = [repeat([j], n) for j in 1:m]
yss = [[] for _ in xss[1]]
# fill yss so that yss[i][j] == xss[j][i]
for xs in xss
    for (x, ys) in zip(xs, yss)
        push!(ys, x)
    end
end
```
Maybe there is an iterator in a package?
It is similar to `zip(xss...)`, but `zip(...)` makes tuples instead of vectors, so the compiler crashes for large `n`, `m`.
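(For concreteness, the `zip`-based version being alluded to would look like the sketch below; it is fine for a few inner vectors, but `xss...` splats all `m = 5000` of them into a single `zip` call, which is what the post reports crashing the compiler.)

```julia
# zip-based transpose: each tuple from zip is collected into a vector.
# Works for small m, but xss... splats m arguments into one zip call.
xss = [[1, 2, 3], [4, 5, 6]]
yss = [collect(t) for t in zip(xss...)]  # [[1, 4], [2, 5], [3, 6]]
```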
```julia
# getindex.(xss, i) broadcasts indexing over the outer vector,
# collecting the i-th element of every inner vector.
yss = map(i -> getindex.(xss, i), 1:length(xss[1]))
```
3 Likes
Using TensorCast’s intuitive syntax:

```julia
using TensorCast
@cast yss[j][i] := xss[i][j]
```
4 Likes
aplavin
December 19, 2021, 7:20am
#4
```julia
using SplitApplyCombine
yss = invert(xss)
```
`invert` is exactly the function you are looking for:
Return a new nested container by reversing the order of the nested container `a`, for example turning a dictionary of arrays into an array of dictionaries, such that `a[i][j] === invert(a)[j][i]`.
Examples:

```julia
julia> invert([[1,2,3],[4,5,6]])
3-element Array{Array{Int64,1},1}:
 [1, 4]
 [2, 5]
 [3, 6]

julia> invert((a = [1, 2, 3], b = [2.0, 4.0, 6.0]))
3-element Array{NamedTuple{(:a, :b),Tuple{Int64,Float64}},1}:
 (a = 1, b = 2.0)
 (a = 2, b = 4.0)
 (a = 3, b = 6.0)
```
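(The dictionary case mentioned in the docstring works the same way. A minimal sketch, relying only on the documented contract `a[i][j] === invert(a)[j][i]`:)

```julia
using SplitApplyCombine

# A Dict of equal-length arrays becomes an array of Dicts.
d = Dict(:a => [1, 2, 3], :b => [4.0, 5.0, 6.0])
ds = invert(d)
ds[1][:a] == d[:a][1]  # true, by the a[i][j] === invert(a)[j][i] contract
```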
1 Like
DNF
December 19, 2021, 8:31am
#5
Is that standard terminology? This looks more like a generalized transpose than an invert.
2 Likes
jzr
December 19, 2021, 8:52am
#6
This may be the function I’m looking for.
Quoting a GitHub issue (opened 10 Nov 2015, 11:40 PM UTC):
Hi there,
apologies if this has already been addressed somewhere, but is there … a reason that there is no `unzip()` function in Base?
Ideally this would be a function that would take a `Vector{Tuple{ ... }}` and return a `Tuple{Vector, ..., Vector}` for output. E.g.
```
julia> v = [(1,"a",:meow), (2,"b",:woof), (3,"c",:doh!)]; unzip(v)
([1,2,3],ASCIIString["a","b","c"],[:meow,:woof,:doh!])
```
A naive implementation might be something like
```julia
function unzip(input::Vector)
    n = length(input)
    types = map(typeof, first(input))
    output = map(T -> Vector{T}(n), types)
    for i = 1:n
        @inbounds for (j, x) in enumerate(input[i])
            (output[j])[i] = x
        end
    end
    return (output...)
end
```
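(The quoted implementation is from 2015 and targets Julia 0.4: `Vector{T}(n)` and `return (output...)` no longer work in current Julia. A minimal sketch of the same idea with those two spots updated, not an official `unzip`:)

```julia
# Naive unzip, updated for current Julia syntax:
# Vector{T}(undef, n) replaces Vector{T}(n), and the outputs
# are returned as an explicit tuple of vectors.
function unzip(input::Vector)
    n = length(input)
    types = map(typeof, first(input))             # element types of each tuple slot
    output = map(T -> Vector{T}(undef, n), types)
    for i in 1:n
        @inbounds for (j, x) in enumerate(input[i])
            output[j][i] = x
        end
    end
    return Tuple(output)
end

unzip([(1, "a"), (2, "b"), (3, "c")])  # ([1, 2, 3], ["a", "b", "c"])
```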
aplavin
December 19, 2021, 9:17am
#7
Hard to say where to look for “standard terminology” in this case. Don’t know about the motivation for naming that function `invert` (@andyferris ?), I’m just a happy user (: Is there any truly obvious name anyway?
cjdoris
December 19, 2021, 9:37am
#9
I’d’ve called it `insideout`.
Or simply a comprehension?

```julia
yss = [[xss[j][i] for j=1:m] for i=1:n];
```
@aplavin, would you mind benchmarking `invert()` for the OP’s data? I see it about 400x slower than TensorCast (Win11, Julia 1.7, SplitApplyCombine v1.2.0, TensorCast v0.4.3).
```
julia> versioninfo()
Julia Version 1.7.0
Commit 3bf9d17731 (2021-11-30 12:12 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-1065G7 CPU @ 1.30GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, icelake-client)
Environment:
  JULIA_PKG_USE_CLI_GIT = true
  JULIA_STACKFRAME_FUNCTION_COLOR = blue
  JULIA_WARN_COLOR = cyan
  JULIA_EDITOR = code.cmd -g
  JULIA_NUM_THREADS = 8
```
```julia
using TensorCast
function trans_tc(xss)
    @cast yss[j][i] := xss[i][j];
    return yss
end

using SplitApplyCombine
trans_sap(xss) = invert(xss)

n = 1000; m = 5000
xss = [repeat([j], n) for j in 1:m];
trans_tc(xss) == trans_sap(xss)  # true

using BenchmarkTools
@btime trans_tc($xss);   # 33.3 μs (5 allocations: 39 KiB)
@btime trans_sap($xss);  # 13.5 ms (2001 allocations: 38.2 MiB)
```
1 Like
aplavin
December 19, 2021, 12:53pm
#12
Yes, I see similar results. That’s the difference between lazy (`@cast`) and eager (`invert`) operations.
I guess `SplitApplyCombine` could get a lazy `invertview` function if someone is interested and makes a PR. For now, there are `splitdimsview` + `combinedimsview`:
```julia
julia> splitdimsview(combinedimsview(xss), 1) == invert(xss)
true

julia> @btime splitdimsview(combinedimsview($xss), 1)
  1.695 ns (0 allocations: 0 bytes)
```
Another 20000 times faster than `@cast` (:
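(A note on reading that number: 1.695 ns with zero allocations means only the lazy wrapper is being constructed; the cost is paid when the slices are read. One way to see both sides, assuming broadcasting `collect` over the view slices forces materialization:)

```julia
using SplitApplyCombine, BenchmarkTools

xss = [repeat([j], 1000) for j in 1:5000];

# Eager: allocates every inner vector up front.
@btime invert($xss);

# Lazy: building the nested view is O(1) and allocation-free...
@btime splitdimsview(combinedimsview($xss), 1);

# ...but materializing each slice pays the deferred cost.
@btime collect.(splitdimsview(combinedimsview($xss), 1));
```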
2 Likes
@aplavin, I can’t help but say: this is awesome! Thank you.