Get the transpose of Vector{Vector{Float64}}

I am coming from numpy, trying to see if Julia can be faster. Right now I am dealing with a function that generates what I thought had to be a matrix, but no, it's a vector of vectors. Fine. How do I make its transpose?

I generate the Vector{Vector} from a list comprehension:
tab = [map(x -> d[1] + (d[2] - d[1]) * x, rand(Uniform(0, 1), Np)) for d in domains]

For Np = 2 and a 5D domain domains = ((100, 1000), (-1, 1), (-1, 1), (-1, 1), (-1, 1)), I get

size(tab)=(5,)

I need it to become a (5, 2) matrix.

I have tried cat, but it's not the right tool. transpose gives me a 1×5 and I cannot manipulate it into a (2, 5) 2D Matrix.

Any suggestions? This must be a common roadblock for someone coming from numpy…

stack


I figured reduce(hcat, tab) also works. What’s the difference with stack?

Here’s how you can both simplify this and tweak it to return a matrix:

tab_matrix = stack(rand(Uniform(d[1], d[2]), Np) for d in domains)

In this case, the difference is that, to use hcat, you either need to wrap it in reduce as you did, or splat the argument as follows: hcat(tab...). Meanwhile, stack takes tab as a single argument: stack(tab).

In general, stack takes a single iterator of arrays and stacks them along a new dimension—for example, it can take a vector of matrices and stack them to form a 3D array. hcat takes multiple vectors/matrices and concatenates them along the second (column) dimension. For example, hcat([1 2; 3 4], [5; 6]) == [1 2 5; 3 4 6].
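To make the difference concrete, here is a minimal sketch with a small made-up vector of vectors vs, showing that the three spellings agree and that stack puts each inner vector in its own column:

julia> vs = [[1, 2], [3, 4], [5, 6]];

julia> stack(vs)   # each inner vector becomes a column along a new dimension
2×3 Matrix{Int64}:
 1  3  5
 2  4  6

julia> stack(vs) == reduce(hcat, vs) == hcat(vs...)
true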

More simply, you can use tab = [rand(Uniform(d[1], d[2])) for d in domains, _ in 1:Np]. Or you can generate it transposed from the beginning by switching the dimensions tab_transposed = [rand(Uniform(d[1], d[2])) for _ in 1:Np, d in domains].


Yes, it seems unnecessary and inefficient to first generate a vector of vectors, only to stack or cat it. Just directly use a 2D array (matrix) comprehension that avoids all intermediates.


To quantify that:

julia> using BenchmarkTools, Distributions

julia> domains = ((100, 1000), (-1, 1), (-1, 1), (-1, 1), (-1, 1));

julia> Np = 2;

julia> @btime stack(rand(Uniform(d...), $Np) for d in $domains);
  101.438 ns (6 allocations: 544 bytes)

julia> @btime [rand(Uniform(d...)) for _ in 1:$Np, d in $domains];
  27.010 ns (1 allocation: 144 bytes)

So the 2D array comprehension is ~4× faster, in addition to avoiding the 5 intermediate allocations.


Perhaps this was the form to use to directly obtain a matrix:

[d[1] + (d[2] - d[1]) * x for x in rand(2), d in domains]

or this:

[d[1] + (d[2] - d[1]) * x for d in domains, x in rand(2)]

Or better yet, re-think your whole approach to avoid creating the matrix in the first place. The biggest habit to unlearn, coming from numpy, is the urge to “vectorize” everything into a sequence of steps that generate a bunch of intermediate arrays from relatively tiny calculations (e.g. adding two vectors). Better to avoid calculating the intermediate arrays, and instead fuse the calculations into a single pass. Loops are fast in Julia. See why vectorized code is not as fast as it could be.
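For instance, here is a minimal sketch of that idea (the names mean_f and f are hypothetical, just to show the shape of the fused loop): if all you ultimately need is, say, the average of some function f over the Np random points, you can sample and accumulate in a single loop without ever materializing tab:

# Hypothetical example: fuse sampling and accumulation into one loop,
# so the (length(domains), Np) matrix of sample points is never built.
function mean_f(f, domains, Np)
    s = 0.0
    for _ in 1:Np
        # draw one random point inside the box defined by `domains`
        x = ntuple(i -> domains[i][1] + (domains[i][2] - domains[i][1]) * rand(), length(domains))
        s += f(x)
    end
    return s / Np
end

mean_f(x -> sum(abs2, x), domains, 1000)  # e.g. the mean of the sum of squares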

2 Likes

The following is also a readable option. Define:

@inline interp(λ, rng) = λ*rng[1]+(1-λ)*rng[2]

Then use it (with domains and Np as in the OP):

julia> interp.(rand(size(domains, 1), Np), domains)
5×2 Matrix{Float64}:
 148.587      155.122
   0.118355     0.928856
  -0.0124529   -0.694386
  -0.638283    -0.260207
   0.694686    -0.929882

julia> @btime interp.(rand(size($domains, 1), $Np), $domains);
  63.291 ns (4 allocations: 320 bytes)

Perhaps interp is already defined in some package, as it is quite basic.
