How to reshape a 2d array into a 1d array of its rows

This seems simple, but I can’t see how to do it. I have

julia> X = [[1 2]; [3 4]; [5 6]]
3×2 Array{Int64,2}:
 1  2
 3  4
 5  6

I want to turn this (reshape it) into

julia> X1 = [[1 2], [3 4], [5 6]]
3-element Array{Array{Int64,2},1}:
 [1 2]
 [3 4]
 [5 6]

zip should work, but I can’t test it at the moment.

X = rand(5,2)
X1 = collect(zip(X[:,1],X[:,2]))

On Julia 1.1+ you can use:

julia> collect(eachrow(X))
3-element Array{SubArray{Int64,1,Array{Int64,2},Tuple{Int64,Base.Slice{Base.OneTo{Int64}}},true},1}:
 [1, 2]
 [3, 4]
 [5, 6]

Note two things, though — it’s a vector of 1-dimensional vectors, and each element is a view into the original array. You could transpose each element (with map or broadcast) if you really need them to be row-vectors.
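
To make the transposing suggestion concrete, here is a minimal sketch with the 3×2 matrix from the question. The names `rows_lazy` and `rows_copy` are only illustrative. The first variant keeps views into `X`; the second copies each row into its own 1×2 matrix, which matches the element type asked for in the original post.

```julia
X = [1 2; 3 4; 5 6]

# Lazy variant: each element is a 1×2 transposed view into X (no row is copied).
rows_lazy = map(transpose, eachrow(X))

# Copying variant: the range index i:i keeps the row dimension,
# so each element is an independent 1×2 Matrix.
rows_copy = [X[i:i, :] for i in axes(X, 1)]
```

Which one is preferable probably depends on whether later changes to `X` should show up in the rows.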

The zip is different in that it returns an array of tuples and is limited to a hard-coded number of columns.
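
If tuples are acceptable, one possible way around the hard-coded column count is to splat `eachcol` (Julia 1.1+) into `zip`. This is only a sketch; `row_tuples` is an illustrative name, and splatting is probably a poor fit for matrices with many columns.

```julia
X = [1 2; 3 4; 5 6]

# Splat the columns into zip so the tuple length follows size(X, 2).
row_tuples = collect(zip(eachcol(X)...))

# The same call on a matrix with three columns yields 3-tuples.
Y = [1 2 3; 4 5 6]
wide_tuples = collect(zip(eachcol(Y)...))
```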

Thanks to both. What I’m trying to do is find the first row of a large 2d matrix of Float64s that is approximately equal to a given vector. I wanted to turn the matrix into a 1d array of rows so that I could use findfirst().

I ended up writing a simple function that loops through the rows of the matrix and tries isapprox() on each one.
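
The function itself was not posted, so the following is only a guess at what such a loop might look like. The name `first_approx_row` and the keyword forwarding are assumptions, not the poster's code.

```julia
using LinearAlgebra  # isapprox on arrays compares norms

# Hypothetical helper: index of the first row of A that is approximately
# equal to v, or nothing if no row matches.
function first_approx_row(A::AbstractMatrix, v::AbstractVector; kwargs...)
    for i in axes(A, 1)
        # view avoids copying the row; kwargs forwards atol/rtol to isapprox
        isapprox(view(A, i, :), v; kwargs...) && return i
    end
    return nothing
end

A = [1.0 2.0; 3.0 4.0; 5.0 6.0]
hit = first_approx_row(A, [3.0 + 1e-12, 4.0])   # matches row 2
miss = first_approx_row(A, [9.0, 9.0])          # no row matches
```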

Can’t you use the matrix-vector product to see which row is most similar in terms of cosine distance?
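
A rough sketch of that idea, assuming no row of the matrix is all zeros (a zero row would give NaN). The helper name `most_similar_row` is made up for illustration. Note that cosine similarity ignores magnitude, so any row parallel to the target scores the same as an exact match.

```julia
using LinearAlgebra

# Hypothetical helper: index of the row of A with the largest cosine
# similarity to v. One matrix-vector product gives all the dot products.
function most_similar_row(A::AbstractMatrix, v::AbstractVector)
    row_norms = [norm(r) for r in eachrow(A)]
    cosines = (A * v) ./ (row_norms .* norm(v))
    return argmax(cosines)
end

A = [1.0 0.0; 0.0 1.0; 1.0 1.0]
best = most_similar_row(A, [0.0, 2.0])   # row 2 points in the same direction
```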

I have to repeat this operation for a large number of rows. Also, I know that the row I’m trying to match is there, the only reason for using isapprox() is numerical uncertainties.

You can then probably just use eachrow without collecting it into an array.

You don’t need to create a vector of rows just to use findfirst. Instead, you can tell findfirst to call a function for each row to see if it’s equal to your target, without ever actually making an expensive copy of every single row:

julia> x = rand(5, 2)
5×2 Array{Float64,2}:
 0.975035  0.229764
 0.421957  0.897612
 0.157232  0.768174
 0.671349  0.149556
 0.807792  0.913012

julia> target = x[3, :]
2-element Array{Float64,1}:
 0.1572322024665258
 0.7681739192100947

julia> findfirst(axes(x, 1)) do i
         @view(x[i, :]) == target
       end
3
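
Since the original question is about approximate equality, the same pattern should work with isapprox in place of ==. A small sketch with made-up data, where the `rtol` value is an arbitrary choice for illustration:

```julia
using LinearAlgebra  # isapprox on arrays compares norms

x = [0.1 0.2; 0.3 0.4; 0.5 0.6]
target = x[2, :] .+ 1e-12   # row 2 with a little simulated round-off

# Same do-block form as above, but tolerant of small numerical differences.
idx = findfirst(axes(x, 1)) do i
    isapprox(view(x, i, :), target; rtol = 1e-8)
end
```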

It’s also worth noting that, because of the way Julia arrays are arranged (column-major), you may find this task to be easier and more efficient if you can transpose your data and iterate over the columns instead. If you can do that, then you can use one of my favorite Julia tricks, which is reinterpreting a matrix as a vector of SVectors from StaticArrays.jl. Because you’re treating the columns as the elements, it’s easy (and computationally very cheap) to reinterpret the matrix as a collection of fixed-size vectors:

julia> x = rand(2, 5)
2×5 Array{Float64,2}:
 0.777874  0.975625  0.538278  0.731676  0.024341
 0.907149  0.619085  0.735197  0.528057  0.379817

julia> target = x[:, 3]
2-element Array{Float64,1}:
 0.5382778725280228
 0.7351971037657361

julia> using StaticArrays

julia> columns = reinterpret(SVector{2, Float64}, x)
1×5 reinterpret(SArray{Tuple{2},Float64,1,2}, ::Array{Float64,2}):
 [0.777874, 0.907149]  [0.975625, 0.619085]  [0.538278, 0.735197]  [0.731676, 0.528057]  [0.024341, 0.379817]

julia> I = findfirst(isequal(target), columns)
CartesianIndex(1, 3)

julia> columns[I]
2-element SArray{Tuple{2},Float64,1,2}:
 0.5382778725280228
 0.7351971037657361
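
A possible variation, assuming Julia 1.6 or later: `reinterpret(reshape, ...)` drops the leading dimension, so the result is a plain vector and findfirst returns an integer rather than a CartesianIndex. The sketch also swaps isequal for isapprox to tolerate round-off; the data and the `rtol` value are made up for illustration.

```julia
using StaticArrays

x = [0.1 0.3 0.5;
     0.2 0.4 0.6]                       # 2×3, one point per column
target = SVector{2}(x[:, 2] .+ 1e-12)   # column 2 with simulated round-off

# No copy: the 2×3 matrix is viewed as a length-3 vector of SVectors.
columns = reinterpret(reshape, SVector{2, Float64}, x)

idx = findfirst(c -> isapprox(c, target; rtol = 1e-8), columns)
```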

I ran some experiments to benchmark @rdeits's solution. The results depend somewhat on whether you can prepare your x data in a suitable shape (already transposed, or already a Vector of SVectors).

I was a bit surprised that the vector comparison is not automatically vectorized.

Here are the timings I obtain with different functions (with differently prepared data):

For 1000000 rows of size 4:

  3.749 ms (500002 allocations: 22.89 MiB)  # find_vector (no transpose)
  1.987 ms (3 allocations: 64 bytes)        # find_vector_sa (transpose and reinterpret, @rdeits's last solution)
  1.419 ms (2 allocations: 48 bytes)        # find_vector_sa_tr (already transposed, SA reinterpret)
  1.292 ms (1 allocation: 32 bytes)         # find_vector_sa_tr2 (explicit SA conversion of target)
  1.276 ms (0 allocations: 0 bytes)         # find_vector_sa_trs (pre-constructed Vector of SA)
  867.237 μs (0 allocations: 0 bytes)       # find_vector_sa_trs2 (SA + simd loop)

I guess that the allocations can make a difference if the number of rows is small:

For 30 rows of size 4 (functions in the same order as above):

  154.493 ns (17 allocations: 864 bytes)
  86.302 ns (3 allocations: 64 bytes)
  60.622 ns (2 allocations: 48 bytes)
  52.229 ns (1 allocation: 32 bytes)
  42.577 ns (0 allocations: 0 bytes)
  21.645 ns (0 allocations: 0 bytes)

The MWE

using StaticArrays
using BenchmarkTools
using LinearAlgebra

const L=4

function allbench_findrow(N)
   x= rand(N,L)
   xt=collect(transpose(x))
   xts=collect(reinterpret(SVector{L, Float64},xt))
   target = x[div(N,2),:]

   @show target
   @show find_vector(x,target,N)
   @show find_vector_sa(x,target,N)
   @show find_vector_sa_tr(xt,target,N)
   @show find_vector_sa_tr2(xt,target,N)
   @show find_vector_sa_trs(xts,target,N)
   @show find_vector_sa_trs2(xts,target,N)

   @assert target==find_vector(x,target,N)
   @assert target==find_vector_sa(x,target,N)
   @assert target==find_vector_sa_tr(xt,target,N)
   @assert target==find_vector_sa_tr2(xt,target,N)
   @assert target==find_vector_sa_trs(xts,target,N)
   @assert target==find_vector_sa_trs2(xts,target,N)

   @btime find_vector($x,$target,$N)
   @btime find_vector_sa($x,$target,$N)
   @btime find_vector_sa_tr($xt,$target,$N)
   @btime find_vector_sa_tr2($xt,$target,$N)
   @btime find_vector_sa_trs($xts,$target,$N)
   @btime find_vector_sa_trs2($xts,$target,$N)


end


function find_vector(x,target,N)
   I=findfirst(axes(x, 1)) do i
         @view(x[i, :]) == target
       end
   return x[I,:]
end

function find_vector_sa(x,target,N)
   # x is the untransposed N×L matrix; transpose is lazy, so no copy is made here
   xt=transpose(x)
   columns = reinterpret(SVector{L, Float64}, xt)
   I = findfirst(isequal(target), columns)
   columns[I]
end

function find_vector_sa_tr(x,target,N)
   @inbounds columns = reinterpret(SVector{L, Float64}, x)
   I = findfirst(isequal(target), columns)
   columns[I]
end
function find_vector_sa_tr2(x,target,N)
   @inbounds columns = reinterpret(SVector{L, Float64}, x)
   starget=SVector{L, Float64}(target)
   I = findfirst(isequal(starget), columns)
   columns[I]
end
function find_vector_sa_trs(x,target,N)
   columns = x
   starget=SVector{L, Float64}(target)
   I = findfirst(isequal(starget), columns)
   columns[I]
end


function find_vector_sa_trs2(x,target,N)
   starget=SVector{L, Float64}(target)
   @inbounds @simd for i in 1:N
      n=x[1,i]-starget
      dot(n,n)==0.0 && return x[1,i]
   end
   nothing
end
   
allbench_findrow(1000000)