How to hcat a vector of vectors to produce a matrix with a specified eltype?

jishnub · July 25, 2022, 7:34am

What I want is basically an equivalent of

julia> A = [1:4, 5:8]
2-element Vector{UnitRange{Int64}}:
 1:4
 5:8

julia> Base.typed_hcat(Float64, A...)
4×2 Matrix{Float64}:
 1.0  5.0
 2.0  6.0
 3.0  7.0
 4.0  8.0

except without splatting or calling internal functions. I can do this as follows:

julia> convert(Array{Float64}, reduce(hcat, A))
4×2 Matrix{Float64}:
 1.0  5.0
 2.0  6.0
 3.0  7.0
 4.0  8.0

However, this allocates an intermediate array, which I’m trying to avoid. Ideally, I’d like to use only public Base functions to compute the result.

rocco_sprmnt21 · July 25, 2022, 9:34am

Check whether the combinedimsview function from SplitApplyCombine.jl is right for you.

jishnub · July 25, 2022, 10:49am

Does this allow passing the eltype as a parameter? From the docstrings, it seems that the eltype of the output is inferred automatically.

One way is to use the internal type directly:

julia> CombineDimsArray{Float64, 2, 1, typeof(A)}(A, (2,))
4×2 CombineDimsArray{Float64, 2, 1, Vector{UnitRange{Int64}}}:
 1  5
 2  6
 3  7
 4  8

but this defeats the purpose.

GunnarFarneback · July 25, 2022, 10:56am

Not a solution, but for completeness: Float64[A...;;] uses only public Base syntax and is short, but it relies on splatting and is not particularly efficient.
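
For anyone who wants to try the typed concatenation syntax mentioned above, here is a minimal sketch. It assumes Julia 1.7 or later, where the ;; separator is available, and it still splats A, so it shares that drawback.

```julia
# Assumes Julia 1.7+, where `;;` concatenates along the second dimension.
A = [1:4, 5:8]

# Typed concatenation: the leading `Float64` sets the eltype of the result.
M = Float64[A...;;]

@show typeof(M)
@show size(M)
```

The variable names are only for illustration; the expression itself is the one quoted in the post.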

stevengj · July 25, 2022, 12:04pm

You can use the init argument to reduce:

julia> reduce(hcat, A, init=Array{Float64}(undef,4,0))
4×2 Matrix{Float64}:
 1.0  5.0
 2.0  6.0
 3.0  7.0
 4.0  8.0

(Unfortunately reduce with hcat is currently unoptimized in the case with an init argument, but this could be easily fixed if someone wanted to work on a PR.)
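
A small sketch of how the init approach could be wrapped so that the row count is not hard-coded. The helper name hcat_with_init is hypothetical (it is not part of Base), and it assumes A is non-empty and that all elements have the same length.

```julia
# Hypothetical helper, not part of Base: build the empty `init` matrix from the input.
function hcat_with_init(::Type{T}, A) where {T}
    nrows = length(first(A))            # assumes A is non-empty
    init = Matrix{T}(undef, nrows, 0)   # nrows×0: contributes no columns, only an eltype
    return reduce(hcat, A; init=init)
end

A = [1:4, 5:8]
M = hcat_with_init(Float64, A)
@show typeof(M)
```

Note that hcat promotes element types, so the result eltype is the promotion of T with the eltypes of the inputs rather than T itself. For example, passing Int together with Float64 vectors would still give a Float64 matrix.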

rocco_sprmnt21 · July 25, 2022, 9:16pm

Some functions compared:

julia> using BenchmarkTools

julia> using SplitApplyCombine

julia> function recombdims(A) 
           od=last(eachindex(A))
           id=last(eachindex(first(A)))
           out=similar(A,Float64, id,od)
           for j in eachindex(A)
               for i in eachindex(first(A))
                   out[i,j]=A[j][i]
               end
           end
           out
       end
recombdims (generic function with 1 method)

julia> function hcat_typ1(A)
           od=first(size(A))
           id=first(size(first(A)))
           out=similar(A[1],Float64,id,od)
           for i in eachindex(A)
               copyto!(out,id*(i-1)+1, A[i], 1)
           end
           out
       end
hcat_typ1 (generic function with 1 method)

julia> A=[rand(1:10, 7) for _ in  1:10^4];

julia> @btime hcat_typ1($A)
  89.900 μs (2 allocations: 546.92 KiB)
7×10000 Matrix{Float64}:
  3.0  8.0   5.0  1.0  2.0   5.0   5.0  …   9.0   3.0   3.0  10.0  6.0  10.0
 10.0  1.0   4.0  9.0  7.0   9.0  10.0      5.0  10.0   3.0   9.0  3.0   5.0
  4.0  5.0   8.0  1.0  4.0   7.0   3.0      5.0   2.0   8.0   3.0  4.0   7.0
  2.0  4.0   6.0  6.0  8.0   8.0   9.0      2.0   5.0  10.0   3.0  7.0   4.0
  9.0  6.0   9.0  9.0  3.0   4.0   3.0      8.0   5.0   3.0   4.0  4.0   4.0
  8.0  2.0   6.0  8.0  2.0  10.0   2.0  …  10.0   9.0   4.0   7.0  8.0   2.0
  6.0  1.0  10.0  1.0  6.0   4.0   8.0      9.0  10.0   1.0   8.0  8.0   4.0

julia> @btime recombdims($A)
  175.400 μs (2 allocations: 546.92 KiB)
7×10000 Matrix{Float64}:
  3.0  8.0   5.0  1.0  2.0   5.0   5.0  …   9.0   3.0   3.0  10.0  6.0  10.0
 10.0  1.0   4.0  9.0  7.0   9.0  10.0      5.0  10.0   3.0   9.0  3.0   5.0
  4.0  5.0   8.0  1.0  4.0   7.0   3.0      5.0   2.0   8.0   3.0  4.0   7.0
  2.0  4.0   6.0  6.0  8.0   8.0   9.0      2.0   5.0  10.0   3.0  7.0   4.0
  9.0  6.0   9.0  9.0  3.0   4.0   3.0      8.0   5.0   3.0   4.0  4.0   4.0
  8.0  2.0   6.0  8.0  2.0  10.0   2.0  …  10.0   9.0   4.0   7.0  8.0   2.0
  6.0  1.0  10.0  1.0  6.0   4.0   8.0      9.0  10.0   1.0   8.0  8.0   4.0

julia> @btime convert(Array{Float64}, reduce(hcat, $A))
  151.900 μs (4 allocations: 1.07 MiB)
7×10000 Matrix{Float64}:
  3.0  8.0   5.0  1.0  2.0   5.0   5.0  …   9.0   3.0   3.0  10.0  6.0  10.0
 10.0  1.0   4.0  9.0  7.0   9.0  10.0      5.0  10.0   3.0   9.0  3.0   5.0
  4.0  5.0   8.0  1.0  4.0   7.0   3.0      5.0   2.0   8.0   3.0  4.0   7.0
  2.0  4.0   6.0  6.0  8.0   8.0   9.0      2.0   5.0  10.0   3.0  7.0   4.0
  9.0  6.0   9.0  9.0  3.0   4.0   3.0      8.0   5.0   3.0   4.0  4.0   4.0
  8.0  2.0   6.0  8.0  2.0  10.0   2.0  …  10.0   9.0   4.0   7.0  8.0   2.0
  6.0  1.0  10.0  1.0  6.0   4.0   8.0      9.0  10.0   1.0   8.0  8.0   4.0

julia> @btime convert(Array{Float64},combinedims($A))
  208.600 μs (4 allocations: 1.07 MiB)
7×10000 Matrix{Float64}:
  3.0  8.0   5.0  1.0  2.0   5.0   5.0  …   9.0   3.0   3.0  10.0  6.0  10.0
 10.0  1.0   4.0  9.0  7.0   9.0  10.0      5.0  10.0   3.0   9.0  3.0   5.0
  4.0  5.0   8.0  1.0  4.0   7.0   3.0      5.0   2.0   8.0   3.0  4.0   7.0
  2.0  4.0   6.0  6.0  8.0   8.0   9.0      2.0   5.0  10.0   3.0  7.0   4.0
  9.0  6.0   9.0  9.0  3.0   4.0   3.0      8.0   5.0   3.0   4.0  4.0   4.0
  8.0  2.0   6.0  8.0  2.0  10.0   2.0  …  10.0   9.0   4.0   7.0  8.0   2.0
  6.0  1.0  10.0  1.0  6.0   4.0   8.0      9.0  10.0   1.0   8.0  8.0   4.0

julia> @btime reduce(hcat, $A, init=Array{Float64}(undef,7,0))
  351.056 ms (19709 allocations: 2.61 GiB)
7×10000 Matrix{Float64}:
  3.0  8.0   5.0  1.0  2.0   5.0   5.0  …   9.0   3.0   3.0  10.0  6.0  10.0
 10.0  1.0   4.0  9.0  7.0   9.0  10.0      5.0  10.0   3.0   9.0  3.0   5.0
  4.0  5.0   8.0  1.0  4.0   7.0   3.0      5.0   2.0   8.0   3.0  4.0   7.0
  2.0  4.0   6.0  6.0  8.0   8.0   9.0      2.0   5.0  10.0   3.0  7.0   4.0
  9.0  6.0   9.0  9.0  3.0   4.0   3.0      8.0   5.0   3.0   4.0  4.0   4.0
  8.0  2.0   6.0  8.0  2.0  10.0   2.0  …  10.0   9.0   4.0   7.0  8.0   2.0
  6.0  1.0  10.0  1.0  6.0   4.0   8.0      9.0  10.0   1.0   8.0  8.0   4.0

DNF · July 25, 2022, 10:14pm

See lastindex and firstindex functions.
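
As a quick illustration of the suggestion, the sketch below only checks that firstindex and lastindex return the same values as the first(eachindex(...)) and last(eachindex(...)) combinations used earlier. It makes no claim about speed, and the variable names are just for illustration.

```julia
A = [1:4, 5:8]

# Same values as the combinations used in recombdims above.
same_last  = lastindex(A) == last(eachindex(A))
same_first = firstindex(A) == first(eachindex(A))
@show same_last same_first

# When the goal is to size an output array, `length` states the intent directly.
nrows, ncols = length(first(A)), length(A)
@show nrows ncols
```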

rocco_sprmnt21 · July 26, 2022, 7:08am

Using these functions instead of my crude combination improves performance.
I also propose a change to hcat_typ1, which seems to do slightly better:

julia> function recombdims(A)
           A1=first(A)
           od=lastindex(A)
           id=lastindex(A1)
           out=similar(A,Float64, id,od)
           for j in eachindex(A)
               for i in eachindex(A1)
                   out[i,j]=A[j][i]
               end
           end
           out
       end
recombdims (generic function with 1 method)

julia> @btime recombdims($A);
  74.500 μs (2 allocations: 546.92 KiB)


julia> function hcat_typ2(A)
           od=first(size(A))
           id=first(size(first(A)))
           out=similar(A[1],Float64,id,od)
           pos=1
           for i in eachindex(A)
               copyto!(out,pos, A[i], 1, id)
               pos+=id
           end
           out
       end
hcat_typ2 (generic function with 1 method)

julia> @btime hcat_typ2($A)
  69.900 μs (2 allocations: 546.92 KiB)
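
The hand-written loops above hard-code Float64. A possible generalization that takes the eltype as an argument and uses only public Base functions is sketched below. The name hcat_typed is hypothetical, and the sketch has not been benchmarked against the versions above.

```julia
# Hypothetical helper: same idea as hcat_typ2, with the eltype T as a parameter.
function hcat_typed(::Type{T}, A::AbstractVector{<:AbstractVector}) where {T}
    isempty(A) && throw(ArgumentError("A must contain at least one vector"))
    nrows = length(first(A))
    out = Matrix{T}(undef, nrows, length(A))   # the only array allocated here
    for (col, v) in zip(eachcol(out), A)
        length(v) == nrows ||
            throw(DimensionMismatch("all vectors must have length $nrows"))
        copyto!(col, v)                        # elements are converted to T on assignment
    end
    return out
end

M = hcat_typed(Float64, [1:4, 5:8])
@show typeof(M)
```

Because the output is allocated with the requested T up front, an abstract type such as Real is kept as the eltype as well.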

I can’t understand why reduce with init= is so much worse than the version without it.
I could not complete the test with A such that size(A) = (10^7,): it did not finish after several minutes and I had to restart the session.

julia> @btime convert(Array{Float64}, reduce(hcat, $A))
  153.200 μs (4 allocations: 1.07 MiB)

julia> @btime reduce(hcat, $A, init=Array{Float64}(undef,7,0))
  431.882 ms (19709 allocations: 2.61 GiB)
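
One plausible reading of these numbers, assuming that reduce with init falls back to a plain left fold (consistent with the "currently unoptimized" remark earlier in the thread): each step calls hcat on the matrix accumulated so far and allocates a fresh 7×k result, so the total allocation grows quadratically with the number of columns. The arithmetic below is only a rough model, and fold_bytes is a made-up name, but it lands close to the 2.61 GiB reported by @btime.

```julia
# Bytes allocated if step k of the fold builds a fresh nrows×k Float64 matrix.
fold_bytes(nrows, ncols) = sum(nrows * k * sizeof(Float64) for k in 1:ncols)

# Same shape as the benchmarked A: 10^4 vectors of length 7.
gib = fold_bytes(7, 10^4) / 2^30
println(round(gib; digits=2), " GiB")

# Quadratic growth: 10^3 times more columns means roughly 10^6 times more bytes.
ratio = fold_bytes(7, 10^7) / fold_bytes(7, 10^4)
println(ratio)
```

Under this model, the 10^7 case would need on the order of a million times more copying than the 10^4 case, which would explain why it did not finish.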

aplavin · July 26, 2022, 8:13am

This should work, and is both intuitive and efficient:

using SplitApplyCombine

# get a materialized array without intermediate allocations:
map(Float64, combinedimsview(A))

# get a view of the original array - basically free, you pay when accessing it:
mapview(Float64, combinedimsview(A))

jishnub · July 26, 2022, 8:50am

julia> map(Float64, combinedimsview(A))
4×2 Matrix{Float64}:
 1.0  5.0
 2.0  6.0
 3.0  7.0
 4.0  8.0

julia> map(Real, combinedimsview(A))
4×2 Matrix{Int64}:
 1  5
 2  6
 3  7
 4  8

julia> convert(Array{Real}, combinedimsview(A))
4×2 Matrix{Real}:
 1  5
 2  6
 3  7
 4  8

julia> convert(Array{Float64}, combinedimsview(A))
4×2 Matrix{Float64}:
 1.0  5.0
 2.0  6.0
 3.0  7.0
 4.0  8.0

convert is better than map here, as it preserves the exact eltype even if it is an abstract type.
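
The same difference can be reproduced with a plain Matrix and Base only, without SplitApplyCombine. This is a minimal sketch with illustrative variable names.

```julia
M = [1 5; 2 6]                        # Matrix{Int64} on a 64-bit system

mapped    = map(Real, M)              # eltype is inferred from the values returned by Real(x)
converted = convert(Array{Real}, M)   # eltype is taken from the requested array type

@show eltype(mapped)
@show eltype(converted)
```

Real(x) returns an Int unchanged, so map sees only Int values and narrows the eltype, whereas convert is told the target eltype explicitly.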
