How to hcat a vector of vectors to produce a matrix with a specified eltype?

What I want is basically an equivalent of

julia> A = [1:4, 5:8]
2-element Vector{UnitRange{Int64}}:
 1:4
 5:8

julia> Base.typed_hcat(Float64, A...)
4×2 Matrix{Float64}:
 1.0  5.0
 2.0  6.0
 3.0  7.0
 4.0  8.0

except without splatting or calling internal functions. I can do this as

julia> convert(Array{Float64}, reduce(hcat, A))
4×2 Matrix{Float64}:
 1.0  5.0
 2.0  6.0
 3.0  7.0
 4.0  8.0

However, this allocates an intermediate array, which I’m trying to avoid. Ideally I’d like to use only public Base functions to compute the result.

Try the combinedimsview function from SplitApplyCombine.jl and see whether it fits your needs.

Does this allow passing the eltype as a parameter? From the docstrings, it seems that the eltype of the output is inferred automatically.

One way is to use the internal type directly:

julia> CombineDimsArray{Float64, 2, 1, typeof(A)}(A, (2,))
4×2 CombineDimsArray{Float64, 2, 1, Vector{UnitRange{Int64}}}:
 1  5
 2  6
 3  7
 4  8

but this defeats the purpose.

Not a solution, but for completeness: Float64[A...;;] uses only public Base and is short, but it splats and is not particularly efficient.
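Another public-Base-only option that avoids splatting is a typed comprehension. This is only a sketch: it assumes every inner vector has the same indices as the first one, and it has not been benchmarked against the other suggestions in this thread.

```julia
A = [1:4, 5:8]

# Rows come from the indices of the first inner vector, columns from the
# inner vectors themselves; the Float64 prefix fixes the output eltype.
M = Float64[v[i] for i in eachindex(first(A)), v in A]
```

The result is allocated once as a Matrix{Float64}, with each element converted as it is stored.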

You can use the init argument to reduce:

julia> reduce(hcat, A, init=Array{Float64}(undef,4,0))
4×2 Matrix{Float64}:
 1.0  5.0
 2.0  6.0
 3.0  7.0
 4.0  8.0

(Unfortunately reduce with hcat is currently unoptimized in the case with an init argument, but this could be easily fixed if someone wanted to work on a PR.)

Some functions compared:

julia> using BenchmarkTools

julia> using SplitApplyCombine

julia> function recombdims(A) 
           od=last(eachindex(A))
           id=last(eachindex(first(A)))
           out=similar(A,Float64, id,od)
           for j in eachindex(A)
               for i in eachindex(first(A))
                   out[i,j]=A[j][i]
               end
           end
           out
       end
recombdims (generic function with 1 method)

julia> function hcat_typ1(A)
           od=first(size(A))
           id=first(size(first(A)))
           out=similar(A[1],Float64,id,od)
           for i in eachindex(A)
               copyto!(out,id*(i-1)+1, A[i], 1)
           end
           out
       end
hcat_typ1 (generic function with 1 method)

julia> A=[rand(1:10, 7) for _ in  1:10^4];

julia> @btime hcat_typ1($A)
  89.900 μs (2 allocations: 546.92 KiB)
7×10000 Matrix{Float64}:
  3.0  8.0   5.0  1.0  2.0   5.0   5.0  …   9.0   3.0   3.0  10.0  6.0  10.0
 10.0  1.0   4.0  9.0  7.0   9.0  10.0      5.0  10.0   3.0   9.0  3.0   5.0
  4.0  5.0   8.0  1.0  4.0   7.0   3.0      5.0   2.0   8.0   3.0  4.0   7.0
  2.0  4.0   6.0  6.0  8.0   8.0   9.0      2.0   5.0  10.0   3.0  7.0   4.0
  9.0  6.0   9.0  9.0  3.0   4.0   3.0      8.0   5.0   3.0   4.0  4.0   4.0
  8.0  2.0   6.0  8.0  2.0  10.0   2.0  …  10.0   9.0   4.0   7.0  8.0   2.0
  6.0  1.0  10.0  1.0  6.0   4.0   8.0      9.0  10.0   1.0   8.0  8.0   4.0      

julia> @btime recombdims($A)
  175.400 μs (2 allocations: 546.92 KiB)
7×10000 Matrix{Float64}:
  3.0  8.0   5.0  1.0  2.0   5.0   5.0  …   9.0   3.0   3.0  10.0  6.0  10.0
 10.0  1.0   4.0  9.0  7.0   9.0  10.0      5.0  10.0   3.0   9.0  3.0   5.0
  4.0  5.0   8.0  1.0  4.0   7.0   3.0      5.0   2.0   8.0   3.0  4.0   7.0
  2.0  4.0   6.0  6.0  8.0   8.0   9.0      2.0   5.0  10.0   3.0  7.0   4.0
  9.0  6.0   9.0  9.0  3.0   4.0   3.0      8.0   5.0   3.0   4.0  4.0   4.0
  8.0  2.0   6.0  8.0  2.0  10.0   2.0  …  10.0   9.0   4.0   7.0  8.0   2.0
  6.0  1.0  10.0  1.0  6.0   4.0   8.0      9.0  10.0   1.0   8.0  8.0   4.0

julia> @btime convert(Array{Float64}, reduce(hcat, $A))
  151.900 μs (4 allocations: 1.07 MiB)
7×10000 Matrix{Float64}:
  3.0  8.0   5.0  1.0  2.0   5.0   5.0  …   9.0   3.0   3.0  10.0  6.0  10.0
 10.0  1.0   4.0  9.0  7.0   9.0  10.0      5.0  10.0   3.0   9.0  3.0   5.0
  4.0  5.0   8.0  1.0  4.0   7.0   3.0      5.0   2.0   8.0   3.0  4.0   7.0
  2.0  4.0   6.0  6.0  8.0   8.0   9.0      2.0   5.0  10.0   3.0  7.0   4.0
  9.0  6.0   9.0  9.0  3.0   4.0   3.0      8.0   5.0   3.0   4.0  4.0   4.0
  8.0  2.0   6.0  8.0  2.0  10.0   2.0  …  10.0   9.0   4.0   7.0  8.0   2.0
  6.0  1.0  10.0  1.0  6.0   4.0   8.0      9.0  10.0   1.0   8.0  8.0   4.0

julia> @btime convert(Array{Float64},combinedims($A))
  208.600 μs (4 allocations: 1.07 MiB)
7×10000 Matrix{Float64}:
  3.0  8.0   5.0  1.0  2.0   5.0   5.0  …   9.0   3.0   3.0  10.0  6.0  10.0
 10.0  1.0   4.0  9.0  7.0   9.0  10.0      5.0  10.0   3.0   9.0  3.0   5.0
  4.0  5.0   8.0  1.0  4.0   7.0   3.0      5.0   2.0   8.0   3.0  4.0   7.0
  2.0  4.0   6.0  6.0  8.0   8.0   9.0      2.0   5.0  10.0   3.0  7.0   4.0
  9.0  6.0   9.0  9.0  3.0   4.0   3.0      8.0   5.0   3.0   4.0  4.0   4.0
  8.0  2.0   6.0  8.0  2.0  10.0   2.0  …  10.0   9.0   4.0   7.0  8.0   2.0
  6.0  1.0  10.0  1.0  6.0   4.0   8.0      9.0  10.0   1.0   8.0  8.0   4.0

julia> @btime reduce(hcat, $A, init=Array{Float64}(undef,7,0))
  351.056 ms (19709 allocations: 2.61 GiB)
7×10000 Matrix{Float64}:
  3.0  8.0   5.0  1.0  2.0   5.0   5.0  …   9.0   3.0   3.0  10.0  6.0  10.0
 10.0  1.0   4.0  9.0  7.0   9.0  10.0      5.0  10.0   3.0   9.0  3.0   5.0
  4.0  5.0   8.0  1.0  4.0   7.0   3.0      5.0   2.0   8.0   3.0  4.0   7.0
  2.0  4.0   6.0  6.0  8.0   8.0   9.0      2.0   5.0  10.0   3.0  7.0   4.0
  9.0  6.0   9.0  9.0  3.0   4.0   3.0      8.0   5.0   3.0   4.0  4.0   4.0
  8.0  2.0   6.0  8.0  2.0  10.0   2.0  …  10.0   9.0   4.0   7.0  8.0   2.0
  6.0  1.0  10.0  1.0  6.0   4.0   8.0      9.0  10.0   1.0   8.0  8.0   4.0

See the lastindex and firstindex functions.
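For reference, a small sketch of how they relate to the last(eachindex(...)) pattern, using a throwaway vector v (a name chosen here purely for illustration):

```julia
v = [10, 20, 30]

# Both functions return the index directly, without going through eachindex.
same_first = firstindex(v) == first(eachindex(v))
same_last  = lastindex(v) == last(eachindex(v))

# For an ordinary 1-based Vector the last index also equals the length;
# that would not hold for arrays with offset indices.
last_is_length = lastindex(v) == length(v)
```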

Using these functions instead of my crude combination improves performance.
I also propose a change to hcat_typ1 that seems to do slightly better:

julia> function recombdims(A)
           A1 = first(A)
           od = lastindex(A)
           id = lastindex(A1)
           out = similar(A, Float64, id, od)
           for j in eachindex(A)
               for i in eachindex(A1)
                   out[i, j] = A[j][i]
               end
           end
           out
       end
recombdims (generic function with 1 method)

julia> @btime recombdims($A);
  74.500 μs (2 allocations: 546.92 KiB)


julia> function hcat_typ2(A)
           od = first(size(A))
           id = first(size(first(A)))
           out = similar(A[1], Float64, id, od)
           pos = 1
           for i in eachindex(A)
               copyto!(out, pos, A[i], 1, id)
               pos += id
           end
           out
       end
hcat_typ2 (generic function with 1 method)

julia> @btime hcat_typ2($A)
  69.900 μs (2 allocations: 546.92 KiB)
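The same idea can take the target eltype as an argument, which is what the original question asks for. The following is a minimal sketch using only public Base functions; hcat_typed is a hypothetical name (not a Base function), and it assumes all inner vectors have the same length.

```julia
# Hypothetical helper: hcat a vector of equal-length vectors into a
# Matrix{T} with a single allocation for the output.
function hcat_typed(::Type{T}, A::AbstractVector{<:AbstractVector}) where {T}
    isempty(A) && return Matrix{T}(undef, 0, 0)
    n = length(first(A))
    out = Matrix{T}(undef, n, length(A))
    for (col, v) in zip(eachcol(out), A)
        length(v) == n || throw(DimensionMismatch("all inner vectors must have length $n"))
        copyto!(col, v)   # each element is converted to T as it is stored
    end
    return out
end

hcat_typed(Float64, [1:4, 5:8])
```

Writing through eachcol keeps the column bookkeeping out of the loop, at the cost of creating one view per column.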

I can’t understand why reduce with init= is so much worse than the version without it.
I could not complete the test with A such that size(A) = (10^7,): it did not finish after several minutes and I had to restart the session.

julia> @btime convert(Array{Float64}, reduce(hcat, $A))
  153.200 μs (4 allocations: 1.07 MiB)

julia> @btime reduce(hcat, $A, init=Array{Float64}(undef,7,0))
  431.882 ms (19709 allocations: 2.61 GiB)
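One plausible explanation, assuming the init version falls back to a generic left fold that calls hcat once per inner vector: step k then allocates a fresh 7×k matrix, so the total copied grows quadratically with the number of columns. The arithmetic below is a rough sketch of that assumption, not a trace of what Base actually does, but it lands close to the reported 2.61 GiB.

```julia
# Assumption: one hcat per inner vector, and step k allocates a new
# 7×k Matrix{Float64}.
nrows, ncols = 7, 10^4

total_bytes = sum(nrows * k * sizeof(Float64) for k in 1:ncols)
total_gib = total_bytes / 2^30                                 # roughly 2.61

# For comparison, a single 7×10^4 Float64 output:
single_pass_kib = nrows * ncols * sizeof(Float64) / 2^10       # roughly 546.9
```

Under the same assumption, 10^7 columns would need on the order of 10^6 times more copying, which would be consistent with that test never finishing.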

This should work, and is both intuitive and efficient:

using SplitApplyCombine

# get a materialized array without intermediate allocations:
map(Float64, combinedimsview(A))

# get a view of the original array - basically free, you pay when accessing it:
mapview(Float64, combinedimsview(A))

julia> map(Float64, combinedimsview(A))
4×2 Matrix{Float64}:
 1.0  5.0
 2.0  6.0
 3.0  7.0
 4.0  8.0

julia> map(Real, combinedimsview(A))
4×2 Matrix{Int64}:
 1  5
 2  6
 3  7
 4  8

julia> convert(Array{Real}, combinedimsview(A))
4×2 Matrix{Real}:
 1  5
 2  6
 3  7
 4  8

julia> convert(Array{Float64}, combinedimsview(A))
4×2 Matrix{Float64}:
 1.0  5.0
 2.0  6.0
 3.0  7.0
 4.0  8.0

convert is better than map here, as it preserves the exact eltype even if it is an abstract type.
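The same difference shows up with a plain Vector and Base alone, so it is not specific to combinedimsview. A small sketch (the variable names are only for illustration):

```julia
v = [1, 2, 3]

m = map(Real, v)              # Real(1) returns the Int unchanged, so eltype narrows
c = convert(Array{Real}, v)   # eltype is forced to the requested abstract type

(eltype(m), eltype(c))
```

Here map yields a Vector{Int64}, while convert yields a Vector{Real}.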