Fastest way to concatenate many arrays along existing axis?

I’m wondering if there is a fast way to concatenate a list a of many arrays along an existing axis. I know about stack(a), but that function creates a new axis, which I don’t want. Its docstring says

The various cat functions also combine arrays. However, these all extend the arrays’ existing (possibly trivial) dimensions, rather than placing the arrays
along new dimensions. They also accept arrays as separate arguments, rather than a single collection.

So it seems the only solutions are hcat(a...) or reduce(hcat, a)? I think the second one is better because it avoids splatting really large collections, but it is probably still slower than stack.

EDIT: I just discovered that reduce has a specialized method for hcat, so reduce(hcat, a) is probably close to optimal? But does it work well for iterables a which are not vectors?
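
For example, here’s the situation I’m worried about (a sketch, assuming the generic fallback really is a pairwise reduction):

vs = [rand(100) for _ in 1:1000];
g = (v for v in vs);          # an iterable, but not an AbstractVector

reduce(hcat, g)               # generic fallback: repeated pairwise hcat, O(n^2) copying
reduce(hcat, collect(g))      # an AbstractVector again, so the fast method applies

Both should give the same 100×1000 matrix, but via very different amounts of copying.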

Related:

maybe RecursiveArrayTools.jl (https://github.com/SciML/RecursiveArrayTools.jl), tools for easily handling objects like arrays of arrays and deeper nestings. For example, the VectorOfArray docstring says:

VectorOfArray(u::AbstractVector)

A VectorOfArray is an array which has the underlying data structure Vector{AbstractArray{T}} (but, hopefully, concretely typed!). This wrapper over such data structures allows one to lazily act like it's a higher-dimensional vector, and easily convert it to different forms. The indexing structure is:

A.u[i] # Returns the ith array in the vector of arrays
A[j, i] # Returns the jth component in the ith array
A[j1, ..., jN, i] # Returns the (j1,...,jN) component of the ith array
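
A sketch of how that might be used (from my reading of the RecursiveArrayTools docs, untested):

using RecursiveArrayTools

vs = [rand(2, 2) for _ in 1:3];
va = VectorOfArray(vs)   # wraps vs without copying
va[1, 2, 3]              # element (1, 2) of the 3rd matrix
Array(va)                # materialize as a 2×2×3 Array, much like stack(vs)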

I’ve always hated reduce(hcat,...). It’s unusual to specialize higher-order functions on specific other functions except when the higher-order function has a more commonly used alias (like sum is implemented in terms of reduce(add_sum,...)).

It’s rather undiscoverable, since knowledge of reduce’s typical binary reduction would normally suggest that reduce(hcat,...) is an awful way to do this. Also, the specializations only cover

julia> methods(reduce)
# 6 methods for generic function "reduce" from Base:
 [1] reduce(::typeof(merge), items::Vector{<:Dict})
     @ dict.jl:747
 [2] reduce(::typeof(hcat), A::AbstractVector{<:AbstractVecOrMat})
     @ abstractarray.jl:1721
 [3] reduce(::typeof(vcat), A::AbstractVector{<:AbstractVecOrMat})
     @ abstractarray.jl:1718
...

which misses a lot of (less common) cases, and also any use with mapreduce or a composed version of these functions. I would prefer a collection-oriented analog of cat (like max has maximum) that operates on a collection rather than varargs, although there hasn’t been much enthusiasm for this in the past. At least stack covers a decent fraction of those cases now.
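
For example (a rough sketch, timings not shown), this innocuous-looking variation misses the fast path entirely:

vs = [rand(100) for _ in 1:1000];

reduce(hcat, vs)               # hits the specialization: one 100×1000 allocation
mapreduce(identity, hcat, vs)  # same matrix, but built by pairwise hcat, O(n^2) copying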

I suppose my suggestion (assuming the reduce specializations don’t cover your needs) might be stack followed by reshape. The bummer is that this is the sort of tomfoolery I thought I’d left behind with MATLAB. It’s probably performant, at least.

julia> reshape(stack([reshape(1:8,2,2,2) for _ in 1:3]; dims=2), 2, :, 2)
2×6×2 Array{Int64, 3}:
[:, :, 1] =
 1  1  1  3  3  3
 2  2  2  4  4  4

[:, :, 2] =
 5  5  5  7  7  7
 6  6  6  8  8  8

This would probably look nicer if you made an explicit flattendim function to compute the reshape for you:

"""
	flattendim(x::AbstractArray; dim::Int)

Use `reshape` on `x` to combine dimensions `dim` and `dim`+1.
"""
function flattendim(x::AbstractArray{<:Any,D}; dim::Int) where D
	1 <= dim <= D-1 || throw(ArgumentError(lazy"Cannot combine dimensions $(dim:dim+1) in $D-dimensional array"))
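	# new size: keep dims before `dim`, merge `dim` with `dim`+1, shift the rest down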
	newsize = ntuple(Val{D-1}()) do d
		d < dim ? size(x,d) : d == dim ? size(x,d)*size(x,d+1) : size(x,d+1)
	end
	return reshape(x, newsize)
end
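
For example, with the definition above:

julia> A = reshape(1:24, 2, 3, 4);

julia> size(flattendim(A; dim=2))  # merges dims 2 and 3: (2, 3, 4) -> (2, 12)
(2, 12)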

I was hoping to avoid a dependency from the SciML world, and stick with stuff from Base if possible. It’s for DifferentiationInterface.jl, which I want to keep very lightweight.

Yeah that’s what I was thinking too, especially since stack is type-stable and reduce(hcat, ...) isn’t always.
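
One quick way to see that instability (my example): the specialized vector method always builds a matrix, but the generic fallback returns a lone element unchanged, so the return type depends on the number of elements:

vs = [rand(3) for _ in 1:4];

reduce(hcat, vs[1:1])               # 3×1 Matrix{Float64}: specialized method always builds a matrix
reduce(hcat, (v for v in vs[1:1]))  # 3-element Vector{Float64}: generic reduce returns the lone element
reduce(hcat, (v for v in vs[1:2]))  # 3×2 Matrix{Float64}: now hcat actually runs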


Very much agree about reduce(hcat, xs) and reduce(vcat, xs) being a bit too clever… hard to discover, and once you know they exist, easy to accidentally miss the magic method & get something much slower.

When we did stack there was some discussion of adding another function for other cases. Maybe we should still. My LazyStack.jl has prototypes of a concatenate and an eager flatten function, to explore. At present concatenate takes no options, so I’m not sure it’s exactly what you’re asking for… if that’s hcat(arrs...)?

julia> arrs = [reshape(1:8,2,2,2).+i/10 for i in 1:3];

julia> stack(arrs) |> size  # ==  cat(arrs...; dims=4)
(2, 2, 2, 3)

julia> stack(arrs; dims=1) |> size
(3, 2, 2, 2)

julia> using LazyStack

julia> concatenate(arrs) |> size  # == vcat(arrs...)
(6, 2, 2)

julia> flatten(arrs) |> size  # == collect(Iterators.flatten(arrs))
(24,)

julia> concatenate(arrs) == flattendim(stack(arrs; dims=2); dim=1)  # notice dims=2
true

julia> concatenate(permutedims(arrs)) |> size  #  == hcat(arrs...); could perhaps be spelled concatenate(arrs; dims=2)
(2, 6, 2)

Maybe worth noting also that this path only works when the pieces all have the same size. The cat functions don’t need to assume that:

julia> vv = [[1,2], [3,4,5], [6]];

julia> reduce(vcat, vv) == vcat(vv...)
true

julia> reduce(vcat, vv) == concatenate(vv) == flatten(vv)  # using LazyStack
true

julia> vec(stack(vv))
ERROR: DimensionMismatch: stack expects uniform slices, got axes(x) == (1:3,) while first had (1:2,)