Problem with cat()

judober · October 1, 2018, 6:05pm

Hello,
I’m relatively new to Julia, so I’m not sure if the observed behavior is intended or a bug.
I want to concatenate a three-dimensional array as in this simple example:

julia> cat(ones(3,1,1000)..., dims=2)
1×3000 Array{Float64,2}:
 1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  …  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0

But when the array is bigger, this throws a StackOverflowError:

julia> cat(ones(3,1,10000)..., dims=2)
ERROR: StackOverflowError:

I don’t think that it’s a memory issue, because I have no problem generating such an array directly:

julia> ones(1,30000)
1×30000 Array{Float64,2}:
 1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  …  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0

I can reproduce this behavior on two Windows 10 systems. Here is the versioninfo() of one of them:

julia> versioninfo()
Julia Version 1.0.0
Commit 5d4eaca0c9 (2018-08-08 20:58 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i5-6200U CPU @ 2.30GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.0 (ORCJIT, skylake)
Environment:
  JULIA_EDITOR = "C:\Users\Arbeit\AppData\Local\atom\app-1.31.1\atom.exe" -a
  JULIA_NUM_THREADS = 2

So, I’m curious, is the behavior to be expected?

kristoffer.carlsson · October 1, 2018, 6:15pm

cat(ones(3,1,10000)..., dims=2) calls cat with 10000 arguments, which is a bit much.

DNF · October 1, 2018, 6:19pm

It’s even 30001(!)

Yes, one should be careful with splatting; it’s definitely not the right approach here. It’s a bit unclear what you really need to do, @judober. Can’t you just write ones(1, 30000) directly? Or maybe reshape is what you’re looking for.
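A minimal sketch of the reshape approach, assuming (as in the original post) that the goal is to lay out all elements of a 3×1×N array along the second dimension. Splatting and reshape both visit elements in Julia's column-major order, so for N = 1000 they produce the same 1×3000 result, but reshape makes no 30000-argument call and, for an Array, shares the underlying data instead of copying it.

```julia
A = ones(3, 1, 10000)

# `:` lets Julia infer the second dimension (3 * 1 * 10000 = 30000);
# elements keep their column-major order, and no splatting is involved
B = reshape(A, 1, :)

size(B)  # (1, 30000)
```

If an independent copy is needed rather than a reshaped view of the same memory, copy(B) gives one.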

judober · October 1, 2018, 6:46pm

All right, thank you two. I already solved my problem using reshape (my array is more complex than just ones), so your suggestion was right. I was just wondering if cat() works as expected.

Tomas_Pevny · October 1, 2018, 7:11pm

I would like to continue on this, as I use hcat(x...), where x is an array of arrays (or some other objects), quite extensively. I prefer this over reduce(hcat,x), because the former is way faster.

Besides the problem judober has identified, we have just found another one: if hcat(x...) is called many times with x containing a different number of elements each time, it slowly consumes all of Julia’s memory (because it compiles and keeps a separate version for every number of arguments) and eventually fails.

My question is: what would be the ideal way to implement a function that takes an arbitrary number of arrays and concatenates them at once? Using reduce for this is just inefficient. Should we write our own function, something like hhcat(Vector{T}) where {T}?

Thanks for answers.

DNF · October 1, 2018, 7:27pm

Are you certain about that? I find reduce(hcat, x) to be faster. Have you tried timing it like this:

using BenchmarkTools
@btime hcat($x...)
@btime reduce(hcat, $x)

Tamas_Papp · October 2, 2018, 7:41am

I think that implementing a version of hcat/vcat that

- takes a vector of arrays,
- calculates the final size and type, and
- does the copying

would be worthwhile.

Whether this could be a method of hcat etc is an API design question. IMO a different function name would be best, but perhaps someone can come up with a signature that would fit in nicely with existing ones.
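The three steps above can be sketched as follows. The name hcat_many is hypothetical (it is not a Base function), and the sketch assumes every input is a vector or matrix with the same number of rows; it is one possible implementation, not an existing API.

```julia
# Hypothetical helper, not part of Base: concatenate a vector of vectors or
# matrices horizontally in one pass, without splatting them into hcat.
function hcat_many(xs::AbstractVector{<:AbstractVecOrMat})
    isempty(xs) && throw(ArgumentError("need at least one array"))

    # Step 1: compute the final size (all pieces must share the row count)
    nrows = size(first(xs), 1)
    ncols = 0
    for x in xs
        size(x, 1) == nrows ||
            throw(DimensionMismatch("expected $nrows rows, got $(size(x, 1))"))
        ncols += size(x, 2)   # size(v, 2) == 1 for a Vector
    end

    # Step 2: compute the element type by promoting all input eltypes
    T = mapreduce(eltype, promote_type, xs)
    out = Matrix{T}(undef, nrows, ncols)

    # Step 3: copy each piece into its block of columns
    col = 1
    for x in xs
        w = size(x, 2)
        out[:, col:col+w-1] .= x
        col += w
    end
    return out
end

xs = [rand(4) for _ in 1:1000]
hcat_many(xs) == reduce(hcat, xs)   # true
```

Because the argument is a single vector, a method like this should be compiled once per container type rather than once per argument count, which also sidesteps the memory growth described for repeated hcat(x...) calls.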

janfrancu · October 2, 2018, 7:48am

In Julia 0.6.4 the difference is striking.

# 2000 arrays of length 2053
hcat - 21.037 ms (10 allocations: 31.37 MiB)
reduce - 15.698 s (18952 allocations: 15.34 GiB)

# 4000 arrays of length 2053
hcat - 37.280 ms (10 allocations: 62.74 MiB)
reduce - 32.509 s (37906 allocations: 30.75 GiB)

However, with 0.7, reduce even takes a slight lead.

# 2000 arrays of length 2053
hcat - 13.012 ms (8 allocations: 31.37 MiB)
reduce - 10.009 ms (2 allocations: 31.33 MiB)

# 4000 arrays of length 2053
hcat - 24.859 ms (8 allocations: 62.74 MiB)
reduce - 19.291 ms (2 allocations: 62.65 MiB)

# 10000 arrays of length 2053
hcat - 134.433 ms (8 allocations: 156.86 MiB)
reduce - 125.283 ms (2 allocations: 156.63 MiB)

nalimilan · October 2, 2018, 7:50am

Yes, that’s due to this PR:
https://github.com/JuliaLang/julia/pull/27188
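With that change in 0.7 and later, the simplest non-splatting spelling is reduce itself. A small check, assuming x is a Vector of equal-length Vectors shaped like the ones in the benchmarks above (the data here is random):

```julia
x = [rand(2053) for _ in 1:2000]   # 2000 vectors of length 2053

H = reduce(hcat, x)   # 2053×2000 Matrix, built without splatting
V = reduce(vcat, x)   # one Vector holding all 2053 * 2000 elements

size(H)  # (2053, 2000)
```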

Tomas_Pevny · October 2, 2018, 12:08pm

This is what I have been looking for. Thanks a lot.
