Using hcat with splatting on arrays of different lengths kills Julia

In my current project I have a workflow of creating a lot of 1D arrays, hcat-ing them into a matrix and then converting that into a sparse representation, as shown below (without the sparse call).

function getData(n::Int)
    X = Vector{Vector{Float64}}()        # collect n column vectors
    for i = 1:n
        a = rand(2053)
        a[a .> 0.05] .= 0.0              # zero out ~95% of the entries (sparse-like data)
        push!(X, a)
    end
    hcat(X...)                           # splat the n vectors into one 2053×n matrix
end

while true
    A = getData(rand(1:20000))   # a new random column count each iteration forces a fresh hcat specialization
end

This creates an issue when the number of arrays n varies a lot, because Julia compiles a “new” hcat(X...) specialization for each n it encounters (my intuition). Although the growth in memory consumption slows down as fewer new methods remain to be compiled, it can still easily exceed reasonable memory limits. For example, the code above fails after a while when run under a 4 GB memory limit, and in a multiprocessing environment this behavior is even more devastating.
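One way to observe the effect described above (a hedged illustration, not from the original report): each distinct argument count that hcat(X...) is splatted with produces a fresh specialization, which shows up as compilation work on the first call for every new n.

X = [rand(5) for _ in 1:100];
@time hcat(X...);    # first call with 100 arguments: includes compilation
@time hcat(X...);    # same argument count again: already compiled, fast
Y = [rand(5) for _ in 1:101];
@time hcat(Y...);    # new argument count: compiles yet another specialization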

I have been able to work around this by writing my own version of hcat (a sketch is shown below), but I feel the problem deserves a more general solution, since it affects other functions where such argument splatting is used.
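A minimal sketch of the kind of hand-rolled replacement I mean (the name hcat_columns and its details are illustrative, not my exact code): it preallocates the result and copies column by column, so only one method ever gets compiled no matter how many vectors are passed.

function hcat_columns(X::Vector{Vector{Float64}})
    isempty(X) && return Matrix{Float64}(undef, 0, 0)
    m = length(X[1])
    A = Matrix{Float64}(undef, m, length(X))
    for (j, col) in enumerate(X)
        length(col) == m || throw(DimensionMismatch("all vectors must have the same length"))
        A[:, j] = col        # copy one column; no splatting, no per-n compilation
    end
    A
end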

Use reduce(hcat, X) instead, which dispatches to an efficient specialized method. In general, one should avoid splatting collections whose number of elements can vary a lot, since, as you noted, it triggers a lot of compilation.
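Applied to the workflow above, the fix is a one-line change (a sketch assuming the sparse step mentioned in the original post; the SparseArrays import is added here):

using SparseArrays

function getData(n::Int)
    X = Vector{Vector{Float64}}()
    for i = 1:n
        a = rand(2053)
        a[a .> 0.05] .= 0.0
        push!(X, a)
    end
    sparse(reduce(hcat, X))   # one specialized reduce(hcat, ...) method, no per-n compilation
end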

Thanks for the answer. I was worried about the performance of such a workaround, but it seems there is a striking difference between the 0.6 and 0.7 reduce function. In the former version reduce did not scale really well, as it had a big memory overhead, but with the latter the reduce function seems to be even faster. I have posted some benchmarks in the other topic where this issue came up - Problem with cat() - #5 by Tomas_Pevny.
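For anyone who wants to reproduce the comparison locally, a sketch using BenchmarkTools (sizes are arbitrary and no numbers are reproduced here):

using BenchmarkTools

X = [rand(2053) for _ in 1:1000];
@btime hcat($X...);         # splatted version
@btime reduce(hcat, $X);    # specialized reduce method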