# Is there a faster alternative to `hcat(xx...)`?

Hello,

I have this piece of code in a function:

``````  triangles = Matrix{Int}(undef, 3, 0)
for i in 1:ntriangles
a, b, c = triangle = ∂₂₀(i)
nkeep = tokeep[a] + tokeep[b] + tokeep[c]
if nkeep > 0
push!(counts, nkeep)
triangles = hcat(triangles, triangle)
push!(connec, connect((a, b, c)))
end
end
``````

And this function is horribly slow (I have a similar function in R and it is faster). I believe this is due to this piece of code, because there’s almost nothing other than that in the function. I suspected this is due to the repetitive call `triangles = hcat(triangles, triangle)`. So I replaced this piece of code with:

``````  xx = Vector{Vector{Int}}(undef, 0)
for i in 1:ntriangles
a, b, c = triangle = ∂₂₀(i)
nkeep = tokeep[a] + tokeep[b] + tokeep[c]
if nkeep > 0
push!(counts, nkeep)
push!(xx, triangle)
push!(connec, connect((a, b, c)))
end
end
triangles = hcat(xx...)
``````

But this is still slow.

Is it known that `hcat(xx...)` is slow? Is there a faster alternative?

Otherwise, how can one identify which part of a function is slow?

You could use `reduce(hcat,xx)'` (see How to convert Vector of Vectors to Matrix - #6 by rafael.guerra ), or in Julia 1.9 you can use `stack`.

To improve speed further, you could maybe post a minimal working example so that people can run the code. (Not visible here, but just a guess: Make sure that you don’t use globals by accident.)

There might also be a way to write the code differently, like

``````function filter_triangles(triangles, tokeep)
counts = map(t -> tokeep[t] + tokeep[t] + tokeep[t], triangles)
keep = counts .> 0
new_triangles = triangles[keep]
return (counts = counts[keep],
triangles = new_triangles,
connec = connect.(new_triangles))
end
``````

(This uses more memory, but depending on your setting, it might even be faster.)

EDIT: As @fatteneder pointed out, I removed my wrong comment that `Int` is an abstract type…

The `stack` function will do this efficiently, see https://github.com/JuliaLang/julia/pull/43334. I think it will be available in Julia v1.9.

You nay want to represent each triangle, not as a `Vector`, but as a tuple of three values.

Minor correction: `Int` is not abstract, but `Integer` is:

``````julia> isabstracttype(Int)
false

julia> isabstracttype(Integer)
true
``````
3 Likes

Thanks to all of you. Everything you proposed was instructive.

I’ve done numerous attempts to conclude that the slowness is elsewhere!!!

I have

``````values = [fn(vert) for vert in verts]
``````

where `verts` is a large vector of vectors of length 3 and `fn` is `LinearAlgebra.norm`. And this is the slow part of the function!

But why this code is so slow? It is slower than R. I tried to get rid of the square roots: `fn(v) = v * v + v * v + v * v` but this is still slow.

Maybe `values` is not being inferred correctly? Did you try to put the type of the element in front of the `[` of the generator? Like `values = Int[fn(vert) for vert in verts]` (note: I do not know if your array is of `Int`s, just use the element type there).

I tried to isolate the under investigation part of your code, to make the results clearer.
As you can see your approach allocates 10^4 times more and is 600 to 2*10^3 slower than the other two in comparison.

``````julia> ntriangles=[[rand(-100:100),rand(-100:100),rand(-100:100)] for _ in 1:10^5]
100000-element Vector{Vector{Int64}}:
[-14, -33, 29]
[90, 39, 16]

⋮

[63, -86, -31]
[-67, 84, -76]
[-5, -4, -98]

julia> @time begin
triangles = Matrix{Int}(undef, 3, 0)
for t in ntriangles
triangles = hcat(triangles, t)
end
end
41.366821 seconds (398.81 k allocations: 111.771 GiB, 23.27% gc time)

julia> @time begin
triangles = Matrix{Int}(undef, 3, 10^5)
for i in eachindex(ntriangles)
triangles[:,i] = ntriangles[i]
end
end
0.070287 seconds (498.47 k allocations: 11.421 MiB)

julia> @time begin
triangles = Matrix{Int}(undef, 3, 10^5)
for i in eachindex(ntriangles)
copyto!(triangles, 3*(i-1)+1, ntriangles[i], 1)
end
end
0.020577 seconds (598.13 k allocations: 12.941 MiB)

``````

anhoter faster way to do it, if you dont know the number of triangles in adavance

``````@time begin
triangles = Vector{Int}()
for t in ntriangles
push!(triangles, t...)
end
reshape(triangles,3,:)
end
``````

Thanks!

Still wrong! The slowness is elsewhere. And it is hard to identify where. I will ask to the developers of Meshes.jl, this is the package I’m currently contrbuting to and this is some code for this package.

My new guess is that the culprit is the conversion of the topology:

``````topo = convert(HalfEdgeTopology, topology(tmesh))
``````

@juliohm What do you think? Is this conversion slow? However that topology is not that big (< 300000 elements). Shame on us if we’re slower than R There’s no such conversion in the R code, that would explain the difference of speed.

Will crible my code with some `println` to find the culprit.

Thank you @rocco_sprmnt21 I will incorporate your suggestions in the code!

@juliohm bingo!

``````  topo = convert(HalfEdgeTopology, topology(tmesh)) # this is slow
∂₂₀ = Boundary{2,0}(topo) # this is slow as well
``````

Let’s continue this discussion on Github or Zulip!

Not sure how you do your benchmarking, but inserting artificial IO operations when benchmarking (e.g. with `println`) will almost always falsify your benchmark.

Unless you already do, I would recommend using `ProfileView` or `BenchmarkTools` to pinpoint the problem.

[quote

I know. But here the slowness is clearly apparent with this rough way of benchmarking. In French we say “il n’y a pas photo !”. That means the difference is very striking and there’s no need of further investigations.

I need to learn some tools for profiling, I don’t know `ProfileView` yet. Thanks!

I really you should shift your entire code to using tuples or SVectors, since the length of 3 is structurally a part the data. It can have many advantages, both in terms of bug safety and performance.

That being said, the above function should be very fast no matter what, though it would be faster with tuples.

@_stla the HalfEdgeTopology constructor sorts the vertices by default to guarantee coherent orientation. This behavior values correctness over performance, but you can disable sorting if you know a priori that the vertices are all ordered CCW or CW. Check the option `sort`:

https://juliageometry.github.io/Meshes.jl/stable/meshes.html#Meshes.HalfEdgeTopology

Now regarding the Boundary{2,0}(topo) being slow, I don’t know what may be happening. It just wraps the topology object and doesn’t do anything else.