Is there a faster alternative to `hcat(xx...)`?

_stla · April 30, 2023, 5:46am

Hello,

I have this piece of code in a function:

  triangles = Matrix{Int}(undef, 3, 0)
  for i in 1:ntriangles
    a, b, c = triangle = ∂₂₀(i)
    nkeep = tokeep[a] + tokeep[b] + tokeep[c]
    if nkeep > 0
      push!(counts, nkeep)
      triangles = hcat(triangles, triangle)
      push!(connec, connect((a, b, c)))
    end
  end

And this function is horribly slow (I have a similar function in R and it is faster). I believe this is due to this piece of code, because there’s almost nothing other than that in the function. I suspected this is due to the repetitive call triangles = hcat(triangles, triangle). So I replaced this piece of code with:

  xx = Vector{Vector{Int}}(undef, 0)
  for i in 1:ntriangles
    a, b, c = triangle = ∂₂₀(i)
    nkeep = tokeep[a] + tokeep[b] + tokeep[c]
    if nkeep > 0
      push!(counts, nkeep)
      push!(xx, triangle)
      push!(connec, connect((a, b, c)))
    end
  end
  triangles = hcat(xx...)

But this is still slow.

Is it known that hcat(xx...) is slow? Is there a faster alternative?

Otherwise, how can one identify which part of a function is slow?

SteffenPL · April 30, 2023, 6:04am

You could use reduce(hcat,xx)' (see How to convert Vector of Vectors to Matrix - #6 by rafael.guerra ), or in Julia 1.9 you can use stack.

To improve speed further, you could maybe post a minimal working example so that people can run the code. (Not visible here, but just a guess: Make sure that you don’t use globals by accident.)

There might also be a way to write the code differently, like

function filter_triangles(triangles, tokeep)
    counts = map(t -> tokeep[t[1]] + tokeep[t[2]] + tokeep[t[3]], triangles)
    keep = counts .> 0
    new_triangles = triangles[keep]
    return (counts = counts[keep], 
            triangles = new_triangles, 
            connec = connect.(new_triangles))  
end

(This uses more memory, but depending on your setting, it might even be faster.)

EDIT: As @fatteneder pointed out, I removed my wrong comment that Int is an abstract type…

e3c6 · April 30, 2023, 6:26am

The stack function will do this efficiently, see https://github.com/JuliaLang/julia/pull/43334. I think it will be available in Julia v1.9.

DNF · April 30, 2023, 6:33am

You nay want to represent each triangle, not as a Vector, but as a tuple of three values.

fatteneder · April 30, 2023, 6:37am

Minor correction: Int is not abstract, but Integer is:

julia> isabstracttype(Int)
false

julia> isabstracttype(Integer)
true

_stla · April 30, 2023, 7:03am

Thanks to all of you. Everything you proposed was instructive.

I’ve done numerous attempts to conclude that the slowness is elsewhere!!!

I have

values = [fn(vert) for vert in verts]

where verts is a large vector of vectors of length 3 and fn is LinearAlgebra.norm. And this is the slow part of the function!

_stla · April 30, 2023, 7:07am

But why this code is so slow? It is slower than R. I tried to get rid of the square roots: fn(v) = v[1] * v[1] + v[2] * v[2] + v[3] * v[3] but this is still slow.

Henrique_Becker · April 30, 2023, 7:12am

Maybe values is not being inferred correctly? Did you try to put the type of the element in front of the [ of the generator? Like values = Int[fn(vert) for vert in verts] (note: I do not know if your array is of Ints, just use the element type there).

rocco_sprmnt21 · April 30, 2023, 7:13am

I tried to isolate the under investigation part of your code, to make the results clearer.
As you can see your approach allocates 10^4 times more and is 600 to 2*10^3 slower than the other two in comparison.

julia> ntriangles=[[rand(-100:100),rand(-100:100),rand(-100:100)] for _ in 1:10^5]
100000-element Vector{Vector{Int64}}:
 [-14, -33, 29]
 [90, 39, 16]

 ⋮
 
 [63, -86, -31]
 [-67, 84, -76]
 [-5, -4, -98]

julia> @time begin
       triangles = Matrix{Int}(undef, 3, 0)
       for t in ntriangles
           triangles = hcat(triangles, t)
       end
       end
 41.366821 seconds (398.81 k allocations: 111.771 GiB, 23.27% gc time)

julia> @time begin
       triangles = Matrix{Int}(undef, 3, 10^5)
       for i in eachindex(ntriangles)
           triangles[:,i] = ntriangles[i]
       end
       end
  0.070287 seconds (498.47 k allocations: 11.421 MiB)

julia> @time begin
       triangles = Matrix{Int}(undef, 3, 10^5)
       for i in eachindex(ntriangles)
           copyto!(triangles, 3*(i-1)+1, ntriangles[i], 1)   
       end
       end
  0.020577 seconds (598.13 k allocations: 12.941 MiB)

rocco_sprmnt21 · April 30, 2023, 7:21am

anhoter faster way to do it, if you dont know the number of triangles in adavance

@time begin
    triangles = Vector{Int}()
    for t in ntriangles
        push!(triangles, t...)
    end
    reshape(triangles,3,:)
    end

_stla · April 30, 2023, 7:25am

Thanks!

Still wrong! The slowness is elsewhere. And it is hard to identify where. I will ask to the developers of Meshes.jl, this is the package I’m currently contrbuting to and this is some code for this package.

My new guess is that the culprit is the conversion of the topology:

topo = convert(HalfEdgeTopology, topology(tmesh))

@juliohm What do you think? Is this conversion slow? However that topology is not that big (< 300000 elements). Shame on us if we’re slower than R

There’s no such conversion in the R code, that would explain the difference of speed.

Will crible my code with some println to find the culprit.

_stla · April 30, 2023, 7:26am

Thank you @rocco_sprmnt21 I will incorporate your suggestions in the code!

_stla · April 30, 2023, 7:39am

@juliohm bingo!

  topo = convert(HalfEdgeTopology, topology(tmesh)) # this is slow
  ∂₂₀ = Boundary{2,0}(topo) # this is slow as well

Let’s continue this discussion on Github or Zulip!

fatteneder · April 30, 2023, 8:01am

Not sure how you do your benchmarking, but inserting artificial IO operations when benchmarking (e.g. with println) will almost always falsify your benchmark.

Unless you already do, I would recommend using ProfileView or BenchmarkTools to pinpoint the problem.

_stla · April 30, 2023, 8:05am

[quote

I know. But here the slowness is clearly apparent with this rough way of benchmarking. In French we say “il n’y a pas photo !”. That means the difference is very striking and there’s no need of further investigations.

I need to learn some tools for profiling, I don’t know ProfileView yet. Thanks!

DNF · April 30, 2023, 10:15am

I really you should shift your entire code to using tuples or SVectors, since the length of 3 is structurally a part the data. It can have many advantages, both in terms of bug safety and performance.

That being said, the above function should be very fast no matter what, though it would be faster with tuples.

juliohm · April 30, 2023, 12:01pm

@_stla the HalfEdgeTopology constructor sorts the vertices by default to guarantee coherent orientation. This behavior values correctness over performance, but you can disable sorting if you know a priori that the vertices are all ordered CCW or CW. Check the option sort:

https://juliageometry.github.io/Meshes.jl/stable/meshes.html#Meshes.HalfEdgeTopology

Now regarding the Boundary{2,0}(topo) being slow, I don’t know what may be happening. It just wraps the topology object and doesn’t do anything else.

Topic		Replies	Views
Using hcat with splatting on arrays of different lengths kills Julia General Usage compilation , memory , memory-allocation	2	1385	October 2, 2018
A blog post involving Julia Community benchmark	15	1416	October 23, 2020
Slow `reduce(vcat, itr)` Performance cat	26	1324	September 13, 2022
Performant Alternatives to mapslices General Usage question	4	489	October 18, 2022
`hcat` with generator Internals & Design	12	1484	February 26, 2017

Is there a faster alternative to `hcat(xx...)`?

Related topics