triangles = Matrix{Int}(undef, 3, 0)
for i in 1:ntriangles
a, b, c = triangle = ∂₂₀(i)
nkeep = tokeep[a] + tokeep[b] + tokeep[c]
if nkeep > 0
push!(counts, nkeep)
triangles = hcat(triangles, triangle)
push!(connec, connect((a, b, c)))
end
end

And this function is horribly slow (I have a similar function in R and it is faster). I believe this is due to this piece of code, because there’s almost nothing other than that in the function. I suspected this is due to the repetitive call triangles = hcat(triangles, triangle). So I replaced this piece of code with:

xx = Vector{Vector{Int}}(undef, 0)
for i in 1:ntriangles
a, b, c = triangle = ∂₂₀(i)
nkeep = tokeep[a] + tokeep[b] + tokeep[c]
if nkeep > 0
push!(counts, nkeep)
push!(xx, triangle)
push!(connec, connect((a, b, c)))
end
end
triangles = hcat(xx...)

But this is still slow.

Is it known that hcat(xx...) is slow? Is there a faster alternative?

Otherwise, how can one identify which part of a function is slow?

To improve speed further, you could maybe post a minimal working example so that people can run the code. (Not visible here, but just a guess: Make sure that you don’t use globals by accident.)

There might also be a way to write the code differently, like

But why this code is so slow? It is slower than R. I tried to get rid of the square roots: fn(v) = v[1] * v[1] + v[2] * v[2] + v[3] * v[3] but this is still slow.

Maybe values is not being inferred correctly? Did you try to put the type of the element in front of the [ of the generator? Like values = Int[fn(vert) for vert in verts] (note: I do not know if your array is of Ints, just use the element type there).

I tried to isolate the under investigation part of your code, to make the results clearer.
As you can see your approach allocates 10^4 times more and is 600 to 2*10^3 slower than the other two in comparison.

julia> ntriangles=[[rand(-100:100),rand(-100:100),rand(-100:100)] for _ in 1:10^5]
100000-element Vector{Vector{Int64}}:
[-14, -33, 29]
[90, 39, 16]
⋮
[63, -86, -31]
[-67, 84, -76]
[-5, -4, -98]
julia> @time begin
triangles = Matrix{Int}(undef, 3, 0)
for t in ntriangles
triangles = hcat(triangles, t)
end
end
41.366821 seconds (398.81 k allocations: 111.771 GiB, 23.27% gc time)
julia> @time begin
triangles = Matrix{Int}(undef, 3, 10^5)
for i in eachindex(ntriangles)
triangles[:,i] = ntriangles[i]
end
end
0.070287 seconds (498.47 k allocations: 11.421 MiB)
julia> @time begin
triangles = Matrix{Int}(undef, 3, 10^5)
for i in eachindex(ntriangles)
copyto!(triangles, 3*(i-1)+1, ntriangles[i], 1)
end
end
0.020577 seconds (598.13 k allocations: 12.941 MiB)

Still wrong! The slowness is elsewhere. And it is hard to identify where. I will ask to the developers of Meshes.jl, this is the package I’m currently contrbuting to and this is some code for this package.

My new guess is that the culprit is the conversion of the topology:

topo = convert(HalfEdgeTopology, topology(tmesh))

@juliohm What do you think? Is this conversion slow? However that topology is not that big (< 300000 elements). Shame on us if we’re slower than R

There’s no such conversion in the R code, that would explain the difference of speed.

Will crible my code with some println to find the culprit.

Not sure how you do your benchmarking, but inserting artificial IO operations when benchmarking (e.g. with println) will almost always falsify your benchmark.

Unless you already do, I would recommend using ProfileView or BenchmarkTools to pinpoint the problem.

I know. But here the slowness is clearly apparent with this rough way of benchmarking. In French we say “il n’y a pas photo !”. That means the difference is very striking and there’s no need of further investigations.

I need to learn some tools for profiling, I don’t know ProfileView yet. Thanks!

I really you should shift your entire code to using tuples or SVectors, since the length of 3 is structurally a part the data. It can have many advantages, both in terms of bug safety and performance.

That being said, the above function should be very fast no matter what, though it would be faster with tuples.

@_stla the HalfEdgeTopology constructor sorts the vertices by default to guarantee coherent orientation. This behavior values correctness over performance, but you can disable sorting if you know a priori that the vertices are all ordered CCW or CW. Check the option sort: