# Help with increasing performance in badly coded functions

Hi guys, I need a little help with optimising the performance of a few functions. If you need a more detailed access to my project here’s a notebook (LinearAlgebraicRepresentation.jl/largrid.ipynb at main · adelloste/LinearAlgebraicRepresentation.jl · GitHub) that contains all the functions with inputs ready to be tested with a @btime macro. Here they are.

cart_opt executes the cartesian product of a certain number of arguments:

``````function cart_opt(args)::Vector{Tuple}
return sort(vcat(collect(Iterators.product(args...))...))
end
``````

This is the output:

``````cart_opt([[1,2],["a"],[3,4]])
4-element Vector{Tuple}:
(1, "a", 3)
(1, "a", 4)
(2, "a", 3)
(2, "a", 4)
``````

This one (index2addr_opt) I’m not quite sure what it does but if you have any suggestions I’ll happily take them.

``````@inline function index2addr_opt( shape::Vector{Int64} )
n = length(shape)
theShape = push!(shape[2:end],1)
weights = [prod(theShape[k:end]) for k in (1:n)]

return dot(collect(multiIndex), weights) + 1
end

end
``````

Here’s the output:

``````index2addr_opt([2,7])([1,4])

12
``````

My project is about solid modeling, larCellProd_opt is a bit of a mess and it executes the product between two larCells

``````@inline function larCellProd_opt(cellLists::Vector{Lar.Cells})::Lar.Cells
shapes = [length(item) for item in cellLists]
subscripts = cart_opt([collect(range(0, length=shape)) for shape in shapes])
indices = [collect(tuple) .+ 1 for tuple in subscripts]

jointCells = [cart_opt([cells[k] for (k,cells) in zip(index,cellLists)]) for index in indices]
convertIt = index2addr_opt([ (length(cellLists[k][1]) > 1) ? shape .+ 1 : shape for (k,shape) in enumerate(shapes) ])
[map(convertIt, map(collect,jointCells[j])) for j in 1:length(jointCells)]
end
``````

Here’s the output:

``````c1 = [[0,1],[1,2],[2,3]]
c0 = [[0],[1],[2]]
@btime larCellProd_opt([c1, c1, c0])

27-element Vector{Vector{Int64}}:
[1, 4, 13, 16]
[2, 5, 14, 17]
[3, 6, 15, 18]
[4, 7, 16, 19]
[5, 8, 17, 20]
[6, 9, 18, 21]
[7, 10, 19, 22]
[8, 11, 20, 23]
[9, 12, 21, 24]
[13, 16, 25, 28]
[14, 17, 26, 29]
[15, 18, 27, 30]
[16, 19, 28, 31]
⋮
[19, 22, 31, 34]
[20, 23, 32, 35]
[21, 24, 33, 36]
[25, 28, 37, 40]
[26, 29, 38, 41]
[27, 30, 39, 42]
[28, 31, 40, 43]
[29, 32, 41, 44]
[30, 33, 42, 45]
[31, 34, 43, 46]
[32, 35, 44, 47]
[33, 36, 45, 48]
``````

The last function I’d need help with is createCells, I’ve been told that push! isn’t really good for performance.

``````@inline function createCells(V,cells1,W,cells2,vertices)
cells = []
for c1 in cells1
for c2 in cells2
cell = []
@inbounds for vc in c1
@inbounds @simd for wc in c2
push!(cell, vertices[[V[:,vc];W[:,wc]]] )
end
end
push!(cells, cell)
end
end
return cells
end
``````

in general, it isn’t worth parallelizing bad code. adding threads will give you a maximum 100x speedup (on a fairly large server), and it’s usually harder than making bad code 100x faster single threaded. step 1 is removing unnecessary allocations.

Splatting is bad and `collect` is bad. But first of all get rid of the output type annotation. You are forcing your vector to have an abstract eltype.

Annotating functions with output types is normally not a good idea.

2 Likes

A good next step is to get rid of every single instance of `collect` in all your functions. See if anything breaks. Check for ways to fix it without using `collect`.

Also, `cell = []` is a red flag for performance. Try to use typed containers, preferably by letting the compiler figure out the types. But a plain `[]` forces the eltype to be `Any`, which hurts performance.

BTW, what is `Lar.Cells`? It looks like the code is using regular Arrays of Arrays.

2 Likes

Woud you have any suggestion on how to improve the code?

Lar.cell is a type coming from the LinearAlgebraicRepresentation.jl pkg. It’s used to define cells in solid modeling. Btw you were very helpful thx a lot, I will try.