Help with speeding up this code

I am new to this language. I tried a few speed-up tips, and this is the best I got. Is there anything else that can be done to get a significant performance increase? Can you show me how?

```julia
using Plots, Printf, OffsetArrays

function loops(nn::Integer, kk::Integer)
    if nn < 0 || kk <= 0
        return nothing
    end
    if nn > 0 && nn < kk
        return nothing
    end
    _loops(Float64(nn), Float64(kk))
end

function _loops(n::Float64, k::Float64)
    if k == 1 || n == 0
        return 1.0
    end
    cnt = 0.0
    for j = 1:floor(n/k)
        recur = 0.0
        m = min(k-1, n-j*k)
        if m == 0
            m = 1.0
        end
        for t = 1:m
            recur += _loops(n-j*k, t)
        end
        cnt += recur*(fac[Int(n)]/fac[Int(n-j*k)])/(fac[Int(j)]*(k^j))
    end
    return cnt
end

n = 100
const fac = OffsetArray([1.0; cumprod(1:Float64(n))], 0:n)

@time begin
    y = Vector{Float64}(undef, n)
    for i = 1:n
        y[i] = loops(n, i)/fac[n]
    end
end

y1 = sum(y[1:Int(n/2)])
y2 = sum(y[Int(n/2)+1:n])
colors = [ifelse(i <= Int(n/2), "green", "red") for i = 1:n]
#colors = fill("red", n)
#colors[1:Int(n/2)] .= "green"
mm = Plots.mm
bar(y,
    margin=7mm,
    size=(1000, 667),
    xticks=1:2:n,
    xrot=90,
    color=colors,
    label="$(@sprintf("%.0f", y1*100))%",
    legendfontsize=16,
    titlefontsize=24,
    guidefontsize=16,
    tickfontsize=8
)
bar!([1], [0],
    label="$(@sprintf("%.0f", y2*100))%",
    color=colors[n]
)
title!("\n$n Prisoners")
xlabel!("Longest Loop")
ylabel!("Probability")
```

Hi, can you please format the code with three backticks at the start and at the end, as well as with the correct indentation? It would make things much easier for us.

Done. Thanks

Is it the function `loops` that takes too much time, or the plotting?
As a first rule: never benchmark code in the global scope; always put the thing you want to measure inside a function.
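For example (with a hypothetical `work` function standing in for whatever you want to measure), the pattern looks like this:

```julia
# Hypothetical toy function standing in for the code under measurement.
function work(n)
    s = 0.0
    for i in 1:n
        s += sqrt(i)
    end
    return s
end

# Timing a loop written directly at global scope would involve untyped global
# variables, so the measurement reflects that overhead rather than the
# algorithm. Timing a function call measures the compiled code instead:
@time work(10^6)
@time work(10^6)  # run twice: the first call includes compilation time
```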

Plotting is not the main concern, but tips would be appreciated. It's the `loops` and `_loops` code that is the main concern. I have placed an `@time` around what I am measuring.


I profiled your code with `@profview` to see where it was spending most of its time, and I found a culprit: the `Colon` method, which you call every time you create `1:m`. Normally this is fast, but when `m` is not an integer it's much slower. I changed your code a little to fix that, hopefully it gives the same result:

```julia
using OffsetArrays

function loops(n, k, fac)
    if k == 1 || n == 0
        return 1.0
    end
    cnt = 0.0
    for j = 1:floor(Int, n / k)
        recur = 0.0
        m = floor(Int, min(k - 1, n - j * k))
        if m == 0
            m = 1
        end
        for t = 1:m
            recur += loops(n - j * k, t, fac)
        end
        cnt += recur * (fac[Int(n)] / fac[Int(n - j * k)]) / (fac[Int(j)] * (k^j))
    end
    return cnt
end

function main(n)
    y = Vector{Float64}(undef, n)
    fac = OffsetArray([1.0; cumprod(float.(1:n))], 0:n)
    for i = 1:n
        y[i] = loops(float(n), float(i), fac) / fac[n]
    end
    return y
end

@time main(100)
```

However the complexity still looks exponential to me because of the duplicated recursive calls. I think the answer lies in a better memoization than just the factorials, but since I'm not sure what you're trying to compute, I can't help much further.


A little contrary perhaps to the question's goal (of speeding things up): have you considered using `BigInt`s? The use of `Float64` seems to be there just to handle larger numbers, and perhaps the loss of accuracy is a problem.
In code:

```julia
julia> const bigfac = factorial.(1:BigInt(100));

julia> bigfac[100]
93326215443944152681699238856266700490715968264381621468592963895217599993229915608941463976156518286253697920827223758251185210916864000000000000000000000000
```

In general, thinking more carefully about the types and making things cleaner and more stable seems like a good step. The function `loops` returns `nothing` on some paths, which makes it harder for the compiler to infer a concrete return type.
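A minimal sketch of that effect (the function names here are made up for illustration): a function that can return either `nothing` or a `Float64` is inferred as a `Union`, while throwing on invalid input keeps the return type concrete.

```julia
# Inferred return type is Union{Nothing, Float64}: every caller must
# handle both possibilities.
unstable(n) = n < 0 ? nothing : 1.0

# Throwing on invalid input instead keeps the inferred return type Float64.
stable(n) = n < 0 ? error("n must be nonnegative") : 1.0
```

You can inspect this with `Base.return_types(unstable, (Int,))` versus `Base.return_types(stable, (Int,))`.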


I guess, just like Moana, it all depends how far you'll go.


Iteration over float ranges is noticeably slower than over integer ranges. Do you really need floats here? Do you expect integer overflow?
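The difference shows up in the range types themselves: integer endpoints build a lightweight `UnitRange`, while float endpoints go through heavier machinery (`StepRangeLen` with extended-precision endpoints) to get rounding right.

```julia
# Integer endpoints: a plain UnitRange, cheap to construct and iterate.
r_int = 1:5

# Float endpoints: a StepRangeLen, which carries extra precision bookkeeping
# and is more expensive to construct (the Colon cost seen in the profile).
r_float = 1.0:5.0
```

Both ranges contain the same values, but constructing `r_float` in a hot inner loop adds up.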

Just to ease thinking about the calculation:

`loops(n, k) = # permutations of [n] with maximal cycle length k`

So trivially the numbers are integers (not fractions).

And `sum(k -> loops(n,k), 1:n) == factorial(n)`.

[thanks OEIS]
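For small `n` this interpretation can be checked by brute force (the helper names below are made up; `allperms` is a naive generator, usable only for tiny `n`):

```julia
# Naive permutation generator (factorial blow-up; only for tiny n).
function allperms(v::Vector{Int})
    length(v) <= 1 && return [copy(v)]
    out = Vector{Vector{Int}}()
    for (i, x) in enumerate(v)
        rest = [v[1:i-1]; v[i+1:end]]
        for p in allperms(rest)
            push!(out, [x; p])
        end
    end
    return out
end

# Length of the longest cycle of a permutation p of 1:n.
function longest_cycle(p::Vector{Int})
    n = length(p)
    seen = falses(n)
    best = 0
    for i in 1:n
        seen[i] && continue
        len, j = 0, i
        while !seen[j]
            seen[j] = true
            j = p[j]
            len += 1
        end
        best = max(best, len)
    end
    return best
end

# counts[k] = number of permutations of 1:n whose longest cycle has length k.
function counts_by_longest_cycle(n::Int)
    counts = zeros(Int, n)
    for p in allperms(collect(1:n))
        counts[longest_cycle(p)] += 1
    end
    return counts
end
```

For `n = 5` the counts sum to `factorial(5) == 120`, matching the identity above, with `counts[1] == 1` (the identity permutation) and `counts[5] == 24` (the 5-cycles).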


Hereâs my attempt at writing this in a Julian way, with only integers except at the very last step, and a better memoization to avoid duplicate work:

```julia
function aux!(loops, n::T, k::T, fac) where {T<:Integer}
    if n < zero(T) || k <= zero(T)
        error("Invalid values")
    elseif n > zero(T) && n < k
        error("Invalid values")
    elseif n in axes(loops, 1) && k in axes(loops, 2) && loops[n, k] > -one(T)
        return loops[n, k]
    end
    if k == one(T) || n == zero(T)
        cnt = one(T)
    else
        cnt = zero(T)
        for j = 1:(n ÷ k)
            recur = zero(T)
            m = max(one(T), min(k - 1, n - j * k))
            for t = 1:m
                recur += aux!(loops, n - j * k, t, fac)
            end
            cnt += recur * (fac[n+1] ÷ fac[n-j*k+1]) ÷ (fac[j+1] * (k^j))
        end
    end
    if n in axes(loops, 1) && k in axes(loops, 2)
        loops[n, k] = cnt
    end
    return cnt
end

function main(n)
    y = Vector{Float64}(undef, n)
    fac = factorial.(0:BigInt(n))
    loops = fill(BigInt(-1), n, n)
    for k = 1:n
        y[k] = aux!(loops, BigInt(n), BigInt(k), fac) / fac[n + 1]
    end
    return y
end
```

I checked, it does return the same result as your old code, but substantially faster for larger values of `n`:

```julia
julia> for n in 20:20:100
           @info "Testing with n=$n"
           @time main(n);
           @time old_main(n);
       end
[ Info: Testing with n=20
  0.001678 seconds (21.52 k allocations: 420.703 KiB)
  0.000145 seconds (3 allocations: 672 bytes)
[ Info: Testing with n=40
  0.008518 seconds (157.96 k allocations: 3.017 MiB)
  0.010954 seconds (3 allocations: 1.172 KiB)
[ Info: Testing with n=60
  0.035770 seconds (515.50 k allocations: 9.972 MiB)
  0.213151 seconds (3 allocations: 1.594 KiB)
[ Info: Testing with n=80
  0.101977 seconds (1.20 M allocations: 23.477 MiB, 20.66% gc time)
  2.806020 seconds (3 allocations: 2.156 KiB)
[ Info: Testing with n=100
  0.236065 seconds (2.30 M allocations: 45.809 MiB, 23.71% gc time)
 32.061351 seconds (3 allocations: 2.625 KiB)
```

Weâre losing on memory because the big integers cause additional allocations, but weâre redeemed by my memoization trick, which makes the complexity polynomial instead of exponential (assuming fixed number size).


Nothing quite like speeding up code by 100x before going to bed

It would be more modular to separate the factorial precomputation from the other parts of the code, like this:

```julia
module FactorialTables

struct FactorialTable{T <: Number}
    table::Vector{T}

    FactorialTable{T}(n::Integer) where {T <: Number} =
        new{T}(T[factorial(big(i)) for i ∈ 0:n])
end

table(f::FactorialTable) = f.table

# https://docs.julialang.org/en/v1/manual/interfaces/
Base.getindex(f::FactorialTable, i::Integer) = table(f)[begin + i]

precomputed(::Type{T}, n::Integer) where {T <: Number} = let t = FactorialTable{T}(n)
    (i::Integer) -> t[i]
end

precomputed(n::Integer) = precomputed(BigInt, n)

end
```

Then we are protected from off-by-one errors, like what I suppose happened in an earlier version of your post:

```julia
julia> fac = FactorialTables.precomputed(100)
#1 (generic function with 1 method)

julia> for i ∈ 1:5
           println(fac(i) == factorial(i))
       end
true
true
true
true
true
```

Itâs a cool idea indeed but perhaps mildly scary for a beginner I tried to stick to the original code as much as possible, but I agree that we can do much cleaner

```julia
cnt += recur * (fac[n+1] ÷ fac[n-j*k+1]) ÷ (fac[j+1] * (k^j))
```

This, for instance, still has a lot of duplicate computation; the division of factorials is only there for notational simplicity.
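One way to remove some of that (a hypothetical rewrite, not tested against the full program): replace the ratio of precomputed factorials with a falling-factorial product, which avoids building and dividing two huge `BigInt`s.

```julia
# n! ÷ (n - s)! computed as n * (n-1) * ... * (n-s+1), without the division.
falling_factorial(n::BigInt, s::Integer) = prod((n - i for i in 0:s-1); init = big(1))

# In the recurrence above, fac[n+1] ÷ fac[n-j*k+1] equals falling_factorial(n, j*k).
```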


This expression also repeats multiple times. I don't think Julia can do common subexpression elimination on account of `BigInt` operations being done by external C code.


Poor OP, by the time they come back we'll have completely disfigured their code.


Thanks, yes, I read about `BigInt`s, but I really didn't need the accuracy, and things were working with floating point, so I settled on the table I created, or on using the `gamma()` function, which works with floating-point real-valued "factorials". The latter took twice as long as the table, but I am new to speed-ups and Julia coding, so …

Thanks for the tips about Julia speed-ups, and I do appreciate the memoization. That was the next step I planned to look at, so you saved me a lot of time. Good job!


If thereâs a need to make gdalleâs dynamic programming solution even more performant, while still using `BigInt`s, look into the MutableArithmetics package:

https://juliahub.com/ui/Packages/MutableArithmetics/EoEec/1.3.0
