Did a little refactoring, as gdalle suggested, to make the code totally unrecognizable. Specifically, I used the semantic meaning of the loop variables to find a more simply calculated expression.
It goes like this (using memoization also):
# returns number of perms of 1..n with largest cycle of size k
function loop!(M, CM, n, k)
    n < k && return 0.0              # base and trivial cases
    n == 0 && k == 0 && return 1.0
    k <= 0 && return 0.0
    n <= 0 && return 0.0
    M[n,k] >= 0.0 && return M[n,k]   # memoization
    if k == 1                        # only the identity has all cycles of size 1
        M[n,k] = 1.0
        return 1.0
    end
    factor = 1.0                     # non base case calculation
    res = 0.0
    for i in 1:k-1                   # cycle containing element 1 has size i < k
        res += factor * loop!(M, CM, n-i, k)
        factor *= n - i
    end
    if n > k                         # cycle containing element 1 has size exactly k
        j = k
        while j > 1 && CM[n-k,j] < 0.0
            j -= 1
        end
        for i in j:k                 # second memoization for cumsum of M
            CM[n-k,i] = (i > 1 ? CM[n-k,i-1] : 0.0) + loop!(M, CM, n-k, i)
        end
        res += factor * CM[n-k,k]
    else                             # n == k: the whole perm is one n-cycle
        res += factor
    end
    M[n,k] = res
    return res
end
To use this function, the memoization matrices need to be initialized and passed as parameters, as follows:
julia> const M = fill(-1.0,100,100);
julia> const CM = fill(-1.0,100,100);
julia> @time [loop!(M, CM, 100, i) for i in 1:100];
0.027222 seconds (23.15 k allocations: 1.561 MiB, 98.51% compilation time)
The timing by itself is not very informative, as I'm not including anything to compare it with, but it looks like at least a 1000x speedup.
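As a correctness check (mine, not from the original post), the counts can be compared against a brute-force tally over all permutations for small n. The helper names all_perms, largest_cycle, and brute_counts below are hypothetical, not from the original code:

```julia
# All permutations of 1..n, built by inserting n into each position of
# every permutation of 1..n-1.
function all_perms(n)
    n == 0 && return [Int[]]
    out = Vector{Vector{Int}}()
    for p in all_perms(n - 1), pos in 1:n
        q = copy(p)
        insert!(q, pos, n)
        push!(out, q)
    end
    return out
end

# Size of the largest cycle of permutation p (one-line notation).
function largest_cycle(p)
    seen = falses(length(p))
    best = 0
    for s in eachindex(p)
        seen[s] && continue
        len, j = 0, s
        while !seen[j]       # follow the cycle starting at s
            seen[j] = true
            len += 1
            j = p[j]
        end
        best = max(best, len)
    end
    return best
end

# counts[k] = number of perms of 1..n whose largest cycle has size k.
function brute_counts(n)
    counts = zeros(Int, n)
    for p in all_perms(n)
        counts[largest_cycle(p)] += 1
    end
    return counts
end
```

For n = 4 this gives [1, 9, 8, 6] (the identity; 6 transpositions plus 3 double transpositions; 8 three-cycles; 6 four-cycles), which is what `[loop!(M, CM, 4, k) for k in 1:4]` should reproduce.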
UPDATE: To benchmark, adding a main as in previous posts:
function main(n)
    M = fill(-1.0, n, n)
    CM = fill(-1.0, n, n)
    y = Vector{Float64}(undef, n)
    fac = cumprod(float.(1:n))
    for i in 1:n
        y[i] = loop!(M, CM, n, i) / fac[n]
    end
    return y
end
Comparing this main to one of the original oldmains:
julia> @time oldmain(100);
5.305958 seconds (4 allocations: 3.500 KiB)
julia> @time main(100);
0.001171 seconds (7 allocations: 158.969 KiB)
(a roughly 4500x speedup)
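A side note from me: @time on a first call folds compilation into the measurement (the 98.51% compilation time above). For steadier numbers, BenchmarkTools.jl's @btime runs the call repeatedly after warm-up and reports the minimum. A sketch, assuming main and oldmain are already defined in the session:

```julia
using BenchmarkTools   # external package: ] add BenchmarkTools

# Minimum time over many samples, compilation excluded.
@btime main(100);
@btime oldmain(100);
```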