Help with speeding up this code

I am new to this language, tried a few speed up tips and this is the best i got. Is there anything else that can be done to get a significant performance increase? Can you show me how?

using Plots, Printf, OffsetArrays

function loops(nn::Integer, kk::Integer)
    if nn < 0 || kk <= 0
        return nothing
    end
	if nn > 0 && nn < kk
		return nothing
	end
    _loops(Float64(nn),Float64(kk))
end
function _loops(n::Float64, k::Float64)
    if k == 1 || n == 0
		return 1.0
	end
    cnt = 0.0
	for j = 1:floor(n/k)
		recur = 0.0
		m = min(k-1, n-j*k)
		if m == 0
			m = 1.0
		end
		for t = 1:m
		   recur += _loops(n-j*k, t)
		end
		cnt += recur*(fac[Int(n)]/fac[Int(n-j*k)])/(fac[Int(j)]*(k^j))
	end
    return cnt
end
n = 100
const fac = OffsetArray([1.0; cumprod(1:Float64(n))], 0:n)
@time begin
	y = Vector{Float64}(undef, n)
	for i = 1:n
		y[i] = loops(n,i)/fac[n]
	end
end
y1 = sum(y[1:Int(n/2)])
y2 = sum(y[Int(n/2)+1:n])
colors = [ifelse(i<=Int(n/2), "green", "red") for i=1:n]
#colors = fill("red", n)
#colors[1:Int(n/2)] .= "green"
mm = Plots.mm
bar(	y,
		margin=7mm, 
		size=(1000,667),
		xticks=1:2:n,
		xrot=90,
		color=colors,
		label="$(@sprintf("%.0f",y1*100))%",
		legendfontsize=16,
		titlefontsize=24,
		guidefontsize=16,
		tickfontsize=8
)
bar!(	[1],[0],
		label="$(@sprintf("%.0f",y2*100))%",
		color=colors[n]
)
title!("\n$n Prisoners")
xlabel!("Longest Loop")
ylabel!("Probability")
1 Like

Hi, can you please format the code with three backticks at the start and at the end, as well as the correct indentations? It would make things much easier for us :slight_smile:

Done. Thanks

Is it the function loops that takes too much time, or the plotting?
As a first rule, never benchmark code in the global scope, always put the thing you want to measure inside a function.

Plotting is not the main concern but tips would be appreciated. It’s the loops and _loops code that is the main concern. I have placed an @time around what i am measuring.

1 Like

I profiled your code with @profview to see where it was spending most of its time, and I found a culprit: the Colon method, which you call every time you create 1:m. Normally this is fast, but when m is not an integer it’s much slower. I changed your code a little to fix that, hopefully it gives the same result:

function loops(n, k, fac)
    if k ≈ 1 || n ≈ 0
        return 1.0
    end
    cnt = 0.0
    for j = 1:floor(Int, n / k)
        recur = 0.0
        m = floor(Int, min(k - 1, n - j * k))
        if m ≈ 0
            m = 1
        end
        for t = 1:m
            recur += loops(n - j * k, t, fac)
        end
        cnt += recur * (fac[Int(n)] / fac[Int(n - j * k)]) / (fac[Int(j)] * (k^j))
    end
    return cnt
end

function main(n)
    y = Vector{Float64}(undef, n)
    fac = OffsetArray([1.0; cumprod(float.(1:n))], 0:n)
    for i = 1:n
        y[i] = loops(float(n), float(i), fac) / fac[n]
    end
    return y
end

@time main(100)

However the complexity still looks exponential to me because of the duplicated recursive calls. I think the answer lies in a better memoization than just the factorials, but since I’m not sure what you’re trying to compute, I can’t help much further.

3 Likes

A little contrary perhaps to question goal (of speeding things up), have you considered using BigInts? As the use of Float64 seems to be just to manage larger numbers and perhaps the loss of accuracy is bad.
In code:

julia> const bigfac = factorial.(1:BigInt(100));

julia> bigfac[100]
93326215443944152681699238856266700490715968264381621468592963895217599993229915608941463976156518286253697920827223758251185210916864000000000000000000000000

In general, thinking more closely about the types and making things cleaner and more stable seems to be a good step. The function loops returns nothing on some paths, making it harder for the compiler to infer cleaner types.

1 Like

I guess, just like Moana, it all depends how far you’ll go

1 Like

Iteration over float ranges is relatively slower than integer ranges. Do you really need that? Do you expect integer overflow?

Just to ease thinking about the calculation:

loops(n, k) = # permutations of [n] with maximal cycle length k

So trivially the numbers are integers (not fractions).

And sum(k -> loops(n,k), 1:n) == factorial(n).

[thanks OEIS]

3 Likes

Here’s my attempt at writing this in a Julian way, with only integers except at the very last step, and a better memoization to avoid duplicate work:

function aux!(loops, n::T, k::T, fac) where {T<:Integer}
    if n < zero(T) || k <= zero(T)
        error("Invalid values")
    elseif n > zero(T) && n < k
        error("Invalid values")
    elseif n in axes(loops, 1) && k in axes(loops, 2) && loops[n, k] > -one(T)
        return loops[n, k]
    end
    if k == one(T) || n == zero(T)
        cnt = one(T)
    else
        cnt = zero(T)
        for j = 1:(n÷k)
            recur = zero(T)
            m = max(one(T), min(k - 1, n - j * k))
            for t = 1:m
                recur += aux!(loops, n - j * k, t, fac)
            end
            cnt += recur * (fac[n+1] ÷ fac[n-j*k+1]) ÷ (fac[j+1] * (k^j))
        end
    end
    if n in axes(loops, 1) && k in axes(loops, 2)
        loops[n, k] = cnt
    end
    return cnt
end

function main(n)
    y = Vector{Float64}(undef, n)
    fac = factorial.(0:BigInt(n))
    loops = fill(BigInt(-1), n, n)
    for k = 1:n
        y[k] = aux!(loops, BigInt(n), BigInt(k), fac) / fac[n + 1]
    end
    return y
end

I checked, it does return the same result as your old code, but substantially faster for larger values of n:

julia> for n in 20:20:100
           @info "Testing with n=$n"
           @time main(n);
           @time old_main(n);
       end
[ Info: Testing with n=20
  0.001678 seconds (21.52 k allocations: 420.703 KiB)
  0.000145 seconds (3 allocations: 672 bytes)
[ Info: Testing with n=40
  0.008518 seconds (157.96 k allocations: 3.017 MiB)
  0.010954 seconds (3 allocations: 1.172 KiB)
[ Info: Testing with n=60
  0.035770 seconds (515.50 k allocations: 9.972 MiB)
  0.213151 seconds (3 allocations: 1.594 KiB)
[ Info: Testing with n=80
  0.101977 seconds (1.20 M allocations: 23.477 MiB, 20.66% gc time)
  2.806020 seconds (3 allocations: 2.156 KiB)
[ Info: Testing with n=100
  0.236065 seconds (2.30 M allocations: 45.809 MiB, 23.71% gc time)
 32.061351 seconds (3 allocations: 2.625 KiB)

We’re losing on memory because the big integers cause additional allocations, but we’re redeemed by my memoization trick, which makes the complexity polynomial instead of exponential (assuming fixed number size).

3 Likes

Nothing quite like speeding up code by 100x before going to bed :sleeping:

It would be more modular to separate the factorial precomputation from the other parts of the code, like this:

module FactorialTables

struct FactorialTable{T <: Number}
  table::Vector{T}

  FactorialTable{T}(n::Integer) where {T <: Number} =
    new{T}(T[factorial(big(i)) for i ∈ 0:n])
end

table(f::FactorialTable) = f.table

# https://docs.julialang.org/en/v1/manual/interfaces/
Base.getindex(f::FactorialTable, i::Integer) = table(f)[begin + i]

precomputed(::Type{T}, n::Integer) where {T <: Number} = let t = FactorialTable{T}(n)
  (i::Integer) -> t[i]
end

precomputed(n::Integer) = precomputed(BigInt, n)

end

Then we are protected from off-by-one errors, like what I suppose happened in an earlier version of your post:

julia> fac = FactorialTables.precomputed(100)
#1 (generic function with 1 method)

julia> for i ∈ 1:5
         println(fac(i) == factorial(i))
       end
true
true
true
true
true
1 Like

It’s a cool idea indeed but perhaps mildly scary for a beginner :scream: I tried to stick to the original code as much as possible, but I agree that we can do much cleaner

2 Likes
cnt += recur * (fac[n+1] ÷ fac[n-j*k+1]) ÷ (fac[j+1] * (k^j))

This for instance still has a lot of duplicate computations, the division of factorials is only there for notational simplicity

1 Like

This expression also repeats multiple times. I don’t think Julia can do common subexpression elimination on account of BigInt operations being done by external C code.

1 Like

Poor OP, by the time they come back we’ll have completely disfigured their code :rofl:

5 Likes

Thanks, yes I read about BigInts but really didn’t need the accuracy and things were working with floating points so I settled on the table i created or using the gamma() function which works with floating points for real value “factorials”. The latter took twice as long as the table but I am new at speed up and Julia coding so …

Thanks for the tips about Julia speed ups and I do appreciate the memoization. That was the next step I planned to look at so you saved me a lot of time. Good job!

1 Like

If there’s a need to make gdalle’s dynamic programming solution even more performant, while still using BigInts, look into the MutableArithmetics package:

https://juliahub.com/ui/Packages/MutableArithmetics/EoEec/1.3.0

1 Like