# Reduce allocations and speed up

Hi there,

in my attempt to teach myself Julia, Iβm solving Project Euler problems. The following code works and give the correct answer, although I was expecting better performance, as the time is similar to what I got using Python. Here it is the code:

``````function chain_length(n, terms)
length = 0
while n != 1
length += terms[n]
break
end
if n % 2 == 0
n = n / 2
else
n = 3n + 1
end
length += 1
end
return length
end

function Problem14()
#=
The following iterative sequence is defined for the set of positive
integers:

n β n/2 (n is even)
n β 3n + 1 (n is odd)

Using the rule above and starting with 13, we generate the following
sequence:

13 β 40 β 20 β 10 β 5 β 16 β 8 β 4 β 2 β 1

It can be seen that this sequence (starting at 13 and finishing at 1)
contains 10 terms. Although it has not been proved yet (Collatz Problem),
it is thought that all starting numbers finish at 1.

Which starting number, under one million, produces the longest chain?

NOTE: Once the chain starts the terms are allowed to go above one million.
=#

ans = 0
limit = 1_000_000
score = 0
terms = Dict()
for i in 1:limit
terms[i] = chain_length(i, terms)
if terms[i] > score
score = terms[i]
ans = i
end
end
return ans
end
``````

I see two problems: You do `n = n / 2` but the division operator `/` returns `Float64` even when called with `Int`s that can in theory be divided into an `Int` again. You want the `Γ·` operator (typed `\div`). This will make that part of the function type stable.

The other part that is type unstable is that youβre defining a plain `Dict()`, which is equal to `Dict{Any,Any})`, even though you know you are always going to store `Int => Int` pairs in it. So you should change that to `Dict{Int, Int}()`. With those two changes I get:

`````` 0.393216 seconds (54 allocations: 65.170 MiB)
``````

Whereas with your previous version I got

``````4.184902 seconds (17.95 M allocations: 339.074 MiB, 27.09% gc time)
``````
7 Likes

Allocations may further be reduced by using `sizehint!`

``````function chain_length(n, terms)
length = 0
while n != 1
length += terms[n]
break
end
if n % 2 == 0
n = n Γ· 2
else
n = 3n + 1
end
length += 1
end
return length
end

function Problem14()
#=
The following iterative sequence is defined for the set of positive
integers:

n β n/2 (n is even)
n β 3n + 1 (n is odd)

Using the rule above and starting with 13, we generate the following
sequence:

13 β 40 β 20 β 10 β 5 β 16 β 8 β 4 β 2 β 1

It can be seen that this sequence (starting at 13 and finishing at 1)
contains 10 terms. Although it has not been proved yet (Collatz Problem),
it is thought that all starting numbers finish at 1.

Which starting number, under one million, produces the longest chain?

NOTE: Once the chain starts the terms are allowed to go above one million.
=#

ans = 0
limit = 1_000_000
score = 0
terms = Dict{Int, Int}()
sizehint!(terms, limit)
for i in 1:limit
length_i = chain_length(i, terms)
terms[i] = length_i
if length_i > score
score = length_i
ans = i
end
end
return ans
end
``````

``````julia> @time Problem14();
0.379844 seconds (7 allocations: 34.001 MiB)
``````

Edit: the shortcut of replacing (3n+1) by (3n+1)/2 helps:

``````function chain_length(n, terms)
length = 0
while n != 1
length += terms[n]
break
end
if n % 2 == 0
n = n Γ· 2
else
n = (3n + 1) Γ· 2
length += 1
end
length += 1
end
return length
end
``````
``````julia> @time Problem14()
0.314625 seconds (7 allocations: 34.001 MiB, 0.49% gc time)
837799
``````
3 Likes

This seems to be a problem that would benefit from more aggressive memoizaiton, but for some reason thereβs no difference in performance (presumably because this needs to store more terms?).

Code
``````function get_chain_length!(terms, n)
return terms[n]
end
if n % 2 == 0
n_new = n Γ· 2
else
n_new = 3n + 1
end
length = get_chain_length!(terms, n_new) + 1
terms[n] = length
length
end

function Problem14()
#=
The following iterative sequence is defined for the set of positive
integers:

n β n/2 (n is even)
n β 3n + 1 (n is odd)

Using the rule above and starting with 13, we generate the following
sequence:

13 β 40 β 20 β 10 β 5 β 16 β 8 β 4 β 2 β 1

It can be seen that this sequence (starting at 13 and finishing at 1)
contains 10 terms. Although it has not been proved yet (Collatz Problem),
it is thought that all starting numbers finish at 1.

Which starting number, under one million, produces the longest chain?

NOTE: Once the chain starts the terms are allowed to go above one million.
=#

ans = 0
limit = 1_000_000
score = 0
terms = Dict{Int, Int}()
sizehint!(terms, 2limit)
terms[1] = 0
for i in 2:limit
length_i = get_chain_length!(terms, i)
if length_i > score
score = length_i
ans = i
end
end
return ans
end
``````
``````julia> @time Problem14()
0.363303 seconds (7 allocations: 68.001 MiB)
837799
``````

A marginal improvement is gained by replacing (3n+1) by (3n+1)/2:

``````function get_chain_length!(terms, n)
return terms[n]
end
length = 0
if n % 2 == 0
n_new = n Γ· 2
else
n_new = (3n + 1) Γ· 2
length = 1
end
length += get_chain_length!(terms, n_new) + 1
terms[n] = length
length
end
``````
``````julia> @time Problem14()
0.315037 seconds (7 allocations: 68.001 MiB)
837799
``````

Since all the keys in `terms` are small positive integers, it may be faster to use a `Vector{Int}` instead, using a sentinel value like -1 to indicate missing values.

1 Like

Potentially, you can speed up calculations with bitwise operations. Instead of `n % 2 == 0` you can do `n & 1 == 0`. Maybe compiler can figure it out on itβs own, but at least itβs worth a try. Same goes for `n \div 2`, you can use ` n >> 1`

3 Likes

Iβd use `iseven` or `isodd` here instead of the modulus check.

4 Likes

`divrem` can perhaps be used here.

1 Like

Interestingly, the βdumbβ version without memoization is faster for me:

``````function chain_length(n)
length = 0
while n != 1
if iseven(n)
n = n Γ· 2
else
n = 3n + 1
end
length += 1
end
return length
end

function problem_14(range)
k = 0
l = 0
for i in range
ll = chain_length_2(i)
if ll > l
k = i
l = ll
end
end
k
end

julia> @time problem_14(1:1_000_000)

0.263881 seconds
837799 => 524
``````

And this dumb version can also be threaded easily, and it scales perfectly because itβs complete divide and conquer:

``````function problem_14_threads()
ks = fill(0=>0, 4)
x = 250_000

rng = (t-1) * x + 1 : t * x
ks[t] = problem_14(rng)
end
ks[argmax(last.(ks))][1]
end

0.080029 seconds (26 allocations: 1.969 KiB)
837799
``````
1 Like

It may not help much with speed but you could build around `get!` if you want to memoize the values.

``````const lengths = Dict{Int, Int}(1 => 1)
function getlength!(lengths, n)
get!(lengths, n) do
1 + getlength!(lengths, isodd(n) ? 3n + 1 : (n >> 1))
end
end
findmax([getlength!(lengths, i) for i in 1:1000000])
``````

I havenβt worked out how to avoid the comprehension in the call to `findmax`. You donβt need to store the values if you are just trying to get the maximum but I havenβt decided how to make the expression in the comprehension into an iterator.

So I did manage to work out how to use a generator expression as an argument to `findmax`. You construct a key-value pair but, curiously the βkeyβ is the array value and the βvalueβ is the index. (If I would have looked at the result of `findmax` that would have given me the hint.) So the above can be written

``````julia> findmax(getlength!(lengths, i) => i for i in 1:1000000)
(525, 837799)
``````

To benchmark this approach fairly the Dict should be reinitialized for each evaluation.

``````julia> @benchmark findmax(Pair(getlength!(lengths, i), i) for i in 1:1000000)  setup=(lengths = Dict{Int,Int}(1 => 1))
BenchmarkTools.Trial: 20 samples with 1 evaluation.
Range (min β¦ max):  245.319 ms β¦ 308.182 ms  β GC (min β¦ max): 0.73% β¦ 20.86%
Time  (median):     249.373 ms               β GC (median):    1.07%
Time  (mean Β± Ο):   255.664 ms Β±  17.785 ms  β GC (mean Β± Ο):  3.07% Β±  5.19%

ββ β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
245 ms           Histogram: frequency by time          308 ms <

Memory estimate: 133.17 MiB, allocs estimate: 56.
``````

An alternative, recursive function that does not memoize would be

``````getlengthr(n) = isone(n) ? 1 : 1 + getlengthr(isodd(n) ? 3n + 1 : (n >> 1))
``````

but that is a little slower

``````julia> @benchmark findmax(getlengthr(i) => i for i in 1:1000000)
BenchmarkTools.Trial: 17 samples with 1 evaluation.
Range (min β¦ max):  300.419 ms β¦ 304.137 ms  β GC (min β¦ max): 0.00% β¦ 0.00%
Time  (median):     301.531 ms               β GC (median):    0.00%
Time  (mean Β± Ο):   301.979 ms Β±   1.204 ms  β GC (mean Β± Ο):  0.00% Β± 0.00%

β β β     β  β β ββ     β   β β       β                  β  β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
300 ms           Histogram: frequency by time          304 ms <

Memory estimate: 0 bytes, allocs estimate: 0.
``````

Turns out for me (single-threaded), a direct, non-recursive version

``````function getlength(n)
count = 1
while !isone(n)
n = isodd(n) ? 3n + 1 : (n >> 1)
count += 1
end
return count
end
``````

is fastest

``````julia> @benchmark findmax(getlength(i) => i for i in 1:1000000)
BenchmarkTools.Trial: 44 samples with 1 evaluation.
Range (min β¦ max):  111.425 ms β¦ 116.422 ms  β GC (min β¦ max): 0.00% β¦ 0.00%
Time  (median):     113.932 ms               β GC (median):    0.00%
Time  (mean Β± Ο):   113.910 ms Β± 856.555 ΞΌs  β GC (mean Β± Ο):  0.00% Β± 0.00%

ββ    β β  β  ββ  βββ  ββ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
111 ms           Histogram: frequency by time          116 ms <

Memory estimate: 0 bytes, allocs estimate: 0.
``````
1 Like

This takes only 77 ms on my computer. Itβs 2.6X faster than @jules version above. Two small tricks used here,
(1) use `n >> 1` as suggested by @Skoffer instead of `n Γ· 2`, surprisingly the compiler couldnβt do it, and
(2) use the range `1:2:1_000_000` which excludes even numbers which drop in value very quickly, as opposed to odd starting values which increase.

``````function chain_length(n)
length = 0
while n > 1
n = iseven(n) ? n >> 1 : 3n + 1
length += 1
end
return length
end

function problem_14(rng)
k = l = 0
for i in rng
ll = chain_length(i)
if ll > l
k = i
l = ll
end
end
k
end
@btime problem_14(1:2:1_000_000)
77.016 ms (0 allocations: 0 bytes)
837799
``````
1 Like

wow! Thatβs remarkable. Thx for sharing with a newbie

I think you can get rid of the branch by writing

``````iseven(n) * (n >> 1) + isodd(n) * (3n + 1)
``````

Or hoisting the check outside and inverting the boolean in the above expression.

Not sure if this is enough to get it to SIMD, there may need to be some more tricky transform with the loop condition necessary.

1 Like

We can do it even slightly better

``````function chain_length(n)
length = 0
while n > 1
n = 3n + 1
k = trailing_zeros(n)
n >>= k
length += k + 1
end
return length
end
``````

With this `chain_length` implementation, time has decreased from 79ms to 52ms on my laptop.

With multithreading (it can be erroneous though on interval edges, so it should be written more carefully, but it still yields correct result in this case for 4 threads)

``````function problem_14_threads()
ks = zeros(Int, n)
x = 1_000_000 Γ· n

rng = (t-1) * x + 1:2:t * x
ks[t] = problem_14(rng)
end
maximum(ks)
end

15.294 ms (22 allocations: 1.98 KiB)
837799
``````
2 Likes

For fun, here is an unbeatable 6 ms version combining bit twidling magic by @Skoffer with memoization. I was lazy and used global `const`s for `maxn` and `cache` array.

``````function chain_length(n)
res = 0
nxt = 3n + 1
k = trailing_zeros(nxt)
nxt >>= k
if nxt >= maxn || nxt < maxn && iszero(cache[nxt])
res = k + 1 + chain_length(nxt)
else
res = k + 1 + cache[nxt]
end
if n < maxn
cache[n] = res
end
return res
end

function problem_14(rng)
cache[1] = 1
best = 1
for n in rng
if iszero(cache[n])
if cache[best] < chain_length(n)
best = n
end
end
end
best, cache[best]
end

const maxn = 1_000_000
const cache = zeros(Int,maxn)
@btime problem_14(1:2:10^6) setup = (cache .= 0)
6.039 ms (0 allocations: 0 bytes)
(837799, 525)
``````
1 Like