Mysterious allocations in nested scalar loops

I am writing some code to solve Project Euler #60. Given an upper bound N, the program uses 5 nested for loops to find the smallest sum of quintuples (p_1, p_2, p_3, p_4, p_5) of prime numbers, p_i \leqslant N such that concatenating any two of them in any order produces a prime. What surprises me is that there is only scalar code in the loops, yet benchmarking shows a bazillion allocations.

julia> @btime euler60(10_000)
  712.060 ms (2325284 allocations: 35.49 MiB)
26033
Full code
using Primes 

function num_digits(n)
    d = 0
    while n > 0
        d += 1
        n Γ·= 10
    end
    d
end

function concatenate(a, b)
    d = num_digits(b)
    10^d * a + b
end

function is_admissible(values::Tuple, p) 
    # check if a prime p can be extended to a tuple of already found primes
    for q in values
        if !isprime(concatenate(p, q)) || !isprime(concatenate(q, p)) 
            return false
        end
    end
    true 
end

function euler60(upper_bound)
    prime_list = Primes.primes(3, upper_bound) 
    n = length(prime_list) 
    min_sum = typemax(Int)  

    @inbounds for i1 in 1:(n - 4)
        p1 = prime_list[i1]
        5p1 > min_sum && break
        for i2 in (i1 + 1):(n - 3)
            p2 = prime_list[i2] 
            p1 + 4p2 >= min_sum && break
            is_admissible((p1), p2) || continue
            
            for i3 in (i2 + 1):(n - 2)
                p3 = prime_list[i3]
                p1 + p2 + 3p3 >= min_sum && break
                is_admissible((p1, p2), p3) || continue

                for i4 in (i3 + 1):(n - 1)
                    p4 = prime_list[i4]
                    p1 + p2 + p3 + 2p4 >= min_sum && break
                    is_admissible((p1, p2, p3), p4) || continue

                    for i5 in (i4 + 1):n
                        p5 = prime_list[i5]
                        s = p1 + p2 + p3 + p4 + p5
                        s >= min_sum && break
                        is_admissible((p1, p2, p3, p4), p5) || continue
                        
                        # println("primes: $p1 $p2 $p3 $p4 $p5, sum = $s") 
                        min_sum = s
                    end
                end
            end
        end
    end
    min_sum 
end

There doesn’t seem to be any type instabilities either.

@code_warntype output
julia> @code_warntype euler60(10_000)
MethodInstance for euler60(::Int64)
  from euler60(upper_bound) @ Main ~/project-euler/euler60.jl:72
Arguments
  #self#::Core.Const(euler60)
  upper_bound::Int64
Locals
  @_3::Union{Nothing, Tuple{Int64, Int64}}
  val::Nothing
  min_sum::Int64
  n::Int64
  prime_list::Vector{Int64}
  @_8::Union{Nothing, Tuple{Int64, Int64}}
  i1::Int64
  p1::Int64
  @_11::Union{Nothing, Tuple{Int64, Int64}}
  i2::Int64
  p2::Int64
  @_14::Union{Nothing, Tuple{Int64, Int64}}
  i3::Int64
  p3::Int64
  @_17::Union{Nothing, Tuple{Int64, Int64}}
  i4::Int64
  p4::Int64
  i5::Int64
  s::Int64
  p5::Int64
Body::Int64
1 ──        Core.NewvarNode(:(val))
β”‚    %2   = Primes.primes::Core.Const(Primes.primes)
β”‚           (prime_list = (%2)(3, upper_bound))
β”‚           (n = Main.length(prime_list))
β”‚           (min_sum = Main.typemax(Main.Int))
β”‚           nothing
β”‚    %7   = (n - 4)::Int64
β”‚    %8   = (1:%7)::Core.PartialStruct(UnitRange{Int64}, Any[Core.Const(1), Int64])
β”‚           (@_3 = Base.iterate(%8))
β”‚    %10  = (@_3 === nothing)::Bool
β”‚    %11  = Base.not_int(%10)::Bool
└───        goto #39 if not %11
2 ┄─        Core.NewvarNode(:(@_8))
β”‚    %14  = @_3::Tuple{Int64, Int64}
β”‚           (i1 = Core.getfield(%14, 1))
β”‚    %16  = Core.getfield(%14, 2)::Int64
β”‚           (p1 = Base.getindex(prime_list, i1))
β”‚    %18  = (5 * p1)::Int64
β”‚    %19  = (%18 > min_sum)::Bool
└───        goto #4 if not %19
3 ──        goto #39
4 ── %22  = (i1 + 1)::Int64
β”‚    %23  = (n - 3)::Int64
β”‚    %24  = (%22:%23)::UnitRange{Int64}
β”‚           (@_8 = Base.iterate(%24))
β”‚    %26  = (@_8 === nothing)::Bool
β”‚    %27  = Base.not_int(%26)::Bool
└───        goto #37 if not %27
5 ┄─        Core.NewvarNode(:(@_11))
β”‚    %30  = @_8::Tuple{Int64, Int64}
β”‚           (i2 = Core.getfield(%30, 1))
β”‚    %32  = Core.getfield(%30, 2)::Int64
β”‚           (p2 = Base.getindex(prime_list, i2))
β”‚    %34  = p1::Int64
β”‚    %35  = (4 * p2)::Int64
β”‚    %36  = (%34 + %35)::Int64
β”‚    %37  = (%36 >= min_sum)::Bool
└───        goto #7 if not %37
6 ──        goto #37
7 ── %40  = Main.is_admissible(p1, p2)::Bool
└───        goto #9 if not %40
8 ──        goto #10
9 ──        goto #35
10 ─ %44  = (i2 + 1)::Int64
β”‚    %45  = (n - 2)::Int64
β”‚    %46  = (%44:%45)::UnitRange{Int64}
β”‚           (@_11 = Base.iterate(%46))
β”‚    %48  = (@_11 === nothing)::Bool
β”‚    %49  = Base.not_int(%48)::Bool
└───        goto #35 if not %49
11 β”„        Core.NewvarNode(:(@_14))
β”‚    %52  = @_11::Tuple{Int64, Int64}
β”‚           (i3 = Core.getfield(%52, 1))
β”‚    %54  = Core.getfield(%52, 2)::Int64
β”‚           (p3 = Base.getindex(prime_list, i3))
β”‚    %56  = p1::Int64
β”‚    %57  = p2::Int64
β”‚    %58  = (3 * p3)::Int64
β”‚    %59  = (%56 + %57 + %58)::Int64
β”‚    %60  = (%59 >= min_sum)::Bool
└───        goto #13 if not %60
12 ─        goto #35
13 ─ %63  = Core.tuple(p1, p2)::Tuple{Int64, Int64}
β”‚    %64  = Main.is_admissible(%63, p3)::Bool
└───        goto #15 if not %64
14 ─        goto #16
15 ─        goto #33
16 ─ %68  = (i3 + 1)::Int64
β”‚    %69  = (n - 1)::Int64
β”‚    %70  = (%68:%69)::UnitRange{Int64}
β”‚           (@_14 = Base.iterate(%70))
β”‚    %72  = (@_14 === nothing)::Bool
β”‚    %73  = Base.not_int(%72)::Bool
└───        goto #33 if not %73
17 β”„        Core.NewvarNode(:(@_17))
β”‚    %76  = @_14::Tuple{Int64, Int64}
β”‚           (i4 = Core.getfield(%76, 1))
β”‚    %78  = Core.getfield(%76, 2)::Int64
β”‚           (p4 = Base.getindex(prime_list, i4))
β”‚    %80  = p1::Int64
β”‚    %81  = p2::Int64
β”‚    %82  = p3::Int64
β”‚    %83  = (2 * p4)::Int64
β”‚    %84  = (%80 + %81 + %82 + %83)::Int64
β”‚    %85  = (%84 >= min_sum)::Bool
└───        goto #19 if not %85
18 ─        goto #33
19 ─ %88  = Core.tuple(p1, p2, p3)::Tuple{Int64, Int64, Int64}
β”‚    %89  = Main.is_admissible(%88, p4)::Bool
└───        goto #21 if not %89
20 ─        goto #22
21 ─        goto #31
22 ─ %93  = (i4 + 1)::Int64
β”‚    %94  = (%93:n)::UnitRange{Int64}
β”‚           (@_17 = Base.iterate(%94))
β”‚    %96  = (@_17 === nothing)::Bool
β”‚    %97  = Base.not_int(%96)::Bool
└───        goto #31 if not %97
23 β”„ %99  = @_17::Tuple{Int64, Int64}
β”‚           (i5 = Core.getfield(%99, 1))
β”‚    %101 = Core.getfield(%99, 2)::Int64
β”‚           (p5 = Base.getindex(prime_list, i5))
β”‚           (s = p1 + p2 + p3 + p4 + p5)
β”‚    %104 = (s >= min_sum)::Bool
└───        goto #25 if not %104
24 ─        goto #31
25 ─ %107 = Core.tuple(p1, p2, p3, p4)::NTuple{4, Int64}
β”‚    %108 = Main.is_admissible(%107, p5)::Bool
└───        goto #27 if not %108
26 ─        goto #28
27 ─        goto #29
28 ─ %112 = Base.string("primes: ", p1, " ", p2, " ", p3, " ", p4, " ", p5, ", sum = ", s)::String
β”‚           Main.println(%112)
└───        (min_sum = s)
29 β”„        (@_17 = Base.iterate(%94, %101))
β”‚    %116 = (@_17 === nothing)::Bool
β”‚    %117 = Base.not_int(%116)::Bool
└───        goto #31 if not %117
30 ─        goto #23
31 β”„        (@_14 = Base.iterate(%70, %78))
β”‚    %121 = (@_14 === nothing)::Bool
β”‚    %122 = Base.not_int(%121)::Bool
└───        goto #33 if not %122
32 ─        goto #17
33 β”„        (@_11 = Base.iterate(%46, %54))
β”‚    %126 = (@_11 === nothing)::Bool
β”‚    %127 = Base.not_int(%126)::Bool
└───        goto #35 if not %127
34 ─        goto #11
35 β”„        (@_8 = Base.iterate(%24, %32))
β”‚    %131 = (@_8 === nothing)::Bool
β”‚    %132 = Base.not_int(%131)::Bool
└───        goto #37 if not %132
36 ─        goto #5
37 β”„        (@_3 = Base.iterate(%8, %16))
β”‚    %136 = (@_3 === nothing)::Bool
β”‚    %137 = Base.not_int(%136)::Bool
└───        goto #39 if not %137
38 ─        goto #2
39 β”„        (val = nothing)
β”‚           nothing
β”‚           val
└───        return min_sum

System information

julia> versioninfo()
Julia Version 1.10.0-DEV.15
Commit 3e7d796c17 (2022-11-16 23:01 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: 16 Γ— 12th Gen Intel(R) Core(TM) i5-12500H
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, alderlake)
  Threads: 16 on 16 virtual cores

When I run your code I get:

julia> euler60(10_000)
ERROR: MethodError: no method matching is_admissible(::Int64, ::Int64)

So you may have some stray method definition lying around.

The reason for the error is in this line:

which should be

is_admissible((p1,), p2) || continue  # note extra comma to get a tuple

After fixing this, I still get the same amount of allocations, though.

Another note: this seems to beg for a recursive solution. One loop depth per tuple length seems untenable. What will you do for an octuple, or a quinvigintuple?

1 Like

It’s probably the isprime function. It allocates for some large integers.

julia> @btime isprime(65521)
3.268 ns (0 allocations: 0 bytes)
true

julia> @btime isprime(65537)
212.412 ns (1 allocation: 16 bytes)
true

julia> @btime isprime(6553765601)
4.250 ΞΌs (0 allocations: 0 bytes)
true

julia> @btime isprime(1000723)
474.679 ns (1 allocation: 16 bytes)
true

1 Like

I did some backtracking at first but don’t know how to efficiently prune the search tree (for example if 5p1 > min_sum I want to stop immediately instead of unwinding the recursion stack).

The main difficulty here is that you don’t know how large the largest number can get . It could hypothetically be the case that 3, 7, 109, 673, 19xxx produces a smaller sum. So to be absolutely sure you need to check with the upper bound 26,000, which is much faster with the nested for loops approach than naive recursion.

Thanks, that seems to be the culprit. I wonder if those allocations negatively affect performance too much.

Seems like this could be fixed with some refactoring, if someone wants to make a PR to Primes.jl:

1 Like