Reduce number of allocations

I would like to reduce the number of allocations in this particular problem (for context, that’s problem 30 of project euler). I got it right, but I think there are too many allocations.

function power_digit_sum(pow, n)
    return sum(c^pow for c in reverse(digits(n)))
end

function Problem30()
    ans = sum(i for i in 2:1_000_000 if i == power_digit_sum(5, i))
    return ans
end

By using the Benchmarks package, I’m getting these numbers:
238.513 ms (2000005 allocations: 243.83 MiB)

for start, don’t reverse the digits because it doesn’t matter?


julia> function power_digit_sum(pow, n)
           return sum(c->c^pow, digits(n))
       end
power_digit_sum (generic function with 1 method)

julia> @benchmark Problem30()
BenchmarkTools.Trial: 73 samples with 1 evaluation.
 Range (min … max):  67.670 ms … 76.800 ms  ┊ GC (min … max): 2.73% … 2.34%
 Time  (median):     68.717 ms              ┊ GC (median):    2.72%
 Time  (mean ± σ):   69.106 ms ±  1.293 ms  ┊ GC (mean ± σ):  3.24% ± 0.66%

  █                                                            
  █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▁
  76.8 ms         Histogram: frequency by time        70.8 ms <

 Memory estimate: 105.27 MiB, allocs estimate: 999999.
1 Like

Thanks! TIL :slight_smile:

julia> mutable struct Ds{T<:Integer}
           n::T
       end

julia> Base.length(d::Ds) = floor(Int, log10(d.n)+1.0)

julia> Base.eltype(::Type{Ds{T}}) where T = T

julia> function Base.iterate(d::Ds, state=0)
           quo, rem = divrem(d.n, 10)
           iszero(quo) && iszero(rem) && return nothing
           d.n = quo
           return rem, state+1
       end

julia> sum(abs2, Ds(1120))
6

julia> function power_digit_sum(pow, n)
           return sum(c->c^pow, Ds(n))
       end
power_digit_sum (generic function with 1 method)

julia> function Problem30()
           ans = sum(i for i in 2:1_000_000 if i == power_digit_sum(5, i))
           return ans
       end
Problem30 (generic function with 1 method)

julia> Problem30()
443839

julia> @benchmark Problem30()
BenchmarkTools.Trial: 180 samples with 1 evaluation.
 Range (min … max):  27.493 ms …  31.108 ms  ┊ GC (min … max): 0.00% … 1.83%
 Time  (median):     27.687 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   27.882 ms ± 492.082 μs  ┊ GC (mean ± σ):  0.71% ± 0.96%

  █                                                             
  █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▄
  30 ms         Histogram: log(frequency) by time      28.2 ms <

 Memory estimate: 15.26 MiB, allocs estimate: 1000004.

here’s an attempt to make digits allocate less.

1 Like

This thing should have zero allocations:

function powsum(pow, n)
    s = 0
    while n > 0
        (n, r) = divrem(n, 10)
        s += r^pow
    end
    return s
end

problem30() = sum(i for i in 2:1_000_000 if i == powsum(5, i))
jl> @btime problem30()
  30.112 ms (0 allocations: 0 bytes)
443839
4 Likes

that’s indeed what I tried and realized divrem doesn’t allocate, but I still want it to be digits for not “cheating” :wink:

But this is digits:wink: It does exactly the same thing as digits, except doesn’t store anything in a vector.

I think I have one divrem call too many, btw. Could try to get rid of that.

the point is you don’t have handle on the digits themselves. i.e. you can’t apply arbitrary function to each digits, and you can’t recover the old digits – i.e. collect them.

You can rewrite it to be fsum(f, n), where f is an arbitrary function.

But I don’t really understand the reason for such an expansion of scope. Why collect the digits when your doing a reduction?

The answer to this question in vacuum is a sound “no”. I’m just saying it’s not always safe to assume OP does and only does this specific one thing → it’s not safe to be too smart.

It’s project Euler 30, I think it’s specific enough. I don’t like bending over backwards to do things the same way as the OP, I normally assume they’re asking the wrong question anyway, so we’ll have to agree to disagree on this :wink:

Here’s a version that is more specific and faster:

function pow5(x)
    y = x^2
    return y^2 * x
end

function pow5sum(n)
    s = 0
    while n >= 10
        (n, r) = divrem(n, 10)
        s += pow5(r)
    end
    return s + pow5(n)
end

problem30() = sum(i for i in 2:1_000_000 if i == pow5sum(i))
jl> @btime problem30()
  11.453 ms (0 allocations: 0 bytes)
443839

But there must be some less brute-force way to go about this.

2 Likes

Original code:

using BenchmarkTools

function power_digit_sum(pow, n)
    return sum(c^pow for c in reverse(digits(n)))
end

function Problem30()
    ans = sum(i for i in 2:1_000_000 if i == power_digit_sum(5, i))
    return ans
end

@show Problem30();
@btime Problem30();
Problem30() = 443839
  128.256 ms (2000004 allocations: 243.83 MiB)

The easiest way to reduce the allocation in this code is to use digits!:

module Rev1

function power_digit_sum!(pow, n, ws)
    return sum(c^pow for c in digits!(ws, n))
end

function Problem30(N, m)
    ws = Vector{Int}(undef, N)
    ans = sum(i for i in 2:(10^N - 1) if i == power_digit_sum!(m, i, ws))
    return ans
end

end

@show Rev1.Problem30(6, 5);
@btime Rev1.Problem30(6, 5);
Rev1.Problem30(6, 5) = 443839
  54.770 ms (1 allocation: 128 bytes)

128.256 ms (2000004 allocations: 243.83 MiB)

54.770 ms (1 allocation: 128 bytes)

The naive simplest way to solve Project Euler Problem 30 is to use a 6-fold for loop that runs from 0 to 9 for each of i_1, i_2, \ldots, i_6. Using the Base.Cartesian macros, we can simply write a 6-fold for loop in Julia.

function projecteuler30()
    s = 0
    x = 0
    @nloops 6 i d -> 0:9 begin
        y = 0
        @nexprs 6 d -> y += (z = i_d^2; z^2*i_d)
        s += (x == y) * x
        x += 1
    end
    s - 1
end

@show projecteuler30();
@btime projecteuler30();
projecteuler30() = 443839
  237.800 μs (0 allocations: 0 bytes)

128.256 ms (2000004 allocations: 243.83 MiB)

54.770 ms (1 allocation: 128 bytes)

237.800 μs (0 allocations: 0 bytes)

That’s over 500 times faster than the original code! It’s only using one CPU thread! The readability of the code has been reduced, but it is simple enough!

It can be generalized with only a 30% performance sacrifice by using a generated function as follows:

@generated function _projecteuler30(::Val{N}, ::Val{m}) where {N, m}
    quote
        p = @ntuple 10 i -> (i - 1) ^ $m
        s = 0
        x = 0
        @inbounds @nloops $N i d -> 0:9 begin
            y = 0
            @nexprs $N d -> y += p[i_d+1]
            s += (x == y) * x
            x += 1
        end
        s - 1
    end
end
projecteuler30(N, m) = _projecteuler30(Val(N), Val(m))

@show projecteuler30(6, 5);
@btime projecteuler30(6, 5);
projecteuler30(6, 5) = 443839
  315.700 μs (0 allocations: 0 bytes)

I think one of the great joys of Julia is that calculations can be made hundreds of times faster with a little effort.

Jupyter notebook (I have left the trial and error part.)

5 Likes