Reduce number of allocations

daviddoij · August 25, 2021, 4:54pm

I would like to reduce the number of allocations in this particular problem (for context, that’s problem 30 of project euler). I got it right, but I think there are too many allocations.

function power_digit_sum(pow, n)
    return sum(c^pow for c in reverse(digits(n)))
end

function Problem30()
    ans = sum(i for i in 2:1_000_000 if i == power_digit_sum(5, i))
    return ans
end

By using the Benchmarks package, I’m getting these numbers:
238.513 ms (2000005 allocations: 243.83 MiB)

jling · August 25, 2021, 4:58pm

for start, don’t reverse the digits because it doesn’t matter?


julia> function power_digit_sum(pow, n)
           return sum(c->c^pow, digits(n))
       end
power_digit_sum (generic function with 1 method)

julia> @benchmark Problem30()
BenchmarkTools.Trial: 73 samples with 1 evaluation.
 Range (min … max):  67.670 ms … 76.800 ms  ┊ GC (min … max): 2.73% … 2.34%
 Time  (median):     68.717 ms              ┊ GC (median):    2.72%
 Time  (mean ± σ):   69.106 ms ±  1.293 ms  ┊ GC (mean ± σ):  3.24% ± 0.66%

  █                                                            
  █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▁
  76.8 ms         Histogram: frequency by time        70.8 ms <

 Memory estimate: 105.27 MiB, allocs estimate: 999999.

daviddoij · August 25, 2021, 5:02pm

Thanks! TIL

jling · August 25, 2021, 5:31pm

julia> mutable struct Ds{T<:Integer}
           n::T
       end

julia> Base.length(d::Ds) = floor(Int, log10(d.n)+1.0)

julia> Base.eltype(::Type{Ds{T}}) where T = T

julia> function Base.iterate(d::Ds, state=0)
           quo, rem = divrem(d.n, 10)
           iszero(quo) && iszero(rem) && return nothing
           d.n = quo
           return rem, state+1
       end

julia> sum(abs2, Ds(1120))
6

julia> function power_digit_sum(pow, n)
           return sum(c->c^pow, Ds(n))
       end
power_digit_sum (generic function with 1 method)

julia> function Problem30()
           ans = sum(i for i in 2:1_000_000 if i == power_digit_sum(5, i))
           return ans
       end
Problem30 (generic function with 1 method)

julia> Problem30()
443839

julia> @benchmark Problem30()
BenchmarkTools.Trial: 180 samples with 1 evaluation.
 Range (min … max):  27.493 ms …  31.108 ms  ┊ GC (min … max): 0.00% … 1.83%
 Time  (median):     27.687 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   27.882 ms ± 492.082 μs  ┊ GC (mean ± σ):  0.71% ± 0.96%

  █                                                             
  █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▄
  30 ms         Histogram: log(frequency) by time      28.2 ms <

 Memory estimate: 15.26 MiB, allocs estimate: 1000004.

here’s an attempt to make digits allocate less.

DNF · August 25, 2021, 5:43pm

This thing should have zero allocations:

function powsum(pow, n)
    s = 0
    while n > 0
        (n, r) = divrem(n, 10)
        s += r^pow
    end
    return s
end

problem30() = sum(i for i in 2:1_000_000 if i == powsum(5, i))

jl> @btime problem30()
  30.112 ms (0 allocations: 0 bytes)
443839

jling · August 25, 2021, 5:45pm

that’s indeed what I tried and realized divrem doesn’t allocate, but I still want it to be digits for not “cheating”

DNF · August 25, 2021, 5:56pm

But this is digits It does exactly the same thing as digits, except doesn’t store anything in a vector.

I think I have one divrem call too many, btw. Could try to get rid of that.

jling · August 25, 2021, 5:59pm

the point is you don’t have handle on the digits themselves. i.e. you can’t apply arbitrary function to each digits, and you can’t recover the old digits – i.e. collect them.

DNF · August 25, 2021, 6:02pm

You can rewrite it to be fsum(f, n), where f is an arbitrary function.

But I don’t really understand the reason for such an expansion of scope. Why collect the digits when your doing a reduction?

jling · August 25, 2021, 6:05pm

The answer to this question in vacuum is a sound “no”. I’m just saying it’s not always safe to assume OP does and only does this specific one thing → it’s not safe to be too smart.

DNF · August 25, 2021, 6:25pm

It’s project Euler 30, I think it’s specific enough. I don’t like bending over backwards to do things the same way as the OP, I normally assume they’re asking the wrong question anyway, so we’ll have to agree to disagree on this

Here’s a version that is more specific and faster:

function pow5(x)
    y = x^2
    return y^2 * x
end

function pow5sum(n)
    s = 0
    while n >= 10
        (n, r) = divrem(n, 10)
        s += pow5(r)
    end
    return s + pow5(n)
end

problem30() = sum(i for i in 2:1_000_000 if i == pow5sum(i))

jl> @btime problem30()
  11.453 ms (0 allocations: 0 bytes)
443839

But there must be some less brute-force way to go about this.

genkuroki · August 25, 2021, 10:03pm

Original code:

using BenchmarkTools

function power_digit_sum(pow, n)
    return sum(c^pow for c in reverse(digits(n)))
end

function Problem30()
    ans = sum(i for i in 2:1_000_000 if i == power_digit_sum(5, i))
    return ans
end

@show Problem30();
@btime Problem30();

Problem30() = 443839
  128.256 ms (2000004 allocations: 243.83 MiB)

The easiest way to reduce the allocation in this code is to use digits!:

module Rev1

function power_digit_sum!(pow, n, ws)
    return sum(c^pow for c in digits!(ws, n))
end

function Problem30(N, m)
    ws = Vector{Int}(undef, N)
    ans = sum(i for i in 2:(10^N - 1) if i == power_digit_sum!(m, i, ws))
    return ans
end

end

@show Rev1.Problem30(6, 5);
@btime Rev1.Problem30(6, 5);

Rev1.Problem30(6, 5) = 443839
  54.770 ms (1 allocation: 128 bytes)

128.256 ms (2000004 allocations: 243.83 MiB)
↓
54.770 ms (1 allocation: 128 bytes)

The naive simplest way to solve Project Euler Problem 30 is to use a 6-fold for loop that runs from 0 to 9 for each of i_1, i_2, \ldots, i_6. Using the Base.Cartesian macros, we can simply write a 6-fold for loop in Julia.

function projecteuler30()
    s = 0
    x = 0
    @nloops 6 i d -> 0:9 begin
        y = 0
        @nexprs 6 d -> y += (z = i_d^2; z^2*i_d)
        s += (x == y) * x
        x += 1
    end
    s - 1
end

@show projecteuler30();
@btime projecteuler30();

projecteuler30() = 443839
  237.800 μs (0 allocations: 0 bytes)

128.256 ms (2000004 allocations: 243.83 MiB)
↓
54.770 ms (1 allocation: 128 bytes)
↓
237.800 μs (0 allocations: 0 bytes)

That’s over 500 times faster than the original code! It’s only using one CPU thread! The readability of the code has been reduced, but it is simple enough!

It can be generalized with only a 30% performance sacrifice by using a generated function as follows:

@generated function _projecteuler30(::Val{N}, ::Val{m}) where {N, m}
    quote
        p = @ntuple 10 i -> (i - 1) ^ $m
        s = 0
        x = 0
        @inbounds @nloops $N i d -> 0:9 begin
            y = 0
            @nexprs $N d -> y += p[i_d+1]
            s += (x == y) * x
            x += 1
        end
        s - 1
    end
end
projecteuler30(N, m) = _projecteuler30(Val(N), Val(m))

@show projecteuler30(6, 5);
@btime projecteuler30(6, 5);

projecteuler30(6, 5) = 443839
  315.700 μs (0 allocations: 0 bytes)

I think one of the great joys of Julia is that calculations can be made hundreds of times faster with a little effort.

Jupyter notebook (I have left the trial and error part.)

Topic		Replies	Views
Help to reduce number of allocations General Usage	23	5412	January 6, 2019
Reducing allocations when subtracting vectors and multiplying by scalars Performance memory-allocation , optimization	9	435	November 9, 2021
Reducing allocations to optimize (to match Java speed) Performance	22	1297	December 10, 2018
Reduce memory allocation in copying array New to Julia question , memory-allocation	6	890	August 17, 2021
Help understanding allocation on scalar function and possible ways to avoid it Performance	3	898	December 7, 2017

Reduce number of allocations

Related topics