Naive lookup table with lots of allocations

daviddoij · July 28, 2021, 6:55pm

Hi there,

Going through Project Euler problems, I’ve implemented a working solution for problem 17:

function num2letters(num)
    nums = Dict(
        0 => "", 1 => "one", 2 => "two", 3 => "three", 4 => "four", 5 => "five",
        6 => "six", 7 => "seven", 8 => "eight", 9 => "nine", 10 => "ten",
        11 => "eleven", 12 => "twelve", 13 => "thirteen", 14 => "fourteen",
        15 => "fifteen", 16 => "sixteen", 17 => "seventeen", 18 => "eighteen",
        19 => "nineteen", 20 => "twenty", 30 => "thirty", 40 => "forty",
        50 => "fifty", 60 => "sixty", 70 => "seventy", 80 => "eighty",
        90 => "ninety", 100 => "hundred", 1000 => "thousand"
    )

    if num <= 20
        return length(nums[num])
    elseif num < 100
        tens, units = divrem(num, 10)
        return length(nums[tens * 10]) + num2letters(units)
    elseif num < 1000
        hundreds, rest = divrem(num, 100)
        if rest != 0
            return num2letters(hundreds) + length(nums[100]) + length("and") + num2letters(rest)
        else
            return num2letters(hundreds) + length(nums[100])
        end
    else
        thousands, rest = divrem(num, 1000)
        return num2letters(thousands) + length(nums[1000])
    end
end

function Problem17()
    #=
    If the numbers 1 to 5 are written out in words: one, two, three, four,
    five, then there are 3 + 3 + 5 + 4 + 4 = 19 letters used in total.

    If all the numbers from 1 to 1000 (one thousand) inclusive were written
    out in words, how many letters would be used?

    NOTE: Do not count spaces or hyphens. For example, 342 (three hundred and
    forty-two) contains 23 letters and 115 (one hundred and fifteen) contains
    20 letters. The use of "and" when writing out numbers is in compliance
    with British usage.
    =#
    n = 1000
    letters = 0
    for num in 1:n
        letters += num2letters(num)
    end
    return letters
end

It takes almost 7 ms and does 25k allocations. That’s the same time I’m getting in Python with almost identical code.

How can I do better?

Thx in advance.

pdeffebach · July 28, 2021, 7:01pm

Well for one, you are re-creating the Dict every time you call num2letters. But it seems constant.

Make the dictionary an input to the function.

daviddoij · July 28, 2021, 7:08pm

I already did that, and both time (33 vs 7 ms) and allocations (43k vs 25k) got worse.

sostock · July 28, 2021, 7:20pm

For me, creating the Dict in Problem17 instead of num2letters reduces allocations from 25k to 18k. What did you do?

pdeffebach · July 28, 2021, 7:22pm

Really? The following doesn’t make it faster?

function num2letters(num, nums)

    if num <= 20
        return length(nums[num])
    elseif num < 100
        tens, units = divrem(num, 10)
        return length(nums[tens * 10]) + num2letters(units, nums)
    elseif num < 1000
        hundreds, rest = divrem(num, 100)
        if rest != 0
            return num2letters(hundreds, nums) + length(nums[100]) + length("and") + num2letters(rest, nums)
        else
            return num2letters(hundreds, nums) + length(nums[100])
        end
    else
        thousands, rest = divrem(num, 1000)
        return num2letters(thousands, nums) + length(nums[1000])
    end
end

function Problem17()

    nums = Dict(
        0 => "", 1 => "one", 2 => "two", 3 => "three", 4 => "four", 5 => "five",
        6 => "six", 7 => "seven", 8 => "eight", 9 => "nine", 10 => "ten",
        11 => "eleven", 12 => "twelve", 13 => "thirteen", 14 => "fourteen",
        15 => "fifteen", 16 => "sixteen", 17 => "seventeen", 18 => "eighteen",
        19 => "nineteen", 20 => "twenty", 30 => "thirty", 40 => "forty",
        50 => "fifty", 60 => "sixty", 70 => "seventy", 80 => "eighty",
        90 => "ninety", 100 => "hundred", 1000 => "thousand"
    )
    #=
    If the numbers 1 to 5 are written out in words: one, two, three, four,
    five, then there are 3 + 3 + 5 + 4 + 4 = 19 letters used in total.

    If all the numbers from 1 to 1000 (one thousand) inclusive were written
    out in words, how many letters would be used?

    NOTE: Do not count spaces or hyphens. For example, 342 (three hundred and
    forty-two) contains 23 letters and 115 (one hundred and fifteen) contains
    20 letters. The use of "and" when writing out numbers is in compliance
    with British usage.
    =#
    n = 1000
    letters = 0
    for num in 1:n
        letters += num2letters(num, nums)
    end
    return letters
end

I get the following timing

julia> @btime Problem17()
  67.200 μs (7 allocations: 1.73 KiB)
21124

hendri54 · July 28, 2021, 7:23pm

I would avoid the Dict entirely.

function length_up_to_ten(x)
  if x == 0
    return 0
  elseif x in (1, 2, 6, 10)
    return 3
  elseif x in (4, 5, 9)     # four, five, nine
    return 4
  elseif x in (3, 7, 8)     # three, seven, eight
    return 5
  end
end

function length_up_to_twenty(x)
  [...]
end

Then in the main function call the appropriate sub-function depending on the value of num (or the parts it is divided into).
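
One way to sketch a Dict-free lookup is to index into constant tuples of letter counts. This is a variant of the idea above, not the sub-functions from this post: the names ONES_TO_NINETEEN, TENS, letters_below_100 and letters_up_to_1000 are made up for illustration, and the counts are my own tally of the English words.

```julia
# Hypothetical constants: letter counts indexed by value (tuples are 1-based).
const ONES_TO_NINETEEN = (3, 3, 5, 4, 4, 3, 5, 5, 4, 3, 6, 6, 8, 8, 7, 7, 9, 8, 8)  # 1..19
const TENS = (3, 6, 6, 5, 5, 5, 7, 6, 6)  # ten, twenty, ..., ninety

# Letter count for 0 <= n < 100.
function letters_below_100(n)
    n == 0 && return 0
    n < 20 && return ONES_TO_NINETEEN[n]
    tens, units = divrem(n, 10)
    return TENS[tens] + (units == 0 ? 0 : ONES_TO_NINETEEN[units])
end

# Letter count for 1 <= n <= 1000, British usage (with "and").
function letters_up_to_1000(n)
    n == 1000 && return 3 + 8                 # "one" + "thousand"
    hundreds, rest = divrem(n, 100)
    hundreds == 0 && return letters_below_100(rest)
    base = ONES_TO_NINETEEN[hundreds] + 7     # digit word + "hundred"
    return rest == 0 ? base : base + 3 + letters_below_100(rest)  # 3 for "and"
end

total_letters_tuples() = sum(letters_up_to_1000, 1:1000)
```

Because the tuples are const and hold plain integers, the lookups should not need to allocate. I have not benchmarked this against the other versions in the thread.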

daviddoij · July 28, 2021, 7:27pm

aha, I put the dictionary outside of both functions

pdeffebach · July 28, 2021, 7:30pm

Read through the performance tips in the Julia manual to understand why that’s a bad idea, as well as learn more ways to speed up code.
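
A minimal sketch of the difference, assuming small made-up tables (WORD_LENGTHS_GLOBAL, WORD_LENGTHS_CONST and the three sum_* functions are hypothetical names, not from the thread). The point is that a plain global can be rebound to a value of any type, so functions reading it cannot be specialized on it, while a const global or a function argument can.

```julia
# Non-const global: its type may change at any time, so code that reads it
# has to look the value up dynamically on every access.
WORD_LENGTHS_GLOBAL = Dict(1 => 3, 2 => 3, 3 => 5)

# Const global: the binding has a fixed type, so functions using it can be
# compiled for a concrete Dict{Int, Int}.
const WORD_LENGTHS_CONST = Dict(1 => 3, 2 => 3, 3 => 5)

sum_global(r) = sum(WORD_LENGTHS_GLOBAL[i] for i in r)
sum_const(r) = sum(WORD_LENGTHS_CONST[i] for i in r)

# Passing the table as an argument also works, because the function is
# compiled for the concrete type of the argument it receives.
sum_arg(r, table) = sum(table[i] for i in r)
```

All three return the same value; timing them with @btime from BenchmarkTools should show where the extra time and allocations come from in the non-const case.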

daviddoij · July 28, 2021, 7:35pm

Thx, will do

DNF · July 28, 2021, 10:24pm

Yeah, something like this. For a smaller change in the algorithm, store the length of the words in the dict instead of the words themselves.
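
A sketch of that smaller change, assuming the same structure as the original num2letters but with a table of letter counts (LETTER_COUNTS and num2letters_counts are hypothetical names, and the counts are my own tally of the words in the original Dict):

```julia
# Letter counts instead of the words themselves.
const LETTER_COUNTS = Dict(
    0 => 0, 1 => 3, 2 => 3, 3 => 5, 4 => 4, 5 => 4, 6 => 3, 7 => 5, 8 => 5,
    9 => 4, 10 => 3, 11 => 6, 12 => 6, 13 => 8, 14 => 8, 15 => 7, 16 => 7,
    17 => 9, 18 => 8, 19 => 8, 20 => 6, 30 => 6, 40 => 5, 50 => 5, 60 => 5,
    70 => 7, 80 => 6, 90 => 6, 100 => 7, 1000 => 8,
)

function num2letters_counts(num, counts = LETTER_COUNTS)
    if num <= 20
        return counts[num]
    elseif num < 100
        tens, units = divrem(num, 10)
        return counts[10 * tens] + counts[units]
    elseif num < 1000
        hundreds, rest = divrem(num, 100)
        base = counts[hundreds] + counts[100]
        # "and" (3 letters) only appears when something follows the hundreds
        return rest == 0 ? base : base + 3 + num2letters_counts(rest, counts)
    else
        thousands = div(num, 1000)
        return num2letters_counts(thousands, counts) + counts[1000]
    end
end

problem17_counts() = sum(num2letters_counts, 1:1000)
```

This keeps the control flow of the original and only removes the repeated length calls on strings.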

hendri54 · July 29, 2021, 11:06am

But then, if the goal is to count the number of letters for the specific set of numbers 1:1000, the better approach is to do some math instead.
There are 10 thousand terms. Each has length “length of number between 1 and 9” plus the length of the word “thousand”.
For each thousand term, there are ten hundred terms. Figure out the length of each (essentially “length of 1:9” + length of the word “hundred”).
For each hundred term, we get the length of 0:99.
Then add up: length of thousands + 10 * length of hundreds + 100 * length of (0:99)
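
The counting idea above can be sketched for the range 1:1000 specifically. The multiplicities here are my own tally for that range and differ a little from the general scheme: only one number uses “thousand”, and the British “and” has to be counted too. The function name letters_closed_form is hypothetical.

```julia
function letters_closed_form()
    units = 36   # one, two, ..., nine
    teens = 70   # ten, eleven, ..., nineteen
    tens  = 46   # twenty, thirty, ..., ninety

    # 1:99: units and teens once, each tens word 10 times (20:29, ...),
    # and the units again under each of the 8 tens words.
    below_100 = units + teens + 10 * tens + 8 * units

    # 1:999: the 1:99 pattern occurs in 10 blocks (0xx, 1xx, ..., 9xx);
    # each "<digit> hundred" prefix is used 100 times;
    # "and" appears in 99 numbers for each of the 9 hundreds digits.
    below_1000 = 10 * below_100 +
                 100 * (units + 9 * length("hundred")) +
                 9 * 99 * length("and")

    # Finally add "one thousand".
    return below_1000 + length("one") + length("thousand")
end
```

This gives the same total as the loop-based versions (21124, as in the @btime output earlier in the thread) without iterating over the numbers at all.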

Gabrielforest · May 30, 2022, 3:30am

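I wrote this code, which is not faster but takes less time to write (in case someone would like a shorter solution). It relies on the SpelledOut package:

using SpelledOut  # provides spelled_out; must be installed first

function total_letters( n::Int64 )
  # spell out every number, e.g. "three hundred and forty-two"
  v = [ SpelledOut.spelled_out( i, lang = :en_UK ) for i in 1:n ]
  # total characters minus spaces and hyphens
  sum( length.( v ) ) - ( sum( count.( " ", v ) ) + sum( count.( "-", v ) ) )
end
total_letters( 1000 )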