Naive lookup table with lots of allocations

Hi there,

going through project euler problems, I’ve implemented a working solution for problem 17:

function num2letters(num)
    nums = Dict(
        0 => "", 1 => "one", 2 => "two", 3 => "three", 4 => "four", 5 => "five",
        6 => "six", 7 => "seven", 8 => "eight", 9 => "nine", 10 => "ten",
        11 => "eleven", 12 => "twelve", 13 => "thirteen", 14 => "fourteen",
        15 => "fifteen", 16 => "sixteen", 17 => "seventeen", 18 => "eighteen",
        19 => "nineteen", 20 => "twenty", 30 => "thirty", 40 => "forty",
        50 => "fifty", 60 => "sixty", 70 => "seventy", 80 => "eighty",
        90 => "ninety", 100 => "hundred", 1000 => "thousand"
)

    if num <= 20
        return length(nums[num])
    elseif num < 100
        tens, units = divrem(num, 10)
        return length(nums[tens * 10]) + num2letters(units)
    elseif num < 1000
        hundreds, rest = divrem(num, 100)
        if rest != 0
            return num2letters(hundreds) + length(nums[100]) + length("and") + num2letters(rest)
        else
            return num2letters(hundreds) + length(nums[100])
        end
    else
        thousands, rest = divrem(num, 1000)
        return num2letters(thousands) + length(nums[1000])
    end
end

function Problem17()
    #=
    If the numbers 1 to 5 are written out in words: one, two, three, four,
    five, then there are 3 + 3 + 5 + 4 + 4 = 19 letters used in total.

    If all the numbers from 1 to 1000 (one thousand) inclusive were written
    out in words, how many letters would be used?

    NOTE: Do not count spaces or hyphens. For example, 342 (three hundred and
    forty-two) contains 23 letters and 115 (one hundred and fifteen) contains
    20 letters. The use of "and" when writing out numbers is in compliance
    with British usage.
    =#
    n = 1000
    letters = 0
    for num in 1:n
        letters += num2letters(num)
    end
    return letters
end

It takes almost 7 ms and does 25k allocations. That’s the same time I’m getting in Python with almost identical code.

How can I do better?

Thx in advance.

Well for one, you are re-creating the Dict every time you call num2letters. But it seems constant.

Make the dictionary an input to the function.

I did already that and time (33 vs 7) and allocations (43k vs 25k) were worse :frowning:

For me, creating the Dict in Problem17 instead of num2letters reduces allocations from 25k to 18k. What did you do?

Really? The following doesn’t make it faster?

function num2letters(num, nums)

    if num <= 20
        return length(nums[num])
    elseif num < 100
        tens, units = divrem(num, 10)
        return length(nums[tens * 10]) + num2letters(units, nums)
    elseif num < 1000
        hundreds, rest = divrem(num, 100)
        if rest != 0
            return num2letters(hundreds, nums) + length(nums[100]) + length("and") + num2letters(rest, nums)
        else
            return num2letters(hundreds, nums) + length(nums[100])
        end
    else
        thousands, rest = divrem(num, 1000)
        return num2letters(thousands, nums) + length(nums[1000])
    end
end

function Problem17()

    nums = Dict(
        0 => "", 1 => "one", 2 => "two", 3 => "three", 4 => "four", 5 => "five",
        6 => "six", 7 => "seven", 8 => "eight", 9 => "nine", 10 => "ten",
        11 => "eleven", 12 => "twelve", 13 => "thirteen", 14 => "fourteen",
        15 => "fifteen", 16 => "sixteen", 17 => "seventeen", 18 => "eighteen",
        19 => "nineteen", 20 => "twenty", 30 => "thirty", 40 => "forty",
        50 => "fifty", 60 => "sixty", 70 => "seventy", 80 => "eighty",
        90 => "ninety", 100 => "hundred", 1000 => "thousand"
    )
    #=
    If the numbers 1 to 5 are written out in words: one, two, three, four,
    five, then there are 3 + 3 + 5 + 4 + 4 = 19 letters used in total.

    If all the numbers from 1 to 1000 (one thousand) inclusive were written
    out in words, how many letters would be used?

    NOTE: Do not count spaces or hyphens. For example, 342 (three hundred and
    forty-two) contains 23 letters and 115 (one hundred and fifteen) contains
    20 letters. The use of "and" when writing out numbers is in compliance
    with British usage.
    =#
    n = 1000
    letters = 0
    for num in 1:n
        letters += num2letters(num, nums)
    end
    return letters
end

I get the following timing

julia> @btime Problem17()
  67.200 μs (7 allocations: 1.73 KiB)
21124
1 Like

I would avoid the Dict entirely.

function length_up_to_ten(x)
  if x == 0
    return 0
  elseif x in (1, 2, 6, 10)
    return 3
  elseif x in (4, 9)
    return 4
  elseif x in (3, 8)
    return 5
  end
end

function length_up_to_twenty(x)
  [...]
end

Then in the main function call the appropriate sub-function depending on the value of num (or the parts it is divided into).

2 Likes

aha, I put the dictionary outside of both functions

Read through the performance tips in the Julia manual to understand why that’s a bad idea, as well as learn more ways to speed up code.

1 Like

Thx, will do :slight_smile:

Yeah, something like this. For a smaller change in the algorithm, store the length of the words in the dict instead of the words themselves.

But then, if the goal is to count the number of letter for the specific set of numbers 1:1000, the better approach is do some math instead.
There are 10 thousand terms. Each has length “length of number between 1 and 9” plus length of the word “thousand”.
For each thousand term, there are ten hundred terms. Figure out the length of each (essentially “length of 1:9” + length of word “hundred”).
For each hundred term, we get the length of 0:99.
Then add up: length of thousands + 10length of hundreds + 100length of (0:99)

I wrote this code which is not faster but it takes less time to write the code (just in case someone would like to know a shorter solution)

function total_letters( n::Int64 )
  v = [ SpelledOut.spelled_out( i, lang = :en_UK ) for i in 1:n ]
  sum( length.( v ) ) - ( sum( count.( " ", v ) ) + sum( count.( "-", v ) ) )
end
total_letters( 1000 )