Rust / Julia Comparison Post

Important observation: the hot loops

constraint.letter ∉ word
constraint.letter ∈ word

seem to be way faster for Vector{Char} than for String. A quick example:

julia> using BenchmarkTools

julia> x = "salet"
"salet"

julia> @btime in('t', $x)
  14.930 ns (0 allocations: 0 bytes)
true

julia> x = Vector{Char}("salet")
5-element Vector{Char}:
 's': ASCII/Unicode U+0073 (category Ll: Letter, lowercase)
 'a': ASCII/Unicode U+0061 (category Ll: Letter, lowercase)
 'l': ASCII/Unicode U+006C (category Ll: Letter, lowercase)
 'e': ASCII/Unicode U+0065 (category Ll: Letter, lowercase)
 't': ASCII/Unicode U+0074 (category Ll: Letter, lowercase)

julia> @btime in('t', $x)
  3.200 ns (0 allocations: 0 bytes)
true
1 Like

I wonder about this because the original author’s blog post mentions

$ time cargo run
Ranking guesses... 11/12972 words (441 min left)

And the other moral is “Discourse is a great place for optimization hints!”

9 Likes

The blog post includes this edit:

Edit: using time cargo run --release even brings it all the way down to 30 minutes!

which was explained with a bit more detail in this HN comment:

I’m amused that the author reached a satisfactory result of “only” 7 hours, just to later realize that they forgot to compile their Rust code with optimizations thus bringing the runtime down to 30 minutes. I’ve never used Julia, but unoptimized Rust binaries contain such naive and suboptimal codegen that I assume that something must have been wrong with their Julia code; my ballpark for the performance of unoptimized Rust code is that it’s roughly on-par with interpreted (not JITted) Ruby or Python (parallelism notwithstanding, which even without optimizations bears much more fruit in Rust than in Ruby/Python).

2 Likes

I use cargo run --release, as pointed out by @Benny, for a fair comparison with Julia’s command line julia -O3 --check-bounds=no. BTW, the code by @Vasily_Pisarev is even better:

Ranking guesses... 100/12972 words (13 min left) # Julia
Ranking guesses... 100/12972 words (20 min left) # Rust
5 Likes

Applying the vcat bug fix (which I missed completely), both the serial and the parallel implementation show for me:

1. ['r', 'o', 'a', 't', 'e'] (keeps 60.42462203023758 words on average)
2. ['r', 'a', 'i', 's', 'e'] (keeps 61.00086393088553 words on average)
3. ['r', 'a', 'i', 'l', 'e'] (keeps 61.33088552915767 words on average)
4. ['s', 'o', 'a', 'r', 'e'] (keeps 62.30107991360691 words on average)
5. ['a', 'r', 'i', 's', 'e'] (keeps 63.72570194384449 words on average)
6. ['i', 'r', 'a', 't', 'e'] (keeps 63.7792656587473 words on average) 
7. ['o', 'r', 'a', 't', 'e'] (keeps 63.89071274298056 words on average)
8. ['a', 'r', 'i', 'e', 'l'] (keeps 65.28768898488121 words on average)
9. ['a', 'r', 'o', 's', 'e'] (keeps 66.02116630669546 words on average)
10. ['r', 'a', 'i', 'n', 'e'] (keeps 67.0561555075594 words on average)

This seems slightly off compared to The best strategies for Wordle - Things (various).

@Vasily_Pisarev: your implementation also yields

1. roate (keeps 60.42462203023758 words on average)
2. raise (keeps 61.00086393088553 words on average)
3. raile (keeps 61.33088552915767 words on average)
4. soare (keeps 62.30107991360691 words on average)
5. arise (keeps 63.72570194384449 words on average)
6. irate (keeps 63.7792656587473 words on average)
7. orate (keeps 63.89071274298056 words on average)
8. ariel (keeps 65.28768898488121 words on average)
9. arose (keeps 66.02116630669546 words on average)
10. raine (keeps 67.0561555075594 words on average)

for me. Are my word lists broken?

No, the lists are fine. The algorithm on the page you reference should find the globally optimal solution. The algorithm linked in the OP uses a greedy approach, i.e. at each step it just finds the word that narrows the search the most.
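
To make the distinction concrete, here is a one-line sketch of the greedy step (my own illustration, using Julia 1.7’s two-argument argmin and the avg_remaining function from the code posted further down in this thread):

# Greedy: pick the guess that minimizes the expected number of remaining
# candidate words for this turn only, with no look-ahead to later turns.
best_next(guesses, remaining) = argmin(g -> avg_remaining(g, remaining), guesses)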

1 Like

I would suggest making a PR to the author’s code; maybe we will see an edit/update to the blog post too.

7 Likes

No big redesign is necessary; simply replacing String with SVector{5, Char} does most of the trick.

Here is my current best using ThreadsX and StaticArrays:

using StaticArrays
using ThreadsX

function _readline(path)::SVector{5, Char}
  word = readline(path)
  SVector{5, Char}(word...)
end
function _readlines(path)::Vector{SVector{5, Char}}
  words = Vector{SVector{5, Char}}()
  for word in readlines(path)
      push!(words, SVector{5, Char}(word...))
  end
  println(length(words))
  words
end

const words = _readlines("solutions.txt")
const non_solution_words = _readlines("non-solution-guesses.txt")
const allowed_guesses = vcat(words, non_solution_words)

@enum ConstraintType has_no has_but_not_at has_at
struct Constraint
  type::ConstraintType
  letter::Char
  index::Int
end

# Extend Base.isequal/Base.hash so Constraints (and Vectors of them) work as Dict keys.
function Base.isequal(c0::Constraint, c1::Constraint)::Bool
  c0.type == c1.type && c0.letter == c1.letter && (c0.type == has_no || c0.index == c1.index)
end

function Base.hash(c::Constraint, h::UInt = zero(UInt))::UInt
  # The index is ignored for has_no constraints, to stay consistent with isequal.
  hash(c.type, h) ⊻ hash(c.letter, h ⊻ 1) ⊻ (c.type == has_no ? zero(UInt) : hash(c.index, h ⊻ 2))
end

function constraints(guess::SVector{5, Char}, actual::SVector{5, Char})::Vector{Constraint}
  c = Vector{Constraint}()
  for (i, g, a) in zip(1:5, guess, actual)
    if g == a
      push!(c, Constraint(has_at, g, i))
    elseif g ∈ actual
      push!(c, Constraint(has_but_not_at, g, i))
    else
      push!(c, Constraint(has_no, g, i))
    end
  end
  c
end

# Code: . = has_no, x = has_but_not_at, o = has_at
function parse_constraints(template::SVector{5, Char}, guess::SVector{5, Char})::Vector{Constraint}
  constraints = Vector{Constraint}()
  for (i, c) in zip(1:5, template)
    if c == 'x'
      push!(constraints, Constraint(has_but_not_at, guess[i], i))
    elseif c == 'o'
      push!(constraints, Constraint(has_at, guess[i], i))
    else
      push!(constraints, Constraint(has_no, guess[i], i))
    end
  end
  constraints
end

function match_constraint(word::SVector{5, Char}, constraint::Constraint)::Bool
  if constraint.type == has_at
    word[constraint.index] == constraint.letter
  elseif constraint.type == has_no
    constraint.letter ∉ word
  elseif constraint.type == has_but_not_at
    (word[constraint.index] != constraint.letter) && (constraint.letter ∈ word)
  else
    @assert false
  end
end

# A word matches a set of constraints iff it matches every single one (used in main below).
function match_constraints(word::SVector{5, Char}, constraints::Vector{Constraint})::Bool
  all(c -> match_constraint(word, c), constraints)
end

# Average number of possible words left after performing a given guess,
# and receiving the corresponding constraint information.
function avg_remaining(guess::SVector{5, Char}, words::Vector{SVector{5, Char}})::Float64
  rem = ThreadsX.sum(words) do wo
    cs = constraints(guess, wo)
    sum(words) do wi
      all(cs) do c
        match_constraint(wi, c)
      end ? 1 : 0
    end
  end
  rem / length(words)
end

struct Choice
  word::SVector{5, Char}
  avg_remaining::Float64
end

function rank_guesses(guesses::Vector{SVector{5, Char}}, words::Vector{SVector{5, Char}})::Vector{Choice}
  wt = length(guesses)
  avg_time = 0
  choices = Vector{Choice}(undef, wt)
  for (wi, g) in zip(0:wt-1, guesses)
    print("\x1b[1K\x1b[GRanking guesses... ", wi, "/", wt, " words (", Int(ceil(avg_time*(wt-wi)/60)), " min left)")
    avg_time = (wi*avg_time + @elapsed choices[wi+1] = Choice(g, avg_remaining(g, words))) / (wi+1)
  end
  print("\x1b[1K\x1b[G")
  sort!(choices, by = c -> c.avg_remaining)
end

function main()
  global words; global allowed_guesses
  remaining_words = copy(words)  # List of words that currently fit all known constraints.
  while length(remaining_words) > 1
    best_guesses = rank_guesses(allowed_guesses, remaining_words)[1:10]
    for (i, g) in zip(1:10, best_guesses)
      println(i, ". ", g.word, " (keeps ", g.avg_remaining, " words on average)")
    end
    println("Insert your guess: ")
    guess = _readline(stdin)
    println("Insert the results (o = letter at right spot, x = wrong spot, . = not in the word): ")
    constraint_template = _readline(stdin)
    remaining_words = filter(w -> match_constraints(w, parse_constraints(constraint_template, guess)), remaining_words)
  end
  println("Solution: ", remaining_words[1], ".")
end
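
One usage note (my addition, not from the post above): ThreadsX.sum only parallelizes when Julia is started with multiple threads, and main() still has to be called explicitly. Assuming the code is saved as wordle.jl (a hypothetical file name):

# Start Julia with several threads, e.g. `julia -t auto`, then:
include("wordle.jl")   # hypothetical file holding the code above
Threads.nthreads()     # ThreadsX.sum spreads the outer sum over this many threads
main()                 # interactive loop: prompts for guesses and feedback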
1 Like

Did a bit of reading on Rust Strings, and it turns out they are also mutable, or at least growable (I couldn’t find an example of explicitly mutating an index in place). If the Rust version doesn’t mutate Strings, Julia’s immutable Strings might be a fair analogue just for being heap-allocated UTF-8 structures.

(Also, all those mut keywords don’t mark the object as mutable, but rather the variable. In a way, Julia’s variables are mut in that they can be reassigned. Low-level is very cool, if quite a bit more complicated to wrap my head around.)
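
To illustrate the Julia half of that analogy, a small sketch of my own (not from the posts above):

s = "salet"      # the String object is immutable: there is no setindex! for String,
# s[1] = 'c'     # so this line would throw a MethodError if uncommented
s = "crate"      # but the *variable* s can be rebound to a new String,
                 # roughly what a `let mut s` binding allows in Rust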

Might as well invite the author here :stuck_out_tongue:

12 Likes

That makes for a huge speedup (on a 4-core / 8-thread CPU):

Ranking guesses... 100/12972 words (3 min left) # Julia multithreaded

But since only a single-threaded Rust version was available, people here were more concerned with fixing the single-threaded code.

3 Likes

Seems like a missing optimization; there’s no inherent reason why this should be so, especially for an ASCII character where it is just a byte search (memchr). char in string is slow for short strings · Issue #43980 · JuliaLang/julia · GitHub

10 Likes

Maybe I’m missing something, but to me it seems clear why the vector is faster: the string needs logic to determine the position of the next char, which both takes time and inhibits instruction-level parallelism.

You’re right that this isn’t needed for ASCII strings, but how would the compiler know the string is ASCII? One solution would be to cache whether a string is ASCII somewhere in the struct, e.g. in the top bit of its length. I’d wager the majority of strings are ASCII, so that could be a big deal for speed.

No — because of the way UTF-8 works, you don’t actually need to decode characters in order to search for a character, especially an ASCII character. You just need to search for a matching byte.

It doesn’t need to know. Because of the way UTF-8 works, to check if an ASCII character is in the string you just need to search for that byte in the string. So, you only need to check if the character is ASCII.

More generally, to determine if any character is in the string, you just need to search for that byte sequence in the string. And the way Char is encoded in Julia is that it actually stores the UTF-8 byte sequence in the Char.
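
A minimal sketch of that idea (my own illustration, not how Base implements in): for an ASCII needle it is enough to scan the raw code units, with no UTF-8 decoding at all.

# Assumes the needle is ASCII; scans the String's raw UTF-8 code units directly.
function ascii_in(c::Char, s::String)
    @assert isascii(c)
    b = UInt8(c)
    return any(==(b), codeunits(s))
end

ascii_in('t', "salet")  # true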

5 Likes

OK, more ‘fixing’ of the serial code is possible: using memoization (like the original poster already tried?)

function rank_guesses(guesses::Vector{SVector{5, Char}}, words::Vector{SVector{5, Char}})::Vector{Choice}
  wt = length(guesses)
  avg_time = 0
  memo = Dict{Vector{Constraint}, Float64}()  # keyed by the constraint pattern of (guess, actual)
  choices = Vector{Choice}(undef, wt)
  for (wi, guess) in enumerate(guesses)
    rem = 0.0  # Float64 accumulator (memo values are Float64)
    time = @elapsed for actual in words
      cs = constraints(guess, actual)
      if !haskey(memo, cs)
        rem_cs = 0.0
        for w in words
          rem_cs += match_constraints(w, cs) ? 1.0 : 0.0
        end
        memo[cs] = rem_cs
      end
      rem += memo[cs]
    end
    choices[wi] = Choice(guess, rem / length(words))
    print("\x1b[1K\x1b[GRanking guesses... ", wi, "/", wt, " words (", Int(ceil(avg_time*(wt-wi)/60)), " min left)")
    avg_time = ((wi-1)*avg_time + time) / wi  # wi is 1-based here (from enumerate)
  end
  print("\x1b[1K\x1b[G")
  sort!(choices, by = c -> c.avg_remaining)
end
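
To see why the memoization pays off (my own sanity check, reusing the constraints function from the snippet further up): many different hidden words produce exactly the same constraint pattern for a given guess, so the expensive inner loop over words runs only once per pattern.

guess = SVector{5, Char}("salet"...)
cs1 = constraints(guess, SVector{5, Char}("crank"...))
cs2 = constraints(guess, SVector{5, Char}("crack"...))
isequal(cs1, cs2)   # true, so both hidden words share a single memo entry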
1 Like

@Vasily_Pisarev’s pull request (with a very good explanation of the changes) was merged, and the original blog post was updated to explain the effects. @GunnarFarneback’s vcat fix alone reportedly brought the runtime down from ~17 days (~416 hours) to 4 hours. 104x is a much bigger effect than people in this thread have reported, so I’m guessing that wasn’t the only quick change (maybe de-global-ing?). The full revision ran for the poster in 20 minutes (12x faster than the 4 hours, 1248x faster than the 17 days).

My remaining questions on this topic:

  1. How can the Rust version (30 minutes with compiler optimizations via cargo run --release) be so performant without the type optimizations described in section III of the pull request? The Rust version uses heap-allocated strings and views on them. There are certainly places where the original Julia version did extraneous allocations to iterate over Strings, but maybe there’s an actual performance discrepancy between the languages’ strings?

  2. I’m not exactly sure whether the remaining_for_constraint memoization method is actually used in any of the versions. It was mentioned in the pull request, so I assumed it was important at first. The only line calling it in the original Julia version is in the remaining method, but it’s commented out entirely. The original Rust version doesn’t have it at all. The revised Julia version makes the memoization dictionary const, but there is no longer any line calling remaining_for_constraint.

3 Likes

Maybe because Rust has no GC to stop the world?

1 Like

The big thing that would be needed to get the Julia String version faster is https://github.com/JuliaLang/julia/issues/43980 (GC improvements would probably also help).

1 Like

I don’t doubt it has a nonzero effect, but the Rust version ran in 30 minutes. Given the reported 4 hours of the vcat fix, I hesitate to believe that the entire 3.5 hour discrepancy (87.5% of runtime) was spent on garbage collection, though I admit I don’t know the code well enough and have not profiled it to gauge how much allocation/deallocation is happening in each version.
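
For what it’s worth, a quick way to gauge the GC share without a full profile (a sketch of my own, using rank_guesses and the word lists from the code above):

# Fraction of wall time spent in garbage collection during one ranking pass.
stats = @timed rank_guesses(allowed_guesses, words)
println("GC fraction: ", stats.gctime / stats.time, ", bytes allocated: ", stats.bytes)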