Fast String processing in Julia vs. Python

On the risk of getting off topic, I would like to encourage a non-string solution for this problem. I see two problems with the string approach:

  1. The current implementation relies on the fact that the characters in suits and ranks don’t overlap, which is just a fortuitousness in the English language.
  2. In Python, using strings is probably more performant than using an abstraction, because operations on builtin types are much faster than on user-defined types.

But in Julia we can have abstractions at zero cost, which can also make the code safer. In this spirit I would like to offer the following version:

using Combinatorics

const suits = "SHDC"
const ranks = "A23456789TJQK"
 
immutable Card{TS,TR}
   suit :: TS
   rank :: TR
end

"""The probability of an event, given a sample space of equiprobable outcomes.
event can be either a set of outcomes, or a predicate (true for outcomes in the event)."""
function P(event::Function, space)
    count(event,space)//length(space)
end

function flush(hand)
    any( count(card -> card.suit==s, hand) == 5 for s in suits )
end

function main()
    deck  = [ Card(s,r) for r in ranks for s in suits ]
    hands = combinations(deck, 5) 
    P(flush, hands) #Warm up
    @time p = P(flush, hands)
    println(p)
end

main()

With julia 0.5.0, this gets me:

  0.624516 seconds (20.79 M allocations: 991.425 MB, 9.39% gc time)

The main speedup comes from not using strings, but using the dedicated Card type, and having a hand simply be a list of Cards (thus avoiding the strings). The other changes I made (explicitly marking global variables as const, and putting the top-level code into a main function) have little impact on the speed. A little more speed can actually be gained by changing the defintion of suits as follows:

@enum SUIT spades hearts diamonds clubs
const suits = instances(SUIT)

which gets me

  0.531291 seconds (20.79 M allocations: 991.425 MB, 10.73% gc time)

(I guess on current architectures, Chars are too small so that operating on them individually incurs too much overhead.)

Note: on my machine the original Python version takes 5.4 seconds.

8 Likes