My daughter is learning about the Vigenère cipher in school and I was helping out (after reading the wiki). They are doing it by hand in school, but it seemed like it might be a good example to start showing her code, since it’s very tedious and error prone by hand.
Here’s a quick implementation, but it’s a little clunky for a kid to follow. In particular, the encode → and decode ← are just modulo addition and subtraction, but this is obfuscated by all the shifting by 'A'. The conversion from Strings to vectors of Char and back is also hiding the important part of that line, which is repeating the code word to match the message length.
Thanks for sharing this brilliant code.
You should post it at the Rosetta site, as it looks much better than the old Julia code posted there (would be tempted to say the same applies when we compare it to most other languages")
Not sure how old your daughter is but it might be appropriate to also provide an implementation where the Vigenère table is built, like their exercise on paper and the description on Wikipedia.
Tried this below, but simpler would probably be better.
function encrypt(message::String, codeword::String)
message, codeword = uppercase.((message, codeword))
message = join([c for c in message if 'A' <= c <= 'Z'])
codeword = match_length(codeword,length(message))
join(vsqr[c - 'A' + 1, m - 'A' + 1] for (m,c) in zip(message, codeword))
end
function decrypt(ciphertext::String, codeword::String)
ciphertext, codeword = uppercase.((ciphertext, codeword))
codeword = match_length(codeword,length(ciphertext))
join('A' - 1 + findfirst(t, join(vsqr[c - 'A' + 1, :])) for (t,c) in zip(ciphertext, codeword))
end
match_length(word,len) = (word^len)[1:len]
# Vigenère square (or tabula recta)
const vsqr = reshape(permutedims(hcat([circshift('A':'Z',-i) for i in 0:25]...)), 26, 26)
message = "VIGENERE"
codeword = "GREEN"
ciphertext = encrypt(message, codeword)
decrypt(ciphertext, codeword)
NB: added the uppercase() function and removed all non-alphabetic characters from input.
I think the table should be an result of the shifting alphabet concept! I do like the idea of being more robust to input strings.
Here’s version 2:
number(letter) = letter-'A' # number of letters between `letter` and 'A'
letter(number) = 'A'+(number+26)%26 # the `number`th letter after 'A' (negative `number` means it should be before 'A')
# Loop around if you run out of letters in the alphabet. So "A-1=Z" and "A+27=B".
→(a::Char,b::Char) = letter(number(a)+number(b)) # encode using +
←(a::Char,b::Char) = letter(number(a)-number(b)) # decode using -
isletter(a) = a in 'A':'Z' # is `a` inside the letters from 'A' to 'Z'?
cleanup(str::String) = filter(isletter, collect(uppercase.(str))) # make all letters uppercase and filter out non-letters
match_length(word,len) = repeat(word,len)[1:len] # repeat the codeword until you match the length
function →(message::String,codeword::String, ↔ = →)
plaintext = cleanup(message)
key = match_length(cleanup(codeword),length(plaintext))
return String(plaintext .↔ key)
end
←(message::String,codeword::String) = →(message,codeword,←)
Trying to slow down a little and add comments to help her (she’s 12 and hasn’t taken any coding yet).
I think this version is really clear. I was doing something along these lines to post but yours is better. One thing to note is that if you do a Vigenère cipher with bytes, then the whole dance goes away and you get what looks like junk for your cypher text, which could be good or bad depending on your preferences.
encrypt(message::String, codeword::String, + = +) =
String([a + b for (a, b) in zip(
codeunits(message),
Iterators.cycle(codeunits(codeword))
)])
decrypt(message::String, codeword::String) =
encrypt(message, codeword, -)
julia> c = encrypt("This is a message", "code")
"\xb7\xd7\xcd\u603\xd8ׅď\xd1\xca\xd6\xe2\xc5\xcc\xc8"
julia> decrypt(c, "code")
"This is a message"
I also like the operator-as-argument to avoid copy-pasting the encoder code over to decoder. I’ve edited that to the “teaching” version above. Applying it to the “minimalist” version:
Could you please shed some light on the magic in this code line?
What is the name for this in the manual?
Do not understand in particular: (a) The → function takes two chars as input but where are they? (b) how the matrix of chars is then generated from the input StepRange.
So you have a collection num which acts like a column vector, and a transposed collection num' which acts like a row vector. Broadcasting the + operator fills out the full matrix over all columns and rows. And even though I could write +.(num,num') and get the same thing, it looks a little nicer to put the operator in the middle.
@baggepinnen, thanks for the feedback. My difficulties were not so much with broadcasting as with the →(a,b) function. It started with a mistaken assumption that it was just a visual decoration to help @weymouth’s daughter, and that a function name like f(a,b) would do the same job.
It seems that for a number of symbols used in function names with two arguments, Julia’s interpreter allows them to be used as binary operators as well: a → b then becomes another way to call the function →(a,b). We would not have had this extra magic if the function had been defined as f(a,b) in the first place.
This is a wonderful topic for teaching school kids and young undergraduates too. I showed the basic Caesar cypher to my 12 year-old son and some basic code/pseudo-code to go with it. One thing that we had fun doing was investigating which language had the hardest alphabet to crack for the Caesar cypher: while English and French have 26 letters, Italian has 21 letters, Spanish 27 (used to be 30 until not so long ago), Portuguese and classic latin have 23. Which is the hardest to crack? A similar question can be asked of Vigenère/Bellaso cypher.
Edit: The wikipedia pages on Vigenère and Bellaso makes for a fascinating read. Vigenère attributed the original cypher to Bellaso and improved it. The poor Bellaso has been forgotten a little since, but he was a superstar then. The cypher fascinated the great intellects of the time and took centuries to crack.
There’s an interesting math article listed there (sadly it is “gated”) and a more accessible blog post:
Biermann, Norbert (2018). “Analysis of Giouan Battista Bellaso’s cipher challenges of 1555”. Cryptologia . 42 (5): 381–407
One thing that we had fun doing was investigating which language had the hardest alphabet to crack for the Caesar cypher: while English and French have 26 letters, Italian has 21 letters, Spanish 27 (used to be 30 until not so long ago), Portuguese and classic latin have 23. Which is the hardest to crack?