Vigenère cipher

My daughter is learning about the Vigenère cipher in school and I was helping out (after reading the wiki). They are doing it by hand in school, but it seemed like it might be a good example to start showing her code, since it’s very tedious and error prone by hand.

Here’s a quick implementation, but it’s a little clunky for a kid to follow. In particular, the encode and decode are just modulo addition and subtraction, but this is obfuscated by all the shifting by 'A'. The conversion from Strings to vectors of Char and back is also hiding the important part of that line, which is repeating the code word to match the message length.

Any suggestions?

message = "VIGENERE"
codeword = "GREEN"

→(a::Char,b::Char) = 'A'+((a-'A')+(b-'A'))%26
←(a::Char,b::Char) = 'A'+((a-'A')-(b-'A')+26)%26

→(message::String,codeword::String) = String(collect(message) .→ collect(match_length(codeword,length(message))))
←(message::String,codeword::String) = String(collect(message) .← collect(match_length(codeword,length(message))))
match_length(word,len) = (word^len)[1:len]
6 Likes

Thanks for sharing this brilliant code.
You should post it at the Rosetta site, as it looks much better than the old Julia code posted there.

Not sure how old your daughter is but it might be appropriate to also provide an implementation where the Vigenère table is built, like their exercise on paper and the description on Wikipedia.

Tried this below, but simpler would probably be better.

function encrypt(message::String, codeword::String)
	message, codeword = uppercase.((message, codeword))
	message = join([c for c in message if 'A' <= c <= 'Z'])
	codeword = match_length(codeword,length(message))
	join(vsqr[c - 'A' + 1, m - 'A' + 1] for (m,c) in zip(message, codeword))
end

function decrypt(ciphertext::String, codeword::String)
	ciphertext, codeword = uppercase.((ciphertext, codeword))
	codeword = match_length(codeword,length(ciphertext))
	join('A' - 1 + findfirst(t, join(vsqr[c - 'A' + 1, :])) for (t,c) in zip(ciphertext, codeword))
end

match_length(word,len) = (word^len)[1:len]

# Vigenère square (or tabula recta)
const vsqr = reshape(permutedims(hcat([circshift('A':'Z',-i) for i in 0:25]...)), 26, 26)

message = "VIGENERE"
codeword = "GREEN"
ciphertext = encrypt(message, codeword)
decrypt(ciphertext, codeword)

NB: added the uppercase() function and removed all non-alphabetic characters from input.

2 Likes

I think the table should be an result of the shifting alphabet concept! I do like the idea of being more robust to input strings.

Here’s version 2:

number(letter) = letter-'A'         # number of letters between `letter` and 'A'
letter(number) = 'A'+(number+26)%26 # the `number`th letter after 'A' (negative `number` means it should be before 'A')
# Loop around if you run out of letters in the alphabet. So "A-1=Z" and "A+27=B".

→(a::Char,b::Char) = letter(number(a)+number(b)) # encode using +
←(a::Char,b::Char) = letter(number(a)-number(b)) # decode using -

isletter(a) = a in 'A':'Z' # is `a` inside the letters from 'A' to 'Z'?
cleanup(str::String) = filter(isletter, collect(uppercase.(str))) # make all letters uppercase and filter out non-letters
match_length(word,len) = repeat(word,len)[1:len]  # repeat the codeword until you match the length

function →(message::String,codeword::String, ↔ = →)
    plaintext = cleanup(message)
    key = match_length(cleanup(codeword),length(plaintext))
    return String(plaintext .↔ key)
end
←(message::String,codeword::String) = →(message,codeword,←)

Trying to slow down a little and add comments to help her (she’s 12 and hasn’t taken any coding yet).

julia> number('A'),number('L')
(0, 11)

julia> letter(2),letter(-2)
('C', 'Y')

julia> alpha = 'A':'Z';
julia> alpha .→ permutedims(alpha)
26×26 Matrix{Char}:
 'A'  'B'  'C'  'D'  'E'  'F'  'G'  'H'  'I'  'J'  'K'  'L'  'M'  'N'  'O'  'P'  'Q'  'R'  'S'  'T'  'U'  'V'  'W'  'X'  'Y'  'Z'
 'B'  'C'  'D'  'E'  'F'  'G'  'H'  'I'  'J'  'K'  'L'  'M'  'N'  'O'  'P'  'Q'  'R'  'S'  'T'  'U'  'V'  'W'  'X'  'Y'  'Z'  'A'
 'C'  'D'  'E'  'F'  'G'  'H'  'I'  'J'  'K'  'L'  'M'  'N'  'O'  'P'  'Q'  'R'  'S'  'T'  'U'  'V'  'W'  'X'  'Y'  'Z'  'A'  'B'
 'D'  'E'  'F'  'G'  'H'  'I'  'J'  'K'  'L'  'M'  'N'  'O'  'P'  'Q'  'R'  'S'  'T'  'U'  'V'  'W'  'X'  'Y'  'Z'  'A'  'B'  'C'
 ⋮                        ⋮                        ⋮                        ⋮                        ⋮                        ⋮
 'W'  'X'  'Y'  'Z'  'A'  'B'  'C'  'D'  'E'  'F'  'G'  'H'  'I'  'J'  'K'  'L'  'M'  'N'  'O'  'P'  'Q'  'R'  'S'  'T'  'U'  'V'
 'X'  'Y'  'Z'  'A'  'B'  'C'  'D'  'E'  'F'  'G'  'H'  'I'  'J'  'K'  'L'  'M'  'N'  'O'  'P'  'Q'  'R'  'S'  'T'  'U'  'V'  'W'
 'Y'  'Z'  'A'  'B'  'C'  'D'  'E'  'F'  'G'  'H'  'I'  'J'  'K'  'L'  'M'  'N'  'O'  'P'  'Q'  'R'  'S'  'T'  'U'  'V'  'W'  'X'
 'Z'  'A'  'B'  'C'  'D'  'E'  'F'  'G'  'H'  'I'  'J'  'K'  'L'  'M'  'N'  'O'  'P'  'Q'  'R'  'S'  'T'  'U'  'V'  'W'  'X'  'Y'

julia> cleanup("a & b & z")
3-element Vector{Char}:
 'A': ASCII/Unicode U+0041 (category Lu: Letter, uppercase)
 'B': ASCII/Unicode U+0042 (category Lu: Letter, uppercase)
 'Z': ASCII/Unicode U+005A (category Lu: Letter, uppercase)

julia> match_length("funrun",10)
"funrunfunr"

julia> cyphertext = "I want spearmint gum" → "Gimme!"
"OEMZXYXQMVSQZFKAU"

julia> cyphertext ← "Gimme!"
"IWANTSPEARMINTGUM"
3 Likes

From this post, one could also clean the non-alphabetic characters in the input message as follows:

join([c for c in message if 'a' <= c <= 'z' || 'A' <= c <= 'Z'])

NB: updated the encrypt function above with half of that because added it after uppercasing.

2 Likes

Yeah, it is more clear to stick with letters. I’ve updated it above.

I think this version is really clear. I was doing something along these lines to post but yours is better. One thing to note is that if you do a Vigenère cipher with bytes, then the whole dance goes away and you get what looks like junk for your cypher text, which could be good or bad depending on your preferences.

encrypt(message::String, codeword::String, + = +) =
    String([a + b for (a, b) in zip(
        codeunits(message),
        Iterators.cycle(codeunits(codeword))
    )])

decrypt(message::String, codeword::String) =
    encrypt(message, codeword, -)
julia> c = encrypt("This is a message", "code")
"\xb7\xd7\xcd\u603\xd8ׅď\xd1\xca\xd6\xe2\xc5\xcc\xc8"

julia> decrypt(c, "code")
"This is a message"
5 Likes

That’s cool!

I also like the operator-as-argument to avoid copy-pasting the encoder code over to decoder. I’ve edited that to the “teaching” version above. Applying it to the “minimalist” version:

→(a::Char,b::Char, ± = +) = 'A'+((a-'A')±(b-'A')+26)%26
←(a::Char,b::Char) = →(a,b,-)

→(message::String,codeword::String, ↔ = →) = String(collect(message) .↔ collect(match_length(codeword,length(message))))
←(message::String,codeword::String) = →(message,codeword,←)
match_length(word,len) = (word^len)[1:len]
4 Likes

Could you please shed some light on the magic in this code line?
What is the name for this in the manual?

Do not understand in particular: (a) The function takes two chars as input but where are they? (b) how the matrix of chars is then generated from the input StepRange.

Thank you.

1 Like

It’s exactly the same as this:

julia> num = 0:9;
julia> num .+ num'
10×10 Matrix{Int64}:
 0   1   2   3   4   5   6   7   8   9
 1   2   3   4   5   6   7   8   9  10
 2   3   4   5   6   7   8   9  10  11
 3   4   5   6   7   8   9  10  11  12
 ⋮                   ⋮
 6   7   8   9  10  11  12  13  14  15
 7   8   9  10  11  12  13  14  15  16
 8   9  10  11  12  13  14  15  16  17
 9  10  11  12  13  14  15  16  17  18

So you have a collection num which acts like a column vector, and a transposed collection num' which acts like a row vector. Broadcasting the + operator fills out the full matrix over all columns and rows. And even though I could write +.(num,num') and get the same thing, it looks a little nicer to put the operator in the middle.

3 Likes

Several fuses were blown. Thank you for helping me restore power to almost every floor :bulb:

Would the binary operators that support infix notation be the right section in the manual covering this topic?

2 Likes

Is this what you’re looking for?
https://docs.julialang.org/en/v1/manual/arrays/#Broadcasting
or
https://docs.julialang.org/en/v1/manual/mathematical-operations/#man-dot-operators
Admittedly, it took me a while to find broadcasting in the manual since it’s a subtopic only.

1 Like

@baggepinnen, thanks for the feedback. My difficulties were not so much with broadcasting as with the →(a,b) function. It started with a mistaken assumption that it was just a visual decoration to help @weymouth’s daughter, and that a function name like f(a,b) would do the same job.

It seems that for a number of symbols used in function names with two arguments, Julia’s interpreter allows them to be used as binary operators as well: a → b then becomes another way to call the function →(a,b). We would not have had this extra magic if the function had been defined as f(a,b) in the first place.

1 Like

Dreaming: Perhaps include something from Julia’s unicode set – good for funny output. Or encode an expression so that it is runnable when decrypted?

1 Like

Haha. That’s a great idea!

→(a::Char,b::Char) = '🌽'+((a-'A')+(b-'A'))%26
←(a::Char,b::Char) = 'A'+((a-'🌽')-(b-'A')+26)%26

gives:

julia> cypher = "ATTACKATDAWN" → "TURING"
"🍐🍊🍇🍅🍌🍍🍐🍊🍑🍅🍆🍐"

julia> cypher ← "TURING"
"ATTACKATDAWN"

julia> 'A':'Z' .→ permutedims('A':'Z')
26×26 Matrix{Char}:
 '🌽'  '🌾'  '🌿'  '🍀'  '🍁'  '🍂'  '🍃'  '🍄'  '🍅'  '🍆'  '🍇'  '🍈'  '🍉'  '🍊'  '🍋'  '🍌'  '🍍'  '🍎'  '🍏'  '🍐'  '🍑'  '🍒'  '🍓'  '🍔'  '🍕'  '🍖'
 '🌾'  '🌿'  '🍀'  '🍁'  '🍂'  '🍃'  '🍄'  '🍅'  '🍆'  '🍇'  '🍈'  '🍉'  '🍊'  '🍋'  '🍌'  '🍍'  '🍎'  '🍏'  '🍐'  '🍑'  '🍒'  '🍓'  '🍔'  '🍕'  '🍖'  '🌽'
 '🌿'  '🍀'  '🍁'  '🍂'  '🍃'  '🍄'  '🍅'  '🍆'  '🍇'  '🍈'  '🍉'  '🍊'  '🍋'  '🍌'  '🍍'  '🍎'  '🍏'  '🍐'  '🍑'  '🍒'  '🍓'  '🍔'  '🍕'  '🍖'  '🌽'  '🌾'
 '🍀'  '🍁'  '🍂'  '🍃'  '🍄'  '🍅'  '🍆'  '🍇'  '🍈'  '🍉'  '🍊'  '🍋'  '🍌'  '🍍'  '🍎'  '🍏'  '🍐'  '🍑'  '🍒'  '🍓'  '🍔'  '🍕'  '🍖'  '🌽'  '🌾'  '🌿'
 '🍁'  '🍂'  '🍃'  '🍄'  '🍅'  '🍆'  '🍇'  '🍈'  '🍉'  '🍊'  '🍋'  '🍌'  '🍍'  '🍎'  '🍏'  '🍐'  '🍑'  '🍒'  '🍓'  '🍔'  '🍕'  '🍖'  '🌽'  '🌾'  '🌿'  '🍀'
 '🍂'  '🍃'  '🍄'  '🍅'  '🍆'  '🍇'  '🍈'  '🍉'  '🍊'  '🍋'  '🍌'  '🍍'  '🍎'  '🍏'  '🍐'  '🍑'  '🍒'  '🍓'  '🍔'  '🍕'  '🍖'  '🌽'  '🌾'  '🌿'  '🍀'  '🍁'
 ⋮                        ⋮                        ⋮                        ⋮                        ⋮                        ⋮
 '🍒'  '🍓'  '🍔'  '🍕'  '🍖'  '🌽'  '🌾'  '🌿'  '🍀'  '🍁'  '🍂'  '🍃'  '🍄'  '🍅'  '🍆'  '🍇'  '🍈'  '🍉'  '🍊'  '🍋'  '🍌'  '🍍'  '🍎'  '🍏'  '🍐'  '🍑'
 '🍓'  '🍔'  '🍕'  '🍖'  '🌽'  '🌾'  '🌿'  '🍀'  '🍁'  '🍂'  '🍃'  '🍄'  '🍅'  '🍆'  '🍇'  '🍈'  '🍉'  '🍊'  '🍋'  '🍌'  '🍍'  '🍎'  '🍏'  '🍐'  '🍑'  '🍒'
 '🍔'  '🍕'  '🍖'  '🌽'  '🌾'  '🌿'  '🍀'  '🍁'  '🍂'  '🍃'  '🍄'  '🍅'  '🍆'  '🍇'  '🍈'  '🍉'  '🍊'  '🍋'  '🍌'  '🍍'  '🍎'  '🍏'  '🍐'  '🍑'  '🍒'  '🍓'
 '🍕'  '🍖'  '🌽'  '🌾'  '🌿'  '🍀'  '🍁'  '🍂'  '🍃'  '🍄'  '🍅'  '🍆'  '🍇'  '🍈'  '🍉'  '🍊'  '🍋'  '🍌'  '🍍'  '🍎'  '🍏'  '🍐'  '🍑'  '🍒'  '🍓'  '🍔'
 '🍖'  '🌽'  '🌾'  '🌿'  '🍀'  '🍁'  '🍂'  '🍃'  '🍄'  '🍅'  '🍆'  '🍇'  '🍈'  '🍉'  '🍊'  '🍋'  '🍌'  '🍍'  '🍎'  '🍏'  '🍐'  '🍑'  '🍒'  '🍓'  '🍔'  '🍕'

which may be my favorite matrix ever…

8 Likes

That is what they call the Green’s algebra?

2 Likes

This is a wonderful topic for teaching school kids and young undergraduates too. I showed the basic Caesar cypher to my 12 year-old son and some basic code/pseudo-code to go with it. One thing that we had fun doing was investigating which language had the hardest alphabet to crack for the Caesar cypher: while English and French have 26 letters, Italian has 21 letters, Spanish 27 (used to be 30 until not so long ago), Portuguese and classic latin have 23. Which is the hardest to crack? A similar question can be asked of Vigenère/Bellaso cypher.

Edit: The wikipedia pages on Vigenère and Bellaso makes for a fascinating read. Vigenère attributed the original cypher to Bellaso and improved it. The poor Bellaso has been forgotten a little since, but he was a superstar then. The cypher fascinated the great intellects of the time and took centuries to crack.

There’s an interesting math article listed there (sadly it is “gated”) and a more accessible blog post:

Biermann, Norbert (2018). “Analysis of Giouan Battista Bellaso’s cipher challenges of 1555”. Cryptologia . 42 (5): 381–407

2 Likes

One thing that we had fun doing was investigating which language had the hardest alphabet to crack for the Caesar cypher: while English and French have 26 letters, Italian has 21 letters, Spanish 27 (used to be 30 until not so long ago), Portuguese and classic latin have 23. Which is the hardest to crack?

Chinese with 56,000 letters

https://studycli.org/chinese-characters/number-of-characters-in-chinese/

https://www.hutong-school.com/how-many-chinese-characters-are-there

Or would you prefer Chinese with 65,535 letters plus NULL

2 Likes

I wish discourse had a laughing-emoji button. :rofl:

Awesome! We should call it the “Weymouth Matrix”. What are its eigenfruits?

2 Likes

There’s one MIT PhD on the case here.

2 Likes