Vigenère cipher

weymouth · October 13, 2021, 3:28pm

My daughter is learning about the Vigenère cipher in school and I was helping out (after reading the wiki). They are doing it by hand in school, but it seemed like it might be a good example to start showing her code, since it’s very tedious and error prone by hand.

Here’s a quick implementation, but it’s a little clunky for a kid to follow. In particular, the encode → and decode ← are just modulo addition and subtraction, but this is obfuscated by all the shifting by 'A'. The conversion from Strings to vectors of Char and back is also hiding the important part of that line, which is repeating the code word to match the message length.

Any suggestions?

message = "VIGENERE"
codeword = "GREEN"

→(a::Char,b::Char) = 'A'+((a-'A')+(b-'A'))%26
←(a::Char,b::Char) = 'A'+((a-'A')-(b-'A')+26)%26

→(message::String,codeword::String) = String(collect(message) .→ collect(match_length(codeword,length(message))))
←(message::String,codeword::String) = String(collect(message) .← collect(match_length(codeword,length(message))))
match_length(word,len) = (word^len)[1:len]

rafael.guerra · October 13, 2021, 5:51pm

Thanks for sharing this brilliant code.
You should post it at the Rosetta site, as it looks much better than the old Julia code posted there (would be tempted to say the same applies when we compare it to most other languages")

Not sure how old your daughter is but it might be appropriate to also provide an implementation where the Vigenère table is built, like their exercise on paper and the description on Wikipedia.

Tried this below, but simpler would probably be better.

function encrypt(message::String, codeword::String)
	message, codeword = uppercase.((message, codeword))
	message = join([c for c in message if 'A' <= c <= 'Z'])
	codeword = match_length(codeword,length(message))
	join(vsqr[c - 'A' + 1, m - 'A' + 1] for (m,c) in zip(message, codeword))
end

function decrypt(ciphertext::String, codeword::String)
	ciphertext, codeword = uppercase.((ciphertext, codeword))
	codeword = match_length(codeword,length(ciphertext))
	join('A' - 1 + findfirst(t, join(vsqr[c - 'A' + 1, :])) for (t,c) in zip(ciphertext, codeword))
end

match_length(word,len) = (word^len)[1:len]

# Vigenère square (or tabula recta)
const vsqr = reshape(permutedims(hcat([circshift('A':'Z',-i) for i in 0:25]...)), 26, 26)

message = "VIGENERE"
codeword = "GREEN"
ciphertext = encrypt(message, codeword)
decrypt(ciphertext, codeword)

NB: added the uppercase() function and removed all non-alphabetic characters from input.

weymouth · October 13, 2021, 7:05pm

I think the table should be an result of the shifting alphabet concept! I do like the idea of being more robust to input strings.

Here’s version 2:

number(letter) = letter-'A'         # number of letters between `letter` and 'A'
letter(number) = 'A'+(number+26)%26 # the `number`th letter after 'A' (negative `number` means it should be before 'A')
# Loop around if you run out of letters in the alphabet. So "A-1=Z" and "A+27=B".

→(a::Char,b::Char) = letter(number(a)+number(b)) # encode using +
←(a::Char,b::Char) = letter(number(a)-number(b)) # decode using -

isletter(a) = a in 'A':'Z' # is `a` inside the letters from 'A' to 'Z'?
cleanup(str::String) = filter(isletter, collect(uppercase.(str))) # make all letters uppercase and filter out non-letters
match_length(word,len) = repeat(word,len)[1:len]  # repeat the codeword until you match the length

function →(message::String,codeword::String, ↔ = →)
    plaintext = cleanup(message)
    key = match_length(cleanup(codeword),length(plaintext))
    return String(plaintext .↔ key)
end
←(message::String,codeword::String) = →(message,codeword,←)

Trying to slow down a little and add comments to help her (she’s 12 and hasn’t taken any coding yet).

julia> number('A'),number('L')
(0, 11)

julia> letter(2),letter(-2)
('C', 'Y')

julia> alpha = 'A':'Z';
julia> alpha .→ permutedims(alpha)
26×26 Matrix{Char}:
 'A'  'B'  'C'  'D'  'E'  'F'  'G'  'H'  'I'  'J'  'K'  'L'  'M'  'N'  'O'  'P'  'Q'  'R'  'S'  'T'  'U'  'V'  'W'  'X'  'Y'  'Z'
 'B'  'C'  'D'  'E'  'F'  'G'  'H'  'I'  'J'  'K'  'L'  'M'  'N'  'O'  'P'  'Q'  'R'  'S'  'T'  'U'  'V'  'W'  'X'  'Y'  'Z'  'A'
 'C'  'D'  'E'  'F'  'G'  'H'  'I'  'J'  'K'  'L'  'M'  'N'  'O'  'P'  'Q'  'R'  'S'  'T'  'U'  'V'  'W'  'X'  'Y'  'Z'  'A'  'B'
 'D'  'E'  'F'  'G'  'H'  'I'  'J'  'K'  'L'  'M'  'N'  'O'  'P'  'Q'  'R'  'S'  'T'  'U'  'V'  'W'  'X'  'Y'  'Z'  'A'  'B'  'C'
 ⋮                        ⋮                        ⋮                        ⋮                        ⋮                        ⋮
 'W'  'X'  'Y'  'Z'  'A'  'B'  'C'  'D'  'E'  'F'  'G'  'H'  'I'  'J'  'K'  'L'  'M'  'N'  'O'  'P'  'Q'  'R'  'S'  'T'  'U'  'V'
 'X'  'Y'  'Z'  'A'  'B'  'C'  'D'  'E'  'F'  'G'  'H'  'I'  'J'  'K'  'L'  'M'  'N'  'O'  'P'  'Q'  'R'  'S'  'T'  'U'  'V'  'W'
 'Y'  'Z'  'A'  'B'  'C'  'D'  'E'  'F'  'G'  'H'  'I'  'J'  'K'  'L'  'M'  'N'  'O'  'P'  'Q'  'R'  'S'  'T'  'U'  'V'  'W'  'X'
 'Z'  'A'  'B'  'C'  'D'  'E'  'F'  'G'  'H'  'I'  'J'  'K'  'L'  'M'  'N'  'O'  'P'  'Q'  'R'  'S'  'T'  'U'  'V'  'W'  'X'  'Y'

julia> cleanup("a & b & z")
3-element Vector{Char}:
 'A': ASCII/Unicode U+0041 (category Lu: Letter, uppercase)
 'B': ASCII/Unicode U+0042 (category Lu: Letter, uppercase)
 'Z': ASCII/Unicode U+005A (category Lu: Letter, uppercase)

julia> match_length("funrun",10)
"funrunfunr"

julia> cyphertext = "I want spearmint gum" → "Gimme!"
"OEMZXYXQMVSQZFKAU"

julia> cyphertext ← "Gimme!"
"IWANTSPEARMINTGUM"

rafael.guerra · October 13, 2021, 7:20pm

From this post, one could also clean the non-alphabetic characters in the input message as follows:

join([c for c in message if 'a' <= c <= 'z' || 'A' <= c <= 'Z'])

NB: updated the encrypt function above with half of that because added it after uppercasing.

weymouth · October 13, 2021, 7:25pm

Yeah, it is more clear to stick with letters. I’ve updated it above.

StefanKarpinski · October 13, 2021, 8:20pm

I think this version is really clear. I was doing something along these lines to post but yours is better. One thing to note is that if you do a Vigenère cipher with bytes, then the whole dance goes away and you get what looks like junk for your cypher text, which could be good or bad depending on your preferences.

encrypt(message::String, codeword::String, + = +) =
    String([a + b for (a, b) in zip(
        codeunits(message),
        Iterators.cycle(codeunits(codeword))
    )])

decrypt(message::String, codeword::String) =
    encrypt(message, codeword, -)

julia> c = encrypt("This is a message", "code")
"\xb7\xd7\xcd\u603\xd8ׅď\xd1\xca\xd6\xe2\xc5\xcc\xc8"

julia> decrypt(c, "code")
"This is a message"

weymouth · October 13, 2021, 8:48pm

That’s cool!

I also like the operator-as-argument to avoid copy-pasting the encoder code over to decoder. I’ve edited that to the “teaching” version above. Applying it to the “minimalist” version:

→(a::Char,b::Char, ± = +) = 'A'+((a-'A')±(b-'A')+26)%26
←(a::Char,b::Char) = →(a,b,-)

→(message::String,codeword::String, ↔ = →) = String(collect(message) .↔ collect(match_length(codeword,length(message))))
←(message::String,codeword::String) = →(message,codeword,←)
match_length(word,len) = (word^len)[1:len]

rafael.guerra · October 13, 2021, 8:53pm

Could you please shed some light on the magic in this code line?
What is the name for this in the manual?

Do not understand in particular: (a) The → function takes two chars as input but where are they? (b) how the matrix of chars is then generated from the input StepRange.

Thank you.

weymouth · October 13, 2021, 9:19pm

It’s exactly the same as this:

julia> num = 0:9;
julia> num .+ num'
10×10 Matrix{Int64}:
 0   1   2   3   4   5   6   7   8   9
 1   2   3   4   5   6   7   8   9  10
 2   3   4   5   6   7   8   9  10  11
 3   4   5   6   7   8   9  10  11  12
 ⋮                   ⋮
 6   7   8   9  10  11  12  13  14  15
 7   8   9  10  11  12  13  14  15  16
 8   9  10  11  12  13  14  15  16  17
 9  10  11  12  13  14  15  16  17  18

So you have a collection num which acts like a column vector, and a transposed collection num' which acts like a row vector. Broadcasting the + operator fills out the full matrix over all columns and rows. And even though I could write +.(num,num') and get the same thing, it looks a little nicer to put the operator in the middle.

rafael.guerra · October 13, 2021, 9:55pm

Several fuses were blown. Thank you for helping me restore power to almost every floor

Would the binary operators that support infix notation be the right section in the manual covering this topic?

baggepinnen · October 14, 2021, 5:01am

Is this what you’re looking for?
https://docs.julialang.org/en/v1/manual/arrays/#Broadcasting
or
https://docs.julialang.org/en/v1/manual/mathematical-operations/#man-dot-operators
Admittedly, it took me a while to find broadcasting in the manual since it’s a subtopic only.

rafael.guerra · October 14, 2021, 6:29am

@baggepinnen, thanks for the feedback. My difficulties were not so much with broadcasting as with the →(a,b) function. It started with a mistaken assumption that it was just a visual decoration to help @weymouth’s daughter, and that a function name like f(a,b) would do the same job.

It seems that for a number of symbols used in function names with two arguments, Julia’s interpreter allows them to be used as binary operators as well: a → b then becomes another way to call the function →(a,b). We would not have had this extra magic if the function had been defined as f(a,b) in the first place.

RogerP · October 17, 2021, 9:19am

Dreaming: Perhaps include something from Julia’s unicode set – good for funny output. Or encode an expression so that it is runnable when decrypted?

weymouth · October 17, 2021, 8:46pm

Haha. That’s a great idea!

→(a::Char,b::Char) = '🌽'+((a-'A')+(b-'A'))%26
←(a::Char,b::Char) = 'A'+((a-'🌽')-(b-'A')+26)%26

gives:

julia> cypher = "ATTACKATDAWN" → "TURING"
"🍐🍊🍇🍅🍌🍍🍐🍊🍑🍅🍆🍐"

julia> cypher ← "TURING"
"ATTACKATDAWN"

julia> 'A':'Z' .→ permutedims('A':'Z')
26×26 Matrix{Char}:
 '🌽'  '🌾'  '🌿'  '🍀'  '🍁'  '🍂'  '🍃'  '🍄'  '🍅'  '🍆'  '🍇'  '🍈'  '🍉'  '🍊'  '🍋'  '🍌'  '🍍'  '🍎'  '🍏'  '🍐'  '🍑'  '🍒'  '🍓'  '🍔'  '🍕'  '🍖'
 '🌾'  '🌿'  '🍀'  '🍁'  '🍂'  '🍃'  '🍄'  '🍅'  '🍆'  '🍇'  '🍈'  '🍉'  '🍊'  '🍋'  '🍌'  '🍍'  '🍎'  '🍏'  '🍐'  '🍑'  '🍒'  '🍓'  '🍔'  '🍕'  '🍖'  '🌽'
 '🌿'  '🍀'  '🍁'  '🍂'  '🍃'  '🍄'  '🍅'  '🍆'  '🍇'  '🍈'  '🍉'  '🍊'  '🍋'  '🍌'  '🍍'  '🍎'  '🍏'  '🍐'  '🍑'  '🍒'  '🍓'  '🍔'  '🍕'  '🍖'  '🌽'  '🌾'
 '🍀'  '🍁'  '🍂'  '🍃'  '🍄'  '🍅'  '🍆'  '🍇'  '🍈'  '🍉'  '🍊'  '🍋'  '🍌'  '🍍'  '🍎'  '🍏'  '🍐'  '🍑'  '🍒'  '🍓'  '🍔'  '🍕'  '🍖'  '🌽'  '🌾'  '🌿'
 '🍁'  '🍂'  '🍃'  '🍄'  '🍅'  '🍆'  '🍇'  '🍈'  '🍉'  '🍊'  '🍋'  '🍌'  '🍍'  '🍎'  '🍏'  '🍐'  '🍑'  '🍒'  '🍓'  '🍔'  '🍕'  '🍖'  '🌽'  '🌾'  '🌿'  '🍀'
 '🍂'  '🍃'  '🍄'  '🍅'  '🍆'  '🍇'  '🍈'  '🍉'  '🍊'  '🍋'  '🍌'  '🍍'  '🍎'  '🍏'  '🍐'  '🍑'  '🍒'  '🍓'  '🍔'  '🍕'  '🍖'  '🌽'  '🌾'  '🌿'  '🍀'  '🍁'
 ⋮                        ⋮                        ⋮                        ⋮                        ⋮                        ⋮
 '🍒'  '🍓'  '🍔'  '🍕'  '🍖'  '🌽'  '🌾'  '🌿'  '🍀'  '🍁'  '🍂'  '🍃'  '🍄'  '🍅'  '🍆'  '🍇'  '🍈'  '🍉'  '🍊'  '🍋'  '🍌'  '🍍'  '🍎'  '🍏'  '🍐'  '🍑'
 '🍓'  '🍔'  '🍕'  '🍖'  '🌽'  '🌾'  '🌿'  '🍀'  '🍁'  '🍂'  '🍃'  '🍄'  '🍅'  '🍆'  '🍇'  '🍈'  '🍉'  '🍊'  '🍋'  '🍌'  '🍍'  '🍎'  '🍏'  '🍐'  '🍑'  '🍒'
 '🍔'  '🍕'  '🍖'  '🌽'  '🌾'  '🌿'  '🍀'  '🍁'  '🍂'  '🍃'  '🍄'  '🍅'  '🍆'  '🍇'  '🍈'  '🍉'  '🍊'  '🍋'  '🍌'  '🍍'  '🍎'  '🍏'  '🍐'  '🍑'  '🍒'  '🍓'
 '🍕'  '🍖'  '🌽'  '🌾'  '🌿'  '🍀'  '🍁'  '🍂'  '🍃'  '🍄'  '🍅'  '🍆'  '🍇'  '🍈'  '🍉'  '🍊'  '🍋'  '🍌'  '🍍'  '🍎'  '🍏'  '🍐'  '🍑'  '🍒'  '🍓'  '🍔'
 '🍖'  '🌽'  '🌾'  '🌿'  '🍀'  '🍁'  '🍂'  '🍃'  '🍄'  '🍅'  '🍆'  '🍇'  '🍈'  '🍉'  '🍊'  '🍋'  '🍌'  '🍍'  '🍎'  '🍏'  '🍐'  '🍑'  '🍒'  '🍓'  '🍔'  '🍕'

which may be my favorite matrix ever…

rafael.guerra · October 17, 2021, 9:04pm

That is what they call the Green’s algebra?

ptoche · October 17, 2021, 9:04pm

This is a wonderful topic for teaching school kids and young undergraduates too. I showed the basic Caesar cypher to my 12 year-old son and some basic code/pseudo-code to go with it. One thing that we had fun doing was investigating which language had the hardest alphabet to crack for the Caesar cypher: while English and French have 26 letters, Italian has 21 letters, Spanish 27 (used to be 30 until not so long ago), Portuguese and classic latin have 23. Which is the hardest to crack? A similar question can be asked of Vigenère/Bellaso cypher.

Edit: The wikipedia pages on Vigenère and Bellaso makes for a fascinating read. Vigenère attributed the original cypher to Bellaso and improved it. The poor Bellaso has been forgotten a little since, but he was a superstar then. The cypher fascinated the great intellects of the time and took centuries to crack.

There’s an interesting math article listed there (sadly it is “gated”) and a more accessible blog post:

Biermann, Norbert (2018). “Analysis of Giouan Battista Bellaso’s cipher challenges of 1555”. Cryptologia . 42 (5): 381–407

StevenSiew · October 17, 2021, 9:19pm

One thing that we had fun doing was investigating which language had the hardest alphabet to crack for the Caesar cypher: while English and French have 26 letters, Italian has 21 letters, Spanish 27 (used to be 30 until not so long ago), Portuguese and classic latin have 23. Which is the hardest to crack?

Chinese with 56,000 letters

https://studycli.org/chinese-characters/number-of-characters-in-chinese/

https://www.hutong-school.com/how-many-chinese-characters-are-there

Or would you prefer Chinese with 65,535 letters plus NULL

ptoche · October 17, 2021, 9:25pm

I wish discourse had a laughing-emoji button.

RogerP · October 18, 2021, 5:16pm

Awesome! We should call it the “Weymouth Matrix”. What are its eigenfruits?

rafael.guerra · October 18, 2021, 5:26pm

There’s one MIT PhD on the case here.

Topic		Replies	Views
Julia slower than Python for finding pangrams? Performance question	28	1731	April 5, 2024
Fun One Liners General Usage	142	13349	October 16, 2022
The Hegemony Of `==` Must End; Welcome The `⩵` Upstart Offtopic syntax	46	6484	July 7, 2021
Efficient creation of power series matrix or array of arrays Performance question	23	4922	March 29, 2019
Seven Lines of Julia (examples sought) General Usage	180	28399	August 1, 2025

Vigenère cipher

Related topics