BYU logo Computer Science

Project 4 — Cryptography

In cryptography, we take the original “plaintext”, and encrypt it under the control of a key word, yielding an unintelligible “ciphertext”.

encryption

Decryption goes in the opposite direction, using the key to recover the plaintext from the ciphertext. Anyone who intercepts the ciphertext cannot make any sense of it without the key.

decryption

Substitution ciphers

A substitution cipher is one form of cryptography. Someone who wants to encrypt a message constructs a key, consisting of a mapping of letters from plaintext to ciphertext. For example:

plaintextABCDEFGHIJKLMNOPQRSTUVWXYZ
ciphertextXYZABCDEFGHIJKLMNOPQRSTUVW

To encrypt, you find a letter in the “plaintext” line of the table and substitute the corresponding letter in the “ciphertext” line. For example:

Plaintext: THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG
Ciphertext: QEB NRFZH YOLTK CLU GRJMP LSBO QEB IXWV ALD

Deriving a key from a password

If you want to communicate with a friend using a substitution cipher, you need to give them the entire key in the form of a table, as shown above. To make this easier, we could derive the key from a password. Here is one way to do that, by creating a list of the characters a--z, except in a different order:

  1. Convert the password into lowercase.
  2. Remove any non-alphabetic characters in the password.
  3. Add all of the characters in the password to the key, as long as they are not already in the key.
  4. Add all of the rest of the characters in the alphabet to the key, in alphabetic order.

For example, if the password is Bananas!, then the key is:

['b', 'a', 'n', 's', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'o', 'p', 'q', 'r', 't', 'u', 'v', 'w', 'x', 'y', 'z']

To use this key, you would use a constant:

ALPHABET = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']

Then a in plaintext gets mapped to b in ciphertext, b gets mapped to a, c gets mapped to n, and so forth.

Reminder: Notice that a character can never be in the key twice. This is really important! You would otherwise be mapping more than one character in the plaintext to the same character in ciphertext. It would be difficult to decrypt in this case!

Setup

For this project, you are going to encrypt and decrypt files using a substitution cipher, where the key is derived from a password, just like what is shown above.

To get started, download project4.zip. You will write all of your code in the file crypto.py. We have written the function declarations, documentation, and doctests for you. You are welcome to write more doctests!

To be successful, write one function at a time, in the order shown, and be sure the function passes all of its doctests before moving to the next function.

Computing the key

Write the function compute_key(password). This function takes one parameter:

  • password: a password, any sequence of characters

The function returns a list of 26 characters. This function uses the algorithm described above under the heading Deriving a key from a password. When you write this function, use the constant ALPHABET, which is defined in the code for you. This constant contains the lowercase letters from a-z in a list.

Remember, ALPHABET is a constant. Do not change its contents. We use uppercase names for constants as a convention to tell other programmers to treat them as a constant.

Encrypting a character

Write the function encrypt_character(source, key, character). This function takes three parameters:

  • source: a list of 26 characters
  • key: a list of 26 characters
  • character: a single character

The function returns the encrypted form of the character, using a substitution cipher. To encrypt the character, convert it to lowercase, and then find it in the source list. Take its position, and use the corresponding position in the key list to find the character it should be encrypted as. If the character is not in the source list, then just return it.

For example, consider these two lists:

source = ['a', 'b', 'c', 'd', ...]
slug   = ['z', 'a', 'b', 'c', ...]

In this case, if we want to encrypt the character c, it would become b. If we want to encrypt ?, it stays as ?.

Be sure to preserve case. So if you are encrypting c it becomes b, but if you are encrypting C, it becomes B.

Notice That for the doctests we compute the key first, then we can use it for testing encrypt_character():

>>> z_key = compute_key('z')
>>> encrypt_character(ALPHABET, z_key, 'A')
'Z'

There is also a small doctest quirk. Inside the text of a doctest the '\n' character must be written with 2 backlashes '\\n'. The reason is that the '\n' expands to an actual newline in the Doctest which messes up the test. Using '\\n' represents that a newline is desired.

Encrypting a string

Write the function encrypt_string(source, key, s). This function takes three parameters:

  • source: a list of 26 characters
  • key: a list of 26 characters
  • s: a string of arbitrary length

The function returns the encrypted string, using a substitution cipher. It uses encrypt_character() to encrypt each character in the string.

Decrypting a string

Write the function decrypt_string(source, key, s). This function takes three parameters:

  • source: a list of 26 characters
  • key: a list of 26 characters
  • s: a string of arbitrary length

The function returns the decrypted string, using a substitution cipher.

This is a great chance to appreciate a small, satisfying moment of decomposition. The first important feature of decomposition is the ability to proceed in steps smaller than the whole program, testing each step on its own. You’ve done that many times. Another key advantage of decomposition is re-use of code in multiple places.

You can completely implement the decrypt_string() function with one line of code.

Encrypting and decrypting files

Now you are ready to encrypt and decrypt files! Write the function encrypt_file(filename, password) and then write the function decrypt_file(filename, password). These functions both take two parameters:

  • filename: the filename to encrypt
  • password: the password to use to encrypt the file

The encrypt_file() function reads the file, encrypts every line, and prints out the encrypted lines. The decrypt_file() function reads the file, decrypts every line, and prints out the decrypted lines. Both functions uses password to generate a key for encryption and should use the ALPHABET constant as the source list.

Reminder: To print out the encrypted or decrypted lines, you should use print(encrypted_line, end='') or print(decrypted_line, end=''). This will ensure that you don’t print an extra newline after the line.

These two functions do not have doctests because they use print() instead of return. We have included both plaintext and ciphertext versions of several files so you can compare your results.

Command line arguments

To run the program and test your file encrypting and decrypting functions, use the following:

python crypto.py -encrypt [key] [filename]
python crypto.py -decrypt [key] [filename]

For example:

python crypto.py -encrypt Franklin independence.txt

This will result in:

Wbkj cj tbk Amuqsk ml buhfj kvkjts ct rkamhks jkakssfqy lmq mjk okmogk
tm ncssmgvk tbk omgctcafg rfjns wbcab bfvk amjjkatkn tbkh wctb fjmtbkq
fjn tm fssuhk fhmji tbk omwkqs ml tbk kfqtb, tbk skofqftk fjn kpufg
stftcmj tm wbcab tbk Gfws ml Jftuqk fjn ml Jftuqk's Imn kjtctgk tbkh,
f nkakjt qksokat tm tbk mocjcmjs ml hfjecjn qkpucqks tbft tbky sbmugn
nkagfqk tbk afusks wbcab chokg tbkh tm tbk skofqftcmj.

Wk bmgn tbksk tqutbs tm rk skgl-kvcnkjt, tbft fgg hkj fqk aqkftkn
kpufg, tbft tbky fqk kjnmwkn ry tbkcq Aqkftmq wctb akqtfcj ujfgckjfrgk
Qcibts, tbft fhmji tbksk fqk Gclk, Gcrkqty fjn tbk ouqsuct ml
Bfoocjkss.

And decrypting:

python crypto.py -decrypt Franklin independence-ciphertext.txt

This will result in:

When in the Course of human events it becomes necessary for one people
to dissolve the political bands which have connected them with another
and to assume among the powers of the earth, the separate and equal
station to which the Laws of Nature and of Nature's God entitle them,
a decent respect to the opinions of mankind requires that they should
declare the causes which impel them to the separation.

We hold these truths to be self-evident, that all men are created
equal, that they are endowed by their Creator with certain unalienable
Rights, that among these are Life, Liberty and the pursuit of
Happiness.

Putting the output in a file

In can be really helpful to put the output of your program into a file. You can do this by using the > character. For example:

python crypto.py -encrypt Franklin independence.txt > my-independence-ciphertext.txt

This will put the output into a file called my-independence-ciphertext.txt. Notice how we have been careful to use a different filename, my-independence-ciphertext.txt, to avoid overwriting the ciphertext that came with the assignment!

Now you can compare the two versions to be sure they are exact:

diff -u independence-ciphertext.txt my-independence-ciphertext.txt

If there is no output, they are identical. (We know, this is sort of anti-climactic, but this is how the diff command works.)

You can likewise do the same with decryption:

python crypto.py -decrypt Franklin independence-ciphertext.txt > my-independence-plaintext.txt
diff independence.txt my-independence-plaintext.txt

If the results of diff show something like this, then you have a mistake in how you are encrypting or decrypting:

% diff -u independence-ciphertext.txt my-independence-ciphertext.txt
--- independence-ciphertext.txt 2022-02-17 16:35:02.000000000 -0700
+++ my-independence-ciphertext.txt      2022-02-17 16:41:42.000000000 -0700
@@ -1,11 +1,22 @@
 Wbkj cj tbk Amuqsk ml buhfj kvkjts ct rkamhks jkakssfqy lmq mjk okmogk
+
 tm ncssmgvk tbk omgctcafg rfjns wbcab bfvk amjjkatkn tbkh wctb fjmtbkq
+
 fjn tm fssuhk fhmji tbk omwkqs ml tbk kfqtb, tbk skofqftk fjn kpufg
+
 stftcmj tm wbcab tbk Gfws ml Jftuqk fjn ml Jftuqk's Imn kjtctgk tbkh,
+
 f nkakjt qksokat tm tbk mocjcmjs ml hfjecjn qkpucqks tbft tbky sbmugn
+
 nkagfqk tbk afusks wbcab chokg tbkh tm tbk skofqftcmj.

+
+
 Wk bmgn tbksk tqutbs tm rk skgl-kvcnkjt, tbft fgg hkj fqk aqkftkn
+
 kpufg, tbft tbky fqk kjnmwkn ry tbkcq Aqkftmq wctb akqtfcj ujfgckjfrgk
+
 Qcibts, tbft fhmji tbksk fqk Gclk, Gcrkqty fjn tbk ouqsuct ml
+
 Bfoocjkss.
+

This output shows a ”-” next to lines that are in the first file (independence-ciphertex.txt) but not in the second file (my-independence-ciphertext.txt), and then a ”+” next to lines that are in the second file but not the first file. Notice that there are extra newlines in the second file that are not in the first file.

Parsing command line arguments

When you run a program from the command line, as we are showing you above, you can pass arguments to Python. These are stored automatically in a variable called sys.argv. This is actually a list! The arguments start at index 1 in the list, so we can get them with:

args = sys.argv[1:]

You can then see the rest of the code where we check for each possible argument and call encrypt_file() or decrypt_file() as appropriate.

A little bit of history

Substitution ciphers have been around for a long time — dating back at least to Julius Caesar (one kind of substitution cipher is even called a Caesar cipher). You may have heard of the Enigma system used in World War II. It performed character-to-character encryption like this, but shifted to a different key for each character. For a great introduction to the history of cryptography, see The Code Book, by Simon Singh. You could also watch Breaking the Codes, a BBC series about Alan Turing, who helped break Enigma during World War II, or the movie The Imitation Game.

breaking the codes movie

Mystery ciphertext

We have given you a mystery-ciphertext.txt. For fun, see if you can guess the password that breaks this cipher. It is a dictionary word, not a proper noun, that you would find in the scriptures.

Submit

You need to submit:

  • crypto.py

Points

This project is worth 50 points.

TaskDescriptionPoints
Computing the keyYour solution works15
Encrypting a characterYour solution works10
Encrypting a stringYour solution works10
Decrypting a stringYour solution works5
Encrypting a fileYour solution works5
Decrypting a fileYour solution works5

Credits

This project is based on one offered in CS 106A at Stanford University.