Project 4 — Cryptography
In cryptography, we take the original “plaintext”, and encrypt it under the control of a key word, yielding an unintelligible “ciphertext”.
Decryption goes in the opposite direction, using the key to recover the plaintext from the ciphertext. Anyone who intercepts the ciphertext cannot make any sense of it without the key.
Substitution ciphers
A substitution cipher is one form of cryptography. Someone who wants to encrypt
a message constructs a key
, consisting of a mapping of letters from plaintext
to ciphertext. For example:
plaintext | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ciphertext | X | Y | Z | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W |
To encrypt, you find a letter in the “plaintext” line of the table and substitute the corresponding letter in the “ciphertext” line. For example:
Plaintext: THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG
Ciphertext: QEB NRFZH YOLTK CLU GRJMP LSBO QEB IXWV ALD
Deriving a key from a password
If you want to communicate with a friend using a substitution cipher, you need
to give them the entire key in the form of a table, as shown above. To make this
easier, we could derive the key from a password. Here is one way to do that, by
creating a list of the characters a--z
, except in a different order:
- Convert the password into lowercase.
- Remove any non-alphabetic characters in the password.
- Add all of the characters in the password to the key, as long as they are not already in the key.
- Add all of the rest of the characters in the alphabet to the key, in alphabetic order.
For example, if the password is Bananas!
, then the key is:
['b', 'a', 'n', 's', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'o', 'p', 'q', 'r', 't', 'u', 'v', 'w', 'x', 'y', 'z']
To use this key, you would use a constant:
ALPHABET = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
Then a
in plaintext gets mapped to b
in ciphertext, b
gets mapped to a
,
c
gets mapped to n
, and so forth.
Reminder: Notice that a character can never be in the key twice. This is really important! You would otherwise be mapping more than one character in the plaintext to the same character in ciphertext. It would be difficult to decrypt in this case!
Setup
For this project, you are going to encrypt and decrypt files using a substitution cipher, where the key is derived from a password, just like what is shown above.
To get started, download project4.zip. You will write all
of your code in the file crypto.py
. We have written the function declarations,
documentation, and doctests for you. You are welcome to write more doctests!
To be successful, write one function at a time, in the order shown, and be sure the function passes all of its doctests before moving to the next function.
Computing the key
Write the function compute_key(password)
. This function takes one parameter:
password
: a password, any sequence of characters
The function returns a list of 26 characters. This function uses the algorithm
described above under the heading Deriving a key from a password
. When you
write this function, use the constant ALPHABET
, which is defined in the code
for you. This constant contains the lowercase letters from a-z
in a list.
Remember, ALPHABET is a constant. Do not change its contents. We use uppercase names for constants as a convention to tell other programmers to treat them as a constant.
Encrypting a character
Write the function encrypt_character(source, key, character)
. This function
takes three parameters:
source
: a list of 26 characterskey
: a list of 26 characterscharacter
: a single character
The function returns the encrypted form of the character, using a substitution
cipher. To encrypt the character, convert it to lowercase, and then find it in
the source
list. Take its position, and use the corresponding position in the
key
list to find the character it should be encrypted as. If the character is
not in the source
list, then just return it.
For example, consider these two lists:
source = ['a', 'b', 'c', 'd', ...]
slug = ['z', 'a', 'b', 'c', ...]
In this case, if we want to encrypt the character c
, it would become b
. If
we want to encrypt ?
, it stays as ?
.
Be sure to preserve case. So if you are encrypting c
it becomes b
, but if
you are encrypting C
, it becomes B
.
Notice That for the doctests we compute the key first, then we can use it for
testing encrypt_character()
:
>>> z_key = compute_key('z')
>>> encrypt_character(ALPHABET, z_key, 'A')
'Z'
There is also a small doctest quirk. Inside the text of a doctest the '\n'
character must be written with 2 backlashes '\\n'
. The reason is that the
'\n'
expands to an actual newline in the Doctest which messes up the test.
Using '\\n'
represents that a newline is desired.
Encrypting a string
Write the function encrypt_string(source, key, s)
. This function takes three
parameters:
source
: a list of 26 characterskey
: a list of 26 characterss
: a string of arbitrary length
The function returns the encrypted string, using a substitution cipher. It uses
encrypt_character()
to encrypt each character in the string.
Decrypting a string
Write the function decrypt_string(source, key, s)
. This function takes three
parameters:
source
: a list of 26 characterskey
: a list of 26 characterss
: a string of arbitrary length
The function returns the decrypted string, using a substitution cipher.
This is a great chance to appreciate a small, satisfying moment of decomposition. The first important feature of decomposition is the ability to proceed in steps smaller than the whole program, testing each step on its own. You’ve done that many times. Another key advantage of decomposition is re-use of code in multiple places.
You can completely implement the decrypt_string()
function with one line of
code.
Encrypting and decrypting files
Now you are ready to encrypt and decrypt files! Write the function
encrypt_file(filename, password)
and then write the function
decrypt_file(filename, password)
. These functions both take two parameters:
filename
: the filename to encryptpassword
: the password to use to encrypt the file
The encrypt_file()
function reads the file, encrypts every line, and prints
out the encrypted lines. The decrypt_file()
function reads the file, decrypts
every line, and prints out the decrypted lines. Both functions uses password
to generate a key for encryption and should use the ALPHABET
constant as the
source list.
Reminder: To print out the encrypted or decrypted lines, you should use
print(encrypted_line, end='')
or print(decrypted_line, end='')
. This will
ensure that you don’t print an extra newline after the line.
These two functions do not have doctests because they use print()
instead of
return
. We have included both plaintext and ciphertext versions of several
files so you can compare your results.
Command line arguments
To run the program and test your file encrypting and decrypting functions, use the following:
python crypto.py -encrypt [key] [filename]
python crypto.py -decrypt [key] [filename]
For example:
python crypto.py -encrypt Franklin independence.txt
This will result in:
Wbkj cj tbk Amuqsk ml buhfj kvkjts ct rkamhks jkakssfqy lmq mjk okmogk
tm ncssmgvk tbk omgctcafg rfjns wbcab bfvk amjjkatkn tbkh wctb fjmtbkq
fjn tm fssuhk fhmji tbk omwkqs ml tbk kfqtb, tbk skofqftk fjn kpufg
stftcmj tm wbcab tbk Gfws ml Jftuqk fjn ml Jftuqk's Imn kjtctgk tbkh,
f nkakjt qksokat tm tbk mocjcmjs ml hfjecjn qkpucqks tbft tbky sbmugn
nkagfqk tbk afusks wbcab chokg tbkh tm tbk skofqftcmj.
Wk bmgn tbksk tqutbs tm rk skgl-kvcnkjt, tbft fgg hkj fqk aqkftkn
kpufg, tbft tbky fqk kjnmwkn ry tbkcq Aqkftmq wctb akqtfcj ujfgckjfrgk
Qcibts, tbft fhmji tbksk fqk Gclk, Gcrkqty fjn tbk ouqsuct ml
Bfoocjkss.
And decrypting:
python crypto.py -decrypt Franklin independence-ciphertext.txt
This will result in:
When in the Course of human events it becomes necessary for one people
to dissolve the political bands which have connected them with another
and to assume among the powers of the earth, the separate and equal
station to which the Laws of Nature and of Nature's God entitle them,
a decent respect to the opinions of mankind requires that they should
declare the causes which impel them to the separation.
We hold these truths to be self-evident, that all men are created
equal, that they are endowed by their Creator with certain unalienable
Rights, that among these are Life, Liberty and the pursuit of
Happiness.
Putting the output in a file
In can be really helpful to put the output of your program into a file. You can
do this by using the >
character. For example:
python crypto.py -encrypt Franklin independence.txt > my-independence-ciphertext.txt
This will put the output into a file called my-independence-ciphertext.txt
.
Notice how we have been careful to use a different filename,
my-independence-ciphertext.txt
, to avoid overwriting the ciphertext that came
with the assignment!
Now you can compare the two versions to be sure they are exact:
diff -u independence-ciphertext.txt my-independence-ciphertext.txt
If there is no output, they are identical. (We know, this is sort of
anti-climactic, but this is how the diff
command works.)
You can likewise do the same with decryption:
python crypto.py -decrypt Franklin independence-ciphertext.txt > my-independence-plaintext.txt
diff independence.txt my-independence-plaintext.txt
If the results of diff
show something like this, then you have a mistake in
how you are encrypting or decrypting:
% diff -u independence-ciphertext.txt my-independence-ciphertext.txt
--- independence-ciphertext.txt 2022-02-17 16:35:02.000000000 -0700
+++ my-independence-ciphertext.txt 2022-02-17 16:41:42.000000000 -0700
@@ -1,11 +1,22 @@
Wbkj cj tbk Amuqsk ml buhfj kvkjts ct rkamhks jkakssfqy lmq mjk okmogk
+
tm ncssmgvk tbk omgctcafg rfjns wbcab bfvk amjjkatkn tbkh wctb fjmtbkq
+
fjn tm fssuhk fhmji tbk omwkqs ml tbk kfqtb, tbk skofqftk fjn kpufg
+
stftcmj tm wbcab tbk Gfws ml Jftuqk fjn ml Jftuqk's Imn kjtctgk tbkh,
+
f nkakjt qksokat tm tbk mocjcmjs ml hfjecjn qkpucqks tbft tbky sbmugn
+
nkagfqk tbk afusks wbcab chokg tbkh tm tbk skofqftcmj.
+
+
Wk bmgn tbksk tqutbs tm rk skgl-kvcnkjt, tbft fgg hkj fqk aqkftkn
+
kpufg, tbft tbky fqk kjnmwkn ry tbkcq Aqkftmq wctb akqtfcj ujfgckjfrgk
+
Qcibts, tbft fhmji tbksk fqk Gclk, Gcrkqty fjn tbk ouqsuct ml
+
Bfoocjkss.
+
This output shows a ”-” next to lines that are in the first file (independence-ciphertex.txt) but not in the second file (my-independence-ciphertext.txt), and then a ”+” next to lines that are in the second file but not the first file. Notice that there are extra newlines in the second file that are not in the first file.
Parsing command line arguments
When you run a program from the command line, as we are showing you above, you
can pass arguments to Python. These are stored automatically in a variable
called sys.argv
. This is actually a list! The arguments start at index 1
in
the list, so we can get them with:
args = sys.argv[1:]
You can then see the rest of the code where we check for each possible argument
and call encrypt_file()
or decrypt_file()
as appropriate.
A little bit of history
Substitution ciphers have been around for a long time — dating back at least to Julius Caesar (one kind of substitution cipher is even called a Caesar cipher). You may have heard of the Enigma system used in World War II. It performed character-to-character encryption like this, but shifted to a different key for each character. For a great introduction to the history of cryptography, see The Code Book, by Simon Singh. You could also watch Breaking the Codes, a BBC series about Alan Turing, who helped break Enigma during World War II, or the movie The Imitation Game.
Mystery ciphertext
We have given you a mystery-ciphertext.txt
. For fun, see if you can guess the
password that breaks this cipher. It is a dictionary word, not a proper noun,
that you would find in the scriptures.
Submit
You need to submit:
crypto.py
Points
This project is worth 50 points.
Task | Description | Points |
---|---|---|
Computing the key | Your solution works | 15 |
Encrypting a character | Your solution works | 10 |
Encrypting a string | Your solution works | 10 |
Decrypting a string | Your solution works | 5 |
Encrypting a file | Your solution works | 5 |
Decrypting a file | Your solution works | 5 |
Credits
This project is based on one offered in CS 106A at Stanford University.