XOR Cipher
- Summary
- You encrypt (and decrypt) a message using a secure XOR cipher stream
- Objectives
-
- Reinforce knowledge of arrays
- Practice using formatted input
- Apply bitwise operations for fun
and profit - Consider problem decomposition
Introduction
The XOR cipher is a simple, yet powerful cryptographic method for securing data. In its simplest form, only a secret key is necessary to perform both encryption and decryption using the bitwise exclusive OR operation, often denoted with a circle plus, ⊕
To encrypt, we simply XOR a plaintext message M with our secret key K so that M⊕K = E. To decrypt we simply XOR the encrypted message E with the same key, E⊕K = M. Conveniently, this means the same operation (or program, in our case) can be used to both encipher and decipher (or encrypt and decrypt).
If a fixed-length key is shorter than the message and used repeatedly
to encrypt pieces of the message, then a frequency analysis could be
used to break the encryption of text messages (in the same way you or
your parents may have
broken cryptogram
puzzles in the newspaper, which rely on the fact that certain letters
and combinations appear more often than others).
However, if the key is as long as the message, the only way to break
the encryption is to try every possible key. This is known as a
brute-force attack and is implausible in practice because
an N-bit key has 2N possibilities.
Securely sharing a long key is just as difficult as the original problem of sharing a long message. Moreover, repeated use of even a long key for multiple messages is also subject to frequency-based attacks. One way around these problems is to use a pseudo-random number generator (RNG) to generate an unpredictable but repeatable stream of keys. Then, only the (much shorter) initial seed for the generator needs to be shared (assuming both parties are using the same generator). Each block of the message is then encrypted using subsequent keys from the generator.
Details
As one might imagine, correctly using these ideas requires attention to a number of practical details.
Overview
In this assignment, you will use the C standard library function
rand to generate the sequence of random keys. (These
days, the widely-available function random is often
preferred over rand because the former is considered
more random
; for simplicity we'll be sticking to the
standard.)
Whichever generator is used, we must still deal with the fact that only some of the bits in the value returned are random. The remaining bits will likely be zero, and using them as part of our XOR key would make the results vulnerable to attack.
Once we figure out how many bits of the key are usable, we will then
need to pack together the bits of our message into a block that can
be XORed with the key. This is because our message text will be
stored as chars, while the key will
be some portion of the bits in an int returned by rand (or
perhaps long int returned by
random).
Finally, bitwise operations and casts (implicit or explicit) interact in ways that can be frustrating to the unwary user. Therefore, one must take care when shifting and combining bits across types and values.
The following sections provide further information about these requisite details.
Key stream
Suppose the message we want to encrypt is
. As 8-bit ASCII characters, this message would be
represented as follows.
My
secret
| Index | Char | Integer | Binary |
|---|---|---|---|
0 |
'M' |
77 |
01001101 |
1 |
'y' |
121 |
01111001 |
2 |
' ' |
32 |
00100000 |
3 |
's' |
115 |
01110011 |
4 |
'e' |
101 |
01100101 |
5 |
'c' |
99 |
01100011 |
6 |
'r' |
114 |
01110010 |
7 |
'e' |
101 |
01100101 |
8 |
't' |
116 |
01110100 |
With a seed of 161, the first two random keys
are 383803757
and 2110550296. If RAND_MAX uses only 31
bits, these would
be represented as follows.
| Decimal | Binary |
|---|---|
383803757 |
X0010110 11100000 01100001 01101101 |
2110550296 |
X1111101 11001100 01110001 00011000 |
We have inserted spaces to group the bits into bytes and made the leading bit X to indicate it is not usable as a random bit. Because the leftmost full byte is not usable, we will ignore it entirely.
Next we examine three possible ways to make use of these random keys. The first is the most straightforward, but reduces the number of random keys generated because most of the random bits are discarded. (More keys will therefore be needed and the RNG will begin to repeat the cycle of keys much sooner.) The second wastes far fewer bits, but requires more bitwise operations. The third (available as an extra credit option) requires significant attention to detail, but requires fewer XOR operations.
Approach 1: One key per character
The most straightforward approach would be to use each random key
value to cipher one character. In other words, the first
character 'M' would be encrypted with the first
key 383803757, the second character 'y'
would be encrypted with the second key 2110550296, and
so on.
Because each character (in this example) is a single byte but each key is several bytes, this approach would likely use the least significant byte of the random key being used as the XOR operand for a character, so that the encrypted text would become:
| Char | Encryption | Result | ||
|---|---|---|---|---|
| Input | Key | |||
'M' |
01001101 |
⊕ | 01101101 |
00100000 |
'y' |
01111001 |
⊕ | 00011000 |
01100001 |
Approach 2: Multiple characters per key
A second approach that makes better use of the bits in the random keys would first determine how many characters can be XORed from a given random key. In this example, where a character is a byte and there are three full bytes available with in the key, we could use the least significant bytes of the keys available in turn. For example:
| Char | Encryption | Result | ||
|---|---|---|---|---|
| Input | Key | |||
'M' |
01001101 |
⊕ | 01101101 |
00100000 |
'y' |
01111001 |
⊕ | 01100001 |
00011000 |
' ' |
00100000 |
⊕ | 11100000 |
11000000 |
At this point we will have exhausted the first key and can continue encrypting beginning with the second key:
| Char | Encryption | Result | ||
|---|---|---|---|---|
| Input | Key | |||
's' |
01110011 |
⊕ | 00011000 |
01101011 |
'e' |
01100101 |
⊕ | 01110001 |
00010100 |
'c' |
01100011 |
⊕ | 11001100 |
10101100 |
and so on.
Note that if we assume only the rightmost bits are preserved in a narrowing cast (from the wider key type to the narrower character type), we can use C's bitwise XOR operator to do the encryption, repeatedly shifting the appropriate number of bits off the key until we run out of bits and need to generate a new key.
Approach 3: One XOR per key [Extra Credit]
The previous approach makes efficient use of the keys, but it
requires many XOR operations. Assuming that each bitwise operation takes the
same amount of time, regardless of the number of bits involved (up
to the width of the computational hardware), we might instead try to
pack several chars of the message
together into a single int to
enable a single XOR with a given key.
Using this approach, the computation might look like the following. (Note how the character and key pairings have been reversed from Approach 2.)
| Input | 'My ': |
01001101 01111001 00100000 |
|---|---|---|
| Key | ⊕ | 11100000 01100001 01101101 |
| Result | 10101101 00011000 01001101 |
| Input | 'sec': |
01110011 01100101 01100011 |
|---|---|---|
| Key | ⊕ | 11001100 01110001 00011000 |
| Result | 10111111 00010100 01111011 |
Implementing this approach would require additional bitwise operations to accomplish the packing (before the per-key XOR) and unpacking of bits. Those attempting should be aware of the bitwise effects of widening casts on signed numbers.
Assignment
Overview
Write a program called encipher.c that implements the
requisite procedures specified below to make the following
main functional (without modification).
#ifndef NOMAIN
int
main (void)
{
char buf[MAXLEN];
srand (get_seed()); /* Initialize the RNG with a given key */
size_t len = read_data (buf,MAXLEN); /* Read chars into buf to be ciphered */
xor_encrypt (buf, len); /* Encrypt exactly len values in the buffer */
for (int c=0 ; c<len ; c++) /* Print ciphered values by chars because */
printf ("%c",buf[c]); /* the null terminator could appear within */
} // main
#endif
Your program must include exactly this block of code, including the wrapping preprocessor macro. (This will be used to efficiently test your code with a separate program.)
In order to capture the program output to a file for decryption, your program should not print any helpful user prompts.
Get Seed
To make the sequence of keys reproducible (and therefore the
enciphered text decipherable), we need to initialize the pseudo-random
number generator with a shared secret key. The standard library
function srand does this. You need to write the
function
unsigned int
get_seed (void);
that should read a line containing a single (non-negative) integer
number from user input and return the value as
an unsigned int.
While no initial prompt should be given, if there is any error in reading the number, an error message should be printed and another attempt made to read a correct seed value. (This process would continue indefinitely until a value is successfully read.)
Read Text
After reading the seed from the user, your program will need to read
the entirety of the message to be ciphered (up to max_len
characters) using the function
size_t
read_data(char buffer[], size_t max_len);
The procedure should read data—bytes as
chars from standard input—until
it fills the buffer or reaches
the end of file
. Of course, you'll be reading from terminal
input, but we'll see below that it too can run out. (Read the man
page for your preferred input method on how to detect the end of
file, often with
.)
feof(stdin)
The function should read as much data as is present (however many
lines, and beyond any zero value that might otherwise indicate the
null termination of a string. Because the data stream being read
could be the result of an XOR operation of an ASCII character with a
random set of bits, the result could be any other set of bits
(including '\0'). The function must therefore read until
it reaches the EOF. You will likely want to use getchar,
because scanf and fgets both stop reading
when they encounter a newline.
Likewise, no initial user prompt should be given. However, if your
program encounters any error in reading the text (perhaps as detected
by
), an error
message should be printed and the empty string placed in
ferror(stdin)buffer.
The number of characters read into the buffer should be
returned.
Note that
size_t
is a system-independent type (typically an unsigned integer) used to
represent the maximum size of any object. It is defined in several
places, including stdlib.h.
Identify Random Bits
The C standard library specifies that
rand will
return a value between 0 and RAND_MAX (a
value defined by stdlib.h).
It is therefore possible that
that not all the bits in the values returned by rand will
be usably random.
Thus, your next task will be to determine how many bits are usable
among the ints returned
by rand.
We will assume as a precondition that the value of
RAND_MAX in binary is a leading sequence of zeros
followed by a sequence of ones. Note that because many systems use an
arithmetic
right shift, we will also require that the argument not exceed
INT_MAX (defined in limits.h), which is to
say that the argument must be positive (when signed) so that the
leading/leftmost bit is always zero. (This assumption is not strictly
necessary, but it greatly simplifies your function for our purposes.)
Write the following function that determines the number of (binary) ones in any integer formatted in such a sequence.
size_t
bits_in_mask (int mask);
Examples
bits_in_mask (255) // 28-1
8
bits_in_mask (3)
2
bits_in_mask (1048575) // 220-1
20
bits_in_mask ((1<<18)-1) // 218-1
18
bits_in_mask (RAND_MAX)
31
Note that the last value is system-dependent and not a universal property of the C ecosystem.
Encrypting Text
Using the examples outlined above as your guide, you will need to use
the sizeof
operator, the RAND_MAX constant and
the bits_in_mask function to implement the XOR cipher
stream. Complete the function
void
xor_encrypt (char buffer[], size_t len)
{
size_t chars_per_key = ???; // Number of chars cipherable per random int
...
} // encrypt
Note that Approach 1 above will not need to
calculate chars_per_key. Approaches 2 and 3 will require
this value be calculated using system-independent operators and named
constants. In the examples, the calculated value would
be 3.
As indicated by main above, encrypt should
transform the first len characters of buf
in place using calls to rand to generate the keys.
Regardless of which approach is used, the same function can both
encrypt and decrypt the text (assuming srand is called
with the appropriate seed).
Additional Requirements
-
You may define
MAXLENto whatever value you like. -
Your program should use no global variables
(but
consts and#defined names are acceptable and encouraged). - Except in the case of I/O errors, the ciphered message is the only output your program should print.
-
Your program should not hardcode the size of any data types or
return ranges. Instead it should use values defined in C standard
library headers and make use of
the
sizeofoperator. -
Although when immediately multiplied by the result of a
sizeofoperator (i.e., in the same expression), the magic number8is widely accepted and understood, let us hold fast to using the more portable standard nameCHAR_BIT(also defined inlimits.h), in the unlikely event you wanted to run your code on a machine that uses more than 8 bits per byte.
Tips
-
To signal the end of file when typing input in the terminal window, type Control-D. One typically needs to type it twice, because the first causes all the characters typed on a line to be sent from the terminal to the read buffer for your program, while the second attempts to do the same, but finding no waiting characters, it finally signals end of file.
Typing Enter has the simultaneous effect of sending a newline character into the input stream and moving the characters from the terminal to the read buffer for your program. After that, only a single Control-D would be necessary.
-
Accelerate your testing by packaging all inputs into a text file
and using
shell I/O redirection, as follows.
./encipher < seed-encipher.txt
In the example above, the input will come from the file specified,seed-encipher.txt, and the end of file will be detected automatically. -
Capture encrypted output with I/O redirection:
./encipher > decipher.txt
In the example above, all the output (the ciphertext) is deposited in the filedecipher.txt -
If you place your seed in a file,
echo my_seed_num > seed.txt
(wheremy_seed_numare the actual digits you use in your seed for the key generator), then you can concatenate your seed file and the ciphered text together,cat seed.txt decipher.txt > seed-decipher.txt
and use it as input (as in the first tip above). It should (but doesn't) go without saying that storing your secret key generator in the same file as with your encrypted text not a very good idea in general, since its availability defeats the whole purpose of the encryption. -
The following files contain examples for you to use in testing your
own program. (Note: Do not copy and paste the text
from the web browser. You need to save the files via downloading in
order to preserve the bits accurately.)
-
xor-cipher-seed-encipher.txtcontains the secret seed and plaintext message used in the examples above. -
xor-cipher-seed-decipher1.datcontains the seed and an encrypted message that you should be able to read if your algorithm uses Approach 1 as described above. -
xor-cipher-seed-decipher2.datcontains the seed and an encrypted message that you should be able to read if your algorithm uses Approach 2 as described above. -
xor-cipher-seed-decipher3.datcontains the seed and an encrypted message that you should be able to read if your algorithm uses Approach 3 as described above.
-
-
The output of your program will be raw character bits that may not
be printable values and could alter the behavior of your
terminal. If this binary output breaks your terminal's expected
behavior, try issuing the command
resetand type Enter. -
If you attempt the extra credit, you should decompose parts of the
encryptalgorithm into apackandunpackprocedures, which will enable easier testing and streamline thexor_encryptprocedure.
Grading
In addition to the general grading guidelines for evaluation, the assignment is worth 30 points.
- [20 points] Main Program
- [4 points] Procedure
get_seed- Reads a number from standard input
- Detects bad input
- Prints message and repeats if error
- [6 points] Procedure
read_data- Reads character values from standard input
- Cannot overflow given buffer
- Reads to end of file
- Detects error, prints a message, and yields empty string
- [4 points] Procedure
bits_in_mask - [6 points] Procedure
xor_encrypt- Uses
randto produce key sequence - Uses XOR operation to cipher character data
- [4 points] Encrypts multiple characters with each key
- [5 points] Extra credit: Uses a single XOR to encrypt multiple characters with each key
- Uses
- [4 points] Procedure
-
Evaluation of program Structure, Comments, Design, and Testing.
(Additional points not given, but points may be deducted.) -
[10 points] Testing and Development
- [2 points] Functions give clear and complete pre- and post-conditions
- [3 points] Test plan enumerates the ranges of problem circumstances
- [2 points] Tests compare cases to be considered with expected outcomes
- [2 points] Transcript records actual test runs
- [1 point] Summary statement explains why the program is correct
A five point penalty will apply to any submission that includes personally identifying information anywhere other than the references/honesty file.
Global variables may not be used anywhere; use of any global variable will result in a grade of zero. Instead, use parameters to pass values into functions.
