CSC 161 Schedule Readings Labs & Projects Homework Deadlines Resources

XOR Cipher

Summary
You encrypt (and decrypt) a message using a secure XOR cipher stream
Objectives
  • Reinforce knowledge of arrays
  • Practice using formatted input
  • Apply bitwise operations for fun and profit
  • Consider problem decomposition

Introduction

The XOR cipher is a simple, yet powerful cryptographic method for securing data. In its simplest form, only a secret key is necessary to perform both encryption and decryption using the bitwise exclusive OR operation, often denoted with a circle plus, ⊕

To encrypt, we simply XOR a plaintext message M with our secret key K so that MK = E. To decrypt we simply XOR the encrypted message E with the same key, EK = M. Conveniently, this means the same operation (or program, in our case) can be used to both encipher and decipher (or encrypt and decrypt).

If a fixed-length key is shorter than the message and used repeatedly to encrypt pieces of the message, then a frequency analysis could be used to break the encryption of text messages (in the same way you or your parents may have broken cryptogram puzzles in the newspaper, which rely on the fact that certain letters and combinations appear more often than others).

However, if the key is as long as the message, the only way to break the encryption is to try every possible key. This is known as a brute-force attack and is implausible in practice because an N-bit key has 2N possibilities.

Securely sharing a long key is just as difficult as the original problem of sharing a long message. Moreover, repeated use of even a long key for multiple messages is also subject to frequency-based attacks. One way around these problems is to use a pseudo-random number generator (RNG) to generate an unpredictable but repeatable stream of keys. Then, only the (much shorter) initial seed for the generator needs to be shared (assuming both parties are using the same generator). Each block of the message is then encrypted using subsequent keys from the generator.

Details

As one might imagine, correctly using these ideas requires attention to a number of practical details.

Overview

In this assignment, you will use the C standard library function rand to generate the sequence of random keys. (These days, the widely-available function random is often preferred over rand because the former is considered more random; for simplicity we'll be sticking to the standard.)

Whichever generator is used, we must still deal with the fact that only some of the bits in the value returned are random. The remaining bits will likely be zero, and using them as part of our XOR key would make the results vulnerable to attack.

Once we figure out how many bits of the key are usable, we will then need to pack together the bits of our message into a block that can be XORed with the key. This is because our message text will be stored as chars, while the key will be some portion of the bits in an int returned by rand (or perhaps long int returned by random).

Finally, bitwise operations and casts (implicit or explicit) interact in ways that can be frustrating to the unwary user. Therefore, one must take care when shifting and combining bits across types and values.

The following sections provide further information about these requisite details.

Key stream

Suppose the message we want to encrypt is My secret. As 8-bit ASCII characters, this message would be represented as follows.

Index Char Integer Binary
0 'M' 77 01001101
1 'y' 121 01111001
2 ' ' 32 00100000
3 's' 115 01110011
4 'e' 101 01100101
5 'c' 99 01100011
6 'r' 114 01110010
7 'e' 101 01100101
8 't' 116 01110100

With a seed of 161, the first two random keys are 383803757 and 2110550296. If RAND_MAX uses only 31 bits, these would be represented as follows.

Decimal Binary
383803757 X0010110 11100000 01100001 01101101
2110550296 X1111101 11001100 01110001 00011000

We have inserted spaces to group the bits into bytes and made the leading bit X to indicate it is not usable as a random bit. Because the leftmost full byte is not usable, we will ignore it entirely.

Next we examine three possible ways to make use of these random keys. The first is the most straightforward, but reduces the number of random keys generated because most of the random bits are discarded. (More keys will therefore be needed and the RNG will begin to repeat the cycle of keys much sooner.) The second wastes far fewer bits, but requires more bitwise operations. The third (available as an extra credit option) requires significant attention to detail, but requires fewer XOR operations.

Approach 1: One key per character

The most straightforward approach would be to use each random key value to cipher one character. In other words, the first character 'M' would be encrypted with the first key 383803757, the second character 'y' would be encrypted with the second key 2110550296, and so on.

Because each character (in this example) is a single byte but each key is several bytes, this approach would likely use the least significant byte of the random key being used as the XOR operand for a character, so that the encrypted text would become:

Char Encryption Result
Input Key
'M' 01001101 01101101 00100000
'y' 01111001 00011000 01100001

Approach 2: Multiple characters per key

A second approach that makes better use of the bits in the random keys would first determine how many characters can be XORed from a given random key. In this example, where a character is a byte and there are three full bytes available with in the key, we could use the least significant bytes of the keys available in turn. For example:

Char Encryption Result
Input Key
'M' 01001101 01101101 00100000
'y' 01111001 01100001 00011000
' ' 00100000 11100000 11000000

At this point we will have exhausted the first key and can continue encrypting beginning with the second key:

Char Encryption Result
Input Key
's' 01110011 00011000 01101011
'e' 01100101 01110001 00010100
'c' 01100011 11001100 10101100

and so on.

Note that if we assume only the rightmost bits are preserved in a narrowing cast (from the wider key type to the narrower character type), we can use C's bitwise XOR operator to do the encryption, repeatedly shifting the appropriate number of bits off the key until we run out of bits and need to generate a new key.

Approach 3: One XOR per key [Extra Credit]

The previous approach makes efficient use of the keys, but it requires many XOR operations. Assuming that each bitwise operation takes the same amount of time, regardless of the number of bits involved (up to the width of the computational hardware), we might instead try to pack several chars of the message together into a single int to enable a single XOR with a given key.

Using this approach, the computation might look like the following. (Note how the character and key pairings have been reversed from Approach 2.)

Input 'My ': 01001101 01111001 00100000
Key 11100000 01100001 01101101
Result 10101101 00011000 01001101
First Key
Input 'sec': 01110011 01100101 01100011
Key 11001100 01110001 00011000
Result 10111111 00010100 01111011
Second Key

Implementing this approach would require additional bitwise operations to accomplish the packing (before the per-key XOR) and unpacking of bits. Those attempting should be aware of the bitwise effects of widening casts on signed numbers.

Assignment

Overview

Write a program called encipher.c that implements the requisite procedures specified below to make the following main functional (without modification).


#ifndef NOMAIN
int
main (void)
{
  char buf[MAXLEN];           
  srand (get_seed());            /* Initialize the RNG with a given key */
  size_t len = read_data (buf,MAXLEN); /* Read chars into buf to be ciphered */
  xor_encrypt (buf, len);        /* Encrypt exactly len values in the buffer */
  for (int c=0 ; c<len ; c++) /* Print ciphered values by chars because */
    printf ("%c",buf[c]);        /*  the null terminator could appear within */
} // main
#endif

Your program must include exactly this block of code, including the wrapping preprocessor macro. (This will be used to efficiently test your code with a separate program.)

In order to capture the program output to a file for decryption, your program should not print any helpful user prompts.

Get Seed

To make the sequence of keys reproducible (and therefore the enciphered text decipherable), we need to initialize the pseudo-random number generator with a shared secret key. The standard library function srand does this. You need to write the function

unsigned int
get_seed (void);

that should read a line containing a single (non-negative) integer number from user input and return the value as an unsigned int.

While no initial prompt should be given, if there is any error in reading the number, an error message should be printed and another attempt made to read a correct seed value. (This process would continue indefinitely until a value is successfully read.)

Read Text

After reading the seed from the user, your program will need to read the entirety of the message to be ciphered (up to max_len characters) using the function

size_t
read_data(char buffer[], size_t max_len);

The procedure should read data—bytes as chars from standard input—until it fills the buffer or reaches the end of file. Of course, you'll be reading from terminal input, but we'll see below that it too can run out. (Read the man page for your preferred input method on how to detect the end of file, often with feof(stdin).)

The function should read as much data as is present (however many lines, and beyond any zero value that might otherwise indicate the null termination of a string. Because the data stream being read could be the result of an XOR operation of an ASCII character with a random set of bits, the result could be any other set of bits (including '\0'). The function must therefore read until it reaches the EOF. You will likely want to use getchar, because scanf and fgets both stop reading when they encounter a newline.

Likewise, no initial user prompt should be given. However, if your program encounters any error in reading the text (perhaps as detected by ferror(stdin)), an error message should be printed and the empty string placed in buffer.

The number of characters read into the buffer should be returned.

Note that size_t is a system-independent type (typically an unsigned integer) used to represent the maximum size of any object. It is defined in several places, including stdlib.h.

Identify Random Bits

The C standard library specifies that rand will return a value between 0 and RAND_MAX (a value defined by stdlib.h). It is therefore possible that that not all the bits in the values returned by rand will be usably random.

Thus, your next task will be to determine how many bits are usable among the ints returned by rand.

We will assume as a precondition that the value of RAND_MAX in binary is a leading sequence of zeros followed by a sequence of ones. Note that because many systems use an arithmetic right shift, we will also require that the argument not exceed INT_MAX (defined in limits.h), which is to say that the argument must be positive (when signed) so that the leading/leftmost bit is always zero. (This assumption is not strictly necessary, but it greatly simplifies your function for our purposes.)

Write the following function that determines the number of (binary) ones in any integer formatted in such a sequence.

size_t
bits_in_mask (int mask);

Examples

bits_in_mask (255) // 28-1
8
bits_in_mask (3)
2
bits_in_mask (1048575) // 220-1
20
bits_in_mask ((1<<18)-1) // 218-1
18
bits_in_mask (RAND_MAX)
31

Note that the last value is system-dependent and not a universal property of the C ecosystem.

Encrypting Text

Using the examples outlined above as your guide, you will need to use the sizeof operator, the RAND_MAX constant and the bits_in_mask function to implement the XOR cipher stream. Complete the function

void
xor_encrypt (char buffer[], size_t len)
{
  size_t chars_per_key = ???; // Number of chars cipherable per random int

 ...
} // encrypt

Note that Approach 1 above will not need to calculate chars_per_key. Approaches 2 and 3 will require this value be calculated using system-independent operators and named constants. In the examples, the calculated value would be 3.

As indicated by main above, encrypt should transform the first len characters of buf in place using calls to rand to generate the keys.

Regardless of which approach is used, the same function can both encrypt and decrypt the text (assuming srand is called with the appropriate seed).

Additional Requirements

Tips

Grading

In addition to the general grading guidelines for evaluation, the assignment is worth 30 points.

A five point penalty will apply to any submission that includes personally identifying information anywhere other than the references/honesty file.

Global variables may not be used anywhere; use of any global variable will result in a grade of zero. Instead, use parameters to pass values into functions.