Laboratory: Programming in Python (Encoded ASCII Art)
CSC 105 - The Digital Age
Summary: In this laboratory, you will apply the
programming skills you have learned to a culminating
project: decoding a text file to retrieve an ASCII Art image.
First, please quickly review the previous lab so that the lessons you
learned there will be fresh in your mind for you to use today.
a.
Open a terminal window, and move to the code directory you created in the
first Python lab, and then open gedit as shown below.
As before, you will move back and forth between these two windows, writing
programs in gedit and running them in the terminal window.
cd 105/code
gedit &
Next, use the following command to copy four data files for this lesson
into your 105/code directory. (Recall that the final dot is
necessary: it names the current directory as the destination.)
cp ~weinman/courses/CSC105/labs/data?.??? .
(Each ? is a wildcard that matches exactly one character.)
b.
Have a look at the data file data1.txt. (Recall the
command less). Consider this: what causes each line to
actually be displayed on a different line? In other words, why does the
first word of the second line start a new line? Why doesn't it simply
follow the last word of the previous line?
Answer: In the file, there is a character at the end of each line
which is not displayed when you look at the file. Thus, it is
invisible to us, but it is there for a purpose. We call it
the newline character (it has ASCII code 10), and its purpose
is to indicate that whatever follows it should start a new line. These
types of characters are called control characters because
they are not displayed, but affect how the text file is interpreted.
This will be important when we consider how to use Python to read text
from a file.
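If you would like to see the newline character for yourself, here is a small sketch you can try. The variable names (text, newline_code, visible) are made up for this illustration. Each print is written with parentheses around a single value, which should behave the same in Python 2 and Python 3.

```python
# A two-line piece of text: "\n" is how Python writes the newline character.
text = "first line\nsecond line"

newline_code = ord("\n")    # the ASCII code of the newline character
print(newline_code)         # prints 10

# repr() shows the control character instead of acting on it.
visible = repr(text)
print(visible)              # shows first line\nsecond line inside quotes

# Printing the text itself lets the newline do its job: two lines appear.
print(text)
```

Notice that the newline only becomes visible when we ask Python for the "representation" of the string; otherwise it silently starts a new line.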
Some of the programs you have written so far
accept input that has been typed on the keyboard. We frequently also want
our programs to get input data from files stored on the computer's
hard disk. In this exercise you will learn to open data files and
read their contents.
Before a program can access the data within a file, it must "open" the
file. This can be done in Python with the statement shown below.
file = open("filename.txt", "r")
Now let's consider each part of this statement:
- open is the name of the Python function that will open the file.
- "filename.txt" is the name of the file on disk. You should
replace this name with the name of the specific file you want to
open. The quotation marks are necessary.
- "r" indicates that you want to read the file, and not
modify it. The quotation marks are necessary.
- file is a variable that you will use further on in your
program as a "handle," a way to refer to the opened file.
Once you have opened a file, you can read data from it one line at a
time, using the Python statement shown below. Notice that we use
file again in this statement to specify that we want to read from
the file just opened. Here readline() is the method that reads the
next line, and line is a string variable. When the
statement is executed, one line of data is read from the file and stored
in the variable called line.
line = file.readline()
The next thing to understand is that the newline character
described above is included in the line read
by file.readline(). To remove it, we can use the following
(admittedly cryptic) statement:
line = line[:-1]
This process is often called "chopping" off the newline.
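The three statements above can be tried together in the following sketch. So that it runs anywhere, it first creates a small throwaway file; the file name sample.txt, its contents, and the names folder, path, out, and raw are all invented for this illustration and are not part of the lab's data files.

```python
import os
import tempfile

# Build a small throwaway file so this sketch does not depend on the lab data.
# (sample.txt and its two lines are made up for this illustration.)
folder = tempfile.mkdtemp()
path = os.path.join(folder, "sample.txt")
out = open(path, "w")
out.write("roses are red\nviolets are blue\n")
out.close()

file = open(path, "r")
line = file.readline()
raw = repr(line)      # shows the trailing \n that readline() keeps
print(raw)

line = line[:-1]      # chop the final character (the newline)
print(line)           # prints roses are red

file.close()          # a good habit: close the file when you are finished
os.remove(path)       # tidy up the throwaway file and its folder
os.rmdir(folder)
```

One caution: line[:-1] removes the last character whatever it happens to be. If the final line of a file does not end with a newline, a real character would be chopped instead. Writing line.rstrip("\n") is an alternative that removes only newline characters.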
a.
At this point you should have a file called data1.txt in your
current directory. Write a program called readfile.py
that
- opens the file data1.txt
- reads a line from the file
- chops the newline from the line
- prints the resulting line to the terminal window
(Your program will be four lines long, and the first three will look
remarkably similar to the example statements just given.)
b.
Now modify your program to make it read, chop, and print three lines from
data1.txt.
c.
You should also have a file in your code directory called data2.txt.
In data2.txt the first line contains a number, and that number
specifies the number of lines in the file after the number itself.
Please take a look at data2.txt to make sure you see what I mean.
Then write a program called readfile2.py that
opens data2.txt, reads the first line, and stores the number
found there in a numeric variable. (Since the line is initially read
as a string, in order to store it as a number, you will need to
convert the string to an integer with the
function int()).
Your program should then read the remaining lines in the file and
print them to the terminal window. To do this, write a loop:
the body of your loop (i.e., the indented statements
associated with the loop) should read and print one line from the
file, and your program should run the loop body as many times as there are
lines in the file. The output of your program should look like the
figure found in
data2.txt. (The number that occupies the first line of the file
should not be included in your output.)
Did you use the number 13 anywhere in your program? If so, please
go back and modify your program such that the number is not used
explicitly. Rather, use the variable you read from the file (even though
you know that in data2.txt its value is 13). This way,
your program will work on any file that has the appropriate format, not
just data2.txt!
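To see the idea of "let the number from the file control the loop" in miniature, here is a sketch that uses a list of strings as a stand-in for a file. The list contents and the names lines, count, printed, and text are invented for this illustration, and the while loop is only one possible way to write the loop; use whichever loop style you learned in the previous lab.

```python
# Stand-in for a file in this format: a count, then that many lines.
# (These strings are invented; they are not the contents of data2.txt.)
lines = ["3\n", "one\n", "two\n", "three\n"]

count = int(lines[0])    # int() ignores the trailing newline: "3\n" becomes 3

printed = 0
while printed < count:   # the limit comes from the data, not from a typed-in number
    text = lines[printed + 1]
    print(text[:-1])     # chop the newline before printing
    printed = printed + 1
```

If you add or remove strings from the list and update the first entry to match, the loop adjusts without any other change to the program.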
In this exercise, we will return to reading a single line from
a file. But now the line that we read will be "encoded" as shown
below. The code indicates that the decoded text should begin
with 3 x's, followed by 4 underscores, and then 2 asterisks. This type
of encoding is called a
run-length encoding (we record the "run-length" for each
character), and it can be used for file compression.
coded text:    decoded text:
3x4_2*         xxx____**
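To make the encoding concrete, here is a sketch of the opposite direction: turning plain text into coded text. This is not something the lab asks you to write, and it is only one possible approach. It assumes that no run is longer than 9 characters, so every count fits in a single digit. The names text, coded, position, character, and run are made up for this illustration.

```python
text = "xxx____**"
coded = ""
position = 0

while position < len(text):
    character = text[position]
    run = 1
    # Count how many copies of this character follow in a row.
    while position + run < len(text) and text[position + run] == character:
        run = run + 1
    coded = coded + str(run) + character   # record the run length, then the character
    position = position + run              # jump ahead to the start of the next run

print(coded)    # prints 3x4_2*
```

The coded text is shorter than the original whenever the runs are long, which is why this scheme can be used for compression.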
You should have a data file named data3.enc that contains one line
of text that is encoded in this way. Your task is to write a program called
decode1.py that
opens data3.enc, reads the line of text, decodes it, and prints
the decoded text to the terminal window. Here are some hints that may be
helpful.
-
Because of the format of the encoding, you can trust that the
characters in the line will alternate between numbers and other
characters. You can also trust that each number will be a
single-digit number.
-
Remember from the end of our first Python lab that you can access
particular characters in a string by specifying the index number of the
character you are interested in. For example, line[0] indicates
the first character in the string, and line[1] indicates the
second. (Recall also that index numbers count from 0.)
-
Remember that a digit in a string must be converted to a number, for
example int(line[0]), in order for Python to recognize it as a
number.
-
Remember (from the section "Multiplication" in our second Python lab)
that you can multiply a string by an integer to produce a new string that
contains multiple copies of the original.
-
You may want to start by reading just the first digit/character pair, and
printing the expanded version of just that pair.
-
You can use a variable and arithmetic to index characters of the
string. For instance, the following is a valid Python program.
index=4
color="orange"
print "The ", str(index), "th character is ", color[index]
print "The character after it is ", color[index+1]
-
Once you have the first digit/character pair working, note that
you can use the Python function
len(line) to determine the number of characters in a string
called line (i.e., the length of the line). Then
write a loop to expand all of the digit/character pairs in the line. The
statements inside your loop should expand one digit/character pair, and
your program should run the loop as many times as needed
to account for all the digit/character pairs.
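As a starting point, the hints about the first digit/character pair might be sketched as follows. The coded text here is invented (it is not the contents of data3.enc), and the names count, character, first_pair, and pairs are only suggestions.

```python
line = "2a5-1!"      # made-up coded text for this illustration

count = int(line[0])             # the digit "2" becomes the number 2
character = line[1]              # the character to repeat
first_pair = count * character   # multiplying a string by a number repeats it
print(first_pair)                # prints aa

pairs = len(line) // 2           # whole-number division: each pair uses two characters
print(pairs)                     # prints 3
```

Be sure to chop the newline from the line before calling len(line); otherwise the newline is counted as one of the characters.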
In this exercise you will use the last data file provided with this lab:
data4.enc. The format of the file is as follows:
-
The first line of the file contains a single number, which
indicates the number of lines in the file after the
number itself.
-
The remaining lines are encoded with the same coding scheme we
used in Exercise 2. (Each line contains alternating numbers and
characters, where the number indicates the number of times that the
following character should be printed.)
Your task is to write a program called decoder.py
that reads and decodes the file, and prints
the decoded text to the terminal window. Here are some hints that may be
useful:
-
Start by making a copy of your program
readfile2.py, and calling it decoder.py. Then remove
any lines that print data (since that program printed lines verbatim,
which is no longer what we want to do). This program will serve as an
outline for your decoder. Note that it contains a loop that executes one
time for each line in the input file.
-
Next, review your program decode1.py to find the portion
of it that processes (decodes and prints) a single line that has
been read from a file. Insert that set of statements
from decode1.py into the loop in decoder.py. If
all goes well, this should give your program the ability to decode
each line in the input file.
-
At this point, you should have one loop nested inside
another. This is a wonderfully powerful construct, but it has a
potential pitfall you need to be aware of. Recall from the last
lab that each of the loops will have a control variable
that essentially counts the number of times the loop has been
executed, and causes execution to stop when it has been done
sufficiently often. These two control variables (one for the inner
loop and one for the outer loop) must have different names, for
example count and count2. Otherwise, execution
of the two loops will not occur the proper number of times.
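The following sketch shows the difference, assuming loops written with while and a counter (your loops from the last lab may look a little different). The limits 3 and 2 and the names good_total and bad_total are invented for this illustration.

```python
# Correct: separate control variables, 3 outer passes times 2 inner passes.
good_total = 0
count = 0
while count < 3:
    count2 = 0
    while count2 < 2:
        good_total = good_total + 1
        count2 = count2 + 1
    count = count + 1
print(good_total)     # prints 6

# Mistaken: both loops share the name count.
bad_total = 0
count = 0
while count < 3:
    count = 0                 # the inner loop resets the shared counter...
    while count < 2:
        bad_total = bad_total + 1
        count = count + 1
    count = count + 1         # ...so the outer loop sees 3 and stops early
print(bad_total)      # prints 2
```

With these particular limits the mistaken version merely stops too soon. With other limits it can repeat forever, so if your decoder seems to hang, check the names of your control variables first.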
If all has gone well, your program decoder.py should now have the
capability to read and decode the ASCII Art image represented in the data
file data4.enc.