Scheme-like Lists in C
Summary
Your experience with Scheme through the prior course should indicate that lists can be a particularly helpful structure for the storage and processing of many data types. Lists provide a very flexible context for processing, and unlike arrays, we do not need to specify a maximum size of a list when we create it. This reading discusses how lists might be implemented in C, following an approach analogous to the lists we studied previously in Scheme.
Because Scheme incorporates lists as a built-in data structure,
Scheme supplies several built-in operations
(e.g., cons, car, and
cdr) for list processing, and we could program in Scheme
using lists without considering the mechanics of how lists were
implemented. To process lists in C, however, we first need to
consider some internal details of Scheme lists. We then can
translate these details to C.
Background: Lists in Scheme
When you call cons in Scheme, you are building a
structure in memory with two parts, one of which refers to the first
argument to cons and the other of which refers to the
second. This structure is called a cons cell or a pair. Such lists in
Scheme can be modelled graphically with a box-and-pointer
representation. The basic idea is to use a
rectangle—divided in half—to represent the result of the
cons operation. From the first half of the rectangle, we
draw an arrow to the head of a list; from the second half of the
rectangle, we draw an arrow to the rest of the list. For example,
(cons 'a '()) would be represented as follows:
Here, the line to the symbol 'a indicates that
'a is the head of the list. The diagonal line through
the right half of the rectangle indicates that nothing comes later in
this list. Since (cons 'a '()) gives the list
'(a), this diagram represents '(a) as well.
Now consider the list (cons 'b (cons 'a null)) or the
list '(b a). Here, we draw another rectangle, where the
head points to the symbol 'b and the tail points to the
representation of '(a) that we already have seen. The
result is:
Similarly, the list '(d c b a) is constructed as
(cons 'd (cons 'c (cons 'b (cons 'a '()))))
and would be drawn as follows:
We may use a similar approach for lists having sublists as
components. For example, consider the list '((a) b (c d)
e) This is a list with four components, so at the top level we
will need four rectangles, just as in the previous example for the
list '(d c b a). Here, however, the first component
designates the list '(a), which itself involves the
box-and-pointer diagram already discussed. Similarly, the
list '(c d) has two boxes for its two components, just as
we discussed for '(b a) previously. The resulting diagram
follows:
Throughout these diagrams, the null list is represented by a null
pointer or line. Thus, the list containing the null list,
( ( ) )—that is
(cons '() '())—is
represented by a rectangle with lines through both halves:
Representing a Box-and-Pointer Diagram in C
In computer science, this box-and-pointer representation is a primary mechanism used to describe lists—not just in Scheme, but in most contexts. An implementation of lists in C typically utilizes this graphical perspective and involves three main elements:
-
a
struct nodethat implements a box-and-pointer unit, (called a pair in Scheme) -
a
firstpointer that indicates the location of the first node in a list, and - a collection of operations that aid in the processing of the box-and-pointer nodes.
A typical node contains two elements—one for
data and the other to identify the next node on a list.
Because C requires that we declare the type of data fields, we must
tailor the data to the application at hand. For the remainder of
this unit (today's reading and lab), we assume the data will be a
string. The following declarations capture these elements.
#define STRMAX 50 /* maximum size of a string within a list node */
typedef struct node node_t;
struct node {
char data [STRMAX+1]; /* Additional slot for sentinel */
node_t* next;
};
To clarify these lines, a node is a structure with
data and a next field, and
node_t is a synonym for struct
node. (It is simpler to write node_t than the
two keywords struct node, and
conceptually it seems cumbersome to have to write
the struct keyword for each
declaration.)
Some Designs for Scheme-like List
While the struct node
or node_t type provides appropriate support to build
lists that implement box-and-pointer representations, the design of a
list may combine these node_ts in one of several ways.
For example, here are some basic issues:
-
In Scheme, list processing follows a functional perspective:
procedures such
as
cons,car,cdr,null?, andlengthtake lists as parameters and return new lists or data. In C, two main choices are possible:-
should functions, such as
consandcdr, change an existing list, or - should operations return new and updated lists?
-
should functions, such as
-
When processing a list, perhaps with
consorcdr, should there be a connection between the old list and the new one; that is,- should an operation give rise to a completely new structure, or
- should an operation reuse nodes from an earlier structure when possible?
To clarify this second point, consider the Scheme statements:
(define x '(b c))
(define y (cons 'a x))
Thus, we can consider y to be the list
'(a b c). The following
figure shows two possible structures that could result:
In the first option, the nodes of the original list are copied, and
thus are explicitly distinct from those in the new list. In the
second option, a new node is created for the cons node,
a new value is added within that node, but the next part
of that list refers to the old list.
For the example shown, both options may be reasonable. However,
suppose we now change the second element of x
from c to d, using
Scheme's set-cdr! operation. (That is, the
new x is the list (b d)). In the first
option, y is not affected, while in the new
approach y becomes the list (a b d).
Because y refers to x when nodes are
reused, any change to x also affects y.
This may or may not be the desired result of changing x.
Overall both approaches have some advantages in certain cases. However, the first approach requires considerable overhead to duplicate nodes. Furthermore, in a purely functional context, lists are not altered during processing. In such a context, we could reuse nodes without fear of altering other lists unexpectedly, as old lists are never changed. Both of these observations explain why Scheme uses the second approach—reusing nodes when possible.
Implementing Scheme List Operations in C
To illustrate how to implement Scheme-style list operations in C,
program scheme-lists.c shows
implementations of the operations cons,
car, and cdr. In addition, the program
contains function listInit that initializes a list and
listPrint that prints the elements of a list in Scheme
format. Finally, whereas C requires programmers to handle all issues
of memory allocation and deallocation, the program contains function
listDelete that deallocates all nodes in a list and then
sets the list variable to NULL.
Before considering specific details of the C functions, we review
some elements of C syntax, based on the box-and-pointer representation
of the Scheme list (a b c).
In this diagram, first is a variable that points to a
node_t. C notates this type by adding an
asterisk * to the declaration:
struct node* first;
Alternatively, since we used
a typedef statement to
define node_t as struct node, we could
declare first as
node_t * first;
With either of these declarations, first is
a pointer to a
node_t—meaning it is a variable containing an
address. The dereferencing operation *first accesses
the node_t struct itself. Within
this node_t, (*first).data yields the
data field within the struct node,
and (*first).next yields the next field.
Alternatively, an arrow notation accomplishes the same result in a
slightly cleaner form:
first->data and
first->next.
With this notation, we now review various details of the C functions.
Full details of these functions are in
program scheme-lists.c.
car
Because a node_t contains a string as data, the
car function must return a pointer to a string (i.e., a
char *) as its result.
Altogether, we can access and return the car of
a struct node as:
return (char*)list->data;
Because the data field is technically an array, we knowingly cast to
a char * to eliminate a compiler
warning us about the type disparity.
cdr
The cdr operation returns the next—a
pointer to a struct node which has
type node_ t*. Accessing and
returning this field follows the same approach as car.
cons
For the cons, C first requires that we allocate space
explicitly. C's malloc function accomplishes this task
when we give it the amount of space to allocate. Because
malloc returns a void *, the compiler automatically converts the type to match that
of its assignee, which is node_t *. The relevant line is:
node_t * newNode = malloc (sizeof(node_t));
Once the space is allocated, we need to fill the data and
next fields. Following the above discussion,
the next field will point to the head of the next node.
For the data field, we copy the string into the
array.
listPrint
To print, we need a temporary variable current that
starts at the beginning of a list and then progresses node-by-node
until the end. By convention in C, a pointer that does not specify
any node has the value NULL. Also, given one position
in the list, the next node is obtained by looking in
the next field. Putting these details together, the
main structure of a list iteration loop is:
for ( node_t * current=(node_t*)list ; current!=NULL ; current=current->next )
{
/* printing details go here */
}
If we are to print results in the format given by Scheme, we should enclose an entire list in parentheses and separate successive list elements by a space. These details require some care.
We cast the list variable to a node_t* to
avoid a compiler warning about discarding
the const qualifier in the copying
assignment to current (not declared
as const), which is primarily used
as a signal to a caller that the data in the pointer
parameter list will not be changed
by listPrint. By making the cast, we explicitly signal
our intention to discard the qualifier; however, we are still on our
honor not to write through the pointers, since we have promised the
caller that we won't.
listInit
Initialization requires some thought and care. One approach would be to assign
first = NULL;
in the main program.
Although this will work fine, we might want to accomplish
initialization in a procedure. In this case,
passing first makes a copy of
first within the listInit procedure.
Instead, we must pass the address of first and within
the function we must place NULL at that address.
-
call:
listInit (&first) -
function header:
void listInit (node_t** listPtr) -
function body:
*list = NULL
listDelete
Deletion requires that we explicitly deallocate space for each node
and then set the first pointer to NULL.
Because the last step changes the first variable, we
must pass the address of first
to listDelete paralleling the approach used for
listInit.
For completeness, we should first deallocate space for the rest of a
list before deallocating the space for the first node. (If we
proceeded in the other order, once we deallocated the first node, we
could not be confident that the next field had valid
data, so working down the list would be unreliable.) Proceeding
recursively down the list handles subsequent nodes cleanly and
easily.
Finally, the deallocation of memory uses the standard C function
free.
Full Program
These definitions and methods combine to give
program scheme-lists.c:
/* Definition of data structure and operations
for working with Scheme-like lists
*/
/* libraries for standard I/O, strings, and memory allocation */
#include
#include
#include
#define STRMAX 50 /* maximum size of a string within a list node */
typedef struct node node_t;
struct node {
char data [STRMAX+1]; /* Additional slot for sentinel */
node_t * next;
};
/* ---------------------------------------------------------------- */
/* function prototypes, listed in alphabetical order */
char *
car (const node_t * list);
/* pre-condition: list is an initialized, non-empty list (actually list node)
post-condition: head of the list is returned
*/
node_t *
cdr (const node_t * list);
/* pre-condition: list is an initialized, non-empty list (actually list node)
post-condition: tail of the list is returned
*/
node_t *
cons (const char * newData, node_t * rest);
/* pre-condition: newData is an initialized string
rest is an intialized list
post-condition: returns new node with newData copied to data field
and next field pointing to rest
*/
/* function prototypes, listed in alphabetical order */
void
listDelete (node_t ** listPtr);
/* pre-condition: *listPtr is an initialized list
post-condition: *listPtr is changed to a NULL list
with any previously-defined nodes deallocated
*/
void
listInit (node_t ** listPtr);
/* pre-condition: none
post-condition: *listPtr is initialized to a null list
*/
void
listPrint (const node_t * list);
/* pre-condition: list is an initialized list
post-condition: data in each node is printed, Scheme-style, within ()
*/
/* ---------------------------------------------------------------- */
/* function definitions */
char *
car (const node_t * list) {
/* pre-condition: list is an initialized, non-empty list (actually list node)
post-condition: head of the list is returned
*/
return (char*)list->data;
} // car
node_t *
cdr (const node_t * list) {
/* pre-condition: list is an initialized, non-empty list (actually list node)
post-condition: tail of the list is returned
*/
return list->next;
} // cdr
node_t *
cons (const char * newData, node_t * rest) {
/* pre-condition: newData is an initialized string
rest is an initialized list
post-condition: returns new node with newData copied to data field
and next field pointing to rest
if new node cannot be allocated, returns NULL and prints error
*/
node_t * newNode = malloc (sizeof(node_t));
if (newNode == NULL) {
perror ("Unable to allocate node");
return NULL;
}
strncpy (newNode->data, newData, STRMAX+1); /* limit copy to avoid overflow */
newNode->data[STRMAX] = '\0'; /* ensure null termination of the string */
newNode->next = rest; /* define next field of node as the rest of the list */
return newNode; /* return newly initialized node */
} // cons
void
listDelete (node_t ** listPtr) {
/* pre-condition: *listPtr is an initialized list
post-condition: *listPtr is changed to a NULL list
with any previously-defined nodes deallocated
*/
if (*listPtr != NULL)
{
listDelete ( &((*listPtr)->next)); /* recursively remove rest of list */
free (*listPtr); /* deallocate the space for the node itself */
*listPtr = NULL; /* nullify list pointer, avoiding wayward refereces */
}
} // listDelete
void
listInit (node_t ** listPtr) {
/* pre-condition: none
post-condition: *listPtr is initialized to a null list
*/
*listPtr = NULL;
} // listInit
void
listPrint (const node_t * list) {
/* pre-condition: list is an initialized list
post-condition: data in each node is printed, Scheme-style, within ()
*/
char * separator = ""; /* Initialize to no separator */
printf ("(");
for ( node_t * current=(node_t*)list ; current!=NULL ; current=current->next )
{
printf ("%s\"%s\"", separator, current->data);
separator = " "; /* Set separator for intermediate elements */
}
printf (")\n");
} // listPrint
/* ---------------------------------------------------------------- */
/* main program: testing for lists-like-Scheme operations */
int
main (void)
{
printf ("testing of lists-like-scheme operations\n");
/* declarations */
node_t * a;
node_t * b;
node_t * c;
node_t * d;
node_t * e;
char * str;
/* create lists */
listInit (&a);
b = cons ("node B", a);
c = cons ("node C", b);
d = cons ("node D", c);
e = cdr (d);
/* check list creation */
printf ("list a: ");
listPrint (a);
printf ("list b: ");
listPrint (b);
printf ("list c: ");
listPrint (c);
printf ("list d: ");
listPrint (d);
printf ("list e: ");
listPrint (e);
/* test car operation (cdr tested earlier) */
str = car (d);
printf ("car of list d: %s\n", str);
/* clean up */
listDelete (&d);
a = NULL;
b = NULL;
c = NULL;
e = NULL;
printf ("list d: ");
listPrint (d);
printf ("end of testing\n");
} // main
