Characters and Strings
Background
As you likely know from your work in Scheme, strings (and their constituent characters) are an important data type in almost any programming language. C provides little in the way of extra packaging for working with strings, as your reading from the textbooks will explain:
- King: Sections 7.3 (review), 13.1-13.2 and 13.4-13.5, pages 134-141, 277-296, and 287-299
Character Arrays, Strings, char* and Storage
Program string-intro.c
shows several variations related to the declaration of character arrays,
strings, and char* variables.
/* program illustrating arrays, strings, and pointers */
#include <stdio.h>
int
main (void)
{
  char first [4] = {'C', 'o', 'l', 'd'}; /* first as an array of characters */
  char second[6] = "World"; /* second as a string: char array ending with null */
  char third [16] = {'C', 'o', 'm', 'p', 'u', 't', 'e', 'r', /* third as array */
                     ' ', 'S', 'c', 'i', 'e', 'n', 'c', 'e'}; /* of characters */
  char * fourth = second; /* fourth as a pointer to an array of characters */
  char * fifth = "Hello"; /* fifth as a pointer to a string literal */
  printf ("first 3 characters in each array\n");
  printf (" first: %c%c%c\n", first[0], first[1], first[2]);
  printf (" second: %c%c%c\n", second[0], second[1], second[2]);
  printf (" third: %c%c%c\n", third[0], third[1], third[2]);
  printf (" fourth: %c%c%c\n", fourth[0], fourth[1], fourth[2]);
  printf (" fifth: %c%c%c\n", fifth[0], fifth[1], fifth[2]);
  printf ("Variable addresses and array base addresses\n"); /* %p expects void * */
  printf (" first address: %p, array base address: %p\n", (void *) &first, (void *) first);
  printf (" second address: %p, array base address: %p\n", (void *) &second, (void *) second);
  printf (" third address: %p, array base address: %p\n", (void *) &third, (void *) third);
  printf (" fourth address: %p, array base address: %p\n", (void *) &fourth, (void *) fourth);
  printf (" fifth address: %p, array base address: %p\n", (void *) &fifth, (void *) fifth);
  printf ("variables printed as strings\n");
  printf (" first: %s\n", first);
  printf (" second: %s\n", second);
  printf (" third: %s\n", third);
  printf (" fourth: %s\n", fourth);
  printf (" fifth: %s\n", fifth);
  return 0;
} // main
One run of this program produced the following output (hexadecimal addresses have been converted to base-ten integers for readability):
first 3 characters in each array first: Col second: Wor third: Com fourth: Wor fifth: Hel Variable addresses and array base addresses first address: 359157264, array base address: 359157264 second address: 359157248, array base address: 359157248 third address: 359157232, array base address: 359157232 fourth address: 359157224, array base address: 359157248 fifth address: 359157216, array base address: 4196464 variables printed as strings first: Cold\ufffd second: World third: Computer ScienceWorld fourth: World fifth: Hello
Understanding this program and output can provide substantial insights to how C works with arrays, characters, strings, and pointers.
Storage
The table under Schematic Memory Diagram at the end of this section shows
(in extreme detail) the allocation of memory for program string-intro.c,
based upon the above run. Starting at the top of the program:
- first is allocated space for four characters, beginning in storage location 359157264 (see the bottom part of the table). Following the normal approach to initializing arrays, the letters C, o, l, and d are stored in these locations. The program does not specify what data might be located after this part of memory.
- second is allocated space for six characters, beginning in storage location 359157248. In C, a string contains a sequence of characters, followed by a null character (code zero). Since World contains five characters, the string requires six characters to include the code 0 at the end.
- In organizing memory, the compiler decided not to use the space between second and first for data storage. Although these memory locations are present, the data in those unallocated memory addresses may be left over from the work of previous programs.
- third is allocated space for sixteen characters, beginning in storage location 359157232. As with first, this space is initialized with specified characters. As an array of characters (not a string), no code zero is placed in memory at the end of this array.
- fourth specifies the address of a character (i.e., it is a pointer to a character). In this case, fourth is given the address that begins the string second defined earlier. Note that the variable fourth itself occupies a location in memory (359157224), and the value stored at that location is the address of second (359157248).
- fifth specifies the address of a character. The address of a character can be the base address of a character array. A char * may be considered either the location of a single character or the starting point for a string. In this case, information for variable fifth is located at 359157216, and that location contains the starting location 4196464 for the literal string "Hello". Compilers often reserve a separate part of main memory for literal data, such as literal strings.
Output
The first set of printf statements accesses the first
three characters in each character array. Within
a printf statement, the %c format prints
exactly one data element as a character, so three characters
are printed by each printf statement here. Note that
subscripts work the same way whether the variable is
declared as an array or as a pointer holding the base address of
an array found elsewhere.
The second set of printf statements displays where each
variable is mapped in main memory. The output shown above corresponds to
the schematic memory diagram at the end of this section.
The third set of printf statements prints data as C
strings. In C, a string variable identifies a starting or base address, and
the string is considered to continue until a code 0 or null character is
encountered.
- For the variables second, fourth, and fifth, the character data were stored with a null character at the end, and these character strings are printed without difficulty.
- For the variable third, the initialization placed characters in the array, but no null character was at the end. Rather, from the mapping of memory identified in the table, the string "World" was located immediately after the characters in the third array. When printing third, printf started with the first character of third (i.e., the C character) and continued character by character until reaching a null. Since no null character was encountered in the processing of the third array, printing continued with the data from the second array.
- For the variable first, the array declaration specified four characters, without a null character at the end. Although this works fine for arrays, work with strings requires processing to continue until a null is found. In this case, first is stored in memory at the end of the program area, and we have no idea what might follow. Thus, processing proceeds with the printing of whatever bytes happen to follow until a null is found. Reading past the end of an array in this way is undefined behavior in C, so the output may differ from run to run.
Schematic Memory Diagram
| variable | value stored | memory address |
|---|---|---|
| section of memory for literal strings | H | 4196464 |
| | e | 4196465 |
| | l | 4196466 |
| | l | 4196467 |
| | o | 4196468 |
| | \0 (number) | 4196469 |
| … | | |
| fifth | integer value 4196464 | 359157216 |
| | | 359157217 |
| | | 359157218 |
| | | 359157219 |
| | | 359157220 |
| | | 359157221 |
| | | 359157222 |
| | | 359157223 |
| fourth | integer value 359157248 | 359157224 |
| | | 359157225 |
| | | 359157226 |
| | | 359157227 |
| | | 359157228 |
| | | 359157229 |
| | | 359157230 |
| | | 359157231 |
| third | C | 359157232 |
| | o | 359157233 |
| | m | 359157234 |
| | p | 359157235 |
| | u | 359157236 |
| | t | 359157237 |
| | e | 359157238 |
| | r | 359157239 |
| | <space> | 359157240 |
| | S | 359157241 |
| | c | 359157242 |
| | i | 359157243 |
| | e | 359157244 |
| | n | 359157245 |
| | c | 359157246 |
| | e | 359157247 |
| second | W | 359157248 |
| | o | 359157249 |
| | r | 359157250 |
| | l | 359157251 |
| | d | 359157252 |
| | \0 (number) | 359157253 |
| | not specified | 359157254 |
| | not specified | 359157255 |
| | not specified | 359157256 |
| | not specified | 359157257 |
| | not specified | 359157258 |
| | not specified | 359157259 |
| | not specified | 359157260 |
| | not specified | 359157261 |
| | not specified | 359157262 |
| | not specified | 359157263 |
| first | C | 359157264 |
| | o | 359157265 |
| | l | 359157266 |
| | d | 359157267 |
| | not specified | 359157268 |
| | not specified | 359157269 |
| … | | |
