Note: You are looking at a static copy of the former PineWiki site, used for class notes by James Aspnes from 2003 to 2012. Many mathematical formulas are broken, and there are likely to be other bugs as well. These will most likely not be fixed. You may be able to find more up-to-date versions of some of these notes at http://www.cs.yale.edu/homes/aspnes/#classes.
1. Memory and addresses
Memory in a typical modern computer is divided into two classes: a small number of registers, which live on the CPU chip and perform specialized functions like keeping track of the location of the next machine code instruction to execute or the current stack frame, and main memory, which (mostly) lives outside the CPU chip and which stores the code and data of a running program. When the CPU wants to fetch a value from a particular location in main memory, it must supply an address: a 32-bit or 64-bit unsigned integer on typical current architectures, referring to one of up to 232 or 264 distinct 8-bit locations in the memory. These integers can be manipulated like any other integer; in C, they appear as pointers, a family of types that can be passed as arguments, stored in variables, returned from functions, etc.
2. Pointer variables
2.1. Declaring a pointer variable
The convention is C is that the declaration of a complex type looks like its use. To declare a pointer-valued variable, write a declaration for the thing that it points to, but include a before the variable name:
2.2. Assigning to pointer variables
Declaring a pointer-valued variable allocates space to hold the pointer but not to hold anything it points to. Like any other variable in C, a pointer-valued variable will initially contain garbage---in this case, the address of a location that might or might not contain something important. To initialize a pointer variable, you have to assign to it the address of something that already exists. Typically this is done using the (address-of) operator:
2.3. Using a pointer
Pointer variables can be used in two ways: to get their value (a pointer), e.g. if you want to assign an address to more than one pointer variable:
But more often you will want to work on the value stored at the location pointed to. You can do this by using the (dereference) operator, which acts as an inverse of the address-of operator:
The operator binds very tightly, so you can usually use anywhere you could use the variable it points to without worrying about parentheses. However, a few operators, such as , , and (used in C/Structs) bind tighter, requiring parantheses if you want the to take precedence.
2.4. Printing pointers
You can print a pointer value using with the format specifier. To do so, you should convert the pointer to type first using a cast (see below for pointers), although on machines that don't have different representations for different pointer types, this may not be necessary.
Here is a short program that prints out some pointer values:looking_at_pointers.c
When I run this on a Mac OS X 10.6 machine after compiling with , the output is:&G = 0x100001078 &s = 0x10000107c &a = 0x7fff5fbff2bc &p = 0x7fff5fbff2b0 p = 0x100100080 main = 0x100000e18
The interesting thing here is that we can see how the compiler chooses to allocate space for variables based on their storage classes. The global variable and the static local variable both persist between function calls, so they get placed in the BSS segment (see .bss) that starts somewhere around , typically after the code segment containing the actual code of the program. Local variables and are allocated on the stack, which grows down from somewhere near the top of the address space. The block that returns that points to is allocated off the heap, a region of memory that may also grow over time and starts after the BSS segment. Finally, appears at 0x100000e18; this is in the code segment, which is a bit lower in memory than all the global variables.
3. The null pointer
The special value , known as the null pointer may be assigned to a pointer of any type. It may or may not be represented by the actual address , but it will act like in all contexts (e.g., it has the value false in an or statement). Null pointers are often used to indicate missing data or failed functions. Attempting to dereference a null pointer can have catastrophic effects, so it's important to be aware of when you might be supplied with one.
4. Pointers and functions
A simple application of pointers is to get around C's limit on having only one return value from a function. Because C arguments are copied, assigning a value to an argument inside a function has no effect on the outside. So the function below doesn't do much:bad_doubler.c
However, if instead of passing the value of into we pass a pointer to , then the function can reach out of its own stack frame to manipulate itself:good_doubler.c
Generally, if you pass the value of a variable into a function (with no ), you can be assured that the function can't modify your original variable. When you pass a pointer, you should assume that the function can and will change the variable's value. If you want to write a function that takes a pointer argument but promises not to modify the target of the pointer, use , like this:
The qualifier tells the compiler that the target of the pointer shouldn't be modified. This will cause it to return an error if you try to assign to it anyway:
Passing pointers is mostly used when passing large structures to functions, where copying a 32-bit pointer is cheaper than copying the thing it points to.
If you really want to modify the target anyway, C lets you "cast away ":
There is usually no good reason to do this; the one exception might be if the target of the pointer represents an AbstractDataType, and you want to modify its representation during some operation to optimize things somehow in a way that will not be visible outside the abstraction barrier, making it appear to leave the target constant.
Note that while it is safe to pass pointers down into functions, it is very dangerous to pass pointers up. The reason is that the space used to hold any local variable of the function will be reclaimed when the function exits, but the pointer will still point to the same location, even though something else may now be stored there. So this function is very dangerous:
An exception is when you can guarantee that the location pointed to will survive even after the function exits, e.g. when the location is dynamically allocated using (see below) or when the local variable is declared :
Usually returning a pointer to a local variable is not good practice, since the point of making a variable local is to keep outsiders from getting at it. If you find yourself tempted to do this, a better approach is to allocate a new block using (see below) and return a pointer to that. The downside of the method is that the caller has to promise to call on the block later, or you will get a storage leak.
5. Pointer arithmetic and arrays
Because pointers are just numerical values, one can do arithmetic on them. Specifically, it is permitted to
Add an integer to a pointer or subtract an integer from a pointer. The effect of where is a pointer and is an integer is to compute the address equal to plus times the size of whatever points to (this is why pointers and pointers aren't the same).
Subtract one pointer from another. The two pointers must have the same type (e.g. both or both ). The result is an integer value, equal to the numerical difference between the addresses divided by the size of the objects pointed to.
Compare two pointers using , , , , , or .
Increment or decrement a pointer using or .
The main application of pointer arithmetic in C is in arrays. An array is a block of memory that holds one or more objects of a given type. It is declared by giving the type of object the array holds followed by the array name and the size in square brackets:
Declaring an array allocates enough space to hold the specified number of objects (e.g. 200 bytes for above and 400 for ---note that a is an address, so it is much bigger than a ). The number inside the square brackets must be a constant whose value can be determined at compile time.
The array name acts like a constant pointer to the zeroth element of the array. It is thus possible to set or read the zeroth element using . But because the array name is constant, you can't assign to it:
More common is to use square brackets to refer to a particular element of the array. The expression is defined to be equivalent to ; the index (an integer) is added to the base of the array (a pointer), to get to the location of the -th element of . The implicit then dereferences this location so that you can read its value (in a normal expression) or assign to it (on the left-hand side of an assignment operator). The effect is to allow you to use just as you would any other variable of type (or whatever type was declared as).
Note that C doesn't do any sort of bounds checking. Given the declaration , only indices from to can be used safely. However, the compiler will not blink at or . If you read from such a location you will get garbage data; if you write to it, you will overwrite god-knows-what, possibly trashing some other variable somewhere else in your program or some critical part of the stack (like the location to jump to when you return from a function). It is up to you as a programmer to avoid such buffer overruns, which can lead to very mysterious (and in the case of code that gets input from a network, security-damaging) bugs. The program can help detect such overruns in some cases (see C/valgrind).
Another curious feature of the definition of as identical to is that it doesn't actually matter which of the array name or the index goes inside the braces. So all of , , and refer to the zeroth entry in . Unless you are deliberately trying to obfuscate your code, it's best to write what you mean.
5.1. Arrays and functions
Because array names act like pointers, they can be passed into functions that expect pointers as their arguments. For example, here is a function that computes the sum of all the values in an array of size :
Note the use of to promise that won't modify the contents of .
Another way to write the function header is to declare as an array of unknown size:
This has exactly the same meaning to the compiler as the previous definition. Even though normally the declarations and mean very different things (the first one allocates space to hold 10 s, and prevents assigning a new value to ), in a function argument is just SyntacticSugar for . You can even modify what points to inside by assigning to it. This will allow you to do things that you usually don't want to do, like write this hideous routine:
5.2. Multidimensional arrays
Arrays can themselves be members of arrays. The result is a multidimensional array, where a value in row and column is accessed by .
Declaration is similar to one-dimensional arrays:
This declaration produces an array of 24 values, packed contiguously in memory. The interpretation is that is an array of 6 objects, each of which is an array of 4 s.
If we imagine the array to contain increasing values like this:0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
the actual positions in memory will look like this:0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 ^ ^ ^ a a a
To look up a value, we do the usual array-indexing magic. Suppose we want to find . The name acts as a pointer to the base of the array.The name says to skip ahead 1 times the size of the things pointed to by , which are arrays of 6 s each, for a total size of 24 bytes assuming 4-byte s. For , we start at and move forward 4 times the size of the thing pointed to by , which is an ; this puts us 24+16 bytes from , the position of 10 in the picture above.
Like other array declarations, the size must be specified at compile time in pre-C99 C. If this is not desirable, a similar effect can be obtained by allocating each row separately using and building a master list of pointers to rows, of type . The downside of this approach is that the array is no longer contiguous (which may affect cache performance) and it requires reading a pointer to find the location of a particular value, instead of just doing address arithmetic starting from the base address of the array. But elements can still be accessed using the syntax. An example of this approach is given in malloc2d.c.
5.3. Variable-length arrays
C99 adds the feature of variable-length arrays, where the size of the array is determined at run-time. These can only appear as local variables in procedures (automatic variables) or in argument lists. In the case of variable-length arrays in argument lists, it is also necessary that the length of the array be computable from previous arguments.
For example, we could make the length of the array explicit in our function:
This doesn't accompish much, because the length of the array is not used. However, it does become useful if we have a two-dimensional array, as otherwise there is no way to compute the length of each row:
If you want to append a single character to a string allocated on the heap, here's one way to do it:
However, this is inefficient to do repeatedly in a loop, because you'll be repeatedly calling on (essentially) the same string, and reallocating the buffer to fit one more character each time.
If you want to be smarter about your reallocations, keep track of the buffer's current allocated capacity separately from the length of the string within it — if you know C++, think of the difference between a object's "size" and its "capacity" — and when it's necessary to reallocate, multiply the buffer's size by a scaling factor (e.g. double it) instead of adding 1, so that the number of reallocations is O(log n) instead of O(n).
This is the sort of thing that a good string class would do in C++. In C, you'll probably want to move this buffer-management stuff into its own module.
answered Jul 15 '14 at 4:08