EID 151: Programming Languages

Spring 2004

Monday 3:00 - 5:00 Rm 521, Thursday 4:00 - 5:00 Rm 621

Click here for main page of course...

Topic #4: Linked Lists

Linked lists constitute one of the most important topics for programming in a language like C. For beginners, it is often one of the most difficult. This topic combines the concepts of structures, pointers, and dynamic memory allocation.

A linked list can be thought of as a sequence of nodes. Each node is an entry in the list, containing data and a pointer to the next node. These pointers from each node to the next are referred to as links. If the linked list is a doubly linked list, each node will also contain a pointer, or link, to the previous node. The value of the link of the final node in a linked list will be NULL, indicating, in a sense, that this node does not point anywhere. If the linked list is a circular linked list, the value of the link of the final node will be a pointer to the first node. Programs that use linked lists also store a pointer to the first node of the linked list.

Each node is represented by a structure. This structure is a user-defined data type, just like any other structure. The data represented by each node are simply the fields of the structures. In this sense, a linked list of nodes (really a linked list of structures) is not so different from an array of structures, as we have seen before. Unlike arrays of structures, however, the memory used to store each node (structure) is not necessarily adjacent to the memory used to store the next or previous node. Therefore, you can not simply increment or decrement a pointer to a node and expect to to point to the next or previous node. You must traverse a linked list using the links. The link of a node is also just a field of the structure representing the node, often called "next" or "link", but of course you can call it anything you like.

As with general structures, defining the structure of a node is not declaring any variables. In other words, there is no memory set aside for any variable based on the definition of a structure, regardless of whether it is a normal structure definition or the definition of the node in a linked list. Often, no variables will be declared of a node type, but instead will be declared as pointers to that type. The memory to store each node will be allocated (for example, using a malloc statement), and pointed to by such a variable, or by the link of another node. This memory will therefore be come from the heap, and typically only the pointer to the first node will actually exist in the stack (if it is a local) or the static area of memory (if it is a global).

Common operations on linked lists include inserting a node, deleting a node, or searching for a node with certain data. It is also sometimes useful to reverse the nodes in a linked list, thus creating a new list whose first element is the node that used to be the last, and whose last element is the node that used to be the first. There is a common iterative and a common recursive solution for this task, and all of these routines have been covered in class. Routines for inserting a node into a linked list vary depending on how nodes are ordered. For example, new nodes by may added at the start of the list (common if the list is being used to implement a stack), at the end of the list (common if the list is being used to implement a queue, in which case it is likely that a pointer to the last node of the linked list is maintained), or in sorted order (in which case the insert routine is more complicated). If the linked list is being used to implement a stack or a queue, a pop will generally retrieve the value stored in the first node and then delete that node. Otherwise, deleting a node typically entails finding the previous node to the one being deleted. A delete then consists of making the previous node point to the next node, after which the memory associated with the node being deleted can be freed.

Some linked lists are designed such that duplicates of data objects are not represented by separate nodes; instead, one field of each node is a count, indicating the number of times that the data occurs. In this case, an insert is not necessarily inserting a node into the list, but is inserting an instance of the data. If the current data is being seen for the first time, the insert routine must create a node for the data and insert it in the appropriate location; otherwise, the count of the node which represents this data is increased. A delete from such a list may involve decreasing such a count; if the count becomes zero, the program may want to perform a typical delete, making the previous node point to the next node and freeing the memory associated with the deleted node.

Most insert routines and delete routines may need to contain special cases, for example if a new node becomes the new start of the list or if a deleted node was the first node of the list. This particular special case can be handled in a few different ways (in terms of the actual implementation in C). One solution which I personally do not like is to use a "dummy" node that never changes and whose data is ignored. This dummy node always points to the node containing the data of the actual first element of the list. Since the memory location of this node never changes, routines essentially do not have to handle the special case. Another solution, which I sometimes use, is to code the routines to take a pointer to a pointer to a list, instead of just a pointer to the list. Then the pointer to the list in the calling routine can be changed in the called routine. The third solution, which is what I use most commonly, is to have the called routines return a pointer to the first node in the list if there is any chance that this node will be changed. Then the caller can simply pass a normal pointer to the first node in the list, and the return value will be assigned back to the variable in the calling routine that points to this node. If the first node doesn't change, which may be the common case, the assignment doesn't help but doesn't hurt; when the first node does change, the assignment is necessary.

Linked lists are important because they are the simplest way of handling tasks for which the amount of data that a program will need to handle is unknown by both the programmer and the user. If the programmer knows ahead of time how much data there will be (or at least a reasonable maximum), the programmer can use an array to store all of the data (regardless of whether this is an array of structures, integers, or whatever). If the programmer does not know how much data there will be, but the user will decide at the start of the program (or if the amount is indicated in some other way, for example at the start of a file), a dynamic array can be used. For example, in the first simple phone book program we saw in class, after reading the number of entries at the start of a file, a single call to malloc allocates enough memory for all entries. As with a linked list, this memory is coming from the heap, and only the pointer to the start exists in an activation record or static memory. Unlike a linked list, the memory is all consecutive, and can be treated as a simple array. In many real-world applications, however, neither the programmer nor the user can indicate how much data will be needed. For example, applications that allow users to create documents (word processors, spreadsheets, databases, etc.) can not assume a particular maximum size of a document. If such a maximum were assumed, it would likely either be too small (in which case, once reached, the user would not be able to enter any more data - obviously terribly undesirable), or too large (in which case memory would be wasted and the user would possibly run out of memory). A few simple applications of linked lists are discussed as the next topic.