Coder Social home page Coder Social logo

linked-list-good-taste's Introduction

Linked lists, pointer tricks and good taste

Introduction

In a 2016 TED interview (14:10) Linus Torvalds speaks about what he considers good taste in coding. As an example, he presents two implementations of item removal in singly linked lists (reproduced below). In order to remove the first item from a list, one of the implementations requires a special case, the other one does not. Linus, obviously, prefers the latter.

His comment is:

[...] I don't want you to understand why it doesn't have the if statement. But I want you to understand that sometimes you can see a problem in a different way and rewrite it so that a special case goes away and becomes the normal case, and that's good code. [...] -- L. Torvalds

The code snippets he presents are C-style pseudocode and are simple enough to follow. However, as Linus mentions in the comment, the snippets lack a conceptual explanation and it is not immediately evident how the more elegant solution actually works.

The next two sections look at the technical approach in detail and demonstrate how and why the indirect addressing approach is so neat. The last section extends the solution from item deletion to insertion.

The code

The basic data structure for a singly linked list of integers is shown in Figure 1.

linked list
Figure 1: Singly linked list of integers.

Numbers are arbitrarily chosen integer values and arrows indicate pointers. head is a pointer of type list_item * and each of the boxes is an instance of an list_item struct, each with a member variable (called next in the code) of type list_item * that points to the next item.

The C implementation of the data structure is:

struct list_item {
        int value;
        struct list_item *next;
};
typedef struct list_item list_item;

struct list {
        struct list_item *head;
};
typedef struct list list;

We also include a (minimal) API:

/* The textbook version */
void remove_cs101(list *l, list_item *target);
/* A more elegant solution */
void remove_elegant(list *l, list_item *target);

With that in place, let's have a look at the implementations of remove_cs101() and remove_elegant(). The code of these examples is true to the pseudocode from Linus' example and also compiles and runs.

The CS101 version

simple data model
Figure 2: The conceptual model for the list data structure in the CS101 algorithm.

void remove_cs101(list *l, list_item *target)
{
        list_item *cur = l->head, *prev = NULL;
        while (cur != target) {
                prev = cur;
                cur = cur->next;
        }
        if (prev)
                prev->next = cur->next;
        else
                l->head = cur->next;
}

The standard CS101 approach makes use of two traversal pointers cur and prev, marking the current and previous traversal position in the list, respectively. cur starts at the list head head, and advances until the target is found. prev starts at NULL and is subsequently updated with the previous value of cur every time cur advances. After the target is found, the algorithm tests if prev is non-NULL. If yes, the item is not at the beginning of the list and the removal consists of re-routing the linked list around cur. If prev is NULL, cur is pointing to the first element in the list, in which case, removal means moving the list head forward.

A more elegant solution

The more elegant version has less code and does not require a separate branch to deal with deletion of the first element in a list.

void remove_elegant(list *l, list_item *target)
{
        list_item **p = &l->head;
        while (*p != target)
                p = &(*p)->next;
        *p = target->next;
}

The code uses an indirect pointer p that holds the address of a pointer to a list item, starting with the address of head. In every iteration, that pointer is advanced to hold the address of the pointer to the next list item, i.e. the address of the next element in the current list_item. When the pointer to the list item *p equals target, we exit the search loop and remove the item from the list.

How does it work?

The key insight is that using an indirect pointer p has two conceptual benefits:

  1. It allows us to interpret the linked list in a way that makes the head pointer an integral part the data structure. This eliminates the need for a special case to remove the first item.
  2. It also allows us to evaluate the condition of the while loop without having to let go of the pointer that points to target. This allows us to modify the pointer that points to target and to get away with a single iterator as opposed to prev and cur.

Let's look each of these points in turn.

Integrating the head pointer

The standard model interprets the linked list as a sequence of list_item instances. The beginning of the sequence can be accessed through a head pointer. This leads to the conceptual model illustrated in Figure 2 above. The head pointer is merely considered as a handle to access the start of the list. prev and cur are pointers of type list_item * and always point to an item or NULL.

The elegant implementation uses indirect addressing scheme that yields a different view on the data structure:

Data model for indirect addressing
Figure 3: The conceptual model for the list data structure in the more elegant approach.

Here, p is of type list_item ** and holds the address of the pointer to the current list item. When we advance the pointer, we forward to the address of the pointer to the next list item.

In code, this translates to p = &(*p)->next, meaning we

  1. (*p): dereference the address to the pointer to the current list item
  2. ->next: dereference that pointer again and select the field that holds the address of the next list item
  3. &: take the address of that address field (i.e. get a pointer to it)

This corresponds to an interpretation of the data structure where the list is a a sequence of pointers to list_items (cf. Figure 3).

Maintaining a handle

An additional benefit of that particular interpretation is that it supports editing the next pointer of the predecessor of the current item throughout the entire traversal.

With p holding the address of a pointer to a list item, the comparison in the search loop becomes

while (*p != target)

The search loop will exit if *p equals target, and once it does, we are still able to modify *p since we hold its address p. Thus, despite iterating the loop until we hit target, we still maintain a handle (the address of the next field or the head pointer) that can be used to directly modify the pointer that points to the item.

This is the reason why we can modify the incoming pointer to an item to point to a different location using *p = target->next and why we do not need prev and cur pointers to traverse the list for item removal.

Going beyond

It turns out that the idea behind remove_elegant() can be applied to yield a particularly concise implementation of another function in the list API: insert_before(), i.e. inserting a given item before another one.

Inserting before existing items

First, let's add the following declaration to the list API in list.h:

void insert_before(list *l, list_item *before, list_item *item);

The function will take a pointer to a list l, a pointer before to an item in that list and a pointer to a new list item item that the function will insert before before.

Quick refactor

Before we move on, we refactor the search loop into a separate function

static inline list_item **find_indirect(list *l, list_item *target)
{
        list_item **p = &l->head;
        while (*p != target)
                p = &(*p)->next;
        return p;
}

and use that function in remove_elegant() like so

void remove_elegant(list *l, list_item *target)
{
        list_item **p = find_indirect(l, target);
        *p = target->next;
}

Implementing insert_before()

Using find_indirect(), it is straightforward to implement insert_before():

void insert_before(list *l, list_item *before, list_item *item)
{
        list_item **p = find_indirect(l, before);
        *p = item;
        item->next = before;
}

A particularly beautiful outcome is that the implementation has consistent semantics for the edge cases: if before points to the list head, the new item will be inserted at the beginning of the list, if before is NULL or invalid (i.e. the item does not exist in l), the new item will be appended at the end.

Conclusion

The premise of the more elegant solution for item deletion is a single, simple change: using an indirect list_item ** pointer to iterate over the pointers to the list items. Everything else flows from there: there is no need for a special case or branching and a single iterator is sufficient to find and remove the target item. It also turns out that the same approach provides an elegant solution for item insertion in general and for insertion before an existing item in particular.

So, going back to Linus' initial comment: is it good taste? Hard to say, but it's certainly a different, creative and very elegant solution to a well-known CS task.

linked-list-good-taste's People

Contributors

felipec avatar mkirchner avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

linked-list-good-taste's Issues

test doesn't compile on GCC

gcc (Ubuntu 10.2.0-13ubuntu1) 10.2.0

reports

$make test
gcc -Wall -O2 -o test_list test_list.c
test_list.c:7:20: error: variably modified ‘items’ at file scope
    7 | static IntListItem items[n];
      |                    ^~~~~
make: *** [Makefile:7: test] Error 1

in line 6 n is defined as a const size_t, C requires compile-time constant expressions, const in C doesn't mean that

by changing

const size_t n = 1000;

to

#define n 1000

the problem gets fixed

A question about the implementation

As knowing the node to remove, why traverse the linked list instead of doing something such as:

target->value=target->next->value;
target->next=target->next->next;

Will this result in memory leaks?

Will both examples (CS101 and elegant) result in memory leaks because the target not is not freed after being removed from the list?

Value of simplicity

I agree that the indirect handle approach has a subjective beauty to it.

There is also something to be said for code that can be understood with a cursory glance, and do not require explanations.

pointer vs value equality

I was confused for the longest time why pointers were being tested for equality, rather than the value. was there a specific reason for this?

seems a bit silly to need to know memory location of target to remove it, and it renders the value element to be unnecessary in this example, along with the whole thing about the integers being randomized.

I found out the **p can be removed, which leads to lesser codes, but I don't know whether it's better or worse since it introduces a warning, what do you think?

with **p:

static inline list_item **find_indirect(list *l, list_item *target)
{
        list_item **p = &l->head;
        while (*p != target)
                p = &(*p)->next;
        return p;
}

without **p:

static inline list_item **find_indirect(list *l, list_item *target)
{
        while (l->head != target)
                l = &(l->head)->next;
        return l;
}

It works, but it introduces a warning:

Incompatible pointer types assigning to 'list *' (aka 'struct list *') from 'struct list_item **'

I'm not familiar with C, I don't know whether it's better or worse, what do you think?

Related to derivatives of data structures

I couldn't help noticing that the technique used here is related to the same used (in a functional setting) of the concept of a derivative of a data structure, aka "zipper" https://en.wikipedia.org/wiki/Zipper_(data_structure) . The idea is to use a "hole", an indirection, to hang onto the context of a place where things may be modified. It can be applied (and in a mechanically standardized way, "differentiation") to all kinds of tree structures beyond lists. Very elegant indeed. You may want to point to the literature on derivatives of data structures.

About Linus taste

Since you're quoting Linus Torvalds about his opinions of what is good and bad taste, I'd like to point out that he consider typedef struct list_item list_item; to be very bad taste.

Please don’t use things like vps_t. It’s a mistake to use typedef for structures and pointers. When you see a
vps_t a;
in the source, what does it mean? In contrast, if it says
struct virtual_container *a;
you can actually tell what a is.

Lots of people think that typedefs help readability. Not so. They are useful only for:
[Some listed rules. See link to read them.]
Maybe there are other cases too, but the rule should basically be to NEVER EVER use a typedef unless you can clearly match one of those rules.

In general, a pointer, or a struct that has elements that can reasonably be directly accessed should never be a typedef.

https://www.kernel.org/doc/html/v4.10/process/coding-style.html#typedefs

Crash

To me elegant code is not only efficient but also bulletproof.

remove_elegant will crash if:

  1. List is empty, and head is NULL
  2. target is not on the list. (does not check for final node's NULL)

There is NO NULL testing in there at all. It totally assumes the list is never null and target is always on it.
Poor code IMO.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.