List Processing in C

microtonal · on Dec 21, 2010

I guess it's just not very well known that C has functions which can also be passed as values.

Perhaps more interesting and newsworthy in this context are Apple's blocks, which are closures:

maw · on Dec 21, 2010

I thought function pointers were a well known part of C. How long can you program C before needing to call qsort or similar? Or before you encounter (or create) your own method table?

GLib offers closures in C; I've never used them, though, and they don't look very convenient.

JoachimSchipper · on Dec 21, 2010

That's not standard C, though.

microtonal · on Dec 21, 2010

That's why it says 'Apple's blocks' and not 'C99 blocks' or 'C1X blocks'. Though, it is part of mainline clang, so it is not restricted to OS X in any way.

pavlov · on Dec 21, 2010

Of course blocks also requires runtime support. The compiler is useless without a matching runtime library. Is there a production-quality "C-with-blocks" runtime available for other platforms than OS X?

memoryhole · on Dec 21, 2010

I'm not sure if it is production quality, but I have successfully used this to get C-with-blocks in Archlinux. http://compiler-rt.llvm.org/index.html

timrobinson · on Dec 21, 2010

I guess it's just not very well known that C has functions which can also be passed as values.

Sure, C has function pointers. But it doesn't have closures: the things you can do with just a function pointer are fairly limited.

The callback syntax in the post should really be:

    typedef void *(ListMapCallback)(void *item, void *state);

...which gets ugly quickly (passing your closure state around as a void*) and doesn't compose well (passing your closure state into your implementations of List_map_, List_fold_, etc.)

gjm11 · on Dec 21, 2010

It's not only that C doesn't have closures; it doesn't even have anonymous functions. Here's a little Lisp expression that doesn't need closures

  (mapcar #'(lambda (x) (* x x)) some-list)

which squares all the elements of the list. To do that in C you need some abomination like

  /* Somewhere outside the function where you actually need this */
  int square(x) { return x*x; }
  
  /* ... */

    List *r = List_new();
    LIST_FOREACH(self, i, v, List_append_(r, square(v)););

Not to mention that this sort of thing tends to get ugly without automatic memory management.

yason · on Dec 21, 2010

It's not necessarily just ugly but plain idiomatic C. It's just C: a function pointer + userdata pointer.

All hooks, callbacks, etc. use this approach because that's what a closure is on a low enough level. And in C it looks like void()(void userdata).

timrobinson · on Dec 21, 2010

My "ugly" comment comes from spending my career programming against APIs like this :)

    typedef BOOL (* WNDENUMPROC) (HWND, LPARAM);
    BOOL WINAPI EnumWindows(WNDENUMPROC lpEnumFunc, LPARAM lParam);

JoachimSchipper · on Dec 21, 2010

The proper way to do this is (simplified from BSD sys/queue.h):

    #define LIST_FOREACH(var, head) \
        for ((var) = (head); (var) != NULL; (var) = (var)->next);
    LIST_FOREACH(list_p, list_head) {
        ...
    }

You may also occasionally want to use (C99) variadic macros (gcc seems to need -std=c99):

    #define LIST_MAP(my_list, f, ...) \
    	for (struct list *n = my_list; n != NULL; n = n->next) \
    		f(__VA_ARGS__, n)
    
    void list_print(const char *format, struct list *n) {
    	printf(format, n->data);
    }
    
    int main(void) {
    	struct list	 n3 = {NULL, 3},
    			 n2 = {&n3,  2},
    			 n1 = {&n2,  1};
    
    	LIST_MAP(&n1, list_print, "%d\n");
    }

regularfry · on Dec 21, 2010

There was a spectacular hack involving gcc and dynamic linking which let you do closures in C posted to reddit a few months ago. Naturally I can't find it - anyone save the link?

demallien · on Dec 21, 2010

A couple of years back, I wrote an implementation of a list in C that provided a List_ForEach(list, callback, context) function, because I really liked that idiom in Ruby. 2 years down the track, I almost never use that function, preferring to simply do this:

  for (pItem = List_First(pList); pItem; pItem = List_Next(pList))
  {
        //do stuff to pItem
  }

It really isn't that much wordier than the ruby (or lisp, or python etc, they all have similar styles for this):

  list.each do |item|
    #do stuff to item
  end

and as you do it all of the time, you don't even have a high cognitive load, your fingers can pretty much type it without having to engage the brain.

OK, so it's implemented using control structures rather than closures. Is that really such a terrible thing? Am I missing some big advantage here?

djacobs · on Dec 21, 2010

A lot of people use #each this way in Ruby, yes, but I don't think it's usually justified. #each is primarily designed to handle side effects. "Do stuff to item" is usually better accomplished with a chain of #map, #select, #inject, #grep, etc. If "do stuff to item" is a multiline set of steps, sure you can use #each. I prefer, though, to abstract that set of steps into a lambda or named function and then map the function onto each item.

Then again, I'm biased. I really like Ruby for its higher-order Lispiness, and not for its webbiness.

jules · on Dec 21, 2010

Small stuff adds up. Also, in Ruby you can use the same construct to iterate over any collection.

jallmann · on Dec 21, 2010

sys/queue.h has some really useful list macros.

It has FOREACH, HEAD, TAIL, INSERT_*, the works.

http://freebsd.active-venture.com/FreeBSD-srctree/newsrc/sys...

demallien · on Dec 21, 2010

And may it rot in hell for forcing you to add fields into every struct that might find itself in a list at some point in it's life. May it doubly rot in hell for forcing you to add those fields for every list that the object might be a part of. May it triply rot in hell for using unsightly macros that are impossible to debug, particularly when working in an embedded environment.

queue.h - just say no.

rw2- · on Dec 21, 2010

so, what's the alternative? do you have a nice typesafe library for external lists?

demallien · on Dec 21, 2010

If I have to throw typesafety onto the sacrificial bonfire to avoid queue.h, then I do it with a light heart. I really don't see any value in it. You have two possible situations:

1) The list contains only items of one type - in this case you already know what the type of all objects in the list is - it's a list of widgets, so all items are of type Widget. Whilst queue.h will give you a compiler warning if ever you try to put a File into your widget list, is that every really likely to happen in real life???

2) You have a list of mixed-type objects. In that case you need code to identify at runtime what the type of the object was anyhow, so type safety gets you nothing in that case either.

So, considering how little you get back for the decidely annoying problems of macros and polluted data structures, I really have no hesitation in counselling the avoidance of queue.h.

In answer to your question, no, I don't have a nice typesafe library. But I do have a nice non-typesafe library.

rw2- · on Dec 21, 2010

very ugly...

ccan has a much nicer list-for-each: http://ccan.ozlabs.org/info/list.html

alexgartrell · on Dec 21, 2010

Rather than

  #define LIST_FOREACH(list, index, value, code)

I'd prefer

  #define LIST_FOREACH(list, index, value) \
  for(index = 0, value = list->items[0]; \
      index < list->size; \
      index++, value=list->items[index])

It lets you treat a LIST_FOREACH(...) as a normal for(...) in terms of the other syntax.

edit: I did some looking around and determined that the following is probably BS

[1] If you're wondering why I didn't do

  index++, value = list->items[index]

it's because expressions separated by a comma can be evaluated in any order (I believe). My friend ran up against it with a bug in jscore.

dzorz · on Dec 21, 2010

> it's because expressions separated by a comma can be evaluated in any order (I believe).

Expressions separated by the comma operator are always evaluated left to right. On the other hand function arguments (which are separated by comma) can be evaluated in any order.

jpr · on Dec 21, 2010

> Expressions separated by the comma operator are always evaluated left to right.

That's correct (for C), but I feel the need to add that in function calls the parameters are separated by commas, and they can be evaluated in any order.

mfukar · on Dec 21, 2010

In the contexts of a function declaration/definition/call, comma is not acting as an operator.

route66 · on Dec 21, 2010

Your preference looks a lot like the linux-kernel list.h.

(e.g. http://isis.poly.edu/kulesh/stuff/src/klist/).

Requiring idiosyncratic setup of your list struct though.

alexgartrell · on Dec 21, 2010

It's certainly inspired by list.h, but I used the same struct as the original. I just think it's crufty to pass "code" as a parameter to a function (in that I've ever really seen it done well).

Of course, I'm talking only about C, as I love to overuse lambda in Python like it's my job

vog · on Dec 21, 2010

Why don't you simply replace

    index++, value=list->items[index]

with

    value=list->items[++index]

?

alexgartrell · on Dec 21, 2010

I did initially, until I looked up [1] and changed it in the edit. I figure separating it is cleaner.

[1] http://en.wikipedia.org/wiki/Comma_operator

vog · on Dec 22, 2010

But the combined expression solves exactly your problem!

You were not sure whether the increment would always happen before the assignment. In the combined expression you can simply use the pre-increment operator, which will always increment before the rest of the expression.

It's shorter, unambiguous and leaves less questions. How can something be any "cleaner" than that?

Also, the statements "array[index++]" and "array[++index]" are pretty well-understood idioms in C.

alexgartrell · on Dec 21, 2010

The following comment adds no value to the discussion, but I'm kind of childish and don't want to look like a dumbass in front of the cool kids :)

You'll notice that there's a footnote that isn't actually related to anything in the body of the comment. It's because the footnote was wrong; I just felt like it would be dishonest to hide it in the edit. (this is also the reason the edit: I did some looking around...) portion exists.

Interestingly, the edit happened prior to any child posts on the comment, and so was not really a reaction to being corrected. I'd actually googled and found [1], and realized (as someone else said) that function calls don't use the comma as an operator.

tl;dr; never, EVER expect left-to-right evaluation of arguments to a function :)

[1] http://en.wikipedia.org/wiki/Comma_operator

Nitramp · on Dec 21, 2010

He quotes:

For example, LISP, Python, and Ruby all offer beautiful and concise constructs for operating on lists of things.

This might be a matter of taste, but macros, void pointers, lack of closures, etc. don't really spell "beautiful and concise" to me. And I think that's what the original quotation is about - you can get similar things done in Java using anonymous classes, and C using function pointers, and probably more or less any language if you try hard enough, but it's just neither beautiful nor concise.

bitwize · on Dec 21, 2010

I once used function pointers in C to generate -- and iterate over -- menu options in a Mac app I was writing. I felt like such a naughty, naughty boy for doing so.

There's probably a rule in those Joint Strike Fighter C++ guidelines that says function pointers shall not be used.

rbranson · on Dec 21, 2010

What's wrong with function pointers? Did you know that when you call most I/O functions in GNU libc they go through function pointers? It's a bit of faux OOP.

_3u10 · on Dec 21, 2010

Regarding first class functions, I've never heard of the requirement that they be compilable at runtime. Compilation is orthogonal to the definition of a language.

CrLf · on Dec 21, 2010

The tip about using function pointers to iterate a list is good but... macros, really?

cturner · on Dec 21, 2010

Are macros are inherently evil? If you had a convention (e.g. only capitalised words in macros, none in your program propper) then they could exist safely. Perhaps a good DSL could sit on top of C macros to shield developers from the syntax. Are there any good preprocessors that do this?

arethuza · on Dec 21, 2010

I don't think macros are inherently evil - but they certainly facilitate Lovecraftian horrors. I have seen production embedded code where the authors didn't really like C and decided that vaguely Pascal like syntax would be better and did this through macros - probably vaguely influenced by the original Bourne shell sources but with a darker madder twist.

sdevlin · on Dec 21, 2010

It seems like you could say that about any sufficiently powerful construct. Sharper knives require better judgment to be used safely.

This is why we often give novices rules of thumb like "don't use macros" or "don't use goto." I've seen some people criticize this practice, but I think it's actually a good thing; when you're just starting out, you don't know when these things are really called for. Working under some constraints simplifies things.

With experience, you learn the pain points well enough to outgrow those rules and use macros/goto/whatever when called for.

roel_v · on Dec 21, 2010

Nah, macros are cool. Of course you need to use them properly, you can use operator overloading the wrong way too, and you can drink beer to have a good night out with friend or you can become the drunk on the main square asking for nickles from passers by.

A nice macro which implements foreach() for C++ is in boost, (BOOST_FOREACH), in foreach.hpp. It's documentation page has an overview of the design considerations that went into making this a 'safe' macro. It's quite informative for learning what to look out for when writing macros.

silentbicycle · on Dec 21, 2010

Chicken (http://www.call-cc.org/), or anything that compiles to C in general.

adambyrtek · on Dec 21, 2010

Please remember to put the year in the title if a submission is old (in this case it's from 2007).

silentbicycle · on Dec 21, 2010

Is that really necessary? The article is only a few years old. Also, the elements involved (in this case, C) have not changed significantly in that time.

Tagging posts with dates like that IMHO encourages the view that anything technology related is instantly obsolete. Since the industry seems to keep rediscovering old ideas every couple years and then rejecting them as "old", this probably does far more harm than good.

"You want to make your way in the CS field? Simple. Calculate rough time of amnesia (hell, 10 years is plenty, probably 10 months is plenty), go to the dusty archives, dig out something fun, and go for it. It's worked for many people, and it can work for you." - Ron Minnich