Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
List Processing in C (dekorte.com)
43 points by gnosis on Dec 21, 2010 | hide | past | favorite | 42 comments


I guess it's just not very well known that C has functions which can also be passed as values.

Perhaps more interesting and newsworthy in this context are Apple's blocks, which are closures:

http://thirdcog.eu/pwcblocks/


I thought function pointers were a well known part of C. How long can you program C before needing to call qsort or similar? Or before you encounter (or create) your own method table?

GLib offers closures in C; I've never used them, though, and they don't look very convenient.


That's not standard C, though.


That's why it says 'Apple's blocks' and not 'C99 blocks' or 'C1X blocks'. Though, it is part of mainline clang, so it is not restricted to OS X in any way.


Of course blocks also requires runtime support. The compiler is useless without a matching runtime library. Is there a production-quality "C-with-blocks" runtime available for other platforms than OS X?


I'm not sure if it is production quality, but I have successfully used this to get C-with-blocks in Archlinux. http://compiler-rt.llvm.org/index.html


I guess it's just not very well known that C has functions which can also be passed as values.

Sure, C has function pointers. But it doesn't have closures: the things you can do with just a function pointer are fairly limited.

The callback syntax in the post should really be:

    typedef void *(ListMapCallback)(void *item, void *state);
...which gets ugly quickly (passing your closure state around as a void*) and doesn't compose well (passing your closure state into your implementations of List_map_, List_fold_, etc.)


It's not only that C doesn't have closures; it doesn't even have anonymous functions. Here's a little Lisp expression that doesn't need closures

  (mapcar #'(lambda (x) (* x x)) some-list)
which squares all the elements of the list. To do that in C you need some abomination like

  /* Somewhere outside the function where you actually need this */
  int square(x) { return x*x; }
  
  /* ... */

    List *r = List_new();
    LIST_FOREACH(self, i, v, List_append_(r, square(v)););
Not to mention that this sort of thing tends to get ugly without automatic memory management.


It's not necessarily just ugly but plain idiomatic C. It's just C: a function pointer + userdata pointer.

All hooks, callbacks, etc. use this approach because that's what a closure is on a low enough level. And in C it looks like void()(void userdata).


My "ugly" comment comes from spending my career programming against APIs like this :)

    typedef BOOL (* WNDENUMPROC) (HWND, LPARAM);
    BOOL WINAPI EnumWindows(WNDENUMPROC lpEnumFunc, LPARAM lParam);


The proper way to do this is (simplified from BSD sys/queue.h):

    #define LIST_FOREACH(var, head) \
        for ((var) = (head); (var) != NULL; (var) = (var)->next);
    LIST_FOREACH(list_p, list_head) {
        ...
    }
You may also occasionally want to use (C99) variadic macros (gcc seems to need -std=c99):

    #define LIST_MAP(my_list, f, ...) \
    	for (struct list *n = my_list; n != NULL; n = n->next) \
    		f(__VA_ARGS__, n)
    
    void list_print(const char *format, struct list *n) {
    	printf(format, n->data);
    }
    
    int main(void) {
    	struct list	 n3 = {NULL, 3},
    			 n2 = {&n3,  2},
    			 n1 = {&n2,  1};
    
    	LIST_MAP(&n1, list_print, "%d\n");
    }


There was a spectacular hack involving gcc and dynamic linking which let you do closures in C posted to reddit a few months ago. Naturally I can't find it - anyone save the link?


A couple of years back, I wrote an implementation of a list in C that provided a List_ForEach(list, callback, context) function, because I really liked that idiom in Ruby. 2 years down the track, I almost never use that function, preferring to simply do this:

  for (pItem = List_First(pList); pItem; pItem = List_Next(pList))
  {
        //do stuff to pItem
  }
It really isn't that much wordier than the ruby (or lisp, or python etc, they all have similar styles for this):

  list.each do |item|
    #do stuff to item
  end
and as you do it all of the time, you don't even have a high cognitive load, your fingers can pretty much type it without having to engage the brain.

OK, so it's implemented using control structures rather than closures. Is that really such a terrible thing? Am I missing some big advantage here?


A lot of people use #each this way in Ruby, yes, but I don't think it's usually justified. #each is primarily designed to handle side effects. "Do stuff to item" is usually better accomplished with a chain of #map, #select, #inject, #grep, etc. If "do stuff to item" is a multiline set of steps, sure you can use #each. I prefer, though, to abstract that set of steps into a lambda or named function and then map the function onto each item.

Then again, I'm biased. I really like Ruby for its higher-order Lispiness, and not for its webbiness.


Small stuff adds up. Also, in Ruby you can use the same construct to iterate over any collection.


sys/queue.h has some really useful list macros.

It has FOREACH, HEAD, TAIL, INSERT_*, the works.

http://freebsd.active-venture.com/FreeBSD-srctree/newsrc/sys...


And may it rot in hell for forcing you to add fields into every struct that might find itself in a list at some point in it's life. May it doubly rot in hell for forcing you to add those fields for every list that the object might be a part of. May it triply rot in hell for using unsightly macros that are impossible to debug, particularly when working in an embedded environment.

queue.h - just say no.


so, what's the alternative? do you have a nice typesafe library for external lists?


If I have to throw typesafety onto the sacrificial bonfire to avoid queue.h, then I do it with a light heart. I really don't see any value in it. You have two possible situations:

1) The list contains only items of one type - in this case you already know what the type of all objects in the list is - it's a list of widgets, so all items are of type Widget. Whilst queue.h will give you a compiler warning if ever you try to put a File into your widget list, is that every really likely to happen in real life???

2) You have a list of mixed-type objects. In that case you need code to identify at runtime what the type of the object was anyhow, so type safety gets you nothing in that case either.

So, considering how little you get back for the decidely annoying problems of macros and polluted data structures, I really have no hesitation in counselling the avoidance of queue.h.

In answer to your question, no, I don't have a nice typesafe library. But I do have a nice non-typesafe library.


very ugly...

ccan has a much nicer list-for-each: http://ccan.ozlabs.org/info/list.html


Rather than

  #define LIST_FOREACH(list, index, value, code) 
I'd prefer

  #define LIST_FOREACH(list, index, value) \
  for(index = 0, value = list->items[0]; \
      index < list->size; \
      index++, value=list->items[index])
It lets you treat a LIST_FOREACH(...) as a normal for(...) in terms of the other syntax.

edit: I did some looking around and determined that the following is probably BS

[1] If you're wondering why I didn't do

  index++, value = list->items[index]
it's because expressions separated by a comma can be evaluated in any order (I believe). My friend ran up against it with a bug in jscore.


> it's because expressions separated by a comma can be evaluated in any order (I believe).

Expressions separated by the comma operator are always evaluated left to right. On the other hand function arguments (which are separated by comma) can be evaluated in any order.


> Expressions separated by the comma operator are always evaluated left to right.

That's correct (for C), but I feel the need to add that in function calls the parameters are separated by commas, and they can be evaluated in any order.


In the contexts of a function declaration/definition/call, comma is not acting as an operator.


Your preference looks a lot like the linux-kernel list.h.

(e.g. http://isis.poly.edu/kulesh/stuff/src/klist/).

Requiring idiosyncratic setup of your list struct though.


It's certainly inspired by list.h, but I used the same struct as the original. I just think it's crufty to pass "code" as a parameter to a function (in that I've ever really seen it done well).

Of course, I'm talking only about C, as I love to overuse lambda in Python like it's my job


Why don't you simply replace

    index++, value=list->items[index]
with

    value=list->items[++index]

?


I did initially, until I looked up [1] and changed it in the edit. I figure separating it is cleaner.

[1] http://en.wikipedia.org/wiki/Comma_operator


But the combined expression solves exactly your problem!

You were not sure whether the increment would always happen before the assignment. In the combined expression you can simply use the pre-increment operator, which will always increment before the rest of the expression.

It's shorter, unambiguous and leaves less questions. How can something be any "cleaner" than that?

Also, the statements "array[index++]" and "array[++index]" are pretty well-understood idioms in C.


The following comment adds no value to the discussion, but I'm kind of childish and don't want to look like a dumbass in front of the cool kids :)

You'll notice that there's a footnote that isn't actually related to anything in the body of the comment. It's because the footnote was wrong; I just felt like it would be dishonest to hide it in the edit. (this is also the reason the edit: I did some looking around...) portion exists.

Interestingly, the edit happened prior to any child posts on the comment, and so was not really a reaction to being corrected. I'd actually googled and found [1], and realized (as someone else said) that function calls don't use the comma as an operator.

tl;dr; never, EVER expect left-to-right evaluation of arguments to a function :)

[1] http://en.wikipedia.org/wiki/Comma_operator


He quotes:

For example, LISP, Python, and Ruby all offer beautiful and concise constructs for operating on lists of things.

This might be a matter of taste, but macros, void pointers, lack of closures, etc. don't really spell "beautiful and concise" to me. And I think that's what the original quotation is about - you can get similar things done in Java using anonymous classes, and C using function pointers, and probably more or less any language if you try hard enough, but it's just neither beautiful nor concise.


I once used function pointers in C to generate -- and iterate over -- menu options in a Mac app I was writing. I felt like such a naughty, naughty boy for doing so.

There's probably a rule in those Joint Strike Fighter C++ guidelines that says function pointers shall not be used.


What's wrong with function pointers? Did you know that when you call most I/O functions in GNU libc they go through function pointers? It's a bit of faux OOP.


Regarding first class functions, I've never heard of the requirement that they be compilable at runtime. Compilation is orthogonal to the definition of a language.


The tip about using function pointers to iterate a list is good but... macros, really?


Are macros are inherently evil? If you had a convention (e.g. only capitalised words in macros, none in your program propper) then they could exist safely. Perhaps a good DSL could sit on top of C macros to shield developers from the syntax. Are there any good preprocessors that do this?


I don't think macros are inherently evil - but they certainly facilitate Lovecraftian horrors. I have seen production embedded code where the authors didn't really like C and decided that vaguely Pascal like syntax would be better and did this through macros - probably vaguely influenced by the original Bourne shell sources but with a darker madder twist.


It seems like you could say that about any sufficiently powerful construct. Sharper knives require better judgment to be used safely.

This is why we often give novices rules of thumb like "don't use macros" or "don't use goto." I've seen some people criticize this practice, but I think it's actually a good thing; when you're just starting out, you don't know when these things are really called for. Working under some constraints simplifies things.

With experience, you learn the pain points well enough to outgrow those rules and use macros/goto/whatever when called for.


Nah, macros are cool. Of course you need to use them properly, you can use operator overloading the wrong way too, and you can drink beer to have a good night out with friend or you can become the drunk on the main square asking for nickles from passers by.

A nice macro which implements foreach() for C++ is in boost, (BOOST_FOREACH), in foreach.hpp. It's documentation page has an overview of the design considerations that went into making this a 'safe' macro. It's quite informative for learning what to look out for when writing macros.


Chicken (http://www.call-cc.org/), or anything that compiles to C in general.


Please remember to put the year in the title if a submission is old (in this case it's from 2007).


Is that really necessary? The article is only a few years old. Also, the elements involved (in this case, C) have not changed significantly in that time.

Tagging posts with dates like that IMHO encourages the view that anything technology related is instantly obsolete. Since the industry seems to keep rediscovering old ideas every couple years and then rejecting them as "old", this probably does far more harm than good.

"You want to make your way in the CS field? Simple. Calculate rough time of amnesia (hell, 10 years is plenty, probably 10 months is plenty), go to the dusty archives, dig out something fun, and go for it. It's worked for many people, and it can work for you." - Ron Minnich




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: