The WTF::ParkingLot example is interesting because it shows that you don't actually need futexes in the kernel to implement them efficiently; you just need something like a spinlock (with sane backoff!) to guard the userspace wait queue and a per-thread eventfd to wake up waiters.
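To make the shape of that concrete, here is a rough sketch of the idea, not WebKit's actual code. Everything in it is an assumption for illustration: the `ParkingLot` class and its `park`/`unpark_one` methods are hypothetical names, a `threading.Lock` stands in for the spinlock with backoff, and a per-waiter `threading.Event` stands in for the per-thread eventfd. The important part is that the "is it still worth sleeping?" check runs while the queue lock is held, which is what closes the lost-wakeup window the way a futex's compare-and-block does.

```python
import threading
from collections import deque


class ParkingLot:
    """Toy userspace futex: wait queues keyed by any hashable 'address'."""

    def __init__(self):
        # Assumption: a plain mutex stands in for the spinlock with backoff.
        self._lock = threading.Lock()
        self._queues = {}  # key -> deque of per-waiter wake handles

    def park(self, key, should_sleep, timeout=None):
        """Block until unparked. Returns True only if woken by unpark_one."""
        # Assumption: an Event stands in for the per-thread eventfd.
        waker = threading.Event()
        with self._lock:
            # Validate under the queue lock so a concurrent unpark_one
            # cannot slip in between the check and the enqueue.
            if not should_sleep():
                return False
            self._queues.setdefault(key, deque()).append(waker)
        if waker.wait(timeout):
            return True
        # Timed out: dequeue ourselves, unless an unpark already did.
        with self._lock:
            queue = self._queues.get(key)
            if queue is None or waker not in queue:
                return True  # lost the race with unpark_one; count as woken
            queue.remove(waker)
            if not queue:
                del self._queues[key]
        return False

    def unpark_one(self, key):
        """Wake the oldest waiter on key. Returns False if nobody was parked."""
        with self._lock:
            queue = self._queues.get(key)
            if not queue:
                return False
            waker = queue.popleft()
            if not queue:
                del self._queues[key]
        # Signal outside the lock to keep the critical section short.
        waker.set()
        return True
```

A lock built on this would call `park(id(lock), lambda: lock_is_still_held())` on the slow path and `unpark_one(id(lock))` on release, with `lock_is_still_held` being whatever check the lock word needs. Because the key is just a value in the process's own table, nothing here works across processes.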
Yes, you can do a good futex impression in userspace and add any missing functionality you need. Most importantly for WebKit, I think, you get portability.
The advantages of a kernel-provided futex are ABI stability and cross-process support.