> this is more of a criticism of Unicode than of Python
True, although it's more specifically a criticism of Python for using Unicode, where these kinds of warts are pervasive. See also "\xC7\xB1" (U+01F1 "DZ"), which is two bytes in UTF-8, one code point, and two characters, neither of which corresponds to either of those bytes.
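A rough sketch of that example in Python 3, assuming the bytes are UTF-8 (the variable names are mine, and the results depend on the Unicode tables bundled with your interpreter):

```python
import unicodedata

raw = b"\xC7\xB1"            # the two UTF-8 bytes from the example
dz = raw.decode("utf-8")     # decodes to the single code point U+01F1

print(len(raw))              # 2 bytes
print(len(dz))               # 1 code point
print(unicodedata.name(dz))  # LATIN CAPITAL LETTER DZ

# Compatibility decomposition splits it into the two letters "D" and "Z".
letters = unicodedata.normalize("NFKD", dz)
print(len(letters))          # 2
```

Note that the two letters only appear under compatibility normalization (NFKD/NFKC); plain NFC or NFD leaves U+01F1 as a single code point.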
> the answer is to not use combining diacritics
This doesn't actually work, sadly, because you can't represent e.g. "f̈"[0] without some means of composing arbitrary base characters with arbitrary diacritics.
0: If Unicode has added a specific precomposed code point for that particular character, then that's a bad example, but the general point still stands.
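For anyone who wants to check a given character, here is a minimal sketch in Python 3 using the standard `unicodedata` module. It assumes the Unicode database shipped with your Python build; as far as I know "e" plus a combining diaeresis has a precomposed form and "f" plus one does not, but treat that as something to verify rather than gospel:

```python
import unicodedata

e_diaeresis = "e\u0308"  # "e" + COMBINING DIAERESIS
f_diaeresis = "f\u0308"  # "f" + COMBINING DIAERESIS

# NFC composes a sequence only when a precomposed code point exists.
nfc_e = unicodedata.normalize("NFC", e_diaeresis)
nfc_f = unicodedata.normalize("NFC", f_diaeresis)

print(len(e_diaeresis), len(nfc_e))  # two code points in, one out if composed
print(len(f_diaeresis), len(nfc_f))  # length is unchanged if nothing composes
```

If the second pair of lengths is the same, normalizing to NFC did not get rid of the combining diacritic, which is the point above.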