Windows really should provide an API that treats path names as just bytes, without any of these stupid encoding stuff. Could probably have done that when they introduced UNC paths.
Ever since Windows 95 Long File Names for FAT, filenames have been 16-bit characters in their on-disk format. So passing "bytes" means that they need to become wide characters before the filesystem can act on them. And case-sensitivity is still applied, stupidly enough, using locale-specific rules. (Change your locale, and you change how case-insensitive filenames work!)
It is possible to request for a directory to contain case-sensitive files though, and the filesystem will respect that. And if you use the NT Native API, you have no restrictions on filenames, except for the Backslash character. You can even use filenames that Win32 doesn't allow (name with a ":", name with a null byte, file named "con" etc), and every Win32 program will break badly if it tries to access such a file.
It's also possible to use unpaired surrogate characters (D800-DFFF without the matching second part) in a filename. Now you have a file on the disk whose name can't be represented in UTF-8, but the filename is still sitting happily in the filesystem. So people invented "WTF-8" encoding to allow those characters to be represented.
> And case-sensitivity is still applied, stupidly enough, using locale-specific rules. (Change your locale, and you change how case-insensitive filenames work!)
AFAIK, it's even worse: it uses the rules for the locale which was in use when the filesystem was created (it's stored in the $UpCase table in NTFS, or its equivalent in EXFAT). So you could have different case-insensitive rules in a single system, if it has more than one partition and they were formatted with different locales.
IMO, case-insensitive filesystems are an abomination; the case-insensitivity should have been done in the user interface layer, not in the filesystem layer.
> IMO, case-insensitive filesystems are an abomination; the case-insensitivity should have been done in the user interface layer, not in the filesystem layer.
Implementing case-insensitivity in a file picker or something is OK, but doing that throughout your app's runtime is insane since you'd have to hook every file open and then list the directory, whereas in a file picker you're probably listing the directory anyways.
Did not know about $UpCase, the only part I knew was that the FAT16/32 driver from Microsoft (Which has the source code officially released, it's used as an example for how to implement a filesystem on Windows NT) uses locale-specific case-sensitivity tests.
You're right, in the case of FAT16/FAT32 it AFAIK has to use the current system locale, since unlike EXFAT or NTFS there isn't a place in the filesystem to store that locale table.
"\\?\" is strange, because it looks just like a UNC path. But it actually isn't. It's actually a way for Win32 programs to request a path in the NT Object Namespace.
What's the NT Object Namespace? You can use "WinObj" from SysInternals to see it.
The NT Object Namespace uses its own special paths called NT-Native paths. A file might be "C:\hello.txt" as a Win32 path, but as an NT-Native path, it's "\??\C:\hello.txt". "\??\" isn't a prefix, or a escape or anything like that. It's a real directory sitting in the NT Object Namespace named "\??", and it's holding symbolic links to all your drive letters. For instance, on my system, "\??\C:" is a symbolic link that points to "\Device\HarddiskVolume4".
Just like Linux has the "/dev/" directory that holds devices, the NT Object Namespace has a directory named "\Device\" that holds all the devices. You can perform File IO (open files, memory map, device IO control) on these devices, just like on Linux.
"\??\" in addition to your drive letters, also happens to have a symbolic link named "GLOBALROOT" that points back to the NT-Native path "\".
Anyway, back to "\\?\". This is a special prefix that when Win32 sees it, it causes the path to be parsed differently. Many of the checks are removed, and the path is rewritten as an NT-Native path that begins with "\??\". You can even use the Win32 Path "\\?\GLOBALROOT\Device\HarddiskVolume4\" (at least on my PC) as another way to get to your C:\ drive. *Windows Explorer and File Dialogs forbid this style of path.* But 7-Zip File Manager allows it! And regular programs will accept a filename as a command line argument in that format.
Another noteworthy path in "\??\" is "\??\UNC\". It's a symbolic link to "\Device\Mup". From there, you can add on the hostname/IP address, and share name, and access a network share. So in addition to the classic UNC path "\\hostname\sharename", you can also access the share with "\\?\UNC\hostname\sharename" or "\\?\GLOBALROOT\Device\Mup\hostname\sharename".
On Unix the reason for this is that the kernel has no idea what codeset you're using for your strings in user-land, so filesystem-related system calls have to limit themselves to treating just a few ASCII codepoints as such (mainly NUL, `/`, and `.`).