wchar_t vs wint_t

Accepted answer
Score: 23

wint_t is capable of storing any valid value of wchar_t. A wint_t is also capable of taking on the result of evaluating the WEOF macro (note that a wchar_t might be too narrow to hold the result).
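For instance, a minimal sketch of that type discipline in a read loop (using the standard fgetwc()/fputwc(); reading from stdin is just for illustration):

    #include <locale.h>
    #include <stdio.h>
    #include <wchar.h>

    int main(void)
    {
        setlocale(LC_ALL, "");          /* use the environment's encoding */

        wint_t c;                       /* wide enough for any wchar_t AND WEOF */
        while ((c = fgetwc(stdin)) != WEOF) {
            wchar_t wc = (wchar_t)c;    /* safe: c is a real character here */
            fputwc(wc, stdout);
        }
        return 0;
    }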

Score: 9

As @musiphil so nicely put in his comment, which I'll try to expand here, there is a conceptual difference between wint_t and wchar_t.

Their different sizes are a technical aspect that derives from the fact that each has very distinct semantics:

  • wchar_t is large enough to store characters, or codepoints if you prefer. As such, they are unsigned. They are analogous to char, which was, in virtually all platforms, limited to 8 bits, i.e. 256 values. So wide-char string variables are naturally arrays or pointers of this type.

  • Now enter string functions, some of which need to be able to return any wchar_t plus additional statuses. So their return type must be larger than wchar_t. So wint_t is used, which can express any wide char and also WEOF. Being a status, it can also be negative (and usually is), hence wint_t is possibly signed. I say "possibly" because the C standard does not mandate it to be. But regardless of sign, status values need to be outside the range of wchar_t. They are only useful as return values, and are never meant to store such characters (see the sketch after this list).
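To make the split concrete, here is a sketch of a helper written in the spirit of the standard functions. The name read_nonspace is made up for illustration; the point is the type discipline: wchar_t for storage, wint_t for the return channel.

    #include <stdio.h>
    #include <wchar.h>
    #include <wctype.h>

    /* Returns the next non-whitespace wide character from f,
     * or WEOF on end of file / error. wint_t lets one return
     * value carry both a character and a status. */
    static wint_t read_nonspace(FILE *f)
    {
        wint_t c;
        while ((c = fgetwc(f)) != WEOF) {
            if (!iswspace(c))
                return c;              /* a genuine character */
        }
        return WEOF;                   /* a status, not a character */
    }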

The analogy with "classic" char and int is great for clearing up any confusion: strings are not of type int [], they are char var[] (or char *var). And not because char is "half the size of int", but because that's what a string is.
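The narrow-character version of that discipline, for comparison: int in the read loop, char in the string.

    #include <stdio.h>

    int main(void)
    {
        char buf[256];                  /* storage is char, as always */
        size_t i = 0;
        int c;                          /* int, not char: must hold EOF too */

        while ((c = getchar()) != EOF && i < sizeof buf - 1)
            buf[i++] = (char)c;
        buf[i] = '\0';

        printf("read %zu bytes\n", i);
        return 0;
    }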

Your code looks correct: c is used to check the result of getwch(), so it is wint_t. And if its value is not WEOF, as your if tests, then it's safe to assign it to a wchar_t character (or a string array, pointer, etc.).
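The question's code isn't reproduced here, but the endorsed shape is presumably close to this sketch (the portable fgetwc() stands in for the question's getwch()):

    #include <stdio.h>
    #include <wchar.h>

    int main(void)
    {
        wint_t c = fgetwc(stdin);       /* check happens at wint_t width */
        if (c != WEOF) {
            wchar_t s[2] = { (wchar_t)c, L'\0' };  /* now safe to store */
            fputws(s, stdout);
        }
        return 0;
    }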

Score: 4

UTF-8 is one possible encoding for Unicode. It defines 1, 2, 3 or 4 bytes per character. When you read it through getwc(), it will fetch one to four bytes and compose from them a single Unicode character codepoint, which would fit within a wchar (which can be 16 or even 32 bits wide, depending on platform).
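A hedged illustration of that decoding step; the locale name "en_US.UTF-8" is an assumption, and availability varies by system:

    #include <locale.h>
    #include <stdio.h>
    #include <wchar.h>

    int main(void)
    {
        /* Tell the C library that stdin's byte stream is UTF-8. */
        if (setlocale(LC_ALL, "en_US.UTF-8") == NULL)
            return 1;

        wint_t c = fgetwc(stdin);   /* consumes 1-4 bytes, yields one codepoint */
        if (c != WEOF)
            wprintf(L"U+%04X\n", (unsigned)c);
        return 0;
    }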

But since Unicode values map to all of the values from 0x0000 to 0xFFFF, there are no values left to return condition or error codes in. (Some have pointed out that Unicode is larger than 16 bits, which is true; in those cases surrogate pairs are used. But the point here is that Unicode uses all of the available values, leaving none for EOF.)

Various error codes include EOF (WEOF), which maps to -1. If you were to put the return value of getwc() in a wchar, there would be no way to distinguish it from a Unicode 0xFFFF character (which, BTW, is reserved anyway, but I digress).
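A small demonstration of the collision. The outcome is platform-dependent: with a 16-bit wchar_t (e.g. Windows) the two values compare equal; a 32-bit wchar_t keeps them apart:

    #include <stdio.h>
    #include <wchar.h>

    int main(void)
    {
        wchar_t truncated = (wchar_t)WEOF;  /* WEOF forced into wchar_t */
        wchar_t real_char = 0xFFFF;         /* an actual character value */

        printf("truncated WEOF: %#x\n", (unsigned)truncated);
        printf("real 0xFFFF:    %#x\n", (unsigned)real_char);
        printf("indistinguishable: %s\n",
               truncated == real_char ? "yes" : "no");
        return 0;
    }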

So the answer is to use a wider type, a wint_t (or int), which holds at least 32 bits. That gives the lower 16 bits for the real value, and anything with a bit set outside of that range means something other than a character was returned.

Why don't we always use wchar then instead of wint? Most string-related functions use wchar because on most platforms it's ½ the size of wint, so strings have a smaller memory footprint.
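The relative sizes are easy to check on any given platform (they vary; some ABIs even make the two types the same width):

    #include <stdio.h>
    #include <wchar.h>

    int main(void)
    {
        /* Report the actual widths on this platform. */
        printf("sizeof(wchar_t) = %zu\n", sizeof(wchar_t));
        printf("sizeof(wint_t)  = %zu\n", sizeof(wint_t));
        return 0;
    }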
