wchar_t vs wint_t
wint_t is capable of storing any valid value of wchar_t. wint_t is also capable of taking on the result of evaluating the WEOF macro (note that a wchar_t might be too narrow to hold the result).
As @musiphil so nicely put in his comment, which I'll try to expand here, there is a conceptual difference between wchar_t and wint_t. Their different sizes are a technical aspect that derives from the fact that each has very distinct semantics:
wchar_t is large enough to store characters, or codepoints if you prefer. As such, they are conceptually unsigned (though the standard actually leaves wchar_t's signedness up to the implementation). They are analogous to char, which was, on virtually all platforms, limited to 256 8-bit values. So wide-char string variables are naturally arrays or pointers of this type.
Now enter string functions, some of which need to be able to return any wchar_t plus additional statuses. So their return type must be larger than wchar_t. That is where wint_t is used, which can express any wide char and also WEOF. Being a status, it can also be negative (and usually is), hence wint_t is most likely signed. I say "most likely" because the C standard does not mandate it to be. But regardless of sign, status values need to be outside the range of wchar_t. They are only useful as return values, and never meant to store such characters.
The analogy with "classic" int is great to clear any confusion: strings are not of type int, they are char var (or char *var). And not because char is "half the size of int", but because that's what a string is.
Your code looks correct: c is used to check the result of getwch(), so it is wint_t. And if its value is not WEOF, as your if tests, then it's safe to assign it to a wchar_t character (or a string array, pointer, etc.).
UTF-8 is one possible encoding for Unicode. It defines 1, 2, 3, or 4 bytes per character. When you read it through getwc(), it will fetch one to four bytes and compose from them a single Unicode character codepoint, which will fit within a wchar (which can be 16 or even 32 bits wide, depending on the platform).
But since Unicode maps to all of the values from 0x0000 to 0xFFFF, there are no values left to return condition or error codes in. (Some have pointed out that Unicode is larger than 16 bits, which is true; in those cases surrogate pairs are used. But the point here is that Unicode uses all of the available values, leaving none for EOF.)
Various error codes include EOF (WEOF), which maps to -1. If you were to put the return value of getwc() in a wchar, there would be no way to distinguish it from a Unicode 0xFFFF character (which, BTW, is reserved anyway, but I digress).
So the answer is to use a wider type, a wint_t (or int), which typically holds at least 32 bits. That gives the lower 16 bits for the real value, and anything with a bit set outside of that range means something other than a character was returned.
Why don't we always use wchar then, instead of wint? Most string-related functions use wchar because on most platforms it's half the size of wint, so strings have a smaller memory footprint.