Unix sort treatment of underscore character
You can set
LC_COLLATE to the traditional sort order just for your command:
env LC_COLLATE=C sort tmp
This won't change the current environment, just the one in which the sort command executes. You should then get the same behaviour.
This is likely caused by a difference in locale. In the
en_US.UTF-8 locale, underscores (
_) sort after letters and numbers, whereas in the POSIX
C locale they sort after uppercase letters and numbers, but before lowercase letters.
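The C-locale half of that claim follows from the byte values (0 is 0x30, A is 0x41, _ is 0x5F, a is 0x61). Here is a minimal sketch that checks it, assuming a POSIX sort and paste are available. The variable name c_order is only for illustration, and LC_ALL is used rather than LC_COLLATE because LC_ALL wins if both are set:

```shell
# Sort four one-character lines by raw byte value (C locale),
# then join the sorted lines with spaces for easy reading.
c_order=$(printf '%s\n' a _ A 0 | LC_ALL=C sort | paste -sd ' ' -)
echo "$c_order"   # prints: 0 A _ a
```

The en_US.UTF-8 half is not checked here, since that result depends on the locale data installed on each machine.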
# won't change LC_COLLATE=C after execution
$ LC_COLLATE=C sort filename
You can also use
sort --debug to show more information about the sorting behavior in general:
$ (echo 'foo_bar'; echo 'fooAbar'; echo 'foo0bar'; echo 'fooabar') | LC_COLLATE=en_US.UTF-8 sort --debug
sort: using ‘en_US.UTF-8’ sorting rules
foo0bar
fooabar
fooAbar
foo_bar
$ (echo 'foo_bar'; echo 'fooAbar'; echo 'foo0bar'; echo 'fooabar') | LC_COLLATE=C sort --debug
sort: using simple byte comparison
foo0bar
fooAbar
foo_bar
fooabar
As also shown in this answer, you can use the above formula to force
LC_COLLATE=C for a single command, without modifying your shell environment.
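To see that the variable really is scoped to the one command, here is a small sketch. The variable names before, after and sorted are hypothetical, and the sample strings are borrowed from the examples on this page:

```shell
# Remember the shell's own value first (empty if LC_COLLATE is unset).
before="${LC_COLLATE-}"

# The assignment prefix is placed in the environment of this one sort
# process only; the calling shell's variables are left untouched.
sorted=$(printf '%s\n' foo_bar fooabar fooAbar | LC_COLLATE=C sort)

after="${LC_COLLATE-}"
if [ "$before" = "$after" ]; then
    echo "LC_COLLATE is unchanged in the calling shell"
fi
```

Note that if LC_ALL is set in your environment, it overrides LC_COLLATE, so the prefix would need to be LC_ALL=C instead.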
Sort order depends on the current value of the environment variable LC_COLLATE (unless LC_ALL is set, which overrides it). Check your local documentation for 'locale', 'setlocale', etc. Set LC_COLLATE to 'POSIX' on both machines, and the results should match.
I really liked the answer above with the useful example. I'd just add a couple more strings to its list to show how strange the sorting behavior can be:
$ (echo 'foo_bar'; echo 'fooAbar'; echo 'foo0bar'; echo 'fooabar'; echo 'foobbar'; echo 'foobar') | LC_COLLATE=en_US.UTF-8 sort --debug
sort: using ‘en_US.UTF-8’ sorting rules
foo0bar
_______
fooabar
_______
fooAbar
_______
foobar
______
foo_bar
_______
foobbar
_______
Seems crazy, right? I found the explanation here; in this case it's because the Unicode collation algorithm is being used in this locale: https://unix.stackexchange.com/questions/252419/unexpected-sort-order-in-en-us-utf-8-locale
HOWEVER, even the 'sort --debug' option is not able to easily demonstrate the subtleties that go into the strcoll() function's rules for obeying the locale sorting specification.
POSIX stipulates that locale authors (for all but the C locale) have absolute control over all sorts of fiddly aspects of how strcoll() behaves, and the fact that two vendors declare that their locale is named en_US.UTF-8 does NOT imply/require those two vendors to have the same locale definition. So the collation rules between two different platforms are very likely different, based on whoever wrote the locale file for that platform, and what bug fixes have been incorporated into the locale definition over time.
Thank you Eric Blake at Red Hat for this insight.
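Since named locales can differ between platforms, one hedged way to get a result that does not depend on the vendor's locale file is to check or sort in byte order. This is a minimal sketch, assuming a sort that supports the POSIX -c (check only) option. The variable name byte_sorted is hypothetical, and the sample strings are the ones from the --debug example:

```shell
# Ask sort to verify (-c) that the input is already in byte order.
# LC_ALL=C forces the C locale regardless of LANG or LC_COLLATE.
if printf '%s\n' foo0bar fooAbar foo_bar fooabar | LC_ALL=C sort -c; then
    byte_sorted=yes
else
    byte_sorted=no
fi
echo "already in C-locale order: $byte_sorted"
```

The same input checked under a named locale such as en_US.UTF-8 may or may not pass, which is exactly the portability problem described above.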
There are a number of different locale categories, such as
LC_MESSAGES. You can change them all by setting the environment variable
LANG (which supplies the default for every category), or only the collation (sort) order by setting the environment variable
LC_COLLATE. The locale
C or
POSIX is a basic locale defined by the standard; others include
en_US (US English),
fr_FR (French), etc.
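How these variables combine can be sketched in a few lines of shell. POSIX gives LC_ALL the highest precedence, then the specific category variable, then LANG. The function name effective_collate is hypothetical, and falling back to "C" when nothing is set is an assumption, since the standard leaves that default to the implementation:

```shell
# Hypothetical helper: print the locale name that governs collation,
# following the precedence LC_ALL > LC_COLLATE > LANG.
# Empty values are treated like unset ones; "C" is the assumed fallback.
effective_collate() {
    printf '%s\n' "${LC_ALL:-${LC_COLLATE:-${LANG:-C}}}"
}

effective_collate
```

The locale command with no arguments reports the same information for every category, so it is the more direct tool on a real system.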