What's the difference between --general-numeric-sort and --numeric-sort options in GNU sort?

Accepted answer
Score: 91

General numeric sort compares the numbers as floats. This allows scientific notation, e.g. 1.234E10, but is slower and subject to rounding error (1.2345678 could come after 1.2345679). Numeric sort is just a regular alphabetic sort that knows 10 comes after 9.
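A minimal sketch of that difference, assuming GNU sort is on the PATH. The sample values are made up, and LC_ALL=C is set only so the result does not depend on the caller's locale:

```shell
# Sample values (hypothetical): plain numbers plus one in scientific notation.
input=$(printf '%s\n' 10 9 1e3 2.5)

# -n stops reading "1e3" at the "e", so the line counts as 1.
numeric_sorted=$(printf '%s\n' "$input" | LC_ALL=C sort -n)

# -g parses "1e3" as a float, so the line counts as 1000.
general_sorted=$(printf '%s\n' "$input" | LC_ALL=C sort -g)

printf 'sort -n:\n%s\n\nsort -g:\n%s\n' "$numeric_sorted" "$general_sorted"
```

With -n the line 1e3 sorts first (it is read as 1); with -g it sorts last (it is read as 1000).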

See http://www.gnu.org/software/coreutils/manual/html_node/sort-invocation.html

‘-g’ ‘--general-numeric-sort’ ‘--sort=general-numeric’ Sort numerically, using the standard C function strtod to convert a prefix of each line to a double-precision floating point number. This allows floating point numbers to be specified in scientific notation, like 1.0e-34 and 10e100. The LC_NUMERIC locale determines the decimal-point character. Do not report overflow, underflow, or conversion errors. Use the following collating sequence:

  • Lines that do not start with numbers (all considered to be equal).
  • NaNs (“Not a Number” values, in IEEE floating point arithmetic) in a consistent but machine-dependent order.
  • Minus infinity.
  • Finite numbers in ascending numeric order (with -0 and +0 equal).
  • Plus infinity.

Use this option only if there is no alternative; it is much slower than --numeric-sort (-n) and it can lose information when converting to floating point.

‘-n’ ‘--numeric-sort’ ‘--sort=numeric’ Sort numerically. The number begins each line and consists of optional blanks, an optional ‘-’ sign, and zero or more digits possibly separated by thousands separators, optionally followed by a decimal-point character and zero or more digits. An empty number is treated as ‘0’. The LC_NUMERIC locale specifies the decimal-point character and thousands separator. By default a blank is a space or a tab, but the LC_CTYPE locale can change this.

Comparison is exact; there is no rounding error.

Neither a leading ‘+’ nor exponential notation is recognized. To compare such strings numerically, use the --general-numeric-sort (-g) option.
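The "exact" versus "can lose information" distinction can be sketched with two very long integers. This assumes GNU sort; the 40-digit values are hypothetical, and the -g result relies on the platform's floating-point type being unable to tell them apart, which should hold for common precisions but is an assumption:

```shell
# Two hypothetical 40-digit integers that differ only in the last digit.
big_plus_one=1000000000000000000000000000000000000001
big=1000000000000000000000000000000000000000

# -s (stable) disables the last-resort whole-line comparison, so lines whose
# keys compare equal keep their input order instead of being tie-broken.
# -n compares the digit strings exactly, so the smaller value comes first.
exact_sorted=$(printf '%s\n' "$big_plus_one" "$big" | LC_ALL=C sort -n -s)

# -g converts both lines to floating point; if they round to the same value
# they compare equal and stay in input order (larger value first).
float_sorted=$(printf '%s\n' "$big_plus_one" "$big" | LC_ALL=C sort -g -s)

printf 'sort -n -s:\n%s\n\nsort -g -s:\n%s\n' "$exact_sorted" "$float_sorted"
```

Without -s the two lines would be tie-broken by a plain byte comparison, which happens to hide the rounding in this case.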

Score: 12

You should be careful with your locale. For example, you might intend to sort floating-point numbers written with a dot (like 2.2) whereas your locale might expect a comma (like 2,2).

As reported in this forum, you may get wrong results using the -n or -g flags.

In my case I use:

LC_ALL=C sort -k 6,6n file

in order to sort the 6th column that contains:

2.5
3.7
1.4

in order to obtain

1.4
2.5
3.7
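
A self-contained version of the same command, as a sketch: the six-column rows below are invented for illustration (only the 6th field matters), and GNU sort and awk are assumed to be available:

```shell
# Hypothetical six-column rows; only the 6th field is used as the sort key.
rows=$(printf '%s\n' 'r1 a b c d 2.5' 'r2 a b c d 3.7' 'r3 a b c d 1.4')

# LC_ALL=C forces "." as the decimal point for the numeric key (-k 6,6n).
sorted_rows=$(printf '%s\n' "$rows" | LC_ALL=C sort -k 6,6n)

# Show just the 6th column of the sorted result.
sixth_column=$(printf '%s\n' "$sorted_rows" | awk '{ print $6 }')
printf '%s\n' "$sixth_column"
```
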
Score: 0

In addition to the accepted answer, which mentions that -g allows scientific notation, I want to show the part that most likely causes undesirable behavior.

With -g:

$ LC_COLLATE=fr_FR.UTF-8 LC_NUMERIC=en_US.UTF-8 sort -g myfile
baa
--inf
--inf  
--inf- 
--inf--
--inf-a
--nnf
nnf--
   nnn  
tnan
zoo
   naN
Nana
nani lol
-inf
-inf--
-11
-2
-1
1
+1
2
+2
0xa
11
+11
inf

Look at the zoo, three important things here:

  • Lines that start with NAN (e.g. Nana and nani lol) or -INF (single dash, not --INF) move to the end, but before the digits, while INF moves to the very end, after the digits, because it means infinity.

  • NAN, INF, and -INF are case insensitive.

  • Whitespace on either side of NAN, INF, and -INF is always ignored (regardless of LC_CTYPE). For other alphabetic lines, whether surrounding whitespace is ignored depends on the LC_COLLATE locale (e.g. LC_COLLATE=fr_FR.UTF-8 ignores it but LC_COLLATE=us_EN.UTF-8 does not).
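
A reduced sketch of the same effect, assuming GNU sort with a C library whose strtod family accepts nan and inf spellings (glibc does). The input is hypothetical and has exactly one line per class so that no ties are involved:

```shell
# Hypothetical input: one non-number, one NaN-like word, both infinities,
# and two finite numbers.
mixed=$(printf '%s\n' INF 5 -inf apple Nana -3)

# Under -g, "Nana" is read as NaN (prefix "Nan") and "INF" as +infinity,
# so neither is treated as an ordinary word.
special_sorted=$(printf '%s\n' "$mixed" | LC_ALL=C sort -g)
printf '%s\n' "$special_sorted"
```

The result follows the collating sequence of -g: the non-number (apple), then NaN (Nana), then -inf, the finite numbers, and finally INF.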

So if you are sorting arbitrary alphanumeric data, you probably don't want -g. If you really need scientific notation comparison with -g, then you probably want to extract the alphabetic and numeric data and compare them separately.

If you only need ordinary number sorting (e.g. 1, -1) and don't care about 0x/E/+ handling, -n is enough:

$ LC_COLLATE=fr_FR.UTF-8 LC_NUMERIC=en_US.UTF-8 sort -n myfile
-1000
-22
-13
-11
-010
-10
-5
-2
-1
-0.2
-0.12
-0.11
-0.1
0x1
0x11
0xb
+1
+11
+2
-a
-aa
--aa
-aaa
-b
baa
BAA
bbb
+ignore
inf
-inf
--inf
--inf  
--inf- 
--inf--
-inf--
--inf-a
   naN
Nana
nani lol
--nnf
nnf--
   nnn  
None         
uum
Zero cool
-zzz
1
1.1
1.234E10
5
11
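
How -n and -g differ on a leading + and on 0x prefixes can be sketched on a much smaller input. This assumes GNU sort; the four sample lines are hypothetical, and LC_ALL=C is used so that ties are broken by plain byte order:

```shell
# Hypothetical input mixing a plain number, a "+" sign, a hex literal,
# and a negative number.
signed=$(printf '%s\n' 2 +1 0x10 -1)

# -n does not recognize "+" or "0x": both "+1" and "0x10" count as 0 and
# are then ordered by byte comparison ("+" sorts before "0" in the C locale).
plain_numeric=$(printf '%s\n' "$signed" | LC_ALL=C sort -n)

# -g reads "+1" as 1 and "0x10" as hexadecimal 16.
general_numeric=$(printf '%s\n' "$signed" | LC_ALL=C sort -g)

printf 'sort -n:\n%s\n\nsort -g:\n%s\n' "$plain_numeric" "$general_numeric"
```
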

With either -g or -n, be aware of the locale's effect. You may want to set LC_NUMERIC to en_US.UTF-8 to avoid the fr_FR.UTF-8 behavior, where sorting floating-point numbers written with a dot fails:

$ LC_COLLATE=fr_FR.UTF-8 LC_NUMERIC=fr_FR.UTF-8 sort -n myfile
-10
-5
-2
-1
-1.1
-1.2
-0.1
-0.11
-0.12
-0.2
-a
+b
middle
-wwe
+zoo
1
1.1

With LC_NUMERIC=en_US.UTF-8:

$ LC_COLLATE=fr_FR.UTF-8 LC_NUMERIC=en_US.UTF-8 sort -n myfile
-10
-5
-2
-1.2
-1.1
-1
-0.2
-0.12
-0.11
-0.1
-a
+b
middle
-wwe
+zoo
1
1.1

Or LC_NUMERIC=us_EN.UTF-8 to group +|-|space with alpha:

$ LC_COLLATE=fr_FR.UTF-8 LC_NUMERIC=us_EN.UTF-8 sort -n myfile
-0.1
    a
    b
 a
 b
+b
+zoo
-a
-wwe
middle
1

You probably want to specify the locale when using sort if you want to write a portable script.
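
One way to do that in a script, as a sketch rather than a recommendation for every case: wrap sort in a small helper that pins the locale. The function name sort_numeric_c is hypothetical, and choosing the C locale assumes the data uses "." as the decimal point:

```shell
# Hypothetical helper: run a numeric sort with the locale pinned to C, so
# "." is the decimal point regardless of the caller's environment.
# LC_ALL takes precedence over LC_NUMERIC, LC_COLLATE and LANG.
sort_numeric_c() {
    LC_ALL=C sort -n "$@"
}

portable_sorted=$(printf '%s\n' 2.5 10 1.4 | sort_numeric_c)
printf '%s\n' "$portable_sorted"
```

Extra arguments are passed through, so the helper can also take file names or key options such as -k 6,6.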
