SORT(1) | General Commands Manual | SORT(1) |
sort - sort or merge records (lines) of text and binary files
sort [-bcCdfghiRMmnrsuVz] [-k field1[,field2]] [-S memsize] [-T dir] [-t char] [-o output] [file ...]
sort --help
sort --version
The sort utility sorts text and binary files by lines. A line is a record separated from the subsequent record by a newline (default) or NUL '\0' character (-z option). A record can contain any printable or unprintable characters. Comparisons are based on one or more sort keys extracted from each line of input, and are performed lexicographically, according to the current locale's collating rules and the specified command-line options that can tune the actual sorting behavior. By default, if keys are not given, sort uses entire lines for comparison.
The command line options are as follows:
-c, --check, -C, --check=silent|quiet
    Check whether the input is already sorted. If it is not, sort produces the appropriate error messages and exits with code 1, otherwise returns 0. If -C or --check=silent is specified, sort produces no output. This is a "silent" version of -c.
-m, --merge
-o output, --output=output
-S size, --buffer-size=size
    By default, sort takes up to about 90% of available memory. If the file size is too big to fit into the memory buffer, temporary disk files are used to perform the sorting.
-T dir, --temporary-directory=dir
    The default is the value of TMPDIR, or /var/tmp if TMPDIR is not defined.
-u, --unique
    Like -s, this option implies a stable sort. If used with -c or -C, sort also checks that there are no lines with duplicate keys.
-s
--version
--help
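The following is a minimal sketch of how -c, -u and -o might be combined. The scratch directory and file names are hypothetical, and LC_ALL=C is set only so that the result does not depend on the current locale.

```shell
# Work in a scratch directory; the file names are illustrative only.
dir=$(mktemp -d)
printf 'pear\napple\npear\nfig\n' > "$dir/fruit.txt"

# -c only checks the input: exit code 1 means it is not sorted.
check_unsorted=0
LC_ALL=C sort -c "$dir/fruit.txt" 2>/dev/null || check_unsorted=$?

# -u drops lines whose key equals an earlier one; -o names the output file.
LC_ALL=C sort -u -o "$dir/sorted.txt" "$dir/fruit.txt"
unique_sorted=$(cat "$dir/sorted.txt")

# The sorted output passes the check (exit code 0).
check_sorted=0
LC_ALL=C sort -c "$dir/sorted.txt" || check_sorted=$?

# With -c and -u together, sorted input with a duplicate line is rejected.
check_dups=0
printf 'a\na\nb\n' | LC_ALL=C sort -c -u 2>/dev/null || check_dups=$?

printf '%s\n' "$unique_sorted"
rm -rf "$dir"
```

Capturing the exit code with `|| var=$?` keeps the sketch usable in scripts that run under `set -e`.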
The following options override the default ordering rules. When ordering options appear independently of key field specifications, they apply globally to all sort keys. When attached to a specific key (see -k), the ordering options override all global ordering options for the key they are attached to.
-b, --ignore-leading-blanks
-d, --dictionary-order
-f, --ignore-case
-g, --general-numeric-sort, --sort=general-numeric
    Unlike -n, this option handles general floating points. It has a more permissive format than that allowed by -n but it has a significant performance drawback.
-h, --human-numeric-sort, --sort=human-numeric
    [...] -h or -H options (human-readable).
-i, --ignore-nonprinting
-M, --month-sort, --sort=month
-n, --numeric-sort, --sort=numeric
-R, --random-sort, --sort=random
    [...] /dev/random content, or by file content if it is specified by --random-source. Even if multiple sort fields are specified, the same random hash function is used for all of them.
-r, --reverse
-V, --version-sort
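As a rough illustration of how some of these ordering options change the result, the sketch below sorts made-up sample data with and without -n, -r and -f. LC_ALL=C is an assumption made here to keep the default order predictable.

```shell
# Sample values; the data is made up for illustration.
sizes=$(printf '10\n9\n100\n')

# Default (lexicographic) order compares character by character.
lexical=$(printf '%s\n' "$sizes" | LC_ALL=C sort)

# -n compares by numeric value; adding -r reverses the result.
numeric=$(printf '%s\n' "$sizes" | LC_ALL=C sort -n)
numeric_desc=$(printf '%s\n' "$sizes" | LC_ALL=C sort -nr)

# -f ignores case, so "apple" sorts before "Banana" even in the C locale.
folded=$(printf 'Banana\napple\ncherry\n' | LC_ALL=C sort -f)

printf '%s\n---\n%s\n---\n%s\n---\n%s\n' \
    "$lexical" "$numeric" "$numeric_desc" "$folded"
```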
The treatment of field separators can be altered using these options:
-b, --ignore-leading-blanks
    Ignore leading blanks when determining the start and end of a sort key (see -k). If -b is specified before the first -k option, it applies globally to all key specifications. Otherwise, -b can be attached independently to each field argument of the key specifications. Note that sort keys specified with the -k option may have a variable number of leading whitespace characters that will affect the result, as described below in the -t option description.
-k field1[,field2], --key=field1[,field2]
    The -k option may be specified multiple times, in which case subsequent keys are compared when earlier keys compare equal. The -k option replaces the obsolete options +pos1 and -pos2, but the old notation is also supported.
-t char, --field-separator=char
    If -t is not specified, the default field separator is a sequence of blank space characters, and consecutive blank spaces do not delimit an empty field; however, the initial blank space is considered part of a field when determining key offsets. To use NUL as field separator, use -t '\0'.
-z, --zero-terminated
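The options above can be sketched together as follows. The colon-separated records are invented for the example, and LC_ALL=C is assumed so that the comparison is bytewise.

```shell
# Colon-separated records: name:department:age (made-up data).
records=$(printf 'carol:ops:41\nalice:dev:30\nbob:dev:25\n')

# Sort by department (field 2 only); when departments compare equal,
# fall through to the second key, age (field 3), compared numerically.
by_dept_age=$(printf '%s\n' "$records" | LC_ALL=C sort -t ':' -k 2,2 -k 3,3n)

# -z: records end with NUL instead of newline. tr turns the NULs back
# into newlines so that the result can be displayed.
nul_sorted=$(printf 'b\000a\000' | LC_ALL=C sort -z | tr '\000' '\n')

printf '%s\n---\n%s\n' "$by_dept_age" "$nul_sorted"
```

Limiting the first key to `2,2` matters here: a bare `-k 2` would extend the key to the end of the line, and the second key would rarely be consulted.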
Other options:
--batch-size=num
    [...] sort at once. This option affects behavior when having many input files or using temporary files. The default value is 16.
--compress-program=PROGRAM
    [...] -d it must decompress standard input to standard output. If PROGRAM fails, sort must exit with error. An example of PROGRAM that can be used here is bzip2.
--random-source=filename
    [...] /dev/random is used.
--debug
--parallel
--files0-from=filename
--radixsort
--mergesort
--qsort
    [...] -u and -s.
--heapsort
    [...] -u and -s.
--mmap
The following operands are available:
[...] -, the standard input is used.
A field is defined as a maximal sequence of characters other than the field separator and record separator (newline by default). Initial blank spaces are included in the field unless -b has been specified; the first blank space of a sequence of blank spaces acts as the field separator and is included in the field (unless -t is specified). For example, all blank spaces at the beginning of a line are considered to be part of the first field.
Fields are specified by the -k field1[,field2] command-line option. If field2 is missing, the end of the key defaults to the end of the line.
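The effect of omitting field2 can be seen in a small sketch like the one below. The three records are made up, and LC_ALL=C is assumed so that keys compare bytewise.

```shell
# Three made-up records with blank-separated fields.
rows=$(printf 'x b 2\ny a 9\nz b 1\n')

# -k 2: the key runs from field 2 to the end of the line,
# so the third field takes part in the comparison.
to_eol=$(printf '%s\n' "$rows" | LC_ALL=C sort -k 2)

# -k 2,2: the key is field 2 only. Records x and z tie on this key,
# and x still comes out before z.
field_only=$(printf '%s\n' "$rows" | LC_ALL=C sort -k 2,2)

printf '%s\n---\n%s\n' "$to_eol" "$field_only"
```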
The arguments field1 and field2 have the form m.n (m,n > 0) and can be followed by one or more of the modifiers b, d, f, i, n, g, M and r, which correspond to the options discussed above. When b is specified, it applies only to field1 or field2, whichever it is specified with, while the rest of the modifiers apply to the whole key field regardless of whether they are specified with field1, field2 or both. A field1 position specified by m.n is interpreted as the nth character from the beginning of the mth field. A missing .n in field1 means ‘.1’, indicating the first character of the mth field; if the -b option is in effect, n is counted from the first non-blank character in the mth field; m.1b refers to the first non-blank character in the mth field. 1.n refers to the nth character from the beginning of the line; if n is greater than the length of the line, the field is taken to be empty.
nth positions are always counted from the field beginning, even if the field is shorter than the number of specified positions. Thus, the key can really start from a position in a subsequent field.
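A character offset within a field might be used as in the following sketch, which sorts on the third and fourth characters of the first field. The identifiers are invented, and LC_ALL=C is assumed for a bytewise comparison.

```shell
# Made-up identifiers: two letters followed by a two-digit number.
ids=$(printf 'ab03\nab01\nzz02\n')

# Whole-line sort: the letters dominate the comparison.
by_line=$(printf '%s\n' "$ids" | LC_ALL=C sort)

# -k 1.3,1.4: the key starts at character 3 of field 1 and ends at
# character 4 of field 1, that is, the two digits.
by_digits=$(printf '%s\n' "$ids" | LC_ALL=C sort -k 1.3,1.4)

printf '%s\n---\n%s\n' "$by_line" "$by_digits"
```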
A field2 position specified by m.n is interpreted as the nth character (including separators) from the beginning of the mth field. A missing .n indicates the last character of the mth field; m = 0 designates the end of a line. Thus the option -k v.x,w.y is synonymous with the obsolete option +v-1.x-1 -w-1.y; when y is omitted, -k v.x,w is synonymous with +v-1.x-1 -w.0. The obsolete +pos1 -pos2 option is still supported, except for -w.0b, which has no -k equivalent.
LC_COLLATE
LC_CTYPE
LC_MESSAGES
    [...] sort prints out.
LC_NUMERIC
LC_TIME
LC_ALL
LANG
    [...] LC_ALL are set.
TMPDIR
    TMPDIR may be overridden by the -T option.
GNUSORT_NUMERIC_COMPATIBILITY
    If defined, -t will not override the locale numeric symbols, that is, thousand separators and decimal separators. By default, if -t is specified with the same symbol as the thousand separator or decimal point, the symbol will be treated as the field separator. Older behavior was less definite; the symbol was treated as both field separator and numeric separator, simultaneously. This environment variable enables the old behavior.
GNUSORT_COMPATIBLE_BLANKS
The sort utility shall exit with one of the following values:
0   [...] -c or -C, the input file already met the sorting criteria.
1   [...] -c or -C options.
The sort utility is compliant with the IEEE Std 1003.1-2008 (“POSIX.1”) specification.
The flags [-ghRMSsTVz] are extensions to the POSIX specification.
All long options are extensions to the specification; some of them are provided for compatibility with GNU versions and some are this implementation's own extensions.
The old key notations +pos1 and -pos2 come from older versions of sort and are still supported but their use is highly discouraged.
A sort command first appeared in Version 1 AT&T UNIX.
Gabor Kovesdan <gabor@FreeBSD.org>, Oleg Moskalenko <mom040267@gmail.com>
This implementation of sort has no limits on input line length (other than imposed by available memory) or any restrictions on bytes allowed within lines.
The performance depends highly on locale settings, efficient choice of sort keys and key complexity. The fastest sort is with locale C, on whole lines, with option -s. In general, locale C is the fastest, followed by single-byte locales, with multi-byte locales the slowest, but the correct collation order is always respected. As for the key specification, the simpler the lines are to process, the faster the sort will be.
When sorting by arithmetic value, using -n results in much better performance than -g so its use is encouraged whenever possible.
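Whether -n is sufficient depends on the input format. The sketch below, using made-up values and assuming LC_ALL=C, shows a case where the two options differ: -n reads only the leading digits of a value written in exponent notation, while -g parses it as a floating point number.

```shell
# Made-up values; the first one uses exponent notation.
values=$(printf '1e3\n20\n3\n')

# -n stops reading "1e3" at the "e", so the line is compared as 1.
by_n=$(printf '%s\n' "$values" | LC_ALL=C sort -n)

# -g parses "1e3" as a floating point number, that is, 1000.
by_g=$(printf '%s\n' "$values" | LC_ALL=C sort -g)

printf '%s\n---\n%s\n' "$by_n" "$by_g"
```

If the input holds only plain integers or decimals, both options produce the same order and -n is the cheaper choice.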
September 4, 2019 | Mac OS X 12 |