Linux Series: The Three Text-Processing Musketeers, grep/sed/awk


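Background

These three tools are the ones we reach for most often when processing text on Linux: grep is mainly for searching, sed mainly for editing, and awk mainly for splitting lines into fields and processing them.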

grep

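grep is a command-line tool originally written for the Unix operating system. Given a list of files or standard input, grep searches for text that matches one or more regular expressions and prints only the matching (or non-matching) lines or text. grep grew out of a command in the ed editor, and its name comes from g/re/p (globally search a regular expression and print). In ed, entering the g/re/p command prints, line by line, every line that matches the given pattern.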
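The matching and non-matching modes described above can be sketched as follows. This is a minimal illustration: the file sample.txt and its contents are made up, and -v (documented in the man page further down) is the option that inverts the selection.

```shell
# Work in a throwaway directory; sample.txt and its contents are hypothetical.
tmpdir=$(mktemp -d)
printf 'apple\nbanana\ncherry\n' > "$tmpdir/sample.txt"

# Lines that match the regular expression 'an'.
matched=$(grep 'an' "$tmpdir/sample.txt")
echo "$matched"

# Lines that do NOT match it (-v inverts the selection).
unmatched=$(grep -v 'an' "$tmpdir/sample.txt")
echo "$unmatched"

rm -rf "$tmpdir"
```

Only banana contains the substring an, so the first command selects that line and the second selects the other two.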

Format

grep [-abcdDEFGHhIiJLlMmnOopqRSsUVvwXxZz] [-A num] [-B num] [-C[num]] [-e pattern] [-f file] [--binary-files=value] [--color[=when]] [--colour[=when]]
[--context[=num]] [--label] [--line-buffered] [--null] [pattern] [file ...]

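Parameters

  • [-A num] Print num lines of trailing context after each matching line.

    grep -A2 'a' text.txt
  • [-B num] Print num lines of leading context before each matching line.

    grep -B2 'a' text.txt
  • [-c] Print only a count of the matching lines.

    grep -c 'a' text.txt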
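  • [-e pattern] Specify a pattern. Repeat -e to select lines that match any of several patterns (a logical OR).

    grep -e 'a' -e 'c' text.txt
  • [-o] Print only the matching part of each line.

    grep -o 'a' text.txt
  • [-w] Match the pattern only as a whole word.

    grep -w 'aaaaaaaaa' text.txt
  • [-f file] Read the patterns from file, one per line. Each line is treated as a regular expression unless -F is also given.

    grep -f f.txt -w text.txt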
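  • [--binary-files=value] Control how binary files are searched and printed; value is binary (the default), without-match, or text.

  • [--color[=when]] [--colour[=when]] Highlight the matching text; the highlight color can be set through the GREP_COLOR environment variable, and when is never, always, or auto.

  • [--context[=num]] Print num lines of context before and after each match.

    grep --context=1 'aa' text.txt
  • [--label] Label to print in place of "(standard input)" as the file name when reading standard input (mainly useful in pipelines).

    cat text.txt | grep --label=test -H 123
  • [--line-buffered] Flush the output after every line.

    top | grep --line-buffered ''
  • [--null] Print a zero byte after each file name.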

  • [pattern] The regular expression to search for.

    ...
  • [file ...] The files to search.

    grep 'a' text.txt
  • [-i] Ignore case when matching.

    grep -i 'a' text.txt

Examples
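The options listed above can be tried on a small file. The following is only a sketch: text.txt is created on the spot with made-up contents, and each result is stored in a variable so the commands can be compared side by side.

```shell
# Hypothetical sample data in a throwaway directory.
tmpdir=$(mktemp -d)
printf 'one\ntwo cat\nthree\ncategory\nfive Cat\n' > "$tmpdir/text.txt"

# -c: count the lines that contain 'cat' (case sensitive).
count=$(grep -c 'cat' "$tmpdir/text.txt")

# -i: the same count, ignoring case.
count_i=$(grep -c -i 'cat' "$tmpdir/text.txt")

# -w: match 'cat' only as a whole word, so 'category' is skipped.
whole=$(grep -w 'cat' "$tmpdir/text.txt")

# -o: print only the matching part of each matching line.
only=$(grep -o 'cat' "$tmpdir/text.txt")

# -A1: print each match plus one line of trailing context.
after=$(grep -A1 'two' "$tmpdir/text.txt")

# Two -e patterns: select lines matching either one.
either=$(grep -e 'one' -e 'three' "$tmpdir/text.txt")

printf '%s\n' "$count" "$count_i" "$whole" "$only" "$after" "$either"
rm -rf "$tmpdir"
```

The options are written before the pattern here, which is the conventional order and works whether or not the grep in use accepts options after the operands.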

man

GREP(1) 		     General Commands Manual			 GREP(1)

NAME
grep, egrep, fgrep, rgrep, bzgrep, bzegrep, bzfgrep, zgrep, zegrep, zfgrep
– file pattern searcher

SYNOPSIS
grep [-abcdDEFGHhIiJLlMmnOopqRSsUVvwXxZz] [-A num] [-B num] [-C[num]]
[-e pattern] [-f file] [--binary-files=value] [--color[=when]]
[--colour[=when]] [--context[=num]] [--label] [--line-buffered]
[--null] [pattern] [file ...]

DESCRIPTION
The grep utility searches any given input files, selecting lines that match
one or more patterns. By default, a pattern matches an input line if the
regular expression (RE) in the pattern matches the input line without its
trailing newline. An empty expression matches every line. Each input line
that matches at least one of the patterns is written to the standard
output.

grep is used for simple patterns and basic regular expressions (BREs);
egrep can handle extended regular expressions (EREs). See re_format(7) for
more information on regular expressions. fgrep is quicker than both grep
and egrep, but can only handle fixed patterns (i.e., it does not interpret
regular expressions). Patterns may consist of one or more lines, allowing
any of the pattern lines to match a portion of the input.

zgrep, zegrep, and zfgrep act like grep, egrep, and fgrep, respectively,
but accept input files compressed with the compress(1) or gzip(1)
compression utilities. bzgrep, bzegrep, and bzfgrep act like grep, egrep,
and fgrep, respectively, but accept input files compressed with the
bzip2(1) compression utility.

The following options are available:

-A num, --after-context=num
Print num lines of trailing context after each match. See also the
-B and -C options.

-a, --text
Treat all files as ASCII text. Normally grep will simply print
“Binary file ... matches” if files contain binary characters. Use
of this option forces grep to output lines matching the specified
pattern.

-B num, --before-context=num
Print num lines of leading context before each match. See also the
-A and -C options.

-b, --byte-offset
The offset in bytes of a matched pattern is displayed in front of
the respective matched line.

-C[num], --context[=num]
Print num lines of leading and trailing context surrounding each
match. The default value of num is “2” and is equivalent to “-A 2
-B 2”. Note: no whitespace may be given between the option and its
argument.

-c, --count
Only a count of selected lines is written to standard output.

--colour=[when], --color=[when]
Mark up the matching text with the expression stored in the
GREP_COLOR environment variable. The possible values of when are
“never”, “always” and “auto”.

-D action, --devices=action
Specify the demanded action for devices, FIFOs and sockets. The
default action is “read”, which means, that they are read as if
they were normal files. If the action is set to “skip”, devices
are silently skipped.

-d action, --directories=action
Specify the demanded action for directories. It is “read” by
default, which means that the directories are read in the same
manner as normal files. Other possible values are “skip” to
silently ignore the directories, and “recurse” to read them
recursively, which has the same effect as the -R and -r option.

-E, --extended-regexp
Interpret pattern as an extended regular expression (i.e., force
grep to behave as egrep).

-e pattern, --regexp=pattern
Specify a pattern used during the search of the input: an input
line is selected if it matches any of the specified patterns. This
option is most useful when multiple -e options are used to specify
multiple patterns, or when a pattern begins with a dash (‘-’).

--exclude pattern
If specified, it excludes files matching the given filename pattern
from the search. Note that --exclude and --include patterns are
processed in the order given. If a name matches multiple patterns,
the latest matching rule wins. If no --include pattern is
specified, all files are searched that are not excluded. Patterns
are matched to the full path specified, not only to the filename
component.

--exclude-dir pattern
If -R is specified, it excludes directories matching the given
filename pattern from the search. Note that --exclude-dir and
--include-dir patterns are processed in the order given. If a name
matches multiple patterns, the latest matching rule wins. If no
--include-dir pattern is specified, all directories are searched
that are not excluded.

-F, --fixed-strings
Interpret pattern as a set of fixed strings (i.e., force grep to
behave as fgrep).

-f file, --file=file
Read one or more newline separated patterns from file. Empty
pattern lines match every input line. Newlines are not considered
part of a pattern. If file is empty, nothing is matched.

-G, --basic-regexp
Interpret pattern as a basic regular expression (i.e., force grep
to behave as traditional grep).

-H Always print filename headers with output lines.

-h, --no-filename
Never print filename headers (i.e., filenames) with output lines.

--help Print a brief help message.

-I Ignore binary files. This option is equivalent to the
“--binary-files=without-match” option.

-i, --ignore-case
Perform case insensitive matching. By default, grep is case
sensitive.

--include pattern
If specified, only files matching the given filename pattern are
searched. Note that --include and --exclude patterns are processed
in the order given. If a name matches multiple patterns, the
latest matching rule wins. Patterns are matched to the full path
specified, not only to the filename component.

--include-dir pattern
If -R is specified, only directories matching the given filename
pattern are searched. Note that --include-dir and --exclude-dir
patterns are processed in the order given. If a name matches
multiple patterns, the latest matching rule wins.

-J, --bz2decompress
Decompress the bzip2(1) compressed file before looking for the
text.

-L, --files-without-match
Only the names of files not containing selected lines are written
to standard output. Pathnames are listed once per file searched.
If the standard input is searched, the string “(standard input)” is
written unless a --label is specified.

-l, --files-with-matches
Only the names of files containing selected lines are written to
standard output. grep will only search a file until a match has
been found, making searches potentially less expensive. Pathnames
are listed once per file searched. If the standard input is
searched, the string “(standard input)” is written unless a --label
is specified.

--label
Label to use in place of “(standard input)” for a file name where a
file name would normally be printed. This option applies to -H,
-L, and -l.

--mmap Use mmap(2) instead of read(2) to read input, which can result in
better performance under some circumstances but can cause undefined
behaviour.

-M, --lzma
Decompress the LZMA compressed file before looking for the text.

-m num, --max-count=num
Stop reading the file after num matches.

-n, --line-number
Each output line is preceded by its relative line number in the
file, starting at line 1. The line number counter is reset for
each file processed. This option is ignored if -c, -L, -l, or -q
is specified.

--null Prints a zero-byte after the file name.

-O If -R is specified, follow symbolic links only if they were
explicitly listed on the command line. The default is not to
follow symbolic links.

-o, --only-matching
Prints only the matching part of the lines.

-p If -R is specified, no symbolic links are followed. This is the
default.

-q, --quiet, --silent
Quiet mode: suppress normal output. grep will only search a file
until a match has been found, making searches potentially less
expensive.

-R, -r, --recursive
Recursively search subdirectories listed. (i.e., force grep to
behave as rgrep).

-S If -R is specified, all symbolic links are followed. The default
is not to follow symbolic links.

-s, --no-messages
Silent mode. Nonexistent and unreadable files are ignored (i.e.,
their error messages are suppressed).

-U, --binary
Search binary files, but do not attempt to print them.

-u This option has no effect and is provided only for compatibility
with GNU grep.

-V, --version
Display version information and exit.

-v, --invert-match
Selected lines are those not matching any of the specified
patterns.

-w, --word-regexp
The expression is searched for as a word (as if surrounded by
‘[[:<:]]’ and ‘[[:>:]]’; see re_format(7)). This option has no
effect if -x is also specified.

-x, --line-regexp
Only input lines selected against an entire fixed string or regular
expression are considered to be matching lines.

-y Equivalent to -i. Obsoleted.

-z, --null-data
Treat input and output data as sequences of lines terminated by a
zero-byte instead of a newline.

-X, --xz
Decompress the xz(1) compressed file before looking for the text.

-Z, --decompress
Force grep to behave as zgrep.

--binary-files=value
Controls searching and printing of binary files. Options are:
binary (default) Search binary files but do not print them.
without-match Do not search binary files.
text Treat all files as text.

--line-buffered
Force output to be line buffered. By default, output is line
buffered when standard output is a terminal and block buffered
otherwise.

If no file arguments are specified, the standard input is used.
Additionally, “-” may be used in place of a file name, anywhere that a file
name is accepted, to read from standard input. This includes both -f and
file arguments.

ENVIRONMENT
GREP_OPTIONS May be used to specify default options that will be placed at
the beginning of the argument list. Backslash-escaping is
not supported, unlike the behavior in GNU grep.

EXIT STATUS
The grep utility exits with one of the following values:

0 One or more lines were selected.
1 No lines were selected.
>1 An error occurred.

EXAMPLES
- Find all occurrences of the pattern ‘patricia’ in a file:

$ grep 'patricia' myfile

- Same as above but looking only for complete words:

$ grep -w 'patricia' myfile

- Count occurrences of the exact pattern ‘FOO’ :

$ grep -c FOO myfile

- Same as above but ignoring case:

$ grep -c -i FOO myfile

- Find all occurrences of the pattern ‘.Pp’ at the beginning of a line:

$ grep '^\.Pp' myfile

The apostrophes ensure the entire expression is evaluated by grep
instead of by the user's shell. The caret ‘^’ matches the null string
at the beginning of a line, and the ‘\’ escapes the ‘.’, which would
otherwise match any character.

- Find all lines in a file which do not contain the words ‘foo’ or ‘bar’:

$ grep -v -e 'foo' -e 'bar' myfile

- Peruse the file ‘calendar’ looking for either 19, 20, or 25 using
extended regular expressions:

$ egrep '19|20|25' calendar

- Show matching lines and the name of the ‘*.h’ files which contain the
pattern ‘FIXME’. Do the search recursively from the /usr/src/sys/arm
directory

$ grep -H -R FIXME --include=*.h /usr/src/sys/arm/

- Same as above but show only the name of the matching file:

$ grep -l -R FIXME --include=*.h /usr/src/sys/arm/

- Show lines containing the text ‘foo’. The matching part of the output
is colored and every line is prefixed with the line number and the
offset in the file for those lines that matched.

$ grep -b --colour -n foo myfile

- Show lines that match the extended regular expression patterns read
from the standard input:

$ echo -e 'Free\nBSD\nAll.*reserved' | grep -E -f - myfile

- Show lines from the output of the pciconf(8) command matching the
specified extended regular expression along with three lines of leading
context and one line of trailing context:

$ pciconf -lv | grep -B3 -A1 -E 'class.*=.*storage'

- Suppress any output and use the exit status to show an appropriate
message:

$ grep -q foo myfile && echo File matches

SEE ALSO
bzip2(1), compress(1), ed(1), ex(1), gzip(1), sed(1), xz(1), zgrep(1),
re_format(7)

STANDARDS
The grep utility is compliant with the IEEE Std 1003.1-2008 (“POSIX.1”)
specification.

The flags [-AaBbCDdGHhILmoPRSUVw] are extensions to that specification, and
the behaviour of the -f flag when used with an empty pattern file is left
undefined.

All long options are provided for compatibility with GNU versions of this
utility.

Historic versions of the grep utility also supported the flags [-ruy].
This implementation supports those options; however, their use is strongly
discouraged.

HISTORY
The grep command first appeared in Version 6 AT&T UNIX.

BUGS
The grep utility does not normalize Unicode input, so a pattern containing
composed characters will not match decomposed input, and vice versa.
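
The exit statuses listed in the man page above are what make grep usable in shell conditionals. A minimal sketch, assuming a made-up file named myfile; the `|| var=$?` form is used only so that the status is captured without aborting a script that runs under `set -e`.

```shell
# Hypothetical sample file in a throwaway directory.
tmpdir=$(mktemp -d)
printf 'foo\nbar\n' > "$tmpdir/myfile"

# -q suppresses output; only the exit status is used (0 = a line was selected).
if grep -q 'foo' "$tmpdir/myfile"; then
    found_foo=yes
else
    found_foo=no
fi

# Status 1 means "no lines were selected"; it is not an error.
status_baz=0
grep -q 'baz' "$tmpdir/myfile" || status_baz=$?

# A missing file is an error, so the status is greater than 1.
# -s hides the error message itself.
status_missing=0
grep -q -s 'foo' "$tmpdir/does-not-exist" || status_missing=$?

echo "found_foo=$found_foo status_baz=$status_baz status_missing=$status_missing"
rm -rf "$tmpdir"
```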

sed

sed (short for "stream editor") is a Unix utility that parses and transforms text using a simple, compact programming language.

Format

sed [-Ealnru] command [-I extension] [-i extension] [file ...]
sed [-Ealnru] [-e command] [-f command_file] [-I extension] [-i extension] [file ...]

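Parameters

  • [-a] Delay opening the files named by "w" functions until a command containing the related "w" function is applied to a line of input.
  • [-e command] Multi-command editing: append command to the list of editing commands, so that several commands can be applied to each line.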

Examples
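A few common invocations, as a sketch rather than a complete tour. test.txt is created with made-up contents, and the results are captured in variables for easy comparison.

```shell
# Hypothetical sample data in a throwaway directory.
tmpdir=$(mktemp -d)
printf 'foo one\nbar two\nfoo three\n' > "$tmpdir/test.txt"

# s/regex/replacement/: substitute the first 'foo' on every line.
replaced=$(sed 's/foo/baz/' "$tmpdir/test.txt")

# -n suppresses the default output; p prints only the lines matching /bar/.
only_bar=$(sed -n '/bar/p' "$tmpdir/test.txt")

# Several -e commands are applied, in order, to each line.
multi=$(sed -e 's/foo/FOO/' -e 's/one/1/' "$tmpdir/test.txt")

printf '%s\n' "$replaced" "$only_bar" "$multi"
rm -rf "$tmpdir"
```

In-place editing is left out on purpose: the -i syntax shown in the man page below is the BSD form, and GNU sed takes its backup suffix differently, so a single portable example is hard to give.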

man

SED(1)			     General Commands Manual			  SED(1)

NAME
sed – stream editor

SYNOPSIS
sed [-Ealnru] command [-I extension] [-i extension] [file ...]
sed [-Ealnru] [-e command] [-f command_file] [-I extension] [-i extension]
[file ...]

DESCRIPTION
The sed utility reads the specified files, or the standard input if no
files are specified, modifying the input as specified by a list of
commands. The input is then written to the standard output.

A single command may be specified as the first argument to sed. Multiple
commands may be specified by using the -e or -f options. All commands are
applied to the input in the order they are specified regardless of their
origin.

The following options are available:

-E Interpret regular expressions as extended (modern) regular
expressions rather than basic regular expressions (BRE's). The
re_format(7) manual page fully describes both formats.

-a The files listed as parameters for the “w” functions are created
(or truncated) before any processing begins, by default. The -a
option causes sed to delay opening each file until a command
containing the related “w” function is applied to a line of input.

-e command
Append the editing commands specified by the command argument to
the list of commands.

-f command_file
Append the editing commands found in the file command_file to the
list of commands. The editing commands should each be listed on a
separate line. The commands are read from the standard input if
command_file is “-”.

-I extension
Edit files in-place, saving backups with the specified extension.
If a zero-length extension is given, no backup will be saved. It
is not recommended to give a zero-length extension when in-place
editing files, as you risk corruption or partial content in
situations where disk space is exhausted, etc.

Note that in-place editing with -I still takes place in a single
continuous line address space covering all files, although each
file preserves its individuality instead of forming one output
stream. The line counter is never reset between files, address
ranges can span file boundaries, and the “$” address matches only
the last line of the last file. (See Sed Addresses.) That can lead
to unexpected results in many cases of in-place editing, where
using -i is desired.

-i extension
Edit files in-place similarly to -I, but treat each file
independently from other files. In particular, line numbers in
each file start at 1, the “$” address matches the last line of the
current file, and address ranges are limited to the current file.
(See Sed Addresses.) The net result is as though each file were
edited by a separate sed instance.

-l Make output line buffered.

-n By default, each line of input is echoed to the standard output
after all of the commands have been applied to it. The -n option
suppresses this behavior.

-r Same as -E for compatibility with GNU sed.

-u Make output unbuffered.

The form of a sed command is as follows:

[address[,address]]function[arguments]

Whitespace may be inserted before the first address and the function
portions of the command.

Normally, sed cyclically copies a line of input, not including its
terminating newline character, into a pattern space, (unless there is
something left after a “D” function), applies all of the commands with
addresses that select that pattern space, copies the pattern space to the
standard output, appending a newline, and deletes the pattern space.

Some of the functions use a hold space to save all or part of the pattern
space for subsequent retrieval.

Sed Addresses
An address is not required, but if specified must have one of the following
formats:

• a number that counts input lines cumulatively across input files
(or in each file independently if a -i option is in effect);

• a dollar (“$”) character that addresses the last line of input
(or the last line of the current file if a -i option was
specified);

• a context address that consists of a regular expression preceded
and followed by a delimiter. The closing delimiter can also
optionally be followed by the “I” character, to indicate that the
regular expression is to be matched in a case-insensitive way.

A command line with no addresses selects every pattern space.

A command line with one address selects all of the pattern spaces that
match the address.

A command line with two addresses selects an inclusive range. This range
starts with the first pattern space that matches the first address. The
end of the range is the next following pattern space that matches the
second address. If the second address is a number less than or equal to
the line number first selected, only that line is selected. The number in
the second address may be prefixed with a (“+”) to specify the number of
lines to match after the first pattern. In the case when the second
address is a context address, sed does not re-match the second address
against the pattern space that matched the first address. Starting at the
first line following the selected range, sed starts looking again for the
first address.

Editing commands can be applied to non-selected pattern spaces by use of
the exclamation character (“!”) function.

Sed Regular Expressions
The regular expressions used in sed, by default, are basic regular
expressions (BREs, see re_format(7) for more information), but extended
(modern) regular expressions can be used instead if the -E flag is given.
In addition, sed has the following two additions to regular expressions:

1. In a context address, any character other than a backslash (“\”) or
newline character may be used to delimit the regular expression. The
opening delimiter needs to be preceded by a backslash unless it is a
slash. For example, the context address \xabcx is equivalent to
/abc/. Also, putting a backslash character before the delimiting
character within the regular expression causes the character to be
treated literally. For example, in the context address \xabc\xdefx,
the RE delimiter is an “x” and the second “x” stands for itself, so
that the regular expression is “abcxdef”.

2. The escape sequence \n matches a newline character embedded in the
pattern space. You cannot, however, use a literal newline character
in an address or in the substitute command.

One special feature of sed regular expressions is that they can default to
the last regular expression used. If a regular expression is empty, i.e.,
just the delimiter characters are specified, the last regular expression
encountered is used instead. The last regular expression is defined as the
last regular expression used as part of an address or substitute command,
and at run-time, not compile-time. For example, the command “/abc/s//XXX/”
will substitute “XXX” for the pattern “abc”.

Sed Functions
In the following list of commands, the maximum number of permissible
addresses for each command is indicated by [0addr], [1addr], or [2addr],
representing zero, one, or two addresses.

The argument text consists of one or more lines. To embed a newline in the
text, precede it with a backslash. Other backslashes in text are deleted
and the following character taken literally.

The “r” and “w” functions take an optional file parameter, which should be
separated from the function letter by white space. Each file given as an
argument to sed is created (or its contents truncated) before any input
processing begins.

The “b”, “r”, “s”, “t”, “w”, “y”, “!”, and “:” functions all accept
additional arguments. The following synopses indicate which arguments have
to be separated from the function letters by white space characters.

Two of the functions take a function-list. This is a list of sed functions
separated by newlines, as follows:

{ function
function
...
function
}

The “{” can be preceded by white space and can be followed by white space.
The function can be preceded by white space. The terminating “}” must be
preceded by a newline, and may also be preceded by white space.

[2addr] function-list
Execute function-list only when the pattern space is selected.

[1addr]a\
text Write text to standard output immediately before each attempt to
read a line of input, whether by executing the “N” function or by
beginning a new cycle.

[2addr]b[label]
Branch to the “:” function with the specified label. If the label
is not specified, branch to the end of the script.

[2addr]c\
text Delete the pattern space. With 0 or 1 address or at the end of a
2-address range, text is written to the standard output.

[2addr]d
Delete the pattern space and start the next cycle.

[2addr]D
Delete the initial segment of the pattern space through the first
newline character and start the next cycle.

[2addr]g
Replace the contents of the pattern space with the contents of the
hold space.

[2addr]G
Append a newline character followed by the contents of the hold
space to the pattern space.

[2addr]h
Replace the contents of the hold space with the contents of the
pattern space.

[2addr]H
Append a newline character followed by the contents of the pattern
space to the hold space.

[1addr]i\
text Write text to the standard output.

[2addr]l
(The letter ell.) Write the pattern space to the standard output
in a visually unambiguous form. This form is as follows:

backslash \\
alert \a
form-feed \f
carriage-return \r
tab \t
vertical tab \v

Nonprintable characters are written as three-digit octal numbers
(with a preceding backslash) for each byte in the character (most
significant byte first). Long lines are folded, with the point of
folding indicated by displaying a backslash followed by a newline.
The end of each line is marked with a “$”.

[2addr]n
Write the pattern space to the standard output if the default
output has not been suppressed, and replace the pattern space with
the next line of input.

[2addr]N
Append the next line of input to the pattern space, using an
embedded newline character to separate the appended material from
the original contents. Note that the current line number changes.

[2addr]p
Write the pattern space to standard output.

[2addr]P
Write the pattern space, up to the first newline character to the
standard output.

[1addr]q
Branch to the end of the script and quit without starting a new
cycle.

[1addr]r file
Copy the contents of file to the standard output immediately before
the next attempt to read a line of input. If file cannot be read
for any reason, it is silently ignored and no error condition is
set.

[2addr]s/regular expression/replacement/flags
Substitute the replacement string for the first instance of the
regular expression in the pattern space. Any character other than
backslash or newline can be used instead of a slash to delimit the
RE and the replacement. Within the RE and the replacement, the RE
delimiter itself can be used as a literal character if it is
preceded by a backslash.

An ampersand (“&”) appearing in the replacement is replaced by the
string matching the RE. The special meaning of “&” in this context
can be suppressed by preceding it by a backslash. The string “\#”,
where “#” is a digit, is replaced by the text matched by the
corresponding backreference expression (see re_format(7)).

A line can be split by substituting a newline character into it.
To specify a newline character in the replacement string, precede
it with a backslash.

The value of flags in the substitute function is zero or more of
the following:

N Make the substitution only for the N'th occurrence of
the regular expression in the pattern space.

g Make the substitution for all non-overlapping matches
of the regular expression, not just the first one.

p Write the pattern space to standard output if a
replacement was made. If the replacement string is
identical to that which it replaces, it is still
considered to have been a replacement.

w file Append the pattern space to file if a replacement was
made. If the replacement string is identical to that
which it replaces, it is still considered to have
been a replacement.

i or I Match the regular expression in a case-insensitive
way.

[2addr]t [label]
Branch to the “:” function bearing the label if any substitutions
have been made since the most recent reading of an input line or
execution of a “t” function. If no label is specified, branch to
the end of the script.

[2addr]w file
Append the pattern space to the file.

[2addr]x
Swap the contents of the pattern and hold spaces.

[2addr]y/string1/string2/
Replace all occurrences of characters in string1 in the pattern
space with the corresponding characters from string2. Any
character other than a backslash or newline can be used instead of
a slash to delimit the strings. Within string1 and string2, a
backslash followed by any character other than a newline is that
literal character, and a backslash followed by an “n” is replaced
by a newline character.

[2addr]!function
[2addr]!function-list
Apply the function or function-list only to the lines that are not
selected by the address(es).

[0addr]:label
This function does nothing; it bears a label to which the “b” and
“t” commands may branch.

[1addr]=
Write the line number to the standard output followed by a newline
character.

[0addr]
Empty lines are ignored.

[0addr]#
The “#” and the remainder of the line are ignored (treated as a
comment), with the single exception that if the first two
characters in the file are “#n”, the default output is suppressed.
This is the same as specifying the -n option on the command line.

ENVIRONMENT
The COLUMNS, LANG, LC_ALL, LC_CTYPE and LC_COLLATE environment variables
affect the execution of sed as described in environ(7).

EXIT STATUS
The sed utility exits 0 on success, and >0 if an error occurs.

EXAMPLES
Replace ‘bar’ with ‘baz’ when piped from another command:

echo "An alternate word, like bar, is sometimes used in examples." | sed 's/bar/baz/'

Using backslashes can sometimes be hard to read and follow:

echo "/home/example" | sed 's/\/home\/example/\/usr\/local\/example/'

Using a different separator can be handy when working with paths:

echo "/home/example" | sed 's#/home/example#/usr/local/example#'

Replace all occurrences of ‘foo’ with ‘bar’ in the file test.txt, without
creating a backup of the file:

sed -i '' -e 's/foo/bar/g' test.txt

SEE ALSO
awk(1), ed(1), grep(1), regex(3), re_format(7)

STANDARDS
The sed utility is expected to be a superset of the IEEE Std 1003.2
(“POSIX.2”) specification.

The -E, -I, -a and -i options, the special meaning of -f -, the prefixing
“+” in the second member of an address range, as well as the “I” flag to
the address regular expression and substitution command are non-standard
FreeBSD extensions and may not be available on other operating systems.

HISTORY
A sed command, written by L. E. McMahon, appeared in Version 7 AT&T UNIX.

AUTHORS
Diomidis D. Spinellis <dds@FreeBSD.org>

BUGS
Multibyte characters containing a byte with value 0x5C (ASCII ‘\’) may be
incorrectly treated as line continuation characters in arguments to the
“a”, “c” and “i” commands. Multibyte characters cannot be used as
delimiters with the “s” and “y” commands.

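awk

AWK is a language for processing text files. It treats a file as a sequence of records; by default, each line is one record. Each line is split into a series of fields, so the first word on a line is the first field, the second word is the second field, and so on. An AWK program is made up of blocks of statements, each tied to a pattern. AWK reads the input one line at a time; for each line, the interpreter checks it against every pattern in the program and runs the action associated with each pattern that matches.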
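
A minimal sketch of the record and field model described above. The two input lines are invented for the example; NR is the record number and $1, $3 are the first and third fields.

```shell
# Each input line is one record; whitespace splits it into fields.
out=$(printf 'alice 30 paris\nbob 25 berlin\n' |
  awk '{ print NR ": " $1 " lives in " $3 }')
echo "$out"
```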

Format

awk [ -F fs ] [ -v var=value ] [ 'prog' | -f progfile ] [ file ... ]
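
To illustrate the synopsis, here is a small sketch that combines -F and -v. The colon-separated input and the variable name prefix are assumptions made up for the example.

```shell
# -F ':' sets the input field separator; -v assigns a variable
# before the program starts running.
out=$(printf 'root:x:0:0\ndaemon:x:1:1\n' |
  awk -F ':' -v prefix='user=' '{ print prefix $1, "uid=" $3 }')
echo "$out"
```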

Parameters

Examples

man

AWK(1)			     General Commands Manual			  AWK(1)



NAME
awk - pattern-directed scanning and processing language

SYNOPSIS
awk [ -F fs ] [ -v var=value ] [ 'prog' | -f progfile ] [ file ... ]

DESCRIPTION
Awk scans each input file for lines that match any of a set of patterns
specified literally in prog or in one or more files specified as -f
progfile. With each pattern there can be an associated action that will
be performed when a line of a file matches the pattern. Each line is
matched against the pattern portion of every pattern-action statement;
the associated action is performed for each matched pattern. The file
name - means the standard input. Any file of the form var=value is
treated as an assignment, not a filename, and is executed at the time it
would have been opened if it were a filename. The option -v followed by
var=value is an assignment to be done before prog is executed; any number
of -v options may be present. The -F fs option defines the input field
separator to be the regular expression fs.
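
The var=value operand form described above can be sketched as follows. The file names and the variable tag are hypothetical; the point is that each assignment takes effect at the moment the next file would be opened.

```shell
# Two throwaway input files in a scratch directory (assumes mktemp -d).
workdir=$(mktemp -d)
printf 'a\n' > "$workdir/one.txt"
printf 'b\n' > "$workdir/two.txt"

# "tag=first" is applied before one.txt is read,
# "tag=second" before two.txt is read.
out=$(awk '{ print tag ":" $0 }' \
  tag=first "$workdir/one.txt" tag=second "$workdir/two.txt")
echo "$out"
rm -rf "$workdir"
```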

An input line is normally made up of fields separated by white space, or
by the regular expression FS. The fields are denoted $1, $2, ..., while
$0 refers to the entire line. If FS is null, the input line is split
into one field per character.

A pattern-action statement has the form:

pattern { action }

A missing { action } means print the line; a missing pattern always
matches. Pattern-action statements are separated by newlines or
semicolons.
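
A short sketch of the three shapes a pattern-action statement can take, using made-up input: a pattern with no action, an action with no pattern, and several statements separated by newlines.

```shell
# Hypothetical input: a number and a name per line.
input='10 apple
25 banana
7 cherry'

# Pattern only: the missing action means "print the line".
big=$(printf '%s\n' "$input" | awk '$1 > 9')

# Action only: the missing pattern matches every line.
names=$(printf '%s\n' "$input" | awk '{ print $2 }')

# Several pattern-action statements separated by newlines;
# each line is tested against every pattern in turn.
labels=$(printf '%s\n' "$input" | awk '
  $1 < 10  { print "small", $2 }
  $1 >= 10 { print "big", $2 }')

echo "$big"; echo "$names"; echo "$labels"
```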

An action is a sequence of statements. A statement can be one of the
following:

if( expression ) statement [ else statement ]
while( expression ) statement
for( expression ; expression ; expression ) statement
for( var in array ) statement
do statement while( expression )
break
continue
{ [ statement ... ] }
expression # commonly var = expression
print [ expression-list ] [ > expression ]
printf format [ , expression-list ] [ > expression ]
return [ expression ]
next # skip remaining patterns on this input line
nextfile # skip rest of this file, open next, start at top
delete array[ expression ] # delete an array element
delete array # delete all elements of array
exit [ expression ] # exit immediately; status is expression

Statements are terminated by semicolons, newlines or right braces. An
empty expression-list stands for $0. String constants are quoted " ",
with the usual C escapes recognized within. Expressions take on string
or numeric values as appropriate, and are built using the operators + - *
/ % ^ (exponentiation), and concatenation (indicated by white space).
The operators ! ++ -- += -= *= /= %= ^= > >= < <= == != ?: are also
available in expressions. Variables may be scalars, array elements
(denoted x[i]) or fields. Variables are initialized to the null string.
Array subscripts may be any string, not necessarily numeric; this allows
for a form of associative memory. Multiple subscripts such as [i,j,k]
are permitted; the constituents are concatenated, separated by the value
of SUBSEP.
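
The associative arrays and multiple subscripts described above might be used like this. The input lines and the array name seen are invented for the sketch.

```shell
# Count (method, path) pairs; the two subscripts are joined with SUBSEP.
counts=$(printf 'get /a\nget /b\npost /a\nget /a\n' | awk '
  { seen[$1, $2]++ }
  END {
    # Building the key by hand shows what [i,j] expands to.
    key = "get" SUBSEP "/a"
    print "get /a:", seen[key]
    # The parenthesized "in" test does not create the element.
    if (("post", "/b") in seen) print "post /b seen"
    else print "post /b not seen"
  }')
echo "$counts"
```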

The print statement prints its arguments on the standard output (or on a
file if > file or >> file is present or on a pipe if | cmd is
present), separated by the current output field separator, and terminated
by the output record separator. file and cmd may be literal names or
parenthesized expressions; identical string values in different
statements denote the same open file. The printf statement formats its
expression list according to the format (see printf(3)). The built-in
function close(expr) closes the file or pipe expr. The built-in function
fflush(expr) flushes any buffered output for the file or pipe expr.
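
As a sketch of print redirection and close, the program below routes each record to a file named after its first field. The directory, the file names and the variable dir are assumptions for the example.

```shell
# Scratch directory for the output files (assumes mktemp -d).
workdir=$(mktemp -d)

printf 'ok 1\nerr 2\nok 3\n' | awk -v dir="$workdir" '
  {
    out = dir "/" $1 ".log"   # identical strings name the same open file
    print $2 > out            # ">" truncates once, then keeps appending
  }
  END { close(dir "/ok.log"); close(dir "/err.log") }'

ok_lines=$(cat "$workdir/ok.log")
err_lines=$(cat "$workdir/err.log")
echo "$ok_lines"; echo "$err_lines"
rm -rf "$workdir"
```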

The mathematical functions atan2, cos, exp, log, sin, and sqrt are built
in. Other built-in functions:


length
the length of its argument taken as a string, number of elements in
an array for an array argument, or length of $0 if no argument.
rand random number on [0,1).
srand
sets seed for rand and returns the previous seed.
int truncates to an integer value.
substr(s, m [, n])
the n-character substring of s that begins at position m counted
from 1. If no n, use the rest of the string.
index(s, t)
the position in s where the string t occurs, or 0 if it does not.
match(s, r)
the position in s where the regular expression r occurs, or 0 if it
does not. The variables RSTART and RLENGTH are set to the position
and length of the matched string.
split(s, a [, fs])
splits the string s into array elements a[1], a[2], ..., a[n], and
returns n. The separation is done with the regular expression fs or
with the field separator FS if fs is not given. An empty string as
field separator splits the string into one array element per
character.
sub(r, t [, s])
substitutes t for the first occurrence of the regular expression r
in the string s. If s is not given, $0 is used.
gsub(r, t [, s])
same as sub except that all occurrences of the regular expression
are replaced; sub and gsub return the number of replacements.
sprintf(fmt, expr, ...)
the string resulting from formatting expr ... according to the
printf(3) format fmt.
system(cmd)
executes cmd and returns its exit status. This will be -1 upon
error, cmd's exit status upon a normal exit, 256 + sig upon death-
by-signal, where sig is the number of the murdering signal, or 512 +
sig if there was a core dump.
tolower(str)
returns a copy of str with all upper-case characters translated to
their corresponding lower-case equivalents.
toupper(str)
returns a copy of str with all lower-case characters translated to
their corresponding upper-case equivalents.
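
A few of the string functions listed above, exercised on made-up strings. This is only a sketch; everything runs in BEGIN, so no input is read.

```shell
out=$(awk 'BEGIN {
  s = "hello world"
  print length(s)                 # number of characters in s
  print substr(s, 1, 5)           # positions are counted from 1
  print index(s, "world")         # 0 would mean "not found"
  n = split("a:b:c", parts, ":")  # fills parts[1..n], returns n
  print n, parts[2]
  t = s
  count = gsub(/o/, "0", t)       # replaces in t, returns the count
  print count, t
  if (match("foo123bar", /[0-9]+/)) print RSTART, RLENGTH
  print toupper(substr(s, 7))     # no length given: rest of the string
}')
echo "$out"
```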

The “function” getline sets $0 to the next input record from the
current input file; getline < file sets $0 to the next record from file.
getline x sets variable x instead. Finally, cmd | getline pipes the
output of cmd into getline; each call of getline returns the next line of
output from cmd. In all cases, getline returns 1 for a successful input,
0 for end of file, and -1 for an error.
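
The cmd | getline form can be sketched as below. The command string is a stand-in chosen for the example; comparing the return value with > 0 stops the loop on end of file (0) and also on error (-1).

```shell
out=$(awk 'BEGIN {
  cmd = "echo one; echo two"      # hypothetical command
  # Each call reads the next line of output from cmd into "line".
  while ((cmd | getline line) > 0) { n++; last = line }
  close(cmd)
  print n, last
}')
echo "$out"
```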

Patterns are arbitrary Boolean combinations (with ! || &&) of regular
expressions and relational expressions. Regular expressions are as
defined in re_format(7). Isolated regular expressions in a pattern apply
to the entire line. Regular expressions may also occur in relational
expressions, using the operators ~ and !~. /re/ is a constant regular
expression; any string (constant or variable) may be used as a regular
expression, except in the position of an isolated regular expression in a
pattern.

A pattern may consist of two patterns separated by a comma; in this case,
the action is performed for all lines from an occurrence of the first
pattern through an occurrence of the second.

A relational expression is one of the following:

expression matchop regular-expression
expression relop expression
expression in array-name
(expr,expr,...) in array-name

where a relop is any of the six relational operators in C, and a matchop
is either ~ (matches) or !~ (does not match). A conditional is an
arithmetic expression, a relational expression, or a Boolean combination
of these.

The special patterns BEGIN and END may be used to capture control before
the first input line is read and after the last. BEGIN and END do not
combine with other patterns. They may appear multiple times in a program
and execute in the order they are read by awk.

Variable names with special meanings:


ARGC argument count, assignable.
ARGV argument array, assignable; non-null members are taken as filenames.
CONVFMT
conversion format used when converting numbers (default %.6g).
ENVIRON
array of environment variables; subscripts are names.
FILENAME
the name of the current input file.
FNR ordinal number of the current record in the current file.
FS regular expression used to separate fields; also settable by option
-Ffs.
NF number of fields in the current record.
NR ordinal number of the current record.
OFMT output format for numbers (default %.6g).
OFS output field separator (default space).
ORS output record separator (default newline).
RLENGTH
the length of a string matched by match.
RS input record separator (default newline). If empty, blank lines
separate records. If more than one character long, RS is treated as
a regular expression, and records are separated by text matching the
expression.
RSTART
the start position of a string matched by match.
SUBSEP
separates multiple subscripts (default 034).
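
A small sketch of how FS, OFS, NR and NF from the list above interact, using invented comma-separated input. Assigning to a field makes awk rebuild $0 with the current OFS.

```shell
out=$(printf 'a,b,c\nd,e\n' | awk '
  BEGIN { FS = ","; OFS = "-" }   # set before the first record is read
  {
    $1 = $1                       # touching a field rebuilds $0 with OFS
    print NR, NF, $0              # print also joins its arguments with OFS
  }')
echo "$out"
```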

Functions may be defined (at the position of a pattern-action statement)
thus:

function foo(a, b, c) { ...; return x }

Parameters are passed by value if scalar and by reference if array name;
functions may be called recursively. Parameters are local to the
function; all other variables are global. Thus local variables may be
created by providing excess parameters in the function definition.
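
The parameter rules above can be illustrated with a short sketch. The function names fill and bump are hypothetical; the extra parameter i in fill acts as a local variable.

```shell
out=$(awk '
# arr is passed by reference; i is an excess parameter, so it is local.
function fill(arr, n,    i) {
  for (i = 1; i <= n; i++) arr[i] = i * i
}
# x is a scalar, so the caller variable is not modified.
function bump(x) { x = x + 1; return x }
BEGIN {
  i = 99                 # global i, untouched by fill
  fill(squares, 3)
  v = 5
  r = bump(v)
  print squares[3], i, v, r
}')
echo "$out"
```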

ENVIRONMENT VARIABLES
If POSIXLY_CORRECT is set in the environment, then awk follows the POSIX
rules for sub and gsub with respect to consecutive backslashes and
ampersands.

EXAMPLES
length($0) > 72
Print lines longer than 72 characters.
{ print $2, $1 }
Print first two fields in opposite order.

BEGIN { FS = ",[ \t]*|[ \t]+" }
{ print $2, $1 }

Same, with input fields separated by comma and/or spaces and tabs.

{ s += $1 }
END { print "sum is", s, " average is", s/NR }

Add up first column, print sum and average.
/start/, /stop/
Print all lines between start/stop pairs.

BEGIN { # Simulate echo(1)
for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
printf "\n"
exit }
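
The fragments above are program text only. A sketch of running three of them from a shell, with made-up input, could look like this:

```shell
# Print the first two fields in opposite order.
swapped=$(printf 'x y\n' | awk '{ print $2, $1 }')

# Add up the first column, print sum and average.
stats=$(printf '10\n20\n30\n' |
  awk '{ s += $1 } END { print "sum is", s, " average is", s/NR }')

# Print all lines between start/stop pairs, inclusive.
between=$(printf 'a\nstart\nb\nstop\nc\n' | awk '/start/, /stop/')

echo "$swapped"; echo "$stats"; echo "$between"
```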

SEE ALSO
grep(1), lex(1), sed(1)
A. V. Aho, B. W. Kernighan, P. J. Weinberger, The AWK Programming
Language, Addison-Wesley, 1988. ISBN 0-201-07981-X.

BUGS
There are no explicit conversions between numbers and strings. To force
an expression to be treated as a number add 0 to it; to force it to be
treated as a string concatenate "" to it.
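
The two conversion idioms mentioned above, shown as a minimal sketch with made-up values. Comparison results are 1 for true and 0 for false.

```shell
out=$(awk 'BEGIN {
  a = "10"; b = "9"              # string constants
  s = (a < b)                    # string comparison: "10" sorts before "9"
  n = (a + 0 < b + 0)            # adding 0 forces a numeric comparison
  x = 10; y = 9                  # numeric values
  t = ((x "") < (y ""))          # concatenating "" forces strings
  print s, n, t
}')
echo "$out"
```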

The scope rules for variables in functions are a botch; the syntax is
worse.

Only eight-bit character sets are handled correctly.



2020-11-24 AWK(1)

References

  1. https://zh.wikipedia.org/wiki/Grep
  2. https://zh.wikipedia.org/wiki/Sed
  3. https://zh.wikipedia.org/wiki/AWK

Linux: The Three Text-Processing Musketeers grep/sed/awk
https://mikeygithub.github.io/2020/10/10/yuque/Linux篇-文本三剑客grep!sed!awk/
Author
Mikey
Published
October 10, 2020
License