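Background

These three tools are the ones we reach for most often when processing text on Linux: grep is mainly for searching, sed is mainly for editing, and awk is mainly for splitting lines into fields and processing them.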

grep

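grep is a command-line tool originally written for the Unix operating system. Given a list of files or standard input, grep searches for text that matches one or more regular expressions and prints only the matching (or non-matching) lines or text. grep started out as a command inside ed; its name comes from g/re/p (globally search a regular expression and print). In ed, entering the g/re/p command prints, line by line, every line that matches the previously defined pattern.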

Format

grep [-abcdDEFGHhIiJLlMmnOopqRSsUVvwXxZz] [-A num] [-B num] [-C[num]] [-e pattern] [-f file] [--binary-files=value] [--color[=when]] [--colour[=when]]
          [--context[=num]] [--label] [--line-buffered] [--null] [pattern] [file ...]

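Options

[-A num] Print num lines of trailing context after each matching line.
    grep -A2 'a' text.txt
[-B num] Print num lines of leading context before each matching line.
    grep -B2 'a' text.txt
[-c] Print only the count of matching lines.
    grep -c 'a' text.txt
[-e pattern] Specify a pattern; repeat -e to select lines that match any of several patterns (logical OR).
    grep -e 'a' -e 'c' text.txt
[-o] Print only the matching part of each line.
    grep -o 'a' text.txt
[-w] Match the pattern only as a whole word.
    grep -w 'aaaaaaaaa' text.txt
[-f file] Read the patterns from file, one per line. Each line is treated as a regular expression unless -F is also given.
    grep -f f.txt -w text.txt
[--binary-files=value] Control how binary files are searched and printed.
[--color[=when]] [--colour[=when]] Highlight the matched text (the markup is taken from the GREP_COLOR environment variable).
[--context[=num]] Print num lines of context before and after each match.
    grep --context=1 'aa' text.txt
[--label] Label to print in place of "(standard input)" as the file name (mainly useful in pipelines).
    cat text.txt | grep --label=test -H 123
[--line-buffered] Flush the output after every line.
    top | grep --line-buffered ''
[--null] Print a zero byte after the file name.
[pattern] The regular expression to search for.
    ...
[file ...] The files to search.
    grep 'a' text.txt
[-i] Match case-insensitively.
    grep -i 'a' text.txt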

Examples
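The options listed above can be tried out on a small file. The following is a minimal sketch, not a definitive reference: the sample data and the temporary file name are made up for illustration, and only flags that both BSD and GNU grep accept are used.

```shell
# Create a small sample file in a temporary directory (hypothetical data).
workdir=$(mktemp -d)
cat > "$workdir/text.txt" <<'EOF'
apple
Banana
cherry
apple pie
date
EOF

# -c: count the lines that contain "apple".
count=$(grep -c 'apple' "$workdir/text.txt")
echo "$count"

# -i: ignore case, so "banana" matches "Banana".
ci=$(grep -i 'banana' "$workdir/text.txt")
echo "$ci"

# -w matches "apple" only as a whole word; -n prefixes each line number.
words=$(grep -n -w 'apple' "$workdir/text.txt")
echo "$words"

# -o: print only the matched text, one match per output line.
only=$(grep -o 'an' "$workdir/text.txt")
echo "$only"

# -A 1: print one line of trailing context after each match.
after=$(grep -A 1 'cherry' "$workdir/text.txt")
echo "$after"

rm -rf "$workdir"
```

Capturing each result in a variable is only there to make the output easy to inspect; in day-to-day use the grep commands would be run directly.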

man

GREP(1) 		     General Commands Manual			 GREP(1)

NAME
     grep, egrep, fgrep, rgrep, bzgrep, bzegrep, bzfgrep, zgrep, zegrep, zfgrep
     – file pattern searcher

SYNOPSIS
     grep [-abcdDEFGHhIiJLlMmnOopqRSsUVvwXxZz] [-A num] [-B num] [-C[num]]
	  [-e pattern] [-f file] [--binary-files=value] [--color[=when]]
	  [--colour[=when]] [--context[=num]] [--label] [--line-buffered]
	  [--null] [pattern] [file ...]

DESCRIPTION
     The grep utility searches any given input files, selecting lines that match
     one or more patterns.  By default, a pattern matches an input line if the
     regular expression (RE) in the pattern matches the input line without its
     trailing newline.	An empty expression matches every line.  Each input line
     that matches at least one of the patterns is written to the standard
     output.

     grep is used for simple patterns and basic regular expressions (BREs);
     egrep can handle extended regular expressions (EREs).  See re_format(7) for
     more information on regular expressions.  fgrep is quicker than both grep
     and egrep, but can only handle fixed patterns (i.e., it does not interpret
     regular expressions).  Patterns may consist of one or more lines, allowing
     any of the pattern lines to match a portion of the input.

     zgrep, zegrep, and zfgrep act like grep, egrep, and fgrep, respectively,
     but accept input files compressed with the compress(1) or gzip(1)
     compression utilities.  bzgrep, bzegrep, and bzfgrep act like grep, egrep,
     and fgrep, respectively, but accept input files compressed with the
     bzip2(1) compression utility.

     The following options are available:

     -A num, --after-context=num
	     Print num lines of trailing context after each match.  See also the
	     -B and -C options.

     -a, --text
	     Treat all files as ASCII text.  Normally grep will simply print
	     “Binary file ... matches” if files contain binary characters.  Use
	     of this option forces grep to output lines matching the specified
	     pattern.

     -B num, --before-context=num
	     Print num lines of leading context before each match.  See also the
	     -A and -C options.

     -b, --byte-offset
	     The offset in bytes of a matched pattern is displayed in front of
	     the respective matched line.

     -C[num], --context[=num]
	     Print num lines of leading and trailing context surrounding each
	     match.  The default value of num is “2” and is equivalent to “-A 2
	     -B 2”.  Note: no whitespace may be given between the option and its
	     argument.

     -c, --count
	     Only a count of selected lines is written to standard output.

     --colour=[when], --color=[when]
	     Mark up the matching text with the expression stored in the
	     GREP_COLOR environment variable.  The possible values of when are
	     “never”, “always” and “auto”.

     -D action, --devices=action
	     Specify the demanded action for devices, FIFOs and sockets.  The
	     default action is “read”, which means, that they are read as if
	     they were normal files.  If the action is set to “skip”, devices
	     are silently skipped.

     -d action, --directories=action
	     Specify the demanded action for directories.  It is “read” by
	     default, which means that the directories are read in the same
	     manner as normal files.  Other possible values are “skip” to
	     silently ignore the directories, and “recurse” to read them
	     recursively, which has the same effect as the -R and -r option.

     -E, --extended-regexp
	     Interpret pattern as an extended regular expression (i.e., force
	     grep to behave as egrep).

     -e pattern, --regexp=pattern
	     Specify a pattern used during the search of the input: an input
	     line is selected if it matches any of the specified patterns.  This
	     option is most useful when multiple -e options are used to specify
	     multiple patterns, or when a pattern begins with a dash (‘-’).

     --exclude pattern
	     If specified, it excludes files matching the given filename pattern
	     from the search.  Note that --exclude and --include patterns are
	     processed in the order given.  If a name matches multiple patterns,
	     the latest matching rule wins.  If no --include pattern is
	     specified, all files are searched that are not excluded.  Patterns
	     are matched to the full path specified, not only to the filename
	     component.

     --exclude-dir pattern
	     If -R is specified, it excludes directories matching the given
	     filename pattern from the search.	Note that --exclude-dir and
	     --include-dir patterns are processed in the order given.  If a name
	     matches multiple patterns, the latest matching rule wins.	If no
	     --include-dir pattern is specified, all directories are searched
	     that are not excluded.

     -F, --fixed-strings
	     Interpret pattern as a set of fixed strings (i.e., force grep to
	     behave as fgrep).

     -f file, --file=file
	     Read one or more newline separated patterns from file.  Empty
	     pattern lines match every input line.  Newlines are not considered
	     part of a pattern.  If file is empty, nothing is matched.

     -G, --basic-regexp
	     Interpret pattern as a basic regular expression (i.e., force grep
	     to behave as traditional grep).

     -H      Always print filename headers with output lines.

     -h, --no-filename
	     Never print filename headers (i.e., filenames) with output lines.

     --help  Print a brief help message.

     -I      Ignore binary files.  This option is equivalent to the
	     “--binary-files=without-match” option.

     -i, --ignore-case
	     Perform case insensitive matching.  By default, grep is case
	     sensitive.

     --include pattern
	     If specified, only files matching the given filename pattern are
	     searched.	Note that --include and --exclude patterns are processed
	     in the order given.  If a name matches multiple patterns, the
	     latest matching rule wins.  Patterns are matched to the full path
	     specified, not only to the filename component.

     --include-dir pattern
	     If -R is specified, only directories matching the given filename
	     pattern are searched.  Note that --include-dir and --exclude-dir
	     patterns are processed in the order given.  If a name matches
	     multiple patterns, the latest matching rule wins.

     -J, --bz2decompress
	     Decompress the bzip2(1) compressed file before looking for the
	     text.

     -L, --files-without-match
	     Only the names of files not containing selected lines are written
	     to standard output.  Pathnames are listed once per file searched.
	     If the standard input is searched, the string “(standard input)” is
	     written unless a --label is specified.

     -l, --files-with-matches
	     Only the names of files containing selected lines are written to
	     standard output.  grep will only search a file until a match has
	     been found, making searches potentially less expensive.  Pathnames
	     are listed once per file searched.  If the standard input is
	     searched, the string “(standard input)” is written unless a --label
	     is specified.

     --label
	     Label to use in place of “(standard input)” for a file name where a
	     file name would normally be printed.  This option applies to -H,
	     -L, and -l.

     --mmap  Use mmap(2) instead of read(2) to read input, which can result in
	     better performance under some circumstances but can cause undefined
	     behaviour.

     -M, --lzma
	     Decompress the LZMA compressed file before looking for the text.

     -m num, --max-count=num
	     Stop reading the file after num matches.

     -n, --line-number
	     Each output line is preceded by its relative line number in the
	     file, starting at line 1.	The line number counter is reset for
	     each file processed.  This option is ignored if -c, -L, -l, or -q
	     is specified.

     --null  Prints a zero-byte after the file name.

     -O      If -R is specified, follow symbolic links only if they were
	     explicitly listed on the command line.  The default is not to
	     follow symbolic links.

     -o, --only-matching
	     Prints only the matching part of the lines.

     -p      If -R is specified, no symbolic links are followed.  This is the
	     default.

     -q, --quiet, --silent
	     Quiet mode: suppress normal output.  grep will only search a file
	     until a match has been found, making searches potentially less
	     expensive.

     -R, -r, --recursive
	     Recursively search subdirectories listed.	(i.e., force grep to
	     behave as rgrep).

     -S      If -R is specified, all symbolic links are followed.  The default
	     is not to follow symbolic links.

     -s, --no-messages
	     Silent mode.  Nonexistent and unreadable files are ignored (i.e.,
	     their error messages are suppressed).

     -U, --binary
	     Search binary files, but do not attempt to print them.

     -u      This option has no effect and is provided only for compatibility
	     with GNU grep.

     -V, --version
	     Display version information and exit.

     -v, --invert-match
	     Selected lines are those not matching any of the specified
	     patterns.

     -w, --word-regexp
	     The expression is searched for as a word (as if surrounded by
	     ‘[[:<:]]’ and ‘[[:>:]]’; see re_format(7)).  This option has no
	     effect if -x is also specified.

     -x, --line-regexp
	     Only input lines selected against an entire fixed string or regular
	     expression are considered to be matching lines.

     -y      Equivalent to -i.	Obsoleted.

     -z, --null-data
	     Treat input and output data as sequences of lines terminated by a
	     zero-byte instead of a newline.

     -X, --xz
	     Decompress the xz(1) compressed file before looking for the text.

     -Z, --decompress
	     Force grep to behave as zgrep.

     --binary-files=value
	     Controls searching and printing of binary files.  Options are:
	     binary (default)  Search binary files but do not print them.
	     without-match     Do not search binary files.
	     text	       Treat all files as text.

     --line-buffered
	     Force output to be line buffered.	By default, output is line
	     buffered when standard output is a terminal and block buffered
	     otherwise.

     If no file arguments are specified, the standard input is used.
     Additionally, “-” may be used in place of a file name, anywhere that a file
     name is accepted, to read from standard input.  This includes both -f and
     file arguments.
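As an illustration of the options above, here is a small sketch that combines a recursive search restricted by --include with a pattern list read from standard input through "-f -". The directory layout and file contents are hypothetical.

```shell
# Hypothetical source tree: one header and one C file, both containing FIXME.
workdir=$(mktemp -d)
mkdir -p "$workdir/src"
printf 'int x; /* FIXME */\n' > "$workdir/src/a.h"
printf 'int y; /* FIXME */\n' > "$workdir/src/a.c"
printf 'Free\nBSD\nLinux\n' > "$workdir/os.txt"

# -R recurses into the directory, --include limits the search to *.h files,
# and -l prints only the names of the files that matched.
headers=$(grep -l -R --include='*.h' 'FIXME' "$workdir/src")
echo "$headers"

# "-f -" reads the patterns, one per line, from standard input.
picked=$(printf 'Free\nLin.*\n' | grep -f - "$workdir/os.txt")
echo "$picked"

rm -rf "$workdir"
```

Quoting the glob in --include='*.h' keeps the shell from expanding it before grep sees it.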

ENVIRONMENT
     GREP_OPTIONS  May be used to specify default options that will be placed at
		   the beginning of the argument list.	Backslash-escaping is
		   not supported, unlike the behavior in GNU grep.

EXIT STATUS
     The grep utility exits with one of the following values:

     0	   One or more lines were selected.
     1	   No lines were selected.
     >1    An error occurred.
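The three exit status values are what make grep usable as a condition in scripts. A minimal sketch, assuming a throwaway file named myfile with made-up contents:

```shell
workdir=$(mktemp -d)
printf 'foo\nbar\n' > "$workdir/myfile"

# Status 0: at least one line was selected (-q suppresses the output).
status_match=0
grep -q 'foo' "$workdir/myfile" || status_match=$?

# Status 1: no line was selected.
status_miss=0
grep -q 'nomatch' "$workdir/myfile" || status_miss=$?

# Status greater than 1: an error, here a file that does not exist
# (-s hides the error message).
status_err=0
grep -q -s 'foo' "$workdir/does-not-exist" || status_err=$?

echo "$status_match $status_miss $status_err"

# The status can drive an if statement directly.
if grep -q 'bar' "$workdir/myfile"; then
    verdict="File matches"
else
    verdict="No match"
fi
echo "$verdict"

rm -rf "$workdir"
```

The "|| status=$?" form records the status without aborting a script that runs under "set -e".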

EXAMPLES
     -	 Find all occurrences of the pattern ‘patricia’ in a file:

	       $ grep 'patricia' myfile

     -	 Same as above but looking only for complete words:

	       $ grep -w 'patricia' myfile

     -	 Count occurrences of the exact pattern ‘FOO’ :

	       $ grep -c FOO myfile

     -	 Same as above but ignoring case:

	       $ grep -c -i FOO myfile

     -	 Find all occurrences of the pattern ‘.Pp’ at the beginning of a line:

	       $ grep '^\.Pp' myfile

	 The apostrophes ensure the entire expression is evaluated by grep
	 instead of by the user's shell.  The caret ‘^’ matches the null string
	 at the beginning of a line, and the ‘\’ escapes the ‘.’, which would
	 otherwise match any character.

     -	 Find all lines in a file which do not contain the words ‘foo’ or ‘bar’:

	       $ grep -v -e 'foo' -e 'bar' myfile

     -	 Peruse the file ‘calendar’ looking for either 19, 20, or 25 using
	 extended regular expressions:

	       $ egrep '19|20|25' calendar

     -	 Show matching lines and the name of the ‘*.h’ files which contain the
	 pattern ‘FIXME’.  Do the search recursively from the /usr/src/sys/arm
	 directory

	       $ grep -H -R FIXME --include=*.h /usr/src/sys/arm/

     -	 Same as above but show only the name of the matching file:

	       $ grep -l -R FIXME --include=*.h /usr/src/sys/arm/

     -	 Show lines containing the text ‘foo’.	The matching part of the output
	 is colored and every line is prefixed with the line number and the
	 offset in the file for those lines that matched.

	       $ grep -b --colour -n foo myfile

     -	 Show lines that match the extended regular expression patterns read
	 from the standard input:

	       $ echo -e 'Free\nBSD\nAll.*reserved' | grep -E -f - myfile

     -	 Show lines from the output of the pciconf(8) command matching the
	 specified extended regular expression along with three lines of leading
	 context and one line of trailing context:

	       $ pciconf -lv | grep -B3 -A1 -E 'class.*=.*storage'

     -	 Suppress any output and use the exit status to show an appropriate
	 message:

	       $ grep -q foo myfile && echo File matches

SEE ALSO
     bzip2(1), compress(1), ed(1), ex(1), gzip(1), sed(1), xz(1), zgrep(1),
     re_format(7)

STANDARDS
     The grep utility is compliant with the IEEE Std 1003.1-2008 (“POSIX.1”)
     specification.

     The flags [-AaBbCDdGHhILmoPRSUVw] are extensions to that specification, and
     the behaviour of the -f flag when used with an empty pattern file is left
     undefined.

     All long options are provided for compatibility with GNU versions of this
     utility.

     Historic versions of the grep utility also supported the flags [-ruy].
     This implementation supports those options; however, their use is strongly
     discouraged.

HISTORY
     The grep command first appeared in Version 6 AT&T UNIX.

BUGS
     The grep utility does not normalize Unicode input, so a pattern containing
     composed characters will not match decomposed input, and vice versa.

sed

sed (short for "stream editor") is a Unix utility that parses and transforms text using a simple, compact programming language.

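Format

`sed [-Ealnru] command [-I extension] [-i extension] [file ...]`

`sed [-Ealnru] [-e command] [-f command_file] [-I extension] [-i extension] [file ...]`

Options

[-a] Delay opening each file named by a “w” function until a command containing that function is applied to a line of input; by default these files are created or truncated before processing begins.
[-e command] Append an editing command to the command list; repeat -e to apply several commands to each line.

Examples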
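A few everyday sed commands, shown as a minimal sketch on made-up data. Only features common to BSD and GNU sed are used, and the file name test.txt is an assumption for illustration.

```shell
workdir=$(mktemp -d)
printf 'foo one\nfoo two foo\nbar three\n' > "$workdir/test.txt"

# s/re/replacement/: replace the first "foo" on each line.
first=$(sed 's/foo/baz/' "$workdir/test.txt")
echo "$first"

# The g flag replaces every occurrence on the line.
all=$(sed 's/foo/baz/g' "$workdir/test.txt")
echo "$all"

# -n suppresses the default output; p prints only the lines matching /bar/.
only=$(sed -n '/bar/p' "$workdir/test.txt")
echo "$only"

# Several -e commands are applied in order to every line:
# substitute first, then delete the lines that match /bar/.
multi=$(sed -e 's/foo/baz/g' -e '/bar/d' "$workdir/test.txt")
echo "$multi"

rm -rf "$workdir"
```

In-place editing is left out on purpose: BSD sed expects -i to be followed by a backup extension argument (which may be empty), while GNU sed takes the suffix attached directly to -i, so the same command line does not work on both.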

man

SED(1)			     General Commands Manual			  SED(1)

NAME
     sed – stream editor

SYNOPSIS
     sed [-Ealnru] command [-I extension] [-i extension] [file ...]
     sed [-Ealnru] [-e command] [-f command_file] [-I extension] [-i extension]
	 [file ...]

DESCRIPTION
     The sed utility reads the specified files, or the standard input if no
     files are specified, modifying the input as specified by a list of
     commands.	The input is then written to the standard output.

     A single command may be specified as the first argument to sed.  Multiple
     commands may be specified by using the -e or -f options.  All commands are
     applied to the input in the order they are specified regardless of their
     origin.

     The following options are available:

     -E      Interpret regular expressions as extended (modern) regular
	     expressions rather than basic regular expressions (BRE's).  The
	     re_format(7) manual page fully describes both formats.

     -a      The files listed as parameters for the “w” functions are created
	     (or truncated) before any processing begins, by default.  The -a
	     option causes sed to delay opening each file until a command
	     containing the related “w” function is applied to a line of input.

     -e command
	     Append the editing commands specified by the command argument to
	     the list of commands.

     -f command_file
	     Append the editing commands found in the file command_file to the
	     list of commands.	The editing commands should each be listed on a
	     separate line.  The commands are read from the standard input if
	     command_file is “-”.

     -I extension
	     Edit files in-place, saving backups with the specified extension.
	     If a zero-length extension is given, no backup will be saved.  It
	     is not recommended to give a zero-length extension when in-place
	     editing files, as you risk corruption or partial content in
	     situations where disk space is exhausted, etc.

	     Note that in-place editing with -I still takes place in a single
	     continuous line address space covering all files, although each
	     file preserves its individuality instead of forming one output
	     stream.  The line counter is never reset between files, address
	     ranges can span file boundaries, and the “$” address matches only
	     the last line of the last file.  (See Sed Addresses.) That can lead
	     to unexpected results in many cases of in-place editing, where
	     using -i is desired.

     -i extension
	     Edit files in-place similarly to -I, but treat each file
	     independently from other files.  In particular, line numbers in
	     each file start at 1, the “$” address matches the last line of the
	     current file, and address ranges are limited to the current file.
	     (See Sed Addresses.) The net result is as though each file were
	     edited by a separate sed instance.

     -l      Make output line buffered.

     -n      By default, each line of input is echoed to the standard output
	     after all of the commands have been applied to it.  The -n option
	     suppresses this behavior.

     -r      Same as -E for compatibility with GNU sed.

     -u      Make output unbuffered.

     The form of a sed command is as follows:

	   [address[,address]]function[arguments]

     Whitespace may be inserted before the first address and the function
     portions of the command.

     Normally, sed cyclically copies a line of input, not including its
     terminating newline character, into a pattern space, (unless there is
     something left after a “D” function), applies all of the commands with
     addresses that select that pattern space, copies the pattern space to the
     standard output, appending a newline, and deletes the pattern space.

     Some of the functions use a hold space to save all or part of the pattern
     space for subsequent retrieval.
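The interplay between pattern space and hold space is easier to see on a tiny input. The sketch below uses the G and h functions (described later on this page) to print a stream in reverse line order; the three input lines are made up.

```shell
# For every line except the first, G appends the hold space (the lines seen
# so far, already reversed) to the pattern space; h then saves the result
# back into the hold space; on the last line, p prints the accumulated text.
reversed=$(printf 'one\ntwo\nthree\n' | sed -n '1!G;h;$p')
echo "$reversed"

# With nothing ever stored, the hold space is empty, so G alone appends a
# newline and an empty string: every line is followed by a blank line.
spaced=$(printf 'a\nb\n' | sed G)
echo "$spaced"
```

This is only a demonstration of the mechanism; holding the whole input in the hold space is not a good fit for very large files.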

Sed Addresses
     An address is not required, but if specified must have one of the following
     formats:

	   •   a number that counts input lines cumulatively across input files
	       (or in each file independently if a -i option is in effect);

	   •   a dollar (“$”) character that addresses the last line of input
	       (or the last line of the current file if a -i option was
	       specified);

	   •   a context address that consists of a regular expression preceded
	       and followed by a delimiter.  The closing delimiter can also
	       optionally be followed by the “I” character, to indicate that the
	       regular expression is to be matched in a case-insensitive way.

     A command line with no addresses selects every pattern space.

     A command line with one address selects all of the pattern spaces that
     match the address.

     A command line with two addresses selects an inclusive range.  This range
     starts with the first pattern space that matches the first address.  The
     end of the range is the next following pattern space that matches the
     second address.  If the second address is a number less than or equal to
     the line number first selected, only that line is selected.  The number in
     the second address may be prefixed with a (“+”) to specify the number of
     lines to match after the first pattern.  In the case when the second
     address is a context address, sed does not re-match the second address
     against the pattern space that matched the first address.	Starting at the
     first line following the selected range, sed starts looking again for the
     first address.

     Editing commands can be applied to non-selected pattern spaces by use of
     the exclamation character (“!”) function.
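The address forms described above can be sketched as follows. The five numbered lines are sample data, and the non-standard “+” and “I” extensions are deliberately avoided.

```shell
lines=$(printf '1 alpha\n2 beta\n3 gamma\n4 delta\n5 epsilon\n')

# A number selects that input line.
second=$(printf '%s\n' "$lines" | sed -n '2p')
echo "$second"

# "$" selects the last line of input.
last=$(printf '%s\n' "$lines" | sed -n '$p')
echo "$last"

# Two addresses select an inclusive range: from line 2 up to the next
# line that matches /delta/.
range=$(printf '%s\n' "$lines" | sed -n '2,/delta/p')
echo "$range"

# "!" applies the function to the lines NOT selected: delete everything
# except lines 1 through 2.
negated=$(printf '%s\n' "$lines" | sed '1,2!d')
echo "$negated"
```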

Sed Regular Expressions
     The regular expressions used in sed, by default, are basic regular
     expressions (BREs, see re_format(7) for more information), but extended
     (modern) regular expressions can be used instead if the -E flag is given.
     In addition, sed has the following two additions to regular expressions:

     1.   In a context address, any character other than a backslash (“\”) or
	  newline character may be used to delimit the regular expression.  The
	  opening delimiter needs to be preceded by a backslash unless it is a
	  slash.  For example, the context address \xabcx is equivalent to
	  /abc/.  Also, putting a backslash character before the delimiting
	  character within the regular expression causes the character to be
	  treated literally.  For example, in the context address \xabc\xdefx,
	  the RE delimiter is an “x” and the second “x” stands for itself, so
	  that the regular expression is “abcxdef”.

     2.   The escape sequence \n matches a newline character embedded in the
	  pattern space.  You cannot, however, use a literal newline character
	  in an address or in the substitute command.

     One special feature of sed regular expressions is that they can default to
     the last regular expression used.	If a regular expression is empty, i.e.,
     just the delimiter characters are specified, the last regular expression
     encountered is used instead.  The last regular expression is defined as the
     last regular expression used as part of an address or substitute command,
     and at run-time, not compile-time.  For example, the command “/abc/s//XXX/”
     will substitute “XXX” for the pattern “abc”.
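A short sketch of the two regular expression features just described, the reuse of the last regular expression and alternative delimiters. The input strings are made up.

```shell
# The empty regular expression in s//XXX/ reuses /abc/ from the address.
reused=$(printf 'abc def\nxyz\n' | sed '/abc/s//XXX/')
echo "$reused"

# In a context address, a backslash introduces a custom delimiter:
# here "," delimits the expression, so the slashes need no escaping.
custom=$(printf '/usr/local/bin\n/opt/bin\n' | sed -n '\,/usr/local,p')
echo "$custom"

# The substitute function accepts another delimiter without a backslash.
path=$(echo '/home/example' | sed 's#/home/example#/usr/local/example#')
echo "$path"
```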

Sed Functions
     In the following list of commands, the maximum number of permissible
     addresses for each command is indicated by [0addr], [1addr], or [2addr],
     representing zero, one, or two addresses.

     The argument text consists of one or more lines.  To embed a newline in the
     text, precede it with a backslash.  Other backslashes in text are deleted
     and the following character taken literally.

     The “r” and “w” functions take an optional file parameter, which should be
     separated from the function letter by white space.  Each file given as an
     argument to sed is created (or its contents truncated) before any input
     processing begins.

     The “b”, “r”, “s”, “t”, “w”, “y”, “!”, and “:” functions all accept
     additional arguments.  The following synopses indicate which arguments have
     to be separated from the function letters by white space characters.

     Two of the functions take a function-list.  This is a list of sed functions
     separated by newlines, as follows:

	   { function
	     function
	     ...
	     function
	   }

     The “{” can be preceded by white space and can be followed by white space.
     The function can be preceded by white space.  The terminating “}” must be
     preceded by a newline, and may also be preceded by white space.

     [2addr] function-list
	     Execute function-list only when the pattern space is selected.

     [1addr]a\
     text    Write text to standard output immediately before each attempt to
	     read a line of input, whether by executing the “N” function or by
	     beginning a new cycle.

     [2addr]b[label]
	     Branch to the “:” function with the specified label.  If the label
	     is not specified, branch to the end of the script.

     [2addr]c\
     text    Delete the pattern space.	With 0 or 1 address or at the end of a
	     2-address range, text is written to the standard output.

     [2addr]d
	     Delete the pattern space and start the next cycle.

     [2addr]D
	     Delete the initial segment of the pattern space through the first
	     newline character and start the next cycle.

     [2addr]g
	     Replace the contents of the pattern space with the contents of the
	     hold space.

     [2addr]G
	     Append a newline character followed by the contents of the hold
	     space to the pattern space.

     [2addr]h
	     Replace the contents of the hold space with the contents of the
	     pattern space.

     [2addr]H
	     Append a newline character followed by the contents of the pattern
	     space to the hold space.

     [1addr]i\
     text    Write text to the standard output.

     [2addr]l
	     (The letter ell.)	Write the pattern space to the standard output
	     in a visually unambiguous form.  This form is as follows:

		   backslash	      \\
		   alert	      \a
		   form-feed	      \f
		   carriage-return    \r
		   tab		      \t
		   vertical tab       \v

	     Nonprintable characters are written as three-digit octal numbers
	     (with a preceding backslash) for each byte in the character (most
	     significant byte first).  Long lines are folded, with the point of
	     folding indicated by displaying a backslash followed by a newline.
	     The end of each line is marked with a “$”.

     [2addr]n
	     Write the pattern space to the standard output if the default
	     output has not been suppressed, and replace the pattern space with
	     the next line of input.

     [2addr]N
	     Append the next line of input to the pattern space, using an
	     embedded newline character to separate the appended material from
	     the original contents.  Note that the current line number changes.

     [2addr]p
	     Write the pattern space to standard output.

     [2addr]P
	     Write the pattern space, up to the first newline character to the
	     standard output.

     [1addr]q
	     Branch to the end of the script and quit without starting a new
	     cycle.

     [1addr]r file
	     Copy the contents of file to the standard output immediately before
	     the next attempt to read a line of input.	If file cannot be read
	     for any reason, it is silently ignored and no error condition is
	     set.

     [2addr]s/regular expression/replacement/flags
	     Substitute the replacement string for the first instance of the
	     regular expression in the pattern space.  Any character other than
	     backslash or newline can be used instead of a slash to delimit the
	     RE and the replacement.  Within the RE and the replacement, the RE
	     delimiter itself can be used as a literal character if it is
	     preceded by a backslash.

	     An ampersand (“&”) appearing in the replacement is replaced by the
	     string matching the RE.  The special meaning of “&” in this context
	     can be suppressed by preceding it by a backslash.	The string “\#”,
	     where “#” is a digit, is replaced by the text matched by the
	     corresponding backreference expression (see re_format(7)).

	     A line can be split by substituting a newline character into it.
	     To specify a newline character in the replacement string, precede
	     it with a backslash.

	     The value of flags in the substitute function is zero or more of
	     the following:

		   N	   Make the substitution only for the N'th occurrence of
			   the regular expression in the pattern space.

		   g	   Make the substitution for all non-overlapping matches
			   of the regular expression, not just the first one.

		   p	   Write the pattern space to standard output if a
			   replacement was made.  If the replacement string is
			   identical to that which it replaces, it is still
			   considered to have been a replacement.

		   w file  Append the pattern space to file if a replacement was
			   made.  If the replacement string is identical to that
			   which it replaces, it is still considered to have
			   been a replacement.

		   i or I  Match the regular expression in a case-insensitive
			   way.

     [2addr]t [label]
	     Branch to the “:” function bearing the label if any substitutions
	     have been made since the most recent reading of an input line or
	     execution of a “t” function.  If no label is specified, branch to
	     the end of the script.

     [2addr]w file
	     Append the pattern space to the file.

     [2addr]x
	     Swap the contents of the pattern and hold spaces.

     [2addr]y/string1/string2/
	     Replace all occurrences of characters in string1 in the pattern
	     space with the corresponding characters from string2.  Any
	     character other than a backslash or newline can be used instead of
	     a slash to delimit the strings.  Within string1 and string2, a
	     backslash followed by any character other than a newline is that
	     literal character, and a backslash followed by an “n” is replaced
	     by a newline character.

     [2addr]!function
     [2addr]!function-list
	     Apply the function or function-list only to the lines that are not
	     selected by the address(es).

     [0addr]:label
	     This function does nothing; it bears a label to which the “b” and
	     “t” commands may branch.

     [1addr]=
	     Write the line number to the standard output followed by a newline
	     character.

     [0addr]
	     Empty lines are ignored.

     [0addr]#
	     The “#” and the remainder of the line are ignored (treated as a
	     comment), with the single exception that if the first two
	     characters in the file are “#n”, the default output is suppressed.
	     This is the same as specifying the -n option on the command line.
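Some of the functions listed above, tried on one-line inputs. This is a sketch on made-up strings; the a, c and i functions are left out because their one-line syntax differs between sed implementations.

```shell
# s with a numeric flag: replace only the 2nd occurrence of "-".
second=$(echo 'a-b-c-d' | sed 's/-/+/2')
echo "$second"

# y: replace each character of string1 with its counterpart in string2.
upper=$(echo 'abc' | sed 'y/abc/ABC/')
echo "$upper"

# -n plus the p flag: print only the lines where a substitution was made.
changed=$(printf 'keep\nswap me\n' | sed -n 's/swap/swapped/p')
echo "$changed"

# =: write the line number before each line.
numbered=$(printf 'x\ny\n' | sed '=')
echo "$numbered"

# q: quit after line 2 (the line itself is still printed).
head2=$(printf '1\n2\n3\n' | sed '2q')
echo "$head2"
```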

ENVIRONMENT
     The COLUMNS, LANG, LC_ALL, LC_CTYPE and LC_COLLATE environment variables
     affect the execution of sed as described in environ(7).

EXIT STATUS
     The sed utility exits 0 on success, and >0 if an error occurs.

EXAMPLES
     Replace ‘bar’ with ‘baz’ when piped from another command:

	   echo "An alternate word, like bar, is sometimes used in examples." | sed 's/bar/baz/'

     Using backslashes can sometimes be hard to read and follow:

	   echo "/home/example" | sed  's/\/home\/example/\/usr\/local\/example/'

     Using a different separator can be handy when working with paths:

	   echo "/home/example" | sed 's#/home/example#/usr/local/example#'

     Replace all occurrences of ‘foo’ with ‘bar’ in the file test.txt, without
     creating a backup of the file:

	   sed -i '' -e 's/foo/bar/g' test.txt

SEE ALSO
     awk(1), ed(1), grep(1), regex(3), re_format(7)

STANDARDS
     The sed utility is expected to be a superset of the IEEE Std 1003.2
     (“POSIX.2”) specification.

     The -E, -I, -a and -i options, the special meaning of -f -, the prefixing
     “+” in the second member of an address range, as well as the “I” flag to
     the address regular expression and substitution command are non-standard
     FreeBSD extensions and may not be available on other operating systems.

HISTORY
     A sed command, written by L. E. McMahon, appeared in Version 7 AT&T UNIX.

AUTHORS
     Diomidis D. Spinellis <dds@FreeBSD.org>

BUGS
     Multibyte characters containing a byte with value 0x5C (ASCII ‘\’) may be
     incorrectly treated as line continuation characters in arguments to the
     “a”, “c” and “i” commands.  Multibyte characters cannot be used as
     delimiters with the “s” and “y” commands.
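
A few of the sed functions described in the man page above (`=`, `!`, and a `:label` with the `b` branch) can be tried out with the short sketch below. The sample file and its contents are made up for illustration, and the commands are passed with separate `-e` options because some sed implementations read a label name up to the end of the line.

```shell
# Hypothetical sample input; the contents are made up for illustration.
tmp=$(mktemp)
printf 'alpha\nbeta\ngamma\n' > "$tmp"

# "=" writes the line number; with -n and "p" only the matching line is shown.
numbered=$(sed -n '/beta/{=;p;}' "$tmp")

# "!" applies the function to the lines NOT selected by the address:
# delete every line except line 2.
only_second=$(sed '2!d' "$tmp")

# ":a" defines a label; "$!ba" branches back to it on every line but the last,
# so N keeps appending lines until the whole file sits in the pattern space.
joined=$(sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/ /g' "$tmp")

printf '%s\n' "$numbered" "$only_second" "$joined"
rm -f "$tmp"
```

The last command is a common idiom for joining all lines into one; it is shown here only to illustrate labels and branching, not as the recommended way to join lines.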

awk

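AWK is a language for processing text files. It treats a file as a sequence of records; by default, each line is one record. Each record is split into a series of fields, so the first word of a line is the first field, the second word is the second field, and so on. An AWK program is made up of pattern-action statements. AWK reads the input one line at a time; for each line, the interpreter checks it against every pattern in the program and runs the action attached to each pattern that matches.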
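
The record-and-field model can be seen in a minimal sketch like the one below. The three input records (names, ages, cities) are invented for this example.

```shell
# Hypothetical three-record input; names and numbers are made up.
sample='alice 30 london
bob 25 paris
carol 41 tokyo'

# Each line is a record; whitespace splits it into fields $1, $2, ...
# NR is the record number, NF the number of fields in the record.
fields=$(printf '%s\n' "$sample" | awk '{ print NR ": " $1 " has " NF " fields" }')

# A pattern selects records; the action runs only for the records that match.
over_28=$(printf '%s\n' "$sample" | awk '$2 > 28 { print $1 }')

printf '%s\n' "$fields" "$over_28"
```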

Format

`awk [ -F fs ] [ -v var=value ] [ 'prog' | -f progfile ] [ file ... ]`

Parameters
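
The three options in the synopsis (`-F`, `-v`, and `-f`) can be exercised with a sketch like the following. The colon-separated records are loosely modelled on /etc/passwd but are invented here, and the variable name `min` is chosen only for this example.

```shell
# Hypothetical colon-separated records, loosely modelled on /etc/passwd.
records='root:0:/root
mikey:1000:/home/mikey'

# -F sets the input field separator (FS) to ":".
homes=$(printf '%s\n' "$records" | awk -F: '{ print $1, $3 }')

# -v assigns a variable before the program runs.
regular=$(printf '%s\n' "$records" | awk -F: -v min=1000 '$2 >= min { print $1 }')

# -f reads the program from a file instead of the command line.
prog=$(mktemp)
printf '%s\n' '{ n++ } END { print n }' > "$prog"
count=$(printf '%s\n' "$records" | awk -f "$prog")
rm -f "$prog"

printf '%s\n' "$homes" "$regular" "$count"
```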

Examples
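
As a sketch of typical usage, the commands below count, sum, and rewrite fields in a small log-like input. The input lines (method, path, byte count) are invented for illustration and do not come from a real log.

```shell
# Hypothetical log-like input: method, path, bytes.
log='GET /index.html 512
POST /login 128
GET /about.html 256
GET /index.html 512'

# Associative array: count requests per path; sort makes the order stable,
# because "for (p in hits)" visits keys in an unspecified order.
per_path=$(printf '%s\n' "$log" | awk '{ hits[$2]++ } END { for (p in hits) print p, hits[p] }' | sort)

# Sum a column and report the average, using NR in the END block.
stats=$(printf '%s\n' "$log" | awk '{ s += $3 } END { print "sum is", s, "average is", s/NR }')

# gsub replaces every match in $0 and returns the number of replacements.
replaced=$(printf '%s\n' "$log" | awk '{ n += gsub(/\.html/, ".htm") } END { print n }')

printf '%s\n' "$per_path" "$stats" "$replaced"
```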

man

AWK(1)			     General Commands Manual			  AWK(1)



NAME
       awk - pattern-directed scanning and processing language

SYNOPSIS
       awk [ -F fs ] [ -v var=value ] [ 'prog' | -f progfile ] [ file ...  ]

DESCRIPTION
       Awk scans each input file for lines that match any of a set of patterns
       specified literally in prog or in one or more files specified as -f
       progfile.  With each pattern there can be an associated action that will
       be performed when a line of a file matches the pattern.	Each line is
       matched against the pattern portion of every pattern-action statement;
       the associated action is performed for each matched pattern.  The file
       name - means the standard input.  Any file of the form var=value is
       treated as an assignment, not a filename, and is executed at the time it
       would have been opened if it were a filename.  The option -v followed by
       var=value is an assignment to be done before prog is executed; any number
       of -v options may be present.  The -F fs option defines the input field
       separator to be the regular expression fs.

       An input line is normally made up of fields separated by white space, or
       by the regular expression FS.  The fields are denoted $1, $2, ..., while
       $0 refers to the entire line.  If FS is null, the input line is split
       into one field per character.

       A pattern-action statement has the form:

	      pattern { action }

       A missing { action } means print the line; a missing pattern always
       matches.  Pattern-action statements are separated by newlines or
       semicolons.

       An action is a sequence of statements.  A statement can be one of the
       following:

	      if( expression ) statement [ else statement ]
	      while( expression ) statement
	      for( expression ; expression ; expression ) statement
	      for( var in array ) statement
	      do statement while( expression )
	      break
	      continue
	      { [ statement ... ] }
	      expression	      # commonly var = expression
	      print [ expression-list ] [ > expression ]
	      printf format [ , expression-list ] [ > expression ]
	      return [ expression ]
	      next		      # skip remaining patterns on this input line
	      nextfile		      # skip rest of this file, open next, start at top
	      delete array[ expression ]# delete an array element
	      delete array	      # delete all elements of array
	      exit [ expression ]     # exit immediately; status is expression

       Statements are terminated by semicolons, newlines or right braces.  An
       empty expression-list stands for $0.  String constants are quoted " ",
       with the usual C escapes recognized within.  Expressions take on string
       or numeric values as appropriate, and are built using the operators + - *
       / % ^ (exponentiation), and concatenation (indicated by white space).
       The operators ! ++ -- += -= *= /= %= ^= > >= < <= == != ?: are also
       available in expressions.  Variables may be scalars, array elements
       (denoted x[i]) or fields.  Variables are initialized to the null string.
       Array subscripts may be any string, not necessarily numeric; this allows
       for a form of associative memory.  Multiple subscripts such as [i,j,k]
       are permitted; the constituents are concatenated, separated by the value
       of SUBSEP.

       The print statement prints its arguments on the standard output (or on a
       file if > file  or >> file  is present or on a pipe if | cmd  is
       present), separated by the current output field separator, and terminated
       by the output record separator.	file and cmd may be literal names or
       parenthesized expressions; identical string values in different
       statements denote the same open file.  The printf statement formats its
       expression list according to the format (see printf(3)).  The built-in
       function close(expr) closes the file or pipe expr.  The built-in function
       fflush(expr) flushes any buffered output for the file or pipe expr.

       The mathematical functions atan2, cos, exp, log, sin, and sqrt are built
       in.  Other built-in functions:


       length
	    the length of its argument taken as a string, number of elements in
	    an array for an array argument, or length of $0 if no argument.
       rand random number on [0,1).
       srand
	    sets seed for rand and returns the previous seed.
       int  truncates to an integer value.
       substr(s, m [, n])
	    the n-character substring of s that begins at position m counted
	    from 1.  If no n, use the rest of the string.
       index(s, t)
	    the position in s where the string t occurs, or 0 if it does not.
       match(s, r)
	    the position in s where the regular expression r occurs, or 0 if it
	    does not.  The variables RSTART and RLENGTH are set to the position
	    and length of the matched string.
       split(s, a [, fs])
	    splits the string s into array elements a[1], a[2], ..., a[n], and
	    returns n.	The separation is done with the regular expression fs or
	    with the field separator FS if fs is not given.  An empty string as
	    field separator splits the string into one array element per
	    character.
       sub(r, t [, s])
	    substitutes t for the first occurrence of the regular expression r
	    in the string s.  If s is not given, $0 is used.
       gsub(r, t [, s])
	    same as sub except that all occurrences of the regular expression
	    are replaced; sub and gsub return the number of replacements.
       sprintf(fmt, expr, ...)
	    the string resulting from formatting expr ...  according to the
	    printf(3) format fmt.
       system(cmd)
	    executes cmd and returns its exit status. This will be -1 upon
	    error, cmd's exit status upon a normal exit, 256 + sig upon death-
	    by-signal, where sig is the number of the murdering signal, or 512 +
	    sig if there was a core dump.
       tolower(str)
	    returns a copy of str with all upper-case characters translated to
	    their corresponding lower-case equivalents.
       toupper(str)
	    returns a copy of str with all lower-case characters translated to
	    their corresponding upper-case equivalents.

       The ``function'' getline sets $0 to the next input record from the
       current input file; getline < file  sets $0 to the next record from file.
       getline x sets variable x instead.  Finally, cmd | getline  pipes the
       output of cmd into getline; each call of getline returns the next line of
       output from cmd.  In all cases, getline returns 1 for a successful input,
       0 for end of file, and -1 for an error.

       Patterns are arbitrary Boolean combinations (with ! || &&) of regular
       expressions and relational expressions.	Regular expressions are as
       defined in re_format(7).  Isolated regular expressions in a pattern apply
       to the entire line.  Regular expressions may also occur in relational
       expressions, using the operators ~ and !~.  /re/ is a constant regular
       expression; any string (constant or variable) may be used as a regular
       expression, except in the position of an isolated regular expression in a
       pattern.

       A pattern may consist of two patterns separated by a comma; in this case,
       the action is performed for all lines from an occurrence of the first
       pattern through an occurrence of the second.

       A relational expression is one of the following:

	      expression matchop regular-expression
	      expression relop expression
	      expression in array-name
	      (expr,expr,...) in array-name

       where a relop is any of the six relational operators in C, and a matchop
       is either ~ (matches) or !~ (does not match).  A conditional is an
       arithmetic expression, a relational expression, or a Boolean combination
       of these.

       The special patterns BEGIN and END may be used to capture control before
       the first input line is read and after the last.  BEGIN and END do not
       combine with other patterns.  They may appear multiple times in a program
       and execute in the order they are read by awk.

       Variable names with special meanings:


       ARGC argument count, assignable.
       ARGV argument array, assignable; non-null members are taken as filenames.
       CONVFMT
	    conversion format used when converting numbers (default %.6g).
       ENVIRON
	    array of environment variables; subscripts are names.
       FILENAME
	    the name of the current input file.
       FNR  ordinal number of the current record in the current file.
       FS   regular expression used to separate fields; also settable by option
	    -Ffs.
       NF   number of fields in the current record.
       NR   ordinal number of the current record.
       OFMT output format for numbers (default %.6g).
       OFS  output field separator (default space).
       ORS  output record separator (default newline).
       RLENGTH
	    the length of a string matched by match.
       RS   input record separator (default newline).  If empty, blank lines
	    separate records.  If more than one character long, RS is treated as
	    a regular expression, and records are separated by text matching the
	    expression.
       RSTART
	    the start position of a string matched by match.
       SUBSEP
	    separates multiple subscripts (default 034).

       Functions may be defined (at the position of a pattern-action statement)
       thus:

	      function foo(a, b, c) { ...; return x }

       Parameters are passed by value if scalar and by reference if array name;
       functions may be called recursively.  Parameters are local to the
       function; all other variables are global.  Thus local variables may be
       created by providing excess parameters in the function definition.

ENVIRONMENT VARIABLES
       If POSIXLY_CORRECT is set in the environment, then awk follows the POSIX
       rules for sub and gsub with respect to consecutive backslashes and
       ampersands.

EXAMPLES
       length($0) > 72
              Print lines longer than 72 characters.

       { print $2, $1 }
              Print first two fields in opposite order.

       BEGIN { FS = ",[ \t]*|[ \t]+" }
             { print $2, $1 }
              Same, with input fields separated by comma and/or spaces and tabs.

            { s += $1 }
       END  { print "sum is", s, " average is", s/NR }
              Add up first column, print sum and average.

       /start/, /stop/
              Print all lines between start/stop pairs.

       BEGIN {    # Simulate echo(1)
            for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
            printf "\n"
            exit }

SEE ALSO
       grep(1), lex(1), sed(1)
       A. V. Aho, B. W. Kernighan, P. J. Weinberger, The AWK Programming
       Language, Addison-Wesley, 1988.	ISBN 0-201-07981-X.

BUGS
       There are no explicit conversions between numbers and strings.  To force
       an expression to be treated as a number add 0 to it; to force it to be
       treated as a string concatenate "" to it.

       The scope rules for variables in functions are a botch; the syntax is
       worse.

       Only eight-bit character sets are handled correctly.



				   2020-11-24				  AWK(1)
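
Some of the features covered in the man page above (range patterns, user-defined functions with extra parameters as locals, and the `split`, `substr`, and `toupper` built-ins) can be tried with the sketch below. The input text and the function name `repeat` are assumptions made up for this example.

```shell
# Hypothetical input with one start/stop pair.
text='header
start
one
two
stop
footer'

# Range pattern: select lines from a /start/ match through the next /stop/ match.
# With no action given, each selected line is printed.
between=$(printf '%s\n' "$text" | awk '/start/, /stop/')

# User-defined function: the extra parameters i and out are never passed by
# the caller, so they act as local variables.
repeated=$(awk 'function repeat(s, n,    i, out) {
                    for (i = 0; i < n; i++) out = out s
                    return out
                }
                BEGIN { print repeat("ab", 3) }')

# split returns the number of pieces; substr and toupper capitalize the last one.
capitalized=$(awk 'BEGIN {
                       n = split("grep,sed,awk", tools, ",")
                       print n, toupper(substr(tools[3], 1, 1)) substr(tools[3], 2)
                   }')

printf '%s\n' "$between" "$repeated" "$capitalized"
```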

References

Linux Series: The Three Text-Processing Musketeers grep/sed/awk

https://mikeygithub.github.io/2020/10/10/yuque/Linux篇-文本三剑客grep!sed!awk/

Author

Mikey

Published

October 10, 2020
