Awk -- A Pattern Scanning and Processing Language USD:19-1
Awk -- A Pattern Scanning and Processing Language
(Second Edition)
Alfred V. Aho
Brian W. Kernighan
Peter J. Weinberger
AT&T Bell Laboratories
Murray Hill, New Jersey 07974
ABSTRACT
Awk is a programming language whose basic operation
is to search a set of files for patterns, and to
perform specified actions upon lines or fields of lines
which contain instances of those patterns. Awk makes
certain data selection and transformation operations
easy to express; for example, the awk program
length > 72
prints all input lines whose length exceeds 72 characters;
the program
NF % 2 == 0
prints all lines with an even number of fields; and the
program
{ $1 = log($1); print }
replaces the first field of each line by its logarithm.
Awk patterns may include arbitrary boolean combinations
of regular expressions and of relational operators
on strings, numbers, fields, variables, and array
elements. Actions may include the same pattern-matching
constructions as in patterns, as well as
arithmetic and string expressions and assignments,
if-else, while, for statements, and multiple output
streams.
This report contains a user's guide, a discussion
of the design and implementation of awk, and some
timing statistics.
1. Introduction
Awk is a programming language designed to make many common
information retrieval and text manipulation tasks easy to state
and to perform.
The basic operation of awk is to scan a set of input lines
in order, searching for lines which match any of a set of
patterns which the user has specified. For each pattern, an action
can be specified; this action will be performed on each line that
matches the pattern.
Readers familiar with the UNIX program grep[1] will
recognize the approach, although in awk the patterns may be more
general than in grep, and the actions allowed are more involved than
merely printing the matching line. For example, the awk program
{print $3, $2}
prints the third and second columns of a table in that order.
The program
$2 ~ /A|B|C/
prints all input lines with an A, B, or C in the second field.
The program
$1 != prev { print; prev = $1 }
prints all lines in which the first field is different from the
previous first field.
--------------------------------------------------
UNIX is a trademark of AT&T Bell Laboratories.
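The three example programs of this introduction can be tried on a
small table. The sketch below is illustrative only: the sample data
and the shell variable names are assumptions made for the example,
not part of the original report.

```shell
# Hypothetical sample table: a name, a letter, and a number per line.
data='alice A 10
alice B 20
bob D 30
bob C 40'

# Print the third and second columns, in that order.
swapped=$(printf '%s\n' "$data" | awk '{print $3, $2}')

# Print the lines with an A, B, or C in the second field.
abc=$(printf '%s\n' "$data" | awk '$2 ~ /A|B|C/')

# Print the lines whose first field differs from the previous first field.
changed=$(printf '%s\n' "$data" | awk '$1 != prev { print; prev = $1 }')

printf '%s\n---\n%s\n---\n%s\n' "$swapped" "$abc" "$changed"
```

In the last program prev starts out empty, so the first input line
always differs from it and is printed.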
1.1. Usage
The command
awk program [files]
executes the awk commands in the string program on the set of
named files, or on the standard input if there are no files. The
statements can also be placed in a file pfile, and executed by
the command
awk -f pfile [files]
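As a hedged illustration of the two forms of the command, the sketch
below runs the same one-line program first from a string and then
from a file. The scratch directory, the file name input.txt, and the
variable names are assumptions made for the example; pfile is the
name used in the text above.

```shell
# Work in a scratch directory; the file names are hypothetical.
dir=$(mktemp -d)
printf 'one two\nthree four\n' > "$dir/input.txt"

# Form 1: the program is given as a string on the command line.
# Single quotes keep the shell from interpreting $2 and the braces.
inline=$(awk '{ print $2 }' "$dir/input.txt")

# Form 2: the same program is stored in a file and named with -f.
printf '{ print $2 }\n' > "$dir/pfile"
fromfile=$(awk -f "$dir/pfile" "$dir/input.txt")

# With no file arguments, awk reads the standard input.
piped=$(printf 'five six\n' | awk '{ print $2 }')

printf '%s\n%s\n%s\n' "$inline" "$fromfile" "$piped"
rm -r "$dir"
```

Quoting the program is a matter of shell usage rather than of awk
itself; without the quotes the shell would split the program into
several arguments and expand $2 on its own.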
1.2. Program Structure
An awk program is a sequence of statements of the form:
pattern { action }
pattern { action }
...
Each line of input is matched against each of the patterns in
turn. For each pattern that matches, the associated action is
executed. When all the patterns have been tested, the next line
is fetched and the matching starts over.
Either the pattern or the action may be left out, but not
both. If there is no action for a pattern, the matching line is
simply copied to the output. (Thus a line which matches several
patterns can be printed several times.) If there is no pattern
for an action, then the action is performed for every input line.
A line which matches no pattern is ignored.
Since patterns and actions are both optional, actions must
be enclosed in braces to distinguish them from patterns.
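The rules above can be checked with a short experiment. This is a
minimal sketch; the three-line input and the variable names are
assumptions made for the example.

```shell
# Hypothetical input: a word and a number on each line.
data='apple 1
banana 2
cherry 3'

# Pattern with no action: each matching line is copied to the output.
pattern_only=$(printf '%s\n' "$data" | awk '/an/')

# Action with no pattern: the action is performed for every input line.
action_only=$(printf '%s\n' "$data" | awk '{ print $1 }')

# Two patterns, neither with an action. "banana 2" matches both and is
# printed twice, "cherry 3" matches only the second, and "apple 1"
# matches neither and is ignored.
several=$(printf '%s\n' "$data" | awk '/ban/
$2 >= 2')

printf '%s\n---\n%s\n---\n%s\n' "$pattern_only" "$action_only" "$several"
```

Both copies of "banana 2" appear before "cherry 3", because every
pattern is tested against one line before the next line is fetched.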
1.3. Records and Fields
Awk input is divided into "records" terminated by a record
separator. The default record separator is a newline, so by
default awk processes its input a line at a time. The number of
the current record is available in a variable named NR.
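A small sketch of NR in use, assuming the default record separator
so that each input line is one record. The sample input and the
variable names are assumptions made for the example.

```shell
# Three one-word records, one per line (hypothetical input).
input='alpha
beta
gamma'

# Print each record's number next to its first field.
numbered=$(printf '%s\n' "$input" | awk '{ print NR, $1 }')

# NR may also appear in a pattern, here selecting the second record.
second=$(printf '%s\n' "$input" | awk 'NR == 2')

printf '%s\n---\n%s\n' "$numbered" "$second"
```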
Each input record is considered to be divided into
"fields." Fields are normally separated by white space --
blanks or tabs -- but the input field separator may be changed,
as described below. Fields are referred to as $1, $2, and so
forth, where $1 is the first field, and $0 is the whole input
...