Getting Started with the Awk Command

Awk is a powerful data-processing language built into almost every *nix system. It looks like a general-purpose programming language on the surface, but it’s built to read input and run actions based on that input. If you need to process text based on certain conditions, an awk one-liner is almost always quicker to write than the equivalent program in a general-purpose language like C. It’s also interpreted, so you skip the compile step that compiled languages require.

Fun fact: the program’s odd name comes from the initials of its creators’ surnames: Alfred Aho, Peter Weinberger, and Brian Kernighan.

Awk’s basic syntax

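When invoked on the command line, awk follows the basic pattern below. The program goes inside single quotes so the shell passes it to awk untouched, and it may contain any number of pattern { action } pairs:

awk 'pattern { action }' file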

Awk executes the action for every line of the specified file that matches the pattern. If you don’t specify a file, awk reads from standard input. A pattern can be a regular expression or any other expression, such as a comparison. Let’s consider the basic example below:

awk '/com/ { print $0 }' emails

This one-line program prints each line of the file “emails” that contains the characters com. In awk, $0 refers to the entire current line, which is also what print outputs when given no arguments. The action could have been written as { print }, or omitted entirely, and the program would have behaved identically.
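To try this without an existing file, a small sketch can pipe sample lines into awk; with no file name given, awk reads standard input. The addresses below are invented for illustration.

```shell
# Sample addresses (made up for this example), one per line.
sample='alice@example.com
bob@example.org
carol@mail.example.com'

# No file argument: awk reads the piped standard input.
# The pattern has no action, so each matching line is printed whole.
matches=$(printf '%s\n' "$sample" | awk '/com/')
printf '%s\n' "$matches"
```

Only the two lines containing com should come back; the .org address is skipped.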

Printing fields

Because awk can identify and parse field separators, it’s useful for printing out specific columns or rows of data. We will use the “/etc/passwd” file for this example.

awk -F":" '{ print $1 }' /etc/passwd

This one-line program does a few things. The -F flag tells awk to treat its argument (: in this example) as the field separator. Awk then prints the first field of each line, specified by $1.
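Since /etc/passwd differs from machine to machine, here is a minimal sketch on two invented lines in the same format. It also uses the built-in variable NF (the number of fields on the current line), so $NF refers to the last field.

```shell
# Two lines in /etc/passwd format (invented accounts, not real ones).
sample='root:x:0:0:System Administrator:/root:/bin/sh
demo:x:1000:1000:Demo User:/home/demo:/bin/bash'

# -F: splits each line on colons.
# $1 is the login name; $NF is the last field (the login shell here).
# A comma in print separates the values with a single space by default.
logins=$(printf '%s\n' "$sample" | awk -F: '{ print $1, $NF }')
printf '%s\n' "$logins"
```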

We can also print more than one field at a time by specifying the fields sequentially:

awk -F":" '{ print $4 " " $5 }' /etc/passwd

This prints the fourth and fifth fields of the passwd file with a space between them. Note that the space is between double quotes. This specifies it as a literal character within the print command, so it’s printed as written. We can also add more complicated literals to clean up our output:

awk -F":" '{ print "process: " $5 "\t\t " "directory: "$6}' /etc/passwd

This prints the output with labels for identification. We can also send all of this to a new file using the shell’s output redirection operator (>).

awk -F":" '{ print "process: " $5 "\t\t " "directory: "$6}' /etc/passwd > processes.txt
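When tabs don’t line the columns up neatly, awk’s printf statement is one alternative to print. The sketch below assumes two invented passwd-style lines, and the column width of 8 characters is an arbitrary choice for illustration.

```shell
# Two lines in /etc/passwd format (invented accounts, not real ones).
sample='root:x:0:0:System Administrator:/root:/bin/sh
demo:x:1000:1000:Demo User:/home/demo:/bin/bash'

# %-8s left-aligns the login name in an 8-character column.
# Unlike print, printf adds no newline, so \n is written explicitly.
report=$(printf '%s\n' "$sample" | awk -F: '{ printf "%-8s %s\n", $1, $6 }')
printf '%s\n' "$report"
```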

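We can combine what we know so far to process data extensively. For example, we can use a regular expression to print every line of a document that consists of a US-style phone number. Awk uses POSIX extended regular expressions, which do not support Perl shorthands such as \d and \s, so digits are written as [0-9] and the separator class lists a literal space.

awk '/^(\+[0-9]{1,2} )?\(?[0-9]{3}\)?[ .-][0-9]{3}[ .-][0-9]{4}$/ { print }' contacts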
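Repetition counts such as {3} are part of POSIX, but some older awk builds do not support them. As a hedged fallback, the simplified sketch below spells out every digit and only accepts the bare NNN-NNN-NNNN shape (no country code or parentheses). The contact lines are invented.

```shell
# Invented contact lines; only the first two are bare ten-digit numbers
# with a space, dot, or hyphen between the groups.
sample='555-867-5309
212.555.0123
call 555-0100 today
5558675309'

# Every digit is written as [0-9], so no {n} repetition counts are needed.
phones=$(printf '%s\n' "$sample" |
  awk '/^[0-9][0-9][0-9][ .-][0-9][0-9][0-9][ .-][0-9][0-9][0-9][0-9]$/')
printf '%s\n' "$phones"
```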

Expanding the Awk command’s matching power

Awk can also compare values using a variety of operators. These include the standard comparison operators ==, <, >, <=, >=, and !=, as well as the awk-specific operators ~ and !~, which mean “matches” and “does not match” respectively. The last two test a value against a regular expression, and all of them can be combined with the Boolean operators &&, ||, and !.

Awk Command Examples

awk 'length($0) > 80' data

Prints all lines longer than eighty characters in the file “data.” Note the lack of a print statement: in the absence of a specified action, awk will print the full line whenever a pattern matches.

$1 == "user" { print }

Prints all lines where the first field equals the string “user.” Without an -F flag, awk will use white space as the default field separator. Also, note that awk and the file are not specified. This is for use in scripts in separate files, as covered below.
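To see the default whitespace splitting in action, here is a small sketch on invented lines whose columns are separated by uneven runs of spaces.

```shell
# Invented lines with irregular spacing between the columns.
sample='user    alice   42
admin   bob     7
user  carol 19'

# With no -F flag, any run of spaces or tabs counts as one separator,
# so $2 is the name on every line despite the uneven spacing.
users=$(printf '%s\n' "$sample" | awk '$1 == "user" { print $2 }')
printf '%s\n' "$users"
```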

$5 ~ /root/ { print $3 }

Prints the third field whenever the fifth field matches the regular expression /root/.
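Comparisons and regular expression tests can be combined in one pattern. The sketch below runs on invented passwd-style lines and assumes, purely for illustration, that regular accounts have user IDs of 1000 or more and that disabled accounts use a shell ending in nologin; neither convention holds on every system.

```shell
# Invented accounts in /etc/passwd format.
sample='root:x:0:0:System Administrator:/root:/bin/sh
daemon:x:1:1:System Daemon:/usr/sbin:/usr/sbin/nologin
demo:x:1000:1000:Demo User:/home/demo:/bin/bash
guest:x:1001:1001:Guest:/home/guest:/usr/sbin/nologin'

# Numeric comparison on field 3 AND a "does not match" test on field 7.
shell_users=$(printf '%s\n' "$sample" |
  awk -F: '$3 >= 1000 && $7 !~ /nologin$/ { print $1 }')
printf '%s\n' "$shell_users"
```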

{
  if ( $5 !~ /root/ ) {
    print $3
  }
}

When field five does not match /root/, print field three. This uses awk’s C-like if statement, a form that will feel familiar to programmers coming from general-purpose languages.
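The if statement also accepts else branches, as in C. Here is a minimal sketch on invented passwd-style lines; the labels and the 1000 cutoff for regular users are assumptions made for the example.

```shell
# Invented accounts in /etc/passwd format.
sample='root:x:0:0:System Administrator:/root:/bin/sh
daemon:x:1:1:System Daemon:/usr/sbin:/usr/sbin/nologin
demo:x:1000:1000:Demo User:/home/demo:/bin/bash'

# Classify each account by its numeric user ID (field 3).
labels=$(printf '%s\n' "$sample" | awk -F: '{
  if ($3 == 0) {
    print $1 ": superuser"
  } else if ($3 < 1000) {
    print $1 ": system account"
  } else {
    print $1 ": regular user"
  }
}')
printf '%s\n' "$labels"
```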

Saving scripts in files

Longer awk programs can be saved in a file and run from there:

awk -f ~/scripts/program.awk data

When using the -f flag, awk runs the script in the specified file path, namely program.awk. The commands in that program will process the file “data.”
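As a self-contained sketch of the same idea, the commands below write a one-rule program to a temporary file and run it with -f. The file location comes from mktemp rather than ~/scripts, and the input lines are invented.

```shell
# Create a temporary file to hold the awk program.
script=$(mktemp)

# The program: print the login name of every account whose shell ends in bash.
# Single quotes keep the shell from expanding $7 and $1.
printf '%s\n' '$7 ~ /bash$/ { print $1 }' > "$script"

# Invented accounts in /etc/passwd format.
sample='root:x:0:0:System Administrator:/root:/bin/sh
demo:x:1000:1000:Demo User:/home/demo:/bin/bash'

# -f names the program file; -F: still sets the field separator.
bash_users=$(printf '%s\n' "$sample" | awk -F: -f "$script")
printf '%s\n' "$bash_users"

rm -f "$script"
```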

Actions can also be run before the first line of input is read and after the last one has been processed, using the special patterns BEGIN and END:

BEGIN { FS=":" } # indicates that : is the field separator for the program.
 
#operations
 
END   { print "You're done" } # prints a joyful message for the user

As you can see above, the # symbol starts a comment, which lasts until the end of the line.
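To make the skeleton concrete, here is a small sketch that fills in the operations with a counter. The input lines are invented, and the variable name locked is just a choice for this example; NR is awk’s built-in count of lines read so far.

```shell
# Invented accounts in /etc/passwd format.
sample='root:x:0:0:System Administrator:/root:/bin/sh
daemon:x:1:1:System Daemon:/usr/sbin:/usr/sbin/nologin
demo:x:1000:1000:Demo User:/home/demo:/bin/bash'

summary=$(printf '%s\n' "$sample" | awk '
BEGIN { FS = ":" }            # runs once, before any input is read
$7 ~ /nologin$/ { locked++ }  # runs for every line whose shell ends in nologin
END {                         # runs once, after the last line
  printf "%d accounts, %d without a login shell\n", NR, locked
}')
printf '%s\n' "$summary"
```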

Conclusion

This guide only touches on the most basic elements of awk. There’s far more to build and explore beyond this. For more, see the GNU documentation for awk or The AWK Programming Language, the awk textbook written by the language’s creators.

Alexander Fox

Alexander Fox is a tech and science writer based in Philadelphia, PA with one cat, three Macs and more USB cables than he could ever use.