Awk

About Awk

Awk is a powerful text-processing utility available on GNU/Linux and other Unix-like systems. It can perform complex text-processing tasks, such as extracting specific columns of information or making substitutions. Note that there are several variants of Awk. On many GNU/Linux distros, Awk is actually Gawk (GNU Awk), so the package is named gawk, but the command you run is still “awk”. Some distros, such as Debian, ship a different variant (mawk) by default.

Basic Awk Commands

awk options 'selection_criteria {action}' input-file > output-file
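
As a minimal sketch of that general form, the command below combines a selection criterion with an action. The two-column input is made up for illustration. If the selection criterion is omitted, the action runs on every line; if the action is omitted, matching lines are printed unchanged.

```shell
# Hypothetical input (name, age), made up for illustration.
# Selection criterion: $2 > 26. Action: print the first field.
out=$(printf 'alice 30\nbob 25\ncarol 41\n' | awk '$2 > 26 {print $1}')
echo "$out"
```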

Print out file (similar to cat)

  • awk '{print}' test.sh
  • awk '{print $0}' test.sh
  • cat /etc/shells | awk …
  • awk … /etc/shells

Field separators

  • awk 'BEGIN{FS=":"; OFS="-"} {print $1,$6,$7}' /etc/passwd
  • cat /etc/shells | awk '/^\// {print $0}'
  • cat /etc/shells | awk '/^\// {print $NF}'
  • awk -F "/" '/^\//{print $NF}' /etc/shells | uniq
  • awk -F ":" '{print $2}' /etc/passwd
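
The two ways of setting the field separator can be tried on a fixed line, so the result does not depend on the local system. This is only a sketch; the passwd-style line below is made up.

```shell
# A made-up line in /etc/passwd format.
line='dt:x:1000:1000:dt:/home/dt:/bin/fish'

# FS splits the input on ":", OFS joins the printed fields with "-".
out=$(echo "$line" | awk 'BEGIN{FS=":"; OFS="-"} {print $1,$6,$7}')
echo "$out"

# -F sets FS from the command line; $NF is the last "/"-separated field.
shell=$(echo "/bin/fish" | awk -F "/" '{print $NF}')
echo "$shell"
```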

&& and ||

ps -ef | awk '/root/ && $2<100'
ps -ef | awk '/root/ || /dt/ && (length($NF) < 30)'
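
In Awk, && binds more tightly than ||, so the second command above keeps every line containing "root" and applies the length test only to lines containing "dt". The sketch below shows the difference on made-up input; the line ids, user names, and paths are assumptions for illustration only.

```shell
# Made-up lines standing in for ps -ef output: an id, a user name, and a command.
input='1 root /bin/sh
2 root /usr/lib/some/very/long/path/to/a/daemon-binary
3 dt /bin/sh
4 dt /usr/lib/some/very/long/path/to/a/daemon-binary'

# Without parentheses this reads as: /root/ || (/dt/ && short last field).
out=$(printf '%s\n' "$input" | awk '/root/ || /dt/ && (length($NF) < 30) {print $1}')
echo "$out"

# Parentheses apply the length test to both names.
grouped=$(printf '%s\n' "$input" | awk '(/root/ || /dt/) && (length($NF) < 30) {print $1}')
echo "$grouped"
```

The first command prints the ids 1, 2, and 3; the second prints only 1 and 3.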

Printing columns

  • ps -ef | head | awk '{print $1" "$8}'
  • Use "\t" (tab) or "\n" (newline) as the separator instead of just " ".
  • Use $NF (the last field) instead of $8.
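
Those two variations might look like the following sketch, run against a single made-up line shaped like a row of ps -ef output.

```shell
# A made-up line with eight whitespace-separated fields.
line='dt 1234 1 0 10:16 tty1 00:00:00 /bin/fish'

# Tab between the two columns, and $NF instead of $8.
tabbed=$(echo "$line" | awk '{print $1 "\t" $NF}')
echo "$tabbed"

# Newline between the two columns: each field lands on its own line.
stacked=$(echo "$line" | awk '{print $1 "\n" $8}')
echo "$stacked"
```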

Filter results for search pattern

/\[/
  • df | awk '/\/dev\/loop/ {print $1"\t"$2"\t"$3}'
  • df | awk '/\/dev\/loop/ {print $1"\t"$2 - $3}'

Filter results by line length (number of characters)

awk 'length($0) > 7' /etc/shells

Find a specific string in the last column

ps -ef | awk '{ if ($NF == "/bin/fish") print $0 }'

Increment/Decrement

Post-increment

awk 'BEGIN { for(i=1;i<=10;i++) print "square of", i, "is",i*i; }'
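
The loop above uses post-increment (i++). As a small sketch of how post-increment, pre-increment, and post-decrement differ, the variable names a, b, and c below are chosen only for illustration.

```shell
# i++ yields the old value and then adds 1; ++i adds 1 first; i-- yields the old value and then subtracts 1.
out=$(awk 'BEGIN { i = 5; a = i++; b = ++i; c = i--; print a, b, c, i }')
echo "$out"
```

Here a is 5 (the value before incrementing), b is 7, c is 7 (the value before decrementing), and i ends at 6.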

Regex

The ~ is the regular expression match operator. It checks if a string matches the provided regular expression.

awk '$1 ~ /^[bc]/ {print $0}' .bashrc
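
A sketch of the match operator on made-up input, together with its negation !~. The bracket expression [bc] matches either "b" or "c"; no comma is needed, because a comma inside the brackets would itself be matched literally.

```shell
# Made-up lines; only the first field is tested.
input='bind x
cd y
alias z'

# First field starts with b or c.
matched=$(printf '%s\n' "$input" | awk '$1 ~ /^[bc]/ {print $0}')
echo "$matched"

# !~ selects the lines where the first field does NOT match.
others=$(printf '%s\n' "$input" | awk '$1 !~ /^[bc]/ {print $0}')
echo "$others"
```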

Print substr()

We use the substr() function, which returns a substring of the given string. We apply the function to each line, skipping the first three characters. In other words, we print each record from the fourth character to its end.

awk '{print substr($0, 4)}' numbered.txt
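
Since the contents of numbered.txt are not shown here, the sketch below assumes lines of the form "1. apple". It also shows the optional third argument of substr(), which limits the length of the result.

```shell
# Assumed input format: a number, a dot, a space, then the text.
rest=$(printf '1. apple\n2. banana\n' | awk '{print substr($0, 4)}')
echo "$rest"

# With a third argument, substr() returns at most that many characters.
three=$(printf '1. apple\n2. banana\n' | awk '{print substr($0, 4, 3)}')
echo "$three"
```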

Match and RSTART

The match() function returns the position at which the regular expression first matches (0 if there is no match) and stores that position in the RSTART variable.

awk 'match($0, /o/) {print $0 " has \"o\" character at " RSTART}' numbered.txt
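
A short sketch on made-up input: because match() returns 0 when nothing matches, using it as the pattern skips lines without an "o".

```shell
# "six" contains no "o", so match() returns 0 and the line is skipped.
out=$(printf 'one\ntwo\nsix\n' | awk 'match($0, /o/) {print $0, RSTART}')
echo "$out"
```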

Builtin Variables in Awk

NR

The NR variable keeps a running count of the input records read so far. Remember that records are usually lines. Awk runs the pattern/action statements once for each record in a file.

Print range of lines

df | awk 'NR==7, NR==11 {print NR, $0}'
NOTE: The NR in the print statement prints the line numbers; delete it if line numbers are not needed!

Print the line count

awk 'END {print NR}' /etc/shells
awk 'END {print NR}' /etc/shells /etc/passwd
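
Both uses of NR can be checked on a fixed five-line input, as in this sketch. When several files are given, as in the second command above, NR keeps counting across them, so the END block prints the combined total.

```shell
# Print records 2 through 4, each prefixed with its record number.
range=$(printf 'a\nb\nc\nd\ne\n' | awk 'NR==2, NR==4 {print NR, $0}')
echo "$range"

# In END, NR holds the total number of records read.
count=$(printf 'a\nb\nc\nd\ne\n' | awk 'END {print NR}')
echo "$count"
```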

NF

The NF variable holds the number of fields in the current input record. We’ve already used $NF to refer to the last field.

Using NF to print all non-empty lines

awk 'NF > 0' /etc/shells
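
A sketch showing NF and $NF side by side on made-up input. The empty line has NF equal to 0, so the pattern NF > 0 skips it.

```shell
# Print the field count and the last field of every non-empty line.
out=$(printf 'a b c\n\nd e\n' | awk 'NF > 0 {print NF, $NF}')
echo "$out"
```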

FS

The FS variable contains the field separator, which is used to divide the input line into fields. The default is “white space”, meaning runs of space and tab characters. FS can be reassigned to another character (typically in BEGIN) to change the field separator.
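
The difference between the default separator and an explicit one can be seen by counting fields, as in this sketch on made-up input. With the default, runs of spaces and tabs count as one separator and leading whitespace is ignored; with FS set to a comma, every comma separates fields, so an empty field is counted too.

```shell
# Default FS: leading spaces are ignored, and runs of spaces or tabs separate fields.
default_count=$(printf '  x   y\tz\n' | awk '{print NF}')
echo "$default_count"

# FS=",": two adjacent commas enclose an empty field, which still counts.
comma_count=$(echo 'x,y,,z' | awk 'BEGIN{FS=","} {print NF}')
echo "$comma_count"
```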

RS

The RS variable stores the current record separator. Since, by default, an input line is the input record, the default record separator is a newline.
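
As a sketch of changing the record separator, the made-up input below has no newlines at all; setting RS to ";" makes Awk treat each semicolon-delimited chunk as its own record.

```shell
# With RS=";", the single line 'a;b;c' is read as three records.
out=$(printf 'a;b;c' | awk 'BEGIN{RS=";"} {print NR, $0}')
echo "$out"
```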

OFS

The OFS variable stores the output field separator, which separates the fields when Awk prints them. The default is a single space. Whenever print has several arguments separated by commas, it prints the value of OFS between them.
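
A sketch of the role the commas play, on made-up input: OFS appears only between comma-separated arguments, while arguments written next to each other are simply concatenated.

```shell
# Commas between arguments: OFS ("-") is inserted between the fields.
with_commas=$(echo 'a b c' | awk 'BEGIN{OFS="-"} {print $1, $2, $3}')
echo "$with_commas"

# No commas: the fields are concatenated and OFS is not used.
joined=$(echo 'a b c' | awk 'BEGIN{OFS="-"} {print $1 $2 $3}')
echo "$joined"
```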

ORS

The ORS variable stores the output record separator, which separates the output lines when Awk prints them. The default is a newline character. print automatically outputs the contents of ORS at the end of whatever it is given to print.
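
As a sketch, setting ORS to ", " joins three made-up input lines onto one output line. Because print appends ORS after every record, the result also ends with a trailing ", ".

```shell
# Each print ends with ORS instead of a newline.
out=$(printf 'a\nb\nc\n' | awk 'BEGIN{ORS=", "} {print}')
echo "$out"
```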

Copyright © 2020-2021 Derek Taylor (DistroTube)

This page is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License (CC-BY-ND 4.0).

The source code for distro.tube can be found on GitLab. User-submitted contributions to the site are welcome, as long as the contributor agrees to license their submission with the CC-BY-ND 4.0 license.

Author: dt

Created: 2022-02-20 Sun 10:16