grep, cut, sed, and awk
grep
grep stands for "global regular expression print". It searches its input and prints every line that contains the given pattern. Consider the following hosts file.
# Copyright (c) 1993-2009 Microsoft Corp.
#
# This is a sample HOSTS file.
#
# For example:
#
# 102.54.94.97 rhino.acme.com # source server
# 38.25.63.10 x.acme.com # x client host
# localhost name resolution is handled within DNS itself.
# 127.0.0.1 localhost
# ::1 localhost
192.168.168.200 doug.LoweWriter.com # Doug’s computer
192.168.168.201 server1.LoweWriter.com s1 # Main server
192.168.168.202 debbie.LoweWriter.com # Debbie’s computer
192.168.168.203 printer1.LoweWriter.com p1 # HP Laser Printer
192.168.168.204 www.google.com # Google
Now we want to find all the lines that contain ".com". The following command
cat hosts | grep ".com"
prints
# 102.54.94.97 rhino.acme.com # source server
# 38.25.63.10 x.acme.com # x client host
192.168.168.200 doug.LoweWriter.com # Doug’s computer
192.168.168.201 server1.LoweWriter.com s1 # Main server
192.168.168.202 debbie.LoweWriter.com # Debbie’s computer
192.168.168.203 printer1.LoweWriter.com p1 # HP Laser Printer
192.168.168.204 www.google.com # Google
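It also includes the commented-out lines, which we do not want. To exclude them, we use a regular expression that requires the line to start with a character other than #, and we escape the dot so that it matches a literal "." instead of any character. The -E option selects extended regular expression syntax, although this particular pattern also works in grep's default basic syntax.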
cat hosts | grep -E "^[^#].*\.com"
The output now becomes
192.168.168.200 doug.LoweWriter.com # Doug’s computer
192.168.168.201 server1.LoweWriter.com s1 # Main server
192.168.168.202 debbie.LoweWriter.com # Debbie’s computer
192.168.168.203 printer1.LoweWriter.com p1 # HP Laser Printer
192.168.168.204 www.google.com # Google
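The same filtering can also be split into two simpler steps. The following is a minimal sketch, assuming a small made-up sample file (the variable name sample and the file contents are illustrative only, not part of the hosts file above). grep -v drops the lines that start with #, and grep -F treats ".com" as a fixed string, so the dot cannot match an arbitrary character.

```shell
# Build a small sample file (hypothetical contents, for illustration only).
sample=$(mktemp)
printf '%s\n' \
  '# 102.54.94.97 rhino.acme.com # source server' \
  '192.168.168.200 doug.LoweWriter.com # Doug computer' \
  '192.168.168.205 telecom-gw # no dot before com' > "$sample"

# -v inverts the match: keep only lines that do NOT start with '#'.
# -F treats ".com" as a fixed string, so "." is a literal dot.
result=$(grep -v '^#' "$sample" | grep -F '.com')
printf '%s\n' "$result"

rm -f "$sample"
```

Only the doug.LoweWriter.com line is printed. With the unescaped pattern ".com", the telecom-gw line would have matched as well, because "." stands for any character in a regular expression.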
cut
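Now we want only the IP addresses and the domain names, leaving out the aliases and comments at the end of each line. The following cut command does the job, where -d " " sets the field delimiter to a space and -f 1-2 selects the first two fields,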
cat hosts | grep -E "^[^#].*\.com" | cut -d " " -f 1-2
which prints
192.168.168.200 doug.LoweWriter.com
192.168.168.201 server1.LoweWriter.com
192.168.168.202 debbie.LoweWriter.com
192.168.168.203 printer1.LoweWriter.com
192.168.168.204 www.google.com
cut -d " " doesn't work like String.split() in most programming languages.
Note that with cut -d " " every single space counts as a delimiter, so two adjacent spaces enclose an empty field. For example,
cat hosts | grep -E "^[^#].*\.com" | cut -d " " -f 1-3
prints
192.168.168.200 doug.LoweWriter.com
192.168.168.201 server1.LoweWriter.com s1
192.168.168.202 debbie.LoweWriter.com
192.168.168.203 printer1.LoweWriter.com p1
192.168.168.204 www.google.com
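To see the effect of repeated spaces in isolation, here is a small sketch on a single made-up line with two spaces before the alias (the variable names are illustrative). One common workaround, shown here as an option rather than the only approach, is to squeeze runs of spaces with tr -s before passing the line to cut.

```shell
# A line with two spaces between the host name and the alias (made-up sample).
line='192.168.168.201 server1.LoweWriter.com  s1'

# cut sees an empty third field between the two adjacent spaces.
third_raw=$(printf '%s\n' "$line" | cut -d ' ' -f 3)

# tr -s ' ' squeezes each run of spaces into one before cut splits the line.
third_squeezed=$(printf '%s\n' "$line" | tr -s ' ' | cut -d ' ' -f 3)

printf 'raw: [%s]\nsqueezed: [%s]\n' "$third_raw" "$third_squeezed"
```

The raw third field is empty, while the squeezed one is s1, which is closer to what String.split() on whitespace would give in many languages.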
sed
sed stands for "stream editor". It transforms its input line by line, for instance by replacing text that matches a pattern. For example,
cat hosts | grep -E "^[^#].*\.com" | cut -d " " -f 1-2 | sed 's/192\.168/255.255/'
prints
255.255.168.200 doug.LoweWriter.com
255.255.168.201 server1.LoweWriter.com
255.255.168.202 debbie.LoweWriter.com
255.255.168.203 printer1.LoweWriter.com
255.255.168.204 www.google.com
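One detail worth knowing about the s command is that, without a flag, it replaces only the first match on each line. The sketch below uses a made-up substitution (168 to 0, chosen only because 168 appears twice in the address) to contrast the default behaviour with the g flag.

```shell
addr='192.168.168.200 doug.LoweWriter.com'

# Without a flag, only the first match on each line is replaced.
first_only=$(printf '%s\n' "$addr" | sed 's/168/0/')

# The g flag replaces every match on the line.
all_matches=$(printf '%s\n' "$addr" | sed 's/168/0/g')

printf '%s\n%s\n' "$first_only" "$all_matches"
```

The first line printed is 192.0.168.200 doug.LoweWriter.com and the second is 192.0.0.200 doug.LoweWriter.com.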
awk
awk is named after its authors: Aho, Weinberger, and Kernighan. It is among the most powerful text-processing tools. We can swap the two columns with it, using
cat hosts | grep -E "^[^#].*\.com" | cut -d " " -f 1-2 | awk '{ t = $1; $1 = $2; $2 = t; print }'
which prints
doug.LoweWriter.com 192.168.168.200
server1.LoweWriter.com 192.168.168.201
debbie.LoweWriter.com 192.168.168.202
printer1.LoweWriter.com 192.168.168.203
www.google.com 192.168.168.204
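Since awk can match patterns and split fields by itself, the whole pipeline can arguably be collapsed into one awk command. This is a sketch under the assumption of a small made-up sample file (the variable name sample and its contents are illustrative). By default awk splits fields on runs of whitespace, so repeated spaces are not a problem here.

```shell
# Build a small sample file (hypothetical contents, for illustration only).
sample=$(mktemp)
printf '%s\n' \
  '# 38.25.63.10 x.acme.com # x client host' \
  '192.168.168.201 server1.LoweWriter.com s1 # Main server' \
  '192.168.168.204 www.google.com # Google' > "$sample"

# !/^#/    skips lines that start with '#'
# /\.com/  keeps lines that contain a literal ".com"
# print $2, $1 writes the host name, then the IP address, separated by a space
swapped=$(awk '!/^#/ && /\.com/ { print $2, $1 }' "$sample")
printf '%s\n' "$swapped"

rm -f "$sample"
```

This prints the host name followed by the IP address for the two uncommented lines, without needing grep or cut in front.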