Unit 7.2
Unix utilities

Presenter Notes

Slides for this unit.

Unit objectives

After completing this unit, you should be able to:

  • Use the find and locate commands to search for files
  • Use the head and tail commands to view specific lines in a file
  • Use the type, which, and whereis commands to find commands
  • Use the file command to find out what type of data a file contains
  • Use the grep command to search text files for patterns
  • Use the cut command to list specific columns of a file
  • Use the sort command to sort the contents of a file
  • Use the join and paste commands to combine files

Presenter Notes

find

Presenter Notes

find

Search one or more directory structures for files that meet specified criteria. Display the names of matching files or execute commands against those files.

$ find <where> <what> <how>

$ cd /home/tux1
$ find . -name orange
./orange
./color/orange
./shape/orange
./size/orange

Presenter Notes

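The find command is used everywhere and is one of the most useful commands in Linux. It searches a directory and its subdirectories for files. You can specify matching criteria such as the file name, file type, owner, or even timestamps. The examples below show how powerful find is.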

find

The -exec option executes a command on each of the file names found.

$ find . -name 'b*' -exec ls -i {} \;
187787  ./color/blue
187792  ./shape/box
202083  ./size/big

A pair of curly braces ({}) is a placeholder for each file name found. The backslash (\) escapes the semicolon (;) that ends the command, so the shell passes the semicolon to find instead of interpreting it.

The -ok option works like -exec, but it asks for confirmation before running the command on each file.

$ find . -name o\* -ok rm {} \;
< rm ... ./orange > ? y
< rm ... ./color/orange > ? y
< rm ... ./shape/orange > ? n
< rm ... ./size/orange > ? y
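Here is a self-contained sketch of -exec, assuming a POSIX shell and GNU or BSD find. It builds a throwaway tree with made-up names instead of using /home/tux1. It runs basename instead of ls -i, so the output does not depend on inode numbers. -ok is left out because it waits for keyboard input.

```shell
# Build a small throwaway tree (names are illustrative).
cd "$(mktemp -d)" || exit 1
mkdir color shape size
touch color/blue shape/box shape/circle size/big

# {} is replaced by each matching path; \; ends the command,
# so basename runs once per file. sort makes the order predictable.
find . -name 'b*' -exec basename {} \; | sort

# Ending with + instead of \; passes many paths to a single ls invocation.
find . -name 'b*' -exec ls -d {} +
```

The first find prints big, blue, and box, one per line. The second runs ls only once, listing all three paths.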

Presenter Notes

find [-H] [-L] [-P] [-D debugopts] [-Olevel] [path...] [expression]

In practice, the options [-H] [-L] [-P] [-D debugopts] [-Olevel] are rarely used. The common form of the find command above therefore simplifies to: find [path...] [expression]

find

Presenter Notes

find

Files whose permission bits are exactly 777:
$ find . -perm 777
./size/giant

Files matching expression 1 AND expression 2 (-a, which is also implied when tests are simply listed one after another):
$ find . -name 's*' -type f -a -size +2 \
> -exec ls -i {} \;
187787  ./sample
187792  ./shape/square
202083  ./size/small

Files matching expression 1 OR expression 2 (-o):
$ find . -name circle -o -name 'b*'
./color/blue
./size/big
./shape/box
./shape/circle
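This is a sketch of combining tests, assuming a POSIX shell; the files are hypothetical. Adjacent tests are joined by an implicit -a. Escaped parentheses group an -o so that a following test applies to both alternatives.

```shell
cd "$(mktemp -d)" || exit 1
mkdir shape size sdir
touch sample shape/square shape/circle size/small

# AND: the name starts with s and the entry is a regular file
# (the directories shape, size, and sdir also start with s but are excluded).
find . -name 's*' -type f | LC_ALL=C sort

# OR with grouping: without \( \), -type f would bind only
# to the second -name test.
find . \( -name circle -o -name 'sq*' \) -type f | LC_ALL=C sort
```

LC_ALL=C makes sort use plain byte order, so the listing looks the same in every locale.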

Presenter Notes

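-name: find files by name.
  find /dir -name filename finds files named filename in /dir and its subdirectories.
  find . -name "*.c" finds files with the .c extension in the current directory (".") and its subdirectories.
-perm: find files by permission bits.
  find . -perm 755 -print finds files in the current directory tree whose permission bits are exactly 755 (the owner can read, write, and execute; everyone else can read and execute).
-prune: stops find from descending into the matched directory. If -depth is also used, find ignores -prune.
  find /apps -path "/apps/bin" -prune -o -print lists files under /apps but skips /apps/bin.
  find /usr/sam -path "/usr/sam/dir1" -prune -o -print lists all files under /usr/sam that are not inside the dir1 subdirectory.
-user: find files by owner.
  find ~ -user sam -print finds files in $HOME owned by sam.
-group: find files by group.
  find /apps -group gem -print finds files under /apps that belong to the group gem.
-mtime -n / +n: find files by modification time. -n means modified less than n days ago; +n means modified more than n days ago.
  find / -mtime -5 -print finds files under / modified within the last 5 days.
  find /var/adm -mtime +3 -print finds files under /var/adm modified more than 3 days ago.
-nogroup: find files whose group does not exist in /etc/group.
  find / -nogroup -print
-nouser: find files whose owner does not exist in /etc/passwd.
  find /home -nouser -print
-newer file1 ! -newer file2: find files modified more recently than file1 but not more recently than file2.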
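This sketch shows -prune and -newer, assuming a POSIX shell and a find that supports -maxdepth (GNU and BSD find both do). The names apps, ref1, ref2, middle, and late are illustrative. touch -t sets each file's modification time so the -newer comparison is predictable.

```shell
cd "$(mktemp -d)" || exit 1
mkdir -p apps/bin apps/lib
touch apps/bin/tool apps/lib/libx.a apps/README

# -prune stops find from descending into apps/bin; the -o branch
# prints every other regular file.
find apps -path apps/bin -prune -o -type f -print | LC_ALL=C sort

# Timestamps (CCYYMMDDhhmm): ref1 < middle < ref2 < late.
touch -t 202001010000 ref1
touch -t 202001020000 middle
touch -t 202001030000 ref2
touch -t 202001040000 late

# Newer than ref1 but not newer than ref2. ref2 itself would also match
# (a file is not newer than itself), so the reference files are excluded.
find . -maxdepth 1 -type f -newer ref1 ! -newer ref2 ! -name 'ref*'
```

The first find prints apps/README and apps/lib/libx.a. The second prints only ./middle.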

The type, which, and whereis

To find out how the shell interprets a command name (shell builtin, alias, function, or external program with its path), use type.

$ type find echo
find is /usr/bin/find
echo is a shell builtin

To find out which executable file in your PATH would run, use which.

$ which find echo
/usr/bin/find
/bin/echo

To locate the binary, source, and manual page files of a command, use whereis.

$ whereis find echo
find: /usr/bin/find /usr/share/man/man1/find.1.gz
echo: /bin/echo /usr/share/man/man1/echo.1.gz

Presenter Notes

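Strictly speaking, type is not a search command. It tells you whether a command is provided by the shell itself or by a separate executable file outside the shell. For an external command, the -p option prints the command's path, like which.

whereis searches only for program names, and only for binaries (-b), manual pages (-m), and source files (-s). Without an option it returns all three.

which searches the directories listed in the PATH variable for a command and returns the first match. In other words, which tells you whether a command exists and which copy will actually run.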
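This sketch shows the "first match in PATH" behavior, assuming a POSIX shell with which installed. The two hello scripts and their directories are hypothetical.

```shell
cd "$(mktemp -d)" || exit 1
mkdir first second
# Two different executables with the same (hypothetical) name "hello".
printf '#!/bin/sh\necho first\n' > first/hello
printf '#!/bin/sh\necho second\n' > second/hello
chmod +x first/hello second/hello

# Put both directories at the front of PATH; "first" is searched first.
PATH="$PWD/first:$PWD/second:$PATH"
which hello   # prints the path of the first match: .../first/hello
hello         # runs that same file, so it prints "first"
```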

The file command

The file command reports what type of data a file contains. It examines the file's contents, not its name, which is why /tmp/fake.jpg below turns out to be a PDF.

$ file /etc/passwd /bin/ls /home/tux1 /tmp/fake.jpg
/etc/passwd:   ASCII text
/bin/ls:       ELF 32-bit LSB executable
/home/tux1:    directory
/tmp/fake.jpg: PDF document, version 1.5

Presenter Notes

Purpose: identify the type of a file.

Common filters

grep: Only displays lines that match a pattern
sed: Allows string substitutions
awk: Pattern scanning and processing
fmt: Rewrap lines to a given width so that text looks tidy
expand, unexpand: Change tabs to spaces and vice versa
tr: Substitute characters
nl: Number lines
pr: Format for printer
tee: Read standard input and send the data to both
     standard output and a file
sort: Sort the lines in a file
cut: List specific columns of a file
head, tail: View specific lines in a file
join, paste: Combine files
tac: Display lines in reverse order
rev: Reverse the order of characters in every line
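Here a few of these filters are chained together. This is a sketch assuming GNU coreutils and util-linux, since tac and rev are not POSIX; the fruit file is made up.

```shell
cd "$(mktemp -d)" || exit 1
printf 'pear\napple\nfig\n' > fruit

# Sort the lines, save a copy with tee, and number them with nl.
sort fruit | tee sorted | nl

# tac reverses the line order; rev reverses the characters in each line.
tac sorted
echo 'stressed' | rev

# tr substitutes characters: here, lowercase to uppercase.
tr 'a-z' 'A-Z' < fruit
```

Because tee writes to both its standard output and the file sorted, the numbered listing and the saved copy hold the same lines.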

Presenter Notes

Some commonly used filters.

grep

Searches one or more files, or standard input, for lines that match a pattern. The pattern can be a simple string or a regular expression.

grep  [options]  pattern  [file1 ...]

-v      Print lines that do not match
-c      Print only a count of matching lines
-l      Print only the names of files with matching lines
-n      Number the matching lines
-i      Ignore case of letters when making comparisons
-w      Match whole words only
-f <file>   Read patterns from file instead of the command line

Presenter Notes

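grep (global search regular expression (RE) and print out the line: search globally for a regular expression and print the matching lines) is a powerful text-search tool. It searches text using regular expressions and prints the lines that match.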

Sample data

Phone1:

Chris                   10300       internal
Jan                     20500       internal
Lee                     30500       external
Pat                     40599       external
Robin                   50599       external
Terry                   60300       internal

Phone2:

Chris                   1342        internal
Jan                     2083        internal
Lee                     3139        external
Pat                     4200        internal
Robin                   5200        internal
Terry                   6342        external

Presenter Notes

Basic grep

$ grep 20 phone1
Jan     20500       internal

$ grep 20 phone*
phone1:Jan      20500       internal
phone2:Jan      2083        internal    
phone2:Pat      4200        internal
phone2:Robin    5200        internal

$ grep -v Jan phone2
Chris       1342        internal
Lee         3139        external
Pat         4200        internal
Robin       5200        internal
Terry       6342        external
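This sketch runs the options from the grep slide against a copy of the phone1 and phone2 sample data (column spacing shortened). It assumes a POSIX shell; the names file is hypothetical.

```shell
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'Chris 10300 internal' 'Jan 20500 internal' 'Lee 30500 external' \
  'Pat 40599 external' 'Robin 50599 external' 'Terry 60300 internal' > phone1
printf '%s\n' 'Chris 1342 internal' 'Jan 2083 internal' 'Lee 3139 external' \
  'Pat 4200 internal' 'Robin 5200 internal' 'Terry 6342 external' > phone2

grep -c external phone1      # -c: count of matching lines (3)
grep -l 4200 phone1 phone2   # -l: only the names of files with a match
grep -n -i 'TERRY' phone1    # -n and -i: line number, case-insensitive
grep -w 500 phone1           # -w: no output, 500 occurs only inside larger numbers
printf 'Lee\nPat\n' > names
grep -f names phone2         # -f: read the patterns from the file "names"
```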

Presenter Notes

grep with regular expressions

Patterns that contain metacharacters should be enclosed in single quotation marks (') so that the shell leaves them alone. Metacharacters that grep recognizes include $ . * ^ [ - ].

Match a single character:

.       Any single character
[a-f]   Any one of the characters in the range a through f
[^a-f]  Any one of the characters not in the range a through f

Example:

$ grep '[1-3].[^3-5]' phone1
Chris                   10300       internal
Terry                   60300       internal
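The example can be reproduced with a copy of phone1 (column spacing shortened). The comments spell out what each part of the pattern matches.

```shell
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'Chris 10300 internal' 'Jan 20500 internal' 'Lee 30500 external' \
  'Pat 40599 external' 'Robin 50599 external' 'Terry 60300 internal' > phone1

# [1-3]  one digit from 1 to 3
# .      then any single character
# [^3-5] then one character that is not 3, 4, or 5
# Only "300" (in 10300 and 60300) satisfies all three.
grep '[1-3].[^3-5]' phone1
```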

Presenter Notes

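A regular expression is only a description of a string. It can be used to process strings only together with a tool that supports regular expressions.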

Regular Expressions : metacharacters

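A backslash escapes a metacharacter so that it matches literally, or introduces a special sequence:

\.  \[  \]  \\   : a literal . [ ] or \
\t  : tab
\d  : any digit, equals [0-9]
\D  : not \d, equals [^0-9]
\w  : any letter or digit or underscore, equals [a-zA-Z0-9_]
\W  : not \w, equals [^a-zA-Z0-9_]

\t, \d, and \D are Perl-style escapes: GNU grep accepts them only with the -P option. \w and \W also work in GNU grep's default syntax.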
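This sketch shows why escaping matters, assuming any POSIX grep; the file v is made up. \d is not part of grep's default syntax, so the bracket expression [0-9] is the portable way to match a digit.

```shell
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'version 1.5' 'version 105' > v

grep -c '1.5' v     # 2: an unescaped . matches any character, including 0
grep -c '1\.5' v    # 1: \. matches only a literal dot
grep -c '[0-9]' v   # 2: portable way to say "a digit" without \d
```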

Presenter Notes

Repetition

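A regular expression may be followed by one of several repetition operators. The forms below are the extended syntax (grep -E). In basic grep, write \?, \+, and \{n,m\} instead (\? and \+ are GNU extensions).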

   ?      The preceding item is optional and matched at most once.
   *      The preceding item will be matched zero or more times.
   +      The preceding item will be matched one or more times.
   {n}    The preceding item is matched exactly n times.
   {n,}   The preceding item is matched n or more times.
   {,m}   The preceding item is matched at most m times.  This  is  a  GNU
          extension.
   {n,m}  The  preceding  item  is  matched at least n times, but not more
          than m times.
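This sketch applies the {n} operator to a copy of phone1, assuming a POSIX shell. It shows the same pattern in extended syntax (-E) and in basic syntax, where the braces need backslashes.

```shell
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'Chris 10300 internal' 'Jan 20500 internal' 'Lee 30500 external' \
  'Pat 40599 external' 'Robin 50599 external' 'Terry 60300 internal' > phone1

# A 5 followed by exactly two 9s, extended syntax.
grep -E '59{2}' phone1
# The same pattern in basic syntax: the braces must be escaped.
grep '59\{2\}' phone1
# Three-letter names: a capital, exactly two lowercase letters, then a space.
grep -E '^[A-Z][a-z]{2} ' phone1
```

The first two commands both print the Pat and Robin lines. The third prints Jan, Lee, and Pat.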

Presenter Notes

Position

^a      Any line that starts with a
z$      Any line that ends with z
\b      Word boundary (the position between a \w character and a \W character)
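This sketch shows ^ and $ on a copy of phone1, assuming a POSIX shell. It covers only the two anchors and not \b, since \b is not supported by every grep.

```shell
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'Chris 10300 internal' 'Jan 20500 internal' 'Lee 30500 external' \
  'Pat 40599 external' 'Robin 50599 external' 'Terry 60300 internal' > phone1

grep '^[CT]' phone1          # lines that start with C or T
grep -c 'external$' phone1   # number of lines that end in "external"
```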

Example:

Display only the lines of phone1 that contain an e and end in a 0:  
$ grep ___________ phone1

Presenter Notes

Other greps

fgrep (grep -F) matches fixed strings only, with no regular expressions. egrep (grep -E) uses extended regular expressions, which include alternation (|) between multiple patterns.

$ egrep '20500|40599|50599' phone1
Jan         20500   internal
Pat         40599   external
Robin       50599   external
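This sketch contrasts the three behaviors using GNU-style options -F and -E in place of fgrep and egrep. It assumes a POSIX grep, and the file f is made up.

```shell
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'a.c' 'abc' > f

grep -c 'a.c' f         # 2: "." is a metacharacter and matches any character
grep -F -c 'a.c' f      # 1: -F (fgrep) treats "." as a literal dot
grep -E -c 'xyz|abc' f  # 1: -E (egrep) allows alternation with |
```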

What does the following command do?

$ grep 30 phone1 | grep intern
????????

Presenter Notes

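The Unix grep family includes grep, egrep, and fgrep, and the three differ only slightly.

egrep is an extension of grep that supports more regular-expression metacharacters. fgrep stands for "fixed grep" or "fast grep". It treats every character literally, so regular-expression metacharacters lose their special meaning and simply match themselves.

Linux uses GNU grep, which is more powerful. Its -G (basic, the default), -E (egrep), and -F (fgrep) options select the different behaviors.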

cut

Pull selected columns or fields from one or more files
Syntax:

cut -f(ields) -d(elimiter) file(s)
cut -c(haracters) file(s)

Example:

$ cat /etc/passwd
root:x:0:0:Big Brother:/root:/bin/bash
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
tux1:x:500:500:Tux the Penguin(1):/home/tux1:/bin/bash
tux2:x:501:501:Tux the Penguin(2):/home/tux2:/bin/bash

$ cut -f1,3,6,7 -d: /etc/passwd
root:0:/root:/bin/bash
shutdown:6:/sbin:/sbin/shutdown
tux1:500:/home/tux1:/bin/bash
tux2:501:/home/tux2:/bin/bash
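This sketch runs cut -f on a two-line sample file rather than the real /etc/passwd, so the output is predictable. The file name passwd.sample is illustrative.

```shell
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'root:x:0:0:Big Brother:/root:/bin/bash' \
  'tux1:x:500:500:Tux the Penguin(1):/home/tux1:/bin/bash' > passwd.sample

cut -d: -f1,7 passwd.sample   # user name and login shell
cut -d: -f3-4 passwd.sample   # a range of fields: UID and GID
cut -d: -f5 passwd.sample     # a field containing spaces is kept whole
```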

Presenter Notes

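cut is a selection command: it analyzes a piece of data and extracts the part we want. The selection is generally done line by line rather than on the whole text at once.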

cut

$ ps
  PID TTY STAT TIME COMMAND
 9374 p0  S    0:00 -bash
14460 p0  R    0:00 ps

$ ps | cut -c-5,20-
  PID COMMAND
 9374 -bash
14471 ps
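Live ps output changes from run to run. This sketch therefore saves two lines of the ps output above to a file (the name ps.sample is illustrative) and cuts character positions from it.

```shell
cd "$(mktemp -d)" || exit 1
printf '%s\n' '  PID TTY STAT TIME COMMAND' ' 9374 p0  S    0:00 -bash' > ps.sample

cut -c-5,20- ps.sample   # characters 1-5, then from 20 to the end of the line
cut -c7-9 ps.sample      # characters 7-9 only: the TTY column
```

Character 20 is the blank just before COMMAND, so it keeps the columns separated in the output.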

Presenter Notes

sort

The sort command sorts the lines in the file specified and writes the result to standard output.

sort -tdelimiter -kfield -options file

$ cat animals
dog.2
cat.4
penguin.10

$ sort animals
cat.4
dog.2
penguin.10

Presenter Notes

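The sort command sorts lines. The -n option sorts in numeric order, -t sets the field delimiter, and -k selects the field (column) to sort on. For example, sort -n -t: -k3 /etc/passwd sorts /etc/passwd numerically on its third colon-separated field. See man sort for more options.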

sort

$ sort -k1.2 animals
cat.4
penguin.10
dog.2
$ sort -t. -k2 animals
penguin.10
dog.2
cat.4
$ sort -t. -n -k2 animals
dog.2
cat.4
penguin.10

Options

-d  Sorts in dictionary order (considers only letters, digits, and blanks)
-r  Reverses the order of the sort
-n  Compares fields by their numeric (arithmetic) value
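This sketch shows how -n and -r change the result on the animals file, assuming a POSIX sort.

```shell
cd "$(mktemp -d)" || exit 1
printf '%s\n' dog.2 cat.4 penguin.10 > animals

sort -t. -k2 animals         # text comparison: "10" < "2" < "4"
sort -t. -k2 -n animals      # numeric comparison: 2 < 4 < 10
sort -t. -k2 -n -r animals   # numeric, largest first
```

Without -n, the field is compared character by character, which is why 10 sorts before 2.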

Presenter Notes

head

The head command can be used to view the first few lines of a file or files. The command syntax is:

$ head  [-lines] file(s)

$ head -5 myfile
$ ls -l | head -12

The head command shows you the first ten lines of a file by default.
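This sketch uses a 30-line file generated with seq, assuming coreutils or an equivalent. The -n form shown is the standard spelling of the older -lines form.

```shell
cd "$(mktemp -d)" || exit 1
seq 1 30 > nums

head nums | wc -l   # 10: the default number of lines
head -n 3 nums      # first 3 lines; "head -3 nums" is the older short form
```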

Presenter Notes

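By default head shows the first 10 lines of a file: head /etc/passwd displays the first 10 lines. Add -n 15 to show the first 15 lines instead; here -n specifies the number of lines. See man head for more options.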

tail

The tail command displays the last few lines of a file or files. The command syntax is:

$ tail [{-lines|-n lines|-n +lines|-f}] file(s)
$ tail -20 file
$ tail -n +20 file
$ tail -f file

-n  Displays the last n lines of the file, counting back from the end.
+n  Displays the file starting at line number n (written as tail -n +n).
The -f option makes tail keep reading the file and display additional lines as they are appended. For example: tail -f logfile
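This sketch contrasts counting from the end with starting at a given line, on a 30-line file made with seq. tail -f is left out because it keeps waiting for new input.

```shell
cd "$(mktemp -d)" || exit 1
seq 1 30 > nums

tail -n 3 nums     # the last 3 lines: 28 29 30
tail -n +27 nums   # from line 27 to the end: 27 28 29 30
```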

Question: How would you show the middle 10 lines of a 30-line file?

Presenter Notes

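By default tail shows the last 10 lines of a file: tail /etc/passwd displays the last 10 lines. Add -n 5 to show the last 5 lines instead; here -n specifies the number of lines. See man tail for more options. tail is often used to view the latest log entries. Use tail -f, or the tailf command on systems that still provide it, to follow a log in real time.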

join and paste

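The join and paste commands combine files. join merges lines from two files that have the same value in a common field (by default the first field). paste places corresponding lines of its input files side by side, separated by a tab.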

$ cat one
a apple another
b bee beast
c cat

$ cat two
a ape
b broken
d dog

$ join one two
a apple another ape
b bee beast broken

$ paste one two
a apple another a ape
b bee beast     b broken
c cat   d dog
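This sketch rebuilds the files one and two and adds two common variations: join -a 1 and paste -d. It assumes a POSIX shell. join expects both inputs to be sorted on the join field, which these files already are.

```shell
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'a apple another' 'b bee beast' 'c cat' > one
printf '%s\n' 'a ape' 'b broken' 'd dog' > two

join one two        # only keys present in both files (a, b)
join -a 1 one two   # also print the unpaired line from file 1 (c cat)
paste one two       # line by line, separated by a tab
paste -d, one two   # use a comma instead of a tab
```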

Presenter Notes

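As its name suggests, join works on the data in two files. It combines a line from each file only when the two lines share the same value in the join field. paste merges multiple files column by column, placing corresponding lines side by side.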

Unit summary

Having completed this unit, you should be able to:

  • Use the find and locate commands to search for files
  • Use the type, which, and whereis commands to find commands
  • Use the file command to find out what type of data a file contains
  • Use the grep command to search text files for patterns
  • Use the cut command to list specific columns of a file
  • Use the sort command to sort the contents of a file
  • Use the head and tail commands to view specific lines in a file
  • Use the join and paste commands to combine files

Presenter Notes

References

  • Unit 10: Linux utilities, Linux Basics and Installation, ERC 7.2, IBM

Presenter Notes