Chapter 2. File Management
- Unit 2.1 Files and Directories
- Unit 2.2 File permissions
- Unit 2.3 File Systems
- Unit 2.4 Storage
Presenter Notes
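File Management
If the data a computer processes is kept only in memory, it disappears when the machine loses power. To keep data long-term, it must be stored on an external storage device such as a hard disk, optical disc, or USB flash drive. We therefore call the management of data access on external storage devices storage management. To make data convenient to use, we group related data into files and access them by name, so storage management is also called file management.
One clarification: "storage management" here means managing data on external storage. "Memory management", by contrast, means managing main memory.
File management, or storage management, covers the management of files, directories, file systems, and the storage devices that hold those file systems. This chapter introduces them in that order, from smallest to largest.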
Unit 2.1
Files and Directories
Presenter Notes
What is a file?
Presenter Notes
Topic: What is a file?
Objectives
- What is a file?
- How many types of file are there?
- What are the parts of a file?
What is a file?
A collection of data.
A stream of characters or a "byte stream".
No structure is imposed on a file by the operating system.
In this unit, we take a typical Unix file as the example.
Presenter Notes
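What is a file
In Linux, a file is a collection of data that can be viewed as a stream of characters or bytes. Linux does not define an internal structure for files; it treats every file as an unstructured collection of data.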
File Types
Presenter Notes
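In Linux, the term "file" covers three kinds of object: ordinary files, directories, and special files. An ordinary file is what we usually mean by "file" in the narrow sense: a collection of specific data, such as a picture, an MP3 song, a movie, or a document. A directory is what Windows calls a "folder"; it holds other directories and files. Linux stores every file under some directory, and together the directories form a directory tree. In Linux every device has a corresponding file, and these files are special files. All three kinds are called files in Linux, which is the broad sense of the word.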
Filenames
- Files are accessed by name.
- Should be descriptive of the content
- Should use only alphanumeric characters and a few safe symbols:
  UPPERCASE, lowercase, number, @, _
- Special characters may have special meanings:
  (blank) + - * ? > < / ; & ! | \ ` ' " [ ] ( ) { }
- In UNIX, filenames are case sensitive; in Windows, they are not.
- No filename extension is required. File type is identified by the content.
- In UNIX, files are hidden if the first character is a . (period)
- Most file systems support a maximum of 255 characters
Presenter Notes
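In Linux, files are accessed by filename. A filename should reflect the file's content and follow these conventions:
It consists of uppercase letters, lowercase letters, digits, and a few special symbols.
It should not contain characters such as * ? < > ( ) [ ] { } / | ' " ` & ; !
+ and - may appear in a filename, but should not be its first character.
It should generally not contain spaces.
More importantly, Linux filenames are case sensitive: ABC and abc are two different files, whereas DOS/Windows treats two names that differ only in case as the same file. Linux also has no concept of a filename extension; for example, an executable does not have to be named *.exe. Finally, Linux files have no "hidden" attribute. Instead, a file whose name begins with "." is a hidden file, and most commands do not show hidden files by default.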
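The case-sensitivity and hidden-file rules can be checked in a scratch directory. This is a minimal sketch, assuming a POSIX shell, the widely available mktemp utility, and a case-sensitive file system (the usual case on Linux). The names ABC, abc, and .hidden are made up for illustration.

```shell
#!/bin/sh
set -eu
orig=$PWD
demo=$(mktemp -d)          # scratch directory (assumption: mktemp is installed)
cd "$demo"

# Hypothetical names: two differ only in case, one starts with "."
touch ABC abc .hidden

n_plain=$(ls | wc -l)      # plain ls skips names that begin with "."
n_all=$(ls -A | wc -l)     # -A also lists dot files (except . and ..)

echo "visible entries: $n_plain"
echo "all entries:     $n_all"

cd "$orig"
rm -rf "$demo"
```

On a case-insensitive file system ABC and abc would collapse into one file, so the counts would differ.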
File structure
- A file has a logical structure.
- The operating system does not care about the structure; applications do.
(a) Byte sequence. (b) Record sequence. (c) Tree.
Presenter Notes
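A file can be structured in several ways. The figure shows three common ones:
- An unstructured byte sequence. The operating system does not care what the file contains; all it sees are bytes, and any meaning is imposed by user programs.
- A refinement of the first structure: the file is a sequence of fixed-length records, each with its own internal structure.
- A tree of records. The records need not all have the same length, and each has a "key" field at a fixed position. The tree is sorted on the key field, so a particular key can be found quickly.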
File Attributes
Presenter Notes
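Every file has a name and data. In addition, all operating systems keep other information about each file, such as the date and time of creation and the file size. These extra items are called file attributes; some people call them metadata. The attributes vary considerably from system to system.
Many operating systems support several types of file. Unix and Windows, for example, both have ordinary files and directories. Two examples of binary files:
- A simple executable binary file. Although the file is just a sequence of bytes, the operating system will execute it only if it has the proper format.
- An archive file. It is a collection of library procedures (modules) that have been compiled but not linked. Each module is prefaced by a header giving its name, creation date, owner, protection code, and size.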
Inode
- Unix stores file attributes in an inode (i-node).
- Each file has exactly one i-node, and each i-node refers to exactly one file.
- A UNIX file system stores its i-nodes together, in one table.
- Each inode has a unique number, the i-number, within its file system.
- An inode contains information about a file: owner, group, type, size, permissions, ctime, atime, mtime, ...
Presenter Notes
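An inode is a data structure used in many Unix-like file systems. Each inode stores the metadata of one file system object (a file, directory, device file, socket, pipe, and so on), but not its data content or its filename.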
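Because the filename is not stored in the inode, renaming a file within a directory should leave its i-number unchanged. The sketch below checks this with ls -i, assuming a POSIX shell with mktemp and awk available; note.txt and renamed.txt are made-up names.

```shell
#!/bin/sh
set -eu
orig=$PWD
demo=$(mktemp -d)                 # scratch directory (assumption: mktemp is installed)
cd "$demo"

echo "hello" > note.txt           # hypothetical file
ls -li note.txt                   # long listing, prefixed with the inode number

# ls -i prints "<i-number> <name>"; keep only the number
ino_before=$(ls -i note.txt | awk '{print $1}')
mv note.txt renamed.txt           # only the directory entry changes
ino_after=$(ls -i renamed.txt | awk '{print $1}')

echo "before rename: $ino_before"
echo "after rename:  $ino_after"

cd "$orig"
rm -rf "$demo"
```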
Summary
- A file is a collection of data.
- Three types of file:
- ordinary (regular) file
- directory
- special file (device)
- Three parts of a file:
- filenames
- file attributes
- file contents
File contents: data blocks
Presenter Notes
Topic: File contents: data blocks
Objectives
- How to allocate room for the data of a file?
- How to find the data of a file?
- How to modify a file?
Contiguous Allocation
(a) Contiguous allocation of disk space for 7 files.
(b) The state of the disk after files D and F have been removed.
Presenter Notes
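Contiguous allocation: a file is allocated one contiguous region of disk space. The purpose of files is to store information and make later retrieval convenient. Different systems provide different operations for storage and retrieval.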
Linked List Allocation
Storing a file as a linked list of disk blocks.
Presenter Notes
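Unlike contiguous allocation, this method can use every disk block; no space is lost to disk fragmentation (except for internal fragmentation in the last block).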
Multi-level Index Allocation
- Inode and data blocks
- Direct pointers point straight to data blocks, which is quick for small files.
- Indirect, double indirect, and triple indirect pointers reach a large number of data blocks, for big files.
Presenter Notes
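For a large file, once the allocated block numbers fill one index block, another index block must be allocated; the index blocks are linked together by pointers.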
Example: ext2
Ext2 inode
- Block size: 1KB
- Inode: 128 bytes (8 per block of 1024 bytes)
- 15 pointers per inode. One pointer size: 4B
- Pointers to data blocks :
- The 1st-12th (direct) pointers: 12 * 1KB = 12KB
- Pointers to an indirect block, a double indirect block, and a triple indirect block
- The 13th (indirect) pointer: 1 * (1KB/4B) * 1KB = 256KB
- The 14th (double indirect) pointer: 1 * (1KB/4B)^2 * 1KB = 64MB
- The 15th (triple indirect) pointer: 1 * (1KB/4B)^3 * 1KB = 16GB
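The per-pointer figures can be reproduced, and added up to the largest file these 15 pointers can address, with shell arithmetic. This is a sketch under the slide's assumptions only (1KB blocks, 4B pointers, 12 direct pointers); the variable names are illustrative, and it assumes the shell's integers are 64-bit, as in common sh implementations.

```shell
#!/bin/sh
set -eu
block=1024                         # block size in bytes (1KB, from the slide)
ptr=4                              # pointer size in bytes (from the slide)
per=$((block / ptr))               # pointers that fit in one index block

direct=$((12 * block))                    # 1st-12th pointers
single=$((per * block))                   # 13th pointer
double=$((per * per * block))             # 14th pointer
triple=$((per * per * per * block))       # 15th pointer
total=$((direct + single + double + triple))

echo "direct:          $((direct / 1024)) KB"
echo "single indirect: $((single / 1024)) KB"
echo "double indirect: $((double / 1024 / 1024)) MB"
echo "triple indirect: $((triple / 1024 / 1024 / 1024)) GB"
echo "total:           $total bytes"
```

The total is slightly more than 16GB because the direct, single indirect, and double indirect ranges are added to the triple indirect range.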
Summary
- Ways to allocate data:
- contiguous allocation
- linked list allocation
- multi-level index allocation
- Pros and cons:
- contiguous allocation: easy random reads, difficult to modify
- linked list allocation: difficult random reads, easy to modify
- multi-level index allocation: quick for small files, scales to big files
Directory
Objectives
- Hierarchical Directory
- Directory contents
- Shared files
Hierarchical Directory Systems (1)
- A single-level directory system containing four files.
Presenter Notes
The simplest form of directory system is a single directory containing all the files.
Hierarchical Directory Systems (2)
- A hierarchical directory system.
Presenter Notes
Users can create any number of subdirectories, which gives them a powerful structuring tool for organizing their work.
Path Names
- A UNIX directory tree.
Presenter Notes
When the file system is organized as a tree, some way is needed to specify file names.
Directory Contents
Presenter Notes
File systems usually provide directories to keep track of files; in many systems a directory is itself a file.
Contents of directory
- The data of a directory is the list of file names and inode numbers in that directory
- Multiple file names may point to the same inode
- That means a file may have multiple names
Presenter Notes
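When a file system is created (formatted), the storage area is divided into two large contiguous regions. One holds the metadata of file system objects: a table of inodes, each 256 bytes or 128 bytes by default. The other holds the content data of file system objects; it is divided into 512-byte sectors, and 8 sectors form a 4 KB block. The block is the basic unit of reading and writing.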
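A directory's name-to-inode list can be viewed with ls -ia, and the "." entry gives a ready example of two names for one inode. The sketch below assumes a POSIX shell with mktemp and awk; sub and file are made-up names.

```shell
#!/bin/sh
set -eu
orig=$PWD
demo=$(mktemp -d)                 # scratch directory (assumption: mktemp is installed)
cd "$demo"

mkdir sub                         # hypothetical directory
touch sub/file

# Each entry: an inode number followed by a name stored in sub
ls -ia sub

# The name "sub" in the parent and the name "." inside sub
# are two directory entries for the same inode.
ino_name=$(ls -di sub   | awk '{print $1}')
ino_dot=$(ls -di sub/. | awk '{print $1}')

echo "sub   -> $ino_name"
echo "sub/. -> $ino_dot"

cd "$orig"
rm -rf "$demo"
```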
Shared Files
- File system containing a shared file
Presenter Notes
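When several users work together on a project, they often need to share files. It is therefore convenient for a shared file to appear simultaneously in different directories belonging to different users.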
Summary
- Hierarchical directory tree
- Directory records the inode number and filename
- To share a file, give one inode multiple filenames
Working with directories
Objectives
- Browse in directories
- Where am I?
- Change current directory
- Creating/removing a directory
- List the contents of directories
Directory tree
- Unix directories are contained in one virtual, unified file system.
Presenter Notes
Most UNIX operating systems use the concept of a virtual file system to try to integrate multiple file systems into one orderly structure.
Path names
- Full path names: Start from / (the root directory)
- Relative path names: Start from the present working directory
- Examples (working directory is /home/tux1):
- /home/tux1/doc/mon_report
- doc/mon_report
- ../tux3/pgms/suba
- ./test
- ~/test
Presenter Notes
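There are two common methods. In the first, every file is given an absolute path name, consisting of the path from the root directory to the file. The other is the relative path name, which is commonly used together with the working directory.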
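The sketch below reaches one file through a full path, a relative path, and a path using . and .. components. It assumes a POSIX shell with mktemp, and it rebuilds the slide's home/tux1/doc layout under a scratch directory as a stand-in for the real /home/tux1.

```shell
#!/bin/sh
set -eu
orig=$PWD
demo=$(mktemp -d)                          # scratch root (assumption: mktemp is installed)
mkdir -p "$demo/home/tux1/doc"             # stand-in for /home/tux1/doc
echo report > "$demo/home/tux1/doc/mon_report"

cd "$demo/home/tux1"                       # make tux1 the working directory

by_full=$(cat "$demo/home/tux1/doc/mon_report")   # full path, starts from the top
by_relative=$(cat doc/mon_report)                 # relative to the working directory
by_dots=$(cat ./doc/../doc/mon_report)            # . is here, .. is one level up

echo "$by_full / $by_relative / $by_dots"

cd "$orig"
rm -rf "$demo"
```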
Where am I?
The pwd command (Print Working Directory) can be used to find out what your current working directory is.
$ pwd
/home/tux1
Presenter Notes
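pwd stands for "print working directory"; it displays the directory you are currently in. The prompt usually shows the current directory, but only its last component rather than the full path. For example, whether the current directory is /usr/local/etc or /etc, the prompt shows etc. In that case pwd tells you exactly which path you are in.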
Change current directory
With the cd (Change Directory) command:
$ cd dir_name
Examples:
$ cd doc (relative)
$ cd /home/tux1/doc (full)
$ cd ~tux1/doc (home)
$ cd (Go to your home directory)
$ cd .. (Go one directory up)
$ cd - (Go to previous directory)
Presenter Notes
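The cd command is one of the most basic Linux commands, and many other operations build on it. To learn the basic Linux commands, first master cd.
1. Syntax: cd [directory]
2. Function: change the current directory to the named directory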
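The cd forms above can be combined with pwd to confirm where each one lands. This is a sketch assuming a POSIX shell with mktemp; the home/tux1/doc tree is recreated under a scratch directory rather than in the real /home.

```shell
#!/bin/sh
set -eu
start=$PWD
demo=$(mktemp -d)                  # scratch root (assumption: mktemp is installed)
mkdir -p "$demo/home/tux1/doc"     # stand-in for /home/tux1/doc

cd "$demo/home/tux1/doc"           # full path name
cd ..                              # one directory up
up=$(pwd)
cd - >/dev/null                    # previous directory (cd - also prints its name)
back=$(pwd)

echo "after cd ..: $up"
echo "after cd - : $back"

cd "$start"
rm -rf "$demo"
```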
Working with directories
Create directories.
With the mkdir (Make Directory) command:
$ mkdir dir_name
Removing directories. (Must be empty directories)
With the rmdir (Remove Directory) command:
$ rmdir dir_name
Working with multiple directories.
Create and remove multiple directories simultaneously with the -p flag.
$ mkdir -p dir1/dir2/dir3
$ rmdir -p dir1/dir2/dir3
Presenter Notes
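The mkdir command creates a directory with the given name. The user must have write permission in the current directory, and the name must not already exist there. rmdir is the counterpart of mkdir: mkdir creates directories and rmdir removes them. The rm command can remove both files and directories.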
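The effect of the -p flag can be observed by checking for the directories before and after. A minimal sketch, assuming a POSIX shell with mktemp, run inside a scratch directory:

```shell
#!/bin/sh
set -eu
orig=$PWD
demo=$(mktemp -d)                  # scratch directory (assumption: mktemp is installed)
cd "$demo"

mkdir -p dir1/dir2/dir3            # creates dir1, dir1/dir2 and dir1/dir2/dir3
if [ -d dir1/dir2/dir3 ]; then made=yes; else made=no; fi

rmdir -p dir1/dir2/dir3            # removes dir3, then dir2, then dir1 (each now empty)
if [ -e dir1 ]; then left=yes; else left=no; fi

echo "created chain: $made"
echo "dir1 left behind: $left"

cd "$orig"
rm -rf "$demo"
```

Without -p, mkdir dir1/dir2/dir3 fails when dir1 does not exist yet.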
List the contents of directories
With the ls command:
$ ls [ dir/file ]
Important options:
-l long listing (more information)
-a lists all files (including hidden)
-t lists files sorted by modification time (newest first)
-R lists contents recursively
-i show inode number
eg.
$ ls -l
-rw-rw-r-- 1 tux1 penguins 512 Jan 1 11:10 docs
Presenter Notes
The ls command lists files and directories. By default it lists the contents of the current directory. With options, ls can do much more.
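The options above can be tried on a few throwaway files. This sketch assumes a POSIX shell with mktemp; the file names and the timestamps set with touch -t are invented for the demonstration.

```shell
#!/bin/sh
set -eu
orig=$PWD
demo=$(mktemp -d)                  # scratch directory (assumption: mktemp is installed)
cd "$demo"

touch -t 202001010000 older        # hypothetical file, modified 2020-01-01 00:00
touch -t 202101010000 newer        # hypothetical file, modified 2021-01-01 00:00
touch .hidden
mkdir sub
touch sub/inner

ls -l                              # long listing
ls -a                              # includes .hidden, . and ..
ls -R                              # descends into sub
ls -i                              # shows inode numbers

# -t sorts by modification time, newest first
newest=$(ls -t older newer | head -n 1)
echo "newest of the two: $newest"

cd "$orig"
rm -rf "$demo"
```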
Summary
- Browse in directories
- Where am I? (pwd)
- Change current directory (cd)
- Creating/removing a directory (mkdir/rmdir)
- List the contents of directories (ls)
Working with files
Objectives
- Create a file
- Copying a file
- Moving a file
- Linking a file
- Removing a file
- List contents of a file
Create a file
The touch command creates an empty file or updates the modification time of an existing file.
$ touch newfile
$ touch oldfile
Presenter Notes
The touch command is not used often; you may meet it when working with make. It changes a file's timestamps or creates a file that does not yet exist.
Copying files
The cp command copies files.
$ cp source[s] [target]
Important usage:
1. If the target is a directory:
multiple source files are copied into it under their original names.
eg. cp file1 file2 /tmp
2. If the target is not a directory or does not exist:
only one source file is allowed, and it is copied to the target name.
eg. cp file1 newfile
- cp can recursively copy directories with the -R flag.
- To avoid overwriting existing files by accident, use the -i flag (cp asks before overwriting).
Presenter Notes
The cp command copies the given files or directories to another file or directory. It plays the same role as the copy command in MS-DOS and is very powerful.
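The two usage patterns and the -R flag can be exercised as follows. A minimal sketch, assuming a POSIX shell with mktemp; file1, file2, dest, and tree are made-up names inside a scratch directory.

```shell
#!/bin/sh
set -eu
orig=$PWD
demo=$(mktemp -d)                  # scratch directory (assumption: mktemp is installed)
cd "$demo"

echo one > file1
echo two > file2
mkdir dest

cp file1 file2 dest                # case 1: target is a directory
cp file1 newfile                   # case 2: target does not exist yet

mkdir -p tree/inner
echo deep > tree/inner/leaf
cp -R tree tree_copy               # recursive copy of a directory

in_dest=$(cat dest/file2)
as_new=$(cat newfile)
in_copy=$(cat tree_copy/inner/leaf)
if [ -f file1 ]; then source_kept=yes; else source_kept=no; fi

echo "$in_dest $as_new $in_copy (source kept: $source_kept)"

cd "$orig"
rm -rf "$demo"
```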
Moving or renaming files
With the mv command:
$ mv source[s] [target]
Important usage:
1. If the target is a directory:
multiple source files are moved into it under their original names.
eg. mv file1 file2 /tmp
2. If the target is not a directory or does not exist:
only one source file is allowed, and it is renamed to the target name.
eg. mv file1 newfile
- mv moves a whole directory with its contents; no -R flag is needed.
- To avoid overwriting existing files by accident, use the -i flag (mv asks before overwriting).
Presenter Notes
The mv command renames a file or directory, or moves a file from one directory into another.
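Unlike cp, mv leaves nothing behind at the source, and it handles a directory without any recursive flag. A minimal sketch, assuming a POSIX shell with mktemp; all names are invented and live in a scratch directory.

```shell
#!/bin/sh
set -eu
orig=$PWD
demo=$(mktemp -d)                  # scratch directory (assumption: mktemp is installed)
cd "$demo"

echo one > file1
echo two > file2
mkdir dest
mv file1 file2 dest                # several sources into a directory

mkdir -p tree/inner
echo deep > tree/inner/leaf
mv tree moved_tree                 # rename a directory, contents included

moved=$(cat dest/file1)
inside=$(cat moved_tree/inner/leaf)
if [ -e file1 ] || [ -e tree ]; then source_left=yes; else source_left=no; fi

echo "$moved $inside (source left: $source_left)"

cd "$orig"
rm -rf "$demo"
```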
Linking Files
Hard Link
$ ln source_file target_file
- Allows files to have more than one name in the directory structure
- Both files reference the same i-node
- Cannot be used with directories, cannot span file systems
Symbolic (Soft) Link
$ ln -s source_file target_file
- Creates an indirect reference to a file (symbolic link)
- Name references the original file’s name and path
- Can be used with directories and span file systems
Presenter Notes
ln creates a link to a file at another location; the link stays in sync with the original.
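The difference between the two link types shows up once the original name is removed: the hard link still reaches the data, while the symbolic link points at a name that no longer exists. A sketch assuming a POSIX shell with mktemp and awk; original, hard, and soft are made-up names.

```shell
#!/bin/sh
set -eu
orig=$PWD
demo=$(mktemp -d)                  # scratch directory (assumption: mktemp is installed)
cd "$demo"

echo data > original
ln original hard                   # second name for the same i-node
ln -s original soft                # separate file that stores the path "original"

ino_original=$(ls -i original | awk '{print $1}')
ino_hard=$(ls -i hard | awk '{print $1}')

rm original                        # removes one name only

hard_content=$(cat hard)           # data is still reachable through the other name
# [ -e ] follows the symbolic link, so it fails when the target is gone
if [ -e soft ]; then soft_state=ok; else soft_state=dangling; fi

echo "same inode: $ino_original / $ino_hard"
echo "hard link reads: $hard_content"
echo "symbolic link is: $soft_state"

cd "$orig"
rm -rf "$demo"
```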
Removing files
You can remove files with the rm command.
$ rm file[s]
- Use -f to force removal.
- Use the -r or -R flag for directories, meaning recursively.
- Recursively removes a directory even if it is NOT empty.
- VERY DANGEROUS to use -rf.
- Think: what would this do? (note the space after /)
rm -rf / tmp
- If unsure, use -i.
- rm is really unlink: the file's data is not erased until its last link is removed.
Presenter Notes
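rm is a frequently used command. Its main function is to remove one or more files or directories in a directory; it can also remove all files and subdirectories under a directory. For a link, rm removes only the link; the original file is kept.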
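The danger of a stray space can be shown without running rm on anything real. In the sketch below, count_args is a hypothetical helper that only reports how many arguments the shell would hand to a command; the second half uses rm -r on a scratch directory created with mktemp (assumed to be installed).

```shell
#!/bin/sh
set -eu

# Hypothetical helper: prints how many arguments it received
count_args() { echo "$#"; }

typo=$(count_args -rf / tmp)       # the shell passes -rf, / and tmp
intended=$(count_args -rf /tmp)    # the shell passes -rf and /tmp
echo "with the space: $typo arguments, without: $intended"

orig=$PWD
demo=$(mktemp -d)                  # scratch directory
cd "$demo"
mkdir -p full/inner
touch full/inner/f

# rmdir refuses a non-empty directory; rm -r removes it with its contents
if rmdir full 2>/dev/null; then rmdir_result=removed; else rmdir_result=refused; fi
rm -r full
if [ -e full ]; then after_rm=present; else after_rm=gone; fi
echo "rmdir: $rmdir_result, after rm -r: $after_rm"

cd "$orig"
rm -rf "$demo"
```

With the space, / becomes an argument of its own, so rm would be asked to remove the root directory as well as a file named tmp.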
Listing file contents
- With the cat (Concatenate) command.
- $ cat file[s]
- To show one page at a time, use more or less.
- od: octal display. Use -x for hex display.
- strings: show the printable strings in a binary file.
Presenter Notes
cat is a text output command, usually used to view the contents of a document.
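The name "concatenate" is easiest to see with two files, and od shows the bytes behind the text. A minimal sketch, assuming a POSIX shell with mktemp; a.txt and b.txt are invented, and the od options used here (-An to drop offsets, -tx1 for one hex byte per column) are a variation on the slide's -x.

```shell
#!/bin/sh
set -eu
orig=$PWD
demo=$(mktemp -d)                  # scratch directory (assumption: mktemp is installed)
cd "$demo"

printf 'one\n' > a.txt
printf 'two\n' > b.txt

joined=$(cat a.txt b.txt)          # contents of both files, in argument order
echo "$joined"

# Hex bytes of "AB" plus newline, with od's spacing stripped
hex=$(printf 'AB\n' | od -An -tx1 | tr -d ' \n')
echo "bytes: $hex"

cd "$orig"
rm -rf "$demo"
```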
Summary
- Create a file (touch)
- Copying a file (cp)
- Moving a file (mv)
- Linking a file (ln)
- Removing a file (rm)
- List contents of a file (cat, more)
Unit summary
In this unit:
- What's a file.
- File types.
- Inode and data.
- Shared files: links.
- Commands for managing files and directories.
Presenter Notes
Summary:
- What a file is
- File types
- Inodes and data
- Shared files and links
- Some common commands
References
- Chapter 4: Filesystems, Modern Operating Systems, Fourth Edition, Andrew S. Tanenbaum
- Unit 4: Working with Files and Directories, Linux Basics and Installation, ERC 7.2, IBM