Unit 2.4 Storage

Presenter Notes

Slides for this section

Hard disks

Presenter Notes

Objectives

  • Components of storage
  • The hard disk types
  • The disk inside
  • Disk Arm Scheduling Algorithms

Presenter Notes

Components of storage

  • Files
  • Directories
  • File systems
  • Physical storage & Logical storage
  • Logical Volume Manager (LVM)

Presenter Notes

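In Linux, every hardware device is represented and accessed as a file. Device files come in two kinds: character device files and block device files.

  • Character device files are accessed as a stream of characters, for example printers and terminals (TTY).
  • Block device files are accessed in fixed-size blocks of data; the most common example is a disk.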
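The function below is a small Python sketch that tells the two kinds apart. It reads the file-type bits returned by `os.stat()` and tests them with the standard-library helpers `stat.S_ISBLK` and `stat.S_ISCHR`. The name `device_kind` is hypothetical. The example paths assume Linux or another Unix-like system, where `/dev/null` is a character device.

```python
import os
import stat


def device_kind(path: str) -> str:
    """Classify a path as a block device, a character device, or neither."""
    mode = os.stat(path).st_mode  # follows symlinks
    if stat.S_ISBLK(mode):
        return "block"        # e.g. a disk such as /dev/sda
    if stat.S_ISCHR(mode):
        return "character"    # e.g. /dev/null, /dev/tty
    return "not a device"     # regular files, directories, ...


if __name__ == "__main__":
    for p in ("/dev/null", "/"):
        print(p, device_kind(p))
```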

Hard disk types

  • SATA
    • SATA (Serial Advanced Technology Attachment)
    • 6 Gb/s
  • SAS
    • SAS (Serial Attached SCSI)
    • SCSI (Small Computer System Interface)
    • 12 Gb/s
  • 5400, 7200, 10000, 15000 RPM
  • Cache: 8 ~ 256 MB
  • SSD (Solid State Disk)

Presenter Notes

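Common disk types include parallel IDE disks, serial SATA disks, and SCSI disks. Each type uses different device file names under Linux, and two naming schemes are common:

  • Device type + disk letter + partition number
    • For IDE disks: hd[a-z]x
    • For SCSI disks: sd[a-z]x (current kernels also name SATA disks sd[a-z]x)
  • Device type + disk index [0-n], partition y
    • For IDE disks: (hd[0-n],y)
    • For SCSI disks: (sd[0-n],y)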

Disk inside

  • An illustration of cylinder skew.
  • CHS: Cylinder, Head, Sector
  • CHS tuples can be mapped onto LBA (Logical Block Addressing) addresses
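The CHS-to-LBA mapping can be sketched in Python with the conventional formula LBA = (C × heads_per_cylinder + H) × sectors_per_track + (S − 1). Cylinders and heads are counted from 0 and sectors from 1. The function names and the example geometry (16 heads, 63 sectors per track) are assumptions for illustration; they are not values taken from the slide.

```python
def chs_to_lba(c: int, h: int, s: int, heads_per_cylinder: int, sectors_per_track: int) -> int:
    """Map a (cylinder, head, sector) tuple to a logical block address."""
    if not 0 <= h < heads_per_cylinder:
        raise ValueError("head out of range")
    if not 1 <= s <= sectors_per_track:
        raise ValueError("sector out of range (sectors are numbered from 1)")
    # Skip whole cylinders, then whole tracks (heads), then sectors within the track.
    return (c * heads_per_cylinder + h) * sectors_per_track + (s - 1)


def lba_to_chs(lba: int, heads_per_cylinder: int, sectors_per_track: int) -> tuple[int, int, int]:
    """Inverse mapping: split an LBA back into (cylinder, head, sector)."""
    c, rest = divmod(lba, heads_per_cylinder * sectors_per_track)
    h, s0 = divmod(rest, sectors_per_track)
    return c, h, s0 + 1


if __name__ == "__main__":
    HEADS, SPT = 16, 63  # hypothetical geometry
    print(chs_to_lba(1, 0, 1, HEADS, SPT))  # first sector of cylinder 1
    print(lba_to_chs(12345, HEADS, SPT))
```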

Presenter Notes

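Physical structure of a hard disk: a hard disk consists of one or more circular platters. A drive with one platter is a single-platter drive; a drive with several platters is a multi-platter drive.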

Disk formatting

  • (a) No interleaving.
  • (b) Single interleaving.
  • (c) Double interleaving.
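The Python sketch below reproduces the three layouts. Each logical sector is placed interleave + 1 physical slots after the previous one, and slots that are already taken are skipped. The function name and the 8-sectors-per-track example are assumptions for illustration.

```python
def interleave_layout(num_sectors: int, interleave: int) -> list[int]:
    """Return the logical sector number stored in each physical slot of a track.

    interleave=0: no interleaving, 1: single interleaving, 2: double interleaving.
    """
    step = interleave + 1
    layout: list = [None] * num_sectors
    pos = 0
    for logical in range(num_sectors):
        while layout[pos] is not None:          # slot taken: use the next free one
            pos = (pos + 1) % num_sectors
        layout[pos] = logical
        pos = (pos + step) % num_sectors        # skip `interleave` slots
    return layout


if __name__ == "__main__":
    for k in range(3):
        print(k, interleave_layout(8, k))
```

Interleaving gave a slow controller time to copy one sector to memory before the next logical sector arrived under the head.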

Presenter Notes

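Inside the drive, heads read and write the platters. The heads are mounted on a mechanical arm that carries several of them. When a head stays in place (that is, the arm does not move), the circle it traces during one revolution of the platter is a track. A drive may contain several platters, and the tracks at the same radius on all of them together form a cylinder.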

Disk formatting

  • (a) Physical geometry of a disk with two zones.
  • (b) A possible virtual geometry for this disk.

Presenter Notes

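The same track on every platter forms a cylinder. In traditional CHS-based partitioning, the cylinder is also the smallest unit of a partition. Lines drawn outward from the center divide each track further into sectors. The sector is the smallest physical storage unit on a platter and is traditionally 512 bytes, although many newer drives use 4096-byte physical sectors. These are the basic components of a hard disk.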
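A cylinder spans every head, and in this simplified non-zoned model each track holds a fixed number of sectors. Capacity is therefore cylinders × heads × sectors per track × bytes per sector. The sketch assumes 512-byte sectors, and the function name is hypothetical. The geometry 1024 × 255 × 63 is the largest that the classic BIOS CHS interface can express, which is the source of the familiar 8.4 GB CHS limit.

```python
def chs_capacity_bytes(cylinders: int, heads: int, sectors_per_track: int,
                       bytes_per_sector: int = 512) -> int:
    """Capacity of a disk described purely by its (logical) CHS geometry."""
    return cylinders * heads * sectors_per_track * bytes_per_sector


if __name__ == "__main__":
    size = chs_capacity_bytes(1024, 255, 63)
    print(size, "bytes, about", round(size / 10**9, 1), "GB")
```

Real drives use zoned recording, with more sectors on outer tracks, so this formula describes the virtual geometry the drive reports rather than the physical one.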

Disk Arm Scheduling Algorithms

Read/write time factors:
- Seek time (the time to move the arm to the proper cylinder).
- Rotational delay (the time for the proper sector to rotate under the head).
- Actual data transfer time.
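A rough model adds the three components. It uses the seek time, an average rotational delay of half a revolution, and a transfer time proportional to the fraction of a track being read. The function name and sample figures are assumptions. Controller overhead and drive caching are ignored.

```python
def access_time_ms(seek_ms: float, rpm: float, bytes_to_read: int, bytes_per_track: int) -> float:
    """Estimated time (ms) to read `bytes_to_read` bytes: seek + rotational delay + transfer."""
    revolution_ms = 60_000 / rpm                                 # one full revolution
    rotational_delay = revolution_ms / 2                         # on average, half a revolution
    transfer = revolution_ms * bytes_to_read / bytes_per_track   # fraction of one track
    return seek_ms + rotational_delay + transfer


if __name__ == "__main__":
    # Hypothetical drive: 7200 RPM, 9 ms average seek, 1 MiB per track, 4 KiB read.
    print(round(access_time_ms(9, 7200, 4096, 1 << 20), 3))
```

For small reads, seek time and rotational delay dwarf the transfer time. That is why disk arm scheduling concentrates on reducing seek time.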

Algorithms:
- First Come-First Serve (FCFS)
- Shortest Seek Time First (SSTF)
- Elevator (SCAN)
- Circular SCAN (C-SCAN)
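To compare the algorithms, one illustrative request queue is assumed: the arm is at cylinder 11, and requests arrive for cylinders 1, 36, 16, 34, 9, 12. FCFS serves requests in arrival order; the sketch sums the cylinders crossed along the way. The function name is hypothetical.

```python
def fcfs(start: int, requests: list[int]) -> tuple[list[int], int]:
    """First Come-First Serve: serve in arrival order; return (order, total seek distance)."""
    order = list(requests)
    total, pos = 0, start
    for cyl in order:
        total += abs(cyl - pos)   # cylinders crossed to reach this request
        pos = cyl
    return order, total


if __name__ == "__main__":
    print(fcfs(11, [1, 36, 16, 34, 9, 12]))  # ([1, 36, 16, 34, 9, 12], 111)
```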

Presenter Notes

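Accessing a given location on the disk normally involves three time components:

  • Seek time: the time to position the arm over the cylinder
  • Rotational delay: the time for the sector to rotate under the head
  • Transfer time: the time to read or write the data

Seek time dominates, which is why disk arm scheduling algorithms exist. Pending disk I/O requests are queued by cylinder, and scheduling amounts to deciding which cylinder's request to serve next.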

Shortest Seek Time First (SSTF)
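SSTF always moves the arm to the pending request nearest its current position. The minimal Python sketch below uses an illustrative queue (arm at cylinder 11; requests 1, 36, 16, 34, 9, 12). Ties go to the request that arrived first, which is an arbitrary choice. The function name is hypothetical.

```python
def sstf(start: int, requests: list[int]) -> tuple[list[int], int]:
    """Shortest Seek Time First; return (service order, total seek distance)."""
    pending = list(requests)
    pos, total, order = start, 0, []
    while pending:
        nearest = min(pending, key=lambda c: abs(c - pos))  # closest cylinder to the arm
        pending.remove(nearest)
        total += abs(nearest - pos)
        order.append(nearest)
        pos = nearest
    return order, total


if __name__ == "__main__":
    print(sstf(11, [1, 36, 16, 34, 9, 12]))  # ([12, 9, 16, 1, 34, 36], 61)
```

Under heavy load, SSTF can starve requests far from the arm's current area, because nearer requests keep arriving.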

Presenter Notes

Elevator (SCAN)
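The elevator algorithm keeps moving in one direction while requests remain ahead of the arm, then reverses. The Python sketch below makes several assumptions:

  • The arm starts moving upward.
  • It reverses at the last pending request rather than travelling to the edge of the disk. The strict SCAN variant goes to the edge; the variant shown here is sometimes called LOOK.
  • A C-SCAN variant is included: after the highest request, it jumps back to the lowest pending request and sweeps upward again.
  • The return jump is counted in the C-SCAN total. Textbooks differ on whether to count it.

The queue (arm at cylinder 11) and the function names are illustrative.

```python
def seek_distance(start: int, order: list[int]) -> int:
    """Total cylinders crossed when serving `order` from `start`."""
    total, pos = 0, start
    for cyl in order:
        total += abs(cyl - pos)
        pos = cyl
    return total


def elevator(start: int, requests: list[int]) -> tuple[list[int], int]:
    """Elevator (SCAN/LOOK style), initially moving toward higher cylinders."""
    up = sorted(c for c in requests if c >= start)                   # served on the way up
    down = sorted((c for c in requests if c < start), reverse=True)  # served after reversing
    order = up + down
    return order, seek_distance(start, order)


def c_scan(start: int, requests: list[int]) -> tuple[list[int], int]:
    """Circular SCAN: sweep upward only, then jump back to the lowest request."""
    up = sorted(c for c in requests if c >= start)
    wrapped = sorted(c for c in requests if c < start)  # served on the next upward sweep
    order = up + wrapped
    return order, seek_distance(start, order)           # includes the return jump


if __name__ == "__main__":
    queue = [1, 36, 16, 34, 9, 12]
    print(elevator(11, queue))  # ([12, 16, 34, 36, 9, 1], 60)
    print(c_scan(11, queue))    # ([12, 16, 34, 36, 1, 9], 68)
```

C-SCAN covers more distance here, but it serves every cylinder in the same direction. This gives more uniform waiting times across the disk.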

Presenter Notes

Summary

  • Components of storage
    • Files and directories
    • Filesystems
    • Block devices, such as disks, partitions, and logical volumes.
  • The hard disk types
    • SATA vs SAS
  • The disk inside
    • CHS and LBA
  • Disk Arm Scheduling Algorithms
    • FCFS
    • SSTF
    • SCAN
    • CSCAN

Presenter Notes

Disk Partitions

Presenter Notes

Objectives

  • Disk partition table
  • Disk partition types
  • Name the disk partitions

Presenter Notes

Partition

  • Hard disks can be partitioned. MBR-based Partitioning :

    • First sector: Master Boot Record
    • Maximum of four primary partitions
    • One primary partition may be an extended partition
    • An extended partition can hold an unlimited number of logical partitions
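Python's `struct` module can read the MBR layout behind these rules. The sketch assumes the classic MBR format:

  • Four 16-byte partition entries start at byte offset 446.
  • The boot signature 0x55 0xAA sits at offsets 510-511.
  • Each entry holds a boot flag (byte 0), a partition type (byte 4), and a little-endian 32-bit starting LBA and sector count (bytes 8-15).

The `parse_mbr` name, the `MbrEntry` class, and the small type-name table are hypothetical helpers. The sample sector is synthetic.

```python
import struct
from dataclasses import dataclass

TABLE_OFFSET = 446       # partition table starts after the boot code
ENTRY_SIZE = 16          # four entries x 16 bytes = 64 bytes
TYPE_NAMES = {0x05: "extended", 0x0F: "extended (LBA)", 0x83: "Linux",
              0x8E: "Linux LVM", 0xEE: "GPT protective"}


@dataclass
class MbrEntry:
    number: int          # slot 1..4
    bootable: bool
    type_id: int
    start_lba: int
    sectors: int

    @property
    def type_name(self) -> str:
        return TYPE_NAMES.get(self.type_id, hex(self.type_id))


def parse_mbr(sector: bytes) -> list[MbrEntry]:
    """Return the non-empty primary/extended entries of an MBR sector."""
    if len(sector) < 512 or sector[510:512] != b"\x55\xaa":
        raise ValueError("not a valid MBR (missing 0x55AA signature)")
    entries = []
    for i in range(4):
        off = TABLE_OFFSET + i * ENTRY_SIZE
        status, type_id = sector[off], sector[off + 4]
        start_lba, sectors = struct.unpack_from("<II", sector, off + 8)
        if type_id == 0:                      # type 0 marks an unused slot
            continue
        entries.append(MbrEntry(i + 1, status == 0x80, type_id, start_lba, sectors))
    return entries


if __name__ == "__main__":
    mbr = bytearray(512)
    mbr[510:512] = b"\x55\xaa"
    # Slot 1: bootable Linux partition; slot 2: extended partition.
    struct.pack_into("<B3sB3sII", mbr, TABLE_OFFSET, 0x80, b"\0" * 3, 0x83, b"\0" * 3, 2048, 204800)
    struct.pack_into("<B3sB3sII", mbr, TABLE_OFFSET + ENTRY_SIZE,
                     0x00, b"\0" * 3, 0x05, b"\0" * 3, 206848, 409600)
    for e in parse_mbr(bytes(mbr)):
        print(e.number, e.type_name, e.start_lba, e.sectors, "boot" if e.bootable else "")
```

Only primary and extended partitions appear in these four slots. Logical partitions are described by a chain of Extended Boot Records stored inside the extended partition, which this sketch does not follow.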

Presenter Notes

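Partitioning a disk means specifying the starting and ending cylinder of each partition. In effect it tells the operating system, "this partition may access the blocks between cylinder A and cylinder B". The operating system then knows it can read, write, and search file data within those blocks.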
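Under CHS-style partitioning, a partition is just a cylinder range, so its position and size follow directly from the disk geometry. The sketch converts cylinders A..B (inclusive) into a starting LBA and a sector count, the two numbers an MBR entry stores. The function name and the example geometry are hypothetical. The sketch also ignores the real-world convention of leaving the first track, or the first 1 MiB, of the disk unused.

```python
def cylinder_range_to_lba(first_cyl: int, last_cyl: int,
                          heads: int, sectors_per_track: int) -> tuple[int, int]:
    """Return (start_lba, sector_count) for a partition spanning cylinders first_cyl..last_cyl."""
    if not 0 <= first_cyl <= last_cyl:
        raise ValueError("invalid cylinder range")
    sectors_per_cylinder = heads * sectors_per_track
    start_lba = first_cyl * sectors_per_cylinder                  # partition begins on a cylinder boundary
    count = (last_cyl - first_cyl + 1) * sectors_per_cylinder     # whole cylinders only
    return start_lba, count


if __name__ == "__main__":
    start, count = cylinder_range_to_lba(100, 199, 255, 63)
    print(start, count, count * 512 // 2**20, "MiB")
```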

Partition Table

  • MBR – Master Boot Record

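    • Works with legacy BIOS
    • 32-bit sector addresses, max partition size: 2 TiB (with 512-byte sectors)
  • GPT – GUID Partition Table

    • GUID: Globally Unique Identifier
    • Supports many partitions (128 entries by default)
    • 64-bit sector addresses, max partition size: 2^64 sectors (about 9.4 ZB with 512-byte sectors)
    • Protective MBR
    • Needs OS support and, for booting, mainboard support (EFI/UEFI)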
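Both size limits follow from the width of the sector address. A table that stores 32-bit LBAs can reach 2^32 sectors, and one that stores 64-bit LBAs can reach 2^64 sectors. The sketch assumes a fixed sector size (512 bytes by default), and the function name is hypothetical. The often-quoted 18 EB figure for GPT corresponds to 2^64 bytes rather than 2^64 sectors.

```python
def max_addressable_bytes(address_bits: int, sector_size: int = 512) -> int:
    """Largest disk/partition a table with `address_bits`-wide sector addresses can describe."""
    return (2 ** address_bits) * sector_size


if __name__ == "__main__":
    TiB = 2 ** 40
    print(max_addressable_bytes(32) / TiB, "TiB (MBR, 512-byte sectors)")
    print(max_addressable_bytes(32, 4096) / TiB, "TiB (MBR, 4096-byte sectors)")
    print(max_addressable_bytes(64) / 10**21, "ZB (GPT, 512-byte sectors)")
```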

Presenter Notes

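The MBR sits in the first sector of the disk (cylinder 0, head 0, sector 1). It is the first area the computer reads when it starts using the disk. The MBR is 512 bytes long, but only 64 bytes of it hold the partition table (four 16-byte entries), so it can record at most four partitions. That is why primary (P) and extended (E) partitions together are limited to four. If more than four partitions are needed, use the 3P+E scheme: three primary partitions plus one extended partition, which is then divided into as many logical partitions as needed.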
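The sketch below combines the 3P+E rule with Linux partition naming. On Linux, primary and extended partitions use numbers 1 to 4, and logical partitions are numbered from 5 upward (for example, /dev/sda5), regardless of how many primary slots are in use. The helper is hypothetical. It assumes the primary partitions occupy slots 1, 2, ... in order and that the extended partition takes the next slot.

```python
def mbr_partition_names(disk: str, primaries: int, logicals: int = 0) -> list[tuple[str, str]]:
    """List (device, kind) pairs for an MBR layout on /dev/<disk>."""
    if logicals == 0 and not 0 <= primaries <= 4:
        raise ValueError("MBR holds at most four primary partitions")
    if logicals > 0 and not 0 <= primaries <= 3:
        raise ValueError("with logical partitions, use at most 3 primaries + 1 extended")
    names = [(f"/dev/{disk}{n}", "primary") for n in range(1, primaries + 1)]
    if logicals:
        names.append((f"/dev/{disk}{primaries + 1}", "extended"))             # container only
        names += [(f"/dev/{disk}{5 + i}", "logical") for i in range(logicals)]  # always from 5
    return names


if __name__ == "__main__":
    for dev, kind in mbr_partition_names("sda", 3, 2):
        print(dev, kind)
```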

Summary

  • Disk partition table
    • MBR vs GPT
  • Name the disk partitions
    • Primary partitions
    • Extended partitions
    • Logical partitions
  • Use fdisk to display or modify the disk partition table

Presenter Notes

Logical Volume Manager

Presenter Notes

Objectives

  • Benefits of the LVM
  • Components of the LVM

Presenter Notes

Traditional disk storage

PROBLEMS:

  • Fixed partitions
  • Expanding size of the partition
  • Limitation on size of a file system and a file
  • Contiguous data requirement
  • Time and effort required in planning ahead

Presenter Notes

Benefits of the LVM

  • Logical volumes solve noncontiguous space problems
  • Logical volumes can span disks
  • Logical volume sizes can be dynamically increased
  • Logical volumes can be mirrored
  • Physical volumes are easily added to the system
  • Logical volumes can be relocated
  • Volume group and logical volume statistics can be collected

These tasks can be performed dynamically!

Logical volume management solves the disadvantages of traditional disk storage.

Presenter Notes

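LVM is a logical volume manager for the Linux kernel that manages disk drives and similar mass-storage devices. It abstracts storage space and builds virtual partitions on top of it. This makes partitions easier to grow and shrink. Partitions can also be added or removed without worrying about whether any single disk has enough contiguous free space.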

Logical volume management

  • One or more physical volumes (hard disks, partitions) are assigned to a volume group (VG)
  • All physical volumes (PV) are split into physical extents (PE) of identical size (default 4 MB)
  • Logical extents (LE) can be combined into logical volumes (LV). LE and PE share the same size. LEs are stored in PEs in a VG.
  • LVs can be used like any block device
    • An LV can span multiple disks
    • To increase the size of an LV, add PEs
    • To increase the size of a VG, add PVs
  • In some operating systems, such as AIX:
    • a physical extent (PE) is called a physical partition (PP)
    • a logical extent (LE) is called a logical partition (LP)
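The PV/VG/PE/LE relationships can be modelled in a few lines of Python. This is a toy model, not LVM2's implementation:

  • The extent size is fixed at 4 MiB.
  • The space a real PV reserves for its label and metadata is ignored.
  • Free extents are allocated linearly, PV by PV. LVM2 applies its own allocation policies instead.

All class and method names are hypothetical. The `vgextend`, `lvcreate`, and `lvextend` mentions in the comments only point at the corresponding real commands.

```python
PE_SIZE_MIB = 4  # default extent size from the slide; assumed fixed here


class PhysicalVolume:
    def __init__(self, name: str, size_mib: int):
        self.name = name
        # A real PV loses some space to its label/metadata; ignored in this toy model.
        self.free_pes = list(range(size_mib // PE_SIZE_MIB))


class VolumeGroup:
    def __init__(self, name: str):
        self.name = name
        self.pvs: list[PhysicalVolume] = []
        self.lvs: dict[str, list[tuple[str, int]]] = {}  # LV -> list indexed by LE -> (PV, PE)

    def add_pv(self, pv: PhysicalVolume) -> None:
        """Grow the VG by adding a PV (compare vgextend)."""
        self.pvs.append(pv)

    def free_extents(self) -> int:
        return sum(len(pv.free_pes) for pv in self.pvs)

    def _allocate(self, count: int) -> list[tuple[str, int]]:
        if count > self.free_extents():          # check first, so a failure allocates nothing
            raise RuntimeError("not enough free extents in the volume group")
        extents = []
        for pv in self.pvs:                      # fill PVs in order, so an LV may span disks
            while pv.free_pes and len(extents) < count:
                extents.append((pv.name, pv.free_pes.pop(0)))
        return extents

    def create_lv(self, lv: str, size_mib: int) -> None:
        """Create an LV whose LEs map one-to-one onto PEs (compare lvcreate)."""
        self.lvs[lv] = self._allocate(-(-size_mib // PE_SIZE_MIB))  # round up to whole extents

    def extend_lv(self, lv: str, extra_mib: int) -> None:
        """Grow an LV by mapping more PEs to new LEs (compare lvextend)."""
        self.lvs[lv] += self._allocate(-(-extra_mib // PE_SIZE_MIB))


if __name__ == "__main__":
    vg = VolumeGroup("vg0")
    vg.add_pv(PhysicalVolume("sdb1", 8))   # 2 PEs
    vg.add_pv(PhysicalVolume("sdc1", 12))  # 3 PEs
    vg.create_lv("data", 16)               # 4 LEs, spanning both PVs
    vg.extend_lv("data", 4)                # uses the last free PE
    for le, (pv, pe) in enumerate(vg.lvs["data"]):
        print(f"LE {le} -> {pv} PE {pe}")
```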

Presenter Notes

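  • Physical volume (PV): the medium on which a volume group is built. It can be a disk partition, a whole disk, or a loopback file. A PV contains a special header, and the rest of it is divided into physical extents.
  • Volume group (VG): collects a set of physical volumes into one administrative unit.
  • Logical volume (LV): a virtual partition made up of physical extents.
  • Physical extent (PE): the smallest unit of disk space that can be assigned to a logical volume (usually 4 MB).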

Logical volume management

Presenter Notes

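Each physical volume (PV) is divided into basic units called physical extents (PEs). Each PE has a unique number and is the smallest unit LVM can address. The PE size is configurable and defaults to 4 MB, so a PV consists of equally sized PEs.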

Logical volume management

Presenter Notes

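With LVM, a logical volume (LV) in use by a service can be migrated online (live) to another disk without restarting the service. LVM also supports snapshots, which preserve a backup of a file system while keeping service downtime to a minimum.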

Useful commands for LVM

  • Physical Volume

    • pvs, pvcreate, pvremove ...
  • Volume Group

    • vgs, vgcreate, vgextend, vgreduce ...
  • Logical Volume

    • lvs, lvcreate, lvextend, lvreduce ...

Presenter Notes

Summary

  • Logical Volume Manager
    • Volume Group
    • Physical Volume
    • Logical Volume
    • Physical Extent
    • Logical Extent

Presenter Notes

RAID

Presenter Notes

Objectives

  • Redundant Array of Independent Disks
  • RAID levels

Presenter Notes

Redundant Array of Independent Disks

  • Typical PC hard disks are:

    • Slower
    • Less reliable
    • Smaller
    • But less expensive
  • RAID uses multiple hard disks in an array to create a logical device that is:

    • Faster
    • More reliable
    • Or larger
    • And still relatively inexpensive

Presenter Notes

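A Redundant Array of Independent Disks is usually just called a disk array. The basic idea is to combine several relatively inexpensive disks into an array whose performance matches or exceeds that of a single expensive, high-capacity disk. Compared with a single disk, RAID offers one or more of these benefits: better data integrity, better fault tolerance, and higher throughput or capacity. To the computer, a disk array looks like a single disk or logical storage unit.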

RAID 0,1

Presenter Notes

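  • RAID 0 combines two or more disks into one large disk. Data is split into segments and spread (striped) across the disks. Because reads and writes can run in parallel, RAID 0 is the fastest of all the levels. It offers no redundancy, however: if any disk fails, the data on the whole array is lost.
  • RAID 1 mirrors data across two or more disks. On some multithreaded operating systems it reads well; in theory, read throughput scales with the number of disks, as with RAID 0. Write speed drops slightly. The array keeps running as long as one disk is healthy, which makes it the most reliable level. Usable capacity is that of a single disk.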
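The address mapping that a RAID 0 layer performs illustrates striping. The sketch assumes N equal disks and a stripe unit (chunk) of `chunk` blocks, with consecutive chunks going to consecutive disks and wrapping around. It also computes usable capacity for RAID 0 (the sum of all disks) and RAID 1 (one disk's worth). The function names are hypothetical.

```python
def raid0_locate(block: int, n_disks: int, chunk: int = 1) -> tuple[int, int]:
    """Map a logical block to (disk index, block offset on that disk) under RAID 0."""
    chunk_no, within = divmod(block, chunk)   # which chunk, and position inside it
    disk = chunk_no % n_disks                 # chunks rotate across the disks
    row = chunk_no // n_disks                 # full stripes that precede this chunk
    return disk, row * chunk + within


def usable_capacity(level: int, n_disks: int, disk_size: int) -> int:
    """Usable capacity of RAID 0 (striping) or RAID 1 (N-way mirror) with equal disks."""
    if level == 0:
        return n_disks * disk_size            # no redundancy: all space is usable
    if level == 1:
        return disk_size                      # every disk holds the same copy
    raise ValueError("only levels 0 and 1 are modelled here")


if __name__ == "__main__":
    print([raid0_locate(b, 2, chunk=2) for b in range(8)])
    print(usable_capacity(0, 4, 2000), usable_capacity(1, 2, 2000))
```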

RAID 10,01

Presenter Notes

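  • RAID 1+0 mirrors first and then stripes. The disks are grouped into mirrored pairs (RAID 1), and data is striped across those pairs as in RAID 0.
  • RAID 0+1 does the reverse, striping first and then mirroring. The disks are split into two groups, each working as a RAID 0 stripe set, and the two groups mirror each other as in RAID 1.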
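The two nestings behave differently when two disks fail. The sketch enumerates every two-disk failure in a four-disk array under these assumptions:

  • RAID 1+0 is built as mirror pairs {0,1} and {2,3}, striped together.
  • RAID 0+1 is built as stripe sets {0,1} and {2,3}, mirrored against each other.
  • A RAID 0 set with any failed member is treated as lost.

These layouts and the function names are assumptions for illustration.

```python
from itertools import combinations


def raid10_survives(failed: set[int], mirror_pairs: list[set[int]]) -> bool:
    """RAID 1+0 loses data only if every disk of some mirror pair has failed."""
    return all(not pair <= failed for pair in mirror_pairs)


def raid01_survives(failed: set[int], stripe_sets: list[set[int]]) -> bool:
    """RAID 0+1 survives while at least one stripe set has no failed disk."""
    return any(not (s & failed) for s in stripe_sets)


def count_survivable(survives, groups, n_disks: int = 4, failures: int = 2) -> int:
    """Count failure combinations of the given size that the layout survives."""
    return sum(survives(set(f), groups) for f in combinations(range(n_disks), failures))


if __name__ == "__main__":
    groups = [{0, 1}, {2, 3}]
    print("RAID 1+0:", count_survivable(raid10_survives, groups), "of 6 two-disk failures survived")
    print("RAID 0+1:", count_survivable(raid01_survives, groups), "of 6 two-disk failures survived")
```

Both layouts give half the raw capacity. In this model, RAID 1+0 tolerates more failure combinations, which is one reason it is usually preferred.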

RAID 2,3,4

  • RAID levels 0 through 5. Backup and parity drives are shown shaded.

Presenter Notes

RAID 5

Presenter Notes

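RAID 5 balances storage performance, data safety, and cost. It uses disk striping and requires at least three disks. Instead of keeping a backup copy of the data, RAID 5 stores the data together with corresponding parity information across all member disks. Each parity block is stored on a different disk from the data it protects. If one disk fails, its data can be reconstructed from the remaining data and the corresponding parity.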
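In RAID 5, parity is the bytewise XOR of the data blocks in a stripe. Any single missing block is therefore the XOR of all the others. The sketch shows:

  • computing parity;
  • rebuilding a lost block;
  • one possible rotating-parity placement;
  • usable capacity, which is (N − 1) disks' worth.

The rotation rule and function names are assumptions. Real implementations, such as Linux md, offer several layouts.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Bytewise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving: list[bytes]) -> bytes:
    """Recover the one missing block (data or parity) of a stripe from all the others."""
    return xor_blocks(surviving)


def parity_disk(stripe: int, n_disks: int) -> int:
    """Assumed rotation: stripe 0 keeps parity on the last disk, stripe 1 on the one before, ..."""
    return (n_disks - 1 - stripe) % n_disks


def raid5_capacity(n_disks: int, disk_size: int) -> int:
    if n_disks < 3:
        raise ValueError("RAID 5 needs at least three disks")
    return (n_disks - 1) * disk_size          # one disk's worth of space holds parity


if __name__ == "__main__":
    data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
    parity = xor_blocks(data)
    recovered = rebuild_missing([data[0], data[2], parity])  # pretend disk 1 failed
    print(parity.hex(), recovered == data[1])
    print([parity_disk(s, 4) for s in range(5)])
```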

Summary

  • Redundant Array of Independent Disks
  • RAID levels
    • RAID 0
    • RAID 1
    • RAID 10 / 01
    • RAID 5

Presenter Notes

References

  • Chapter 5: Input/Output, Modern Operating Systems, Fourth Edition, Andrew S. Tanenbaum
  • Unit 9: Disk Management, RAID, and LVM, Linux System Administration, ERC 7.2, IBM

Presenter Notes