Linux I/O (2): The Process of Reading a Disk File
What’s IO?
IO (Input/Output) is the system of communication used by information processing systems.
The structure of an information processing system is as follows:
In computing, IO is the communication between an information processing system, such as a computer, and the outside world, which may be a human or another information processing system. Keyboards and mice are input-only devices, while devices such as printers are output-only. A writable CD-ROM is both an input and an output device.
IO is a system of communication. The structure of the IO system in Linux is as follows:
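As the figure above shows, the device driver acts as the intermediary that carries information between the operating system and the device. Each kind of device has its own device driver.
IO subsystem: [downward] by installing device drivers, it scales horizontally across the kinds of devices that can be accessed; [upward] by wrapping them behind a unified access entry point, it reduces the complexity upper-layer applications face when accessing each device.
The IO subsystem is the connecting link in the Linux operating system: it serves applications above and drives device drivers below. Without it, a computer would be a stagnant pool.
Below, we use "reading a file from disk" as an example to show how the operating system and a device communicate through IO.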
How does a computer communicate with a disk?
The flow of a single disk file read in Linux is shown in the figure below:
The steps are as follows:
- a process issues a read call which executes a system call
- system call code checks for correctness
- if it needs to perform I/O, it will issue a device driver call
- device driver allocates a buffer for read and schedules I/O
- controller performs DMA data transfer
- block the current process and schedule a ready process
- device generates an interrupt on completion
- interrupt handler stores any data and notifies completion
- move data from kernel buffer to user buffer
- wake up the blocked process (make it ready)
- user process continues when it is scheduled to run
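Seen from user space, the steps above hide behind a single read call. The following is a minimal sketch, assuming a Linux system and Python's `os` module as a thin wrapper over the `open`, `read`, and `close` system calls. The helper name `read_file_blocking` and the temporary demo file are illustrative assumptions, not part of the original article.

```python
import os
import tempfile

def read_file_blocking(path, chunk_size=4096):
    """Read a whole file using plain blocking read() system calls."""
    fd = os.open(path, os.O_RDONLY)      # open(2): returns a file descriptor
    try:
        chunks = []
        while True:
            # read(2): traps into the kernel. If the data is not yet in a
            # kernel buffer, the process sleeps here until the device
            # completes the transfer and the kernel copies the data out.
            chunk = os.read(fd, chunk_size)
            if not chunk:                # empty bytes means end of file
                break
            chunks.append(chunk)
        return b"".join(chunks)
    finally:
        os.close(fd)

# Usage: write a small temporary file, then read it back.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello disk")
    demo_path = tmp.name

data = read_file_blocking(demo_path)
os.remove(demo_path)
```

A freshly written file is usually served from the kernel's cache rather than the disk, so this demo shows the call path, not the disk latency.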
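A single disk file read crosses three domains: user space, kernel space, and the disk itself. The main parties involved are the user process, the CPU, and the device controller. Following the figure, let's look at the performance optimizations Linux IO makes along the way: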
Optimization
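① Enable non-blocking mode: free the user process
Common devices fall roughly into two types: block devices and character devices. Block devices receive and send data a block of characters at a time; character devices receive and send one character at a time.
The disk driver is a block device, and the interface it offers to upper-layer applications (such as read and write) is blocking by default. After a user process calls read(), it stays blocked until the call returns. To improve efficiency, Linux provides the "O_NONBLOCK" switch to turn on non-blocking mode, so that the user process does not have to wait idly and is free to do other work. Note that O_NONBLOCK takes effect on pipes, sockets, and character devices; for regular files and block devices Linux ignores it, and read() can still block briefly while the disk is accessed.
In non-blocking mode, how is the user process notified that the data it asked for is ready? The traditional approach is polling: the CPU repeatedly checks file status to find the files that are ready. Its drawback is that polling takes up most of the CPU's time and lowers CPU utilization (less time is spent running processes). Linux therefore relies on the interrupt mechanism to free the CPU.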
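The difference between "would block" and "ready" can be sketched in a few lines. This is a minimal sketch, assuming a Linux system; it uses a pipe rather than a disk file because O_NONBLOCK has no effect on regular files. The helper name `try_read` is hypothetical, and `select` stands in for the kernel-side readiness notification.

```python
import os
import select

def try_read(fd, size=1024):
    """Return bytes if data is available, or None if the read would block."""
    try:
        return os.read(fd, size)
    except BlockingIOError:          # errno EAGAIN: nothing is ready yet
        return None

read_fd, write_fd = os.pipe()
os.set_blocking(read_fd, False)      # sets O_NONBLOCK on the descriptor

# The pipe is empty, so a non-blocking read returns immediately.
first_attempt = try_read(read_fd)

# Some producer (here, this same process) makes data available.
os.write(write_fd, b"ready")

# Ask the kernel which descriptors are readable instead of retrying in a loop.
readable, _, _ = select.select([read_fd], [], [], 1.0)
second_attempt = try_read(read_fd) if read_fd in readable else None

os.close(read_fd)
os.close(write_fd)
```

Here `first_attempt` is `None` because nothing had been written yet, and `second_attempt` holds the bytes once `select` reports the descriptor as readable.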
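② Interrupt mechanism: free the CPU
Before interrupts: after issuing a read command, the CPU keeps waiting until the data for the disk FD is ready, wasting high-performance CPU resources.
With interrupts: the CPU does not have to busy-wait. When the data for the disk FD is ready, the device sends an interrupt request to the CPU, so the high-performance CPU is free to do other work in the meantime.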
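The trade-off can be illustrated with a toy tick-based model. This is only a sketch under simplifying assumptions: time is counted in abstract "ticks", the device becomes ready after a fixed number of ticks, and handling the interrupt costs one tick. The function names are hypothetical and nothing here touches real hardware.

```python
def run_polling(ticks_until_ready):
    """CPU spends every tick checking the device status (busy-wait)."""
    status_checks = 0
    for _ in range(ticks_until_ready):
        status_checks += 1               # one tick spent asking "ready yet?"
    return {"status_checks": status_checks, "useful_work": 0}

def run_interrupt_driven(ticks_until_ready):
    """CPU runs other work and touches the device only when interrupted."""
    useful_work = 0
    interrupts_handled = 0
    for tick in range(1, ticks_until_ready + 1):
        if tick == ticks_until_ready:
            interrupts_handled += 1      # device signals completion
        else:
            useful_work += 1             # tick spent on another process
    return {"interrupts_handled": interrupts_handled, "useful_work": useful_work}

polling = run_polling(10)
interrupt_driven = run_interrupt_driven(10)
```

With a device that needs 10 ticks, the polling model does no other work at all, while the interrupt-driven model spends 9 of the 10 ticks on other processes.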
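③ DMA: speed up block data reads and writes
DMA (Direct Memory Access): DMA allows special-purpose hardware to read or write main memory without involving the CPU, which greatly reduces CPU consumption and also eliminates redundant data copies between device and kernel buffers.
The CPU delegates the work to the DMA controller. Only when the data transfer has finished does the DMA controller send an interrupt request to the CPU to report that the read is complete. This frees the CPU. The figure below shows how DMA works:
For a detailed introduction to DMA, see: Linux I/O (1): Nio Is Real ‘Zero-Copy’?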
④ IO buffering: reconcile speed mismatches between devices
IO buffering aims to accommodate the speed mismatch between producer and consumer (for device latencies, see the latency comparison in the Appendix), so that processes need not wait for I/O to complete before proceeding. Categories of IO buffering:
- single buffering: the operating system assigns a buffer in main memory for an I/O request.
- double buffering: use two system buffers instead of one. A process can transfer data to or from one buffer while the operating system empties or fills the other buffer. (Suited to reading video data.)
- circular buffering: more than two buffers are used, and each individual buffer is one unit in a circular buffer. (Aimed at bursty network requests.)
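To make the circular case concrete, here is a minimal single-threaded sketch of a fixed-size circular buffer. The class name `CircularBuffer` and its `put`/`get` methods are illustrative assumptions; a kernel implementation would also need locking and would block or drop data according to its own policy.

```python
class CircularBuffer:
    """Fixed number of slots reused in a ring: producer writes, consumer reads."""

    def __init__(self, slots):
        self._slots = [None] * slots
        self._head = 0    # index of the next slot to read
        self._tail = 0    # index of the next slot to write
        self._count = 0   # number of filled slots

    def put(self, item):
        """Store an item; return False if every slot is full."""
        if self._count == len(self._slots):
            return False
        self._slots[self._tail] = item
        self._tail = (self._tail + 1) % len(self._slots)  # wrap around
        self._count += 1
        return True

    def get(self):
        """Remove and return the oldest item, or None if the buffer is empty."""
        if self._count == 0:
            return None
        item = self._slots[self._head]
        self._slots[self._head] = None
        self._head = (self._head + 1) % len(self._slots)  # wrap around
        self._count -= 1
        return item

# Usage: a burst of four items arrives at a three-slot buffer.
ring = CircularBuffer(3)
accepted = [ring.put(block) for block in ("a", "b", "c", "d")]
oldest = ring.get()            # frees one slot
retry = ring.put("d")          # the rejected item now fits
drained = [ring.get(), ring.get(), ring.get(), ring.get()]
```

The producer can run ahead of the consumer by up to the number of slots, which is what absorbs a short burst. With one slot this reduces to single buffering, and with two it approximates double buffering.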
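Reading 1 MB from memory takes 0.25 ms, while reading 1 MB from disk takes 30 ms. Disk latency is therefore on the order of 100 times that of memory (30 / 0.25 = 120), so kernel processes spend most of their time waiting. For this reason a buffer is added in memory to cache data coming from the disk, so that it can be processed in one batch.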
Appendix
Latency comparison numbers:
0.5 ns - CPU L1 dCACHE reference
1 ns - speed-of-light (a photon) travel a 1 ft (30.5cm) distance
5 ns - CPU L1 iCACHE Branch mispredict
7 ns - CPU L2 CACHE reference
100 ns - MUTEX lock/unlock
100 ns - Main DDR MEMORY reference
10,000 ns - Compress 1K bytes with Zippy PROCESS
20,000 ns - Send 2K bytes over 1 Gbps NETWORK
250,000 ns - Read 1 MB sequentially from MEMORY
500,000 ns - Round trip within a same DataCenter
10,000,000 ns - DISK seek
10,000,000 ns - Read 1 MB sequentially from NETWORK
30,000,000 ns - Read 1 MB sequentially from DISK
150,000,000 ns - Send a NETWORK packet CA -> Netherlands