File I/O
3.1 Introduction
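Unbuffered I/O means that each call to read or write invokes a system call in the UNIX kernel. The unbuffered I/O functions are not part of ISO C, but they are part of POSIX.1 and the Single UNIX Specification.
Whenever resources are shared among multiple processes, atomic operations become important; for file I/O this shows up first in the arguments to open. The functions dup, fcntl, sync, fsync, and ioctl are described later in this chapter.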
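3.2 File Descriptors
File descriptors range from 0 through OPEN_MAX-1. Early UNIX implementations allowed at most 20 open files per process (OPEN_MAX == 20); many systems later raised the limit to 64. On my CentOS 7 machine, sysconf(_SC_OPEN_MAX) returns 4096 (the _SC_ prefix stands for "system configuration"), so a process can have up to 4096 files open.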
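As a minimal sketch of the query mentioned above, the helper below wraps sysconf(_SC_OPEN_MAX). The name open_max_limit is made up for illustration. A return value of -1 means either an error or an indeterminate limit, and the sketch leaves that distinction to the caller.

```c
#include <unistd.h>

/* Hypothetical helper: returns the per-process limit on open files,
 * or -1 if the limit is indeterminate or sysconf fails. */
long open_max_limit(void)
{
    return sysconf(_SC_OPEN_MAX);
}
```

The value reported depends on the system and on the current resource limits, so it should be queried at run time rather than hard-coded.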
3.3 open and openat Functions
    #include <fcntl.h>

    int open(const char *path, int oflag, ... /* mode_t mode */);
    int openat(int fd, const char *path, int oflag, ... /* mode_t mode */);
    // Both return: file descriptor if OK, -1 on error
O_RDONLY | Open for reading only. |
O_WRONLY | Open for writing only. |
O_RDWR | Open for reading and writing. |
O_EXEC | Open for execute only. |
O_SEARCH | Open for search only (applies to directories). |
*For compatibility with older programs, most implementations define O_RDONLY as 0, O_WRONLY as 1, and O_RDWR as 2.
O_APPEND | Append to the end of file on each write. |
O_CLOEXEC | Set the FD_CLOEXEC file descriptor flag. |
O_CREAT | Create the file if it doesn’t exist. This option requires a third argument to the open function (a fourth argument to the openat function): the mode, which specifies the access permission bits of the new file. (When we describe a file’s access permission bits in Section 4.5, we’ll see how to specify the mode and how it can be modified by the umask value of a process.) |
O_DIRECTORY | Generate an error if path doesn’t refer to a directory. |
O_EXCL | Generate an error if O_CREAT is also specified and the file already exists. This test for whether the file already exists and the creation of the file if it doesn’t exist is an atomic operation. We describe atomic operations in more detail in Section 3.11. |
O_NOCTTY | If path refers to a terminal device, do not allocate the device as the controlling terminal for this process. We talk about controlling terminals in Section 9.6. |
O_NOFOLLOW | Generate an error if path refers to a symbolic link. We discuss symbolic links in Section 4.17. |
O_NONBLOCK | If path refers to a FIFO, a block special file, or a character special file, this option sets the nonblocking mode for both the opening of the file and subsequent I/O. We describe this mode in Section 14.2. |
O_SYNC | Have each write wait for physical I/O to complete, including I/O necessary to update file attributes modified as a result of the write. We use this option in Section 3.14. |
O_TRUNC | If the file exists and if it is successfully opened for either write-only or read–write, truncate its length to 0. |
O_TTY_INIT | When opening a terminal device that is not already open, set the nonstandard termios parameters to values that result in behavior that conforms to the Single UNIX Specification. We discuss the termios structure when we discuss terminal I/O in Chapter 18. |
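The following two flags are also optional. They are part of the synchronized input and output option of the Single UNIX Specification (and of POSIX.1):
O_DSYNC | Have each write wait for physical I/O to complete, but don’t wait for file attributes to be updated if they don’t affect the ability to read the data just written. |
O_RSYNC | Have each read operation on the file descriptor wait until any pending writes for the same portion of the file are complete. |
*O_DSYNC and O_SYNC are similar but subtly different. With O_DSYNC, file attributes are updated synchronously only when the update is needed to reflect a change in the file’s data (for example, a new file size). With O_SYNC, data and attributes are always updated synchronously. For example, when we overwrite an existing part of a file opened with O_DSYNC, the file times are not updated synchronously. If the file was opened with O_SYNC, every write updates the file times before the write returns, regardless of whether we are overwriting existing bytes or appending.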
TOCTTOU (time-of-check-to-time-of-use) errors. To preserve the exact meaning, the original English text is quoted here:
The basic idea behind TOCTTOU errors is that a program is vulnerable if it makes two file-based function calls where the second call depends on the results of the first call. Because the two calls are not atomic, the file can change between the two calls, thereby invalidating the results of the first call, leading to a program error. TOCTTOU errors in the file system namespace generally deal with attempts to subvert file system permissions by tricking a privileged program into either reducing permissions on a privileged file or modifying a privileged file to open up a security hole. Wei and Pu[2005] discuss TOCTTOU weaknesses in the UNIX file system interface.
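In POSIX.1, the constant _POSIX_NO_TRUNC determines whether long filenames are truncated or cause an error. We can use fpathconf or pathconf to query a directory for this behavior and for the value of NAME_MAX.
When _POSIX_NO_TRUNC is in effect and a filename exceeds NAME_MAX, an error status is returned and errno is set to ENAMETOOLONG.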
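A small sketch of these queries, assuming the standard pathconf names _PC_NAME_MAX and _PC_NO_TRUNC; the helper names are hypothetical. Both helpers return -1 when the value is indeterminate or the call fails.

```c
#include <unistd.h>

/* Hypothetical helper: maximum filename length in dir,
 * or -1 if indeterminate or on error. */
long name_max_in(const char *dir)
{
    return pathconf(dir, _PC_NAME_MAX);
}

/* Hypothetical helper: a value other than -1 and 0 indicates that
 * over-long names are rejected instead of being truncated. */
long no_trunc_in(const char *dir)
{
    return pathconf(dir, _PC_NO_TRUNC);
}
```

The results are per directory because different file systems mounted on one machine can have different limits.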
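3.4 creat Function
    #include <fcntl.h>

    int creat(const char *path, mode_t mode);
    //Returns: file descriptor opened for write-only if OK, −1 on error

    // This function is equivalent to:
    open(path, O_WRONLY | O_CREAT | O_TRUNC, mode);
creat exists because early versions of open accepted only 0, 1, or 2 as the second argument (O_RDONLY, O_WRONLY, and O_RDWR), so open could not create a file that did not exist. Since creat opens the file for writing only, creating a file and then reading it back required a call to creat, then close, then open.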
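With the current open flags, the creat, close, open sequence collapses into one call. A minimal sketch, where the helper name create_rdwr is hypothetical:

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: create (or truncate) path and open it for both
 * reading and writing in a single call. Returns fd, or -1 on error. */
int create_rdwr(const char *path, mode_t mode)
{
    return open(path, O_RDWR | O_CREAT | O_TRUNC, mode);
}
```

The returned descriptor can be written, repositioned with lseek, and read back without reopening the file.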
3.5 close Function
    #include <unistd.h>

    int close(int fd);
    //Returns: 0 if OK, −1 on error
The close function closes an open file. Closing a file also releases any record locks that the process may have on the file. We’ll discuss this point further in Section 14.3.
3.6 lseek Function
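Every open file has an associated current file offset, normally a non-negative integer that measures the number of bytes from the beginning of the file. Read and write operations start at the current file offset and advance it by the number of bytes read or written. When a file is opened, the offset is initialized to 0 unless O_APPEND is specified.
    #include <unistd.h>

    off_t lseek(int fd, off_t offset, int whence);
    //Returns: new file offset if OK, −1 on error
The whence argument determines how offset is interpreted:
1. If whence is SEEK_SET, the file’s offset is set to offset bytes from the beginning of the file.
2. If whence is SEEK_CUR, the file’s offset is set to its current value plus offset. The offset can be positive or negative.
3. If whence is SEEK_END, the file’s offset is set to the size of the file (that is, the end of the file) plus offset. The offset can be positive or negative.
    off_t currpos;
    currpos = lseek(fd, 0, SEEK_CUR);
The call above returns the current offset. It can also be used to test whether a file is capable of seeking: if the file descriptor refers to a pipe, FIFO, or socket, lseek returns −1 and sets errno to ESPIPE.
The l in lseek stands for long integer. Before the off_t data type was introduced, the offset argument and the return value were long integers. lseek was introduced in Version 7, when long integers were added to C. (Version 6 provided the seek and tell functions.)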
    #include "apue.h"

    int main(void)
    {
        if (lseek(STDIN_FILENO, 0, SEEK_CUR) == -1)
            printf("cannot seek\n");
        else
            printf("seek OK\n");
        exit(0);
    }
For a regular file the current offset must be non-negative, but some devices allow negative offsets (the /dev/kmem device on FreeBSD for the Intel x86 processor supports negative offsets). Because a negative offset can be a legitimate result, we should test whether the return value of lseek is equal to −1 and not whether it is less than 0.
    #include "apue.h"
    #include <stdio.h>
    #include <fcntl.h>
    #include <sys/types.h>
    #include <sys/stat.h>

    int main(void)
    {
        int  fd;
        char buf1[] = "abcdefghij";
        char buf2[] = "ABCDEFGHIJ";

        // S_IRWXU: read, write, and execute permission for the owner
        if ((fd = open("file.hole", O_CREAT | O_RDWR, S_IRWXU)) < 0)
            err_sys("creat error");
        if (write(fd, buf1, 10) != 10)
            err_sys("buf1 write error");
        // offset is now 10
        if (lseek(fd, 16384, SEEK_SET) == -1)
            err_sys("lseek error");
        // offset is now 16384
        if (write(fd, buf2, 10) != 10)
            err_sys("buf2 write error");
        // offset is now 16394
        exit(0);
    }
Here file.nohole is a file of the same size created without a hole. The two files occupy different numbers of disk blocks: 8 for file.hole and 20 for file.nohole. File holes are covered in detail in Section 4.12.
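The block counts can also be compared from a program with fstat. The sketch below is a rough heuristic, not a definitive test: it assumes st_blocks is counted in 512-byte units, which holds on Linux but is implementation-defined in general, and the helper name looks_sparse is hypothetical.

```c
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: returns 1 if fewer bytes are allocated than the
 * logical file size (suggesting a hole), 0 if not, -1 on error.
 * Assumption: st_blocks is in 512-byte units. */
int looks_sparse(int fd)
{
    struct stat st;

    if (fstat(fd, &st) < 0)
        return -1;
    return (long long)st.st_blocks * 512 < (long long)st.st_size;
}
```

File systems that allocate blocks lazily or compress data can also report fewer allocated bytes than the file size, so a result of 1 is only a hint.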
The size of off_t: the number of bytes in off_t differs across operating systems. Most platforms today provide two sets of interfaces to manipulate file offsets: one set that uses 32-bit file offsets and another set that uses 64-bit file offsets.
Note that even though you might enable 64-bit file offsets, your ability to create a file larger than 2 GB (2^31−1 bytes) depends on the underlying file system type.
3.7 read Function
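    #include <unistd.h>

    ssize_t read(int fd, void *buf, size_t nbytes);
    //Returns: number of bytes read, 0 if end of file, −1 on error
There are several cases in which the number of bytes actually read is less than the amount requested:
1. When reading from a regular file, if the end of file is reached before the requested number of bytes has been read.
2. When reading from a terminal device. Normally, up to one line is read at a time (Chapter 18 shows how to change this default).
3. When reading from a network. Buffering within the network may cause less than the requested amount to be returned.
4. When reading from a pipe or FIFO. If the pipe contains fewer bytes than requested, read returns only what is available.
5. When reading from a record-oriented device. Some record-oriented devices, such as magnetic tape, can return up to a single record at a time.
6. When interrupted by a signal after a partial amount of data has already been read. We discuss this further in Section 10.5.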
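Because of these short reads, code that needs exactly n bytes usually loops. The following is a minimal sketch of that idiom; the name read_full is hypothetical, and retrying on EINTR is one possible policy, not the only one.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: keep calling read until n bytes have arrived,
 * end of file is reached, or an error occurs.
 * Returns the number of bytes read (possibly < n at EOF), or -1 on error. */
ssize_t read_full(int fd, void *buf, size_t n)
{
    size_t got = 0;

    while (got < n) {
        ssize_t r = read(fd, (char *)buf + got, n - got);
        if (r == 0)
            break;                  /* end of file */
        if (r < 0) {
            if (errno == EINTR)
                continue;           /* interrupted before any data: retry */
            return -1;
        }
        got += (size_t)r;
    }
    return (ssize_t)got;
}
```

A return value smaller than n then means end of file was reached, not merely that less data happened to be available.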
3.8 write Function
    #include <unistd.h>

    ssize_t write(int fd, const void *buf, size_t nbytes);
    //Returns: number of bytes written if OK, −1 on error
The return value is usually equal to the nbytes argument; otherwise, an error has occurred. A common cause for a write error is either filling up a disk or exceeding the file size limit for a given process (Section 7.11 and Exercise 10.11).
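For regular files the count normally matches nbytes, as stated above. For pipes, sockets, or calls interrupted by a signal, a short count is possible, so a defensive loop is a common idiom. A minimal sketch, with the hypothetical name write_full:

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: keep calling write until all n bytes are written.
 * Returns n on success, -1 on error (some bytes may already be written). */
ssize_t write_full(int fd, const void *buf, size_t n)
{
    size_t done = 0;

    while (done < n) {
        ssize_t w = write(fd, (const char *)buf + done, n - done);
        if (w < 0) {
            if (errno == EINTR)
                continue;           /* interrupted before any data: retry */
            return -1;
        }
        done += (size_t)w;
    }
    return (ssize_t)n;
}
```

On a full disk or an exceeded file size limit the loop stops with -1 and errno describes the cause.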
3.9 I/O Efficiency
    #include "apue.h"

    #define BUFFSIZE 4096

    int main(void)
    {
        int  n;
        char buf[BUFFSIZE];

        while ((n = read(STDIN_FILENO, buf, BUFFSIZE)) > 0)
            if (write(STDOUT_FILENO, buf, n) != n)
                err_sys("write error");
        if (n < 0)
            err_sys("read error");
        exit(0);
    }
Once BUFFSIZE reaches 32, increasing it further has almost no effect on the total clock time. This is because most file systems support some kind of read-ahead to improve performance. When sequential reads are detected, the system tries to read in more data than an application requests, assuming that the application will read it shortly. Once BUFFSIZE reaches 4096, increasing it further reduces the system CPU time very little, because the file system used for this test was Linux ext4 with a block size of 4096 bytes.
3.10 File Sharing
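In the figure above, fd flags are the file descriptor flags, which include the close-on-exec flag.
The file status flags are listed in the following table. They cover the access mode of the file, how I/O on it is performed, and properties of the file.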
3.11 Atomic Operations
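Consider a process that appends to a file by first seeking to the end and then writing. Because these are two separate calls, another process can append to the file between the lseek and the write, and the data written here then overwrites that other process’s data:
    if (lseek(fd, 0L, 2) < 0)          /* position to EOF (2 is SEEK_END) */
        err_sys("lseek error");
    if (write(fd, buf, 100) != 100)    /* and write */
        err_sys("write error");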
pread and pwrite Functions
The Single UNIX Specification includes two functions that allow applications to seek and perform I/O atomically: pread and pwrite.
    #include <unistd.h>

    ssize_t pread(int fd, void *buf, size_t nbytes, off_t offset);
    //Returns: number of bytes read, 0 if end of file, −1 on error

    ssize_t pwrite(int fd, const void *buf, size_t nbytes, off_t offset);
    //Returns: number of bytes written if OK, −1 on error
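A short sketch of how these calls are typically used: each one takes an explicit offset and leaves the descriptor's current file offset untouched. The helper names patch_at and peek_at are hypothetical.

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: overwrite len bytes at absolute offset off.
 * The current file offset of fd is not changed. Returns 0 or -1. */
int patch_at(int fd, off_t off, const void *data, size_t len)
{
    return pwrite(fd, data, len, off) == (ssize_t)len ? 0 : -1;
}

/* Hypothetical helper: read up to len bytes from absolute offset off.
 * The current file offset of fd is not changed. */
ssize_t peek_at(int fd, off_t off, void *buf, size_t len)
{
    return pread(fd, buf, len, off);
}
```

Because no separate lseek is involved, threads or processes sharing the same open file cannot disturb each other's position between the seek and the I/O.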
Creating a File
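Checking whether a file exists and then creating it is likewise not atomic. In the code below, if another process creates the file between the open and the creat and writes to it, that data is erased when this creat truncates the file:
    if ((fd = open(path, O_WRONLY)) < 0) {
        if (errno == ENOENT) {
            if ((fd = creat(path, mode)) < 0)
                err_sys("creat error");
        } else {
            err_sys("open error");
        }
    }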
3.12 dup and dup2 Functions
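    #include <unistd.h>

    int dup(int fd);
    int dup2(int fd, int fd2);
    //Both return: new file descriptor if OK, −1 on error
Both dup and dup2 return a new file descriptor. dup returns the lowest-numbered available descriptor. With dup2, we specify the new descriptor as fd2. If fd2 is already open, it is first closed. If fd equals fd2, dup2 returns fd2 without closing it. Otherwise, the FD_CLOEXEC file descriptor flag is cleared for fd2, so that fd2 is left open if the process calls exec.
Each file descriptor has its own set of file descriptor flags. The close-on-exec flag for the new descriptor is always cleared by the dup functions; the one exception is dup2 when fd2 equals fd, which leaves the descriptor unchanged.
Another way to duplicate a file descriptor is with the fcntl function.
dup(fd) is equivalent to fcntl(fd, F_DUPFD, 0).
dup2(fd, fd2) is equivalent to close(fd2) followed by fcntl(fd, F_DUPFD, fd2). The equivalence is not exact, however: dup2 is an atomic operation, whereas the alternate form involves two function calls, and the two forms can set different errno values.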
3.13 sync, fsync, and fdatasync Functions
UNIX kernels traditionally have a buffer cache or page cache through which most disk I/O passes.
When we write data to a file, the data is normally copied by the kernel into one of its buffers and queued for writing to disk at some later time. This is called delayed write.
The kernel eventually writes all the delayed-write blocks to disk, normally when it needs to reuse the buffer for some other disk block. To ensure consistency of the file system on disk with the contents of the buffer cache, the sync, fsync, and fdatasync functions are provided.
    #include <unistd.h>

    int fsync(int fd);
    int fdatasync(int fd);
    //Both return: 0 if OK, −1 on error

    void sync(void);
The sync function queues all the modified block buffers for writing and returns; it does not wait for the disk writes to take place.
The function sync is normally called periodically (usually every 30 seconds) from a system daemon, often called update.
The fsync function refers only to a single file, specified by the file descriptor fd, and waits for the disk writes to complete before returning. fsync is used when an application, such as a database, needs to be sure that the modified blocks have been written to the disk.
The fdatasync function is similar to fsync, but it affects only the data portions of a file. With fsync, the file’s attributes are also updated synchronously.
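As a minimal sketch of the database-style use described above, the helper below writes a buffer and then waits for it to reach the disk. The name write_durably is hypothetical, and a short write is simply treated as failure here.

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write n bytes, then block until the kernel has
 * flushed the file's data and attributes to disk.
 * Returns 0 on success, -1 on error. */
int write_durably(int fd, const void *buf, size_t n)
{
    if (write(fd, buf, n) != (ssize_t)n)
        return -1;
    return fsync(fd);   /* fdatasync(fd) would skip most attribute updates */
}
```

Calling fsync after every write is slow, so applications usually batch several writes before one fsync.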
3.14 fcntl Function
The fcntl function can change the properties of a file that is already open.
    #include <fcntl.h>

    int fcntl(int fd, int cmd, ... /* int arg */ );
    //Returns: depends on cmd if OK (see following), −1 on error
The fcntl function is used for five different purposes.
1. Duplicate an existing descriptor (cmd = F_DUPFD or F_DUPFD_CLOEXEC)
2. Get/set file descriptor flags (cmd = F_GETFD or F_SETFD)
3. Get/set file status flags (cmd = F_GETFL or F_SETFL)
4. Get/set asynchronous I/O ownership (cmd = F_GETOWN or F_SETOWN)
5. Get/set record locks (cmd = F_GETLK, F_SETLK, or F_SETLKW)
3.15 ioctl Function
    #include <unistd.h>     /* System V */
    #include <sys/ioctl.h>  /* BSD and Linux */

    int ioctl(int fd, int request, ...);
    //Returns: −1 on error, something else if OK
See man 2 ioctl for usage details.
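As one illustration, the FIONREAD request reports how many bytes are waiting to be read on a descriptor. FIONREAD is not part of POSIX; this sketch assumes a system such as Linux or a BSD where <sys/ioctl.h> provides it, and the helper name bytes_available is hypothetical.

```c
#include <sys/ioctl.h>

/* Hypothetical helper: number of bytes that can be read from fd right
 * now without blocking, or -1 on error.
 * Assumption: the system supports the FIONREAD request for this fd. */
int bytes_available(int fd)
{
    int n = 0;

    if (ioctl(fd, FIONREAD, &n) < 0)
        return -1;
    return n;
}
```

Which requests a descriptor accepts depends on the device or file type behind it, so an unsupported request fails with −1.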
3.16 /dev/fd
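Newer systems provide a directory named /dev/fd whose entries are files named 0, 1, 2, and so on. Opening the file /dev/fd/n is equivalent to duplicating descriptor n, assuming that descriptor n is open. On most systems, fd = open("/dev/fd/0", mode) is equivalent to fd = dup(0).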
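A minimal sketch of opening a descriptor through /dev/fd. The helper name reopen_via_dev_fd is hypothetical, and the sketch assumes the system provides /dev/fd. Whether the result shares its file offset with the original descriptor differs between systems, so the sketch makes no promise about that.

```c
#include <fcntl.h>
#include <stdio.h>

/* Hypothetical helper: obtain a new descriptor for the file that fd
 * refers to by opening /dev/fd/<fd>. Returns the new fd, or -1 on error. */
int reopen_via_dev_fd(int fd, int oflag)
{
    char path[32];

    snprintf(path, sizeof path, "/dev/fd/%d", fd);
    return open(path, oflag);
}
```

The main use of /dev/fd is from the shell, where it lets programs that take pathname arguments operate on standard input or standard output.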
