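Background
I recently read Linux System Programming and, together with Understanding the Linux Kernel, came to understand in depth the whole process by which Linux closes and deletes a file. This was also my first attempt at reading the source code to follow a process closely, and I learned a lot along the way. Next time I read the source it should be easier (I will probably try vim + ctags by then). The first step is always the hardest, O(∩_∩)O~.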
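What the books say, plus some of my own understanding
Closing a file
When a program has finished working with a file, it can call close() to remove the mapping from the file descriptor to the file.
After the call, the descriptor fd is no longer valid and the kernel is free to reuse it, handing it out again as the return value of a later open() or creat() call. close() returns 0 on success and -1 on error.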
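The reuse of a closed descriptor can be observed directly. The sketch below is my own illustration, not code from either book; the function name is made up, and /dev/null is used only because it always exists. It assumes a single-threaded program, so nothing else opens a descriptor in between.

```c
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 if the descriptor number freed by close() is handed out
 * again by the next open(), 0 if not, -1 on error. */
int fd_is_reused_after_close(void)
{
    int first = open("/dev/null", O_RDONLY);
    if (first < 0)
        return -1;

    if (close(first) != 0)          /* 0 on success, -1 on error */
        return -1;

    /* open() returns the lowest-numbered unused descriptor, so in a
     * single-threaded program that is the number we just released. */
    int second = open("/dev/null", O_RDONLY);
    if (second < 0)
        return -1;

    int reused = (second == first);
    close(second);
    return reused;
}
```

This is also why using a descriptor after close() is dangerous: the same number may already refer to a different file.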
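It is worth noting that closing a file does not mean its data has been written to disk. If an application needs the data to be on disk before it closes the file, it has to synchronize explicitly. (fsync() on the descriptor is the per-file way to do this; sync() is another, system-wide method.)
Closing a file is a simple operation, but it does have side effects. When the last file descriptor referring to a file is closed, the kernel data structure that represents the open file is freed. Freeing it releases its hold on the in-memory copy of the inode associated with the file. If nothing else in memory still refers to the inode, the inode may be removed from memory as well (for performance reasons it may also stay cached in the kernel, which is a matter of Linux memory management). If the file has been unlinked from disk but was still open at the time, it is only physically removed from disk once it has been closed and its inode has been removed from memory. So a call to close() can be what finally deletes an already unlinked file from disk.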
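The "unlinked but still open" case is easy to try from user space. The following is a minimal sketch under my own assumptions: the function name and the /tmp path template are hypothetical, and /tmp is assumed to be a local filesystem. It shows that after unlink() the name is gone and the link count is 0, yet the data can still be read through the open descriptor until close().

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 1 if a file stays readable through an open descriptor after
 * its only name has been unlinked, 0 if not, -1 on error. */
int unlinked_file_survives_until_close(void)
{
    char path[] = "/tmp/close_demo_XXXXXX";   /* hypothetical temp name */
    const char msg[] = "still here";
    char buf[sizeof msg] = {0};
    struct stat st;
    int result = -1;

    int fd = mkstemp(path);                   /* creates and opens the file */
    if (fd < 0)
        return -1;

    if (write(fd, msg, sizeof msg) != (ssize_t)sizeof msg) {
        unlink(path);
        close(fd);
        return -1;
    }

    /* Remove the only name. The inode stays alive because fd is open. */
    if (unlink(path) == 0 &&
        fstat(fd, &st) == 0 &&
        pread(fd, buf, sizeof buf, 0) == (ssize_t)sizeof buf) {
        int name_gone = (access(path, F_OK) != 0);
        int no_links  = (st.st_nlink == 0);
        int readable  = (memcmp(buf, msg, sizeof msg) == 0);
        result = name_gone && no_links && readable;
    }

    /* Closing the last descriptor is what allows the kernel to free
     * the inode and the file's blocks. */
    close(fd);
    return result;
}
```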
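Other related knowledge
- Before a file can be read or written it must be opened. The kernel maintains a list of open files for each process, indexed by non-negative integers called file descriptors.
- A file is accessed by its name, but the file itself is not directly associated with that name. What it is associated with is an inode (index node). Each inode has an inode number, an integer that the filesystem assigns to the file; it is unique within that filesystem but not necessarily across the whole system. The inode stores metadata such as the timestamps, type, length and the location of the file's data, but it does not contain the file name! The inode is both the physical object that represents a Unix file on disk and the entity that the Linux kernel represents with a data structure.
- Directories provide the names used to access files. A directory is a mapping from human-readable names to inode numbers, and a name paired with an inode is called a link. Conceptually a directory can be viewed as a regular file whose content happens to be the mapping from file names to inodes. The kernel uses this mapping directly to resolve a file name to an inode.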
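The points above about names, links and inodes can be checked with stat() and link(). This is a small sketch of my own; the function name and the /tmp file names are hypothetical, and it assumes hard links are permitted on the filesystem holding /tmp. Two names created for the same file report the same inode number, and the inode's link count is 2.

```c
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 1 if two names created with link() resolve to the same inode
 * and that inode reports a link count of 2, 0 if not, -1 on error. */
int two_names_share_one_inode(void)
{
    char first[] = "/tmp/inode_demo_XXXXXX";  /* hypothetical temp name */
    char second[sizeof first + 8];
    struct stat a, b;
    int result = -1;

    int fd = mkstemp(first);
    if (fd < 0)
        return -1;
    close(fd);

    snprintf(second, sizeof second, "%s.link", first);

    /* link() adds a second directory entry for the same inode. */
    if (link(first, second) == 0) {
        if (stat(first, &a) == 0 && stat(second, &b) == 0)
            result = (a.st_dev == b.st_dev &&
                      a.st_ino == b.st_ino &&
                      a.st_nlink == 2);
        unlink(second);
    }
    unlink(first);
    return result;
}
```

The comparison includes st_dev because an inode number is only meaningful within one filesystem.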
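A closer look at how a file is removed from the physical disk
Let's go through this process from the kernel's point of view. (The code below comes from the 4.9.1 kernel source, which I downloaded from http://www.kernel.org/.)
First, we know that every file has an inode and that we access the file through it. To delete the file, its inode has to be removed from the superblock's list of inodes.
A few lines of struct super_block are listed below. It is defined in include/linux/fs.h: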
struct super_block {
struct list_head s_inodes; /* all inodes */
spinlock_t s_inode_wblist_lock;
struct list_head s_inodes_wb; /* writeback inodes */
};
struct list_head is defined in include/linux/types.h:
struct list_head {
struct list_head *next, *prev;
};
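As you can see, s_inodes is a struct list_head, the head of a doubly linked list that holds all the inodes of the superblock. When an inode is dropped from memory, its node is removed from this list, and that is the part of deleting a file this article looks at (freeing the data on disk is left to the specific filesystem). The functions that remove a list node live in include/linux/list.h. The question to answer here is how list_head, which is nothing more than such a simple doubly linked list, can carry all the other information of an inode.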
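Before answering that, it may help to see how little a list_head does on its own. The sketch below is a user-space imitation written for this post, not the kernel's list.h: the function names mirror the kernel's, but the bodies are simplified (the kernel's list_del, for example, stores poison values rather than NULL). It links three nodes into a circular list and then unlinks the middle one.

```c
#include <stddef.h>

/* User-space copy of the kernel's struct list_head, for illustration. */
struct list_head {
    struct list_head *next, *prev;
};

/* An empty list is a head that points at itself in both directions. */
static void init_list_head(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Insert node just before head, i.e. at the tail of the list. */
static void list_add_tail(struct list_head *node, struct list_head *head)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

/* Unlink node by making its neighbours point at each other. */
static void list_del(struct list_head *node)
{
    node->prev->next = node->next;
    node->next->prev = node->prev;
    node->next = NULL;
    node->prev = NULL;
}

/* Returns 1 if deleting the middle of three nodes leaves the other
 * two correctly linked to each other and to the head. */
int delete_middle_node_demo(void)
{
    struct list_head head, a, b, c;

    init_list_head(&head);
    list_add_tail(&a, &head);
    list_add_tail(&b, &head);
    list_add_tail(&c, &head);

    list_del(&b);

    return head.next == &a && a.next == &c &&
           c.prev == &a && c.next == &head && head.prev == &c;
}
```

Note that the node carries no payload at all. The payload comes from the structure that the node is embedded in.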
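Introduction to the list_head doubly linked list
This part draws heavily on the two articles below. You may want to read both of them, or at least the cnblogs one, before continuing.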
- http://www.cnblogs.com/bastard/archive/2012/10/19/2731107.html
- http://www.ibm.com/developerworks/cn/linux/kernel/l-chain/
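Note that all the code here was looked up in the kernel source I downloaded and then copied into this blog post.
For the list_head embedded in a structure such as super_block, the list_entry macro is what turns a node of the doubly linked list back into the address of the actual inode. Words do not explain this well, so here is the code.
list_entry is defined in include/linux/list.h: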
#define list_entry(ptr, type, member) \
container_of(ptr, type, member)
container_of is defined in include/linux/kernel.h:
#define container_of(ptr, type, member) ({ \
const typeof( ((type *)0)->member ) *__mptr = (ptr); \
(type *)( (char *)__mptr - offsetof(type,member) );})
offsetof is defined in include/linux/stddef.h:
#define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER)
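Let's take super_block as the example. super_block holds the list of inodes, so let's see how to reach the next inode starting from the list head. Assume we already have the super_block and that super_block_addr is a pointer to it.
First I should say that when I got to this point, I realized I did not know which list_head inside struct inode the superblock's list actually links through. Let's work it out. As far as I can see, struct inode has only the following five list_head members: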
struct list_head i_io_list; /* backing dev IO list */
struct list_head i_lru; /* inode LRU list */
struct list_head i_sb_list;
struct list_head i_wb_list; /* backing dev writeback list */
struct list_head i_devices;
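Judging by the names, the superblock's list should link through i_sb_list (i_sb meaning "inode superblock"). I was not completely sure about this at first, but Understanding the Linux Kernel describes i_sb_list as the pointers for the superblock's list of inodes, and in fs/inode.c inode_sb_list_add() adds &inode->i_sb_list to &inode->i_sb->s_inodes, so this is the one. The following code then reaches the inode.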
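struct super_block *sb = super_block_addr;
/* s_inodes is an embedded struct list_head, not a pointer, hence '.next' */
struct inode *nextObj = list_entry(sb->s_inodes.next, struct inode, i_sb_list);
nextObj is now a pointer to the next inode, which is the first inode on the list because we started from the list head. (This assumes the list is not empty; on an empty list s_inodes.next points back at the head itself and the result is not a valid inode.)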
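The same idea can be tried outside the kernel. The following is a minimal user-space sketch under clearly stated assumptions: toy_super_block and toy_inode are made-up stand-ins that keep only the fields needed here, container_of is a simplified version without the kernel's typeof check, and list_add_tail is a simplified imitation of the kernel helper. It walks the whole list and recovers each containing structure from its embedded node.

```c
#include <stddef.h>

struct list_head {
    struct list_head *next, *prev;
};

/* Simplified stand-in for the kernel macro: step back from the member's
 * address by the member's offset to get the containing structure. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))
#define list_entry(ptr, type, member) container_of(ptr, type, member)

/* Toy stand-ins, NOT the kernel structures. */
struct toy_inode {
    unsigned long i_ino;
    struct list_head i_sb_list;     /* node linked into s_inodes */
};

struct toy_super_block {
    struct list_head s_inodes;      /* head of the inode list */
};

static void list_add_tail(struct list_head *node, struct list_head *head)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

/* Walk the list and add up the inode number of every inode on it. */
unsigned long sum_inode_numbers(struct toy_super_block *sb)
{
    unsigned long sum = 0;
    struct list_head *pos;

    for (pos = sb->s_inodes.next; pos != &sb->s_inodes; pos = pos->next) {
        struct toy_inode *inode = list_entry(pos, struct toy_inode, i_sb_list);
        sum += inode->i_ino;
    }
    return sum;
}

/* Builds a list of two toy inodes numbered 11 and 31 and sums them. */
unsigned long walk_demo(void)
{
    struct toy_super_block sb;
    struct toy_inode a = { .i_ino = 11 };
    struct toy_inode b = { .i_ino = 31 };

    sb.s_inodes.next = &sb.s_inodes;    /* empty list: head points at itself */
    sb.s_inodes.prev = &sb.s_inodes;
    list_add_tail(&a.i_sb_list, &sb.s_inodes);
    list_add_tail(&b.i_sb_list, &sb.s_inodes);

    return sum_inode_numbers(&sb);
}
```

The loop stops when it comes back around to the head, which is also why the head must never be passed to list_entry: it is embedded in the superblock, not in an inode.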
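The use of 0 in C
One more thing: when I first saw #define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER) I did not quite understand what the 0 was doing there. After checking the documentation carefully: the 0 here is a null pointer constant, and casting it to TYPE * gives a null pointer of that type. The macro never reads through this pointer. It only takes the address that MEMBER would have if the structure started at address 0, and that address is the member's offset. I also looked up the C11 standard: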
C11 6.3.2.3
A pointer to void may be converted to or from a pointer to any object type. A pointer to any object type may be converted to a pointer to void and back again; the result shall compare equal to the original pointer.
An integer constant expression with the value 0, or such an expression cast to type void *, is called a null pointer constant. If a null pointer constant is converted to a pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal to a pointer to any object or function.
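To convince myself that the null pointer form really yields the member offset, it can be compared with the standard offsetof from <stddef.h>. This is only a sketch: struct sample and the macro name null_offsetof are made up for the comparison, and the null pointer form is not strictly guaranteed by the C standard, although GCC and Clang evaluate it as expected.

```c
#include <stddef.h>

/* Hypothetical structure used only for this comparison. */
struct sample {
    char tag;
    long value;
    struct { void *next, *prev; } link;
};

/* The null pointer form quoted above, renamed so that it does not
 * clash with the standard offsetof from <stddef.h>. */
#define null_offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER)

/* Returns 1 if the null pointer form agrees with the standard macro
 * for every member of struct sample. */
int null_trick_matches_offsetof(void)
{
    return null_offsetof(struct sample, tag)   == offsetof(struct sample, tag)
        && null_offsetof(struct sample, value) == offsetof(struct sample, value)
        && null_offsetof(struct sample, link)  == offsetof(struct sample, link);
}
```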
Source of inode and super_block (kernel 4.9.1)
inode source
/*
* Keep mostly read-only and often accessed (especially for
* the RCU path lookup and 'stat' data) fields at the beginning
* of the 'struct inode'
*/
struct inode {
umode_t i_mode;
unsigned short i_opflags;
kuid_t i_uid;
kgid_t i_gid;
unsigned int i_flags;
#ifdef CONFIG_FS_POSIX_ACL
struct posix_acl *i_acl;
struct posix_acl *i_default_acl;
#endif
const struct inode_operations *i_op;
struct super_block *i_sb;
struct address_space *i_mapping;
#ifdef CONFIG_SECURITY
void *i_security;
#endif
/* Stat data, not accessed from path walking */
unsigned long i_ino;
/*
* Filesystems may only read i_nlink directly. They shall use the
* following functions for modification:
*
* (set|clear|inc|drop)_nlink
* inode_(inc|dec)_link_count
*/
union {
const unsigned int i_nlink;
unsigned int __i_nlink;
};
dev_t i_rdev;
loff_t i_size;
struct timespec i_atime;
struct timespec i_mtime;
struct timespec i_ctime;
spinlock_t i_lock; /* i_blocks, i_bytes, maybe i_size */
unsigned short i_bytes;
unsigned int i_blkbits;
blkcnt_t i_blocks;
#ifdef __NEED_I_SIZE_ORDERED
seqcount_t i_size_seqcount;
#endif
/* Misc */
unsigned long i_state;
struct rw_semaphore i_rwsem;
unsigned long dirtied_when; /* jiffies of first dirtying */
unsigned long dirtied_time_when;
struct hlist_node i_hash;
struct list_head i_io_list; /* backing dev IO list */
#ifdef CONFIG_CGROUP_WRITEBACK
struct bdi_writeback *i_wb; /* the associated cgroup wb */
/* foreign inode detection, see wbc_detach_inode() */
int i_wb_frn_winner;
u16 i_wb_frn_avg_time;
u16 i_wb_frn_history;
#endif
struct list_head i_lru; /* inode LRU list */
struct list_head i_sb_list;
struct list_head i_wb_list; /* backing dev writeback list */
union {
struct hlist_head i_dentry;
struct rcu_head i_rcu;
};
u64 i_version;
atomic_t i_count;
atomic_t i_dio_count;
atomic_t i_writecount;
#ifdef CONFIG_IMA
atomic_t i_readcount; /* struct files open RO */
#endif
const struct file_operations *i_fop; /* former ->i_op->default_file_ops */
struct file_lock_context *i_flctx;
struct address_space i_data;
struct list_head i_devices;
union {
struct pipe_inode_info *i_pipe;
struct block_device *i_bdev;
struct cdev *i_cdev;
char *i_link;
unsigned i_dir_seq;
};
__u32 i_generation;
#ifdef CONFIG_FSNOTIFY
__u32 i_fsnotify_mask; /* all events this inode cares about */
struct hlist_head i_fsnotify_marks;
#endif
#if IS_ENABLED(CONFIG_FS_ENCRYPTION)
struct fscrypt_info *i_crypt_info;
#endif
void *i_private; /* fs or device private pointer */
};
super_block source
struct super_block {
struct list_head s_list; /* Keep this first */
dev_t s_dev; /* search index; _not_ kdev_t */
unsigned char s_blocksize_bits;
unsigned long s_blocksize;
loff_t s_maxbytes; /* Max file size */
struct file_system_type *s_type;
const struct super_operations *s_op;
const struct dquot_operations *dq_op;
const struct quotactl_ops *s_qcop;
const struct export_operations *s_export_op;
unsigned long s_flags;
unsigned long s_iflags; /* internal SB_I_* flags */
unsigned long s_magic;
struct dentry *s_root;
struct rw_semaphore s_umount;
int s_count;
atomic_t s_active;
#ifdef CONFIG_SECURITY
void *s_security;
#endif
const struct xattr_handler **s_xattr;
const struct fscrypt_operations *s_cop;
struct hlist_bl_head s_anon; /* anonymous dentries for (nfs) exporting */
struct list_head s_mounts; /* list of mounts; _not_ for fs use */
struct block_device *s_bdev;
struct backing_dev_info *s_bdi;
struct mtd_info *s_mtd;
struct hlist_node s_instances;
unsigned int s_quota_types; /* Bitmask of supported quota types */
struct quota_info s_dquot; /* Diskquota specific options */
struct sb_writers s_writers;
char s_id[32]; /* Informational name */
u8 s_uuid[16]; /* UUID */
void *s_fs_info; /* Filesystem private info */
unsigned int s_max_links;
fmode_t s_mode;
/* Granularity of c/m/atime in ns.
Cannot be worse than a second */
u32 s_time_gran;
/*
* The next field is for VFS *only*. No filesystems have any business
* even looking at it. You had been warned.
*/
struct mutex s_vfs_rename_mutex; /* Kludge */
/*
* Filesystem subtype. If non-empty the filesystem type field
* in /proc/mounts will be "type.subtype"
*/
char *s_subtype;
/*
* Saved mount options for lazy filesystems using
* generic_show_options()
*/
char __rcu *s_options;
const struct dentry_operations *s_d_op; /* default d_op for dentries */
/*
* Saved pool identifier for cleancache (-1 means none)
*/
int cleancache_poolid;
struct shrinker s_shrink; /* per-sb shrinker handle */
/* Number of inodes with nlink == 0 but still referenced */
atomic_long_t s_remove_count;
/* Being remounted read-only */
int s_readonly_remount;
/* AIO completions deferred from interrupt context */
struct workqueue_struct *s_dio_done_wq;
struct hlist_head s_pins;
/*
* Owning user namespace and default context in which to
* interpret filesystem uids, gids, quotas, device nodes,
* xattrs and security labels.
*/
struct user_namespace *s_user_ns;
/*
* Keep the lru lists last in the structure so they always sit on their
* own individual cachelines.
*/
struct list_lru s_dentry_lru ____cacheline_aligned_in_smp;
struct list_lru s_inode_lru ____cacheline_aligned_in_smp;
struct rcu_head rcu;
struct work_struct destroy_work;
struct mutex s_sync_lock; /* sync serialisation lock */
/*
* Indicates how deep in a filesystem stack this SB is
*/
int s_stack_depth;
/* s_inode_list_lock protects s_inodes */
spinlock_t s_inode_list_lock ____cacheline_aligned_in_smp;
struct list_head s_inodes; /* all inodes */
spinlock_t s_inode_wblist_lock;
struct list_head s_inodes_wb; /* writeback inodes */
};