Background

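I recently read Linux System Programming alongside Understanding the Linux Kernel, and from the two books I came to understand the whole process of closing and deleting a file in Linux. It was also my first attempt at reading kernel source closely to follow such a process. I learned a lot along the way, and reading source should be easier next time (next time I will probably try vim + ctags). Every beginning is hard, O(∩_∩)O~.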

The books' material, plus some of my own understanding

Closing a file

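When a program has finished working with a file, it can call the close() system call to remove the mapping from the file descriptor to the file.
After the call, the file descriptor fd is no longer valid, and the kernel is free to reuse it as the return value of a later open() or creat() call. close() returns 0 on success and -1 on error.
It is worth noting that closing a file does not mean its data has been written to disk. If an application needs the data to be on disk before the file is closed, it must synchronize explicitly: fsync() or fdatasync() flushes a single file, while sync() flushes all buffered data on the system.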
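A minimal sketch of the write, sync, close sequence, assuming a POSIX system. The function name write_durably() is hypothetical. It calls fsync(), which flushes only this file instead of everything as sync() does, and it also checks the return value of close(), which can still report an I/O error.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: write len bytes to path and force them to stable
   storage before closing. Returns 0 on success, -1 on error (errno set). */
int write_durably(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        return -1;

    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n == -1) {
            if (errno == EINTR)
                continue;               /* interrupted before writing: retry */
            int saved = errno;
            close(fd);
            errno = saved;
            return -1;
        }
        p += n;                         /* handle short writes */
        len -= (size_t)n;
    }

    /* close() alone does not guarantee the data is on disk;
       fsync() flushes this file's data and metadata first. */
    if (fsync(fd) == -1) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }

    /* close() can still fail (e.g. EIO), so its result is returned. */
    return close(fd);
}
```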

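Although closing a file is a simple operation, it has consequences. When the last file descriptor referring to a file is closed, the kernel data structure that represents the open file is freed. Freeing that structure unpins the in-memory copy of the inode associated with the file. If nothing else references the inode, it can then be removed from memory as well (for performance reasons the kernel may keep it cached; that is a matter of Linux memory management). If the file was unlinked from the disk while it was still open, it is not actually removed from the physical disk until the file is closed and its inode is removed from memory. A call to close() can therefore be what finally causes an unlinked file to be deleted from disk.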
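This unlink-while-open behavior is easy to observe from user space. The sketch below (hypothetical function name and file name, POSIX calls only) creates a file, unlinks it while the descriptor is still open, and then checks two things through fstat() and read(): the link count is 0, and the data can still be read back. Only the final close() lets the kernel reclaim the inode. Results may differ on some network filesystems.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical demo: returns 1 if data is still readable after unlink()
   and the link count is 0, 0 if not, -1 on a system-call error. */
int unlinked_file_still_readable(const char *path)
{
    const char msg[] = "still here";
    char buf[sizeof msg];

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd == -1)
        return -1;

    if (write(fd, msg, sizeof msg) != (ssize_t)sizeof msg) {
        close(fd);
        unlink(path);
        return -1;
    }

    /* Remove the only name; the inode survives because fd refers to it. */
    if (unlink(path) == -1) {
        close(fd);
        return -1;
    }

    struct stat st;
    int ok = fstat(fd, &st) == 0 && st.st_nlink == 0;

    /* Read the data back through the still-open descriptor. */
    ok = ok && lseek(fd, 0, SEEK_SET) == 0
            && read(fd, buf, sizeof buf) == (ssize_t)sizeof buf
            && memcmp(buf, msg, sizeof msg) == 0;

    /* Dropping the last reference lets the kernel free inode and blocks. */
    close(fd);
    return ok;
}
```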

Related concepts

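  • Before a file can be read or written, it must first be opened. The kernel maintains a list of open files for each process; this list is indexed by non-negative integers called file descriptors.
  • Although files are accessed by name, a file is not directly tied to its name. What a file is tied to is its inode (index node), identified by an integer, the inode number, that the filesystem assigns to it. The inode number is unique within a filesystem, but not necessarily across the whole system. The inode stores metadata such as timestamps, file type, length, and the location of the file's data, but it does not contain the file name! The inode is the actual physical object of a Unix file on disk, and it is also the entity the Linux kernel represents with a data structure.
  • Directories provide the names needed to access files. A directory maps human-readable names to inode numbers; a pairing of a name and an inode is called a link. Conceptually, a directory can be viewed as an ordinary file whose only difference is that it contains mappings from file names to inodes. The kernel uses these mappings directly to resolve file names into inodes.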
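To see that names and inodes are separate things, the sketch below (hypothetical function, POSIX calls only) creates a file, gives it a second name with link(), and uses stat() to check that both names refer to the same inode with a link count of 2. It compares st_dev as well as st_ino, because an inode number is only unique within one filesystem.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical demo: create `a`, add a second name `b`, and check that both
   resolve to the same inode. Returns 1 on success, 0 if the check fails,
   -1 on a system-call error. Both names are removed afterwards. */
int same_inode_after_link(const char *a, const char *b)
{
    int fd = open(a, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd == -1)
        return -1;
    close(fd);

    if (link(a, b) == -1) {            /* new directory entry, same inode */
        unlink(a);
        return -1;
    }

    struct stat sa, sb;
    int ok = stat(a, &sa) == 0 && stat(b, &sb) == 0
          && sa.st_dev == sb.st_dev     /* same filesystem ... */
          && sa.st_ino == sb.st_ino     /* ... and same inode number */
          && sa.st_nlink == 2;          /* two names point at it */

    /* Each unlink() only drops the link count; the inode goes away
       when the count reaches 0 and no descriptor is open. */
    unlink(a);
    unlink(b);
    return ok;
}
```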

A closer look at how a file is deleted from the physical disk

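Now let's look at this process from the kernel's point of view. (The code below comes from the 4.9.1 kernel source I downloaded from http://www.kernel.org/.)
Every file has an inode, and we access the file through its inode. The in-memory superblock keeps a list of all of its in-memory inodes, and when a deleted file's inode is finally evicted (in 4.9 this happens in evict() in fs/inode.c), the inode is taken off that superblock list.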
Here are a few lines of struct super_block, which is defined in include/linux/fs.h:

struct super_block {
	struct list_head	s_inodes;	/* all inodes */
	spinlock_t		s_inode_wblist_lock;
	struct list_head	s_inodes_wb;	/* writeback inodes */
};

struct list_head is defined in include/linux/types.h:

struct list_head {
	struct list_head *next, *prev;
};

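So s_inodes is a doubly linked list built from struct list_head that holds all of the superblock's inodes. Taking an inode off the superblock means unlinking that inode's node from this list; the deletion helpers such as list_del() live in include/linux/list.h. This only detaches the in-memory inode; freeing the file's blocks on disk is left to the filesystem's own eviction code (the ->evict_inode hook in super_operations). The interesting question is how such a bare doubly linked list, which holds nothing but two pointers, gives access to the rest of the inode's information.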
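To make the unlinking step concrete, here is a simplified user-space re-implementation of the list operations from include/linux/list.h. It is a sketch, not the kernel code: the kernel's list_del() writes the LIST_POISON1/LIST_POISON2 values into the removed entry instead of NULL, and kernel callers hold the right lock (s_inode_list_lock for s_inodes). The helper names list_link_between() and list_count() are made up for this example.

```c
#include <stddef.h>

/* Simplified user-space copy of the kernel's circular doubly linked list. */
struct list_head {
    struct list_head *next, *prev;
};

/* An empty list is a head whose next and prev point to itself. */
void init_list_head(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Hypothetical helper: splice entry in between adjacent nodes prev and next. */
void list_link_between(struct list_head *entry,
                       struct list_head *prev, struct list_head *next)
{
    next->prev = entry;
    entry->next = next;
    entry->prev = prev;
    prev->next = entry;
}

/* Insert entry right after head, like the kernel's list_add(). */
void list_add(struct list_head *entry, struct list_head *head)
{
    list_link_between(entry, head, head->next);
}

/* Unlink entry by making its two neighbours point at each other.
   The kernel poisons entry's pointers; NULL is used here for simplicity. */
void list_del(struct list_head *entry)
{
    entry->next->prev = entry->prev;
    entry->prev->next = entry->next;
    entry->next = NULL;
    entry->prev = NULL;
}

/* Hypothetical helper: count entries by walking until we return to head. */
int list_count(const struct list_head *head)
{
    int n = 0;
    for (const struct list_head *p = head->next; p != head; p = p->next)
        n++;
    return n;
}
```

Deleting an entry only rewires its two neighbours; no memory is freed, so the object that embeds the list_head has to be released separately.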

An introduction to the list_head doubly linked list

This post draws heavily on the two articles mentioned here; I suggest reading both first, or at least the one on cnblogs, before continuing.

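All the code in this post was looked up in the kernel source I downloaded and copied here.
A list_head embedded in a structure such as super_block leads to the actual address of an inode through the list_entry macro. Words alone don't do it justice, so here is the code: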

list_entry is defined in include/linux/list.h:

#define list_entry(ptr, type, member) \
	container_of(ptr, type, member)

container_of is defined in include/linux/kernel.h:

#define container_of(ptr, type, member) ({			\
	const typeof( ((type *)0)->member ) *__mptr = (ptr);	\
	(type *)( (char *)__mptr - offsetof(type,member) );})

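offsetof is defined in include/linux/stddef.h (the line below is the fallback definition; when the compiler provides __compiler_offsetof, the kernel uses that instead):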

#define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER)
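Here is a small user-space sketch of container_of at work. struct demo_inode is a hypothetical stand-in for struct inode, with the list node placed in the middle of the structure. The macro keeps the kernel's ({ ... }) statement expression, a GCC/Clang extension, and spells typeof as __typeof__ so it also builds with -std=c11.

```c
#include <stddef.h>

struct list_head {
    struct list_head *next, *prev;
};

/* Same shape as the kernel macro: the typed __mptr checks that ptr really
   points to a `member`, then the member's offset is subtracted. */
#define container_of(ptr, type, member) ({                      \
    const __typeof__(((type *)0)->member) *__mptr = (ptr);      \
    (type *)((char *)__mptr - offsetof(type, member)); })

/* Hypothetical stand-in for struct inode. */
struct demo_inode {
    unsigned long i_ino;
    int i_mode;
    struct list_head i_sb_list;
};

/* Given only a pointer to the embedded list node, recover the whole object. */
struct demo_inode *inode_from_sb_node(struct list_head *node)
{
    return container_of(node, struct demo_inode, i_sb_list);
}
```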

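Let's use super_block as the example. super_block holds the list of inodes; now let's see how to reach an inode from that list head. Assume we already have the super_block, and super_block_addr is a pointer to it.
While writing this, I realized I did not know which list_head inside struct inode the superblock's list actually points to. Let's work it out. struct inode contains exactly these five list_head members: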

struct list_head	i_io_list;	/* backing dev IO list */
struct list_head	i_lru;		/* inode LRU list */
struct list_head	i_sb_list;
struct list_head	i_wb_list;	/* backing dev writeback list */
struct list_head	i_devices;

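Judging from the names, the superblock's list should point to i_sb_list (i_sb suggests "inode superblock"). Understanding the Linux Kernel describes i_sb_list as the pointers for the superblock's list of inodes, which confirms it, and inode_sb_list_add() in fs/inode.c links inodes into s_inodes through this field. With that, the following code reaches an inode: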

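struct super_block *sb = super_block_addr;
/* s_inodes is an embedded struct list_head, so use '.', not '->'.
   Hold sb->s_inode_list_lock while walking the list. */
struct inode *first = list_entry(sb->s_inodes.next, struct inode, i_sb_list);

first now points to the first inode on the superblock's list, provided the list is not empty (for an empty list, s_inodes.next points back at s_inodes itself). Following first->i_sb_list.next in the same way gives the next inode.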
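Walking every inode of a superblock is what the kernel's list_for_each_entry() macro does. The sketch below writes that loop out by hand on hypothetical demo_super_block and demo_inode types, with a simplified list_entry that omits container_of's type check. The loop stops once it is back at the list head, which is also how an empty list is recognized. In the kernel, such a walk would run under sb->s_inode_list_lock.

```c
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

/* Simplified list_entry: no type check, just subtract the member offset. */
#define list_entry(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-ins for struct inode and struct super_block. */
struct demo_inode { unsigned long i_ino; struct list_head i_sb_list; };
struct demo_super_block { struct list_head s_inodes; };

/* Empty list: the head points to itself. */
void sb_init(struct demo_super_block *sb)
{
    sb->s_inodes.next = sb->s_inodes.prev = &sb->s_inodes;
}

/* Append inode at the tail so walk order matches insertion order. */
void sb_add_inode(struct demo_super_block *sb, struct demo_inode *inode)
{
    struct list_head *head = &sb->s_inodes, *last = head->prev;
    inode->i_sb_list.prev = last;
    inode->i_sb_list.next = head;
    last->next = &inode->i_sb_list;
    head->prev = &inode->i_sb_list;
}

/* Visit every inode; store up to max inode numbers in out and return
   the number of inodes visited. */
size_t sb_collect_inos(struct demo_super_block *sb, unsigned long *out, size_t max)
{
    size_t n = 0;
    for (struct list_head *p = sb->s_inodes.next; p != &sb->s_inodes; p = p->next) {
        struct demo_inode *inode = list_entry(p, struct demo_inode, i_sb_list);
        if (n < max)
            out[n] = inode->i_ino;
        n++;
    }
    return n;
}
```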

The use of 0 in C

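When I first saw #define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER), I did not understand what the 0 was doing there. After checking the documentation: (TYPE *)0 converts the null pointer constant 0 into a null pointer of type TYPE *. Taking the address of MEMBER in an object that notionally sits at address 0 yields the member's offset within TYPE, and the cast to size_t turns that address into a number; no memory at address 0 is ever read. Strictly speaking ISO C leaves this expression undefined, which is why portable code uses offsetof from <stddef.h>, while the kernel relies on the compiler's behavior. The relevant part of C11 reads: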

C11 6.3.2.3

A pointer to void may be converted to or from a pointer to any object type. A pointer to
any object type may be converted to a pointer to void and back again; the result shall
compare equal to the original pointer.

An integer constant expression with the value 0, or such an expression cast to type
void *, is called a null pointer constant.66) If a null pointer constant is converted to a
pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal
to a pointer to any object or function
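The 0 trick can be checked against an ordinary computation. The sketch below (hypothetical struct demo_inode) measures a member's offset on a real object as member address minus base address, which is what &((TYPE *)0)->MEMBER computes when the base address is 0, and compares the result with the standard offsetof.

```c
#include <stddef.h>

/* Hypothetical structure with a list node after two other fields. */
struct demo_inode {
    unsigned long i_ino;
    int i_mode;
    struct { void *next, *prev; } i_sb_list;
};

/* Offset measured on a real object: member address minus base address.
   With a base of 0, the member's address would already be this offset. */
size_t measured_offset_of_sb_list(void)
{
    struct demo_inode obj;
    return (size_t)((char *)&obj.i_sb_list - (char *)&obj);
}

/* The portable spelling: offsetof from <stddef.h>. */
size_t std_offset_of_sb_list(void)
{
    return offsetof(struct demo_inode, i_sb_list);
}
```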

inode and super_block source (kernel 4.9.1)

inode source

/*
 * Keep mostly read-only and often accessed (especially for
 * the RCU path lookup and 'stat' data) fields at the beginning
 * of the 'struct inode'
 */
struct inode {
	umode_t			i_mode;
	unsigned short		i_opflags;
	kuid_t			i_uid;
	kgid_t			i_gid;
	unsigned int		i_flags;

#ifdef CONFIG_FS_POSIX_ACL
	struct posix_acl	*i_acl;
	struct posix_acl	*i_default_acl;
#endif

	const struct inode_operations	*i_op;
	struct super_block	*i_sb;
	struct address_space	*i_mapping;

#ifdef CONFIG_SECURITY
	void			*i_security;
#endif

	/* Stat data, not accessed from path walking */
	unsigned long		i_ino;
	/*
	 * Filesystems may only read i_nlink directly.  They shall use the
	 * following functions for modification:
	 *
	 *    (set|clear|inc|drop)_nlink
	 *    inode_(inc|dec)_link_count
	 */
	union {
		const unsigned int i_nlink;
		unsigned int __i_nlink;
	};
	dev_t			i_rdev;
	loff_t			i_size;
	struct timespec		i_atime;
	struct timespec		i_mtime;
	struct timespec		i_ctime;
	spinlock_t		i_lock;	/* i_blocks, i_bytes, maybe i_size */
	unsigned short          i_bytes;
	unsigned int		i_blkbits;
	blkcnt_t		i_blocks;

#ifdef __NEED_I_SIZE_ORDERED
	seqcount_t		i_size_seqcount;
#endif

	/* Misc */
	unsigned long		i_state;
	struct rw_semaphore	i_rwsem;

	unsigned long		dirtied_when;	/* jiffies of first dirtying */
	unsigned long		dirtied_time_when;

	struct hlist_node	i_hash;
	struct list_head	i_io_list;	/* backing dev IO list */
#ifdef CONFIG_CGROUP_WRITEBACK
	struct bdi_writeback	*i_wb;		/* the associated cgroup wb */

	/* foreign inode detection, see wbc_detach_inode() */
	int			i_wb_frn_winner;
	u16			i_wb_frn_avg_time;
	u16			i_wb_frn_history;
#endif
	struct list_head	i_lru;		/* inode LRU list */
	struct list_head	i_sb_list;
	struct list_head	i_wb_list;	/* backing dev writeback list */
	union {
		struct hlist_head	i_dentry;
		struct rcu_head		i_rcu;
	};
	u64			i_version;
	atomic_t		i_count;
	atomic_t		i_dio_count;
	atomic_t		i_writecount;
#ifdef CONFIG_IMA
	atomic_t		i_readcount; /* struct files open RO */
#endif
	const struct file_operations	*i_fop;	/* former ->i_op->default_file_ops */
	struct file_lock_context	*i_flctx;
	struct address_space	i_data;
	struct list_head	i_devices;
	union {
		struct pipe_inode_info	*i_pipe;
		struct block_device	*i_bdev;
		struct cdev		*i_cdev;
		char			*i_link;
		unsigned		i_dir_seq;
	};

	__u32			i_generation;

#ifdef CONFIG_FSNOTIFY
	__u32			i_fsnotify_mask; /* all events this inode cares about */
	struct hlist_head	i_fsnotify_marks;
#endif

#if IS_ENABLED(CONFIG_FS_ENCRYPTION)
	struct fscrypt_info	*i_crypt_info;
#endif

	void			*i_private; /* fs or device private pointer */
};

super_block source

struct super_block {
	struct list_head	s_list;		/* Keep this first */
	dev_t			s_dev;		/* search index; _not_ kdev_t */
	unsigned char		s_blocksize_bits;
	unsigned long		s_blocksize;
	loff_t			s_maxbytes;	/* Max file size */
	struct file_system_type	*s_type;
	const struct super_operations	*s_op;
	const struct dquot_operations	*dq_op;
	const struct quotactl_ops	*s_qcop;
	const struct export_operations *s_export_op;
	unsigned long		s_flags;
	unsigned long		s_iflags;	/* internal SB_I_* flags */
	unsigned long		s_magic;
	struct dentry		*s_root;
	struct rw_semaphore	s_umount;
	int			s_count;
	atomic_t		s_active;
#ifdef CONFIG_SECURITY
	void                    *s_security;
#endif
	const struct xattr_handler **s_xattr;

	const struct fscrypt_operations	*s_cop;

	struct hlist_bl_head	s_anon;		/* anonymous dentries for (nfs) exporting */
	struct list_head	s_mounts;	/* list of mounts; _not_ for fs use */
	struct block_device	*s_bdev;
	struct backing_dev_info *s_bdi;
	struct mtd_info		*s_mtd;
	struct hlist_node	s_instances;
	unsigned int		s_quota_types;	/* Bitmask of supported quota types */
	struct quota_info	s_dquot;	/* Diskquota specific options */

	struct sb_writers	s_writers;

	char s_id[32];				/* Informational name */
	u8 s_uuid[16];				/* UUID */

	void 			*s_fs_info;	/* Filesystem private info */
	unsigned int		s_max_links;
	fmode_t			s_mode;

	/* Granularity of c/m/atime in ns.
	   Cannot be worse than a second */
	u32		   s_time_gran;

	/*
	 * The next field is for VFS *only*. No filesystems have any business
	 * even looking at it. You had been warned.
	 */
	struct mutex s_vfs_rename_mutex;	/* Kludge */

	/*
	 * Filesystem subtype.  If non-empty the filesystem type field
	 * in /proc/mounts will be "type.subtype"
	 */
	char *s_subtype;

	/*
	 * Saved mount options for lazy filesystems using
	 * generic_show_options()
	 */
	char __rcu *s_options;
	const struct dentry_operations *s_d_op; /* default d_op for dentries */

	/*
	 * Saved pool identifier for cleancache (-1 means none)
	 */
	int cleancache_poolid;

	struct shrinker s_shrink;	/* per-sb shrinker handle */

	/* Number of inodes with nlink == 0 but still referenced */
	atomic_long_t s_remove_count;

	/* Being remounted read-only */
	int s_readonly_remount;

	/* AIO completions deferred from interrupt context */
	struct workqueue_struct *s_dio_done_wq;
	struct hlist_head s_pins;

	/*
	 * Owning user namespace and default context in which to
	 * interpret filesystem uids, gids, quotas, device nodes,
	 * xattrs and security labels.
	 */
	struct user_namespace *s_user_ns;

	/*
	 * Keep the lru lists last in the structure so they always sit on their
	 * own individual cachelines.
	 */
	struct list_lru		s_dentry_lru ____cacheline_aligned_in_smp;
	struct list_lru		s_inode_lru ____cacheline_aligned_in_smp;
	struct rcu_head		rcu;
	struct work_struct	destroy_work;

	struct mutex		s_sync_lock;	/* sync serialisation lock */

	/*
	 * Indicates how deep in a filesystem stack this SB is
	 */
	int s_stack_depth;

	/* s_inode_list_lock protects s_inodes */
	spinlock_t		s_inode_list_lock ____cacheline_aligned_in_smp;
	struct list_head	s_inodes;	/* all inodes */

	spinlock_t		s_inode_wblist_lock;
	struct list_head	s_inodes_wb;	/* writeback inodes */
};