Data loss caused by power failure after fclose

Background:

A customer reported that after a power loss and reboot, the device lost its existing configuration and was bricked.

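Analysis:

The device was bricked because its configuration was lost, and the configuration was lost because the data on flash was already corrupted at boot, so reading it failed.

Further analysis showed that the main cause was data not being fully written to flash.

Why did the previously released yaffs2 file system not show the problem, while the current ubi file system does?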

The app layer operates on flash data in the following sequence:

fopen -> fwrite -> fsync -> fclose

The correct sequence, however, is:

fopen -> fwrite -> fflush -> fsync -> fclose

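fopen gives buffered I/O. Calling fflush pushes the data in the C library buffer down to the OS layer, while fsync only syncs data that is already in the OS layer to the medium.

So without fflush, fsync runs while part of the data is still in the C library buffer. fclose then hands that remainder to the OS, but if power is cut immediately afterwards it never reaches the medium, and the data on flash is incomplete.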
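The buffering described above can be observed directly. The following is only a sketch for illustration, not the device code: the helper names `kernel_visible_size` and `demo_stdio_buffering` are made up here, and it assumes a POSIX system whose C library honors the request for full buffering. It writes one short record, then checks the file size reported by the kernel before and after fflush.

```c
#define _POSIX_C_SOURCE 200809L   /* for fileno() and fsync() */
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: file size as the kernel currently sees it, -1 on error. */
static long kernel_visible_size(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0 ? (long)st.st_size : -1L;
}

/*
 * Hypothetical demo: write one short record to `path` and record the
 * kernel-visible size before and after fflush.
 * Returns 0 on success, -1 on any error.
 */
static int demo_stdio_buffering(const char *path, long *before, long *after)
{
    const char record[] = "key=value\n";
    const size_t len = strlen(record);

    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;

    /* Ask for full buffering so the demo does not depend on defaults. */
    if (setvbuf(fp, NULL, _IOFBF, BUFSIZ) != 0 ||
        fwrite(record, 1, len, fp) != len) {
        fclose(fp);
        return -1;
    }

    /* The buggy order: fsync only covers what the kernel already holds,
     * and the record is still sitting in the stdio buffer. */
    (void)fsync(fileno(fp));
    *before = kernel_visible_size(path);

    /* fflush hands the record to the kernel; fsync can now push it on. */
    if (fflush(fp) != 0 || fsync(fileno(fp)) != 0) {
        fclose(fp);
        return -1;
    }
    *after = kernel_visible_size(path);

    return fclose(fp) == 0 ? 0 : -1;
}
```

With the stream fully buffered, one would expect `before` to be 0 and `after` to be 10 (the length of the record): the first fsync had nothing to sync.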

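As for why yaffs2 had no problem, the kernel team's explanation is that the yaffs2 file system is unbuffered, so the flush triggered by fclose carries the data remaining in the buffer all the way to the medium.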

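Reading through the man pages leads to these conclusions:

1. To sync data to the medium before a descriptor is closed, call fsync. This matters most for critical data, where loss causes serious problems.

2. For files opened with fopen, if fsync is needed, the correct order is: fopen -> fwrite -> fflush -> fsync -> fclose

3. For files opened with open, if fsync is needed, the correct order is: open -> write -> fsync -> close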
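The descriptor-based order in item 3 might look like the sketch below. The function name `write_file_synced`, the open flags and the file mode are assumptions made for illustration. Because open/write has no user-space buffer, fsync follows write directly.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: write `len` bytes to `path` using
 * open -> write -> fsync -> close. Returns 0 on success, -1 on error.
 */
static int write_file_synced(const char *path, const void *data, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    const char *p = data;
    while (len > 0) {
        ssize_t n = write(fd, p, len);   /* may write fewer bytes than asked */
        if (n < 0) {
            if (errno == EINTR)
                continue;                /* interrupted by a signal, retry */
            close(fd);
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }

    /* Ask the kernel to push its buffered data to the medium. */
    if (fsync(fd) != 0) {
        close(fd);
        return -1;
    }

    /* close can report a deferred write error, so check it as well. */
    return close(fd) == 0 ? 0 : -1;
}
```

A caller would use it as `write_file_synced("/path/to/config", buf, buf_len)` and treat any non-zero result as "the data may not be on the medium".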

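Fix:

1. In the interface that writes the configuration file, call fflush before fsync, and ship a temporary release.

2. Search the project for every file opened with fopen and add an fflush call before each fsync.

3. Given the nature of this business, search the project for every interface that calls fclose or close without calling fsync first, and add the missing fsync call.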
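One possible way to apply items 2 and 3 consistently is to route stream closes through a single wrapper. The sketch below is an assumption about how that could be done, not the fix that actually shipped; `fclose_synced` is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L   /* for fileno() and fsync() */
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: flush, sync and close a stream opened for writing.
 * The stream is always closed. Returns 0 only if every step succeeded.
 */
static int fclose_synced(FILE *fp)
{
    int rc = 0;

    if (fflush(fp) != 0)                      /* C library buffer -> kernel */
        rc = -1;
    if (rc == 0 && fsync(fileno(fp)) != 0)    /* kernel -> medium */
        rc = -1;
    if (fclose(fp) != 0)                      /* always release the stream */
        rc = -1;

    return rc;
}
```

Replacing bare fclose calls on written files with such a wrapper keeps the fflush -> fsync -> fclose order in one place, at the cost of an fsync on every close.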

 

Below are some notes on the relevant interfaces, excerpted from the man pages:

 

Understanding close() (from https://linux.die.net/man/2/close):

Not checking the return value of close() is a common but nevertheless serious programming error. It is quite possible that errors on a previous write(2) operation are first reported at the final close(). Not checking the return value when closing the file may lead to silent loss of data. This can especially be observed with NFS and with disk quota.

A successful close does not guarantee that the data has been successfully saved to disk, as the kernel defers writes. It is not common for a file system to flush the buffers when the stream is closed. If you need to be sure that the data is physically stored use fsync(2). (It will depend on the disk hardware at this point.)

It is probably unwise to close file descriptors while they may be in use by system calls in other threads in the same process. Since a file descriptor may be reused, there are some obscure race conditions that may cause unintended side effects

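Some write errors may be reported only when close is called. If the return value of close is not checked, data may already have been lost without anyone noticing, especially on NFS and with disk quotas.

close does not guarantee that data has been written to the medium; to guarantee that, call fsync. According to the fsync man page, though, even fsync cannot guarantee the data is truly written when the medium has its own cache.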

 

Understanding fclose() (from the man page):
Note that fclose() only flushes the user space buffers provided by the C library. To ensure that the data is physically stored on disk the kernel buffers must be flushed too,
for example, with sync(2) or fsync(2).

fclose only flushes the user-space buffers provided by the C library; getting the data onto the physical medium still requires sync or fsync.

 

posted @ 2018-11-23 16:18  doctorJ