Alibaba case study: SSD data inconsistency after a server power loss, part 1 (O_DIRECT and O_SYNC)

The application writes its data with pwrite on a file opened with O_DIRECT, but without O_SYNC.

This is a prime suspect: without O_SYNC, the file system metadata may not have been written to the SSD when the power was lost.

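With O_SYNC, every write operation waits for the disk to report completion before returning, which ensures that the data has actually reached the disk.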

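To verify this hypothesis, however, we need to prove that the data is present on the PCIe SSD and that only the file system cannot read it (because the file system metadata is missing).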
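One hypothetical way to run such a check, assuming the application's records contain a recognizable byte pattern, is to scan the raw block device for that pattern and see whether it shows up even though the file system cannot return it. The sketch below is only an illustration: `find_marker`, the device path, and the marker are made-up names, and reading a raw device normally requires root.

```python
def find_marker(device_path, marker, chunk_size=1 << 20):
    """Return the byte offsets at which `marker` occurs in `device_path`."""
    if not marker:
        raise ValueError("marker must not be empty")
    offsets = []
    overlap = len(marker) - 1   # bytes carried over so matches across chunks are found
    pos = 0                     # absolute offset of the first byte of `window`
    tail = b""
    with open(device_path, "rb") as dev:
        while True:
            chunk = dev.read(chunk_size)
            if not chunk:
                break
            window = tail + chunk
            start = 0
            while True:
                i = window.find(marker, start)
                if i < 0:
                    break
                offsets.append(pos + i)
                start = i + 1
            # Keep the last `overlap` bytes; a full marker cannot fit in them,
            # so no match is ever reported twice.
            tail = window[-overlap:] if overlap else b""
            pos += len(window) - len(tail)
    return offsets
```

A call such as `find_marker("/dev/nvme0n1", b"MY_RECORD_MAGIC")` (both arguments hypothetical) returning a non-empty list would suggest the payload reached the device even though the file system no longer references it.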

 

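Writing data with pwrite under the O_DIRECT flag only ensures that the data goes straight to the device (bypassing the buffer cache), while the file system metadata still lives in the inode cache (in memory).

On an abrupt power failure, even if the storage device offers power loss protection (a feature of enterprise-grade SSDs), the inode cache in the server's memory has not yet been written to the SSD, so the metadata is lost.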
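The write path described above might look roughly like the following Python sketch (Python's `os` module maps directly onto the underlying system calls on Linux). The names `write_block` and `BLOCK_SIZE` and the 4096-byte alignment are illustrative assumptions, not the application's actual code; the real alignment requirement depends on the device and the file system.

```python
import mmap
import os

BLOCK_SIZE = 4096  # assumed alignment for O_DIRECT transfers


def write_block(path, extra_flags, offset, fill):
    """Write one aligned block of `fill` bytes at `offset`; return bytes written."""
    # An anonymous mmap is page-aligned, which satisfies the usual
    # O_DIRECT requirement for an aligned user-space buffer.
    buf = mmap.mmap(-1, BLOCK_SIZE)
    buf.write(fill * BLOCK_SIZE)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | extra_flags, 0o644)
    try:
        # pwrite returns once the transfer is done; with O_DIRECT alone
        # nothing here forces the file system metadata out of memory.
        return os.pwrite(fd, buf, offset)
    finally:
        os.close(fd)
        buf.close()
```

Calling `write_block(path, os.O_DIRECT, 0, b"A")` corresponds to the case discussed here: the data block bypasses the cache, but no flag asks for the metadata to be written synchronously.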

Once O_SYNC is added, writes become synchronous (synchronous I/O), which guarantees that the metadata is also written to disk synchronously. See the man page:

man 2 open

```

O_DIRECT (since Linux 2.4.10)
Try to minimize cache effects of the I/O to and from this file. In general this will degrade performance, but it is useful in special situations, such as when applications
do their own caching. File I/O is done directly to/from user-space buffers. The O_DIRECT flag on its own makes an effort to transfer data synchronously, but does not give
the guarantees of the O_SYNC flag that data and necessary metadata are transferred. To guarantee synchronous I/O, O_SYNC must be used in addition to O_DIRECT. See NOTES
below for further discussion.

A semantically similar (but deprecated) interface for block devices is described in raw(8).

```
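Following the man page's advice, a minimal sketch of opening the file with both flags could look like this. `open_for_sync_write`, `write_record`, and `BLOCK_SIZE` are hypothetical helper names, and the 4096-byte alignment is an assumption; `os.O_DIRECT` is only available on platforms that define it, such as Linux.

```python
import mmap
import os

BLOCK_SIZE = 4096  # assumed alignment for O_DIRECT transfers


def open_for_sync_write(path, direct=True):
    """Open `path` for writing with O_SYNC, and with O_DIRECT if requested."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_SYNC
    if direct:
        flags |= os.O_DIRECT
    return os.open(path, flags, 0o644)


def write_record(fd, payload, offset):
    """Pad `payload` to whole blocks and write it from a page-aligned buffer."""
    blocks = max(1, -(-len(payload) // BLOCK_SIZE))  # ceiling division
    buf = mmap.mmap(-1, blocks * BLOCK_SIZE)         # zero-filled, page-aligned
    buf.write(payload)
    try:
        # With O_SYNC, pwrite returns only after the data and the metadata
        # needed to read it back have been transferred.
        return os.pwrite(fd, buf, offset)
    finally:
        buf.close()
```

The trade-off is latency: every write now waits for the device, so this is a correctness fix rather than a free change.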

posted @ 2022-07-08 16:35  leo21sun