【BUG FIX】NFS 写文件延迟
问题
对现有模块进行重构,在上线之后,发现重构的时候有BUG1(某块逻辑写得不对),在调查的时候,发现了另一个BUG2。这个BUG2,就和NFS延迟有关。
我发现: NFS 写文件的时候有延迟,实际文件最终修改时间戳要晚于日志中打印出的文件写操作结束时间。而这个问题,会导致 -> 后续读文件的时候有几率发生找不到文件 -> 最终表现为部分数据丢失。
time 01: logging - write file start
time 02: file timestamp should be around here, but no!
time 03: logging - write file end
time 04: file timestamp is actually here!
背景
NFS: Network File System
NFS是一种基于TCP/IP传输的网络文件系统协议,最初由SUN公司开发。通过使用NFS协议,客户机可以像访问本地目录一样访问服务器中的共享资源。NFS服务的实现依赖于RPC(远程过程调用)机制,以完成远程到本地的映射过程。
由此可知,写文件到 NFS ,分为三个过程:
- 将内容写到本地内存/磁盘
- 网络传输
- 将内容写到服务器磁盘
其中,一般第二步的耗时占大头。
分析
从代码层面分析,我发现主要是使用了PrintWriter
实现写文件,其中主要用了两个方法:write
方法写文件,flush
方法写入磁盘。但是没有使用close
方法关闭文件。(会不会内存溢出是另一回事,实际情况是 Java 自动 GC 掉了)
笔者分析:flush
固然能保证将数据写入本地的磁盘,即完成上一小节的第1步,但是对于 NFS ,不能保证第2,3步。
笔者猜想: 因为close
方法会断开 socket ,所以,它会一致等待直到网络传输完成,再执行。即,close
方法能保证第2步完成。而由于第3步的耗时远低于第2步,可以认为,当第2步完成时,第3步立刻瞬时完成。
测试
下面,将分别测试带close()
和不带close()
的情况。
先上代码,自定义了一个 LargeFileWriter.java
。
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;
public class LargeFileWriter {
public static final String LINE_SPLITTER = "\n";
private static final int PRINT_BUFFER_SIZE = 1000000;
private static final int PRINT_BUFFER_LIMIT = PRINT_BUFFER_SIZE - 12000;
private String outputFile;
private PrintWriter pWriter = null;
private StringBuilder sb = null;
private final Lock lock = new ReentrantLock();
public LargeFileWriter(String fileName, boolean append) {
this.outputFile = fileName;
File file = new File(fileName);
File parentFile = file.getParentFile();
if (parentFile != null) {
parentFile.mkdirs();
}
try {
pWriter = new PrintWriter(new BufferedWriter(new FileWriter(outputFile, append), PRINT_BUFFER_LIMIT));
} catch (IOException e) {
throw new RuntimeException(e);
}
sb = new StringBuilder(PRINT_BUFFER_SIZE);
}
public void write(String str) {
try {
lock.lock();
sb.append(str).append(LINE_SPLITTER);
if (sb.length() > PRINT_BUFFER_LIMIT) {
pWriter.print(sb);
// fail explicitly when output while the disk is full
if (pWriter.checkError()) {
try (FileWriter fw = new FileWriter(outputFile)) {
fw.write(sb.toString());
} catch (IOException e) {
throw new RuntimeException(e);
}
}
sb.setLength(0);
}
} finally {
lock.unlock();
}
}
public void flush() {
try {
lock.lock();
if (pWriter != null) {
if (sb != null && sb.length() > 0) {
pWriter.print(sb);
sb.setLength(0);
}
pWriter.flush();
}
} finally {
lock.unlock();
}
}
public void close() {
pWriter.close();
}
}
主函数分两种情况:
- write to local
- write to nfs
另外,还需要测试带writer.close
和不带的情况。
public static void main(String args[]) {
SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS");
int recordSize = 500000000;
String writeString = "asdfghjkl";
// session 1
System.out.println("session 1 - local");
System.out.println(sdf.format(new Date(System.currentTimeMillis())) + ": start");
String outputFile1 = "/local/tmp/location/testfile.txt";
LargeFileWriter writer1 = new LargeFileWriter(outputFile1, true);
for (int i=0; i<recordSize; i++) {
writer1.write(writeString + i + i*i);
}
if(writer1 != null){
writer1.flush();
}
//writer1.close();
System.out.println(sdf.format(new Date(System.currentTimeMillis())) + ": end");
// session 2
System.out.println("session 2 - nfs");
System.out.println(sdf.format(new Date(System.currentTimeMillis())) + ": start");
String outputFile2 = "/home/user/tmp/location/testfile.txt";
LargeFileWriter writer2 = new LargeFileWriter(outputFile2, true);
for (int i=0; i<recordSize; i++) {
writer2.write(writeString + i + i*i);
}
if(writer2 != null){
writer2.flush();
}
//writer2.close();
System.out.println(sdf.format(new Date(System.currentTimeMillis())) + ": end");
}
测试结果
14G file, without writer.close()
, there's a file generation lag
session 1 - local
2021-11-08 05:25:43.598: start
2021-11-08 05:27:22.256: file timestamp
2021-11-08 05:27:22.257: end
cost around 1 min 40s
session 2 - nfs
2021-11-08 05:27:22.257: start
2021-11-08 05:30:48.883: end
2021-11-08 05:30:55.462: file timestamp
cost around 3 min 30s
14G file, with writer.close()
, there's NO file generation lag
session 1 - local
2021-11-08 05:54:54.016: start
2021-11-08 05:56:31.996: file timestamp
2021-11-08 05:56:31.997: end
cost around 1 min 40s
session 2 - nfs
2021-11-08 05:56:31.997: start
2021-11-08 06:00:05.193: file timestamp
2021-11-08 06:00:05.196: end
cost around 3 min 30s
结论
在 NFS 环境进行文件输出,一定要 close()
,否则不能保证文件实际生成的时间!
参考
- NFS(网络文件系统)的配置及使用 https://www.cnblogs.com/yoo2767/p/5963177.html