Zookeeper源码阅读(四)数据存储-SNOPSHOT
前言
事务日志记录了zk的事务记录,而zk同时有快照机制来记录zk服务器中某一时刻全量内存数据,并写入磁盘中保存。快照机制其实很多框架都提供,也是目前做恢复的一种很主流的机制。
前面第二篇说到Zk内存中数据以树的形式存在内存中,相应的类是DataTree。快照就是基于DataTree,把它序列化然后存储到磁盘中。
SnapShot接口
与事务日志类似(接口为TxnLog),快照也有其对饮的维护快照数据对外的接口(读写等操作),即SnapShot。
/**
* snapshot interface for the persistence layer.
* implement this interface for implementing
* snapshots.
*/
public interface SnapShot {
/**
* deserialize a data tree from the last valid snapshot and
* return the last zxid that was deserialized
* @param dt the datatree to be deserialized into
* @param sessions the sessions to be deserialized into
* @return the last zxid that was deserialized from the snapshot
* @throws IOException
*/
long deserialize(DataTree dt, Map<Long, Integer> sessions)
throws IOException;
/**
* persist the datatree and the sessions into a persistence storage
* @param dt the datatree to be serialized
* @param sessions
* @throws IOException
*/
void serialize(DataTree dt, Map<Long, Integer> sessions,
File name)
throws IOException;
/**
* find the most recent snapshot file
* @return the most recent snapshot file
* @throws IOException
*/
File findMostRecentSnapshot() throws IOException;
/**
* free resources from this snapshot immediately
* @throws IOException
*/
void close() throws IOException;
}
从类与方法本身的注释就能看出来这个接口就是为快照服务的。从方法我们也可以看出,这个接口是为了做序列化与反序列化的。还有两个方法是找到最近的快照文件与释放快照资源。
FileSnap(具体的功能类):
/**
* This class implements the snapshot interface.
* it is responsible for storing, serializing
* and deserializing the right snapshot.
* and provides access to the snapshots.
*/
public class FileSnap implements SnapShot {
File snapDir;//目录
private volatile boolean close = false;//快照是否开启
private static final int VERSION=2;
private static final long dbId=-1;
private static final Logger LOG = LoggerFactory.getLogger(FileSnap.class);
public final static int SNAP_MAGIC
= ByteBuffer.wrap("ZKSN".getBytes()).getInt();
从类的注释中能看到这个类是为了保存,序列化/反序列化正确的快照和访问快照而生的。这里的version和dbid都是常量。关于快照的格式可以参考: Snapshot格式
反序列化:
/**
* deserialize a data tree from the most recent snapshot
* @return the zxid of the snapshot
*/
public long deserialize(DataTree dt, Map<Long, Integer> sessions)
throws IOException {
// we run through 100 snapshots (not all of them)
// if we cannot get it running within 100 snapshots
// we should give up
//最多只会拿100个快照文件,如果这100个里面不包含想要的,就放弃了(因为后面查不到)但是会有log警告。
//这里拿到的是按Zxid排序过的
List<File> snapList = findNValidSnapshots(100);
if (snapList.size() == 0) {
return -1L;
}
File snap = null;
boolean foundValid = false;
for (int i = 0; i < snapList.size(); i++) {
snap = snapList.get(i);
InputStream snapIS = null;
CheckedInputStream crcIn = null;
try {
LOG.info("Reading snapshot " + snap);
snapIS = new BufferedInputStream(new FileInputStream(snap));
crcIn = new CheckedInputStream(snapIS, new Adler32());//含有校验信息的
InputArchive ia = BinaryInputArchive.getArchive(crcIn);
deserialize(dt,sessions, ia);//根据ia反序列化到dataTree和session
long checkSum = crcIn.getChecksum().getValue();
long val = ia.readLong("val");//读写入的校验值
if (val != checkSum) {
throw new IOException("CRC corruption in snapshot : " + snap);
}
foundValid = true;
break;//因为之前排序过,所以只要有一个完整的就可以了
} catch(IOException e) {
LOG.warn("problem reading snap file " + snap, e);
} finally {
if (snapIS != null)
snapIS.close();
if (crcIn != null)
crcIn.close();
}
}
if (!foundValid) {
throw new IOException("Not able to find valid snapshots in " + snapDir);
}
dt.lastProcessedZxid = Util.getZxidFromName(snap.getName(), "snapshot");
return dt.lastProcessedZxid;
}
其中,具体的反序列化是:
public void deserialize(DataTree dt, Map<Long, Integer> sessions,
InputArchive ia) throws IOException {
FileHeader header = new FileHeader();
header.deserialize(ia, "fileheader");//反序列化头,里面有几个常量和魔数
if (header.getMagic() != SNAP_MAGIC) {
throw new IOException("mismatching magic headers "
+ header.getMagic() +
" != " + FileSnap.SNAP_MAGIC);
}
SerializeUtils.deserializeSnapshot(dt,ia,sessions);//这里序列化了datatree和session
}
public static void deserializeSnapshot(DataTree dt,InputArchive ia,
Map<Long, Integer> sessions) throws IOException {
int count = ia.readInt("count");
while (count > 0) {
//一个一个读session并打log
long id = ia.readLong("id");
int to = ia.readInt("timeout");
sessions.put(id, to);
if (LOG.isTraceEnabled()) {
ZooTrace.logTraceMessage(LOG, ZooTrace.SESSION_TRACE_MASK,
"loadData --- session in archive: " + id
+ " with timeout: " + to);
}
count--;
}
//反序列化datatree
dt.deserialize(ia, "tree");
}
序列化:
public synchronized void serialize(DataTree dt, Map<Long, Integer> sessions, File snapShot)
throws IOException {
if (!close) {
OutputStream sessOS = new BufferedOutputStream(new FileOutputStream(snapShot));
//在写入的时候就用了CheckedOutputStream为了后面生成checksum
CheckedOutputStream crcOut = new CheckedOutputStream(sessOS, new Adler32());
//CheckedOutputStream cout = new CheckedOutputStream()
OutputArchive oa = BinaryOutputArchive.getArchive(crcOut);
FileHeader header = new FileHeader(SNAP_MAGIC, VERSION, dbId);
//具体序列化过程
serialize(dt,sessions,oa, header);
long val = crcOut.getChecksum().getValue();
oa.writeLong(val, "val");
oa.writeString("/", "path");
sessOS.flush();
crcOut.close();
sessOS.close();
}
}
protected void serialize(DataTree dt,Map<Long, Integer> sessions,
OutputArchive oa, FileHeader header) throws IOException {
// this is really a programmatic error and not something that can
// happen at runtime
if(header==null)
throw new IllegalStateException(
"Snapshot's not open for writing: uninitialized header");
//先序列化头然后序列化datatree和session
header.serialize(oa, "fileheader");
SerializeUtils.serializeSnapshot(dt,oa,sessions);
}
public static void serializeSnapshot(DataTree dt,OutputArchive oa,
Map<Long, Integer> sessions) throws IOException {
HashMap<Long, Integer> sessSnap = new HashMap<Long, Integer>(sessions);
//一个一个写session
oa.writeInt(sessSnap.size(), "count");
for (Entry<Long, Integer> entry : sessSnap.entrySet()) {
oa.writeLong(entry.getKey().longValue(), "id");
oa.writeInt(entry.getValue().intValue(), "timeout");
}
//把datatree序列化
dt.serialize(oa, "tree");
}
找到可用的快照
private List<File> findNValidSnapshots(int n) throws IOException {
//在这里排序的!!!
List<File> files = Util.sortDataDir(snapDir.listFiles(),"snapshot", false);
int count = 0;
List<File> list = new ArrayList<File>();
for (File f : files) {
// we should catch the exceptions
// from the valid snapshot and continue
// until we find a valid one
try {
//这里要验证文件是否是合法的,因为如果写的时候服务挂了,这个快照就不合法了
//可以展开看下,主要是通过文件是否为空,文件大小(<10字节),最后字符不为"/"等方式来判断的。
if (Util.isValidSnapshot(f)) {
list.add(f);
count++;
//反序列化的时候最多100个
if (count == n) {
break;
}
}
} catch (IOException e) {
LOG.info("invalid snapshot " + f, e);
}
}
return list;
}
找最近的快照
public List<File> findNRecentSnapshots(int n) throws IOException {
//和上面的方法基本上一样
List<File> files = Util.sortDataDir(snapDir.listFiles(), "snapshot", false);
int count = 0;
List<File> list = new ArrayList<File>();
for (File f: files) {
if (count == n)
break;
//存在就加一个
if (Util.getZxidFromName(f.getName(), "snapshot") != -1) {
count++;
list.add(f);
}
}
return list;
}
可视化
和事务日志一样,zk也提供了可视化的类SnapshotFormatter,简单说下:
public void run(String snapshotFileName) throws IOException {
InputStream is = new CheckedInputStream(
new BufferedInputStream(new FileInputStream(snapshotFileName)),
new Adler32());
InputArchive ia = BinaryInputArchive.getArchive(is);
FileSnap fileSnap = new FileSnap(null);
DataTree dataTree = new DataTree();
Map<Long, Integer> sessions = new HashMap<Long, Integer>();
//核心其实就是调用了filesnap的反序列化方法。
fileSnap.deserialize(dataTree, sessions, ia);
printDetails(dataTree, sessions);
}
具体的使用方法可以看 日志和快照可视化。
思考
快照维护问题
日志和快照一个是做事务性操作的记录,另一个是把zk内存数据记录到磁盘。它俩执行的机制
根据我的观察,zk里貌似需要通过网络发送的实体类都是通过jute生成的,而像日志和快照这种也需要做序列化的不是,应该是因为那些实体类内部没有逻辑,而这些是有的。
还有一个关于快照一直有的问题,快照基本都不是最新的,根据快照恢复一般会有数据丢失,怎么办,根据日志来补?
参考
https://blog.csdn.net/quhongwei_zhanqiu/article/details/45647573