Zookeeper源码阅读（四）数据存储-SNOPSHOT

前言

事务日志记录了zk的事务记录，而zk同时有快照机制来记录zk服务器中某一时刻全量内存数据，并写入磁盘中保存。快照机制其实很多框架都提供，也是目前做恢复的一种很主流的机制。

前面第二篇说到Zk内存中数据以树的形式存在内存中，相应的类是DataTree。快照就是基于DataTree，把它序列化然后存储到磁盘中。

SnapShot接口

与事务日志类似(接口为TxnLog)，快照也有其对饮的维护快照数据对外的接口（读写等操作），即SnapShot。

/**
 * snapshot interface for the persistence layer.
 * implement this interface for implementing 
 * snapshots.
 */
public interface SnapShot {
    
    /**
     * deserialize a data tree from the last valid snapshot and 
     * return the last zxid that was deserialized
     * @param dt the datatree to be deserialized into
     * @param sessions the sessions to be deserialized into
     * @return the last zxid that was deserialized from the snapshot
     * @throws IOException
     */
    long deserialize(DataTree dt, Map<Long, Integer> sessions) 
        throws IOException;
    
    /**
     * persist the datatree and the sessions into a persistence storage
     * @param dt the datatree to be serialized
     * @param sessions 
     * @throws IOException
     */
    void serialize(DataTree dt, Map<Long, Integer> sessions, 
            File name) 
        throws IOException;
    
    /**
     * find the most recent snapshot file
     * @return the most recent snapshot file
     * @throws IOException
     */
    File findMostRecentSnapshot() throws IOException;
    
    /**
     * free resources from this snapshot immediately
     * @throws IOException
     */
    void close() throws IOException;
}

从类与方法本身的注释就能看出来这个接口就是为快照服务的。从方法我们也可以看出，这个接口是为了做序列化与反序列化的。还有两个方法是找到最近的快照文件与释放快照资源。

FileSnap（具体的功能类）：

/**
 * This class implements the snapshot interface.
 * it is responsible for storing, serializing
 * and deserializing the right snapshot.
 * and provides access to the snapshots.
 */
public class FileSnap implements SnapShot {
    File snapDir;//目录
    private volatile boolean close = false;//快照是否开启
    private static final int VERSION=2;
    private static final long dbId=-1;
    private static final Logger LOG = LoggerFactory.getLogger(FileSnap.class);
    public final static int SNAP_MAGIC
        = ByteBuffer.wrap("ZKSN".getBytes()).getInt();

从类的注释中能看到这个类是为了保存，序列化/反序列化正确的快照和访问快照而生的。这里的version和dbid都是常量。关于快照的格式可以参考： Snapshot格式

反序列化：

/**
 * deserialize a data tree from the most recent snapshot
 * @return the zxid of the snapshot
 */ 
public long deserialize(DataTree dt, Map<Long, Integer> sessions)
        throws IOException {
    // we run through 100 snapshots (not all of them)
    // if we cannot get it running within 100 snapshots
    // we should  give up
    //最多只会拿100个快照文件，如果这100个里面不包含想要的，就放弃了（因为后面查不到）但是会有log警告。
    //这里拿到的是按Zxid排序过的
    List<File> snapList = findNValidSnapshots(100);
    if (snapList.size() == 0) {
        return -1L;
    }
    File snap = null;
    boolean foundValid = false;
    for (int i = 0; i < snapList.size(); i++) {
        snap = snapList.get(i);
        InputStream snapIS = null;
        CheckedInputStream crcIn = null;
        try {
            LOG.info("Reading snapshot " + snap);
            snapIS = new BufferedInputStream(new FileInputStream(snap));
            crcIn = new CheckedInputStream(snapIS, new Adler32());//含有校验信息的
            InputArchive ia = BinaryInputArchive.getArchive(crcIn);
            deserialize(dt,sessions, ia);//根据ia反序列化到dataTree和session
            long checkSum = crcIn.getChecksum().getValue();
            long val = ia.readLong("val");//读写入的校验值
            if (val != checkSum) {
                throw new IOException("CRC corruption in snapshot :  " + snap);
            }
            foundValid = true;
            break;//因为之前排序过，所以只要有一个完整的就可以了
        } catch(IOException e) {
            LOG.warn("problem reading snap file " + snap, e);
        } finally {
            if (snapIS != null) 
                snapIS.close();
            if (crcIn != null) 
                crcIn.close();
        } 
    }
    if (!foundValid) {
        throw new IOException("Not able to find valid snapshots in " + snapDir);
    }
    dt.lastProcessedZxid = Util.getZxidFromName(snap.getName(), "snapshot");
    return dt.lastProcessedZxid;
}

其中，具体的反序列化是：

public void deserialize(DataTree dt, Map<Long, Integer> sessions,
        InputArchive ia) throws IOException {
    FileHeader header = new FileHeader();
    header.deserialize(ia, "fileheader");//反序列化头，里面有几个常量和魔数
    if (header.getMagic() != SNAP_MAGIC) {
        throw new IOException("mismatching magic headers "
                + header.getMagic() + 
                " !=  " + FileSnap.SNAP_MAGIC);
    }
    SerializeUtils.deserializeSnapshot(dt,ia,sessions);//这里序列化了datatree和session
}

public static void deserializeSnapshot(DataTree dt,InputArchive ia,
        Map<Long, Integer> sessions) throws IOException {
    int count = ia.readInt("count");
    while (count > 0) {
        //一个一个读session并打log
        long id = ia.readLong("id");
        int to = ia.readInt("timeout");
        sessions.put(id, to);
        if (LOG.isTraceEnabled()) {
            ZooTrace.logTraceMessage(LOG, ZooTrace.SESSION_TRACE_MASK,
                    "loadData --- session in archive: " + id
                    + " with timeout: " + to);
        }
        count--;
    }
    //反序列化datatree
    dt.deserialize(ia, "tree");
}

序列化：

public synchronized void serialize(DataTree dt, Map<Long, Integer> sessions, File snapShot)
        throws IOException {
    if (!close) {
        OutputStream sessOS = new BufferedOutputStream(new FileOutputStream(snapShot));
        //在写入的时候就用了CheckedOutputStream为了后面生成checksum
        CheckedOutputStream crcOut = new CheckedOutputStream(sessOS, new Adler32());
        //CheckedOutputStream cout = new CheckedOutputStream()
        OutputArchive oa = BinaryOutputArchive.getArchive(crcOut);
        FileHeader header = new FileHeader(SNAP_MAGIC, VERSION, dbId);
        //具体序列化过程
        serialize(dt,sessions,oa, header);
        long val = crcOut.getChecksum().getValue();
        oa.writeLong(val, "val");
        oa.writeString("/", "path");
        sessOS.flush();
        crcOut.close();
        sessOS.close();
    }
}

protected void serialize(DataTree dt,Map<Long, Integer> sessions,
        OutputArchive oa, FileHeader header) throws IOException {
    // this is really a programmatic error and not something that can
    // happen at runtime
    if(header==null)
        throw new IllegalStateException(
                "Snapshot's not open for writing: uninitialized header");
    //先序列化头然后序列化datatree和session
    header.serialize(oa, "fileheader");
    SerializeUtils.serializeSnapshot(dt,oa,sessions);
}

public static void serializeSnapshot(DataTree dt,OutputArchive oa,
        Map<Long, Integer> sessions) throws IOException {
    HashMap<Long, Integer> sessSnap = new HashMap<Long, Integer>(sessions);
    //一个一个写session
    oa.writeInt(sessSnap.size(), "count");
    for (Entry<Long, Integer> entry : sessSnap.entrySet()) {
        oa.writeLong(entry.getKey().longValue(), "id");
        oa.writeInt(entry.getValue().intValue(), "timeout");
    }
    //把datatree序列化
    dt.serialize(oa, "tree");
}

找到可用的快照

private List<File> findNValidSnapshots(int n) throws IOException {
    //在这里排序的！！！
    List<File> files = Util.sortDataDir(snapDir.listFiles(),"snapshot", false);
    int count = 0;
    List<File> list = new ArrayList<File>();
    for (File f : files) {
        // we should catch the exceptions
        // from the valid snapshot and continue
        // until we find a valid one
        try {
            //这里要验证文件是否是合法的，因为如果写的时候服务挂了，这个快照就不合法了
            //可以展开看下，主要是通过文件是否为空，文件大小(<10字节)，最后字符不为"/"等方式来判断的。
            if (Util.isValidSnapshot(f)) {
                list.add(f);
                count++;
                //反序列化的时候最多100个
                if (count == n) {
                    break;
                }
            }
        } catch (IOException e) {
            LOG.info("invalid snapshot " + f, e);
        }
    }
    return list;
}

找最近的快照

public List<File> findNRecentSnapshots(int n) throws IOException {
    //和上面的方法基本上一样
    List<File> files = Util.sortDataDir(snapDir.listFiles(), "snapshot", false);
    int count = 0;
    List<File> list = new ArrayList<File>();
    for (File f: files) {
        if (count == n)
            break;
        //存在就加一个
        if (Util.getZxidFromName(f.getName(), "snapshot") != -1) {
            count++;
            list.add(f);
        }
    }
    return list;
}

可视化

和事务日志一样，zk也提供了可视化的类SnapshotFormatter，简单说下：

public void run(String snapshotFileName) throws IOException {
    InputStream is = new CheckedInputStream(
            new BufferedInputStream(new FileInputStream(snapshotFileName)),
            new Adler32());
    InputArchive ia = BinaryInputArchive.getArchive(is);
    
    FileSnap fileSnap = new FileSnap(null);

    DataTree dataTree = new DataTree();
    Map<Long, Integer> sessions = new HashMap<Long, Integer>();
    
    //核心其实就是调用了filesnap的反序列化方法。
    fileSnap.deserialize(dataTree, sessions, ia);

    printDetails(dataTree, sessions);
}

具体的使用方法可以看日志和快照可视化。

思考

快照维护问题

日志和快照一个是做事务性操作的记录，另一个是把zk内存数据记录到磁盘。它俩执行的机制

根据我的观察，zk里貌似需要通过网络发送的实体类都是通过jute生成的，而像日志和快照这种也需要做序列化的不是，应该是因为那些实体类内部没有逻辑，而这些是有的。

还有一个关于快照一直有的问题，快照基本都不是最新的，根据快照恢复一般会有数据丢失，怎么办，根据日志来补？

参考

https://blog.csdn.net/quhongwei_zhanqiu/article/details/45647573

https://www.jianshu.com/p/8714586d4a04

https://blog.csdn.net/qq_27739989/article/details/78105603

posted @ 2018-09-19 21:50 SmallMushroom 阅读(613) 评论(0) 编辑收藏举报

会员力量，点亮园子希望

刷新页面返回顶部

SmallMushroom