Hadoop HDFS Summary: NameNode Part 3 ----DatanodeDescriptor

    DatanodeDescriptor is the NameNode's internal abstraction of a DataNode. Together with the BlocksMap and the INodes, it records the blocks held by every datanode in the file system and the corresponding INode information.

    DatanodeDescriptor extends DatanodeInfo, which in turn extends DatanodeID.

    I. DatanodeID

    DatanodeID has the following fields:

    public String name;      /// hostname:portNumber
    public String storageID; /// storage ID, unique within the cluster (a per-cluster unique ID, not a hostname)
    protected int infoPort;  /// the port where the info (HTTP) server is running
    public int ipcPort;      /// the port where the IPC server is running (low-level IPC communication)

    II. DatanodeInfo

    1. DatanodeInfo has the following fields:

    protected long capacity;
    protected long dfsUsed;
    protected long remaining;

    protected String hostName = null;  // supplied by the datanode when it registers
    protected long lastUpdate;
    protected int xceiverCount;  // important: the number of active connections from clients or other datanodes; transfers fail once the limit is exceeded
    protected String location = NetworkTopology.DEFAULT_RACK;  // position in the network topology; configurable, used by the rack-aware replica placement policy

    protected AdminStates adminState;  // the administrative state of this datanode: NORMAL, DECOMMISSION_INPROGRESS, or DECOMMISSIONED. It matters when decommissioning a datanode, i.e. taking it out of service: to avoid data loss, the blocks on that datanode must first be re-replicated to other datanodes.

    2. Key methods

    public String dumpDatanode()  dumps all of the statistics above as a string.

    III. DatanodeDescriptor

    DatanodeDescriptor abstracts all NameNode-side operations on a DataNode. A datanode stores the file system's data; the data belongs to files, each file consists of blocks, and each block has several replicas. The operations on a datanode include clients streaming data to it, the datanode keeping track of all of its blocks, re-replicating blocks when replicas are lost, and recovering blocks when an append or a transfer fails. DatanodeDescriptor encapsulates the state behind all of these operations.

   1. Key data structures

   (1) Inner class BlockTargetPair

  public static class BlockTargetPair {
    public final Block block;
    public final DatanodeDescriptor[] targets;

    BlockTargetPair(Block block, DatanodeDescriptor[] targets) {
      this.block = block;
      this.targets = targets;
    }
  }

      Represents a block together with the datanodes targeted to hold its replicas; it is the building block for the structures below.

    (2) Inner class private static class BlockQueue

    A wrapper around a queue of BlockTargetPair, providing enqueue and dequeue methods.
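The real BlockQueue is a private inner class; as a rough self-contained sketch of the idea (simplified stand-ins, with block IDs and datanode names instead of the real Block and DatanodeDescriptor types):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Sketch of a BlockTargetPair queue; block IDs and target names stand in
// for the real Block and DatanodeDescriptor types.
class BlockQueue {
  static class BlockTargetPair {
    final long blockId;
    final String[] targets;
    BlockTargetPair(long blockId, String[] targets) {
      this.blockId = blockId;
      this.targets = targets;
    }
  }

  private final Queue<BlockTargetPair> queue = new ArrayDeque<>();

  // Enqueue one block together with its target datanodes.
  synchronized boolean offer(long blockId, String[] targets) {
    return queue.offer(new BlockTargetPair(blockId, targets));
  }

  // Dequeue at most maxSize pairs in FIFO order; fewer if the queue drains.
  synchronized List<BlockTargetPair> poll(int maxSize) {
    List<BlockTargetPair> results = new ArrayList<>();
    for (int i = 0; i < maxSize && !queue.isEmpty(); i++) {
      results.add(queue.poll());
    }
    return results;
  }

  synchronized int size() {
    return queue.size();
  }
}
```

The bounded poll is what lets the NameNode cap how much work it hands a datanode at a time (the maxTransfers parameter that appears later).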

    (3) private volatile BlockInfo blockList = null;

    Each DatanodeDescriptor must record every block stored on its datanode; this is done via BlockInfo. blockList is the head of a linked list threaded through the BlockInfo triplets (see the BlocksMap analysis), from which all of the datanode's blocks can be traversed.
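As a toy illustration of that list threading (heavily simplified: plain prev/next fields on a single datanode's chain, instead of the real triplets array shared across replica locations), the head-insert and removal mirror listInsert/listRemove:

```java
// Toy version of the per-datanode block list: each BlockInfo carries
// prev/next links for one datanode's chain; the real class stores these
// links inside a triplets array, one slot per replica location.
class BlockInfo {
  final long blockId;
  BlockInfo prev, next;

  BlockInfo(long blockId) {
    this.blockId = blockId;
  }

  // Insert this block at the head of the list; returns the new head,
  // mirroring blockList = b.listInsert(blockList, this).
  BlockInfo listInsert(BlockInfo head) {
    this.next = head;
    this.prev = null;
    if (head != null) head.prev = this;
    return this;
  }

  // Unlink this block from the list; returns the (possibly new) head.
  BlockInfo listRemove(BlockInfo head) {
    if (prev != null) prev.next = next;
    if (next != null) next.prev = prev;
    BlockInfo newHead = (head == this) ? next : head;
    prev = next = null;
    return newHead;
  }
}
```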

    (4) Internal structures:

  /** A queue of blocks to be replicated by this datanode */
  private BlockQueue replicateBlocks = new BlockQueue();
  /** A queue of blocks to be recovered by this datanode */
  private BlockQueue recoverBlocks = new BlockQueue();
  /** A set of blocks to be invalidated by this datanode */
  private Set<Block> invalidateBlocks = new TreeSet<Block>();

    These structures hold the blocks that this datanode must replicate to other datanodes (replicateBlocks), the blocks that this datanode must recover (recoverBlocks), and the blocks that must be deleted from this datanode (invalidateBlocks).

    The first two carry DatanodeDescriptor targets, since replication and recovery must know which other datanodes are involved; invalidateBlocks only lists blocks this datanode should delete, so no other datanode is involved.

    (5) The following fields track approximately how many blocks are scheduled to be transferred to this datanode; the counters are rolled over periodically from the heartbeat path:

  private int currApproxBlocksScheduled = 0;
  private int prevApproxBlocksScheduled = 0;
  private long lastBlocksScheduledRollTime = 0;
  private static final int BLOCKS_SCHEDULED_ROLL_INTERVAL = 600*1000; //10min
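A reconstruction of how these counters behave (method names follow the real ones, but this is a simplified sketch, not the actual source): the count goes up when a transfer is scheduled, down when blocks are reported back, and curr is rolled into prev every ten minutes so that transfers that never complete do not inflate the estimate forever.

```java
// Sketch of the approximate blocks-scheduled bookkeeping on a
// DatanodeDescriptor (simplified reconstruction, not the real source).
class BlocksScheduledCounter {
  static final long ROLL_INTERVAL_MS = 600 * 1000; // 10 min

  private int currApproxBlocksScheduled = 0;
  private int prevApproxBlocksScheduled = 0;
  private long lastBlocksScheduledRollTime = 0;

  // Called when the NameNode schedules a block transfer to this datanode.
  void incBlocksScheduled() {
    currApproxBlocksScheduled++;
  }

  // Called when a scheduled block arrives; drain the older bucket first.
  void decBlocksScheduled() {
    if (prevApproxBlocksScheduled > 0) {
      prevApproxBlocksScheduled--;
    } else if (currApproxBlocksScheduled > 0) {
      currApproxBlocksScheduled--;
    }
    // Otherwise the count would go negative; ignore, it is only approximate.
  }

  int getBlocksScheduled() {
    return currApproxBlocksScheduled + prevApproxBlocksScheduled;
  }

  // Called from updateHeartbeat(): once per interval, age curr into prev,
  // so entries older than two intervals are effectively forgotten.
  void rollBlocksScheduled(long now) {
    if (now - lastBlocksScheduledRollTime > ROLL_INTERVAL_MS) {
      prevApproxBlocksScheduled = currApproxBlocksScheduled;
      currApproxBlocksScheduled = 0;
      lastBlocksScheduledRollTime = now;
    }
  }
}
```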

     2. Key methods

    (1) void updateHeartbeat

  void updateHeartbeat(long capacity, long dfsUsed, long remaining,
                       int xceiverCount) {
    this.capacity = capacity;
    this.dfsUsed = dfsUsed;
    this.remaining = remaining;
    this.lastUpdate = System.currentTimeMillis();
    this.xceiverCount = xceiverCount;
    rollBlocksScheduled(lastUpdate);
  }

    When the DataNode heartbeats to the NameNode, this updates its state: capacity, dfsUsed, remaining and xceiverCount, refreshes the last-update timestamp, and rolls the blocks-scheduled counters.

    (2) boolean addBlock(BlockInfo b)

  boolean addBlock(BlockInfo b) {
    if (!b.addNode(this))
      return false;
    // add to the head of the data-node list
    blockList = b.listInsert(blockList, this);
    return true;
  }

    Registers this datanode as a location for the block and inserts the block at the head of this datanode's block list.

    (3) boolean removeBlock(BlockInfo b)

  boolean removeBlock(BlockInfo b) {
    blockList = b.listRemove(blockList, this);
    return b.removeNode(this);
  }

    Removes the block from this datanode's block list and removes this datanode from the block's locations.

    (4) void addBlockToBeReplicated

  void addBlockToBeReplicated(Block block, DatanodeDescriptor[] targets) {
    assert(block != null && targets != null && targets.length > 0);
    replicateBlocks.offer(block, targets);
  }

    Queues the block in replicateBlocks.

    (5) void addBlockToBeRecovered

  void addBlockToBeRecovered(Block block, DatanodeDescriptor[] targets) {
    assert(block != null && targets != null && targets.length > 0);
    recoverBlocks.offer(block, targets);
  }

    Queues the block in recoverBlocks.

    (6) void addBlocksToBeInvalidated

  void addBlocksToBeInvalidated(List<Block> blocklist) {
    assert(blocklist != null && blocklist.size() > 0);
    synchronized (invalidateBlocks) {
      for (Block blk : blocklist) {
        invalidateBlocks.add(blk);
      }
    }
  }

    Adds the blocks to invalidateBlocks.

    (7) BlockCommand getReplicationCommand(int maxTransfers)

    BlockCommand getLeaseRecoveryCommand(int maxTransfers)

    BlockCommand getInvalidateBlocks(int maxblocks)

    These three methods follow the same pattern: each drains entries from the corresponding internal structure, wraps them into a Writable BlockCommand to be shipped to the datanode, and sets the command action to DatanodeProtocol.DNA_TRANSFER, DatanodeProtocol.DNA_RECOVERBLOCK or DatanodeProtocol.DNA_INVALIDATE respectively.
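The shared pattern can be sketched as follows (a self-contained illustration: the action codes and the BlockCommand shape are simplified stand-ins, not the real DatanodeProtocol constants or Writable types):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Sketch of the getReplicationCommand()-style pattern: drain at most
// maxTransfers entries from a pending queue and tag them with an action.
class CommandBuilder {
  // Illustrative action codes (stand-ins for the DatanodeProtocol constants).
  static final int DNA_TRANSFER = 1;
  static final int DNA_RECOVERBLOCK = 2;
  static final int DNA_INVALIDATE = 3;

  static class BlockCommand {
    final int action;
    final List<Long> blockIds; // stands in for the real Block[] payload
    BlockCommand(int action, List<Long> blockIds) {
      this.action = action;
      this.blockIds = blockIds;
    }
  }

  // Returns null when the queue is empty so no empty command is sent.
  static BlockCommand getCommand(int action, Queue<Long> pending, int maxTransfers) {
    List<Long> batch = new ArrayList<>();
    while (batch.size() < maxTransfers && !pending.isEmpty()) {
      batch.add(pending.poll());
    }
    return batch.isEmpty() ? null : new BlockCommand(action, batch);
  }
}
```

Capping the batch at maxTransfers keeps one heartbeat reply from flooding a datanode with work; whatever remains stays queued for the next heartbeat.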

    (8) reportDiff, the most important method in DatanodeDescriptor

  void reportDiff(BlocksMap blocksMap,
                  BlockListAsLongs newReport,
                  Collection<Block> toAdd,
                  Collection<Block> toRemove,
                  Collection<Block> toInvalidate) {
    // place a delimiter in the list which separates blocks
    // that have been reported from those that have not
    BlockInfo delimiter = new BlockInfo(new Block(), 1);
    boolean added = this.addBlock(delimiter);
    assert added : "Delimiting block cannot be present in the node";
    if (newReport == null)
      newReport = new BlockListAsLongs(new long[0]);
    // scan the report and collect newly reported blocks
    // Note we are taking special precaution to limit tmp blocks allocated
    // as part of this block report - which is why the block list is stored as longs
    Block iblk = new Block(); // a fixed new'ed block to be reused with index i
    for (int i = 0; i < newReport.getNumberOfBlocks(); ++i) {
      iblk.set(newReport.getBlockId(i), newReport.getBlockLen(i),
               newReport.getBlockGenStamp(i));
      BlockInfo storedBlock = blocksMap.getStoredBlock(iblk);
      if (storedBlock == null) {
        // If block is not in blocksMap it does not belong to any file
        toInvalidate.add(new Block(iblk));
        continue;
      }
      if (storedBlock.findDatanode(this) < 0) { // Known block, but not on the DN
        // if the size differs from what is in the blockmap, then return
        // the new block. addStoredBlock will then pick up the right size of this
        // block and will update the block object in the BlocksMap
        if (storedBlock.getNumBytes() != iblk.getNumBytes()) {
          toAdd.add(new Block(iblk));
        } else {
          toAdd.add(storedBlock);
        }
        continue;
      }
      // move block to the head of the list
      this.moveBlockToHead(storedBlock);
    }
    // collect blocks that have not been reported
    // all of them are next to the delimiter
    Iterator<Block> it = new BlockIterator(delimiter.getNext(0), this);
    while (it.hasNext())
      toRemove.add(it.next());
    this.removeBlock(delimiter);
  }

    A datanode periodically reports all of its blocks to the NameNode; since a full block report is expensive, it happens far less often than heartbeats. On a report, each reported block is compared against the BlocksMap: a block not found in the BlocksMap belongs to no file and is scheduled for deletion on the datanode (toInvalidate); a known block for which this datanode is not yet recorded as a location is added (toAdd); and blocks the NameNode had recorded for this datanode that no longer appear in the report are removed from the datanode-to-block mapping (toRemove).
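Ignoring the delimiter trick (which only exists to find unreported blocks without a second pass over the intrusive list), the classification reduces to a set comparison. A simplified self-contained sketch of that logic (all types are stand-ins; the real code works on BlockInfo objects and the linked list):

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Simplified reportDiff: classify a datanode's reported block IDs against
// the namespace view. blocksMap maps blockId -> datanodes recorded for it;
// previouslyOnDn is what the NameNode thought this datanode stored.
class ReportDiff {
  static void diff(String dn,
                   Map<Long, Set<String>> blocksMap,
                   Set<Long> previouslyOnDn,
                   Collection<Long> newReport,
                   Collection<Long> toAdd,
                   Collection<Long> toRemove,
                   Collection<Long> toInvalidate) {
    Set<Long> reported = new HashSet<>(newReport);
    for (long blk : reported) {
      Set<String> holders = blocksMap.get(blk);
      if (holders == null) {
        toInvalidate.add(blk);      // belongs to no file: delete on the datanode
      } else if (!holders.contains(dn)) {
        toAdd.add(blk);             // known block, newly reported replica
      }
      // else: replica already recorded, nothing to do
    }
    for (long blk : previouslyOnDn) {
      if (!reported.contains(blk)) {
        toRemove.add(blk);          // recorded here before, no longer reported
      }
    }
  }
}
```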









posted on 2012-03-25 21:11 by 萌@宇