Hadoop Source Code Analysis: How the NameNode and DataNode Handle File Reads
Reposted from: http://blog.csdn.net/workformywork/article/details/21783861
Obtaining Block Locations from the NameNode
Before a client can read file data over the streaming-interface TCP connections it establishes with data nodes, it must first locate that data. The client therefore invokes the remote method ClientProtocol.getBlockLocations() from DFSClient.callGetBlockLocations(). The call returns a LocatedBlocks object containing a series of LocatedBlock instances, which tell the client which data nodes hold the blocks it needs. On the server side, the RPC enters NameNode.getBlockLocations(), which in turn delegates to the method of the same name in FSNamesystem. FSNamesystem provides three overloads of getBlockLocations(); the first is shown below:
LocatedBlocks getBlockLocations(String clientMachine, String src,
    long offset, long length) throws IOException {
  LocatedBlocks blocks = getBlockLocations(src, offset, length, true, true,
      true);
  if (blocks != null) { // if blocks is not null, sort the data nodes holding each block
    // sort the blocks
    // In some deployment cases, cluster is with separation of task tracker
    // and datanode which means client machines will not always be recognized
    // as known data nodes, so here we should try to get node (but not
    // datanode only) for locality based sort.
    Node client = host2DataNodeMap.getDatanodeByHost(clientMachine);
    if (client == null) {
      List<String> hosts = new ArrayList<String>(1);
      hosts.add(clientMachine);
      String rName = dnsToSwitchMapping.resolve(hosts).get(0);
      if (rName != null)
        client = new NodeBase(clientMachine, rName);
    }

    DFSUtil.StaleComparator comparator = null;
    if (avoidStaleDataNodesForRead) {
      comparator = new DFSUtil.StaleComparator(staleInterval);
    }
    // Note: the last block is also included and sorted
    for (LocatedBlock b : blocks.getLocatedBlocks()) {
      clusterMap.pseudoSortByDistance(client, b.getLocations());
      if (avoidStaleDataNodesForRead) {
        Arrays.sort(b.getLocations(), comparator);
      }
    }
  }
  return blocks;
}
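To observe the effect of this sorting from the client side, here is a minimal sketch (not part of the original post) that queries block locations through the public FileSystem API, which ultimately travels the same ClientProtocol.getBlockLocations() RPC path described above. The NameNode address and file path are hypothetical placeholders.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocationDemo {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Hypothetical cluster address; on Hadoop 1.x the key is fs.default.name.
    conf.set("fs.defaultFS", "hdfs://namenode:9000");
    FileSystem fs = FileSystem.get(conf);

    Path file = new Path("/user/demo/input.txt"); // hypothetical path
    FileStatus status = fs.getFileStatus(file);

    // Ask the NameNode where each block of the file lives. Because the
    // NameNode sorts each block's replicas by network distance to the
    // caller (pseudoSortByDistance above), the first host listed for a
    // block is the closest known replica.
    BlockLocation[] locations =
        fs.getFileBlockLocations(status, 0, status.getLen());
    for (BlockLocation loc : locations) {
      System.out.printf("offset=%d length=%d hosts=%s%n",
          loc.getOffset(), loc.getLength(),
          String.join(",", loc.getHosts()));
    }
    fs.close();
  }
}

Running this against a real cluster prints one line per block; a client co-located with a data node that stores a replica should see its own hostname first for those blocks, which is exactly the locality order the sort in FSNamesystem.getBlockLocations() produces.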