Hadoop blocks

In cases where the last record in a block is incomplete, the input split includes location information for the next block and the byte offset of the data needed to complete the record.

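Suppose we have a 128 MB text file and the Hadoop block size is 64 MB (the default in older Hadoop releases), so the file needs two blocks when it is uploaded to HDFS. If the 64 MB boundary of the first block falls in the middle of a line, so that the block does not contain the end of that line, which mapper reads that record when the file is processed with TextInputFormat?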

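According to the sentence quoted above, the input split of the mapper that holds the start of the line also carries information about the next block and the byte offset up to which data must be read to finish the line. So the mapper that owns the start of the line is the one that reads the whole record.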
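To make the rule concrete, here is a minimal Python sketch of the convention that line-oriented readers such as Hadoop's `LineRecordReader` follow: a reader whose split does not start at byte 0 discards its first (possibly partial) line, and every reader keeps reading past the end of its split until the line it started is complete. This is a simulation on an in-memory byte string, not Hadoop code. The names `make_splits` and `read_split` are hypothetical, and the 13-byte "block" stands in for a 64 MB one.

```python
def make_splits(length: int, split_size: int) -> list[tuple[int, int]]:
    """Cut [0, length) into consecutive (start, end) byte ranges."""
    return [(s, min(s + split_size, length)) for s in range(0, length, split_size)]


def read_split(data: bytes, start: int, end: int) -> list[bytes]:
    """Return the lines owned by the split covering bytes [start, end)."""
    pos = start
    if start != 0:
        # Skip the first line: it started in (or at the edge of) the
        # previous split, so the previous split's reader owns it.
        nl = data.find(b"\n", pos)
        pos = len(data) if nl == -1 else nl + 1

    records = []
    # A line that starts at or before `end` belongs to this split,
    # even if its bytes run into the next block.
    while pos <= end and pos < len(data):
        nl = data.find(b"\n", pos)
        line_end = len(data) if nl == -1 else nl
        records.append(data[pos:line_end])
        pos = line_end + 1
    return records


if __name__ == "__main__":
    data = b"alpha\nbravo\ncharlie\ndelta\n"  # 26 bytes
    # A 13-byte boundary cuts "charlie" in the middle.
    for start, end in make_splits(len(data), 13):
        print((start, end), read_split(data, start, end))
    # (0, 13) [b'alpha', b'bravo', b'charlie']
    # (13, 26) [b'delta']
```

The first split reads `charlie` in full even though the line crosses the boundary, and the second split skips the tail of `charlie` and starts at `delta`, so each line is processed exactly once.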

posted on 2015-04-21 06:46 tneduts
