
DataScience && DataMining && BigData

SequenceFile: A Worked Example

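     The HDFS API supports a binary file format, SequenceFile, that serializes <key,value> pairs directly into a file. Such a file cannot be read as plain text, but you can view its contents with the hadoop dfs -text command followed by the file's HDFS path (newer Hadoop releases prefer the equivalent hadoop fs -text).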
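To see why a file of serialized <key,value> pairs is not readable as plain text, here is a minimal sketch that uses only the JDK. It is not the real SequenceFile on-disk format, which also carries a header, sync markers, and optional compression; the class KeyValueBinaryDemo and its methods are hypothetical names chosen for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only: NOT the actual SequenceFile format.
class KeyValueBinaryDemo {

    // Serialize each pair as a 4-byte int key plus a length-prefixed string value.
    static byte[] write(Map<Integer, String> records) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (Map.Entry<Integer, String> e : records.entrySet()) {
                out.writeInt(e.getKey());
                out.writeUTF(e.getValue());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Read the pairs back in the order they were written.
    static Map<Integer, String> read(byte[] data) {
        Map<Integer, String> records = new LinkedHashMap<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            while (in.available() > 0) {
                int key = in.readInt();
                records.put(key, in.readUTF());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }

    public static void main(String[] args) {
        Map<Integer, String> records = new LinkedHashMap<>();
        records.put(100, "床前明月光");
        byte[] data = write(records);
        // The raw bytes are meaningless in a text viewer; decoding restores the pairs.
        System.out.println(data.length + " bytes");
        System.out.println(read(data));
    }
}
```

The key is stored as four raw bytes rather than as the characters "100", which is why a dedicated decoder (here read, on a cluster the -text command) is needed to display the records.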

     The two programs below write a SequenceFile and read it back; each is packaged as a jar and run in a Hadoop environment.

     1. Code that writes a SequenceFile:

package Hdfs;

import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SequenceFileWriter {
    // Four lines of text that the loop below cycles through as values.
    private static final String[] text = {
        "床前明月光",
        "疑是地上霜",
        "举头望明月",
        "低头思故乡"
    };

    public static void main(String[] args) {
        String uri = "hdfs://neusoft-master:9000/user/root/test/demo1";
        Configuration conf = new Configuration();
        SequenceFile.Writer writer = null;

        try {
            FileSystem fs = FileSystem.get(URI.create(uri), conf);
            Path path = new Path(uri);
            IntWritable key = new IntWritable();
            Text value = new Text();
            // The key and value classes are recorded in the file header.
            writer = SequenceFile.createWriter(fs, conf, path, key.getClass(), value.getClass());
            // Append 100 records; the keys count down from 100 to 1.
            for (int i = 0; i < 100; i++) {
                key.set(100 - i);
                value.set(text[i % text.length]);
                writer.append(key, value);
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            // Closes the writer if it was opened; a null writer is ignored.
            IOUtils.closeStream(writer);
        }
    }
}
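The loop above writes 100 records whose keys count down from 100 to 1 while the values cycle through the four lines of text. The following sketch reproduces that key/value pattern with the JDK alone, so the expected records can be checked without a cluster. RecordPreview and its method names are hypothetical, and the four strings are placeholders standing in for the poem lines.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: mirrors the writer's loop without touching Hadoop.
class RecordPreview {

    // Mirrors key.set(100 - i) in the writer.
    static int keyFor(int i) {
        return 100 - i;
    }

    // Mirrors value.set(text[i % text.length]) in the writer.
    static String valueFor(int i, String[] text) {
        return text[i % text.length];
    }

    // Build one "key<TAB>value" line per record.
    static List<String> preview(String[] text, int count) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            lines.add(keyFor(i) + "\t" + valueFor(i, text));
        }
        return lines;
    }

    public static void main(String[] args) {
        // Placeholder values standing in for the four poem lines.
        String[] text = {"line 1", "line 2", "line 3", "line 4"};
        List<String> lines = preview(text, 100);
        System.out.println(lines.get(0));   // first record: key 100, first line
        System.out.println(lines.get(99));  // last record: key 1, fourth line
    }
}
```

Because 100 is a multiple of 4, each line of text appears exactly 25 times in the file.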

        On Windows, package the project as testseq.jar and upload it to the corresponding directory on the Linux host with SecureFX.

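        On Linux, run the jar with hadoop jar, then view the resulting file with hadoop dfs -text followed by its HDFS path.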

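    Question: how do you fix garbled characters in the SecureCRT display?

      Change the character encoding in Session Options (for example to UTF-8, if that is the encoding the server uses).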
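Garbling of this kind usually means the terminal decodes bytes with a different charset than the one used to produce them. The sketch below illustrates the effect with the JDK alone, assuming the output is UTF-8 (the encoding Hadoop's Text uses) and that the terminal was set to ISO-8859-1; the class name EncodingMismatchDemo and the choice of ISO-8859-1 are assumptions made for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of a charset mismatch between program and terminal.
class EncodingMismatchDemo {

    // Encode text as UTF-8, then decode the bytes the way a terminal
    // configured for terminalCharset would.
    static String displayAs(String text, Charset terminalCharset) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, terminalCharset);
    }

    public static void main(String[] args) {
        String line = "床前明月光";
        // Wrong charset: every UTF-8 byte is shown as a separate character.
        System.out.println(displayAs(line, StandardCharsets.ISO_8859_1));
        // Matching charset: the original text comes back intact.
        System.out.println(displayAs(line, StandardCharsets.UTF_8));
    }
}
```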

       

     2. Code that reads a SequenceFile:

package Hdfs;

import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;

public class SequenceFileReader {

    public static void main(String[] args) {
        String uri = "hdfs://neusoft-master:9000/user/root/test/demo1";
        Configuration conf = new Configuration();
        SequenceFile.Reader reader = null;
        try {
            FileSystem fs = FileSystem.get(URI.create(uri), conf);
            Path path = new Path(uri);
            reader = new SequenceFile.Reader(fs, path, conf);
            // Instantiate key and value objects from the classes recorded in the file header.
            Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
            Writable value = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);
            // Byte offset of the next record; tracked here but not printed.
            long position = reader.getPosition();
            // next() fills key and value in place and returns false at end of file.
            while (reader.next(key, value)) {
                System.out.printf("[%s]\t%s\n", key, value);
                position = reader.getPosition();
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            // Closes the reader if it was opened; a null reader is ignored.
            IOUtils.closeStream(reader);
        }
    }
}
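The reader never names IntWritable or Text: it asks the file for its key and value classes and creates instances of them at run time. The sketch below shows the same idea with plain JDK reflection. It is only an analogy for what ReflectionUtils.newInstance does; ReflectiveInstanceDemo is a hypothetical name, and java.lang.StringBuilder stands in for a class name read from a file header.

```java
// Hypothetical illustration: create an object from a class name known only at run time.
class ReflectiveInstanceDemo {

    // Look the class up by name and call its no-argument constructor.
    static Object newInstance(String className) {
        try {
            Class<?> type = Class.forName(className);
            return type.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + className, e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for a class name stored in a file header.
        Object value = newInstance("java.lang.StringBuilder");
        System.out.println(value.getClass().getName());
    }
}
```

This is why the same reader program can print any SequenceFile whose key and value classes are on the classpath, without being recompiled for each type.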

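        On Windows, package the project as testseq.jar with the main class set in the jar, so that it does not have to be given when the jar is submitted, then upload it to the corresponding directory on the Linux host with SecureFX.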

       

        On Linux, run the jar with hadoop jar (no main-class argument is needed) to print the records.

END::SequenceFile

The HDFS web UI is reachable at IP:50070, where IP is the address of the NameNode host.

         

 

posted @ 2017-01-30 20:19  CJZhaoSimons