SequenceFile Examples
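Hadoop's I/O API (org.apache.hadoop.io) provides a binary file format, SequenceFile, which serializes <key,value> pairs directly into a file. Because the file is binary, it cannot be read as plain text. To view it, run hadoop fs -text followed by the HDFS path of the SequenceFile. The older hadoop dfs -text form still works but is deprecated.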
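To make "serializes <key,value> pairs directly into a file" concrete, here is a toy sketch in plain Java that needs no Hadoop jars. It is an assumption-laden illustration, not the real SequenceFile layout. Each record is written as a 4-byte int key (the way IntWritable writes itself) followed by a length-prefixed UTF-8 string (Text does something similar, but it uses a variable-length int for the length). The real format also adds a header, sync markers and optional compression. The class and method names (ToyRecordFormat, write, dump) are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/**
 * Toy binary record format (hypothetical, NOT the real SequenceFile layout):
 * [int key][int length][UTF-8 bytes of value], repeated for each record.
 */
public class ToyRecordFormat {

    // Serialize (key, value) pairs into a byte array.
    static byte[] write(int[] keys, String[] values) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (int i = 0; i < keys.length; i++) {
                out.writeInt(keys[i]);                                  // 4-byte big-endian int, like IntWritable
                byte[] utf8 = values[i].getBytes(StandardCharsets.UTF_8);
                out.writeInt(utf8.length);                              // fixed 4-byte length prefix (Text uses a vint)
                out.write(utf8);                                        // raw UTF-8 bytes of the value
            }
        } catch (IOException e) {
            // Cannot happen for an in-memory stream, but DataOutputStream declares it.
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Deserialize the bytes and render each record as "key<TAB>value" on its own line.
    static String dump(byte[] data) {
        StringBuilder sb = new StringBuilder();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            while (in.available() > 0) {
                int key = in.readInt();
                byte[] utf8 = new byte[in.readInt()];
                in.readFully(utf8);
                sb.append(key).append('\t')
                  .append(new String(utf8, StandardCharsets.UTF_8)).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] data = write(new int[] {100, 99}, new String[] {"床前明月光", "举头望明月"});
        System.out.println("bytes: " + data.length);   // binary: unreadable if printed raw
        System.out.print(dump(data));                  // decoded: readable key/value lines
    }
}
```

Printing the raw bytes would show gibberish, which is why a decoding tool such as hadoop fs -text is needed for the real format.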
Below, one program writes a SequenceFile and another reads it back. Each is packaged as a jar and run in the Hadoop environment.
1. Code for writing the SequenceFile:
    package Hdfs;

    import java.io.IOException;
    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    public class SequenceFileWriter {
        // Values to write: four lines of a classical poem, reused cyclically
        private static final String[] text = {
            "床前明月光",
            "疑似地上霜",
            "举头望明月",
            "低头思故乡"
        };

        public static void main(String[] args) {
            // HDFS path of the SequenceFile to create
            String uri = "hdfs://neusoft-master:9000/user/root/test/demo1";
            Configuration conf = new Configuration();
            SequenceFile.Writer writer = null;

            try {
                FileSystem fs = FileSystem.get(URI.create(uri), conf);
                Path path = new Path(uri);
                IntWritable key = new IntWritable();
                Text value = new Text();
                // This overload is deprecated in Hadoop 2.x (in favor of Writer.Option arguments) but still works
                writer = SequenceFile.createWriter(fs, conf, path, key.getClass(), value.getClass());
                // Append 100 records: keys count down from 100 to 1, values cycle through the poem
                for (int i = 0; i < 100; i++) {
                    key.set(100 - i);
                    value.set(text[i % text.length]);
                    writer.append(key, value);
                }
            } catch (IOException e) {
                e.printStackTrace();
            } finally {
                // Null-safe close; flushes the records to HDFS
                IOUtils.closeStream(writer);
            }
        }
    }
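To see what the writer's loop produces without a cluster, the following plain-Java dry run rebuilds the same 100 (key, value) pairs passed to writer.append(key, value). It formats each pair as "key<TAB>value", which is assumed to roughly match how hadoop fs -text renders each record. The class WriterRecordPreview is hypothetical and does not touch HDFS.

```java
import java.util.ArrayList;
import java.util.List;

/** Dry run of the SequenceFileWriter loop: no Hadoop, just the record sequence. */
public class WriterRecordPreview {
    // Same four strings as the writer's text array
    private static final String[] TEXT = {"床前明月光", "疑似地上霜", "举头望明月", "低头思故乡"};

    static List<String> records() {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            int key = 100 - i;                      // keys count down: 100, 99, ..., 1
            String value = TEXT[i % TEXT.length];   // values cycle every four records
            out.add(key + "\t" + value);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> recs = records();
        recs.subList(0, 5).forEach(System.out::println);   // first five records
        System.out.println("... " + recs.size() + " records in total");
    }
}
```

Because 100 is a multiple of 4, the last record (key 1) carries the fourth line of the poem, so the full output ends on a complete stanza.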
On Windows, package the program as testseq.jar and upload it with SecureFX to the target directory on the Linux machine.
The steps on Linux are as follows:
Q: What if SecureCRT displays garbled characters?
A: Change the character encoding in the session options (typically to UTF-8).
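A likely reason for the garbling is a charset mismatch: Hadoop's Text stores strings as UTF-8 bytes, and if the terminal decodes those bytes with a different charset, each Chinese character turns into several wrong symbols. The plain-Java sketch below reproduces the effect. ISO-8859-1 is an assumed stand-in for "the wrong terminal encoding", chosen only because every JRE ships it. MojibakeDemo and decodeAs are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Shows how UTF-8 text becomes garbled when decoded with a mismatched charset. */
public class MojibakeDemo {

    // Encode as UTF-8 (as Text does), then decode with the terminal's assumed charset.
    static String decodeAs(String original, Charset terminalCharset) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, terminalCharset);
    }

    public static void main(String[] args) {
        String line = "床前明月光";
        // Mismatch: each 3-byte character becomes 3 unrelated Latin-1 characters
        System.out.println("decoded as ISO-8859-1: " + decodeAs(line, StandardCharsets.ISO_8859_1));
        // Match: the text round-trips unchanged
        System.out.println("decoded as UTF-8:      " + decodeAs(line, StandardCharsets.UTF_8));
    }
}
```

Only the decoding side changes between the two lines, which is why fixing the terminal's encoding setting, rather than the data, is enough.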
2. Code for reading the SequenceFile:
    package Hdfs;

    import java.io.IOException;
    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Writable;
    import org.apache.hadoop.util.ReflectionUtils;

    public class SequenceFileReader {

        public static void main(String[] args) {
            // Same HDFS path that SequenceFileWriter wrote to
            String uri = "hdfs://neusoft-master:9000/user/root/test/demo1";
            Configuration conf = new Configuration();
            SequenceFile.Reader reader = null;
            try {
                FileSystem fs = FileSystem.get(URI.create(uri), conf);
                Path path = new Path(uri);
                // Deprecated in Hadoop 2.x (in favor of Reader.Option arguments) but still works
                reader = new SequenceFile.Reader(fs, path, conf);
                // The key/value classes are recorded in the file header; instantiate them reflectively
                Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
                Writable value = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);
                // Current byte offset in the file (tracked per record here, but not printed)
                long position = reader.getPosition();
                // next() fills key and value in place and returns false at end of file
                while (reader.next(key, value)) {
                    System.out.printf("[%s]\t%s\n", key, value);
                    position = reader.getPosition();
                }
            } catch (IOException e) {
                e.printStackTrace();
            } finally {
                IOUtils.closeStream(reader);
            }
        }
    }
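The reader never names IntWritable or Text. It learns the key and value classes at run time from the file header and creates empty instances reflectively, which next(key, value) then fills. The plain-Java sketch below shows only that core idea: load a class by name and call its no-argument constructor. Hadoop's ReflectionUtils.newInstance does more, for example caching constructors and passing the Configuration to Configurable objects. JDK classes stand in for the Hadoop types so the example runs without Hadoop jars. ReflectiveInstances and newInstanceByName are hypothetical names.

```java
/** Minimal reflective instantiation, the idea behind ReflectionUtils.newInstance. */
public class ReflectiveInstances {

    // Load a class by its fully qualified name and invoke its no-argument constructor.
    static Object newInstanceByName(String className) {
        try {
            Class<?> cls = Class.forName(className);
            // Requires a no-argument constructor, as Hadoop's Writable classes provide
            return cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot instantiate " + className, e);
        }
    }

    public static void main(String[] args) {
        // Stand-ins for header entries such as org.apache.hadoop.io.IntWritable / Text
        Object key = newInstanceByName("java.lang.StringBuilder");
        Object value = newInstanceByName("java.util.ArrayList");
        System.out.println(key.getClass().getName() + " / " + value.getClass().getName());
    }
}
```

Because the types come from the file rather than the code, the same reader program works for any SequenceFile whose key and value classes are on the classpath.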
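On Windows, package the program as testseq.jar. Set the main class when building the jar so that it does not need to be specified when submitting the jar. Then upload the jar with SecureFX to the target directory on the Linux machine.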
The steps on Linux are as follows:
END::SequenceFile
The HDFS NameNode web UI can be accessed at IP:50070.
Blog: http://www.cnblogs.com/jackchen-Net/