Running a MapReduce Example
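I picked a small example, URLCat, from Hadoop: The Definitive Guide and ran it on a Hadoop cluster. Strictly speaking it does not submit a MapReduce job: it reads a file from HDFS through java.net.URL and prints it to standard output.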
1. Create a Java class and save the source code from the book into it.
[huser@master bin]$ vi URLCat.java
import java.io.InputStream;
import java.net.URL;

import org.apache.hadoop.fs.FsUrlStreamHandlerFactory;
import org.apache.hadoop.io.IOUtils;

public class URLCat {

    static {
        URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
    }

    public static void main(String[] args) throws Exception {
        InputStream in = null;
        try {
            in = new URL(args[0]).openStream();
            IOUtils.copyBytes(in, System.out, 4096, false);
        } finally {
            IOUtils.closeStream(in);
        }
    }
}
"URLCat.java" [New] 23L, 481C written
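URLCat works by registering Hadoop's FsUrlStreamHandlerFactory so that java.net.URL can open hdfs:// URLs. URL.setURLStreamHandlerFactory may be called at most once per JVM (a second call throws an Error), which is why the book does it in a static block. The sketch below shows the same mechanism using only the JDK, so it runs without a cluster. It assumes a made-up mem:// scheme served from a hypothetical in-memory map in place of HDFS, and a hand-written 4096-byte copy loop in place of IOUtils.copyBytes; it illustrates the pattern and is not Hadoop code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class MemUrlCat {

    // Hypothetical in-memory "file system" standing in for HDFS (path -> file content).
    static final Map<String, String> FILES = new HashMap<>();

    static {
        FILES.put("/user/huser/in/test2.txt", "hello hadoop\n");
        // Same pattern as URLCat: install the factory once, in a static initializer.
        // Returning null for other protocols lets the JDK use its built-in handlers.
        URL.setURLStreamHandlerFactory(
                protocol -> "mem".equals(protocol) ? new MemHandler() : null);
    }

    // Handler for the made-up mem:// scheme: serves FILES entries by URL path.
    static class MemHandler extends URLStreamHandler {
        @Override
        protected URLConnection openConnection(URL u) {
            return new URLConnection(u) {
                @Override
                public void connect() {
                    // Nothing to connect to: the data is already in memory.
                }

                @Override
                public InputStream getInputStream() throws IOException {
                    String content = FILES.get(url.getPath());
                    if (content == null) {
                        throw new FileNotFoundException(url.toString());
                    }
                    return new ByteArrayInputStream(content.getBytes(StandardCharsets.UTF_8));
                }
            };
        }
    }

    // Stand-in for IOUtils.copyBytes(in, out, 4096, false): copy with a 4096-byte
    // buffer and leave both streams open.
    static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        out.flush();
    }

    // Reads the whole resource into a String (convenient for checking the result).
    static String cat(String spec) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream in = new URL(spec).openStream()) {
            copy(in, out);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Like URLCat: open the URL given on the command line and stream it to stdout.
        try (InputStream in = new URL(args[0]).openStream()) {
            copy(in, System.out);
        }
    }
}
```

Running java MemUrlCat mem://master/user/huser/in/test2.txt prints "hello hadoop". As in URLCat, the stream is closed by the caller (here with try-with-resources instead of IOUtils.closeStream in a finally block).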
2. Compile the Java class.
[huser@master bin]$ javac URLCat.java
URLCat.java:4: error: package org.apache.hadoop.fs does not exist
import org.apache.hadoop.fs.FsUrlStreamHandlerFactory;
                           ^
URLCat.java:5: error: package org.apache.hadoop.io does not exist
import org.apache.hadoop.io.IOUtils;
                           ^
URLCat.java:10: error: cannot find symbol
        URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
                                           ^
  symbol:   class FsUrlStreamHandlerFactory
  location: class URLCat
URLCat.java:17: error: cannot find symbol
            IOUtils.copyBytes(in, System.out, 4096, false);
            ^
  symbol:   variable IOUtils
  location: class URLCat
URLCat.java:19: error: cannot find symbol
            IOUtils.closeStream(in);
            ^
  symbol:   variable IOUtils
  location: class URLCat
5 errors
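These errors mean the compiler cannot find the Hadoop classes that the source imports. Pass the Hadoop core JAR to javac with -classpath: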
[huser@master bin]$ javac -classpath ../hadoop-core-1.2.1.jar URLCat.java
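-classpath only tells javac where to find classes at compile time. The JVM that later runs URLCat needs the Hadoop classes too; the bin/hadoop launcher script takes care of that by putting the Hadoop JARs on the runtime classpath. If a class still appears to be missing, a small check such as the one below reports whether a class name is visible to the running JVM. ClasspathCheck and isOnClasspath are hypothetical names, not part of Hadoop.

```java
public class ClasspathCheck {

    // Hypothetical helper: true if the named class can be loaded by this class's loader.
    // initialize=false only locates and loads the class; its static initializers do not run.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Default to the two Hadoop classes URLCat imports; other class names can be passed instead.
        String[] names = args.length > 0 ? args : new String[] {
                "org.apache.hadoop.fs.FsUrlStreamHandlerFactory",
                "org.apache.hadoop.io.IOUtils"
        };
        for (String name : names) {
            System.out.println(name + (isOnClasspath(name) ? ": found" : ": NOT on the classpath"));
        }
    }
}
```

For example, java ClasspathCheck run from the bin directory should report both Hadoop classes as missing, while java -cp .:../hadoop-core-1.2.1.jar ClasspathCheck should find them, assuming the same JAR path as the javac command.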
[huser@master bin]$ ll
total 152
-rwxr-xr-x 1 huser huser 15147 Jul 23  2013 hadoop
-rwxr-xr-x 1 huser huser  2643 Jul 23  2013 hadoop-config.sh
-rwxr-xr-x 1 huser huser  5064 Jul 23  2013 hadoop-daemon.sh
-rwxr-xr-x 1 huser huser  1329 Jul 23  2013 hadoop-daemons.sh
-rwxr-xr-x 1 huser huser  2810 Jul 23  2013 rcc
-rwxr-xr-x 1 huser huser  2050 Jul 23  2013 slaves.sh
-rwxr-xr-x 1 huser huser  1166 Jul 23  2013 start-all.sh
-rwxr-xr-x 1 huser huser  1065 Jul 23  2013 start-balancer.sh
-rwxr-xr-x 1 huser huser  1745 Jul 23  2013 start-dfs.sh
-rwxr-xr-x 1 huser huser  1145 Jul 23  2013 start-jobhistoryserver.sh
-rwxr-xr-x 1 huser huser  1259 Jul 23  2013 start-mapred.sh
-rwxr-xr-x 1 huser huser  1119 Jul 23  2013 stop-all.sh
-rwxr-xr-x 1 huser huser  1116 Jul 23  2013 stop-balancer.sh
-rwxr-xr-x 1 huser huser  1246 Jul 23  2013 stop-dfs.sh
-rwxr-xr-x 1 huser huser  1131 Jul 23  2013 stop-jobhistoryserver.sh
-rwxr-xr-x 1 huser huser  1168 Jul 23  2013 stop-mapred.sh
-rwxr-xr-x 1 huser huser 63598 Jul 23  2013 task-controller
-rw-rw-r-- 1 huser huser  1021 Apr 17 23:09 URLCat.class
-rw-rw-r-- 1 huser huser   481 Apr 17 23:04 URLCat.java
Compilation succeeded and produced URLCat.class.
3. Run the program.
[huser@master bin]$ ../bin/hadoop URLCat hdfs://master/user/huser/in/test2.txt
Warning: $HADOOP_HOME is deprecated.
14/04/17 23:34:37 INFO ipc.Client: Retrying connect to server: master/192.168.1.115:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/04/17 23:34:38 INFO ipc.Client: Retrying connect to server: master/192.168.1.115:8020. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/04/17 23:34:39 INFO ipc.Client: Retrying connect to server: master/192.168.1.115:8020. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/04/17 23:34:40 INFO ipc.Client: Retrying connect to server: master/192.168.1.115:8020. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/04/17 23:34:41 INFO ipc.Client: Retrying connect to server: master/192.168.1.115:8020. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/04/17 23:34:42 INFO ipc.Client: Retrying connect to server: master/192.168.1.115:8020. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/04/17 23:34:43 INFO ipc.Client: Retrying connect to server: master/192.168.1.115:8020. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/04/17 23:34:44 INFO ipc.Client: Retrying connect to server: master/192.168.1.115:8020. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/04/17 23:34:45 INFO ipc.Client: Retrying connect to server: master/192.168.1.115:8020. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/04/17 23:34:46 INFO ipc.Client: Retrying connect to server: master/192.168.1.115:8020. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
Exception in thread "main" java.net.ConnectException: Call to master/192.168.1.115:8020 failed on connection exception: java.net.ConnectException: Connection refused
	at org.apache.hadoop.ipc.Client.wrapException(Client.java:1142)
	at org.apache.hadoop.ipc.Client.call(Client.java:1118)
	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
	at com.sun.proxy.$Proxy1.getProtocolVersion(Unknown Source)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:85)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:62)
	at com.sun.proxy.$Proxy1.getProtocolVersion(Unknown Source)
	at org.apache.hadoop.ipc.RPC.checkVersion(RPC.java:422)
	at org.apache.hadoop.hdfs.DFSClient.createNamenode(DFSClient.java:183)
	at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:281)
	at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:245)
	at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:100)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1446)
	at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:67)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1464)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:263)
	at org.apache.hadoop.fs.FsUrlConnection.connect(FsUrlConnection.java:45)
	at org.apache.hadoop.fs.FsUrlConnection.getInputStream(FsUrlConnection.java:56)
	at java.net.URL.openStream(URL.java:1037)
	at URLCat.main(URLCat.java:16)
Caused by: java.net.ConnectException: Connection refused
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
	at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:511)
	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:481)
	at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:457)
	at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:583)
	at org.apache.hadoop.ipc.Client$Connection.access$2200(Client.java:205)
	at org.apache.hadoop.ipc.Client.getConnection(Client.java:1249)
	at org.apache.hadoop.ipc.Client.call(Client.java:1093)
	... 22 more
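The ten "Retrying connect" lines come from the client retrying under the policy printed in the log, RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS), before it gives up and throws the ConnectException. The sketch below is a generic bounded retry loop with a fixed sleep that takes the same two parameters. It mirrors the behaviour visible in the log (retry counts 0 to 9, then failure) and is not Hadoop's own RetryPolicies implementation; callWithRetries is a hypothetical name.

```java
import java.net.ConnectException;
import java.util.concurrent.Callable;

public class FixedSleepRetry {

    // Hypothetical helper: run task; on failure retry up to maxRetries times,
    // sleeping a fixed sleepMillis before each retry. A sketch of the behaviour
    // visible in the log, not Hadoop's RetryPolicies implementation.
    static <T> T callWithRetries(Callable<T> task, int maxRetries, long sleepMillis) throws Exception {
        int retries = 0;
        while (true) {
            try {
                return task.call();
            } catch (Exception e) {
                if (retries >= maxRetries) {
                    throw e; // retries exhausted: the last failure reaches the caller
                }
                System.err.println("Retrying. Already tried " + retries + " time(s); maxRetries=" + maxRetries);
                Thread.sleep(sleepMillis);
                retries++;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] attempts = {0};
        try {
            // Simulate a server that refuses every connection, using the logged parameters.
            callWithRetries(() -> {
                attempts[0]++;
                throw new ConnectException("Connection refused");
            }, 10, 1000);
        } catch (ConnectException e) {
            System.out.println("Gave up after " + attempts[0] + " attempts: " + e.getMessage());
        }
    }
}
```

In this sketch, maxRetries=10 means the task runs 11 times in total (one initial attempt plus ten retries) and main ends with "Gave up after 11 attempts: Connection refused". With a one-second sleep that is consistent with the roughly ten seconds between the first and last "Retrying" lines in the log above.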
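The client could not connect to the NameNode at master/192.168.1.115:8020 (Connection refused), so check the HDFS configuration: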
[huser@master conf]$ cat core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
  </property>
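fs.default.name puts the NameNode on port 9000, not on the default port 8020 that the client tried when the URL did not specify a port. Include the port in the URL: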
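The log of the failed run shows what happened: hdfs://master/user/huser/in/test2.txt names a host but no port, so the client connected to port 8020, while core-site.xml puts the NameNode on 9000. The sketch below reproduces that port choice with java.net.URI. DEFAULT_NAMENODE_PORT and effectivePort are hypothetical names, and 8020 is taken from the log rather than read from Hadoop.

```java
import java.net.URI;

public class NameNodeAddress {

    // Taken from the failed run's log (master/192.168.1.115:8020), not read from Hadoop.
    static final int DEFAULT_NAMENODE_PORT = 8020;

    // Hypothetical helper: the URI's explicit port, or defaultPort when the URI has none.
    static int effectivePort(URI uri, int defaultPort) {
        return uri.getPort() == -1 ? defaultPort : uri.getPort();
    }

    public static void main(String[] args) {
        // fs.default.name from core-site.xml: the NameNode listens on 9000.
        int nameNodePort = URI.create("hdfs://master:9000").getPort();
        String[] urls = {
                "hdfs://master/user/huser/in/test2.txt",      // first attempt: no port
                "hdfs://master:9000/user/huser/in/test2.txt"  // second attempt: explicit port
        };
        for (String url : urls) {
            int port = effectivePort(URI.create(url), DEFAULT_NAMENODE_PORT);
            System.out.println(url + " -> port " + port
                    + (port == nameNodePort ? " (matches NameNode)" : " (NameNode is on " + nameNodePort + ")"));
        }
    }
}
```

It prints "hdfs://master/user/huser/in/test2.txt -> port 8020 (NameNode is on 9000)" for the first URL and "hdfs://master:9000/user/huser/in/test2.txt -> port 9000 (matches NameNode)" for the second, which is the difference between the failed and the successful run.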
[huser@master bin]$ ../bin/hadoop URLCat hdfs://master:9000/user/huser/in/test2.txt
Warning: $HADOOP_HOME is deprecated.
hello hadoop
The program ran successfully and printed the contents of test2.txt.