Hadoop Shell 介绍
以 hadoop 2.7.3 为例
bin 目录下是最基础的集群管理脚本, 用户可通过该脚本完成各种功能, 如 HDFS 管理, MapReduce 作业管理等.
作为入门, 先介绍bin 目录下的 hadoop 脚本的使用方法, 如下所示: 参考 官网的 Hadoop 命令参考
Usage: hadoop [--config confdir] [COMMAND | CLASSNAME]
CLASSNAME run the class named CLASSNAME
or
where COMMAND is one of:
fs run a generic filesystem user client
version print the version
jar <jar> run a jar file
note: please use "yarn jar" to launch
YARN applications, not this command.
checknative [-a|-h] check native hadoop and compression libraries availability
distcp <srcurl> <desturl> copy file or directories recursively
archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
classpath prints the class path needed to get the
credential interact with credential providers
Hadoop jar and the required libraries
daemonlog get/set the log level for each daemon
trace view and modify Hadoop tracing settings
Most commands print help when invoked w/o parameters.
hadoop 对应在 hadoop-2.7.3/bin/hadoop , 相关 shell 代码如下: ( fs 对应 org.apache.hadoop.fs.FsShell , jar 对应 org.apache.hadoop.util.RunJar )
// 这段在 hadoop-2.7.3/bin/hadoop # the core commands ...... if [ "$COMMAND" = "fs" ] ; then CLASS=org.apache.hadoop.fs.FsShell elif [ "$COMMAND" = "version" ] ; then CLASS=org.apache.hadoop.util.VersionInfo elif [ "$COMMAND" = "jar" ] ; then CLASS=org.apache.hadoop.util.RunJar if [[ -n "${YARN_OPTS}" ]] || [[ -n "${YARN_CLIENT_OPTS}" ]]; then echo "WARNING: Use \"yarn jar\" to launch YARN applications." 1>&2 fi ......
bin 目录下的 hdfs 脚本的使用方法, 如下所示: 参考 官网的 HDFS 命令参考
Usage: hdfs [--config confdir] [--loglevel loglevel] COMMAND
where COMMAND is one of:
dfs run a filesystem command on the file systems supported in Hadoop.
classpath prints the classpath
namenode -format format the DFS filesystem
secondarynamenode run the DFS secondary namenode
namenode run the DFS namenode
journalnode run the DFS journalnode
zkfc run the ZK Failover Controller daemon
datanode run a DFS datanode
dfsadmin run a DFS admin client
haadmin run a DFS HA admin client
fsck run a DFS filesystem checking utility
balancer run a cluster balancing utility
jmxget get JMX exported values from NameNode or DataNode.
mover run a utility to move block replicas across
storage types
oiv apply the offline fsimage viewer to an fsimage
oiv_legacy apply the offline fsimage viewer to an legacy fsimage
oev apply the offline edits viewer to an edits file
fetchdt fetch a delegation token from the NameNode
getconf get config values from configuration
groups get the groups which users belong to
snapshotDiff diff two snapshots of a directory or diff the
current directory contents with a snapshot
lsSnapshottableDir list all snapshottable dirs owned by the current user
Use -help to see options
portmap run a portmap service
nfs3 run an NFS version 3 gateway
cacheadmin configure the HDFS cache
crypto configure HDFS encryption zones
storagepolicies list/get/set block storage policies
version print the version
Most commands print help when invoked w/o parameters.
bin 目录下的 mapred 脚本的使用方法, 如下所示: 参考 官网的 MapReduce 命令参考
Usage: mapred [--config confdir] [--loglevel loglevel] COMMAND
where COMMAND is one of:
pipes run a Pipes job
job manipulate MapReduce jobs
queue get information regarding JobQueues
classpath prints the class path needed for running
mapreduce subcommands
historyserver run job history servers as a standalone daemon
distcp <srcurl> <desturl> copy file or directories recursively
archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
hsadmin job history server admin interface
Most commands print help when invoked w/o parameters.
bin 目录下的 yarn 脚本的使用方法, 如下所示: 参考 官网的 YARN 命令
Usage: yarn [--config confdir] [COMMAND | CLASSNAME]
CLASSNAME run the class named CLASSNAME
or
where COMMAND is one of:
resourcemanager -format-state-store deletes the RMStateStore
resourcemanager run the ResourceManager
nodemanager run a nodemanager on each slave
timelineserver run the timeline server
rmadmin admin tools
sharedcachemanager run the SharedCacheManager daemon
scmadmin SharedCacheManager admin tools
version print the version
jar <jar> run a jar file
application prints application(s)
report/kill application
applicationattempt prints applicationattempt(s)
report
container prints container(s) report
node prints node report(s)
queue prints queue information
logs dump container logs
classpath prints the class path needed to
get the Hadoop jar and the
required libraries
cluster prints cluster information
daemonlog get/set the log level for each
daemon
Most commands print help when invoked w/o parameters.
bin 目录下的 rcc 脚本的使用方法, 如下所示:
Usage: rcc --language [java|c++] ddl-files
其中, --config 用于设置Hadoop 配置文件目录. 默认目录为 ${HADOOP_HOME}/etc/hadoop. 而 COMMAND 是具体的某个命令, 常用的是 hadoop 的管理命令 fs, 作业提交命令 jar 等. CLASSNAME 指运行名为 CLASSNAME 的类 .