Download the installation package
|
sqoop-1.99.3-bin-hadoop200.tar.gz
|
Extract the archive
|
tar zxvf sqoop-1.99.3-bin-hadoop200.tar.gz
|
Create a sqoop symlink
|
ln -s sqoop-1.99.3-bin-hadoop200 sqoop
|
Modify the Sqoop configuration
|
cd sqoop
vi server/conf/catalina.properties
Make the following change: find the common.loader line and replace /usr/lib/hadoop/lib/*.jar with your own Hadoop jar directories, for example:
/home/hadoop/hadoop/share/hadoop/yarn/lib/*.jar, /home/hadoop/hadoop/share/hadoop/yarn/*.jar, /home/hadoop/hadoop/share/hadoop/hdfs/*.jar, /home/hadoop/hadoop/share/hadoop/hdfs/lib/*.jar, /home/hadoop/hadoop/share/hadoop/mapreduce/*.jar, /home/hadoop/hadoop/share/hadoop/mapreduce/lib/*.jar, /home/hadoop/hadoop/share/hadoop/common/lib/*.jar, /home/hadoop/hadoop/share/hadoop/common/*.jar
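For reference, the edited line might look like the following (a sketch that keeps the stock Tomcat entries in front; your defaults may differ slightly):
common.loader=${catalina.base}/lib,${catalina.base}/lib/*.jar,${catalina.home}/lib,${catalina.home}/lib/*.jar,/home/hadoop/hadoop/share/hadoop/yarn/lib/*.jar,/home/hadoop/hadoop/share/hadoop/yarn/*.jar,/home/hadoop/hadoop/share/hadoop/hdfs/*.jar,/home/hadoop/hadoop/share/hadoop/hdfs/lib/*.jar,/home/hadoop/hadoop/share/hadoop/mapreduce/*.jar,/home/hadoop/hadoop/share/hadoop/mapreduce/lib/*.jar,/home/hadoop/hadoop/share/hadoop/common/lib/*.jar,/home/hadoop/hadoop/share/hadoop/common/*.jar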
vi server/conf/sqoop.properties
Find the mapreduce.configuration.directory line and set its value to your Hadoop configuration directory, e.g. /home/hadoop/hadoop/etc/hadoop/. Then replace the @LOGDIR@ and @BASEDIR@ placeholders, e.g. with the vi substitutions:
:0,$ s/@LOGDIR@/logs/g
:0,$ s/@BASEDIR@/base/g
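After the edit, the directory line in sqoop.properties should look roughly like this (full property name as shipped with Sqoop 1.99.3; adjust the path to your cluster):
org.apache.sqoop.submission.engine.mapreduce.configuration.directory=/home/hadoop/hadoop/etc/hadoop/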
Then copy your database's JDBC driver into the sqoop/lib directory, creating the directory if it does not exist.
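For example, for MySQL (the jar path and version are illustrative; use the driver you actually have):
mkdir -p lib
cp /path/to/mysql-connector-java-5.1.30-bin.jar lib/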
|
Set environment variables
|
vi /etc/profile
Add the following:
export SQOOP_HOME=/home/hadoop/sqoop
export PATH=$PATH:$SQOOP_HOME/bin
export CATALINA_BASE=$SQOOP_HOME/server
export LOGDIR=$SQOOP_HOME/logs/
|
Load the environment variables
|
source /etc/profile
|
Start the server
|
./bin/sqoop.sh server start
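The server runs inside the bundled Tomcat, so a quick sanity check is to look for the Tomcat Bootstrap process:
jps | grep Bootstrap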
|
Test
|
bin/sqoop.sh client
By default Sqoop listens on ports 12000 and 12001.
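To verify the server is reachable, you can also query its REST endpoint (assuming the default port and a server on localhost):
curl http://localhost:12000/sqoop/version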
|
Stop the server
|
./bin/sqoop.sh server stop
|
Configure the client to use your Sqoop server:
sqoop:000> set server --host your.host.com --port 12000 --webapp sqoop
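To confirm the setting took effect, the client can print its current server configuration:
sqoop:000> show server --all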
Show version: show version --all
Show connectors: show connector --all
Create a connection: create connection --cid 1
Creating connection for connector with id 1
Please fill following values to create new connection object
Name: First connection
Connection configuration
JDBC Driver Class: com.mysql.jdbc.Driver
JDBC Connection String: jdbc:mysql://mysql.server/database
Username: sqoop
Password: *****
JDBC Connection Properties:
There are currently 0 values in the map:
entry#
Security related configuration options
Max connections: 0
New connection was successfully created with validation status FINE and persistent id 1
Show connections: show connection
Create a job: create job --xid 1 --type import
sqoop:000> create job --xid 1 --type import
Creating job for connection with id 1
Please fill following values to create new job object
Name: First job
Database configuration
Table name: users
Table SQL statement:
Table column names:
Partition column name:
Boundary query:
Output configuration
Storage type:
0 : HDFS
Choose: 0
Output directory: /user/jarcec/users
Throttling resources
Extractors: 20
Loaders: 10
New job was successfully created with validation status FINE and persistent id 1
Note: during job creation you are prompted for Extractors and Loaders, which correspond to the number of map and reduce tasks respectively.
Start a job: start job --jid 1
Start a job and run it synchronously: start job --jid 1 -s
Show job status: status job --jid 1
Show all jobs: show job -a
Stop a job: stop job --jid 1
Clone a connection: clone connection --xid 1
Clone a job: clone job --jid 1
|
Troubleshooting
|
Running wordcount fails with: Application application_1396260476774_0001 failed 2 times due to AM Container for appattempt_1396260476774_0001_000002 exited with exitCode: 1 due to: Exception from container-launch
Check the container log:
hadoop/logs/userlogs/application_1386683368281_0001/container_1386683368281_0001_01_000001/stderr
After the YARN configuration is fixed, wordcount runs normally but Sqoop still reports Exception from container-launch; in that case, simply restarting the Sqoop server fixes it.
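A restart uses the same script as above:
./bin/sqoop.sh server stop
./bin/sqoop.sh server start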
Exporting data fails with:
is running beyond physical memory limits. Current usage: 1.1 GB of 1 GB physical memory used; 1.6 GB of 6 GB virtual memory used. Killing container.
The container exceeded its memory limits, so the settings below raise the JVM heap, the AM container size, and the virtual-to-physical memory ratio. Modify mapred-site.xml:
<property>
  <name>mapred.map.child.java.opts</name>
  <value>-Xmx8000m</value>
</property>
Modify yarn-site.xml:
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>8</value>
</property>
<property>
  <name>yarn.app.mapreduce.am.resource.mb</name>
  <value>2046</value>
</property>
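The yarn-site.xml changes only take effect after the YARN daemons are restarted, e.g. with the standard Hadoop 2 scripts (paths assume the layout used above):
/home/hadoop/hadoop/sbin/stop-yarn.sh
/home/hadoop/hadoop/sbin/start-yarn.sh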
When importing data with Sqoop, java heap space errors start appearing in the map/reduce phase as the data volume grows. In summary, there are two fixes:
1. Increase the JVM heap size of each child task.
Edit mapred-site.xml and add the following properties:
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx8000m</value>
</property>
<property>
  <name>mapred.reduce.child.java.opts</name>
  <value>-Xmx8000m</value>
</property>
<property>
  <name>mapred.map.child.java.opts</name>
  <value>-Xmx8000m</value>
</property>
2. Increase the number of map tasks:
set the Extractors and Loaders counts in the Sqoop job, as sketched below.
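A sketch of changing these on an existing job from the client shell (update job re-prompts for every value; press Enter to keep one, and enter new numbers in the Throttling resources section):
sqoop:000> update job --jid 1
...
Throttling resources
Extractors: 20
Loaders: 10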