
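Installing and Configuring Sqoop

I recently needed to export data from MySQL to HDFS, which led me to sqoop2. Compared with sqoop1, the advantage of sqoop2 is that a program can connect directly to the sqoop server on the cluster and operate it remotely. The workflow starts with creating links, which you can think of as the objects being operated on: for example, one link for HDFS and one for MySQL. Once the links exist, you create a job, specifying the two links that interact and which one is the from side and which is the to side. Then you run the job.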

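Installation:

Installation turned out to be a real headache. Problems kept piling up, and it took me a whole evening to get it barely working. Below I record the pitfalls I hit along the way.

First, the version I installed is 1.99.7, the latest one: download link

The official documentation: Apache Sqoop2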

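1. Installing Hadoop

For the detailed Hadoop installation steps, see section 5 onward of this blog post: https://www.cnblogs.com/bjwu/p/9863634.html

Note ⚠️: when configuring core-site.xml, you need to add the following two properties: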

<property>
  <name>hadoop.proxyuser.sqoop2.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.sqoop2.groups</name>
  <value>*</value>
</property>

Also, remember to add the following to the configuration file container-executor.cfg:

allowed.system.users=sqoop2
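
These two edits are easy to forget, so a quick check before moving on can save a restart cycle later. The following is a minimal sketch, not part of the official procedure: the helper names has_property and allows_user are my own, and the fallback path /usr/lib/hadoop is only an assumption about where Hadoop might live.

```shell
#!/bin/sh
# Hypothetical helpers; the caller passes in the config file paths.

# Succeeds if the Hadoop XML config file $1 declares a property named $2.
has_property() {
  grep -qF "<name>$2</name>" "$1"
}

# Succeeds if container-executor.cfg ($1) lists user $2 in allowed.system.users.
allows_user() {
  sed -n 's/^allowed\.system\.users=//p' "$1" | tr ',' '\n' | grep -qxF "$2"
}

# Assumed location of the Hadoop config directory; adjust to your install.
conf_dir="${HADOOP_CONF_DIR:-${HADOOP_HOME:-/usr/lib/hadoop}/etc/hadoop}"

for prop in hadoop.proxyuser.sqoop2.hosts hadoop.proxyuser.sqoop2.groups; do
  if [ -f "$conf_dir/core-site.xml" ] && has_property "$conf_dir/core-site.xml" "$prop"; then
    echo "ok: $prop"
  else
    echo "missing: $prop"
  fi
done

if [ -f "$conf_dir/container-executor.cfg" ] && allows_user "$conf_dir/container-executor.cfg" sqoop2; then
  echo "ok: sqoop2 is an allowed system user"
else
  echo "missing: allowed.system.users does not list sqoop2"
fi
```

The check only looks at the property names, not at the values, so it will not notice a value other than *.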

2. Third-party jars

As for third-party jars, my project only needs mysql-connector-java. Download it here, extract the archive to get the jar file, and run the following commands:

# Create directory for extra jars
mkdir -p /var/lib/sqoop2/

# Copy all your JDBC drivers to this directory
cp mysql-connector-java-*.jar /var/lib/sqoop2/

3. Environment variables

Add the following environment variables to .bash_profile:

export SQOOP_HOME=/usr/lib/sqoop 
export SQOOP_SERVER_EXTRA_LIB=/var/lib/sqoop2/
export PATH=$PATH:$SQOOP_HOME/bin
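
After reloading the profile, it may be worth confirming that the extra-lib directory really contains the driver, since a missing jar only shows up much later as a class-loading error. A small sketch, assuming the driver jar name starts with mysql-connector (the helper name has_mysql_driver is hypothetical):

```shell
#!/bin/sh
# Succeeds if directory $1 holds at least one MySQL JDBC driver jar.
# Assumption: the jar name starts with "mysql-connector".
has_mysql_driver() {
  for jar in "$1"/mysql-connector*.jar; do
    if [ -f "$jar" ]; then
      return 0
    fi
  done
  return 1
}

# Falls back to the directory used above when the variable is not set.
extra_lib="${SQOOP_SERVER_EXTRA_LIB:-/var/lib/sqoop2/}"

if has_mysql_driver "$extra_lib"; then
  echo "MySQL driver found in $extra_lib"
else
  echo "no MySQL driver jar in $extra_lib"
fi
```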

4. Configuring the server

This is where the trouble starts. The official site says:

Second configuration file called sqoop.properties contains remaining configuration properties that can affect Sqoop server. The configuration file is very well documented, so check if all configuration properties fits your environment. Default or very little tweaking should be sufficient in most common cases.

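However, the default configuration alone really does not work.

Open sqoop.properties, change the first line below to your own directory, and add the other three lines. The official documentation only mentions the first item, the path to the MapReduce configuration files. With just that, I later got an authentication exception at run time. The security section of the sqoop documentation says that sqoop2 supports Hadoop's two authentication mechanisms, simple and kerberos, so I configured simple authentication, and only then did the exception go away.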

org.apache.sqoop.submission.engine.mapreduce.configuration.directory=$HADOOP_HOME/etc/hadoop

org.apache.sqoop.security.authentication.type=SIMPLE  
org.apache.sqoop.security.authentication.handler=org.apache.sqoop.security.authentication.SimpleAuthenticationHandler  
org.apache.sqoop.security.authentication.anonymous=true  
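
If you would rather script these changes than edit the file by hand, something like the following could work. It is only a sketch: set_prop is a helper I made up, it only understands plain key=value lines, and the demo runs against a temporary sample file rather than your real sqoop.properties. It writes the expanded directory path instead of the literal $HADOOP_HOME, on the assumption that an absolute path is the safer choice in a properties file; /usr/lib/hadoop is just a placeholder fallback.

```shell
#!/bin/sh
# set_prop FILE KEY VALUE
# Replaces an existing "KEY=..." line or appends "KEY=VALUE" if none exists.
# Only handles plain "key=value" lines without spaces around "=".
set_prop() {
  tmp_file=$(mktemp) || return 1
  awk -v k="$2" -v v="$3" '
    index($0, k "=") == 1 { print k "=" v; found = 1; next }
    { print }
    END { if (!found) print k "=" v }
  ' "$1" > "$tmp_file" && cat "$tmp_file" > "$1"
  status=$?
  rm -f "$tmp_file"
  return $status
}

# Demo on a throwaway file with one sample line.
props=$(mktemp)
printf '%s\n' 'org.apache.sqoop.submission.engine.mapreduce.configuration.directory=/etc/hadoop/conf/' > "$props"

set_prop "$props" org.apache.sqoop.submission.engine.mapreduce.configuration.directory \
  "${HADOOP_HOME:-/usr/lib/hadoop}/etc/hadoop"
set_prop "$props" org.apache.sqoop.security.authentication.type SIMPLE
set_prop "$props" org.apache.sqoop.security.authentication.handler \
  org.apache.sqoop.security.authentication.SimpleAuthenticationHandler
set_prop "$props" org.apache.sqoop.security.authentication.anonymous true

cat "$props"
```

To apply it for real, point set_prop at the sqoop.properties file in your conf directory, after making a backup copy.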

Of course, you may run into a number of problems along the way, for example:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration

You can try the following:

cp -R $HADOOP_HOME/share/hadoop/common/* $SQOOP_HOME/server/lib/
cp -R $HADOOP_HOME/share/hadoop/common/lib/* $SQOOP_HOME/server/lib/
cp -R $HADOOP_HOME/share/hadoop/hdfs/* $SQOOP_HOME/server/lib/
cp -R $HADOOP_HOME/share/hadoop/hdfs/lib/* $SQOOP_HOME/server/lib/
cp -R $HADOOP_HOME/share/hadoop/mapreduce/* $SQOOP_HOME/server/lib/
cp -R $HADOOP_HOME/share/hadoop/mapreduce/lib/* $SQOOP_HOME/server/lib/
cp -R $HADOOP_HOME/share/hadoop/yarn/* $SQOOP_HOME/server/lib/
cp -R $HADOOP_HOME/share/hadoop/yarn/lib/* $SQOOP_HOME/server/lib/
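
The commands above copy everything under those directories, including subdirectories that are not jars. A slightly narrower variant, which I have not verified against every Hadoop layout, copies only the .jar files from the same eight locations. The function name copy_hadoop_jars is hypothetical:

```shell
#!/bin/sh
# copy_hadoop_jars SRC DEST
# Copies *.jar from SRC/{common,hdfs,mapreduce,yarn} and their lib/ subdirectories
# into DEST. Other files and other subdirectories are left alone.
copy_hadoop_jars() {
  src=$1
  dest=$2
  mkdir -p "$dest" || return 1
  for component in common hdfs mapreduce yarn; do
    for jar in "$src/$component"/*.jar "$src/$component"/lib/*.jar; do
      # An unmatched glob stays literal, so test that the file exists.
      if [ -f "$jar" ]; then
        cp "$jar" "$dest"/ || return 1
      fi
    done
  done
}

# Only run against a real install when both directories exist.
if [ -d "${HADOOP_HOME:-}/share/hadoop" ] && [ -d "${SQOOP_HOME:-}/server/lib" ]; then
  copy_hadoop_jars "$HADOOP_HOME/share/hadoop" "$SQOOP_HOME/server/lib"
else
  echo "HADOOP_HOME or SQOOP_HOME is not set up; nothing copied" >&2
fi
```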

5. Starting up

Once configuration is done, the repository has to be initialized before the first start:

$ sqoop2-tool upgrade
Setting conf dir: /usr/lib/sqoop/bin/../conf
Sqoop home directory: /usr/lib/sqoop
Sqoop tool executor:
	Version: 1.99.7
	Revision: 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb
	Compiled on Tue Jul 19 16:08:27 PDT 2016 by abefine
Running tool: class org.apache.sqoop.tools.tool.UpgradeTool
2019-01-10 22:31:06,509 INFO  [main] core.PropertiesConfigurationProvider (PropertiesConfigurationProvider.java:initialize(99)) - Starting config file poller thread
Tool class org.apache.sqoop.tools.tool.UpgradeTool has finished correctly.

Nice! After that, you can check whether everything is configured correctly:

$ sqoop2-tool verify 
Setting conf dir: /usr/lib/sqoop/bin/../conf
Sqoop home directory: /usr/lib/sqoop
Sqoop tool executor:
	Version: 1.99.7
	Revision: 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb
	Compiled on Tue Jul 19 16:08:27 PDT 2016 by abefine
Running tool: class org.apache.sqoop.tools.tool.VerifyTool
2019-01-10 22:31:42,317 INFO  [main] core.SqoopServer (SqoopServer.java:initialize(55)) - Initializing Sqoop server.
2019-01-10 22:31:42,326 INFO  [main] core.PropertiesConfigurationProvider (PropertiesConfigurationProvider.java:initialize(99)) - Starting config file poller thread
Verification was successful.
Tool class org.apache.sqoop.tools.tool.VerifyTool has finished correctly.
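
In a setup script you may not want to read this output by eye. One possible approach, sketched below, is to look for the "Verification was successful." line shown above. The helper name verification_passed is my own, and matching on that exact message is an assumption based on the 1.99.7 output in this post.

```shell
#!/bin/sh
# Reads tool output on stdin; succeeds only if it reports a successful verification.
verification_passed() {
  grep -qF 'Verification was successful.'
}

if command -v sqoop2-tool >/dev/null 2>&1; then
  if sqoop2-tool verify 2>&1 | verification_passed; then
    echo "Sqoop2 configuration verified"
  else
    echo "Sqoop2 verification failed; check the output of sqoop2-tool verify" >&2
  fi
else
  echo "sqoop2-tool not found on PATH; is \$SQOOP_HOME/bin exported?" >&2
fi
```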

Start the server:

$ sqoop2-server start  
Setting conf dir: /usr/lib/sqoop/bin/../conf
Sqoop home directory: /usr/lib/sqoop
Starting the Sqoop2 server...
2019-01-10 22:37:22,806 INFO  [main] core.SqoopServer (SqoopServer.java:initialize(55)) - Initializing Sqoop server.
2019-01-10 22:37:22,816 INFO  [main] core.PropertiesConfigurationProvider (PropertiesConfigurationProvider.java:initialize(99)) - Starting config file poller thread
Sqoop2 server started.
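
"Sqoop2 server started." does not by itself prove that the server is reachable. As far as I know, sqoop2 exposes a REST resource for its version, so a simple probe is possible. The sketch below assumes the default port 12000 and the sqoop web context, and the function names are hypothetical. With the anonymous simple authentication configured above, no credentials should be needed.

```shell
#!/bin/sh
# Builds the REST URL of the version resource.
# $1 = host (default localhost), $2 = port (default 12000, an assumption)
sqoop_version_url() {
  printf 'http://%s:%s/sqoop/version\n' "${1:-localhost}" "${2:-12000}"
}

# Succeeds if the server answers the version request within 5 seconds.
sqoop_server_up() {
  curl --silent --fail --max-time 5 "$(sqoop_version_url "$@")" >/dev/null
}

echo "Probe URL: $(sqoop_version_url)"
```

Calling sqoop_server_up (optionally with a host and port) right after sqoop2-server start then returns success only when the server actually responds.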

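6. A change of plan

Well, after all that, I switched to sqoop1 anyway. Operating sqoop2, and getting it to run with no bugs at all, is a bit complicated, and the learning cost is on the high side.

There are plenty of sqoop1 installation tutorials online. Just one point: when running a sqoop1 program, quite a few Maven dependencies have to be imported.

In any case, because of various Exceptions I ended up adding all of the following libraries 😢: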

<dependency>
	<groupId>org.apache.sqoop</groupId>
	<artifactId>sqoop</artifactId>
	<version>1.4.7</version>
	<scope>system</scope>
	<systemPath>${basedir}/lib/sqoop-1.4.7.jar</systemPath>
</dependency>
<dependency>
	<groupId>org.apache.hadoop</groupId>
	<artifactId>hadoop-common</artifactId>
	<version>2.6.5</version>
</dependency>
<dependency>
	<groupId>org.apache.hadoop</groupId>
	<artifactId>hadoop-mapreduce-client-core</artifactId>
	<version>2.6.5</version>
</dependency>
<dependency>
	<groupId>org.apache.hadoop</groupId>
	<artifactId>hadoop-mapreduce-client-jobclient</artifactId>
	<version>2.6.5</version>
</dependency>
<dependency>
	<groupId>org.apache.avro</groupId>
	<artifactId>avro</artifactId>
	<version>1.8.2</version>
</dependency>
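
For comparison, the MySQL-to-HDFS import this post set out to do can also be run from sqoop1's command line, without any Maven setup. The sketch below only prints the command for review and does not execute it. The JDBC URL, user, table and target directory are made-up placeholders, and sqoop_import_cmd is a hypothetical helper.

```shell
#!/bin/sh
# Prints a Sqoop 1 import command for review; it does not run it.
# $1 = JDBC URL, $2 = database user, $3 = table, $4 = HDFS target directory
# -P prompts for the password; -m 1 uses a single map task.
sqoop_import_cmd() {
  printf 'sqoop import --connect %s --username %s -P --table %s --target-dir %s -m 1\n' \
    "$1" "$2" "$3" "$4"
}

# Placeholder values for illustration only.
sqoop_import_cmd 'jdbc:mysql://db.example.com:3306/shop' 'etl_user' 'orders' '/user/etl/orders'
```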

Reference:

  1. https://stackoverflow.com/questions/41405072/sqoop-integration-with-hadoop-throw-classnotfoundexception
  2. https://sqoop.apache.org/docs/1.99.7/admin/Installation.html
  3. http://brianoneill.blogspot.com/2014/10/sqoop-1993-w-hadoop-2-installation.html
  4. https://www.yiibai.com/sqoop/sqoop_installation.html
posted @ 2019-01-11 13:34  Byron_NG