Sqoop Installation and Usage
I. Cluster environment:
Hostname | IP | Hadoop version | Hadoop role | Services | OS
node1 | 192.168.1.151 | 0.20.0 | namenode | hive+sqoop | RHEL 5.4 x86
node2 | 192.168.1.152 | 0.20.0 | datanode | mysql | RHEL 5.4 x86
node3 | 192.168.1.153 | 0.20.0 | datanode | | RHEL 5.4 x86
II. Installing Sqoop
1. Download the Sqoop archives and extract them
The archives are sqoop-1.2.0-CDH3B4.tar.gz and hadoop-0.20.2-CDH3B4.tar.gz, plus the MySQL JDBC driver mysql-connector-java-5.1.10-bin.jar.
[root@node1 ~]# ll
drwxr-xr-x 15 root root   4096 Feb 22  2011 hadoop-0.20.2-CDH3B4
-rw-r--r--  1 root root 724225 Sep 15 06:46 mysql-connector-java-5.1.10-bin.jar
drwxr-xr-x 11 root root   4096 Feb 22  2011 sqoop-1.2.0-CDH3B4
2. Copy the MySQL JDBC driver and hadoop-core-0.20.2-CDH3B4.jar (found under hadoop-0.20.2-CDH3B4) into sqoop-1.2.0-CDH3B4/lib, change the owner to hadoop, and move sqoop-1.2.0-CDH3B4 to /home/hadoop.
[root@node1 ~]# cp mysql-connector-java-5.1.10-bin.jar sqoop-1.2.0-CDH3B4/lib
[root@node1 ~]# cp hadoop-0.20.2-CDH3B4/hadoop-core-0.20.2-CDH3B4.jar sqoop-1.2.0-CDH3B4/lib
[root@node1 ~]# chown -R hadoop:hadoop sqoop-1.2.0-CDH3B4
[root@node1 ~]# mv sqoop-1.2.0-CDH3B4 /home/hadoop
[root@node1 ~]# ll /home/hadoop
total 35748
-rw-rw-r--  1 hadoop hadoop      343 Sep 15 05:13 derby.log
drwxr-xr-x 13 hadoop hadoop     4096 Sep 14 16:16 hadoop-0.20.2
drwxr-xr-x  9 hadoop hadoop     4096 Sep 14 20:21 hive-0.10.0
-rw-r--r--  1 hadoop hadoop 36524032 Sep 14 20:20 hive-0.10.0.tar.gz
drwxr-xr-x  8 hadoop hadoop     4096 Sep 25  2012 jdk1.7
drwxr-xr-x 12 hadoop hadoop     4096 Sep 15 00:25 mahout-distribution-0.7
drwxrwxr-x  5 hadoop hadoop     4096 Sep 15 05:13 metastore_db
-rw-rw-r--  1 hadoop hadoop      406 Sep 14 16:02 scp.sh
drwxr-xr-x 11 hadoop hadoop     4096 Feb 22  2011 sqoop-1.2.0-CDH3B4
drwxrwxr-x  3 hadoop hadoop     4096 Sep 14 16:17 temp
drwxrwxr-x  3 hadoop hadoop     4096 Sep 14 15:59 user
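Before moving on, it can help to confirm that both jars actually landed in Sqoop's lib directory. The following is a minimal sketch, not part of the original setup: the function name check_sqoop_libs is hypothetical, and it assumes the two jar file names used in this walkthrough.

```shell
# Hypothetical helper: report whether the two jars copied in this step
# are present under <sqoop-dir>/lib. Exits non-zero if either is missing.
# Usage: check_sqoop_libs /home/hadoop/sqoop-1.2.0-CDH3B4
check_sqoop_libs() (
    lib_dir="$1/lib"
    status=0
    for jar in mysql-connector-java-5.1.10-bin.jar hadoop-core-0.20.2-CDH3B4.jar; do
        if [ -f "$lib_dir/$jar" ]; then
            echo "found:   $jar"
        else
            echo "missing: $jar"
            status=1
        fi
    done
    exit "$status"
)
```

The function body runs in a subshell, so its variables and its exit do not affect the calling shell.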
3. Edit configure-sqoop and comment out the checks for HBase and ZooKeeper, since neither is installed on this cluster
[root@node1 bin]# pwd
/home/hadoop/sqoop-1.2.0-CDH3B4/bin
[root@node1 bin]# vi configure-sqoop
#!/bin/bash
#
# Licensed to Cloudera, Inc. under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
. . .
# Check: If we can't find our dependencies, give up here.
if [ ! -d "${HADOOP_HOME}" ]; then
  echo "Error: $HADOOP_HOME does not exist!"
  echo 'Please set $HADOOP_HOME to the root of your Hadoop installation.'
  exit 1
fi
#if [ ! -d "${HBASE_HOME}" ]; then
#  echo "Error: $HBASE_HOME does not exist!"
#  echo 'Please set $HBASE_HOME to the root of your HBase installation.'
#  exit 1
#fi
#if [ ! -d "${ZOOKEEPER_HOME}" ]; then
#  echo "Error: $ZOOKEEPER_HOME does not exist!"
#  echo 'Please set $ZOOKEEPER_HOME to the root of your ZooKeeper installation.'
#  exit 1
#fi
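The same edit can be made without opening an editor. This is a rough sketch under one assumption: each check starts with "if" at the beginning of a line and ends with "fi" at the beginning of a line, as in the excerpt shown here. The function name comment_out_checks is hypothetical. It prints the modified script to standard output and leaves the input file untouched.

```shell
# Hypothetical helper: prefix every line of the HBASE_HOME and
# ZOOKEEPER_HOME check blocks with '#', printing the result to stdout.
# Assumes each block opens with a line starting with "if" and closes
# with a line starting with "fi".
# Usage: comment_out_checks configure-sqoop > configure-sqoop.new
comment_out_checks() {
    sed -e '/^if .*HBASE_HOME/,/^fi/ s/^/#/' \
        -e '/^if .*ZOOKEEPER_HOME/,/^fi/ s/^/#/' \
        "$1"
}
```

Writing to a new file first makes it easy to diff the result against the original before replacing it. Running the function on an already-edited file changes nothing, because commented lines no longer start with "if".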
4. Edit /etc/profile and .bash_profile to add HADOOP_HOME and adjust PATH
[hadoop@node1 ~]$ vi .bash_profile
# .bash_profile
# Get the aliases and functions
if [ -f ~/.bashrc ]; then
  . ~/.bashrc
fi
# User specific environment and startup programs
HADOOP_HOME=/home/hadoop/hadoop-0.20.2
PATH=$HADOOP_HOME/bin:$PATH:$HOME/bin
export HIVE_HOME=/home/hadoop/hive-0.10.0
export MAHOUT_HOME=/home/hadoop/mahout-distribution-0.7
export PATH HADOOP_HOME
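After logging in again (or sourcing .bash_profile), a quick check can confirm that the settings took effect. The sketch below is an illustration only; check_hadoop_env is a hypothetical name, and it tests just the two things this step sets up: HADOOP_HOME pointing at an existing directory, and $HADOOP_HOME/bin appearing on PATH.

```shell
# Hypothetical helper: verify that HADOOP_HOME is set, exists as a
# directory, and that $HADOOP_HOME/bin is one of the PATH entries.
check_hadoop_env() (
    if [ -z "${HADOOP_HOME:-}" ]; then
        echo "HADOOP_HOME is not set"
        exit 1
    fi
    if [ ! -d "$HADOOP_HOME" ]; then
        echo "HADOOP_HOME does not exist: $HADOOP_HOME"
        exit 1
    fi
    # Wrap PATH in colons so the first and last entries match as well.
    case ":$PATH:" in
        *":$HADOOP_HOME/bin:"*) echo "ok: $HADOOP_HOME/bin is on PATH" ;;
        *) echo "$HADOOP_HOME/bin is not on PATH"; exit 1 ;;
    esac
)
```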
III. Testing Sqoop
1. List the databases in MySQL:
[hadoop@node1 bin]$ ./sqoop list-databases --connect jdbc:mysql://192.168.1.152:3306/ --username sqoop --password sqoop
13/09/15 07:17:16 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
13/09/15 07:17:17 INFO manager.MySQLManager: Executing SQL statement: SHOW DATABASES
information_schema
mysql
performance_schema
sqoop
test
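The warning in the output suggests -P, which makes Sqoop prompt for the password instead of taking it from the command line. A minimal sketch of the same call with -P follows. The wrapper name sqoop_list_databases and the SQOOP_BIN variable are hypothetical conveniences; the MySQL port 3306 is taken from the command in this step.

```shell
# Hypothetical wrapper: list MySQL databases through Sqoop, prompting
# for the password with -P rather than passing --password.
# SQOOP_BIN (an assumed variable) selects the sqoop executable and
# defaults to ./sqoop, as used from the bin directory in this walkthrough.
# Usage: sqoop_list_databases 192.168.1.152 sqoop
sqoop_list_databases() (
    host="$1"
    user="$2"
    "${SQOOP_BIN:-./sqoop}" list-databases \
        --connect "jdbc:mysql://${host}:3306/" \
        --username "$user" \
        -P
)
```

Setting SQOOP_BIN=echo prints the arguments instead of running Sqoop, which is a cheap way to inspect the command before using it.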
2. Import a MySQL table into Hive:
[hadoop@node1 bin]$ ./sqoop import --connect jdbc:mysql://192.168.1.152:3306/sqoop --username sqoop --password sqoop --table test --hive-import -m 1
13/09/15 08:15:01 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
13/09/15 08:15:01 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
13/09/15 08:15:01 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
13/09/15 08:15:01 INFO tool.CodeGenTool: Beginning code generation
13/09/15 08:15:01 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `test` AS t LIMIT 1
13/09/15 08:15:02 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `test` AS t LIMIT 1
13/09/15 08:15:02 INFO orm.CompilationManager: HADOOP_HOME is /home/hadoop/hadoop-0.20.2/bin/..
13/09/15 08:15:02 INFO orm.CompilationManager: Found hadoop core jar at: /home/hadoop/hadoop-0.20.2/bin/../hadoop-0.20.2-core.jar
13/09/15 08:15:03 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/a71936fd2bb45ea6757df22751a320e3/test.jar
13/09/15 08:15:03 WARN manager.MySQLManager: It looks like you are importing from mysql.
13/09/15 08:15:03 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
13/09/15 08:15:03 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
13/09/15 08:15:03 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
13/09/15 08:15:03 INFO mapreduce.ImportJobBase: Beginning import of test
13/09/15 08:15:04 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `test` AS t LIMIT 1
13/09/15 08:15:05 INFO mapred.JobClient: Running job: job_201309150505_0009
13/09/15 08:15:06 INFO mapred.JobClient: map 0% reduce 0%
13/09/15 08:15:34 INFO mapred.JobClient: map 100% reduce 0%
13/09/15 08:15:36 INFO mapred.JobClient: Job complete: job_201309150505_0009
13/09/15 08:15:36 INFO mapred.JobClient: Counters: 5
13/09/15 08:15:36 INFO mapred.JobClient: Job Counters
13/09/15 08:15:36 INFO mapred.JobClient: Launched map tasks=1
13/09/15 08:15:36 INFO mapred.JobClient: FileSystemCounters
13/09/15 08:15:36 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=583323
13/09/15 08:15:36 INFO mapred.JobClient: Map-Reduce Framework
13/09/15 08:15:36 INFO mapred.JobClient: Map input records=65536
13/09/15 08:15:36 INFO mapred.JobClient: Spilled Records=0
13/09/15 08:15:36 INFO mapred.JobClient: Map output records=65536
13/09/15 08:15:36 INFO mapreduce.ImportJobBase: Transferred 569.6514 KB in 32.0312 seconds (17.7842 KB/sec)
13/09/15 08:15:36 INFO mapreduce.ImportJobBase: Retrieved 65536 records.
13/09/15 08:15:36 INFO hive.HiveImport: Removing temporary files from import process: test/_logs
13/09/15 08:15:36 INFO hive.HiveImport: Loading uploaded data into Hive
13/09/15 08:15:36 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `test` AS t LIMIT 1
13/09/15 08:15:36 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `test` AS t LIMIT 1
13/09/15 08:15:41 INFO hive.HiveImport: Logging initialized using configuration in jar:file:/home/hadoop/hive-0.10.0/lib/hive-common-0.10.0.jar!/hive-log4j.properties
13/09/15 08:15:41 INFO hive.HiveImport: Hive history file=/tmp/hadoop/hive_job_log_hadoop_201309150815_1877092059.txt
13/09/15 08:16:10 INFO hive.HiveImport: OK
13/09/15 08:16:10 INFO hive.HiveImport: Time taken: 28.791 seconds
13/09/15 08:16:11 INFO hive.HiveImport: Loading data to table default.test
13/09/15 08:16:12 INFO hive.HiveImport: Table default.test stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 583323, raw_data_size: 0]
13/09/15 08:16:12 INFO hive.HiveImport: OK
13/09/15 08:16:12 INFO hive.HiveImport: Time taken: 1.704 seconds
13/09/15 08:16:12 INFO hive.HiveImport: Hive import complete.
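When the import is scripted, the record count can be pulled out of the log for a sanity check. The sketch below is one possible approach, assuming the log contains a line of the form "Retrieved N records." as in the output above; the function name retrieved_records is hypothetical.

```shell
# Hypothetical helper: read Sqoop import log text on stdin and print
# the number from the "Retrieved N records." line, if present.
# Prints nothing when no such line is found.
retrieved_records() {
    sed -n 's/.*Retrieved \([0-9][0-9]*\) records\..*/\1/p'
}
```

For example, saving the import output to a file (the name import.log is illustrative) and running retrieved_records < import.log would give a number to compare against a row count queried from the Hive table, such as the result of hive -e 'SELECT COUNT(*) FROM test'.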