Spark+Hadoop+Hive集群上数据操作记录

[rc@vq18ptkh01 ~]$ hadoop fs -ls /
drwxr-xr-x+  - jc_rc      supergroup                 0 2016-11-03 11:46 /dt
[rc@vq18ptkh01 ~]$ hadoop fs -copyFromLocal wifi_phone_list_1030.csv /dt
[rc@vq18ptkh01 ~]$ hadoop fs -copyFromLocal wifi_phone_list_1031.csv /dt
[rc@vq18ptkh01 ~]$ hadoop fs -copyFromLocal wifi_phone_list_1101.csv /dt

[rc@vq18ptkh01 ~]$ hadoop fs -ls /dt
16/11/03 11:53:16 INFO hdfs.PeerCache: SocketCache disabled.
Found 3 items
-rw-r--r--+  3 jc_rc supergroup    1548749 2016-11-03 11:48 /dt/wifi_phone_list_1030.csv
-rw-r--r--+  3 jc_rc supergroup    1262964 2016-11-03 11:52 /dt/wifi_phone_list_1031.csv
-rw-r--r--+  3 jc_rc supergroup     979619 2016-11-03 11:52 /dt/wifi_phone_list_1101.csv


[rc@vq18ptkh01 ~]$ beeline
Connecting to jdbc:hive2://1.8.15.1:24002,10.78.152.24:24002,1.8.15.2:24002,1.8.12.42:24002,1.8.15.62:24002/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;sasl.qop=auth-conf;auth=KERBEROS;principal=hive/hadoop.hadoop.com@HADOOP.COM
Debug is  true storeKey false useTicketCache true useKeyTab false doNotPrompt false ticketCache is null isInitiator true KeyTab is null refreshKrb5Config is false principal is null tryFirstPass is false useFirstPass is false storePass is false clearPass is false
Acquire TGT from Cache
Principal is jc_rc@HADOOP.COM
Commit Succeeded 

Connected to: Apache Hive (version 1.3.0)
Driver: Hive JDBC (version 1.3.0)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.3.0 by Apache Hive
0: jdbc:hive2://1.8.15.2:21066/> use r_hive_db;
No rows affected (0.547 seconds)

0: jdbc:hive2://1.8.15.2:21066/> create table tmp_wifi1030(imisdn string,starttime string,endtime string) row format delimited fields terminated by ',' stored as textfile;
0: jdbc:hive2://1.8.15.2:21066/> show tables;

[rc@vq18ptkh01 ~]$ wc wifi_phone_list_1030.csv -l
25390 wifi_phone_list_1030.csv
+---------------+--+
|   tab_name    |
+---------------+--+
| tmp_wifi1030  |
+---------------+--+
1 row selected (0.401 seconds)
0: jdbc:hive2://1.8.15.2:21066/> load data inpath 'hdfs:/dt/wifi_phone_list_1030.csv' into table tmp_wifi1030;
0: jdbc:hive2://1.8.15.2:21066/> select * from tmp_wifi1030;
| tmp_wifi1030.imisdn  |  tmp_wifi1030.starttime  |   tmp_wifi1030.endtime   |
+----------------------+--------------------------+--------------------------+--+
| 18806503523          | 2016-10-30 23:58:56.000  | 2016-10-31 00:01:07.000  |
| 15700125216          | 2016-10-30 23:58:57.000  | 2016-10-31 00:01:49.000  |
+----------------------+--------------------------+--------------------------+--+
25,390 rows selected (5.649 seconds)

0: jdbc:hive2://1.8.15.2:21066/> select count(*) from tmp_wifi1030;
INFO  : Number of reduce tasks determined at compile time: 1
INFO  : In order to change the average load for a reducer (in bytes):
INFO  :   set hive.exec.reducers.bytes.per.reducer=<number>
INFO  : In order to limit the maximum number of reducers:
INFO  :   set hive.exec.reducers.max=<number>
INFO  : In order to set a constant number of reducers:
INFO  :   set mapreduce.job.reduces=<number>
INFO  : number of splits:1
INFO  : Submitting tokens for job: job_1475071482566_2471703
INFO  : Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:hacluster, Ident: (HDFS_DELEGATION_TOKEN token 19416140 for jc_rc)
INFO  : Kind: HIVE_DELEGATION_TOKEN, Service: HiveServer2ImpersonationToken, Ident: 00 05 6a 63 5f 72 63 05 6a 63 5f 72 63 21 68 69 76 65 2f 68 61 64 6f 6f 70 2e 68 61 64 6f 6f 70 2e 63 6f 6d 40 48 41 44 4f 4f 50 2e 43 4f 4d 8a 01 58 28 57 df 96 8a 01 58 4c 64 63 96 8d 0d 65 ff 8e 03 97
INFO  : The url to track the job: https://pc-z1:26001/proxy/application_1475071482566_2471703/
INFO  : Starting Job = job_1475071482566_2471703, Tracking URL = https://pc-z1:26001/proxy/application_1475071482566_2471703/
INFO  : Kill Command = /opt/huawei/Bigdata/FusionInsight_V100R002C60SPC200/FusionInsight-Hive-1.3.0/hive-1.3.0/bin/..//../hadoop/bin/hadoop job  -kill job_1475071482566_2471703
INFO  : Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
INFO  : 2016-11-03 12:04:58,351 Stage-1 map = 0%,  reduce = 0%
INFO  : 2016-11-03 12:05:04,702 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.72 sec
INFO  : 2016-11-03 12:05:12,096 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 4.86 sec
INFO  : MapReduce Total cumulative CPU time: 4 seconds 860 msec
INFO  : Ended Job = job_1475071482566_2471703
+--------+--+
|  _c0   |
+--------+--+
| 25390  |
+--------+--+
1 row selected (25.595 seconds)


0: jdbc:hive2://1.8.15.62:21066/> select * from default.d_s1mme limit 10;
+----------------------+--------------------+-------------------------+----------------------+-------------------+--------------------+--------------------+----------------------+------------------------------+------------------------------------+----------------------------------+--------------------------------+-----------------------------+-----------------------------+-----------------------+------------------------------+----------------------------+------------------------+----------------------+--------------------+------------------------------------+------------------------------------+--------------------------+--------------------------+------------------------+------------------------+-------------------+-----------------------+-------------------------+-------------------------+-------------------+---------------------------------+---------------------------+-----------------------------+----------------------------+-------------------------------+-------------------------------------+-------------------------------------+---------------------------+-----------------------------+----------------------------+-------------------------------+-------------------------------------+-------------------------------------+-------------------------------+-------------------------------+-------------------------------+-------------------------------+----------------------+--+
| d_s1mme .length  | d_s1mme .city  。。。。。。。。。。。。                           |                               | 2016101714           |
| NULL                 | 579                | 5                       | 130980097fb8c900     | 6                 | 460006791248581    | 352093070081343    | 88888888888888888    | 20                           | 2016-10-17 13:30:23.0              | 2016-10-17 13:30:23.0            | 0                              | 20                          | NULL                        | 0                     | 209743848                    | 419                        | 32                     | D5095073             | NULL               | NULL                               | NULL                               | 100.67.254.45            | 100.111.211.166          | 36412                  | 36412                  | 589D              | BAE6802               | NULL                    | NULL                    | NULL              | 0                               | NULL                      | NULL                        | NULL                       | NULL                          | NULL                                | NULL                                |                           |                             |                            |                               |                                     |                                     |                               |                               |                               |                               | 2016101714           |
+----------------------+--------------------+-------------------------+----------------------+-------------------+--------------------+--------------------+----------------------+------------------------------+------------------------------------+----------------------------------+--------------------------------+-----------------------------+-----------------------------+-----------------------+------------------------------+----------------------------+------------------------+----------------------+--------------------+------------------------------------+------------------------------------+--------------------------+--------------------------+------------------------+------------------------+-------------------+-----------------------+-------------------------+-------------------------+-------------------+---------------------------------+---------------------------+-----------------------------+----------------------------+-------------------------------+-------------------------------------+-------------------------------------+---------------------------+-----------------------------+----------------------------+-------------------------------+-------------------------------------+-------------------------------------+-------------------------------+-------------------------------+-------------------------------+-------------------------------+----------------------+--+
10 rows selected (0.6 seconds)

create table tmp_mr_s1_mme1030 as select a.length,a.city,a.interface,a.xdr_id,a.rat,a.imsi,a.imei,a.msisdn,a.procedure_start_time,a.procedure_end_time,a.mme_ue_s1ap_id,a.mme_group_id,a.mme_code,a.user_ipv4,a.tac,a.cell_id,a.other_tac,a.other_eci from default.d_s1mme a join r_hive_db.tmp_wifi1030 b on a.msisdn=b.imisdn and a.p_hour>='20161030' and a.p_hour<'20161031'; 0: jdbc:hive2://1.8.15.2:21066/> create table tmp_mr_s1_mme_enbs1030 as 0: jdbc:hive2://1.8.15.2:21066/> select cell_id/256 from tmp_mr_s1_mme1030; 0: jdbc:hive2://1.8.15.62:21066/> create table tmp_mr_s1_mme_cellids1030 as select distinct cast(cell_id as bigint) as cellid from tmp_mr_s1_mme1030; 0: jdbc:hive2://1.8.15.62:21066/> set hive.merge.mapfiles; +---------------------------+--+ | set | +---------------------------+--+ | hive.merge.mapfiles=true | +---------------------------+--+ 1 row selected (0.022 seconds) 0: jdbc:hive2://1.8.15.62:21066/> set hive.merge.mapredfields; +---------------------------------------+--+ | set | +---------------------------------------+--+ | hive.merge.mapredfields is undefined | +---------------------------------------+--+ 1 row selected (0.022 seconds) 0: jdbc:hive2://1.8.15.62:21066/> set hive.merge.size.per.task=1024000000; No rows affected (0.012 seconds) 0: jdbc:hive2://1.8.15.62:21066/> set hive.merge.smallfiles.avgsize=1024000000; No rows affected (0.012 seconds) 0: jdbc:hive2://1.8.15.62:21066/> use r_hive_db; No rows affected (0.031 seconds) 0: jdbc:hive2://1.8.15.62:21066/> insert overwrite directory '/dt/' row format delimited fields terminated by '|' select * from tmp_mr_s1_mme_cellids1030; INFO : Number of reduce tasks is set to 0 since there's no reduce operator INFO : number of splits:17 INFO : Submitting tokens for job: job_1475071482566_2477152 INFO : Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:hacluster, Ident: (HDFS_DELEGATION_TOKEN token 19422634 for jc_rc) INFO : Kind: HIVE_DELEGATION_TOKEN, Service: HiveServer2ImpersonationToken, Ident: 00 05 6a 63 5f 72 63 05 6a 63 5f 72 63 21 68 69 76 65 2f 68 61 64 6f 6f 70 2e 68 61 64 6f 6f 70 2e 63 6f 6d 40 48 41 44 4f 4f 50 2e 43 4f 4d 8a 01 58 28 d2 8f 0b 8a 01 58 4c df 13 0b 8d 0d 6c 4b 8e 03 98 INFO : The url to track the job: https://pc-z1:26001/proxy/application_1475071482566_2477152/ INFO : Starting Job = job_1475071482566_2477152, Tracking URL = https://pc-z1:26001/proxy/application_1475071482566_2477152/ INFO : Kill Command = /opt/huawei/Bigdata/FusionInsight_V100R002C60SPC200/FusionInsight-Hive-1.3.0/hive-1.3.0/bin/..//../hadoop/bin/hadoop job -kill job_1475071482566_2477152 INFO : Hadoop job information for Stage-1: number of mappers: 17; number of reducers: 0 INFO : 2016-11-03 14:40:52,492 Stage-1 map = 0%, reduce = 0% INFO : 2016-11-03 14:40:58,835 Stage-1 map = 76%, reduce = 0%, Cumulative CPU 28.78 sec INFO : 2016-11-03 14:40:59,892 Stage-1 map = 88%, reduce = 0%, Cumulative CPU 33.55 sec INFO : 2016-11-03 14:41:10,486 Stage-1 map = 94%, reduce = 0%, Cumulative CPU 37.13 sec INFO : 2016-11-03 14:41:11,549 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 41.13 sec INFO : MapReduce Total cumulative CPU time: 41 seconds 130 msec INFO : Ended Job = job_1475071482566_2477152 INFO : Stage-3 is filtered out by condition resolver. INFO : Stage-2 is selected by condition resolver. INFO : Stage-4 is filtered out by condition resolver. INFO : Number of reduce tasks is set to 0 since there's no reduce operator INFO : number of splits:1 INFO : Submitting tokens for job: job_1475071482566_2477181 INFO : Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:hacluster, Ident: (HDFS_DELEGATION_TOKEN token 19422663 for jc_rc) INFO : Kind: HIVE_DELEGATION_TOKEN, Service: HiveServer2ImpersonationToken, Ident: 00 05 6a 63 5f 72 63 05 6a 63 5f 72 63 21 68 69 76 65 2f 68 61 64 6f 6f 70 2e 68 61 64 6f 6f 70 2e 63 6f 6d 40 48 41 44 4f 4f 50 2e 43 4f 4d 8a 01 58 28 d2 8f 0b 8a 01 58 4c df 13 0b 8d 0d 6c 4b 8e 03 98 INFO : The url to track the job: https://pc-z1:26001/proxy/application_1475071482566_2477181/ INFO : Starting Job = job_1475071482566_2477181, Tracking URL = https://pc-z1:26001/proxy/application_1475071482566_2477181/ INFO : Kill Command = /opt/huawei/Bigdata/FusionInsight_V100R002C60SPC200/FusionInsight-Hive-1.3.0/hive-1.3.0/bin/..//../hadoop/bin/hadoop job -kill job_1475071482566_2477181 INFO : Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 0 INFO : 2016-11-03 14:41:22,190 Stage-2 map = 0%, reduce = 0% INFO : 2016-11-03 14:41:28,571 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 2.2 sec INFO : MapReduce Total cumulative CPU time: 2 seconds 200 msec INFO : Ended Job = job_1475071482566_2477181 INFO : Moving data to directory /dt from hdfs://hacluster/dt/.hive-staging_hive_2016-11-03_14-40-43_774_4317869403646242426-140183/-ext-10000 No rows affected (46.604 seconds) [rc@vq18ptkh01 dt]$ hadoop fs -ls /dt 16/11/03 14:46:18 INFO hdfs.PeerCache: SocketCache disabled. Found 1 items -rwxrwxrwx+ 3 jc_rc supergroup 26819 2016-11-03 14:41 /dt/000000_0 [rc@vq18ptkh01 dt]$ hadoop fs -copyToLocal /dt/000000_0 16/11/03 14:46:33 INFO hdfs.PeerCache: SocketCache disabled. [rc@vq18ptkh01 dt]$ ls 000000_0 [rc@vq18ptkh01 dt]$ [rc@vq18ptkh01 dt]$ ls 000000_0 000001_0 000002_0 000003_0 000004_0 000005_0 [rc@vq18ptkh01 dt]$ ftp 10.70.41.126 21 Connected to 10.70.41.126 (10.70.41.126). 220 10.70.41.126 FTP server ready Name (10.70.41.126:rc): joy 331 Password required for joy. Password: 230 User joy logged in. Remote system type is UNIX. Using binary mode to transfer files. ftp> put 000000_0 /Temp/a_dt/ local: 000000_0 remote: /Temp/a_dt/ 227 Entering Passive Mode (10,70,41,126,168,163). 550 /Temp/a_dt/: Not a regular file ftp> put (local-file) 000000_0 (remote-file) /Temp/a_dt/000000_0 local: 000000_0 remote: /Temp/a_dt/000000_0 227 Entering Passive Mode (10,70,41,126,168,207). 150 Opening BINARY mode data connection for /Temp/a_dt/000000_0 226 Transfer complete. 1049905992 bytes sent in 33 secs (31787.20 Kbytes/sec) ftp> put 000001_0 /Temp/a_dt/000001_0 local: 000001_0 remote: /Temp/a_dt/000001_0 227 Entering Passive Mode (10,70,41,126,168,255). 150 Opening BINARY mode data connection for /Temp/a_dt/000001_0 452 Transfer aborted. No space left on device ftp> put 000002_0 /Temp/a_dt/000002_0 local: 000002_0 remote: /Temp/a_dt/000002_0 227 Entering Passive Mode (10,70,41,126,169,20). 150 Opening BINARY mode data connection for /Temp/a_dt/000002_0 452 Transfer aborted. No space left on device ftp> put 000003_0 /Temp/a_dt/000003_0 local: 000003_0 remote: /Temp/a_dt/000003_0 227 Entering Passive Mode (10,70,41,126,169,40). 150 Opening BINARY mode data connection for /Temp/a_dt/000003_0 452 Transfer aborted. No space left on device ftp> put 000004_0 /Temp/a_dt/000004_0 local: 000004_0 remote: /Temp/a_dt/000004_0 227 Entering Passive Mode (10,70,41,126,169,66). 150 Opening BINARY mode data connection for /Temp/a_dt/000004_0 452 Transfer aborted. No space left on device ftp> put 000005_0 /Temp/a_dt/000005_0 local: 000005_0 remote: /Temp/a_dt/000005_0 227 Entering Passive Mode (10,70,41,126,169,85). 150 Opening BINARY mode data connection for /Temp/a_dt/000005_0 226 Transfer complete. 23465237 bytes sent in 0.747 secs (31391.79 Kbytes/sec) ftp>

查询hdfs文件内容,如果文件过大时不能一次加载,可以使用:

hadoop fs -cat /user/my/ab.txt |more 

 

posted @ 2016-11-04 00:55  cctext  阅读(1243)  评论(0编辑  收藏  举报