Installing and Using Pig


 

1.  References

Reference documentation:

 

http://pig.apache.org/docs/r0.17.0/start.html#build

 

 

2.  Installation Environment

2.1.  Environment

 

A pseudo-distributed environment running CentOS 7.4 and Hadoop 2.7.5.

 

Hostname                    NameNode        SecondaryNameNode   DataNodes
centoshadoop.smartmap.com   192.168.1.80    192.168.1.80        192.168.1.80

 

 

 

 

 

Hadoop is installed in /opt/hadoop/hadoop-2.7.5.

 

3.  Installation

 

 

3.1.  Downloading Pig

 

http://pig.apache.org/releases.html#Download

 


 

Download pig-0.17.0.tar.gz from the releases page above.

 

3.2.  Extracting Pig

 

Extract the downloaded pig-0.17.0.tar.gz so that Pig ends up in /opt/hadoop/pig-0.17.0.
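
The extraction step can be sketched as a small shell function. This is a minimal sketch, assuming the archive unpacks to a single top-level pig-0.17.0 directory; the function name extract_pig is hypothetical and not part of Pig.

```shell
# Hypothetical helper: unpack a Pig release tarball under a parent directory.
# Assumes the archive contains one top-level directory (e.g. pig-0.17.0).
extract_pig() {
  archive=$1   # path to the downloaded .tar.gz
  parent=$2    # directory that will contain the unpacked pig-0.17.0 folder
  mkdir -p "$parent" && tar -xzf "$archive" -C "$parent"
}

# Usage on the machine described above (commented out so the sketch has no side effects):
# extract_pig pig-0.17.0.tar.gz /opt/hadoop
```

Passing the parent directory (/opt/hadoop) rather than the final path avoids ending up with a nested pig-0.17.0/pig-0.17.0 directory.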

 

4.  Configuration

 

4.1.  Editing the profile file

vi /etc/profile

 

export PIG_HOME=/opt/hadoop/pig-0.17.0

export PATH=$PATH:$PIG_HOME/bin
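
If /etc/profile is sourced more than once in the same session, the export PATH line above appends the same directory each time. A minimal sketch of an idempotent variant, assuming a POSIX-compatible shell:

```shell
# Same settings as above, but append $PIG_HOME/bin only when it is not already on PATH.
PIG_HOME=/opt/hadoop/pig-0.17.0
case ":$PATH:" in
  *":$PIG_HOME/bin:"*) ;;                # already present, nothing to do
  *) PATH="$PATH:$PIG_HOME/bin" ;;       # append once
esac
export PIG_HOME PATH
```

After editing the file, run source /etc/profile (or log in again) so the current shell picks up the change.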

 

4.2.  Upgrading the JDK to 1.8

 

Switch the JDK to version 1.8 and update every variable that depends on JAVA_HOME.

 

4.3.  Editing the Pig configuration file

vi /opt/hadoop/pig-0.17.0/conf/pig.properties

 

exectype=mapreduce

 

4.4.  Editing mapred-site.xml to enable the job history server

 

vi /opt/hadoop/hadoop-2.7.5/etc/hadoop/mapred-site.xml

 

<property>

       <name>mapreduce.jobhistory.address</name>                

       <value>192.168.1.80:10020</value>

</property>

 

5.  Starting Hadoop

 

5.1.  Starting YARN and HDFS

cd /opt/hadoop/hadoop-2.7.5/sbin

 

start-all.sh

 

5.2.  Starting the history server

 

cd /opt/hadoop/hadoop-2.7.5/sbin

 

mr-jobhistory-daemon.sh start historyserver

 

6.  Using Pig

 

6.1.  Importing a file into HDFS

 

hadoop fs -mkdir -p /input/ncdc/micro-tab

hadoop fs -copyFromLocal sample.txt /input/ncdc/micro-tab/sample.txt
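
The post does not show the contents of sample.txt. Because a LOAD statement without a USING clause reads tab-separated fields, a file matching the schema used later (year, temperature, quality) could be created as follows. The five rows are illustrative assumptions, not the author's data.

```shell
# Illustrative data only: year <TAB> temperature <TAB> quality code.
# printf reuses the format string for each group of three arguments.
printf '%s\t%s\t%s\n' \
  1950 0 1 \
  1950 22 1 \
  1950 -11 1 \
  1949 111 1 \
  1949 78 1 > sample.txt
```

Run this in the directory from which the copyFromLocal command above is issued.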

 

6.2.  Starting Pig's interactive shell (Grunt)

 

cd /opt/hadoop/pig-0.17.0/bin

 

pig

 


6.3.  Running a job

 

grunt> records = load '/input/ncdc/micro-tab/sample.txt' as (year:chararray, temperature:int, quality:int);

 

grunt> dump records;

 


 

6.4.  Quitting

 

 

grunt> \q

 


 

6.5.  Displaying the schema

 

cd /opt/hadoop/pig-0.17.0/bin

 

pig

 

 

grunt> records = LOAD '/input/ncdc/micro-tab/sample.txt' as (year:chararray, temperature:int, quality:int);

 

grunt> DUMP records;

 

grunt> DESCRIBE records;

records: {year: chararray,temperature: int,quality: int}

grunt>

 

6.6.  Filtering data

 

grunt> filter_records = FILTER records BY temperature != 9999 AND quality IN (0, 1, 4, 5, 9);

grunt> DUMP filter_records;
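
For a small input, the FILTER condition can be cross-checked locally before running it on the cluster. This is a minimal sketch using awk rather than Pig; the input rows are illustrative assumptions, with the last two included so the filter has something to drop.

```shell
# Illustrative rows: year <TAB> temperature <TAB> quality.
data=$(printf '%s\t%s\t%s\n' 1950 0 1  1950 22 1  1949 111 1  1949 9999 1  1950 30 2)

# Keep rows where temperature != 9999 and quality is one of 0, 1, 4, 5, 9,
# mirroring the FILTER expression above.
filtered=$(printf '%s\n' "$data" |
  awk -F '\t' '$2 != 9999 && ($3 == 0 || $3 == 1 || $3 == 4 || $3 == 5 || $3 == 9)')

printf '%s\n' "$filtered"
```

The row with temperature 9999 and the row with quality 2 are dropped, leaving three rows.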

 


 

6.7.  Grouping records

 

grunt> grouped_records = GROUP filter_records BY year;

grunt> DUMP grouped_records;

grunt> DESCRIBE grouped_records;

grouped_records: {group: chararray,filter_records: {(year: chararray,temperature: int,quality: int)}}

grunt>

 


 

 

6.8.  Computing the maximum

 

grunt> max_temp = FOREACH grouped_records GENERATE group, MAX(filter_records.temperature);

grunt> DUMP max_temp;
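
The result of GROUP followed by MAX can be sanity-checked on a small input without a cluster. This is a minimal local sketch with awk, not Pig, using illustrative rows that are assumptions rather than the author's data.

```shell
# Illustrative rows: year <TAB> temperature <TAB> quality.
data=$(printf '%s\t%s\t%s\n' 1950 0 1  1950 22 1  1950 -11 1  1949 111 1  1949 78 1)

# Track the highest temperature seen per year, then print one tuple per year.
max_by_year=$(printf '%s\n' "$data" |
  awk -F '\t' '
    !($1 in max) || $2 + 0 > max[$1] { max[$1] = $2 + 0 }
    END { for (y in max) printf "(%s,%d)\n", y, max[y] }
  ' | sort)

printf '%s\n' "$max_by_year"
# prints:
# (1949,111)
# (1950,22)
```

The output is formatted as parenthesized tuples to make it easy to compare with what DUMP prints.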

 


 

6.9.  Inspecting the execution

 

grunt> ILLUSTRATE max_temp;
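
The statements from sections 6.3 to 6.8 can also be collected into a script and run non-interactively instead of being typed into Grunt. This is a minimal sketch; the file name max_temp.pig is a hypothetical choice, and the block only writes the script, leaving the run command commented out.

```shell
# Write the Pig Latin statements used above into a script file.
cat > max_temp.pig <<'EOF'
records = LOAD '/input/ncdc/micro-tab/sample.txt'
    AS (year:chararray, temperature:int, quality:int);
filter_records = FILTER records BY temperature != 9999
    AND quality IN (0, 1, 4, 5, 9);
grouped_records = GROUP filter_records BY year;
max_temp = FOREACH grouped_records GENERATE group, MAX(filter_records.temperature);
DUMP max_temp;
EOF

# Run it in batch mode on the cluster (requires Pig on PATH and Hadoop running):
# pig max_temp.pig
```

Keeping the statements in a file makes the job repeatable and easier to keep under version control than an interactive session.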

 


 

posted @ 2018-05-21 00:37  ParamousGIS