hive

CREATE TABLE t1(name string,id int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ';
LOAD DATA LOCAL INPATH '/Users/***/Desktop/test.txt' INTO TABLE t1;

Then check it on HDFS (NameNode web UI, port 50070):
dfs -ls /user/wyq/hive;
---------------------------------------------------
Build the UDF in Eclipse (Java) and package it: jar cvf demoudf.jar ///.java

// package must match the class name used in CREATE TEMPORARY FUNCTION below
package demoudf;

import java.util.Date;
import java.text.DateFormat;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// UDF that converts a Unix timestamp (seconds, passed as text) into a formatted date string
public class UnixTodate extends UDF {
  public Text evaluate(Text text) {
    if (text == null) return null;
    long timestamp = Long.parseLong(text.toString());
    return new Text(toDate(timestamp));
  }

  private String toDate(long timestamp) {
    Date date = new Date(timestamp * 1000);
    return DateFormat.getInstance().format(date);
  }
}
ADD jar /Users/wyq/Desktop/demoudf.jar;
create temporary function userdate as 'demoudf.UnixTodate';
create table test(id string, unixtime string) row format delimited fields terminated by ',';
load data local inpath '/Users/wyq/Desktop/udf_test.txt' into table test;
select * from test;
select id,userdate(unixtime) from test;

cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF
-----------------------------------------------------
Transform table columns with Python

create table u_data (col1 string, col2 string, unixtime string) row format delimited fields terminated by '\t';
load data local inpath '///' into table u_data;
add FILE ///.py;
insert overwrite table u_data_new select transform (col1, col2, unixtime) using 'python ...py' as (col1, col2, weekday) from u_data;

python:

import sys
import datetime

# read tab-separated rows from the Hive TRANSFORM stream, replace unixtime with the weekday
for line in sys.stdin:
  line = line.strip()
  col1, col2, unixtime = line.split('\t')
  weekday = datetime.datetime.fromtimestamp(float(unixtime)).isoweekday()
  print '\t'.join([col1, col2, str(weekday)])
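The insert overwrite above assumes the target table u_data_new already exists. A minimal sketch of creating it (column names and types are assumed to match the transform output):

-- hypothetical target table for the transform output
create table u_data_new (col1 string, col2 string, weekday int)
row format delimited fields terminated by '\t';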

-------------------------------------------------------
Hive:

1. Component architecture: HiveServer2 (Beeline), Hive itself (driver, compiler, execution engine), and the metastore DB.

Among these, the Execution Engine is the component which executes the execution plan created by the compiler. The plan is a DAG of stages. The execution engine manages the dependencies between these different stages of the plan and executes these stages on the appropriate system components.

2. Ways to connect to HiveServer2: GUI, CLI, JDBC (Beeline).

3. Data sources: data is ingested with Kafka, Sqoop, etc. and landed in HDFS, and it comes in all kinds of structures: relational database tables, MongoDB or JSON data, or logs.
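For example, a minimal sketch of exposing delimited log files that already sit in HDFS as a Hive external table (path and columns are hypothetical):

-- external table over raw files in HDFS: the data stays where it is,
-- Hive only records the schema and the location in the metastore
CREATE EXTERNAL TABLE raw_logs (ts string, loglevel string, msg string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/data/logs/';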

4. How is HQL executed? Behind the scenes it runs MapReduce or Tez jobs (much as Pig executes Pig Latin scripts), and the console prints a tracking URL, e.g. for: insert into test values("wangyuq","123");

Stages? Before your data is moved to its destination it sits in a staging directory for a while; once the move finishes, the staging files are gone. See the sketch below.
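A small sketch of switching the engine and triggering such a job (Tez availability depends on the installation; this reuses the test table from above):

-- choose the engine that backs HQL execution (mr is the classic default)
SET hive.execution.engine=mr;   -- or tez, if Tez is installed
-- this statement is compiled into a job; the CLI prints a tracking URL,
-- and the result passes through a .hive-staging directory before being
-- moved into the table's final location
insert into test values("wangyuq","123");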

5. Strengths, weaknesses, and evaluation. Pig is a good ETL tool for handling unstructured data.

Hive is not a relational database; it only maintains metadata about data stored in HDFS, which makes operating on big data feel like operating on SQL tables, although HQL differs slightly from SQL. Hive keeps table definitions in the metastore; the default is Derby, but it can be replaced with a database of your choice.

It lets us run MapReduce through SQL and query data sitting in HDFS.
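For instance, with the test table above: a bare SELECT * is usually served by a fetch task, while an aggregation is compiled into a MapReduce/Tez job (exact behavior depends on hive.fetch.task.conversion):

-- usually a simple fetch, no job launched
select * from test;
-- compiled into a MapReduce/Tez job; the console prints a tracking URL
select id, count(*) from test group by id;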

---But:
Hive does not promise much optimization, only simplicity, so its performance cannot support real-time workloads.
Indexes and views are limited (use partitions and buckets instead).
Read-only: update is not supported.
Its data types are not exactly the same as SQL's.
New partitions can be inserted, but existing data cannot be updated in place.

6. Relation to HDFS? Hive lives inside HDFS: its warehouse directory is an HDFS path.

7. So how does Hive handle data? (partitions, buckets, semi-structured data -> structured; see the sketch below)
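A minimal sketch of a partitioned and bucketed table (table and column names are hypothetical):

-- hypothetical table partitioned by date and bucketed by user id
CREATE TABLE events (user_id string, action string)
PARTITIONED BY (dt string)
CLUSTERED BY (user_id) INTO 4 BUCKETS
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';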

The LOAD statement: it moves data from HDFS into Hive, so the original HDFS path no longer holds that data; the actual data files are simply moved under Hive's warehouse directory.
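A sketch of this move semantics, reusing the t1 table from above (the source path is hypothetical; without LOCAL the file is moved, not copied):

-- the file disappears from /user/wyq/input/ and reappears under
-- t1's directory in the Hive warehouse
LOAD DATA INPATH '/user/wyq/input/test.txt' INTO TABLE t1;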

8. So how is the data stored? The data is on HDFS; the schema is in the metastore.
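Both halves are visible for any table, e.g.:

-- columns come from the metastore; the Location field shows where the
-- data files live in HDFS
DESCRIBE FORMATTED t1;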

9. Installation and errors
MySQL (user management issues):

step 1: SET PASSWORD = PASSWORD('your new password');
step 2: ALTER USER 'root'@'localhost' PASSWORD EXPIRE NEVER;
step 3: flush privileges;

1.$mysql -u root -p
2.mysql> create user 'hive' identified by '123456';
Query OK, 0 rows affected (0.00 sec)
3.mysql> grant all privileges on *.* to 'hive' with grant option;
Query OK, 0 rows affected (0.00 sec)
4.mysql> flush privileges;
Query OK, 0 rows affected (0.01 sec)

create user 'hive'@'%' identified by 'hive';
grant all privileges on *.* to 'hive'@'%' with grant option;
flush privileges;
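A quick sanity check of the account afterwards (optional):

-- verify the hive user exists and has the expected grants
SELECT user, host FROM mysql.user WHERE user = 'hive';
SHOW GRANTS FOR 'hive'@'%';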

Start Hadoop (formatting the NameNode is only needed on the first setup):
hadoop namenode -format; start-all.sh

 
