hive

CREATE TABLE t1(name string,id int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ';
LOAD DATA LOCAL INPATH '/Users/***/Desktop/test.txt' INTO TABLE t1;

Then check it on HDFS (NameNode web UI, port 50070):
dfs -ls /user/wyq/hive;
---------------------------------------------------
Build the UDF in Eclipse (Java) and package it: jar cvf demoudf.jar ///.java

// package must match the class name used in CREATE TEMPORARY FUNCTION below
package demoudf;

import java.util.Date;
import java.text.DateFormat;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// UDF that converts a Unix timestamp (seconds, passed as text) into a formatted date string
public class UnixTodate extends UDF {
  public Text evaluate(Text text) {
    if (text == null) return null;
    long timestamp = Long.parseLong(text.toString());
    return new Text(toDate(timestamp));
  }

  private String toDate(long timestamp) {
    Date date = new Date(timestamp * 1000);
    return DateFormat.getInstance().format(date);
  }
}
ADD jar /Users/wyq/Desktop/demoudf.jar;
create temporary function userdate as 'demoudf.UnixTodate';
create table test(id string, unixtime string) row format delimited fields terminated by ',';
load data local inpath '/Users/wyq/Desktop/udf_test.txt' into table test;
select * from test;
select id,userdate(unixtime) from test;

cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF
-----------------------------------------------------
Transform table columns with Python

create table u_data (col1 string, col2 string, unixtime string) row format delimited fields terminated by '\t';
load data local inpath '///' into table u_data;
add FILE ///.py;
insert overwrite table u_data_new select transform (col1, col2, unixtime) using 'python ...py' as (col1, col2, weekday) from u_data;

python:

import sys
import datetime

# read tab-separated rows from the Hive TRANSFORM stream, replace unixtime with the weekday
for line in sys.stdin:
  line = line.strip()
  col1, col2, unixtime = line.split('\t')
  weekday = datetime.datetime.fromtimestamp(float(unixtime)).isoweekday()
  print '\t'.join([col1, col2, str(weekday)])
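The insert overwrite above assumes the target table u_data_new already exists. A minimal sketch of creating it (column names and types are assumed to match the transform output):

-- hypothetical target table for the transform output
create table u_data_new (col1 string, col2 string, weekday int)
row format delimited fields terminated by '\t';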

-------------------------------------------------------
Hive:

1. Component architecture: HiveServer2 (Beeline), Hive itself (driver, compiler, execution engine), and the metastore DB.

Among these, the Execution Engine is the component which executes the execution plan created by the compiler. The plan is a DAG of stages. The execution engine manages the dependencies between these different stages of the plan and executes these stages on the appropriate system components.

2. Ways to connect to HiveServer2: GUI, CLI, JDBC (Beeline).

3. Data sources: data is ingested with Kafka, Sqoop, etc. and landed in HDFS, and it comes in all kinds of structures: relational database tables, MongoDB or JSON data, or logs.
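For example, a minimal sketch of exposing delimited log files that already sit in HDFS as a Hive external table (path and columns are hypothetical):

-- external table over raw files in HDFS: the data stays where it is,
-- Hive only records the schema and the location in the metastore
CREATE EXTERNAL TABLE raw_logs (ts string, loglevel string, msg string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/data/logs/';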

4. How is HQL executed? Behind the scenes it runs MapReduce or Tez jobs (much as Pig executes Pig Latin scripts), and the console prints a tracking URL, e.g. for: insert into test values("wangyuq","123");

Stages? Before your data is moved to its destination it sits in a staging directory for a while; once the move finishes, the staging files are gone. See the sketch below.
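A small sketch of switching the engine and triggering such a job (Tez availability depends on the installation; this reuses the test table from above):

-- choose the engine that backs HQL execution (mr is the classic default)
SET hive.execution.engine=mr;   -- or tez, if Tez is installed
-- this statement is compiled into a job; the CLI prints a tracking URL,
-- and the result passes through a .hive-staging directory before being
-- moved into the table's final location
insert into test values("wangyuq","123");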

5. Strengths, weaknesses, and evaluation. Pig is a good ETL tool for handling unstructured data.

Hive is not a relational database; it only maintains metadata about data stored in HDFS, which makes operating on big data feel like operating on SQL tables, although HQL differs slightly from SQL. Hive keeps table definitions in the metastore; the default is Derby, but it can be replaced with a database of your choice.

It lets us run MapReduce through SQL and query data sitting in HDFS.
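For instance, with the test table above: a bare SELECT * is usually served by a fetch task, while an aggregation is compiled into a MapReduce/Tez job (exact behavior depends on hive.fetch.task.conversion):

-- usually a simple fetch, no job launched
select * from test;
-- compiled into a MapReduce/Tez job; the console prints a tracking URL
select id, count(*) from test group by id;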

---But:
Hive does not promise much optimization, only simplicity, so its performance cannot support real-time workloads.
Indexes and views are limited (use partitions and buckets instead).
Read-only: update is not supported.
Its data types are not exactly the same as SQL's.
New partitions can be inserted, but existing data cannot be updated in place.

6. Relation to HDFS? Hive lives inside HDFS: its warehouse directory is an HDFS path.

7. So how does Hive handle data? (partitions, buckets, semi-structured data -> structured; see the sketch below)
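A minimal sketch of a partitioned and bucketed table (table and column names are hypothetical):

-- hypothetical table partitioned by date and bucketed by user id
CREATE TABLE events (user_id string, action string)
PARTITIONED BY (dt string)
CLUSTERED BY (user_id) INTO 4 BUCKETS
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';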

The LOAD statement: it moves data from HDFS into Hive, so the original HDFS path no longer holds that data; the actual data files are simply moved under Hive's warehouse directory.
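A sketch of this move semantics, reusing the t1 table from above (the source path is hypothetical; without LOCAL the file is moved, not copied):

-- the file disappears from /user/wyq/input/ and reappears under
-- t1's directory in the Hive warehouse
LOAD DATA INPATH '/user/wyq/input/test.txt' INTO TABLE t1;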

8. So how is the data stored? The data is on HDFS; the schema is in the metastore.
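Both halves are visible for any table, e.g.:

-- columns come from the metastore; the Location field shows where the
-- data files live in HDFS
DESCRIBE FORMATTED t1;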

9. Installation and errors
MySQL (user management issues):

step 1: SET PASSWORD = PASSWORD('your new password');
step 2: ALTER USER 'root'@'localhost' PASSWORD EXPIRE NEVER;
step 3: flush privileges;

1.$mysql -u root -p
2.mysql> create user 'hive' identified by '123456';
Query OK, 0 rows affected (0.00 sec)
3.mysql> grant all privileges on *.* to 'hive' with grant option;
Query OK, 0 rows affected (0.00 sec)
4.mysql> flush privileges;
Query OK, 0 rows affected (0.01 sec)

create user 'hive'@'%' identified by 'hive';
grant all privileges on *.* to 'hive'@'%' with grant option;
flush privileges;
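A quick sanity check of the account afterwards (optional):

-- verify the hive user exists and has the expected grants
SELECT user, host FROM mysql.user WHERE user = 'hive';
SHOW GRANTS FOR 'hive'@'%';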

Start Hadoop (formatting the NameNode is only needed on the first setup):
hadoop namenode -format; start-all.sh

 
