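A user reported that when the following SQL is run in Hue, every date-typed column in the result shows up as null, even though the same columns display correctly in the Hive client:
select admission_date, discharge_date, birth_date from hm_004_20170309141149.inpatient_visit limit 20;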
To verify:
beeline -u jdbc:hive2://0.0.0.0:10000 -e "select admission_date, discharge_date,birth_date from hm_004_20170309141149.inpatient_visit limit 20;"
My first suspicion was HiveServer2, but a query against another table that contains a date column displayed correctly:
select part_dt from default.kylin_sales limit 50;
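So I suspected the SerDe instead. hm_004_20170309141149.inpatient_visit uses org.openx.data.jsonserde.JsonSerDe, while default.kylin_sales is a plain text table read through TextInputFormat (strictly an input format rather than a SerDe, so it presumably relies on Hive's default text SerDe).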
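When comparing tables like this, it helps to look at the SerDe and the input format separately, since they are different clauses of the table definition. The following is a minimal sketch, assuming you have already captured the text printed by SHOW CREATE TABLE; the helper name table_storage is hypothetical, and the sample string is an abbreviated copy of a table definition from this post.

```python
import re

def table_storage(ddl: str) -> dict:
    """Pull the SerDe and input format class names out of SHOW CREATE TABLE text."""
    serde = re.search(r"ROW\s+FORMAT\s+SERDE\s+'([^']+)'", ddl, re.IGNORECASE)
    inputformat = re.search(r"\bINPUTFORMAT\s+'([^']+)'", ddl, re.IGNORECASE)
    return {
        "serde": serde.group(1) if serde else None,
        "inputformat": inputformat.group(1) if inputformat else None,
    }

# Abbreviated DDL (assumption: only the clauses relevant here are kept)
sample_ddl = """
CREATE EXTERNAL TABLE `default.inpatient_visit`(
  `admission_date` date COMMENT 'from deserializer')
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
"""

print(table_storage(sample_ddl))
```

A table whose DDL has no explicit ROW FORMAT SERDE clause yields None for "serde" here, which only means the clause is absent from the text, not that no SerDe is used.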
To test this, recreate the table in the default database with the JsonSerDe that ships with Hive (org.apache.hive.hcatalog.data.JsonSerDe):
CREATE EXTERNAL TABLE `default.inpatient_visit`(
  `age_m` int COMMENT 'from deserializer',
  `discharge_date` date COMMENT 'from deserializer',
  `address_code` string COMMENT 'from deserializer',
  `admission_date` date COMMENT 'from deserializer',
  `visit_dept_name` string COMMENT 'from deserializer',
  `birth_date` date COMMENT 'from deserializer',
  `outcome` string COMMENT 'from deserializer',
  `age` int COMMENT 'from deserializer')
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 'hdfs://xxxx/user/hive/warehouse/xx.db/inpatient_visit';
Local test:
beeline -u jdbc:hive2://0.0.0.0:10000 -e "add jar /home/work/hive/hcatalog/share/hcatalog/hive-hcatalog-core-1.2.1.jar;select admission_date, discharge_date,birth_date from default.inpatient_visit limit 20;"
Test in Hue:
【Testing whether the built-in JsonSerDe behaves the same】
CREATE TABLE json_nested_test (
  count string,
  usage string,
  pkg map<string,string>,
  languages array<string>,
  store map<string,array<map<string,string>>>)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS TEXTFILE;
Querying it fails with this error:
2017-04-25 15:46:38,655 WARN [main]: data.JsonSerDe (JsonSerDe.java:deserialize(181)) - Error [java.io.IOException: Start of Array expected] parsing json text [{"count":2,"usage":91273,"pkg":{"weight":8,"type":"apple"},"languages":["German","French","Italian"],"store":{"fruit":[{"weight":8,"type":"apple"},{"weight":9,"type":"pear"}]}}].
2017-04-25 15:46:38,656 ERROR [main]: CliDriver (SessionState.java:printError(960)) - Failed with exception java.io.IOException:org.apache.hadoop.hive.serde2.SerDeException: java.io.IOException: Start of Array expected
java.io.IOException: org.apache.hadoop.hive.serde2.SerDeException: java.io.IOException: Start of Array expected
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:507)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:414)
    at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:140)
    at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1670)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:736)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: org.apache.hadoop.hive.serde2.SerDeException: java.io.IOException: Start of Array expected
    at org.apache.hive.hcatalog.data.JsonSerDe.deserialize(JsonSerDe.java:183)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:488)
    ... 15 more
Caused by: java.io.IOException: Start of Array expected
    at org.apache.hive.hcatalog.data.JsonSerDe.extractCurrentField(JsonSerDe.java:332)
    at org.apache.hive.hcatalog.data.JsonSerDe.extractCurrentField(JsonSerDe.java:356)
    at org.apache.hive.hcatalog.data.JsonSerDe.populateRecord(JsonSerDe.java:218)
    at org.apache.hive.hcatalog.data.JsonSerDe.deserialize(JsonSerDe.java:174)
    ... 16 more
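After several rounds of testing (the full test process is at http://www.cnblogs.com/aprilrain/p/6916359.html), I found that this SerDe throws this error for somewhat more complex nesting, for example map<string,array<string>>: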
CREATE TABLE s6 (
  store map<string,array<string>>
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS TEXTFILE;
load data local inpath '/home/work/s6.txt' overwrite into table s6;
select * from s6;
Contents of s6.txt:
{"store":{"fruit":["weight","8","type","apple"]}}
{"store":{"fruit":["weight","9","type","orange"]}}
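Before blaming the SerDe, it is worth ruling out a malformed data file. The sketch below is one way to do that outside Hive, assuming one JSON object per line: it checks that each line parses and that its "store" field has the shape map<string,array<string>>. The function names are hypothetical, and this only checks the data shape; it does not reproduce what the SerDe does internally.

```python
import json

def is_map_of_string_arrays(value) -> bool:
    """True if value has the shape map<string,array<string>>."""
    return isinstance(value, dict) and all(
        isinstance(key, str)
        and isinstance(items, list)
        and all(isinstance(item, str) for item in items)
        for key, items in value.items()
    )

def check_lines(lines):
    """Return (line_number, ok) for the "store" field of each non-empty line."""
    results = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            results.append((number, False))
            continue
        ok = isinstance(record, dict) and is_map_of_string_arrays(record.get("store"))
        results.append((number, ok))
    return results

# The two rows of s6.txt as given in this post
sample = [
    '{"store":{"fruit":["weight","8","type","apple"]}}',
    '{"store":{"fruit":["weight","9","type","orange"]}}',
]

print(check_lines(sample))  # [(1, True), (2, True)]
```

To check the real file instead of the inline sample, pass an open file handle, for example check_lines(open('/home/work/s6.txt')). Both sample rows pass, which suggests the data matches the declared column type.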
I reported an issue to the community: https://issues.apache.org/jira/browse/HIVE-16526
CREATE TABLE json_nested_test_openx (
  count string,
  usage string,
  pkg map<string,string>,
  languages array<string>,
  store map<string,array<map<string,string>>>)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
STORED AS TEXTFILE;
hive> select pkg['weight'],languages[0],store['fruit'][0]['type'] from json_nested_test_openx;
OK
Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating store['fruit'][0]['type']
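For reference, here is what that expression path resolves to on the raw JSON, as a rough sketch outside Hive. It assumes json_nested_test_openx holds the same record that appears in the JsonSerDe warning earlier in this post; the post does not state this explicitly. The helper name expected_row is hypothetical.

```python
import json

# Record copied from the JsonSerDe warning in the log
# (assumption: the openx table was loaded with the same data)
record = json.loads(
    '{"count":2,"usage":91273,"pkg":{"weight":8,"type":"apple"},'
    '"languages":["German","French","Italian"],'
    '"store":{"fruit":[{"weight":8,"type":"apple"},{"weight":9,"type":"pear"}]}}'
)

def expected_row(rec):
    """Mirror pkg['weight'], languages[0], store['fruit'][0]['type'] on the raw JSON."""
    return (
        rec["pkg"]["weight"],
        rec["languages"][0],
        rec["store"]["fruit"][0]["type"],
    )

print(expected_row(record))  # (8, 'German', 'apple')
```

Under that assumption the path exists in the data, so the "Error evaluating" failure would not come from a missing key. Note that pkg is declared as map<string,string>, so Hive would present the weight as a string rather than the integer shown here.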