Hadoop/Hive Troubleshooting Notes
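1. NameNode is in safe mode. Use the following command to leave safe mode:
hadoop dfsadmin -safemode leave
(On Hadoop 2.x and later this form is deprecated; the equivalent is: hdfs dfsadmin -safemode leave)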
2. Container [pid=22826,containerID=container_1526436506350_0003_01_000024] is running beyond virtual memory limits. Current usage: 208.0 MB of 1 GB physical memory used; 2.1 GB of 2.1 GB virtual memory used. Killing container.
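The container exceeded its virtual memory limit (2.1 GB of 2.1 GB used).
In mapred-site.xml, give map and reduce containers more memory than the 1 GB shown in the log: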
mapreduce.map.memory.mb=2048
mapreduce.reduce.memory.mb=2048
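The "2.1 GB" in the log comes from YARN's virtual-memory check. The NodeManager caps a container's virtual memory at its physical allocation multiplied by yarn.nodemanager.vmem-pmem-ratio, which defaults to 2.1, so a 1 GB container gets a 2.1 GB cap. The sketch below shows that arithmetic. It assumes the default ratio, and the function name is ours, not a Hadoop API:

```python
# Sketch: YARN's virtual-memory cap for a container, assuming the
# default yarn.nodemanager.vmem-pmem-ratio of 2.1 (pass a different
# ratio if your yarn-site.xml overrides it).
DEFAULT_VMEM_PMEM_RATIO = 2.1


def vmem_limit_mb(container_mb: int, ratio: float = DEFAULT_VMEM_PMEM_RATIO) -> float:
    """Return the virtual-memory limit in MB for a container of container_mb MB."""
    if container_mb <= 0:
        raise ValueError("container_mb must be positive")
    return container_mb * ratio


if __name__ == "__main__":
    for mb in (1024, 2048):
        # 1024 MB gives about 2150 MB (the "2.1 GB" in the log);
        # 2048 MB gives about 4300 MB.
        print(f"{mb} MB container -> {vmem_limit_mb(mb):.0f} MB virtual memory limit")
```

With mapreduce.map.memory.mb=2048 the cap roughly doubles to about 4.2 GB. That is why raising the container size makes this error go away.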
3. 0.0.0.0:10020 Connection Refused:
'java.io.IOException(java.net.ConnectException: Call From localhost/127.0.0.1 to 0.0.0.0:10020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused)'
Port 10020 is the RPC port of the MapReduce JobHistory Server, which is not started by default.
Configure mapred-site.xml:
mapreduce.jobhistory.address=127.0.0.1:10020
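Then run the following from Hadoop's sbin directory:
./mr-jobhistory-daemon.sh start historyserver
(On Hadoop 3.x this script is deprecated in favor of: mapred --daemon start historyserver)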
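Before rerunning the failed job, it can help to confirm that something is now listening on the configured address. Below is a minimal sketch using only the Python standard library. The 127.0.0.1:10020 default matches the mapreduce.jobhistory.address value above, and the helper name is hypothetical:

```python
# Sketch: check whether the JobHistory Server RPC port accepts TCP
# connections (127.0.0.1:10020 as configured in mapred-site.xml above).
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timeouts and unreachable hosts all end up here.
        return False


if __name__ == "__main__":
    if port_is_open("127.0.0.1", 10020):
        print("JobHistory Server is reachable")
    else:
        print("10020 refused: the history server is probably not running")
```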
4. Adjust the number of mappers Hive launches:
set mapred.max.split.size=2560000000;
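This makes each split, and therefore each mapper's share of the input, larger (2560000000 bytes, about 2.56 GB), so fewer mappers are launched. Adjusting the mapper count can help avoid some errors.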
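As a back-of-the-envelope check, the mapper count is roughly the total input size divided by the maximum split size. This is only an approximation. Hive's real split computation, through CombineHiveInputFormat or FileInputFormat, also depends on file boundaries, block size and the minimum split size. The function name below is hypothetical:

```python
# Rough estimate only: ignores file boundaries, block locality and
# min-split-size settings that the real input format also considers.


def estimate_mappers(total_input_bytes: int, max_split_bytes: int) -> int:
    """Approximate the mapper count as ceil(total input / max split size)."""
    if max_split_bytes <= 0:
        raise ValueError("max_split_bytes must be positive")
    if total_input_bytes <= 0:
        return 0
    # Integer ceiling division, avoiding float rounding on large sizes.
    return (total_input_bytes + max_split_bytes - 1) // max_split_bytes


if __name__ == "__main__":
    total = 100 * 1024 ** 3  # hypothetical 100 GiB of input
    for split in (256000000, 2560000000):
        print(f"max split {split} bytes -> ~{estimate_mappers(total, split)} mappers")
```

For 100 GiB of input this gives about 420 mappers at a 256 MB split size and about 42 at the 2.56 GB setting above.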
5. A Hive table created over existing data is empty
If the table is partitioned, its partitions must be added manually:
ALTER TABLE myData ADD PARTITION (`group`='group1', team='team1');
ALTER TABLE myData ADD PARTITION (`group`='group2', team='team2');
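With many partition directories, writing these statements by hand gets tedious. Below is a small sketch that generates one statement per partition spec. It wraps column names in backticks, because `group` is a reserved word in HiveQL, and it uses ADD IF NOT EXISTS so that reruns are harmless. The helper names are hypothetical, and partition values are always emitted as string literals:

```python
# Sketch: generate ALTER TABLE ... ADD PARTITION statements for
# partitions whose data already exists under the table location.
from typing import Dict, List


def quote_value(value: str) -> str:
    """Quote a partition value as a HiveQL string literal (backslash escaping)."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def add_partition_sql(table: str, specs: List[Dict[str, str]]) -> List[str]:
    """Build one ALTER TABLE ... ADD IF NOT EXISTS PARTITION statement per spec."""
    statements = []
    for spec in specs:
        # Backticks keep reserved words such as `group` usable as column names.
        cols = ", ".join(f"`{col}`={quote_value(val)}" for col, val in spec.items())
        statements.append(f"ALTER TABLE {table} ADD IF NOT EXISTS PARTITION ({cols});")
    return statements


if __name__ == "__main__":
    specs = [{"group": "group1", "team": "team1"}, {"group": "group2", "team": "team2"}]
    for stmt in add_partition_sql("myData", specs):
        print(stmt)
```

If the directories follow Hive's col=value naming layout, MSCK REPAIR TABLE myData can also discover them in one step.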
6. java.io.IOException(Could not find status of job
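Some articles attribute this to illegal characters in the job name and suggest shortening it (10 or smaller; the default hive.jobname.length is 50):
set hive.jobname.length=10;
In my own testing, however, restarting the cluster is what fixed it.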
7. Start HiveServer2 on a specific port
hiveserver2 --hiveconf hive.server2.thrift.port=20001
8. Hive compilation lock
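By default Hive compiles only one HiveQL query at a time, so if a UDF itself executes HiveQL, the query locks up waiting for compilation. Fix:
add hive.driver.parallel.compilation=true to hive-site.xml,
or add --hiveconf hive.driver.parallel.compilation=true to the startup command line.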
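Hadoop and Hive configuration files use the same `<property><name>…</name><value>…</value></property>` layout. The sketch below renders that layout for the hive-site.xml change above. It produces a complete `<configuration>` element, so when editing an existing hive-site.xml, copy only the `<property>` element into the existing `<configuration>` block. The function name is hypothetical, and HiveServer2 typically needs a restart to pick up the change:

```python
# Sketch: render Hadoop/Hive-style <property> entries, e.g. to enable
# hive.driver.parallel.compilation in hive-site.xml.
import xml.etree.ElementTree as ET
from typing import Dict


def config_xml(props: Dict[str, str]) -> str:
    """Return a <configuration> document with one <property> per entry."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(config_xml({"hive.driver.parallel.compilation": "true"}))
```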
9. Hive lacks permission to run MapReduce jobs
Specify the user name on the beeline command line:
beeline -u jdbc:hive2://localhost:10000/default -n username
You may also need to configure Hadoop permissions: https://blog.csdn.net/rav009/article/details/80271656
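When connecting from scripts, it can be handy to build this beeline command programmatically. This also covers connecting to a HiveServer2 started on a non-default port, such as 20001 in item 7. Below is a minimal sketch. The host, port, database and user are placeholders, the helper name is hypothetical, and 10000 is HiveServer2's default port:

```python
# Sketch: build the argv for "beeline -u jdbc:hive2://host:port/db -n user".
import shlex
from typing import List


def beeline_args(user: str, host: str = "localhost", port: int = 10000,
                 database: str = "default") -> List[str]:
    """Return the beeline argument list for connecting as the given user."""
    url = f"jdbc:hive2://{host}:{port}/{database}"
    return ["beeline", "-u", url, "-n", user]


if __name__ == "__main__":
    args = beeline_args("username", port=20001)
    # Print a shell-safe version; to run it, pass args to subprocess.run(args).
    print(" ".join(shlex.quote(a) for a in args))
```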
10. Bad status for request TFetchResultsReq
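Hive cannot use LOAD to import a CSV file directly into a table with a special storage format, such as one declared STORED AS PARQUET. If you do, the LOAD command succeeds, because LOAD only moves the file into the table's directory without converting it. Querying the table then fails with the error above.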
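A common workaround is to LOAD the CSV into a plain-text staging table and then let Hive rewrite the rows into the Parquet table with INSERT ... SELECT. The sketch below only generates the HiveQL, which you would then run with beeline or the Hive CLI. The table names, columns and comma delimiter are hypothetical. It assumes the Parquet table already exists, is unpartitioned and has the same column order, and that the CSV is already on HDFS (use LOAD DATA LOCAL INPATH for a local file):

```python
# Sketch: HiveQL for the "text staging table -> Parquet table" workaround.
from typing import List, Tuple


def csv_to_parquet_sql(csv_path: str, staging: str, target: str,
                       columns: List[Tuple[str, str]]) -> List[str]:
    col_defs = ", ".join(f"`{name}` {hive_type}" for name, hive_type in columns)
    col_names = ", ".join(f"`{name}`" for name, _ in columns)
    return [
        # Plain-text table whose on-disk format matches the CSV file.
        f"CREATE TABLE IF NOT EXISTS {staging} ({col_defs}) "
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;",
        # LOAD only moves the file, so it must go into a text table.
        f"LOAD DATA INPATH '{csv_path}' OVERWRITE INTO TABLE {staging};",
        # INSERT ... SELECT makes Hive rewrite the rows in the target's format.
        f"INSERT OVERWRITE TABLE {target} SELECT {col_names} FROM {staging};",
    ]


if __name__ == "__main__":
    for stmt in csv_to_parquet_sql("/tmp/data.csv", "my_csv_staging", "my_parquet",
                                   [("id", "INT"), ("name", "STRING")]):
        print(stmt)
```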