Hadoop-Impala学习笔记之管理
配置参数管理
待补充。。。
资源分配管理(Admission Control)
Impala有资源池的概念,允许某些查询在特定的资源池执行,不过在白天不跑批/晚上不跑adhoc的DSS系统中,该机制并不常用(oracle、cgroup性质都类似),有兴趣可以参考《Impala Guide 中的Admission Control and Query Queuing》。
安全管理(跟一般的RDBMS差不多,只不过认证和授权是外部的,比较复杂)
Impala认证基于Kerberos框架《Enabling Kerberos Authentication for Impala》,Impala授权框架基于Sentry开源项目《Enabling Sentry Authorization for Impala》,从Impala 1.1.0开始加入,审计特性从1.1.1开始支持。
kerberos安装:https://www.jianshu.com/p/fc2d2dbd510b
kerberos介绍:https://www.cnblogs.com/ulysses-you/p/8107862.html
CDH集成Kerberos配置:https://blog.csdn.net/qxf1374268/article/details/79321951
如何在CDH5.12集群中启用Kerberos认证:https://blog.csdn.net/cy309173854/article/details/79288491
优化
启用short-circuit读
该特性使得Impala可以从文件系统直接读取本地数据,避免了和DataNodes通信的必要性,提升性能,它要求使用libhadoop.so(hadoop原生库)。tarball安装中不包含此库,.rpm, .deb, parcel中包含。
该特性可以通过修改hdfs-site.xml或Cloudera Manager修改。
启用块位置跟踪
该特性可以使得Impala更好地利用底层的磁盘,如果Impala不是由Cloudera Manager管理,则需要启用块位置跟踪特性。该特性同样可以通过hdfs-site.xml修改。
JDBC访问
JDBC 2.0及之后的版本可通过21050访问Impala,可通过impalad启动参数--hs2_port修改默认端口 。
在Impala 2.0+,可通过Cloudera JDBC Connector和Hive 0.13(0.12之前的版本无法访问2.0) JDBC访问。
连接串:jdbc:impala://Host:Port[/Schema];Property1=Value;Property2=Value;...
jdbc:hive2://myhost.example.com:21050/;auth=noSasl
jdbc:hive2://myhost.example.com:21050/;principal=impala/myhost.example.com@H2.EXAMPLE.COM -- Kerberos认证的Impala
当前版本的驱动在对Kudu表执行DML操作时,如果发生一些错误如唯一性约束违反,不会报错。如果有此要求,可以使用Kudu Java API而不是JDBC。
impala jdbc没有发布在共有的maven仓库中,需要自己从https://www.cloudera.com/downloads/connectors/impala/jdbc/2-5-43.html下载,并维护到本地maven仓库,https://github.com/onefoursix/Cloudera-Impala-JDBC-Example包含了一个例子,它使用就和普通的JDBC一样的,没什么特别的。
Impala支持的HDFS文件格式
其中Snappy在压缩率和解压效率之间取得平衡,是推荐的做法。Gzip可以得到最好的压缩率。如果数据几乎一直驻留内存,则不用考虑压缩,因为节省不了I/O。
默认情况下,Impala创建的就是文本文件格式的表。
Parquet是列式存储的二进制文件格式,适合于访问少数列的场景。要创建Parquet格式的表,可以在create table中声明STORED AS PARQUET;子句,如下:
[impala-host:21000] > create table parquet_table_name (x INT, y STRING) STORED AS PARQUET;
还可以直接从Parquet推断出列定义:
CREATE EXTERNAL TABLE ingest_existing_files LIKE PARQUET '/user/etl/destination/datafile1.dat' STORED AS PARQUET LOCATION '/user/etl/destination';
Impala使用的端口列表
Component | Service | Port | Access Requirement | Comment |
---|---|---|---|---|
Impala Daemon |
Impala Daemon Frontend Port |
21000 |
External |
Used to transmit commands and receive results by |
Impala Daemon |
Impala Daemon Frontend Port |
21050 |
External |
Used to transmit commands and receive results by applications, such as Business Intelligence tools, using JDBC, the Beeswax query editor in Hue, and some ODBC drivers. |
Impala Daemon |
Impala Daemon Backend Port |
22000 |
Internal |
Internal use only. Impala daemons use this port for Thrift based communication with each other. |
Impala Daemon |
StateStoreSubscriber Service Port |
23000 |
Internal |
Internal use only. Impala daemons listen on this port for updates from the statestore daemon. |
Catalog Daemon |
StateStoreSubscriber Service Port |
23020 |
Internal |
Internal use only. The catalog daemon listens on this port for updates from the statestore daemon. |
Impala Daemon |
Impala Daemon HTTP Server Port |
25000 |
External |
Impala web interface for administrators to monitor and troubleshoot. |
Impala StateStore Daemon |
StateStore HTTP Server Port |
25010 |
External |
StateStore web interface for administrators to monitor and troubleshoot. |
Impala Catalog Daemon |
Catalog HTTP Server Port |
25020 |
External |
Catalog service web interface for administrators to monitor and troubleshoot. New in Impala 1.2 and higher. |
Impala StateStore Daemon |
StateStore Service Port |
24000 |
Internal |
Internal use only. The statestore daemon listens on this port for registration/unregistration requests. |
Impala Catalog Daemon |
Catalog Service Port |
26000 |
Internal |
Internal use only. The catalog service uses this port to communicate with the Impala daemons. New in Impala 1.2 and higher. |
Impala Daemon |
KRPC Port |
27000 |
Internal |
Internal use only. Impala daemons use this port for KRPC based communication with each other. |
Impala Daemon |
Llama Callback Port |
28000 |
Internal |
Internal use only. Impala daemons use to communicate with Llama. New in Impala 1.3and higher. |
Impala Llama ApplicationMaster |
Llama Thrift Admin Port |
15002 |
Internal |
Internal use only. New in Impala 1.3 and higher. |
Impala Llama ApplicationMaster |
Llama Thrift Port |
15000 |
Internal |
Internal use only. New in Impala 1.3 and higher. |
Impala Llama ApplicationMaster |
Llama HTTP Port |
15001 |
External |
Llama service web interface for administrators to monitor and troubleshoot. New in Impala 1.3 and higher. |
【推荐】国内首个AI IDE,深度理解中文开发场景,立即下载体验Trae
【推荐】编程新体验,更懂你的AI,立即体验豆包MarsCode编程助手
【推荐】抖音旗下AI助手豆包,你的智能百科全书,全免费不限次数
【推荐】轻量又高性能的 SSH 工具 IShell:AI 加持,快人一步
· 记一次.NET内存居高不下排查解决与启示
· 探究高空视频全景AR技术的实现原理
· 理解Rust引用及其生命周期标识(上)
· 浏览器原生「磁吸」效果!Anchor Positioning 锚点定位神器解析
· 没有源码,如何修改代码逻辑?
· 分享4款.NET开源、免费、实用的商城系统
· 全程不用写代码,我用AI程序员写了一个飞机大战
· MongoDB 8.0这个新功能碉堡了,比商业数据库还牛
· 白话解读 Dapr 1.15:你的「微服务管家」又秀新绝活了
· 上周热点回顾(2.24-3.2)
2017-04-07 windows 7/10 安装u盘制作