Building a Highly Available ClickHouse Cluster
1. Cluster Overview
ZooKeeper should be deployed on separate machines, so that a failed ClickHouse node cannot take ZooKeeper down with it.
0. HA principle: ZooKeeper + ReplicatedMergeTree (replicated tables) + Distributed (distributed tables)
1. Prerequisites: disable the firewall (or open the required ports) on all nodes; set up mutual trust (passwordless SSH) between all nodes; keep the hosts file and hostnames consistent and correct across the whole cluster, because ZooKeeper returns hostnames; replicated tables will fail if these are misconfigured or missing.
ClickHouse test nodes (4): 10.0.0.236 cdhserver1 (clickhouse1), 10.0.0.237 cdhserver2 (clickhouse2), 10.0.0.238 cdhserver3 (clickhouse3), 10.0.0.239 cdhserver4 (clickhouse4)
ZooKeeper test nodes (3): 10.0.0.237 cdhserver2 (zookeeper), 10.0.0.238 cdhserver3 (zookeeper), 10.0.0.239 cdhserver4 (zookeeper)
Deployment plan: one ClickHouse instance per node, with cdhserver1 acting as the entry (main) node and cdhserver2-4 each carrying one shard:
cdhserver1: instance 1, ports: tcp 9006, http 8123, sync port 9006, role: main node
cdhserver2: instance 2, ports: tcp 9006, http 8123, sync port 9006, role: shard 1, replica 1
cdhserver3: instance 3, ports: tcp 9006, http 8123, sync port 9006, role: shard 2, replica 1
cdhserver4: instance 4, ports: tcp 9006, http 8123, sync port 9006, role: shard 3, replica 1
2. Environment Preparation
1) Host specs (adjust to your environment)
10.0.0.236 cdhserver1 centos 7.1 32G 200G
10.0.0.237 cdhserver2 centos 7.1 32G 200G
10.0.0.238 cdhserver3 centos 7.1 32G 200G
10.0.0.239 cdhserver4 centos 7.1 32G 200G
2) hosts file and hostnames: configure /etc/hosts and the hostnames identically and correctly on every node
3) Disable the firewall or open the required ports on all nodes
# 1. Disable the firewall
# On CentOS 7: systemctl stop firewalld && systemctl disable firewalld
# The commands below apply to CentOS 6 / iptables-services:
service iptables stop
chkconfig iptables off
chkconfig ip6tables off
# 2. Disable SELinux: set SELINUX=disabled in /etc/selinux/config, then reboot
[root@cdhserver1 ~]# vim /etc/selinux/config
SELINUX=disabled
4) Raise the open-file and process limits on all nodes
Append the following to both /etc/security/limits.conf and /etc/security/limits.d/90-nproc.conf:
[root@cdhserver1 software]# vim /etc/security/limits.conf
Append at the end of the file:
* soft nofile 65536
* hard nofile 65536
* soft nproc 131072
* hard nproc 131072
[root@cdhserver1 software]# vim /etc/security/limits.d/90-nproc.conf
Append at the end of the file:
* soft nofile 65536
* hard nofile 65536
* soft nproc 131072
* hard nproc 131072
After rebooting (or logging in again), check the result with ulimit -n or ulimit -a:
[root@cdhserver1 ~]# ulimit -n
65536
5) Set up mutual trust (passwordless SSH) between all nodes
1. Generate an RSA key pair on each node:
ssh-keygen -q -t rsa -N "" -f ~/.ssh/id_rsa
2. Collect every node's public key into one authorized_keys file. On cdhserver1, run the following, one line at a time:
ssh cdhserver1 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
ssh cdhserver2 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
ssh cdhserver3 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
ssh cdhserver4 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
3. Set the permissions on the authorized_keys file, on cdhserver1:
chmod 600 ~/.ssh/authorized_keys
4. Distribute the authorized_keys file to all servers, one line at a time:
scp ~/.ssh/authorized_keys cdhserver1:~/.ssh/
scp ~/.ssh/authorized_keys cdhserver2:~/.ssh/
scp ~/.ssh/authorized_keys cdhserver3:~/.ssh/
scp ~/.ssh/authorized_keys cdhserver4:~/.ssh/
6) Check that the CPU supports SSE 4.2
[root@cdhserver1 ~]# grep -q sse4_2 /proc/cpuinfo && echo "SSE 4.2 supported" || echo "SSE 4.2 not supported"
SSE 4.2 supported
3. Distributed Cluster Installation
1. Installation
1) RPM installation
Notes:
1) Where to get the packages:
https://packagecloud.io/Altinity/clickhouse
https://repo.yandex.ru/clickhouse/rpm/
2) Install ClickHouse on every node (CentOS 7.1 in this example)
3) Run the following installation and configuration on each server
Install ClickHouse:
Install libicu first:
mkdir -p /usr/local/icu/
cd /usr/local/icu/    # place libicu-4.2.1-14.el6.x86_64.rpm in this directory first
rpm -ivh libicu-4.2.1-14.el6.x86_64.rpm
Install the ClickHouse packages:
rpm -ivh clickhouse-server-common-18.14.12-1.el6.x86_64.rpm
rpm -ivh clickhouse-compressor-1.1.54336-3.el6.x86_64.rpm
rpm -ivh clickhouse-common-static-18.14.12-1.el6.x86_64.rpm
rpm -ivh clickhouse-server-18.14.12-1.el6.x86_64.rpm
rpm -ivh clickhouse-client-18.14.12-1.el6.x86_64.rpm
rpm -ivh clickhouse-test-18.14.12-1.el6.x86_64.rpm
rpm -ivh clickhouse-debuginfo-18.14.12-1.el6.x86_64.rpm
or: rpm -ivh clickhouse-*.el6.x86_64.rpm
2) YUM installation
# CentOS / RedHat
sudo yum install yum-utils
sudo rpm --import https://repo.yandex.ru/clickhouse/CLICKHOUSE-KEY.GPG
sudo yum-config-manager --add-repo https://repo.yandex.ru/clickhouse/rpm/stable/x86_64
sudo yum install clickhouse-server clickhouse-client
# Install the yum repo via script
curl -s https://packagecloud.io/install/repositories/altinity/clickhouse/script.rpm.sh | sudo bash
# Install server and client via yum
sudo yum install -y clickhouse-server clickhouse-client
# Verify the installation
sudo yum list installed 'clickhouse*'
2. Upgrading
Upgrading ClickHouse in place is also straightforward: download the new RPM packages and install them with the command below (the ClickHouse service does not even have to be stopped). The existing config.xml and other configuration files are preserved during the upgrade. Other upgrade approaches are described in the official documentation.
# Check the current version
clickhouse-server --version
# Upgrade. As the output shows, new config files are written with a .rpmnew suffix and the old ones are kept.
[root@cdh2 software]# rpm -Uvh clickhouse-*-20.5.4.40-1.el7.x86_64.rpm
Preparing...                          ################################# [100%]
Updating / installing...
   1:clickhouse-server-common-20.5.4.4warning: /etc/clickhouse-server/config.xml created as /etc/clickhouse-server/config.xml.rpmnew
################################# [ 13%]
warning: /etc/clickhouse-server/users.xml created as /etc/clickhouse-server/users.xml.rpmnew
   2:clickhouse-common-static-20.5.4.4################################# [ 25%]
   3:clickhouse-server-20.5.4.40-1.el7################################# [ 38%]
Create user clickhouse.clickhouse with datadir /var/lib/clickhouse
   4:clickhouse-client-20.5.4.40-1.el7################################# [ 50%]
Create user clickhouse.clickhouse with datadir /var/lib/clickhouse
Cleaning up / removing...
   5:clickhouse-client-19.16.3.6-1.el7################################# [ 63%]
   6:clickhouse-server-19.16.3.6-1.el7################################# [ 75%]
   7:clickhouse-server-common-19.16.3.################################# [ 88%]
   8:clickhouse-common-static-19.16.3.################################# [100%]
3. Directory layout
/etc/clickhouse-server: server configuration directory, holding the global config config.xml and the user config users.xml.
/var/lib/clickhouse: default data directory. In production, move it to a mount point with plenty of space by changing the <path>, <tmp_path> and <user_files_path> tags in /etc/clickhouse-server/config.xml.
/var/log/clickhouse-server: default log directory, likewise adjustable via the <log> and <errorlog> tags in /etc/clickhouse-server/config.xml.
/etc/cron.d/clickhouse-server: a cron job for the ClickHouse server that restarts the process if it terminates abnormally.
~/.clickhouse-client-history: history of the SQL statements executed by the client.
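On recent server versions (roughly 19.15 and later; an assumption, check your release) the effective data paths can also be read from SQL instead of from config.xml:
-- lists each configured disk with its path and capacity
SELECT name, path, formatReadableSize(free_space) AS free, formatReadableSize(total_space) AS total
FROM system.disks;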
4. Configuration changes
1) Edit config.xml on all four nodes
Because the cluster nodes need to reach each other's services, the ClickHouse service IP and port must be exposed: in /etc/clickhouse-server/config.xml on all four machines, uncomment the <listen_host> tag under <yandex> (around lines 69-70). The full configuration:
<?xml version="1.0"?> <yandex> <logger> <level>trace</level> <log>/var/log/clickhouse-server/clickhouse-server.log</log> <errorlog>/var/log/clickhouse-server/clickhouse-server.err.log</errorlog> <size>1000M</size> <count>10</count> </logger> <http_port>8123</http_port> <tcp_port>9006</tcp_port> <openSSL> <server> <certificateFile>/etc/clickhouse-server/server.crt</certificateFile> <privateKeyFile>/etc/clickhouse-server/server.key</privateKeyFile> <dhParamsFile>/etc/clickhouse-server/dhparam.pem</dhParamsFile> <verificationMode>none</verificationMode> <loadDefaultCAFile>true</loadDefaultCAFile> <cacheSessions>true</cacheSessions> <disableProtocols>sslv2,sslv3</disableProtocols> <preferServerCiphers>true</preferServerCiphers> </server> <client> <loadDefaultCAFile>true</loadDefaultCAFile> <cacheSessions>true</cacheSessions> <disableProtocols>sslv2,sslv3</disableProtocols> <preferServerCiphers>true</preferServerCiphers> <invalidCertificateHandler> <name>RejectCertificateHandler</name> </invalidCertificateHandler> </client> </openSSL> <interserver_http_port>9009</interserver_http_port> <interserver_http_host>hadoop4</interserver_http_host> <listen_host>0.0.0.0</listen_host> <max_connections>4096</max_connections> <keep_alive_timeout>3</keep_alive_timeout> <max_concurrent_queries>100</max_concurrent_queries> <uncompressed_cache_size>8589934592</uncompressed_cache_size> <mark_cache_size>5368709120</mark_cache_size> <path>/data/clickhouse/</path> <tmp_path>/data/clickhouse/tmp/</tmp_path> <user_files_path>/data/clickhouse/user_files/</user_files_path> <users_config>users.xml</users_config> <default_profile>default</default_profile> <default_database>default</default_database> <mlock_executable>false</mlock_executable> <!-- 设置扩展配置文件的路径,大概在第229行附近--> <include_from>/etc/clickhouse-server/metrika.xml</include_from> <remote_servers incl="clickhouse_remote_servers" > </remote_servers> <zookeeper incl="zookeeper-servers" optional="true" /> <macros incl="macros" optional="true" /> <builtin_dictionaries_reload_interval>3600</builtin_dictionaries_reload_interval> <max_session_timeout>3600</max_session_timeout> <default_session_timeout>60</default_session_timeout> <query_log> <database>system</database> <table>query_log</table> <partition_by>toYYYYMM(event_date)</partition_by> <flush_interval_milliseconds>7500</flush_interval_milliseconds> </query_log> <trace_log> <database>system</database> <table>trace_log</table> <partition_by>toYYYYMM(event_date)</partition_by> <flush_interval_milliseconds>7500</flush_interval_milliseconds> </trace_log> <query_thread_log> <database>system</database> <table>query_thread_log</table> <partition_by>toYYYYMM(event_date)</partition_by> <flush_interval_milliseconds>7500</flush_interval_milliseconds> </query_thread_log> <dictionaries_config>*_dictionary.xml</dictionaries_config> <compression incl="clickhouse_compression"> </compression> <distributed_ddl> <path>/clickhouse/task_queue/ddl</path> </distributed_ddl> <graphite_rollup_example> <pattern> <regexp>click_cost</regexp> <function>any</function> <retention> <age>0</age> <precision>3600</precision> </retention> <retention> <age>86400</age> <precision>60</precision> </retention> </pattern> <default> <function>max</function> <retention> <age>0</age> <precision>60</precision> </retention> <retention> <age>3600</age> <precision>300</precision> </retention> <retention> <age>86400</age> <precision>3600</precision> </retention> </default> </graphite_rollup_example> 
<format_schema_path>/var/lib/clickhouse/format_schemas/</format_schema_path> </yandex>
Additional notes:
[root@cdhserver1 ~]# vim /etc/clickhouse-server/config.xml
<http_port>8123</http_port>
<tcp_port>9006</tcp_port>
<listen_host>::</listen_host>
<!-- <listen_host>::1</listen_host> -->
<!-- <listen_host>127.0.0.1</listen_host> -->
<!-- <max_table_size_to_drop>0</max_table_size_to_drop> -->
<!-- <max_partition_size_to_drop>0</max_partition_size_to_drop> -->
<!-- Set the timezone to UTC+8, around line 144 -->
<timezone>Asia/Shanghai</timezone>
<!-- Set the path of the substitutions file, around line 226; add the following -->
<!-- If element has 'incl' attribute, then for it's value will be used corresponding substitution from another file.
     By default, path to file with substitutions is /etc/metrika.xml. It could be changed in config in 'include_from' element.
     Values for substitutions are specified in /yandex/name_of_substitution elements in that file.
-->
<include_from>/etc/clickhouse-server/metrika.xml</include_from>
2) Edit users.xml on all four nodes to add a user
Set up user authentication. A password can be configured in plaintext or as a hash (the sha256sum value); the official recommendation is the hashed form.
[root@cdhserver1 ~]# vim /etc/clickhouse-server/users.xml
In /etc/clickhouse-server/users.xml, add the following above the line <!-- Example of user with readonly access. -->:
<ck>
    <password_sha256_hex>8d969eef6ecad3c29a3a629280e686cf0c3f5d5a86aff3ca12020c923adc6c92</password_sha256_hex>
    <networks incl="networks" replace="replace">
        <ip>::/0</ip>
    </networks>
    <profile>default</profile>
    <quota>default</quota>
</ck>
To generate the SHA-256 hash, run the command below (first line). It prints two lines: the plaintext password (second line) and its hash (third line). Put the hash into the config file; use the plaintext password when logging in from a client.
[root@cdhserver1 clickhouse-server]# PASSWORD=$(base64 < /dev/urandom | head -c8); echo "$PASSWORD"; echo -n "$PASSWORD" | sha256sum | tr -d '-'
zjisCtGb
2451b6a1bd72c59fd96972ae8dd1c4597257679033794d4c65e6a4dd49997dd6
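Once the server has been restarted with the new users.xml, a quick way to confirm the account works is to connect as that user (for example clickhouse-client -h 127.0.0.1 --port 9006 -u ck --password followed by the plaintext from the second line) and ask who you are:
-- returns 'ck' if the users.xml entry took effect
SELECT currentUser();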
3) On all four nodes, create a metrika.xml file under /etc/clickhouse-server with the cluster sharding configuration
[root@cdhserver1 ~]# vim /etc/clickhouse-server/metrika.xml
Add the following:
<?xml version="1.0"?> <yandex> <!--ck集群节点--> <clickhouse_remote_servers>
<!-- 定义的集群名 idc_cluster--> <idc_cluster> <!--分片1--> <shard>
<!-- 分片权重值, 默认为 1,官方建议这个值不要设置的太大,分一个分片的权重值越大,被写入数据的就会越多 --> <weight>1</weight> <replica> <host>cdhserver2</host> <port>9006</port> <user>ck</user> <password>123456</password> <compression>true</compression> </replica> </shard> <!--分片2--> <shard> <weight>1</weight> <replica> <host>cdhserver3</host> <port>9006</port> <user>ck</user> <password>123456</password> <compression>true</compression> </replica> </shard> <!--分片3--> <shard> <weight>1</weight> <replica> <host>cdhserver4</host> <port>9006</port> <user>ck</user> <password>123456</password> <compression>true</compression> </replica> </shard> </idc_cluster> </clickhouse_remote_servers> <!--zookeeper相关配置--> <zookeeper-servers> <node index="1"> <host>cdhserver2</host> <port>2181</port> </node> <node index="2"> <host>cdhserver3</host> <port>2181</port> </node> <node index="3"> <host>cdhserver4</host> <port>2181</port> </node> </zookeeper-servers> <!--分片和副本配置--> <macros> <replica>cdhserver1</replica> </macros> <!--开启远程访问--> <networks> <ip>::/0</ip> </networks> <!--压缩相关配置--> <clickhouse_compression> <case> <min_part_size>10000000000</min_part_size> <min_part_size_ratio>0.01</min_part_size_ratio> <method>lz4</method> <!--压缩算法lz4压缩比zstd快, 更占磁盘--> </case> </clickhouse_compression> </yandex>
Note: the parts highlighted above (the <macros> block) must be changed per node:
cdhserver2 (shard 1, replica 1):
<macros>
    <layer>01</layer>
    <shard>01</shard>
    <replica>cdhserver2</replica>
</macros>
cdhserver3 (shard 2, replica 1):
<macros>
    <layer>01</layer>
    <shard>02</shard>
    <replica>cdhserver3</replica>
</macros>
cdhserver4 (shard 3, replica 1):
<macros>
    <layer>01</layer>
    <shard>03</shard>
    <replica>cdhserver4</replica>
</macros>
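These macros are what let one CREATE statement run unchanged on every node: in a ReplicatedMergeTree path, {layer}, {shard} and {replica} expand to each node's own values. A minimal sketch, assuming the idc database and a hypothetical table events_local (not part of the original setup):
CREATE TABLE idc.events_local ON CLUSTER idc_cluster
(
    event_date Date,
    user_id UInt64
)
-- the ZooKeeper path must be unique per shard, the replica name unique per replica
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{layer}-{shard}/events_local', '{replica}')
PARTITION BY toYYYYMM(event_date)
ORDER BY (event_date, user_id);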
Supplement: metrika.xml for a multi-shard, multi-replica ClickHouse cluster (example: 3 shards, 2 replicas each)
ck1: instance 1, ports: tcp 9006, http 8123, sync port 9006, role: shard 1, replica 1
ck2: instance 2, ports: tcp 9006, http 8123, sync port 9006, role: shard 2, replica 1
ck3: instance 3, ports: tcp 9006, http 8123, sync port 9006, role: shard 3, replica 1
ck4: instance 4, ports: tcp 9006, http 8123, sync port 9006, role: shard 1, replica 2 (replica of ck1)
ck5: instance 5, ports: tcp 9006, http 8123, sync port 9006, role: shard 2, replica 2 (replica of ck2)
ck6: instance 6, ports: tcp 9006, http 8123, sync port 9006, role: shard 3, replica 2 (replica of ck3)
[root@ck1~]# vim /etc/clickhouse-server/metrika.xml
Add the following:
<!-- All instances use this same cluster configuration; only <macros> differs per instance -->
<yandex>
    <!-- Cluster configuration: clickhouse_remote_servers is identical on every instance -->
    <clickhouse_remote_servers>
        <!-- The cluster name defined here: idc_cluster -->
        <idc_cluster>
            <!-- Data shard 1 -->
            <shard>
                <!-- Within this shard, the distributed table writes each block to just one suitable replica. With ReplicatedMergeTree local tables, the engine itself replicates the data between replicas -->
                <internal_replication>true</internal_replication>
                <replica>
                    <host>ck1</host>
                    <port>9006</port>
                    <user>ck</user>
                    <password>123456</password>
                </replica>
                <replica>
                    <host>ck4</host>
                    <port>9006</port>
                    <user>ck</user>
                    <password>123456</password>
                </replica>
            </shard>
            <!-- Data shard 2 -->
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>ck2</host>
                    <port>9006</port>
                    <user>ck</user>
                    <password>123456</password>
                </replica>
                <replica>
                    <host>ck5</host>
                    <port>9006</port>
                    <user>ck</user>
                    <password>123456</password>
                </replica>
            </shard>
            <!-- Data shard 3 -->
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>ck3</host>
                    <port>9006</port>
                    <user>ck</user>
                    <password>123456</password>
                </replica>
                <replica>
                    <host>ck6</host>
                    <port>9006</port>
                    <user>ck</user>
                    <password>123456</password>
                </replica>
            </shard>
        </idc_cluster>
    </clickhouse_remote_servers>
    <!-- ZooKeeper: zookeeper-servers is identical on every instance -->
    <zookeeper-servers>
        <node index="1">
            <host>192.168.10.66</host>
            <port>2181</port>
        </node>
        <node index="2">
            <host>192.168.10.57</host>
            <port>2181</port>
        </node>
        <node index="3">
            <host>192.168.10.17</host>
            <port>2181</port>
        </node>
    </zookeeper-servers>
    <!-- macros: shard/replica identity, different on every instance. This is shard 1, replica 1: -->
    <macros>
        <layer>01</layer>
        <shard>01</shard>
        <replica>cluster01-01</replica>
    </macros>
    <!-- Allow remote access -->
    <networks>
        <ip>::/0</ip>
    </networks>
    <!-- Data compression -->
    <clickhouse_compression>
        <case>
            <min_part_size>10000000000</min_part_size>
            <min_part_size_ratio>0.01</min_part_size_ratio>
            <method>lz4</method> <!-- lz4 compresses faster than zstd but takes more disk -->
        </case>
    </clickhouse_compression>
</yandex>
Note: as before, the highlighted parts (the <macros> block) must be changed per instance:
ck2 (shard 2, replica 1):
<macros>
    <layer>01</layer>
    <shard>02</shard>
    <replica>cluster02-01</replica>
</macros>
ck3 (shard 3, replica 1):
<macros>
    <layer>01</layer>
    <shard>03</shard>
    <replica>cluster03-01</replica>
</macros>
ck4 (shard 1, replica 2):
<macros>
    <layer>01</layer>
    <shard>01</shard>
    <replica>cluster01-02</replica>
</macros>
ck5 (shard 2, replica 2):
<macros>
    <layer>01</layer>
    <shard>02</shard>
    <replica>cluster02-02</replica>
</macros>
ck6 (shard 3, replica 2):
<macros>
    <layer>01</layer>
    <shard>03</shard>
    <replica>cluster03-02</replica>
</macros>
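After restarting each instance, you can check that its macros resolved as intended; every node should report its own layer/shard/replica values:
-- one row per macro defined on this server
SELECT * FROM system.macros;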
5. Starting and stopping the service
The ClickHouse server is started and stopped as follows:
# 1 Start (logs can be inspected under /var/log/clickhouse-server/)
# sudo /etc/init.d/clickhouse-server start
systemctl start clickhouse-server
# 2 Status
systemctl status clickhouse-server
# 3 Restart
systemctl restart clickhouse-server
# 4 Stop
systemctl stop clickhouse-server
Foreground start:
[root@cdhserver1 software]# clickhouse-server --config-file=/etc/clickhouse-server/config.xml
Background start:
[root@cdhserver1 software]# nohup clickhouse-server --config-file=/etc/clickhouse-server/config.xml >/dev/null 2>&1 &
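Whichever way the server is started, a quick liveness check from clickhouse-client:
-- seconds since the server started; fails outright if the server is not up
SELECT uptime();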
4. Single-Node Multi-Instance ClickHouse
1) Copy /etc/clickhouse-server/config.xml to a new name
[root@ck1 clickhouse-server]# cp /etc/clickhouse-server/config.xml /etc/clickhouse-server/config9002.xml
2) Edit /etc/clickhouse-server/config9002.xml and change the following values so the two services do not collide
Original values in config.xml:
<log>/var/log/clickhouse-server/clickhouse-server.log</log>
<errorlog>/var/log/clickhouse-server/clickhouse-server.err.log</errorlog>
<http_port>8123</http_port>
<tcp_port>9000</tcp_port>
<mysql_port>9004</mysql_port>
<interserver_http_port>9009</interserver_http_port>
<path>/data/clickhouse/</path>
<tmp_path>/data/clickhouse/tmp/</tmp_path>
<user_files_path>/data/clickhouse/user_files/</user_files_path>
<access_control_path>/data/clickhouse/access/</access_control_path>
<include_from>/etc/clickhouse-server/metrika.xml</include_from>  <!-- cluster config file -->

Adjusted values in config9002.xml:
<log>/var/log/clickhouse-server/clickhouse-server-9002.log</log>
<errorlog>/var/log/clickhouse-server/clickhouse-server-9002.err.log</errorlog>
<http_port>8124</http_port>
<tcp_port>9002</tcp_port>
<mysql_port>9005</mysql_port>
<interserver_http_port>9010</interserver_http_port>
<path>/data/clickhouse9002/</path>
<tmp_path>/data/clickhouse9002/tmp/</tmp_path>
<user_files_path>/data/clickhouse9002/user_files/</user_files_path>
<access_control_path>/data/clickhouse9002/access/</access_control_path>
<include_from>/etc/clickhouse-server/metrika9002.xml</include_from>
3) Create the matching directories
[root@ck1 clickhouse-server]# mkdir -p /data/clickhouse9002
[root@ck1 clickhouse-server]# chown -R clickhouse:clickhouse /data/clickhouse9002
4) Add a service start script for the new instance
[root@ck1 init.d]# cp /etc/init.d/clickhouse-server /etc/init.d/clickhouse-server9002
[root@ck1 init.d]# vim /etc/init.d/clickhouse-server9002
Before:
CLICKHOUSE_CONFIG=$CLICKHOUSE_CONFDIR/config.xml
CLICKHOUSE_PIDFILE="$CLICKHOUSE_PIDDIR/$PROGRAM.pid"
After:
CLICKHOUSE_CONFIG=$CLICKHOUSE_CONFDIR/config9002.xml
CLICKHOUSE_PIDFILE="$CLICKHOUSE_PIDDIR/$PROGRAM-9002.pid"
5) Start the high-availability ClickHouse cluster
[root@ck1 init.d]# systemctl start clickhouse-server
[root@ck1 init.d]# systemctl start clickhouse-server9002
Note: everything else follows the installation steps above.
5. Client Tools
5.1 clickhouse-client
# 1 No password set
clickhouse-client
# 2 With username and password
clickhouse-client -h 127.0.0.1 -u ck --password 123456
clickhouse-client -h 127.0.0.1 --port 9006 -u ck --password 123456 --multiline
# Run a single SQL statement
clickhouse-client -h 127.0.0.1 --port 9006 -u ck --password 123456 --multiline -q "SELECT now()"
-- View cluster information
cdhserver1 :) SELECT * FROM system.clusters;

SELECT * FROM system.clusters

┌─cluster─────────┬─shard_num─┬─shard_weight─┬─replica_num─┬─host_name──┬─host_address─┬─port─┬─is_local─┬─user─┬─default_database─┬─errors_count─┬─estimated_recovery_time─┐
│ test_ck_cluster │         1 │            1 │           1 │ cdhserver2 │ 10.0.0.237   │ 9000 │        0 │ ck   │                  │            0 │                       0 │
│ test_ck_cluster │         1 │            1 │           2 │ cdhserver3 │ 10.0.0.238   │ 9000 │        0 │ ck   │                  │            0 │                       0 │
│ test_ck_cluster │         2 │            1 │           1 │ cdhserver3 │ 10.0.0.238   │ 9000 │        0 │ ck   │                  │            0 │                       0 │
│ test_ck_cluster │         2 │            1 │           2 │ cdhserver4 │ 10.0.0.239   │ 9000 │        0 │ ck   │                  │            0 │                       0 │
│ test_ck_cluster │         3 │            1 │           1 │ cdhserver4 │ 10.0.0.239   │ 9000 │        0 │ ck   │                  │            0 │                       0 │
│ test_ck_cluster │         3 │            1 │           2 │ cdhserver2 │ 10.0.0.237   │ 9000 │        0 │ ck   │                  │            0 │                       0 │
└─────────────────┴───────────┴──────────────┴─────────────┴────────────┴──────────────┴──────┴──────────┴──────┴──────────────────┴──────────────┴─────────────────────────┘

6 rows in set. Elapsed: 2.683 sec.
5.2 DBeaver
Create a new connection.
Under All (or Analytical), select ClickHouse and click Next.
The default port is 8123; for the host, use any ClickHouse server node (with a cluster, any node will do). Enter the username and password under Authentication.
Test the connection; confirm the driver download when prompted.
6. Distributed DDL Operations
The key is the ON CLUSTER clause followed by the configured cluster name: ClickHouse then runs the DDL statement on every node of that cluster automatically. Besides creating distributed tables, ON CLUSTER can also be used when creating local tables (*_local), so that issuing the CREATE on one node creates the same local table on all the other nodes.
1. Add a column
ALTER TABLE idc.web_initial ON CLUSTER idc_cluster ADD COLUMN tracerouteip String AFTER jitter;
2. Change a column's type
ALTER TABLE idc.web_initial ON CLUSTER idc_cluster MODIFY COLUMN tracerouteip UInt16;
3. Drop a column
ALTER TABLE idc.web_initial ON CLUSTER idc_cluster DROP COLUMN tracerouteip;
4. Drop the same table from multiple cluster nodes
DROP TABLE tabl ON CLUSTER clickhouse_cluster;
DROP TABLE IF EXISTS idc.web_initial ON CLUSTER idc_cluster;
5. Truncate a table across the cluster
TRUNCATE TABLE lmmbase.user_label_uid ON CLUSTER crm_4shards_1replicas;
6. Create a database on the cluster
CREATE DATABASE IF NOT EXISTS yhw ON CLUSTER idc_cluster;
7. Drop time-partitioned data across the cluster
Time partitioning options:
toYYYYMM(EventDate): partition by month
toMonday(EventDate): partition by week
toDate(EventDate): partition by day
Method 1:
ALTER TABLE baip.speed_sdk_info ON CLUSTER idc_cluster DELETE WHERE toDate(report_time)='2020-08-03';
Method 2:
ALTER TABLE baip.speed_sdk_info ON CLUSTER idc_cluster DELETE WHERE report_time<=1596470399;
Method 3 (for when the first two fail to remove the partition's data):
ALTER TABLE baip.speed_sdk_info ON CLUSTER idc_cluster DROP PARTITION '2020-08-03';
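To see which partitions a table actually holds before and after these statements, system.parts can be inspected; a sketch using the table from the examples above:
-- active data parts per partition of baip.speed_sdk_info
SELECT partition, name, rows
FROM system.parts
WHERE database = 'baip' AND table = 'speed_sdk_info' AND active
ORDER BY partition;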
8. Create a distributed table on the cluster
-- 1. Create the local table
DROP TABLE IF EXISTS idc.web_element_detail_dist ON CLUSTER idc_cluster;
DROP TABLE IF EXISTS idc.web_element_detail ON CLUSTER idc_cluster;
CREATE TABLE IF NOT EXISTS idc.web_element_detail ON CLUSTER idc_cluster (
    `task_id` UInt64 COMMENT 'dial-test task id',
    `target` String COMMENT 'domain / URL',
    `target_name` String COMMENT 'site name',
    `element` String COMMENT 'element name',
    `report_time` DateTime COMMENT 'report time',
    `net_type` String COMMENT 'network access type',
    `probe_id` String COMMENT 'probe id',
    `opt_type` String COMMENT 'carrier type',
    `opt_name` String COMMENT 'carrier name',
    `province_id` UInt32 COMMENT 'province code',
    `province_name` String COMMENT 'province name',
    `city_id` UInt32 COMMENT 'city code',
    `city_name` String COMMENT 'city name',
    `area_id` UInt32 COMMENT 'county code',
    `area_name` String COMMENT 'county name',
    `busi_type` String COMMENT 'business type',
    `element_num` String COMMENT 'element count',
    `idc_ip` String COMMENT 'target ip address',
    `idc_delay` Float32 COMMENT 'idc latency',
    `idc_size` Float32 COMMENT 'idc size',
    `ip_opt_type` String COMMENT 'target carrier type',
    `ip_opt_name` String COMMENT 'target carrier name',
    `ip_province_id` UInt32 COMMENT 'target IP province code',
    `ip_province_name` String COMMENT 'target IP province name',
    `ip_city_id` UInt32 COMMENT 'target IP city code',
    `ip_city_name` String COMMENT 'target IP city name',
    `ip_area_id` UInt32 COMMENT 'target IP county code',
    `ip_area_name` String COMMENT 'target IP county name',
    `five_min` UInt32,
    `ten_min` UInt32,
    `half_hour` UInt32,
    `one_hour` UInt32,
    `four_hour` UInt32,
    `half_day` UInt32
) ENGINE = MergeTree()
PARTITION BY (task_id, toYYYYMMDD(report_time))
ORDER BY (target, report_time)
SETTINGS index_granularity = 8192;

-- 2. Create the distributed table
CREATE TABLE idc.web_element_detail_dist ON CLUSTER idc_cluster
AS idc.web_element_detail
ENGINE = Distributed(idc_cluster, idc, web_element_detail, rand());
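Writes through the distributed table are then scattered over the shards by rand(); a small usage sketch with made-up values (only a few columns are supplied, the rest take their defaults):
-- the insert lands on one shard chosen by the sharding key
INSERT INTO idc.web_element_detail_dist (task_id, target, report_time)
VALUES (1, 'example.com', now());
-- reads fan out to every shard and the results are merged
SELECT count() FROM idc.web_element_detail_dist;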
9. Importing and exporting data
Importing data:
The SQL takes the form INSERT INTO <table> FORMAT <format>; the input formats mirror the output formats. CSV and JSON are shown below; other formats work the same way.
# Useful options:
# --max_insert_block_size=100000      block size for the batch import
# --format_csv_delimiter=$'\001'      field delimiter
# --input_format_allow_errors_num     number of errors tolerated
# --input_format_allow_errors_ratio   ratio of errors tolerated, in [0-1]
cat ~/csv_fileName.csv | clickhouse-client --host=ip --port=19000 --user=username --password=pwd --max_insert_block_size=100000 --format_csv_delimiter=$'\001' --query="INSERT INTO table_name FORMAT CSVWithNames"
cat ~/csv_fileName.csv | clickhouse-client --host=ip --port=19000 --user=username --password=pwd --format_csv_delimiter=$'\001' --query="INSERT INTO table_name FORMAT CSVWithNames"
cat tablename.csv | clickhouse-client -h 10.0.0.239 --port 9006 -u ck --password 123456 --max_insert_block_size=10000 --query="INSERT INTO yhw.speed_sdk_info FORMAT CSVWithNames"
cat /web/tablename.csv | clickhouse-client -h 10.0.0.239 --port 9006 -u ck --password 123456 --format_csv_delimiter=$'\001' --query="INSERT INTO yhw.speed_sdk_info FORMAT CSV"
clickhouse-client -h 10.0.0.239 --port 9006 -u ck --password 123456 --query='INSERT INTO yhw.speed_sdk_info FORMAT CSV' < /web/tablename.csv
clickhouse-client -h 10.0.0.239 --port 9006 -u ck --password 123456 --query "INSERT INTO yhw.speed_sdk_info FORMAT JSONEachRow" < /web/tablename.json
Exporting data:
-- 1 Create the supplier table
CREATE TABLE supplier(
    S_SUPPKEY UInt32,
    S_NAME String,
    S_ADDRESS String,
    S_CITY LowCardinality(String),
    S_NATION LowCardinality(String),
    S_REGION LowCardinality(String),
    S_PHONE String
) ENGINE = MergeTree ORDER BY S_SUPPKEY;

-- 2 Load the data into the supplier table
clickhouse-client -h 127.0.0.1 --port 9000 -u default --password KavrqeN1 --query "INSERT INTO supplier FORMAT CSV" < supplier.tbl
# 1 Exporting data
# --query="SQL", where the SQL takes the form: SELECT * FROM <table> FORMAT <output format>
# 1.1 CSV format, selected columns only
clickhouse-client -h 10.0.0.239 --port 9006 -u ck --password 123456 --query "SELECT S_SUPPKEY, S_NAME, S_ADDRESS,S_CITY,S_NATION,S_REGION,S_PHONE FROM supplier FORMAT CSV" > /opt/supplier.tb0.csv
# 1.2 CSV format, all columns
clickhouse-client -h 10.0.0.239 --port 9006 -u default --password KavrqeN1 --query "SELECT * FROM supplier FORMAT CSV" > /opt/supplier.tb1.csv
# Inspect the exported file
head -n 5 supplier.tb1.csv
1,"Supplier#000000001","sdrGnXCDRcfriBvY0KL,i","PERU 0","PERU","AMERICA","27-989-741-2988"
2,"Supplier#000000002","TRMhVHz3XiFu","ETHIOPIA 1","ETHIOPIA","AFRICA","15-768-687-3665"
3,"Supplier#000000003","BZ0kXcHUcHjx62L7CjZS","ARGENTINA7","ARGENTINA","AMERICA","11-719-748-3364"
4,"Supplier#000000004","qGTQJXogS83a7MB","MOROCCO 4","MOROCCO","AFRICA","25-128-190-5944"
5,"Supplier#000000005","lONEYAh9sF","IRAQ 5","IRAQ","MIDDLE EAST","21-750-942-6364"
# 1.3 CSV with a header row, all columns
clickhouse-client -h 10.0.0.239 --port 9006 -u default --password KavrqeN1 --query "SELECT * FROM supplier FORMAT CSVWithNames" > /opt/supplier.tb2.csv
# Inspect the exported file
head -n 5 supplier.tb2.csv
"S_SUPPKEY","S_NAME","S_ADDRESS","S_CITY","S_NATION","S_REGION","S_PHONE"
1,"Supplier#000000001","sdrGnXCDRcfriBvY0KL,i","PERU 0","PERU","AMERICA","27-989-741-2988"
2,"Supplier#000000002","TRMhVHz3XiFu","ETHIOPIA 1","ETHIOPIA","AFRICA","15-768-687-3665"
3,"Supplier#000000003","BZ0kXcHUcHjx62L7CjZS","ARGENTINA7","ARGENTINA","AMERICA","11-719-748-3364"
4,"Supplier#000000004","qGTQJXogS83a7MB","MOROCCO 4","MOROCCO","AFRICA","25-128-190-5944"
# 1.4 Tab-separated output
clickhouse-client -h 10.0.0.239 --port 9006 -u default --password KavrqeN1 --query "SELECT * FROM supplier FORMAT TabSeparated" > /opt/supplier.tb3.txt
# Inspect the exported file
head -n 5 supplier.tb3.txt
1 Supplier#000000001 sdrGnXCDRcfriBvY0KL,i PERU 0 PERU AMERICA 27-989-741-2988
2 Supplier#000000002 TRMhVHz3XiFu ETHIOPIA 1 ETHIOPIA AFRICA 15-768-687-3665
3 Supplier#000000003 BZ0kXcHUcHjx62L7CjZS ARGENTINA7 ARGENTINA AMERICA 11-719-748-3364
4 Supplier#000000004 qGTQJXogS83a7MB MOROCCO 4 MOROCCO AFRICA 25-128-190-5944
5 Supplier#000000005 lONEYAh9sF IRAQ 5 IRAQ MIDDLE EAST 21-750-942-6364
# 1.5 Tab-separated with a header row. TabSeparatedWithNames is equivalent to TSVWithNames
# When parsing such a file, the first row is skipped entirely
clickhouse-client -h 10.0.0.239 --port 9006 -u default --password KavrqeN1 --query "SELECT * FROM supplier FORMAT TSVWithNames" > /opt/supplier.tb4.txt
# Inspect the exported file
head -n 5 supplier.tb4.txt
S_SUPPKEY S_NAME S_ADDRESS S_CITY S_NATION S_REGION S_PHONE
1 Supplier#000000001 sdrGnXCDRcfriBvY0KL,i PERU 0 PERU AMERICA 27-989-741-2988
2 Supplier#000000002 TRMhVHz3XiFu ETHIOPIA 1 ETHIOPIA AFRICA 15-768-687-3665
3 Supplier#000000003 BZ0kXcHUcHjx62L7CjZS ARGENTINA7 ARGENTINA AMERICA 11-719-748-3364
4 Supplier#000000004 qGTQJXogS83a7MB MOROCCO 4 MOROCCO AFRICA 25-128-190-5944
# 1.6 Tab-separated with header and type rows. TabSeparatedWithNamesAndTypes is equivalent to TSVWithNamesAndTypes
# When parsing such a file, the first two rows are skipped entirely
clickhouse-client -h 10.0.0.239 --port 9006 -u default --password KavrqeN1 --query "SELECT * FROM supplier FORMAT TabSeparatedWithNamesAndTypes" > /opt/supplier.tb5.txt
# Inspect the exported file
head -n 5 supplier.tb5.txt
S_SUPPKEY S_NAME S_ADDRESS S_CITY S_NATION S_REGION S_PHONE
UInt32 String String LowCardinality(String) LowCardinality(String) LowCardinality(String) String
1 Supplier#000000001 sdrGnXCDRcfriBvY0KL,i PERU 0 PERU AMERICA 27-989-741-2988
2 Supplier#000000002 TRMhVHz3XiFu ETHIOPIA 1 ETHIOPIA AFRICA 15-768-687-3665
3 Supplier#000000003 BZ0kXcHUcHjx62L7CjZS ARGENTINA7 ARGENTINA AMERICA 11-719-748-3364
# 1.7 KV output: like TabSeparated, but as name=value pairs per field
clickhouse-client -h 10.0.0.239 --port 9006 -u default --password KavrqeN1 --query "SELECT * FROM supplier FORMAT TSKV" > /opt/supplier.tb6.txt
# Inspect the exported file
head -n 5 supplier.tb6.txt
S_SUPPKEY=1 S_NAME=Supplier#000000001 S_ADDRESS=sdrGnXCDRcfriBvY0KL,i S_CITY=PERU 0 S_NATION=PERU S_REGION=AMERICA S_PHONE=27-989-741-2988
S_SUPPKEY=2 S_NAME=Supplier#000000002 S_ADDRESS=TRMhVHz3XiFu S_CITY=ETHIOPIA 1 S_NATION=ETHIOPIA S_REGION=AFRICA S_PHONE=15-768-687-3665
S_SUPPKEY=3 S_NAME=Supplier#000000003 S_ADDRESS=BZ0kXcHUcHjx62L7CjZS S_CITY=ARGENTINA7 S_NATION=ARGENTINA S_REGION=AMERICA S_PHONE=11-719-748-3364
S_SUPPKEY=4 S_NAME=Supplier#000000004 S_ADDRESS=qGTQJXogS83a7MB S_CITY=MOROCCO 4 S_NATION=MOROCCO S_REGION=AFRICA S_PHONE=25-128-190-5944
S_SUPPKEY=5 S_NAME=Supplier#000000005 S_ADDRESS=lONEYAh9sF S_CITY=IRAQ 5 S_NATION=IRAQ S_REGION=MIDDLE EAST S_PHONE=21-750-942-6364
# 1.8 Values output: each row as a tuple in parentheses, comma-separated
clickhouse-client -h 10.0.0.239 --port 9006 -u default --password KavrqeN1 --query "SELECT * FROM supplier LIMIT 3 FORMAT Values" > /opt/supplier.tb7.txt
# Inspect the exported file
head supplier.tb7.txt
(1,'Supplier#000000001','sdrGnXCDRcfriBvY0KL,i','PERU 0','PERU','AMERICA','27-989-741-2988'),(2,'Supplier#000000002','TRMhVHz3XiFu','ETHIOPIA 1','ETHIOPIA','AFRICA','15-768-687-3665'),(3,'Supplier#000000003','BZ0kXcHUcHjx62L7CjZS','ARGENTINA7','ARGENTINA','AMERICA','11-719-748-3364')
# 1.9 One JSON object per row
clickhouse-client -h 10.0.0.239 --port 9006 -u default --password KavrqeN1 --query "SELECT * FROM supplier FORMAT JSONEachRow" > /opt/supplier.tb8.json
# Inspect the exported file
head -n 5 supplier.tb8.json
{"S_SUPPKEY":1,"S_NAME":"Supplier#000000001","S_ADDRESS":"sdrGnXCDRcfriBvY0KL,i","S_CITY":"PERU 0","S_NATION":"PERU","S_REGION":"AMERICA","S_PHONE":"27-989-741-2988"}
{"S_SUPPKEY":2,"S_NAME":"Supplier#000000002","S_ADDRESS":"TRMhVHz3XiFu","S_CITY":"ETHIOPIA 1","S_NATION":"ETHIOPIA","S_REGION":"AFRICA","S_PHONE":"15-768-687-3665"}
{"S_SUPPKEY":3,"S_NAME":"Supplier#000000003","S_ADDRESS":"BZ0kXcHUcHjx62L7CjZS","S_CITY":"ARGENTINA7","S_NATION":"ARGENTINA","S_REGION":"AMERICA","S_PHONE":"11-719-748-3364"}
{"S_SUPPKEY":4,"S_NAME":"Supplier#000000004","S_ADDRESS":"qGTQJXogS83a7MB","S_CITY":"MOROCCO 4","S_NATION":"MOROCCO","S_REGION":"AFRICA","S_PHONE":"25-128-190-5944"}
{"S_SUPPKEY":5,"S_NAME":"Supplier#000000005","S_ADDRESS":"lONEYAh9sF","S_CITY":"IRAQ 5","S_NATION":"IRAQ","S_REGION":"MIDDLE EAST","S_PHONE":"21-750-942-6364"}
# 1.10 Row-by-row binary format
clickhouse-client -h 10.0.0.239 --port 9006 -u default --password KavrqeN1 --query "SELECT * FROM supplier FORMAT RowBinary" > /opt/supplier.tb9.dat
7. Testing: Creating Local Replicated Tables and Distributed Tables
1. Shards and replicas
Before creating distributed tables, let's take a closer look at ClickHouse shards and replicas. In the configuration below, one replica generally corresponds to one host, one shard can hold multiple replicas, and one cluster can define multiple shards. ClickHouse can even be configured as multiple clusters (a multi-cluster setup contains several sub-clusters), so the configuration is very flexible.
<?xml version="1.0"?> <yandex> <!--ck集群节点--> <clickhouse_remote_servers> <!-- 定义的集群名 idc_cluster--> <idc_cluster> <!-- 在分布式表中的这个 shard 内只选择一个合适的 replica 写入数据。如果为本地表引擎为 ReplicatedMergeTree ,多个副本之间的数据交由引擎自己处理 --> <shard> <!-- 分片权重值, 默认为 1,官方建议这个值不要设置的太大,分一个分片的权重值越大,被写入数据的就会越多 --> <weight>1</weight> <replica> <host>cdhserver2</host> <port>9006</port> <user>ck</user> <password>123456</password> <compression>true</compression> </replica> </shard> <!--分片2--> <shard> <weight>1</weight> <replica> <host>cdhserver3</host> <port>9006</port> <user>ck</user> <password>123456</password> <compression>true</compression> </replica> </shard> <!--分片3--> <shard> <weight>1</weight> <replica> <host>cdhserver4</host> <port>9006</port> <user>ck</user> <password>123456</password> <compression>true</compression> </replica> </shard> </idc_cluster> </clickhouse_remote_servers> </yandex>
A ClickHouse shard materializes through its replicas: if a shard defines a single replica, that replica effectively is the shard; if a shard defines several replicas, one of them is elected leader (through ZooKeeper) and the other replicas in the same shard sync data from it. A table is split horizontally into multiple shards written to the disks of multiple nodes, which provides horizontal scalability and partition tolerance. Data replication between replicas is handled by the ReplicatedMergeTree engine.
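Once Replicated* tables exist, replication health can also be watched from SQL; system.replicas keeps one row per replicated table on the local server:
-- is_leader shows which replica currently leads; active_replicas should equal total_replicas when healthy
SELECT database, table, is_leader, total_replicas, active_replicas
FROM system.replicas;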
2. Understanding distributed tables
A distributed table (Distributed) stores no data itself; it acts as a router. At creation time you specify the cluster name, database name, table name and sharding key (rand() here, meaning random sharding). A query against the distributed table is routed to the concrete tables according to the cluster configuration, and the results are merged. When the distributed table is created, no consistency check is made against the local table structure; much like Hive, the data is only validated against the schema at read time, and an error is raised on mismatch.
Naming conventions for distributed tables:
Local table: usually suffixed with _local. The local table is the carrier of the data and may use any non-Distributed table engine; one local table corresponds to one data shard of the distributed table.
Distributed table: usually suffixed with _all. It can only use the Distributed engine and maps one-to-many onto the local tables; once created, data operations can go through the distributed table to the underlying local tables.
3. Sharding rules
The syntax for creating a distributed table:
-- clusterName: the cluster name, i.e. the tag defined in the config (e.g. <perftest_3shards_1replicas>); user-defined
-- databases: the database name
-- table: the table name
-- sharding_key: the sharding key, optional
Distributed(clusterName, databases, table[, sharding_key[, policy_name]])
As the syntax shows, a sharding key can be given; it defines the sharding rule and must evaluate to an integer (any of the Int or UInt types):
Default: if no sharding key is declared, the distributed table may contain only one shard, i.e. map to exactly one local table; otherwise writes will fail.
Shard by a column's value: Distributed(cluster, databases, table, userId)
Shard randomly: Distributed(cluster, databases, table, rand())
Shard by a column's hash: Distributed(cluster, databases, table, intHash64(userId))
Sharding is also influenced by the shard weight (weight), set per shard in the cluster configuration (default weight=1). The weight skews the distribution: the larger a shard's weight, the more data is written to it.
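To see how rows actually landed, recent versions expose a virtual _shard_num column on Distributed tables (an assumption, verify on your release); a sketch against the distributed table created earlier:
-- row counts per shard, as seen through the distributed table
SELECT _shard_num, count() AS cnt
FROM idc.web_element_detail_dist
GROUP BY _shard_num
ORDER BY _shard_num;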
8. Tuning
1. max_table_size_to_drop
By default, ClickHouse refuses to drop a partition or table whose size exceeds 50 GB. This can be changed permanently in the server config file, or bypassed temporarily without restarting the service.
Permanent setting
sudo vim /etc/clickhouse-server/config.xml
Then make sure the following two lines are present and uncommented:
<max_table_size_to_drop>0</max_table_size_to_drop>
<max_partition_size_to_drop>0</max_partition_size_to_drop>
0 means no limit; alternatively, set the largest size you want to allow.
Temporary setting
Create a flag file under the server's data directory (the <path> from config.xml; /data/clickhouse/ in this guide):
sudo touch '/data/clickhouse/flags/force_drop_table' && sudo chmod 666 '/data/clickhouse/flags/force_drop_table'
Once the flag file exists, the drop partition/table statement above can be executed.
2. max_memory_usage
This parameter lives in /etc/clickhouse-server/users.xml and caps the amount of memory a single query may use; a query that exceeds it fails. If resources allow, raise it.
<max_memory_usage>25000000000</max_memory_usage>
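The limit can also be overridden for a single query (still subject to any hard constraints in users.xml), which is handy for one-off heavy queries:
-- allow this query alone to use up to ~25 GB
SELECT count(DISTINCT target) FROM idc.web_element_detail
SETTINGS max_memory_usage = 25000000000;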
3. Drop the same table on multiple nodes
DROP TABLE tabl ON CLUSTER clickhouse_cluster;