1. Common Backup Methods
1.1 Backup methods
| Backup method | Characteristics |
| --- | --- |
| Physical file copy | Copies the physical files directly; data writes must be stopped for the duration of the backup |
| dump (export/import) | Flexible, but slow |
| Snapshot table | Back up by creating a `_bak` copy of the table |
| FREEZE | Partition-level backup of tables (partitioned or non-partitioned); restore by ATTACH |
| FETCH | Partition backup for partitioned tables using the ReplicatedMergeTree engine; restore by ATTACH |
| Metadata backup | Back up the CREATE DATABASE / CREATE TABLE statements |
1.2 Restore methods

| Restore method | Characteristics |
| --- | --- |
| ATTACH | Loads backed-up partition files back into a table; the backup files must first be placed in the table's `detached` directory |
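As an illustration of the FREEZE/ATTACH pair, a minimal sketch follows; the database `test_log`, table `test_local`, and partition id `202307` are placeholders for this example:

```sql
-- Take a partition-level snapshot; hard links to the parts appear
-- under ${datadir}/shadow/
ALTER TABLE test_log.test_local FREEZE PARTITION 202307;

-- To restore: copy the frozen part directories from shadow/ into the
-- table's detached/ directory, then load them back:
ALTER TABLE test_log.test_local ATTACH PARTITION 202307;
```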
2. Cold Backup

Copying the physical files also achieves a backup, but data writes must be avoided while the copy is in progress.

1. A physical-file backup consists of two parts:
   (1) Table data files: under `${datadir}/data` you can back up a single database, table, or partition directory.
   (2) Metadata: under `${datadir}/metadata` you can back up the CREATE statements of a database or table.
2. Backing up the database files

Copy the entire ClickHouse data directory; its path is the value of the `path` setting in config.xml.
Restoring is equally simple: replace the data directory with the backed-up copy.
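The cold-backup copy described above can be sketched as a small script. On a real server `DATADIR` would be the ClickHouse data directory (e.g. `/var/lib/clickhouse`) and writes would be stopped first; here the script builds a tiny stand-in tree so the sketch is runnable as-is:

```shell
#!/bin/sh
# Cold backup sketch: copy data/ and metadata/ out of the data directory.
# DATADIR and BACKUPDIR are stand-ins created for this demonstration.
DATADIR=$(mktemp -d)
BACKUPDIR=$(mktemp -d)/backup

mkdir -p "$DATADIR/data/test_log/test_local" "$DATADIR/metadata/test_log"
echo "demo" > "$DATADIR/data/test_log/test_local/part.bin"

mkdir -p "$BACKUPDIR"
# -a preserves permissions, ownership and timestamps of the copied files
cp -a "$DATADIR/data" "$DATADIR/metadata" "$BACKUPDIR/"
echo "backed up to $BACKUPDIR"
```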
3. Hot Backup

```
:) show databases;
┌─name───────────────┐
│ INFORMATION_SCHEMA │
│ default            │
│ information_schema │
│ system             │
│ test_log           │
└────────────────────┘
:) use test_log;
:) show tables;
┌─name───────┐
│ test_all   │
│ test_local │
│ user_all   │
└────────────┘
:) select * from test_all;
# data on 192.168.12.91: test_local
┌─id─┬─name─────┐
│  1 │ zhangsan │
│  2 │ lisi     │
│  7 │ yw       │
│  8 │ xc       │
└────┴──────────┘
# data on 192.168.12.90: test_local
┌─id─┬─name───┐
│  3 │ wangm  │
│  4 │ lijing │
│  9 │ cx     │
│ 10 │ aa     │
│ 13 │ kkkk   │
└────┴────────┘
# data on 192.168.12.88: test_local
┌─id─┬─name──────┐
│  5 │ zhangquan │
│  6 │ lihua     │
│ 11 │ bb        │
│ 12 │ acca      │
└────┴───────────┘
```
3.1 Exporting and importing data
3.1.1 Use a query to export selected data to a file for backup. This is flexible, and the data can later be restored through the client.
Export:

```shell
# Export a txt file with a header row
clickhouse-client -h 127.0.0.1 --database="test01" -u default --password "123456" --format_csv_delimiter="|" --query="select * from pet_barrages FORMAT CSVWithNames" > local.txt
# Export a txt file without a header row
clickhouse-client -h 127.0.0.1 --database="test01" -u default --password "123456" --format_csv_delimiter='|' --query="select * from student FORMAT CSV" > /root/local.txt
# Export a csv file with a header row (the header carries the column names, e.g. name, age)
clickhouse-client --password "123456" -d test01 -q "select * from local FORMAT CSVWithNames" --format_csv_delimiter='|' > /data/local.csv
# Export a csv file without a header row
clickhouse-client --password "123456" -d test01 -q "select * from local FORMAT CSV" --format_csv_delimiter='|' > /data/local.csv
```
Import:

```shell
# Import a txt file with a header row
clickhouse-client -h 127.0.0.1 --database="test01" -u default --password "123456" --format_csv_delimiter='|' --query="insert into local FORMAT CSVWithNames" < local.txt
# Import a txt file without a header row
clickhouse-client --password 123456 -d test01 -q "insert into local FORMAT CSV" --format_csv_delimiter='|' < local.txt
# Import a csv file with a header row
clickhouse-client --password 123456 -d default -q "insert into local FORMAT CSVWithNames" --format_csv_delimiter='|' < /data/local.csv
# Import a csv file without a header row
clickhouse-client --password 123456 -d default -q "insert into local FORMAT CSV" --format_csv_delimiter='|' < /data/local.csv
```
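Before re-importing an exported file, it is worth sanity-checking it. A minimal sketch: for a `CSVWithNames` export, the data row count is the line count minus the header line. The file here is a small generated stand-in rather than a real export:

```shell
#!/bin/sh
# Sanity-check an exported CSVWithNames file before re-import.
# CSV is a generated stand-in for a real exported file.
CSV=$(mktemp)
printf 'id|name\n1|zhangsan\n2|lisi\n' > "$CSV"

# subtract the header row from the total line count
rows=$(( $(wc -l < "$CSV") - 1 ))
echo "data rows: $rows"
```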
4. Backup Tool: clickhouse-backup

Download: https://github.com/Altinity/clickhouse-backup/releases
Its main features:
1. Easily create and restore backups of all or specific tables
2. Efficiently store multiple backups on the local filesystem
3. Upload and download with streaming compression
4. Incremental backups on remote storage
5. Works with AWS, Azure, GCS, Tencent COS, and FTP
```shell
tar -xzvf clickhouse-backup-linux-amd64.tar.gz
mv build/linux/amd64/clickhouse-backup /usr/bin/
mkdir /etc/clickhouse-backup
chown -R clickhouse:clickhouse /etc/clickhouse-backup
```
Configuration:

```shell
# Print the default configuration
clickhouse-backup default-config
# Edit the config (the content is YAML, even though this example names the file config.xml)
vim /etc/clickhouse-backup/config.xml
```
```yaml
general:
  remote_storage: none        # defaults to none; set to sftp to upload to a remote server over SFTP
  disable_progress_bar: false
  backups_to_keep_local: 7    # number of local backups to keep; older ones beyond 7 are deleted automatically (default 0 = never delete)
  backups_to_keep_remote: 7   # number of remote backups to keep
  log_level: info
  allow_empty_backups: false
clickhouse:
  username: default
  password: "123456"
  host: 192.168.12.91
  port: 9000
  data_path: "/data/clickhouse/clickhouse/"  # ClickHouse data directory
  skip_tables:                # databases/tables excluded from backups
    - system.*
    - default.*
    - INFORMATION_SCHEMA.*
    - information_schema.*
  timeout: 5m
  freeze_by_part: false
```
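If `remote_storage` is switched to `sftp`, a matching `sftp` section is needed. The fragment below is a sketch from memory: the host, credentials, and path are placeholders, and the exact key names should be verified against the output of `clickhouse-backup default-config`:

```yaml
general:
  remote_storage: sftp
sftp:
  address: 192.168.12.100   # remote backup host (placeholder)
  username: backup
  password: "123456"
  path: /data/backups       # destination directory on the remote host
```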
```shell
# Create a backup
[root@ backup]# clickhouse-backup create my_backup --config /etc/clickhouse-backup/config.xml
2023/07/24 18:00:32.259794 info clickhouse connection prepared: tcp://192.168.12.91:9000 run ping logger=clickhouse
2023/07/24 18:00:32.260590 info clickhouse connection open: tcp://192.168.12.91:9000 logger=clickhouse
2023/07/24 18:00:32.260619 info SELECT metadata_path FROM system.tables WHERE database = 'system' AND metadata_path!='' LIMIT 1; logger=clickhouse
2023/07/24 18:00:32.263036 info SELECT name, engine FROM system.databases WHERE name NOT IN ('system','INFORMATION_SCHEMA','information_schema','_temporary_and_external_tables','default') logger=clickhouse
2023/07/24 18:00:32.264230 info SHOW CREATE DATABASE `pet_battle_dev` logger=clickhouse
2023/07/24 18:00:32.264811 info SHOW CREATE DATABASE `pet_battle_test` logger=clickhouse
2023/07/24 18:00:32.265394 info SHOW CREATE DATABASE `test02` logger=clickhouse
2023/07/24 18:00:32.266068 info SELECT name, count(*) as is_present FROM system.settings WHERE name IN (?, ?) GROUP BY name with args [show_table_uuid_in_table_create_query_if_not_nil display_secrets_in_show_and_select] logger=clickhouse
2023/07/24 18:00:32.267830 info SELECT name FROM system.databases WHERE engine IN ('MySQL','PostgreSQL','MaterializedPostgreSQL') logger=clickhouse
2023/07/24 18:00:32.271753 info SELECT countIf(name='data_path') is_data_path_present, countIf(name='data_paths') is_data_paths_present, countIf(name='uuid') is_uuid_present, countIf(name='create_table_query') is_create_table_query_present, countIf(name='total_bytes') is_total_bytes_present FROM system.columns WHERE database='system' AND table='tables' logger=clickhouse
2023/07/24 18:00:32.273981 info SELECT database, name, engine , data_paths , uuid , create_table_query , coalesce(total_bytes, 0) AS total_bytes FROM system.tables WHERE is_temporary = 0 ORDER BY total_bytes DESC SETTINGS show_table_uuid_in_table_create_query_if_not_nil=1 logger=clickhouse
2023/07/24 18:00:32.289898 info SELECT metadata_path FROM system.tables WHERE database = 'system' AND metadata_path!='' LIMIT 1; logger=clickhouse
2023/07/24 18:00:32.291693 info SELECT sum(bytes_on_disk) as size FROM system.parts WHERE active AND database='pet_battle_dev' AND table='test_tbl' GROUP BY database, table logger=clickhouse
2023/07/24 18:00:32.293367 info SELECT count() as cnt FROM system.columns WHERE database='system' AND table='functions' AND name='create_query' SETTINGS empty_result_for_aggregation_by_empty_set=0 logger=clickhouse
2023/07/24 18:00:32.294880 info SELECT name, create_query FROM system.functions WHERE create_query!='' logger=clickhouse
2023/07/24 18:00:32.296301 info SELECT value FROM `system`.`build_options` where name='VERSION_INTEGER' logger=clickhouse
2023/07/24 18:00:32.297124 info SELECT count() is_disk_type_present FROM system.columns WHERE database='system' AND table='disks' AND name='type' logger=clickhouse
2023/07/24 18:00:32.298539 info SELECT path, any(name) AS name, any(type) AS type FROM system.disks GROUP BY path logger=clickhouse
2023/07/24 18:00:32.299913 info SELECT count() is_parts_column_present FROM system.tables WHERE database='system' AND name='parts_columns' logger=clickhouse
2023/07/24 18:00:32.301299 info SELECT column, groupUniqArray(type) AS uniq_types FROM system.parts_columns WHERE active AND database=? AND table=? GROUP BY column HAVING length(uniq_types) > 1 with args [test02 test_local] logger=clickhouse
2023/07/24 18:00:32.303138 info ALTER TABLE `test02`.`test_local` FREEZE WITH NAME '2aac621edb25476096bd0b6abd3d0d51'; logger=clickhouse
2023/07/24 18:00:32.306048 info ALTER TABLE `test02`.`test_local` UNFREEZE WITH NAME '2aac621edb25476096bd0b6abd3d0d51' logger=clickhouse
2023/07/24 18:00:32.306607 info SELECT mutation_id, command FROM system.mutations WHERE is_done=0 AND database=? AND table=? with args [test02 test_local] logger=clickhouse
2023/07/24 18:00:32.308310 info done backup=my_backup logger=backuper operation=create table=test02.test_local
2023/07/24 18:00:32.308351 info SELECT column, groupUniqArray(type) AS uniq_types FROM system.parts_columns WHERE active AND database=? AND table=? GROUP BY column HAVING length(uniq_types) > 1 with args [pet_battle_dev test_tbl] logger=clickhouse
2023/07/24 18:00:32.310104 info ALTER TABLE `pet_battle_dev`.`test_tbl` FREEZE WITH NAME '3a4112e17f994030b5c1b929fcbc3398'; logger=clickhouse
2023/07/24 18:00:32.311419 info ALTER TABLE `pet_battle_dev`.`test_tbl` UNFREEZE WITH NAME '3a4112e17f994030b5c1b929fcbc3398' logger=clickhouse
2023/07/24 18:00:32.311824 info SELECT mutation_id, command FROM system.mutations WHERE is_done=0 AND database=? AND table=? with args [pet_battle_dev test_tbl] logger=clickhouse
2023/07/24 18:00:32.313312 info done backup=my_backup logger=backuper operation=create table=pet_battle_dev.test_tbl
2023/07/24 18:00:32.313472 warn supports only schema backup backup=my_backup engine=Distributed logger=backuper operation=create table=test02.test_all
2023/07/24 18:00:32.313500 info SELECT mutation_id, command FROM system.mutations WHERE is_done=0 AND database=? AND table=? with args [test02 test_all] logger=clickhouse
2023/07/24 18:00:32.314993 info done backup=my_backup logger=backuper operation=create table=test02.test_all
2023/07/24 18:00:32.315022 info SELECT value FROM `system`.`build_options` where name='VERSION_DESCRIBE' logger=clickhouse
2023/07/24 18:00:32.316068 info done backup=my_backup duration=56ms logger=backuper operation=create
```
```shell
# create takes an optional backup name; without one, the default name is a UTC timestamp such as 2023-07-24T10-21-13
clickhouse-backup create --config /etc/clickhouse-backup/config.xml
# By default, backups are written to the backup directory under the data directory:
#   /data/clickhouse/clickhouse/backup/
# List existing backups
clickhouse-backup --config /etc/clickhouse-backup/config.xml list
# Delete a backup; the first argument selects local (on this server) or remote (on remote storage)
clickhouse-backup delete local my_backup --config /etc/clickhouse-backup/config.xml
clickhouse-backup delete remote my_backup --config /etc/clickhouse-backup/config.xml
# Restore a backup
clickhouse-backup restore my_backup --config /etc/clickhouse-backup/config.xml
# List the tables available for backup
clickhouse-backup tables --config /etc/clickhouse-backup/config.xml
```
A scheduled backup script:

```shell
#!/bin/bash
BACKUP_NAME="backup_$(date +%Y%m%d%H)"
/usr/bin/clickhouse-backup create "$BACKUP_NAME" --config /etc/clickhouse-backup/config.xml
```
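If the retention settings in the config are not used, old backups from the scheduled script can also be pruned by age. The sketch below shows the idea with a stand-in backup directory built on the fly (GNU `touch -d` is assumed); `KEEP_DAYS` and the `backup_*` name pattern mirror the script above:

```shell
#!/bin/sh
# Prune backup directories older than KEEP_DAYS days.
# BACKUP_ROOT is a stand-in for /data/clickhouse/clickhouse/backup/.
BACKUP_ROOT=$(mktemp -d)
KEEP_DAYS=7

mkdir -p "$BACKUP_ROOT/backup_2023071000" "$BACKUP_ROOT/backup_2023072400"
# age one directory past the retention window (GNU coreutils touch)
touch -d '8 days ago' "$BACKUP_ROOT/backup_2023071000"

# delete backup directories whose mtime is older than KEEP_DAYS days
find "$BACKUP_ROOT" -maxdepth 1 -name 'backup_*' -type d -mtime +"$KEEP_DAYS" -exec rm -rf {} +
ls "$BACKUP_ROOT"
```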