pgspider Citu 副本以及节点移除简单学习
使用citus 使用副本配置对于灾备处理是比较重要的,以下是一个简单的学习
环境准备
使用docker-compose运行
- docker-compose 文件
version: "3"
services:
graphql-engine:
image: hasura/graphql-engine:v1.1.0
ports:
- "8080:8080"
environment:
HASURA_GRAPHQL_DATABASE_URL: postgres://postgres:dalong@pg-citus-master:5432/postgres
HASURA_GRAPHQL_ENABLE_CONSOLE: "true" # set to "false" to disable console
HASURA_GRAPHQL_ENABLED_LOG_TYPES: startup, http-log, webhook-log, websocket-log, query-log
pg-citus-master:
container_name: pg-citus-master
image: dalongrong/pgspider:citus-9.1
volumes:
- "./csvfiles:/opt/csv"
- "./sql:/docker-entrypoint-initdb.d/"
ports:
- "5432:5432"
environment:
- "POSTGRES_PASSWORD=dalong"
pg-citus-worker:
container_name: pg-citus-worker
image: dalongrong/pgspider:citus-9.1
volumes:
- "./csvfiles:/opt/csv"
- "./sql:/docker-entrypoint-initdb.d/"
ports:
- "5433:5432"
pg-citus-worker2:
container_name: pg-citus-worker2
image: dalongrong/pgspider:citus-9.1
volumes:
- "./csvfiles:/opt/csv"
- "./sql:/docker-entrypoint-initdb.d/"
ports:
- "5434:5432"
- init sql
主要是扩展的创建
-- wrap in transaction to ensure Docker flag always visible
BEGIN;
CREATE EXTENSION citus;
COMMIT;
- 需要的数据
都是来自官方文档
curl https://examples.citusdata.com/tutorial/companies.csv > csvfiles/scompanies.csv
curl https://examples.citusdata.com/tutorial/campaigns.csv > csvfiles/campaigns.csv
curl https://examples.citusdata.com/tutorial/ads.csv > csvfiles/ads.csv
- 启动
docker-compose up -d
基本citus 使用
master 节点操作,数据的导入在创建分布式表前后都可以的
- 创建表
CREATE TABLE companies (
id bigint NOT NULL,
name text NOT NULL,
image_url text,
created_at timestamp without time zone NOT NULL,
updated_at timestamp without time zone NOT NULL
);
CREATE TABLE campaigns (
id bigint NOT NULL,
company_id bigint NOT NULL,
name text NOT NULL,
cost_model text NOT NULL,
state text NOT NULL,
monthly_budget bigint,
blacklisted_site_urls text[],
created_at timestamp without time zone NOT NULL,
updated_at timestamp without time zone NOT NULL
);
CREATE TABLE ads (
id bigint NOT NULL,
company_id bigint NOT NULL,
campaign_id bigint NOT NULL,
name text NOT NULL,
image_url text,
target_url text,
impressions_count bigint DEFAULT 0,
clicks_count bigint DEFAULT 0,
created_at timestamp without time zone NOT NULL,
updated_at timestamp without time zone NOT NULL
);
- 添加主键
ALTER TABLE companies
ADD PRIMARY KEY (id);
ALTER TABLE campaigns
ADD PRIMARY KEY (id, company_id);
ALTER TABLE ads
ADD PRIMARY KEY (id, company_id);
- 添加worker 节点
SELECT
master_add_node ('pg-citus-worker',
'5432');
SELECT
master_add_node ('pg-citus-worker2',
'5432');
- 创建分布式表
// 配置副本数为2,因为是2个worker,为了简单,同时测试灾备
SET citus.shard_replication_factor = 2;
SELECT create_distributed_table('companies', 'id');
SELECT create_distributed_table('campaigns', 'company_id');
SELECT create_distributed_table('ads', 'company_id');
灾备处理
前边通过membership-manager 的原理大概也就知道了,就是删除节点的分片原数据,同时删除节点。
但是需要注意数据的迁移的处理(节点添加数据的reblance 才是比较难的)
- 停止worker2
因为副本数为2,所以可以停止一个节点
docker-compose stop pg-citus-worker2
- 删除分片原数据
DELETE FROM pg_dist_placement WHERE groupid = (SELECT groupid FROM pg_dist_node WHERE nodename ='pg-citus-worker2' AND nodeport = '5432' LIMIT 1);
- 移除节点
SELECT master_remove_node('pg-citus-worker2', '5433')
- 数据查询
如果通过数据查询,会发现业务是无感知的,而且不影响数据查询处理
说明
citus 开源版本提供了一些操作函数,我们利用这些函数还是可以方便的维护citus 集群的,如果感觉维护费事,yugabyte 以及cockroachdb
都是可选的方案,但是个人推荐yugabyte
参考资料
http://docs.citusdata.com/en/v9.2/admin_guide/cluster_management.html
https://github.com/citusdata/membership-manager/blob/master/manager.py