bayaim-OceanBase合并问题-工单处理常用SQL1
OceanBase-合并问题-工单处理常用SQL1 全量合并 ------------------------------------------------------------- OB 默认是 增量合并 【发起的指定表的 全量合并 】 1、修改表的模式, 全量合并 https://www.oceanbase.com/knowledge-base/oceanbase-database-20000001021?back=kb 在全量合并过程中,会把当前的静态数据都读取出来,和内存中的动态数据合并后,再写到磁盘上去作为新的静态数据。 可以将表属性 progressive_merge_num 设置为 1,表示将合并方式修改为全量合并。 ALTER TABLE $table SET progressive_merge_num=1; ALTER TABLE xxx SET progressive_merge_num = 1; ----全量合并 用户租户执行一下这个 obclient> alter system set "_pushdown_storage_level" = 0; ---- 下压存储 Query OK, 0 rows affected (0.03 sec) 2、清除所有计划缓存: ALTER SYSTEM FLUSH PLAN CACHE; 3、OB合并触发方式-手动合并 可以在"root@sys"用户下,通过以下命令发起手动合并: alter system major freeze; FLUSH PLAN CACHE https://www.oceanbase.com/docs/common-oceanbase-database-cn-1000000000221273 清除所有计划缓存。 ALTER SYSTEM FLUSH PLAN CACHE; 清除 MySQL 和 Oracle 租户的计划缓存。 ALTER SYSTEM FLUSH PLAN CACHE TENANT = 'MySQL','Oracle'; 在 sys 租户下删除指定的计划缓存。 ALTER SYSTEM FLUSH PLAN CACHE sql_id='B601070DFC14CB85FDA3766A69A9E1B3' databases='myob1' tenant='tenant1' GLOBAL; 普通租户登录 DBA 清除自己的计划缓存。 ALTER SYSTEM FLUSH PLAN CACHE sql_id='B601070DFC14CB85FDA3766A69A9E1B3' databases='myob1' GLOBAL; ------------------------------------------------------------- 【__all_rootservice_event_history】 查看合并历史记录 select * from __all_rootservice_event_history where module='daily_merge' order by 1 desc limit 20; select * from __all_rootservice_event_history where module = "daily_merge" and event like "%merge_error%" order by gmt_create desc limit 1; 【__all_server_event_history】 select * from __all_server_event_history order by gmt_create desc limit 20; select * from __all_server_event_history where gmt_create between '2023-11-14 14:56:00' and '2023-11-14 19:59:10' and value1 like '%我们这个创建索引的关键字%' order by gmt_create desc limit 20 ; 合并转储异常问题处理文档 存储引擎概览 https://yuque.antfin.com/ob/storage/rva6ky***** 存储结构 https://yuque.antfin.com/ob/storage/iqxg9x*** 4.0如何排查合并卡住的问题 https://yuque.antfin.com/ob/rootservice/zt2vp1*** 合并相关问题 https://yuque.antfin.com/ob/transaction/pd3d3n*** 合并慢/合并超时排查手册 https://yuque.antfin.com/ob/gs5qfu/gb5sfq**** 转储的实现 https://yuque.antfin.com/ob/gs5qfu/dfasgv*** 转储合并部分参数说明 https://yuque.antfin.com/ob/gs5qfu/yr9lpd*** compaction问题排查宝典 https://yuque.antfin.com/ob/gs5qfu/zsomi6#yQS2R**** Compaction虚拟表指南 https://yuque.antfin.com/ob/gs5qfu/ag00i4*** 合并问题排查指引 https://yuque.antfin.com/hznvk9/dbm6i7/msx5y6hxegp3i9lb#t6FsT**** 11、合并 —————————————————————————————————————————————— 【查看被合并事件:】查看合并记录和状态: __all_rootservice_event_history表,查看合并记录: __all_zone表, 查看当前合并状态: select * from __all_zone where name = "frozen_version" or name = "last_merged_version"; select * from __all_zone where name = "merge_status"; select * from __all_zone where name = "global_broadcast_version" or name = "broadcast_version"; 一、日志搜查询方法 observer.log : [ 表示活跃事务,导致卡合并 ] memtable cannot be minor merged now(ret=-4288 active_trx_count:1 fgrep 'packet fly cost too much time' observer.log* fgrep "packet fly cost too much time" observer.log* --- OB主机之间时钟偏移过大,OCP也有告警的 fgrep "Too many sstables, minor merge is not safe now" observer.log* --- 1、过滤合并进度 cat observer.log.2020122121*|grep "flush macro"|tail 2、查看是否合并 grep "check merge progress" rootservice.log |less 3、查看没有合并的版本 grep "partition's majority replicas not merged" rootservice.log grep 'WARN ' observer.log |grep merge RS日志: grep "check merge progress success" rootservice.log |tail -10 grep "daily_merge" rootservice.log grep ERROR rootservice.log | more fgrep "partition doesn't have leader" rootservice.log grep "check merge progress success" rootservice.log |tail -10 fgrep "long time no daily merge" rootservice.log fgrep "ERROR" rootservice.log rootservice.log.20230927095125:[2023-09-27 09:51:22.876676] WARN [RS] build_single_pg_leader_candidates (ob_leader_coordinator.cpp:4709) [2994080][0][YB420A08C088-00060170302DDD52-0-0] [lt=4] [dc=0] partition doesn't have leader(pkey={tid:1269935930081506, partition_id:0, part_cnt:0}) ---查询合并情况 grep "WARN " rootservice.log | grep merge grep "SERVER_BLACKLIST" observer.log.20230303* |grep -E 'diff|clog' 1、过滤合并进度 cat observer.log.2020122121*|grep "flush macro"|tail 2、查看是否合并 grep "check merge progress" rootservice.log |less grep "check merge progress success" rootservice.log |tail -10 ----查询合并情况 3、查看没有合并的版本 grep "partition's majority replicas not merged" rootservice.log 4、确认转储/合并是否成功: // 查看是否有转储执行失败的日志, pkey处填分区键 如果memtable未转储 // 日志量少的话,可以直接搜这条日志,用Tenant ID和Tablet ID过滤后,检查是否有失败的转储和合并DAG grep "dag_ret:-" observer.log.xxxx* | vi - grep "dag_ret:-" observer.log | vi - 然后搜是否有无法转储partition的表id // 查看是否有转储执行失败的日志, pkey处填分区键 grep "dag_ret:-.*pkey" observer.log.20xxxx* // 检查转储是否成功生成了对应的转储/合并sstable grep "sstable merge finish" observer.log | vi - 查看合并调度情况: grep "daily.*merge.*ret=-" rootservice.log grep "daily.*merge" rootservice.log grep "ob_daily_merge_scheduler.cpp" rootservice.log 其他 1、过滤合并进度 cat observer.log.2020122121*|grep "flush macro"|tail 2、查看是否合并 grep "check merge progress" rootservice.log |less 3、查看没有合并的版本 grep "partition's majority replicas not merged" rootservice.log 4、 合并是否进行就是看 all_meta_table 根据 data_version group by https://yuque.antfin-inc.com/ob/support/kcrx4b 二、查询SQL: 登录sys系统租户后,先通过Zone状态表查询合并状态,定位到具体合并失败Zone: select * from __all_zone where name in ('merge_status','all_merged_version','last_merged_time'); 将版本号254代入下面查询合并进度SQL查询where条件“major_version”中,即可定位到合并异常server地址: SELECT zone, svr_ip, major_version, macro_block_count, use_old_macro_block_count , merge_start_time, merge_finish_time, merge_process, merge_finish_time - merge_start_time AS cost_time , macro_block_count - use_old_macro_block_count AS merge_macro_block_count , (macro_block_count - use_old_macro_block_count) / (merge_finish_time - merge_start_time) AS avg_per_sec FROM __all_virtual_partition_sstable_image_info WHERE major_version = 438 AND merge_process <> 100 ORDER BY zone, svr_ip, major_version; select * from __all_virtual_tenant_memstore_info where (active_memstore_used > major_freeze_trigger) or (total_memstore_used > memstore_limit); select count(*),tenant_id from oceanbase.__all_virtual_partition_table where role=1 and svr_ip='10.8.148.180' group by tenant_id; select * from __all_virtual_trans_stat where is_exiting = 0 and (now() - ctx_create_time) > 1000; (1.x) select * from __all_meta_table where data_version != 254; select * from __all_root_table where data_version != 254; select * from __all_virtual_core_meta_table where data_version != 254; select * from __all_virtual_core_root_table where data_version != 254; 统计当前集群版本情况: select data_version,count(1) from __all_meta_table group by 1; 统计当前未合并成功版本的分布情况: select zone,svr_ip,count(1) from __all_meta_table where data_version =!440 group by 1,2; (2.x/3.x) select * from __all_virtual_meta_table where data_version != 254; select * from __all_virtual_core_meta_table where data_version != 254; select * from __all_root_table where data_version != 254; select * from __all_virtual_core_root_table where data_version != 254; 统计当前集群版本情况: select data_version,count(1) from __all_virtual_meta_table group by 1; 统计当前未合并成功版本的分布情况: select zone,svr_ip,count(1) from __all_virtual_meta_table where data_version = xxxx group by 1,2; 这里注意区分不同OB版本 是否开启手动合并参数,开启后只能通过手工合并命令进行合并; show parameters like '%enable_manual_merge%'; 值为false为正常 是否开启升级参数,开启后所有合并均被block; show parameters like '%enable_upgrade%'; 值为false为正常 合并进度查看 查看合并状态 : 看每个副本的 合并状态和整体的合并状态: MERGING select * from __all_zone where name like '%merge_status%'; SELECT * FROM __all_zone WHERE name LIKE '%merge%'; select * from __all_zone where name = "is_merge_error" or info = "ERROR"; 查看合并版本号 select * from __all_zone where name = "frozen_version" or name = "last_merged_version"; 查看合并历史记录 select * from __all_rootservice_event_history where module='daily_merge' order by 1 desc limit 20; select * from __all_rootservice_event_history where module = "daily_merge" and event like "%merge_error%" order by gmt_create desc limit 1; 查询合并同步到什么版本: select zone,svr_ip,data_version,count(*) from __all_virtual_meta_table group by 1,2,3; select zone,svr_ip,data_version,table_id,count(*) from __all_virtual_meta_table group by 1,2,3,4; 查看版本未达最新版本号的表 select * from __all_virtual_meta_table where data_version!=499; select svr_ip,count(*) from __all_virtual_meta_table where data_version !=148 group by svr_ip; select svr_ip,count(*) from __all_virtual_meta_table where data_version<>457 group by svr_ip ; 查看系统任务 select * from __all_virtual_sys_task_status; 查看合并调度任务进度 select * from __all_virtual_table_mgr where table_id = '1100611139482207' and partition_id = 0 and svr_ip='10.8.148.48' order by snapshot_version desc; 其中 version 为0表示尚在合并中,可以和 __all_virtual_sys_task_status 表对应,index_id对应status表中comment列table_id 查看有无推进合并点 select * from __all_virtual_table_mgr where table_id = xxx and partition_id = xxx and svr_ip= "xxx" and svr_port = xxx and is_active == 0 and snapshot_version > (select frozen_timestamp from __all_virtual_freeze_info where frozen_version = xxx) 其中frozen_version为当前版本号 如果结果为空,表示SSTable还没有推过Major点,怀疑合并前的强制冻结失败了 如果结果有一个memtable,表示合并需要的数据还有一部分在冻结memtable中,需要转储,查看是否有转储Dag 或 转储失败报错 如果结果有sstable,则应该发起合并,查看是否有合并Dag 或 合并失败报错 如果没有任何有用信息,可以搜索最近一段时间的observer.log的ERROR日志来获取线索 正在合并 #当前计算值是所有节点总和,评估单observer节点膨胀大小时,需要除以对应F、R类型副本数量 select table_id,partition_id,index_id,sum(size)/1024/1024/1024 data_size from (( select svr_ip,table_id,partition_id,index_id,size from __all_virtual_table_mgr where table_type not in (0,1,8) ) union all ( select m1.svr_ip,m1.table_id,m1.partition_id,m1.index_id,size from __all_virtual_table_mgr m1 join ( select svr_ip,table_id,partition_id,index_id,max(multi_version_start) max_version from __all_virtual_table_mgr where table_type=1 group by svr_ip,table_id,partition_id,index_id ) m2 on m1.table_type=1 and m1.svr_ip=m2.svr_ip and m1.table_id=m2.table_id and m1.partition_id=m2.partition_id and m1.index_id=m2.index_id and m1.multi_version_start=m2.max_version )) group by table_id,partition_id,index_id order by data_size desc limit 10; 转储数据过多 #查看集群增量数据大小 select svr_ip,sum(size)/1024/1024/1024 data_size from __all_virtual_table_mgr where table_type!=1 group by svr_ip; 基线数据版本过多 select m1.svr_ip,sum(m1.size)/1024/1024/1024 extra_size from __all_virtual_table_mgr m1 join ( select svr_ip,table_id,partition_id,index_id,max(multi_version_start) max_version,count(*) cnt from __all_virtual_table_mgr where table_type=1 group by svr_ip,table_id,partition_id,index_id having cnt>1 ) m2 on m1.table_type=1 and m1.svr_ip=m2.svr_ip and m1.table_id=m2.table_id and m1.partition_id=m2.partition_id and m1.index_id=m2.index_id and m1.multi_version_start!=m2.max_version group by m1.svr_ip; 通过部分partition异常原因定位获取到的table_id、partition_id、partition_cnt,确认partiton状态 select gmt_modified,svr_ip,svr_port,partition_id,partition_cnt,table_id,role,data_version,row_checksum,data_checksum,column_checksum from __all_(virtual_)meta_table where table_id = xxx and partition_id = xxx; 如果三个副本版本均异常,则优先考虑三节点轮转重启; 如果卡合并所在节点上role=1,即所在节点为leader,需要先对partition进行切主; 切主前检查集群切主参数是否为true: show parameters like 'enable_auto_leader_switch'; show parameters like 'enable_auto_leader_switch'; 进行partition切主(推荐通过设置租户和表的primary_zone来切主): alter system switch replica leader PARTITION_ID='partition_id%partition_cnt@table_id' server='ipadderss:port'; -这里的ipadress是要切到的目标server ip 检查切主是否成功: select gmt_modified,svr_ip,svr_port,partition_id,partition_cnt,table_id,role,data_version,row_checksum,data_checksum,column_checksum from __all_(virtual_)meta_table where table_id = xxx and partition_id = xxx; 预期role=1的记录应该是要切到的目标server ip 切主后检查集群补副本参数是否为true: show parameters like 'enable_rereplication'; 强制删除异常 partition: alter system drop replica PARTITION_ID='partition_id%partition_cnt@table_id' server='ipadderss:port'; -这里的ipaddress是异常server ip 等待集群自动补副本,可通过查询 __all_rootservice_event_history 查看进度 补副本完成后,检查副本状态: select gmt_modified,svr_ip,svr_port,partition_id,partition_cnt,table_id,role,row_checksum,data_checksum,column_checksum from __all_(virtual_)meta_table where table_id = xxx and partition_id = xxx; 新副本gmt_create为刚刚补副本时间 恢复leader设置,正常rs会自动切到切主前的leader,如果设置了租户或表的primary_zone,需要根据情况恢复设置。 此类异常一般通过alter操作,将异常参数配置正常即可立即恢复,一般操作格式为: alter system set `parameter_name`= 'True'; 如果日常合并未预期执行,可以检查下列sys租户系统参数是不是默认值: enable_manual_merge 默认为false; enable_major_freeze 默认为true; major_freeze_duty_time 默认为“02:00”; enable_upgrade_mode 默认为false; =================================================== 【checksum 问题】 1、查询哪个表 data_checksum 不一致的问题: select a.data_checksum,a.table_id,a.partition_id, a.svr_ip,a.role,a.row_count,a.data_version from oceanbase.__all_virtual_meta_table a where a.table_id=xxxxxx 2、 通过部分partition异常原因定位获取到的table_id、partition_id、partition_cnt,确认partiton状态 select gmt_modified,svr_ip,svr_port,partition_id,partition_cnt,table_id,role,data_version,row_checksum,data_checksum,column_checksum from __all_(virtual_)meta_table where table_id = xxx and partition_id = xxx; 补副本完成后,检查副本状态: 新副本gmt_create为刚刚补副本时间 恢复leader设置,正常rs会自动切到切主前的leader,如果设置了租户或表的primary_zone,需要根据情况恢复设置。 3、 查看 checksum 不一致的宏块 data checksum: select A.macro_idx_in_sstable, A.row_count, A.svr_ip, A.svr_port, A.macro_idx_in_data_file, A.data_checksum from __all_virtual_partition_sstable_macro_info as A join (select macro_idx_in_sstable, avg(row_count) as avg_row_count, avg(data_checksum) as avg_checksum from __all_virtual_partition_sstable_macro_info where table_id = xxxxx and partition_id = xxx and data_version = 351 group by macro_idx_in_sstable) as B on A.macro_idx_in_sstable = B.macro_idx_in_sstable and (A.data_checksum != B.avg_checksum or A.row_count != B.avg_row_count) where A.table_id = xxxxxx and A.partition_id = xxx and A.data_version = 351; 4、 column checksum: select * from __all_virtual_storage_stat where table_id = 1100611139498320 and partition_id=0 and major_version = 351\G select * from __all_virtual_storage_stat where table_id = 1101710651162818 and partition_id=1 and major_version = 845\G 5、 SELECT * from __all_virtual_partition_sstable_image_info order by 1,2,3; SELECT * from __all_sstable_checksum ; SELECT * from __all_sstable_column_checksum ; SELECT * from __all_tenant_sstable_column_checksum ; 6、 查看 checksum 不一致的宏块 select A.macro_idx_in_sstable, A.row_count, A.svr_ip, A.svr_port, A.macro_idx_in_data_file, A.data_checksum from __all_virtual_partition_sstable_macro_info as A join (select macro_idx_in_sstable, avg(row_count) as avg_row_count, avg(data_checksum) as avg_checksum from __all_virtual_partition_sstable_macro_info where table_id = 1101710651162818 and partition_id = 8 and data_version = 804 group by macro_idx_in_sstable) as B on A.macro_idx_in_sstable = B.macro_idx_in_sstable and (A.data_checksum != B.avg_checksum or A.row_count != B.avg_row_count) where A.table_id = 1101710651162818 and A.partition_id = 8 and A.data_version = 804; 【当有事务卡着,导致合并超时】: 执行如下操作,看下是否有活跃事务: SELECT count(1) FROM __all_virtual_trans_stat WHERE part_trans_action<=2 AND ctx_create_time < date_sub(now(), INTERVAL 600 SECOND) AND is_exiting != 1; 执行如下操作,看下是否有悬挂事务: SELECT count(1) FROM __all_virtual_trans_stat WHERE part_trans_action > 2 AND ctx_create_time < date_sub(now(), INTERVAL 600 SECOND) AND is_exiting != 1; 如有,可参考该文档处理:https://yuque.antfin.com/ob/transaction/gsuo05?singleDoc# 《悬挂事务&长事务应急文档》 另外,也许关注下ocp是否有异常告警 SELECT table_id,table_name FROM __all_virtual_table WHERE data_table_id = '1139094046426857'; 查询具体的表名: select table_name from __all_virtual_table where table_id in ( select /*+READ_CONSISTENCY(WEAK),query_timeout(100000000) */ table_id from __all_virtual_meta_table where data_version != (select value from __all_zone where name = 'global_broadcast_version')); 一个是表TUFUNDBUYDTL,另一个是索引,就是表TUFUNDBUYDTL的索引。 索引状态为1,是初始化状态,有问题。 select index_status from __all_virtual_table where table_id in ( select /*+READ_CONSISTENCY(WEAK),query_timeout(100000000) */ table_id from __all_virtual_meta_table where data_version != (select value from __all_zone where name = 'global_broadcast_version')); 让客户删除掉这个索引后,合并完成。 ---OBCP 考题 ACD 29、在运维过程中,发现集群的合并时间过长,通过哪些手段 提升合并速度? A、轮转合并下调大zone_merge_concurrency【正确答案】concurrency:并发性 B、调小 zone_merge_timeout C、调大merge_thread_count 【正确答案】 D、关闭轮转合并 【正确答案】 ----bayaim 工单处理: 加速合并建议: 1、适当轮转合并下调大 zone_merge_concurrency -- zone_merge_concurrency 用于设置在合并时,支持多少个 Zone 并发。当值为 0 时,由系统根据部署情况自动选择最佳并发度。 alter system set zone_merge_concurrency=3; --设置每日合并的并发度; 2、调大 merge_thread_count、 -- 设置合并线程数,控制可以同时执行合并的分区个数; alter system set merge_thread_count=32; 3、关闭轮转合并 alter system set enable_manual_merge = false; -- 关闭手动合并 参考网址: https://www.oceanbase.com/docs/enterprise-oceanbase-database-cn-10000000000946266 查看集群合并进度: SELECT zone, svr_ip, major_version, macro_block_count, use_old_macro_block_count, merge_start_time, merge_finish_time, merge_process, ( merge_finish_time - merge_start_time ) AS cost_time, ( macro_block_count - use_old_macro_block_count ) AS merge_macro_block_count, ( macro_block_count - use_old_macro_block_count ) / ( merge_finish_time - merge_start_time ) AS avg_per_sec FROM __all_virtual_partition_sstable_image_info ORDER BY zone,svr_ip,major_version; 查看集群合并耗时: data_version 为-1表示当前没有进行任何合并操作,即没有生成新的 data_version 值。 select t.data_version,t.zone,t.merge_begin_time,u.merge_end_time,timestampdiff(minute, t.merge_begin_time, u.merge_end_time) merge_time_minute from (select value1 zone,value2 data_version,gmt_create merge_begin_time From __all_rootservice_event_history where module='daily_merge' and event<>'admin_set_config' and event='start_merge' ) t, (select value2 zone,value1 data_version,gmt_create merge_end_time From __all_rootservice_event_history where module='daily_merge' and event<>'admin_set_config' and event='merge_succeed') u where t.zone=u.zone and t.data_version=u.data_version order by 3 desc limit 10; 查看集群当日转储次数: select max(gmt_create), svr_ip, value1 as tenant_id, count(1) from __all_server_event_history where event = 'do minor freeze success' and gmt_create between curdate() and now() group by svr_ip,tenant_id order by gmt_create desc; --合并和转储内存 select /*+ READ_CONSISTENCY(WEAK),query_timeout(100000000) */ TENANT_ID,IP, round(ACTIVE/1024/1024/1024,2)ACTIVE_GB, round(TOTAL/1024/1024/1024,2) TOTAL_GB, round(FREEZE_TRIGGER/1024/1024/1024,2) FREEZE_TRIGGER_GB, round(TOTAL/FREEZE_TRIGGER*100,2) percent_trigger, round(MEM_LIMIT/1024/1024/1024,2) MEM_LIMIT_GB from gv$memstore where tenant_id >1000 or TENANT_ID=1 order by tenant_id,TOTAL_GB desc; ACTIVE:当前活跃的 MEMTable 的内存占用大小,单位为字节。 TOTAL:当前所有 MEMTable 的内存占用大小,单位为字节。 FREEZE_TRIGGER:触发 MEMTable 冻结的内存大小,单位为字节。 MEM_LIMIT:MEMTable 的内存大小限制,单位:字节。 TOTAL_GB是否已经达到MEM_LIMIT_GB,即已经将MemStore全部写满
-----因版权限制,暂且罗列展示如此
-----2024年8月26日18:37:39
【欢迎关注公众号】:database运维