galera cluster DDL节点间状态不一致的问题
没事用mysql标准半同步、HA环境不要瞎折腾使用存储过程、触发器或galera cluster。
近期某个系统中的galera cluseter环境发生A DDL操作后,B节点未同步的情况,同时B节点的errorlog中有如下警告信息:
2016-07-23 17:31:32 18920 [Warning] WSREP: RBR event 1 Query apply warning: 1, 7866890
2016-07-23 17:31:32 18920 [Warning] WSREP: Ignoring error for TO isolated action: source: cc757ba3-4e3c-11e6-8893-8b18f5f1ec79 version: 3 local: 0 state: APPLYING flags: 65 conn_id: 814 trx_id: -1 seqnos (l: 207, g: 7866890, s: 7866889, d: 7866889, ts: 5701897716366444)
2016-07-25 15:58:17 18920 [ERROR] Slave SQL: Error 'Unknown table 'hs_member.mem'' on query. Default database: 'hs_member'. Query: 'DROP TABLE `mem`', Error_code: 1051
2016-07-25 15:58:17 18920 [Warning] WSREP: RBR event 1 Query apply warning: 1, 7867308
2016-07-25 15:58:17 18920 [Warning] WSREP: Ignoring error for TO isolated action: source: cc757ba3-4e3c-11e6-8893-8b18f5f1ec79 version: 3 local: 0 state: APPLYING flags: 65 conn_id: 1527 trx_id: -1 seqnos (l: 628, g: 7867308, s: 7867307, d: 7867307, ts: 5869102727335473)
根据官方所述:
SCHEMA UPGRADES
Any DDL statement that runs for the database, such as CREATE TABLE or GRANT, upgrades the schema. These DDL statements change the database itself and are non-transactional.
Galera Cluster processes schema upgrades in two different methods:
- Total Order Isolation (TOI) Where the schema upgrades run on all cluster nodes in the same total order sequence, preventing other transations from committing for the duration of the operation.
- Rolling Schema Upgrade (RSU) Where the schema upgrades run locally, affecting only the node on which they are run. The changes do not replicate to the rest of the cluster.
You can set the method for online schema upgrades by using the wsrep_OSU_method parameter in the configuration file, (my.ini or my.cnf, depending on your build) or through the MySQL client. Galera Cluster defaults to the Total Order Isolation method.
根据http://galeracluster.com/documentation-webpages/schemaupgrades.html的说明,在TOI模式下,galera cluster会等待当前未提交的事务提交,然后复制DDL到所有的节点确保一致性。
而事实上,根据实际的情况来看,Galera先复制DDL到了其他节点,并被执行,而此时本地尚未执行。如果本地节点因为某种原因比如长时间具有未提交的dml,则可能因为被取消或者超时而导致本地节点最后DDL执行失败。这就会出现galera cluster节点间schema不一致的情况。
【推荐】国内首个AI IDE,深度理解中文开发场景,立即下载体验Trae
【推荐】编程新体验,更懂你的AI,立即体验豆包MarsCode编程助手
【推荐】抖音旗下AI助手豆包,你的智能百科全书,全免费不限次数
【推荐】轻量又高性能的 SSH 工具 IShell:AI 加持,快人一步
· 开发者必知的日志记录最佳实践
· SQL Server 2025 AI相关能力初探
· Linux系列:如何用 C#调用 C方法造成内存泄露
· AI与.NET技术实操系列(二):开始使用ML.NET
· 记一次.NET内存居高不下排查解决与启示
· 被坑几百块钱后,我竟然真的恢复了删除的微信聊天记录!
· 没有Manus邀请码?试试免邀请码的MGX或者开源的OpenManus吧
· 【自荐】一款简洁、开源的在线白板工具 Drawnix
· 园子的第一款AI主题卫衣上架——"HELLO! HOW CAN I ASSIST YOU TODAY
· Docker 太简单,K8s 太复杂?w7panel 让容器管理更轻松!