Python用来监听解析Mysql Binlog的模块---python-mysql-replication
Pure Python Implementation of MySQL replication protocol build on top of PyMYSQL. This allow you to receive event like insert, update, delete with their datas and raw SQL queries.
python-mysql-replication 是基于python实现的 MySQL复制协议工具,我们可以用它来解析binlog 获取日志的insert,update,delete等事件 ,并基于此做其他业务需求。比如数据更改时失效缓存,监听dml事件通知下游业务方做对应处理。
一.使用场景
1.MySQL to NoSQL database replication
2.MySQL to search engine replication
3.Invalidate cache when something change in database
4.Audit
5.Real time analytics
二.主要模块
1.BinLogStreamReader
python-mysql-replication
的入口的类。这个类支持用户传入mysql配置,slave需要同步的信息等,同时实现了__iter__,注册slave,读packet。
使用该工具时需要实例化一个BinLogStreamReader()对象 stream。BinLogStreamReader 通过 ReportSlave 向主库注册作为一个slave角色,用于接受MySQL的binlog广播 。
文件名字为 binlogstream.py
2.BinLogPacketWrapper
BinLogPacketWrapper类,mysql网络包序列化和反序列化。
存放的文件为packet.py
3.事件类
各个event(select、update、insert、delete、rollback、heartbeat等)对应的实现类,全都继承子BinlogEvent,在BinLogPacketWrapper类中把获取到的event映射到对应的evnet处理类
存放的文件为event.py
4.依赖的基础文件
pymysql中的connnections.py:Connection类,实现连接、读写mysql包(具体包格式由protocol实现)。
pymysql中的protocol.py:MysqlPackge类,具体包的格式和读写。
三.已有功能实现(案例)
Projects | Remark | Github |
binlog2sql | a popular binlog parser that could convert raw binlog to sql and also could generate flashback sql from raw binlog | https://github.com/danfengcao/binlog2sql |
mymongo | MySQL to mongo replication | https://github.com/njordr/mymongo |
MySQLStreamer | MySQLStreamer is a database change data capture and publish system | https://github.com/Yelp/mysql_streamer |
BitSwanPump | A real-time stream processor | https://github.com/LibertyAces/BitSwanPump |
Aventri MySQL Monitor | MySQL Monitor is an application which continuously watches a MySQL database for data updates and publishes information about those changes to a message queue on RabbitMQ. | https://github.com/aventri/mysql-monitor |
elasticsearch-river-mysql | MySQL River Plugin for ElasticSearch | https://github.com/scharron/elasticsearch-river-mysql |
pg_chameleon | Migration and replica from MySQL to PostgreSQL | https://github.com/the4thdoctor/pg_chameleon |
python-mysql-eventprocessor | Daemon interface for handling MySQL binary log events. | https://github.com/jffifa/python-mysql-eventprocessor |
Python MySQL Replication Blinker | This package read events from MySQL binlog and send to blinker's signal. | https://github.com/tarzanjw/python-mysql-replication-blinker |
四.注意事项
1.权限:
可以直接使用复制账号也可以使用其他账号,但是该账号必须 SELECT, REPLICATION SLAVE, REPLICATION CLIENT 权限
2.数据库日志相关的参数
log_bin=on ,binlog_format=row,binlog_row_image=FULL
3.用作表级别同步
除了解析binlog,还可以通过python-mysql-replication
做数据全量加增量迁移。
例如仅仅迁移某些大表而不是整个库的时候,先dump,再增量实时同步
五. 核心类(核心代码)
5.1 BinLogStreamReader
主要功能是:Connect to replication stream and read event。
可以通过参数指定(或排除)部分库和部分表,减少压力。
""" Attributes: ctl_connection_settings: Connection settings for cluster holding schema information resume_stream: Start for event from position or the latest event of binlog or from older available event blocking: When master has finished reading/sending binlog it will send EOF instead of blocking connection. only_events: Array of allowed events ignored_events: Array of ignored events log_file: Set replication start log file log_pos: Set replication start log pos (resume_stream should be true) end_log_pos: Set replication end log pos auto_position: Use master_auto_position gtid to set position only_tables: An array with the tables you want to watch (only works in binlog_format ROW) ##数组的形式['ACCC','BCC','CCC'] ignored_tables: An array with the tables you want to skip ##数组的形式['AAA','BBB','CCC'] only_schemas: An array with the schemas you want to watch ##数组的形式['???','??','??'] ignored_schemas: An array with the schemas you want to skip ##数组的形式['???','???','???'] freeze_schema: If true do not support ALTER TABLE. It's faster. skip_to_timestamp: Ignore all events until reaching specified timestamp. report_slave: Report slave in SHOW SLAVE HOSTS. slave_uuid: Report slave_uuid or replica_uuid in SHOW SLAVE HOSTS(MySQL 8.0.21-) or SHOW REPLICAS(MySQL 8.0.22+) depends on your MySQL version. slave_heartbeat: (seconds) Should master actively send heartbeat on connection. This also reduces traffic in GTID replication on replication resumption (in case many event to skip in binlog). See MASTER_HEARTBEAT_PERIOD in mysql documentation for semantics is_mariadb: Flag to indicate it's a MariaDB server, used with auto_position to point to Mariadb specific GTID. annotate_rows_event: Parameter value to enable annotate rows event in mariadb, used with 'is_mariadb' ignore_decode_errors: If true, any decode errors encountered when reading column data will be ignored. verify_checksum: If true, verify events read from the binary log by examining checksums. enable_logging: When set to True, logs various details helpful for debugging and monitoring When set to False, logging is disabled to enhance performance. """
binlog的event类型也可以显式定义,没有输入参数的化,就是默认的所有类型
if only_events is not None: events = set(only_events) else: events = set( ( QueryEvent, RotateEvent, StopEvent, FormatDescriptionEvent, XAPrepareEvent, XidEvent, GtidEvent, BeginLoadQueryEvent, ExecuteLoadQueryEvent, UpdateRowsEvent, WriteRowsEvent, DeleteRowsEvent, TableMapEvent, HeartbeatLogEvent, NotImplementedEvent, MariadbGtidEvent, RowsQueryLogEvent, MariadbAnnotateRowsEvent, RandEvent, MariadbStartEncryptionEvent, MariadbGtidListEvent, MariadbBinLogCheckPointEvent, UserVarEvent, PreviousGtidsEvent, PartialUpdateRowsEvent, ) )
六.学习网址
1. https://github.com/julien-duponchelle/python-mysql-replication
2.使用 mysql-replication python监听mysql binlog 实时同步数据
【推荐】编程新体验,更懂你的AI,立即体验豆包MarsCode编程助手
【推荐】凌霞软件回馈社区,博客园 & 1Panel & Halo 联合会员上线
【推荐】抖音旗下AI助手豆包,你的智能百科全书,全免费不限次数
【推荐】博客园社区专享云产品让利特惠,阿里云新客6.5折上折
【推荐】轻量又高性能的 SSH 工具 IShell:AI 加持,快人一步
· 一个费力不讨好的项目,让我损失了近一半的绩效!
· 清华大学推出第四讲使用 DeepSeek + DeepResearch 让科研像聊天一样简单!
· 实操Deepseek接入个人知识库
· CSnakes vs Python.NET:高效嵌入与灵活互通的跨语言方案对比
· Plotly.NET 一个为 .NET 打造的强大开源交互式图表库