Python用来监听解析Mysql Binlog的模块---python-mysql-replication

Pure Python Implementation of MySQL replication protocol build on top of PyMYSQL. This allow you to receive event like insert, update, delete with their datas and raw SQL queries.

python-mysql-replication 是基于python实现的 MySQL复制协议工具,我们可以用它来解析binlog 获取日志的insert,update,delete等事件 ,并基于此做其他业务需求。比如数据更改时失效缓存,监听dml事件通知下游业务方做对应处理。

一.使用场景

1.MySQL to NoSQL database replication

2.MySQL to search engine replication

3.Invalidate cache when something change in database

4.Audit

5.Real time analytics

二.主要模块

1.BinLogStreamReader

python-mysql-replication 的入口的类。这个类支持用户传入mysql配置,slave需要同步的信息等,同时实现了__iter__,注册slave,读packet。

使用该工具时需要实例化一个BinLogStreamReader()对象 stream。BinLogStreamReader 通过 ReportSlave 向主库注册作为一个slave角色,用于接受MySQL的binlog广播 。

文件名字为 binlogstream.py

2.BinLogPacketWrapper

BinLogPacketWrapper类,mysql网络包序列化和反序列化。

存放的文件为packet.py

3.事件类

各个event(select、update、insert、delete、rollback、heartbeat等)对应的实现类,全都继承子BinlogEvent,在BinLogPacketWrapper类中把获取到的event映射到对应的evnet处理类

存放的文件为event.py

4.依赖的基础文件

pymysql中的connnections.py:Connection类,实现连接、读写mysql包(具体包格式由protocol实现)。

pymysql中的protocol.py:MysqlPackge类,具体包的格式和读写。

三.已有功能实现(案例)

 Projects Remark  Github 
binlog2sql  a popular binlog parser that could convert raw binlog to sql and also could generate flashback sql from raw binlog https://github.com/danfengcao/binlog2sql
mymongo MySQL to mongo replication https://github.com/njordr/mymongo
MySQLStreamer MySQLStreamer is a database change data capture and publish system  https://github.com/Yelp/mysql_streamer
BitSwanPump  A real-time stream processor https://github.com/LibertyAces/BitSwanPump
Aventri MySQL Monitor  MySQL Monitor is an application which continuously watches a MySQL database for data updates and publishes information about those changes to a message queue on RabbitMQ. https://github.com/aventri/mysql-monitor
elasticsearch-river-mysql MySQL River Plugin for ElasticSearch https://github.com/scharron/elasticsearch-river-mysql
pg_chameleon Migration and replica from MySQL to PostgreSQL  https://github.com/the4thdoctor/pg_chameleon
python-mysql-eventprocessor Daemon interface for handling MySQL binary log events.  https://github.com/jffifa/python-mysql-eventprocessor
Python MySQL Replication Blinker This package read events from MySQL binlog and send to blinker's signal.  https://github.com/tarzanjw/python-mysql-replication-blinker 

四.注意事项

1.权限:

可以直接使用复制账号也可以使用其他账号,但是该账号必须 SELECT, REPLICATION SLAVE, REPLICATION CLIENT 权限

2.数据库日志相关的参数

log_bin=on ,binlog_format=row,binlog_row_image=FULL

3.用作表级别同步

除了解析binlog,还可以通过python-mysql-replication做数据全量加增量迁移。

例如仅仅迁移某些大表而不是整个库的时候,先dump,再增量实时同步

五. 核心类(核心代码)

5.1 BinLogStreamReader

主要功能是:Connect to replication stream and read event。

  可以通过参数指定(或排除)部分库和部分表,减少压力。

复制代码
        """
        Attributes:
            ctl_connection_settings: Connection settings for cluster holding
                                     schema information
            resume_stream: Start for event from position or the latest event of
                           binlog or from older available event
            blocking: When master has finished reading/sending binlog it will
                      send EOF instead of blocking connection.
            only_events: Array of allowed events
            ignored_events: Array of ignored events
            log_file: Set replication start log file
            log_pos: Set replication start log pos (resume_stream should be
                     true)
            end_log_pos: Set replication end log pos
            auto_position: Use master_auto_position gtid to set position
            only_tables: An array with the tables you want to watch (only works
                         in binlog_format ROW)                            ##数组的形式['ACCC','BCC','CCC']
            ignored_tables: An array with the tables you want to skip     ##数组的形式['AAA','BBB','CCC']
            only_schemas: An array with the schemas you want to watch     ##数组的形式['???','??','??']
            ignored_schemas: An array with the schemas you want to skip   ##数组的形式['???','???','???']
            freeze_schema: If true do not support ALTER TABLE. It's faster.
            skip_to_timestamp: Ignore all events until reaching specified
                               timestamp.
            report_slave: Report slave in SHOW SLAVE HOSTS.
            slave_uuid: Report slave_uuid or replica_uuid in SHOW SLAVE HOSTS(MySQL 8.0.21-) or
                        SHOW REPLICAS(MySQL 8.0.22+) depends on your MySQL version.
            slave_heartbeat: (seconds) Should master actively send heartbeat on
                             connection. This also reduces traffic in GTID
                             replication on replication resumption (in case
                             many event to skip in binlog). See
                             MASTER_HEARTBEAT_PERIOD in mysql documentation
                             for semantics
            is_mariadb: Flag to indicate it's a MariaDB server, used with auto_position
                    to point to Mariadb specific GTID.
            annotate_rows_event: Parameter value to enable annotate rows event in mariadb,
                    used with 'is_mariadb'
            ignore_decode_errors: If true, any decode errors encountered
                                  when reading column data will be ignored.
            verify_checksum: If true, verify events read from the binary log by examining checksums.
            enable_logging: When set to True, logs various details helpful for debugging and monitoring
                            When set to False, logging is disabled to enhance performance.
        """
复制代码

binlog的event类型也可以显式定义,没有输入参数的化,就是默认的所有类型

复制代码
        if only_events is not None:
            events = set(only_events)
        else:
            events = set(
                (
                    QueryEvent,
                    RotateEvent,
                    StopEvent,
                    FormatDescriptionEvent,
                    XAPrepareEvent,
                    XidEvent,
                    GtidEvent,
                    BeginLoadQueryEvent,
                    ExecuteLoadQueryEvent,
                    UpdateRowsEvent,
                    WriteRowsEvent,
                    DeleteRowsEvent,
                    TableMapEvent,
                    HeartbeatLogEvent,
                    NotImplementedEvent,
                    MariadbGtidEvent,
                    RowsQueryLogEvent,
                    MariadbAnnotateRowsEvent,
                    RandEvent,
                    MariadbStartEncryptionEvent,
                    MariadbGtidListEvent,
                    MariadbBinLogCheckPointEvent,
                    UserVarEvent,
                    PreviousGtidsEvent,
                    PartialUpdateRowsEvent,
                )
            )
复制代码

六.学习网址

1. https://github.com/julien-duponchelle/python-mysql-replication

2.使用 mysql-replication python监听mysql binlog 实时同步数据

https://blog.csdn.net/mtl1994/article/details/115555254

posted @   东山絮柳仔  阅读(3022)  评论(0编辑  收藏  举报
相关博文:
阅读排行:
· 一个费力不讨好的项目,让我损失了近一半的绩效!
· 清华大学推出第四讲使用 DeepSeek + DeepResearch 让科研像聊天一样简单!
· 实操Deepseek接入个人知识库
· CSnakes vs Python.NET:高效嵌入与灵活互通的跨语言方案对比
· Plotly.NET 一个为 .NET 打造的强大开源交互式图表库
点击右上角即可分享
微信分享提示