MySQL 8.0 Reference Manual (Reading Notes, Part 92 -- Replication (3))
1. Replication Purposes -- Using Replication for Backups
Replication can be used in many different environments for a range of purposes.
To use replication as a backup solution, replicate data from the source to a replica, and then back up the replica. The replica can be paused and shut down without affecting the running operation of the source, so you can produce an effective snapshot of “live” data that would otherwise require the source to be shut down.
How you back up a database depends on its size and whether you are backing up only the data, or the data and the replica state so that you can rebuild the replica in the event of failure. There are therefore two choices:
• If you are using replication as a solution to enable you to back up the data on the source, and the size of your database is not too large, the mysqldump tool may be suitable.
• For larger databases, where mysqldump would be impractical or inefficient, you can back up the raw data files instead. Using the raw data files option also means that you can back up the binary and relay logs that make it possible to re-create the replica in the event of a replica failure.
Another backup strategy, which can be used for either source or replica servers, is to put the server in a read-only state. The backup is performed against the read-only server, which then is changed back to its usual read/write operational status.
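As a rough illustration of this read-only backup strategy, the sequence below is a minimal sketch: it assumes the backup itself (mysqldump, a file-level copy, or another tool) is taken from a separate session while the server stays read-only.

mysql> FLUSH TABLES WITH READ LOCK;
mysql> SET GLOBAL read_only = ON;
-- take the backup from a separate session while the server is read-only
mysql> SET GLOBAL read_only = OFF;
mysql> UNLOCK TABLES;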
2. Handling an Unexpected Halt of a Replica
In order for replication to be resilient to unexpected halts of the server (sometimes described as crash-safe) it must be possible for the replica to recover its state before halting. This section describes the impact of an unexpected halt of a replica during replication, and how to configure a replica for the best chance of recovery to continue replication.
After an unexpected halt of a replica, upon restart the replication SQL thread must recover information about which transactions have been executed already. The information required for recovery is stored in the replica's applier metadata repository. From MySQL 8.0, this repository is created by default as an InnoDB table named mysql.slave_relay_log_info. By using this transactional storage engine the information is always recoverable upon restart. Updates to the applier metadata repository are committed together with the transactions, meaning that the replica's progress information recorded in that repository is always consistent with what has been applied to the database, even in the event of an unexpected server halt.
DML transactions and also atomic DDL update the replication positions in the replica's applier metadata repository in the mysql.slave_relay_log_info table together with applying the changes to the database, as an atomic operation. In all other cases, including DDL statements that are not fully atomic, and exempted storage engines that do not support atomic DDL, the mysql.slave_relay_log_info table might be missing updates associated with replicated data if the server halts unexpectedly. Restoring updates in this case is a manual process.
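If you want to check what the applier metadata repository currently records, the table can be queried directly; this is just a read-only inspection, and the exact set of columns varies slightly between 8.0 releases.

mysql> SELECT * FROM mysql.slave_relay_log_info\G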
The recovery process by which a replica recovers from an unexpected halt varies depending on the configuration of the replica. The details of the recovery process are influenced by the chosen method of replication, whether the replica is single-threaded or multithreaded, and the setting of relevant system variables. The overall aim of the recovery process is to identify what transactions had already been applied on the replica's database before the unexpected halt occurred, and retrieve and apply the transactions that the replica missed following the unexpected halt.
• For GTID-based replication, the recovery process needs the GTIDs of the transactions that were already received or committed by the replica. The missing transactions can be retrieved from the source using GTID auto-positioning, which automatically compares the source's transactions to the replica's transactions and identifies the missing transactions.
• For file position based replication, the recovery process needs an accurate replication SQL thread (applier) position showing the last transaction that was applied on the replica. Based on that position, the replication I/O thread (receiver) retrieves from the source's binary log all of the transactions that should be applied on the replica from that point on.
Using GTID-based replication makes it easiest to configure replication to be resilient to unexpected halts. GTID auto-positioning means the replica can reliably identify and retrieve missing transactions, even if there are gaps in the sequence of applied transactions.
The following information provides combinations of settings that are appropriate for different types of replica to guarantee recovery as far as this is under the control of replication.
The following combination of settings on a replica is the most resilient to unexpected halts:
• When GTID-based replication is in use (gtid_mode=ON), set SOURCE_AUTO_POSITION=1 | MASTER_AUTO_POSITION=1, which activates GTID auto-positioning for the connection to the source to automatically identify and retrieve missing transactions. This option is set using a CHANGE REPLICATION SOURCE TO statement (from MySQL 8.0.23) or CHANGE MASTER TO statement (before MySQL 8.0.23). If the replica has multiple replication channels, you need to set this option for each channel individually. When file position based replication is in use, SOURCE_AUTO_POSITION=1 | MASTER_AUTO_POSITION=1 is not used, and instead the binary log position or relay log position is used to control where replication starts.
• From MySQL 8.0.27, when GTID-based replication is in use (gtid_mode=ON), set GTID_ONLY=1, which makes the replica use only GTIDs in the recovery process, and stop persisting binary log and relay log file names and file positions in the replication metadata repositories. This option is set using a CHANGE REPLICATION SOURCE TO statement. If the replica has multiple replication channels, you need to set this option for each channel individually. With GTID_ONLY=1, during recovery, the file position information is ignored and GTID auto-skip is used to skip transactions that have already been supplied, rather than identifying the correct file position. This strategy is more efficient provided that you purge relay logs using the default setting for relay_log_purge, which means only one relay log file needs to be inspected.
• Set sync_relay_log=1, which instructs the replication receiver thread to synchronize the relay log to disk after each received transaction is written to it. This means the replica's record of the current position read from the source's binary log (in the applier metadata repository) is never ahead of the record of transactions saved in the relay log. Note that although this setting is the safest, it is also the slowest due to the number of disk writes involved. With sync_relay_log > 1, or sync_relay_log=0 (where synchronization is handled by the operating system), in the event of an unexpected halt of a replica there might be committed transactions that have not been synchronized to disk. Such transactions can cause the recovery process to fail if the recovering replica, based on the information it has in the relay log as last synchronized to disk, tries to retrieve and apply the transactions again instead of skipping them. Setting sync_relay_log=1 is particularly important for a multi-threaded replica, where the recovery process fails if gaps in the sequence of transactions cannot be filled using the information in the relay log. For a single-threaded replica, the recovery process only needs to use the relay log if the relevant information is not available in the applier metadata repository.
• Set innodb_flush_log_at_trx_commit=1, which synchronizes the InnoDB logs to disk before each transaction is committed. This setting, which is the default, ensures that InnoDB tables and the InnoDB logs are saved on disk so that there is no longer a requirement for the information in the relay log regarding the transaction. Combined with the setting sync_relay_log=1, this setting further ensures that the content of the InnoDB tables and the InnoDB logs is consistent with the content of the relay log at all times, so that purging the relay log files cannot cause unfillable gaps in the replica's history of transactions in the event of an unexpected halt.
• Set relay_log_info_repository = TABLE, which stores the replication SQL thread position in the InnoDB table mysql.slave_relay_log_info, and updates it together with the transaction commit to ensure a record that is always accurate. This setting is the default from MySQL 8.0, and the FILE setting is deprecated. From MySQL 8.0.23, the use of the system variable itself is deprecated, so omit it and allow it to default. If the FILE setting is used, which was the default in earlier releases, the information is stored in a file in the data directory that is updated after the transaction has been applied. This creates a risk of losing synchrony with the source depending at which stage of processing a transaction the replica halts at, or even corruption of the file itself. With the setting relay_log_info_repository = FILE, recovery is not guaranteed.
• Set relay_log_recovery = ON, which enables automatic relay log recovery immediately following server startup. This global variable defaults to OFF and is read-only at runtime, but you can set it to ON with the --relay-log-recovery option at replica startup following an unexpected halt of a replica. Note that this setting ignores the existing relay log files, in case they are corrupted or inconsistent. The relay log recovery process starts a new relay log file and fetches transactions from the source beginning at the replication SQL thread position recorded in the applier metadata repository. The previous relay log files are removed over time by the replica's normal purge mechanism.
For a multithreaded replica, setting relay_log_recovery = ON automatically handles any inconsistencies and gaps in the sequence of transactions that have been executed from the relay log. These gaps can occur when file position based replication is in use. The relay log recovery process deals with gaps using the same method as the START REPLICA UNTIL SQL_AFTER_MTS_GAPS (or before MySQL 8.0.22, START SLAVE instead of START REPLICA) statement would. When the replica reaches a consistent gap-free state, the relay log recovery process goes on to fetch further transactions from the source beginning at the replication SQL thread position. When GTID-based replication is in use, from MySQL 8.0.18 a multithreaded replica checks first whether MASTER_AUTO_POSITION is set to ON, and if it is, omits the step of calculating the transactions that should be skipped or not skipped, so that the old relay logs are not required for the recovery process.
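Putting these recommendations together, a crash-safe GTID-based replica could be configured roughly as follows. This is a minimal sketch for the default channel, assuming MySQL 8.0.27 or later for GTID_ONLY (which also requires REQUIRE_ROW_FORMAT); relay_log_recovery cannot be set at runtime and therefore appears only as a comment.

mysql> SET PERSIST sync_relay_log = 1;
mysql> SET PERSIST innodb_flush_log_at_trx_commit = 1;
-- relay_log_recovery is read-only at runtime: enable it at startup,
-- for example with relay_log_recovery=ON in my.cnf or --relay-log-recovery
mysql> STOP REPLICA;
mysql> CHANGE REPLICATION SOURCE TO
    ->   SOURCE_AUTO_POSITION = 1,
    ->   REQUIRE_ROW_FORMAT = 1,
    ->   GTID_ONLY = 1;
mysql> START REPLICA;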
3. Monitoring Row-based Replication
The current progress of the replication applier (SQL) thread when using row-based replication is monitored through Performance Schema instrument stages, enabling you to track the processing of operations and check the amount of work completed and work estimated. When these Performance Schema instrument stages are enabled the events_stages_current table shows stages for applier threads and their progress.
To track progress of all three row-based replication event types (write, update, delete):
• Enable the three Performance Schema stages by issuing:
mysql> UPDATE performance_schema.setup_instruments SET ENABLED = 'YES'
    -> WHERE NAME LIKE 'stage/sql/Applying batch of row changes%';
• Wait for some events to be processed by the replication applier thread and then check progress by looking into the events_stages_current table. For example, to get progress for update events, issue:
mysql> SELECT WORK_COMPLETED, WORK_ESTIMATED FROM performance_schema.events_stages_current
    -> WHERE EVENT_NAME LIKE 'stage/sql/Applying batch of row changes (update)';
• If binlog_rows_query_log_events is enabled, information about queries is stored in the binary log and is exposed in the processlist_info field. To see the original query that triggered this event:
mysql> SELECT db, processlist_state, processlist_info FROM performance_schema.threads
    -> WHERE processlist_state LIKE 'stage/sql/Applying batch of row changes%' AND thread_id = N;
4. Using Replication with Different Source and Replica Storage Engines
It does not matter for the replication process whether the original table on the source and the replicated table on the replica use different storage engine types. In fact, the default_storage_engine system variable is not replicated.
This provides a number of benefits in the replication process in that you can take advantage of different engine types for different replication scenarios. For example, in a typical scale-out scenario, you want to use InnoDB tables on the source to take advantage of the transactional functionality, but use MyISAM on the replicas where transaction support is not required because the data is only read. When using replication in a data-logging environment you may want to use the Archive storage engine on the replica.
Although the default_storage_engine variable is not replicated, be aware that CREATE TABLE and ALTER TABLE statements that include the engine specification are replicated to the replica correctly.
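As an illustration only (the table and database names below are hypothetical), one way to end up with different engines on source and replica is to omit the ENGINE clause in CREATE TABLE on the source and rely on each server's own default_storage_engine, or to convert a table locally on the replica after it has been replicated:

-- on the replica only: tables created without an explicit ENGINE clause
-- will use this engine from now on
mysql> SET GLOBAL default_storage_engine = MyISAM;
-- or convert an existing table directly on the replica
mysql> ALTER TABLE salesdb.report_archive ENGINE = MyISAM;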
5. Using Replication for Scale-Out
You can use replication as a scale-out solution; that is, where you want to split up the load of database queries across multiple database servers, within some reasonable limitations.
Because replication works from the distribution of one source to one or more replicas, using replication for scale-out works best in an environment where you have a high number of reads and low number of writes/updates. Most websites fit into this category, where users are browsing the website, reading articles, posts, or viewing products. Updates only occur during session management, or when making a purchase or adding a comment/message to a forum.
Replication in this situation enables you to distribute the reads over the replicas, while still enabling your web servers to communicate with the source when a write is required.
6. Replicating Different Databases to Different Replicas
There may be situations where you have a single source server and want to replicate different databases to different replicas. For example, you may want to distribute different sales data to different departments to help spread the load during data analysis.
You can achieve this separation by configuring the source and replicas as normal, and then limiting the binary log statements that each replica processes by using the --replicate-wild-do-table configuration option on each replica.
Supplementary note:
You should not use --replicate-do-db for this purpose when using statement-based replication, since statement-based replication causes this option's effects to vary according to the database that is currently selected. This applies to mixed-format replication as well, since this enables some updates to be replicated using the statement-based format. However, it should be safe to use --replicate-do-db for this purpose if you are using row-based replication only, since in this case the currently selected database has no effect on the option's operation.
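As a sketch of the filtering itself (the database name sales is a placeholder), each replica can either be started with the option --replicate-wild-do-table=sales.% or have the filter set at runtime, which requires the replication SQL thread to be stopped:

mysql> STOP REPLICA SQL_THREAD;
mysql> CHANGE REPLICATION FILTER REPLICATE_WILD_DO_TABLE = ('sales.%');
mysql> START REPLICA SQL_THREAD;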
7. Improving Replication Performance
This one is quite interesting: simple, yet not without novelty.
As the number of replicas connecting to a source increases, the load, although minimal, also increases, as each replica uses a client connection to the source. Also, as each replica must receive a full copy of the source's binary log, the network load on the source may also increase and create a bottleneck.
If you are using a large number of replicas connected to one source, and that source is also busy processing requests (for example, as part of a scale-out solution), then you may want to improve the performance of the replication process.
One way to improve the performance of the replication process is to create a deeper replication structure that enables the source to replicate to only one replica, and for the remaining replicas to connect to this primary replica for their individual replication requirements.
For this to work, you must configure the MySQL instances as follows:
• Source 1 is the primary source where all changes and updates are written to the database. Binary logging is enabled on both source servers, which is the default.
• Source 2 is the replica to the server Source 1 that provides the replication functionality to the remainder of the replicas in the replication structure. Source 2 is the only machine permitted to connect to Source 1. Source 2 has the --log-slave-updates option enabled (which is the default). With this option, replication instructions from Source 1 are also written to Source 2's binary log so that they can then be replicated to the true replicas.
• Replica 1, Replica 2, and Replica 3 act as replicas to Source 2, and replicate the information from Source 2, which actually consists of the updates logged on Source 1.
The above solution reduces the client load and the network interface load on the primary source, which should improve the overall performance of the primary source when used as a direct database solution.
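A rough sketch of how the intermediate server and the leaf replicas might be pointed at their respective sources is shown below, using the MySQL 8.0.23+ syntax; the host names, user, and password are placeholders, and GTID auto-positioning is assumed.

-- on Source 2, which relays the changes received from Source 1
mysql> CHANGE REPLICATION SOURCE TO
    ->   SOURCE_HOST = 'source1.example.com',
    ->   SOURCE_USER = 'repl',
    ->   SOURCE_PASSWORD = 'password',
    ->   SOURCE_AUTO_POSITION = 1;
mysql> START REPLICA;

-- on Replica 1, Replica 2, and Replica 3, which replicate from Source 2
mysql> CHANGE REPLICATION SOURCE TO
    ->   SOURCE_HOST = 'source2.example.com',
    ->   SOURCE_USER = 'repl',
    ->   SOURCE_PASSWORD = 'password',
    ->   SOURCE_AUTO_POSITION = 1;
mysql> START REPLICA;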
8. Switching Sources During Failover
You can tell a replica to change to a new source using the CHANGE REPLICATION SOURCE TO statement (prior to MySQL 8.0.23: CHANGE MASTER TO). The replica does not check whether the databases on the source are compatible with those on the replica; it simply begins reading and executing events from the specified coordinates in the new source's binary log. In a failover situation, all the servers in the group are typically executing the same events from the same binary log file, so changing the source of the events should not affect the structure or integrity of the database, provided that you exercise care in making the change.
Replicas should be run with binary logging enabled (the --log-bin option), which is the default. If you are not using GTIDs for replication, then the replicas should also be run with --log-slave-updates=OFF (logging replica updates is the default). In this way, the replica is ready to become a source without restarting the replica mysqld. Assume that you have the structure shown in Figure 17.4, “Redundancy Using Replication, Initial Structure”.
In this diagram, the Source holds the source database, the Replica* hosts are replicas, and the Web Client machines are issuing database reads and writes. Web clients that issue only reads (and would normally be connected to the replicas) are not shown, as they do not need to switch to a new server in the event of failure.
Each MySQL replica (Replica 1, Replica 2, and Replica 3) is a replica running with binary logging enabled, and with --log-slave-updates=OFF. Because updates received by a replica from the source are not written to the binary log when --log-slave-updates=OFF is specified, the binary log on each replica is initially empty. If for some reason Source becomes unavailable, you can pick one of the replicas to become the new source. For example, if you pick Replica 1, all Web Clients should be redirected to Replica 1, which writes the updates to its binary log. Replica 2 and Replica 3 should then replicate from Replica 1.
The reason for running the replica with --log-slave-updates=OFF is to prevent replicas from receiving updates twice in case you cause one of the replicas to become the new source. If Replica 1 has --log-slave-updates enabled, which is the default, it writes any updates that it receives from Source in its own binary log. This means that, when Replica 2 changes from Source to Replica 1 as its source, it may receive updates from Replica 1 that it has already received from Source.
Make sure that all replicas have processed any statements in their relay log. On each replica, issue STOP REPLICA IO_THREAD, then check the output of SHOW PROCESSLIST until you see Has read all relay log. When this is true for all replicas, they can be reconfigured to the new setup. On the replica Replica 1 being promoted to become the source, issue STOP REPLICA and RESET MASTER.
On the other replicas Replica 2 and Replica 3, use STOP REPLICA and CHANGE REPLICATION SOURCE TO SOURCE_HOST='Replica1' or CHANGE MASTER TO MASTER_HOST='Replica1' (where 'Replica1' represents the real host name of Replica 1). To use CHANGE REPLICATION SOURCE TO, add all information about how to connect to Replica 1 from Replica 2 or Replica 3 (user, password, port). When issuing the statement in this scenario, there is no need to specify the name of the Replica 1 binary log file or log position to read from, since the first binary log file and position 4 are the defaults. Finally, execute START REPLICA on Replica 2 and Replica 3.
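Putting the promotion steps together, a minimal sketch looks like this; the host name, user, and password are placeholders, and the default binary log file and position described above are relied on rather than specified.

-- on every replica: let the applier drain the relay log
mysql> STOP REPLICA IO_THREAD;
-- wait until SHOW PROCESSLIST shows "Has read all relay log"

-- on Replica 1, which is being promoted to source
mysql> STOP REPLICA;
mysql> RESET MASTER;

-- on Replica 2 and Replica 3
mysql> STOP REPLICA;
mysql> CHANGE REPLICATION SOURCE TO
    ->   SOURCE_HOST = 'replica1.example.com',
    ->   SOURCE_USER = 'repl',
    ->   SOURCE_PASSWORD = 'password';
mysql> START REPLICA;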
Once the new replication setup is in place, you need to tell each Web Client to direct its statements to Replica 1. From that point on, all updates sent by Web Client to Replica 1 are written to the binary log of Replica 1, which then contains every update sent to Replica 1 since Source became unavailable.
The resulting server structure is shown in Figure 17.5, “Redundancy Using Replication, After Source Failure”.
My impression is that this figure in the manual is not drawn correctly.
When Source becomes available again, you should make it a replica of Replica 1. To do this, issue on Source the same CHANGE REPLICATION SOURCE TO (or CHANGE MASTER TO) statement as that issued on Replica 2 and Replica 3 previously. Source then becomes a replica of Replica 1 and picks up the Web Client writes that it missed while it was offline.
To make Source a source again, use the preceding procedure as if Replica 1 were unavailable and Source were to be the new source. During this procedure, do not forget to run RESET MASTER on Source before making Replica 1, Replica 2, and Replica 3 replicas of Source. If you fail to do this, the replicas may pick up stale writes from the Web Client applications dating from before the point at which Source became unavailable.
You should be aware that there is no synchronization between replicas, even when they share the same source, and thus some replicas might be considerably ahead of others. This means that in some cases the procedure outlined in the previous example might not work as expected. In practice, however, relay logs on all replicas should be relatively close together.
One way to keep applications informed about the location of the source is to have a dynamic DNS entry for the source host. With BIND, you can use nsupdate to update the DNS dynamically.
9. Switching Sources and Replicas with Asynchronous Connection Failover
Beginning with MySQL 8.0.22, you can use the asynchronous connection failover mechanism to automatically establish an asynchronous (source to replica) replication connection to a new source after the existing connection from a replica to its source fails. The asynchronous connection failover mechanism can be used to keep a replica synchronized with multiple MySQL servers or groups of servers that share data. The list of potential source servers is stored on the replica, and in the event of a connection failure, a new source is selected from the list based on a weighted priority that you set.
From MySQL 8.0.23, the asynchronous connection failover mechanism also supports Group Replication topologies, by automatically monitoring changes to group membership and distinguishing between primary and secondary servers. When you add a group member to the source list and define it as part of a managed group, the asynchronous connection failover mechanism updates the source list to keep it in line with membership changes, adding and removing group members automatically as they join or leave. Only online group members that are in the majority are used for connections and obtaining status. The last remaining member of a managed group is not removed automatically even if it leaves the group, so that the configuration of the managed group is kept. However, you can delete a managed group manually if it is no longer needed.
From MySQL 8.0.27, the asynchronous connection failover mechanism also enables a replica that is part of a managed replication group to automatically reconnect to the sender if the current receiver (the primary of the group) fails. This feature works with Group Replication, on a group configured in single-primary mode, where the group’s primary is a replica that has a replication channel using the mechanism. The feature is designed for a group of senders and a group of receivers to keep synchronized with each other even when some members are temporarily unavailable. It also synchronizes a group of receivers with one or more senders that are not part of a managed group. A replica that is not part of a replication group cannot use this feature.
The requirements for using the asynchronous connection failover mechanism are as follows:
• GTIDs must be in use on the source and the replica (gtid_mode=ON), and the SOURCE_AUTO_POSITION | MASTER_AUTO_POSITION option of the CHANGE REPLICATION SOURCE TO | CHANGE MASTER TO statement must be enabled on the replica, so that GTID auto-positioning is used for the connection to the source.
• The same replication user account and password must exist on all the source servers in the source list for the channel. This account is used for the connection to each of the sources. You can set up different accounts for different channels.
• The replication user account must be given SELECT permissions on the Performance Schema tables, for example, by issuing GRANT SELECT ON performance_schema.* TO 'repl_user';
• The replication user account and password cannot be specified on the statement used to start replication, because they need to be available on the automatic restart for the connection to the alternative source. They must be set for the channel using the CHANGE REPLICATION SOURCE TO | CHANGE MASTER TO statement on the replica, and recorded in the replication metadata repositories.
• If the channel where the asynchronous connection failover mechanism is in use is on the primary of a Group Replication single-primary mode group, from MySQL 8.0.27, asynchronous connection failover between replicas is also active by default. In this situation, the replication channel and the replication user account and password for the channel must be set up on all the secondary servers in the replication group, and on any new joining members. If the new servers are provisioned using MySQL’s clone functionality, this all happens automatically.
【If you do not want asynchronous connection failover to take place between replicas in this situation, disable it by disabling the member action mysql_start_failover_channels_if_primary for the group, using the group_replication_disable_member_action function. When the feature is disabled, you do not need to configure the replication channel on the secondary group members, but if the primary goes offline or into an error state, replication stops for the channel.】
From MySQL Shell 8.0.27 and MySQL 8.0.27, MySQL InnoDB ClusterSet is available to provide disaster tolerance for InnoDB Cluster deployments by linking a primary InnoDB Cluster with one or more replicas of itself in alternate locations, such as different datacenters. Consider using this solution instead to simplify the setup of a new multi-group deployment for replication, failover, and disaster recovery. You can adopt an existing Group Replication deployment as an InnoDB Cluster.
InnoDB ClusterSet and InnoDB Cluster are designed to abstract and simplify the procedures for setting up, managing, monitoring, recovering, and repairing replication groups. InnoDB ClusterSet automatically manages replication from a primary cluster to replica clusters using a dedicated ClusterSet replication channel. You can use administrator commands to trigger a controlled switchover or emergency failover between groups if the primary cluster is not functioning normally. Servers and groups can easily be added to or removed from the InnoDB ClusterSet deployment after the initial setup when demand changes.
9.1 Asynchronous Connection Failover for Sources
To activate asynchronous connection failover for a replication channel, set SOURCE_CONNECTION_AUTO_FAILOVER=1 on the CHANGE REPLICATION SOURCE TO statement (from MySQL 8.0.23) or CHANGE MASTER TO statement (before MySQL 8.0.23) for the channel. GTID auto-positioning must be in use for the channel (SOURCE_AUTO_POSITION = 1 | MASTER_AUTO_POSITION = 1).
【When the existing connection to a source fails, the replica first retries the same connection the number of times specified by the SOURCE_RETRY_COUNT | MASTER_RETRY_COUNT option of the CHANGE REPLICATION SOURCE TO | CHANGE MASTER TO statement. The interval between attempts is set by the SOURCE_CONNECT_RETRY | MASTER_CONNECT_RETRY option. When these attempts are exhausted, the asynchronous connection failover mechanism takes over. Note that the defaults for these options, which were designed for a connection to a single source, make the replica retry the same connection for 60 days. To ensure that the asynchronous connection failover mechanism can be activated promptly, set SOURCE_RETRY_COUNT | MASTER_RETRY_COUNT and SOURCE_CONNECT_RETRY | MASTER_CONNECT_RETRY to minimal numbers that just allow a few retry attempts with the same source, in case the connection failure is caused by a transient network outage. Suitable values are SOURCE_RETRY_COUNT=3 | MASTER_RETRY_COUNT=3 and SOURCE_CONNECT_RETRY=10 | MASTER_CONNECT_RETRY=10, which make the replica retry the connection 3 times with 10-second intervals between.】
You also need to set the source list for the replication channel, to specify the sources that are available for failover. You set and manage source lists using the asynchronous_connection_failover_add_source and asynchronous_connection_failover_delete_source functions to add and remove single replication source servers. To add and remove managed groups of servers, use the asynchronous_connection_failover_add_managed and asynchronous_connection_failover_delete_managed functions instead.
The functions name the relevant replication channel and specify the host name, port number, network namespace, and weighted priority (1-100, with 100 being the highest priority) of a MySQL instance to add to or delete from the channel's source list. For a managed group, you also specify the type of managed service (currently only Group Replication is available), and the identifier of the managed group (for Group Replication, this is the value of the group_replication_group_name system variable). When you add a managed group, you only need to add one group member, and the replica automatically adds the rest from the current group membership. When you delete a managed group, you delete the entire group together.
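Taken together, a possible setup for a channel named 'ch1' might look roughly like the following sketch; the host names, the group name UUID, and the weights are placeholders, and the function arguments follow the pattern described above.

mysql> STOP REPLICA FOR CHANNEL 'ch1';
mysql> CHANGE REPLICATION SOURCE TO
    ->   SOURCE_CONNECTION_AUTO_FAILOVER = 1,
    ->   SOURCE_AUTO_POSITION = 1,
    ->   SOURCE_RETRY_COUNT = 3,
    ->   SOURCE_CONNECT_RETRY = 10
    ->   FOR CHANNEL 'ch1';

-- add a single standalone source with weight 50
mysql> SELECT asynchronous_connection_failover_add_source('ch1', 'host2.example.com', 3306, '', 50);

-- add a managed Group Replication group: primaries get weight 80, secondaries 60
mysql> SELECT asynchronous_connection_failover_add_managed('ch1', 'GroupReplication',
    ->        'aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa', 'host3.example.com', 3306, '', 80, 60);

mysql> START REPLICA FOR CHANNEL 'ch1';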
In MySQL 8.0.22, the asynchronous connection failover mechanism is activated following the failure of the replica's connection to the source, and it issues a START REPLICA statement to attempt to connect to a new source. In this release, the connection fails over if the replication receiver thread stops due to the source stopping or due to a network failure. The connection does not fail over in any other situations, such as when the replication threads are stopped by a STOP REPLICA statement.
From MySQL 8.0.23, the asynchronous connection failover mechanism also fails over the connection if another available server on the source list has a higher priority (weight) setting. This feature ensures that the replica stays connected to the most suitable source server at all times, and it applies to both managed groups and single (non-managed) servers. For a managed group, a source’s weight is assigned depending on whether it is a primary or a secondary server. So assuming that you set up the managed group to give a higher weight to a primary and a lower weight to a secondary, when the primary changes, the higher weight is assigned to the new primary, so the replica changes over the connection to it. The asynchronous connection failover mechanism additionally changes connection if the currently connected managed source server leaves the managed group, or is no longer in the majority in the managed group.
When failing over a connection, the source with the highest priority (weight) setting among the alternative sources listed in the source list for the channel is chosen for the first connection attempt. The replica checks first that it can connect to the source server, or in the case of a managed group, that the source server has ONLINE status in the group (not RECOVERING or unavailable). If the highest weighted source is not available, the replica tries with all the listed sources in descending order of weight, then starts again from the highest weighted source. If multiple sources have the same weight, the replica orders them randomly. If the replica needs to start working through the list again, it includes and retries the source to which the original connection failure occurred.
The source lists are stored in the mysql.replication_asynchronous_connection_failover and mysql.replication_asynchronous_connection_failover_managed tables, and can be viewed in the Performance Schema tables replication_asynchronous_connection_failover and replication_asynchronous_connection_failover_managed. The replica uses a monitor thread to track the membership of managed groups and update the source list (thread/sql/replica_monitor). The setting for the SOURCE_CONNECTION_AUTO_FAILOVER option of the CHANGE REPLICATION SOURCE TO | CHANGE MASTER TO statement, and the source list, are transferred to a clone of the replica during a remote cloning operation.
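To see what the replica currently has in its source list, you can query the Performance Schema tables named above, for example:

mysql> SELECT * FROM performance_schema.replication_asynchronous_connection_failover;
mysql> SELECT * FROM performance_schema.replication_asynchronous_connection_failover_managed;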
9.2 Asynchronous Connection Failover for Replicas
From MySQL 8.0.27, asynchronous connection failover for replicas is automatically activated for a replication channel on a Group Replication primary when you set SOURCE_CONNECTION_AUTO_FAILOVER=1 on the CHANGE REPLICATION SOURCE TO statement for the channel. The feature is designed for a group of senders and a group of receivers to keep synchronized with each other even when some members are temporarily unavailable. When the feature is active and correctly configured, if the primary that is replicating goes offline or into an error state, the new primary starts replication on the same channel when it is elected. The new primary uses the source list for the channel to select the source with the highest priority (weight) setting, which might not be the same as the original source.
To configure this feature, the replication channel and the replication user account and password for the channel must be set up on all the member servers in the replication group, and on any new joining members. Ensure that the SOURCE_RETRY_COUNT and SOURCE_CONNECT_RETRY settings are set to minimal numbers that just allow a few retry attempts, for example 3 and 10. You can set up the replication channel using the CHANGE REPLICATION SOURCE TO statement, or if the new servers are provisioned using MySQL’s clone functionality, this all happens automatically. The SOURCE_CONNECTION_AUTO_FAILOVER setting for the channel is broadcast to group members from the primary when they join. If you later disable SOURCE_CONNECTION_AUTO_FAILOVER for the channel on the primary, this is also broadcast to the secondary servers, and they change the status of the channel to match.
Asynchronous connection failover for replicas is activated and deactivated using the Group Replication member action mysql_start_failover_channels_if_primary, which is enabled by default. You can disable it for the whole group by disabling that member action on the primary, using the group_replication_disable_member_action function, as in this example:
mysql> SELECT group_replication_disable_member_action("mysql_start_failover_channels_if_primary", "AFTER_PRIMARY_ELECTION");
The function can only be changed on a primary, and must be enabled or disabled for the whole group, so you cannot have some members providing failover and others not. When the mysql_start_failover_channels_if_primary member action is disabled, the channel does not need to be configured on secondary members, but if the primary goes offline or into an error state, replication stops for the channel. Note that if there is more than one channel with SOURCE_CONNECTION_AUTO_FAILOVER=1, the member action covers all the channels, so they cannot be individually enabled and disabled by that method. Set SOURCE_CONNECTION_AUTO_FAILOVER=0 on the primary to disable an individual channel.
The source list for a channel with SOURCE_CONNECTION_AUTO_FAILOVER=1 is broadcast to all group members when they join, and also when it changes. This is the case whether the sources are a managed group for which the membership is updated automatically, or whether they are added or changed manually using the asynchronous_connection_failover_add_source(), asynchronous_connection_failover_delete_source(), asynchronous_connection_failover_add_managed() or asynchronous_connection_failover_delete_managed() functions. All group members receive the current source list as recorded in the mysql.replication_asynchronous_connection_failover and mysql.replication_asynchronous_connection_failover_managed tables. Because the sources do not have to be in a managed group, you can set up the function to synchronize a group of receivers with one or more alternative standalone senders, or even a single sender. However, a standalone replica that is not part of a replication group cannot use this feature.