[Repost] Duplicate Entry in ALTER TABLE (and in OPTIMIZE TABLE)

A few weeks ago, on MySQL 5.7, I had an ALTER TABLE that failed with a duplicate entry error.  This is old news, as it has been happening since MySQL 5.6, but I only saw it recently because I normally use online schema change from the Percona Toolkit (pt-osc) or GitHub's online schema migration tool (gh-ost).  I do not like this behavior and I am disappointed it has not been improved, so this post is partially a rant, but not only: I also explain why ALTER TABLE fails and give solutions to avoid the failure.

So ALTER TABLE can fail with a duplicate entry error; let's see this:

mysql [localhost:8027] {msandbox} (test_jfg) > ALTER TABLE t DROP COLUMN v2;
ERROR 1062 (23000): Duplicate entry '1' for key 't.v'

As you can see, this is not a complicated ALTER.  The above fails on MySQL 8.0.27, and I am also able to reproduce it with 5.7.36 and 5.6.51, but not with 5.5.62.  The full reproduction test case is at the end of this post.

I am not the first one seeing this: googling "MySQL ALTER TABLE Duplicate entry" turns up several similar reports (and yes, OPTIMIZE TABLE has a similar problem).

These reports point to the Online DDL Limitations section of the MySQL documentation:

When running an in-place online DDL operation, the thread that runs the ALTER TABLE statement applies an online log of DML operations that were run concurrently on the same table from other connection threads. When the DML operations are applied, it is possible to encounter a duplicate key entry error (ERROR 1062 (23000): Duplicate entry), even if the duplicate entry is only temporary and would be reverted by a later entry in the online log. This is similar to the idea of a foreign key constraint check in InnoDB in which constraints must hold during a transaction.

It looks like the online DDL improvements of MySQL 5.6 introduced this behavior.  In my humble opinion, this is a regression: a perfectly working ALTER in 5.5 now throws an error in 5.6, 5.7 and 8.0.  From my understanding, the error condition is the following:

While the ALTER is running,
anything causing a duplicate entry, even a "silent" one,
makes the ALTER fail in the "log apply" phase.

An example of a "silent" duplicate entry is one "hidden" by INSERT ... ON DUPLICATE KEY UPDATE (IODKU).  Yes, the ALTER fails with a duplicate entry error if an IODKU ends up doing an UPDATE while the table is being altered (what the documentation calls a temporary duplicate entry)!  And the ALTER does not fail at the moment of the succeeding IODKU (or of the failing INSERT); it fails much later, maybe hours later, at the log apply phase.  This is a terrible default behavior that is very annoying!
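
For illustration, here is a minimal sketch of such a "silent" duplicate, using the table t from the test case at the end of this post (v has a unique key and a row with v = 1 already exists):

-- v = 1 already exists, so this statement silently turns into an UPDATE and reports success:
INSERT INTO t (v) VALUES (1) ON DUPLICATE KEY UPDATE v2 = 0;
-- But if an online ALTER TABLE t ... is running at that moment, the ALTER registers this
--   as a temporary duplicate and fails later, at the log apply phase, with ERROR 1062.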

Before ranting even more, let me tell you how to avoid this error.  The first solution is to make the DDL blocking by adding LOCK=EXCLUSIVE to the ALTER.  A non-blocking solution, which is also friendlier to replication (an online ALTER might be partially non-blocking on the primary, but it will block the SQL thread on replicas, which might cause lag), is to use pt-osc or gh-ost.  Finally, another solution is to configure MySQL not to use online (in-place) DDL by setting the variable old_alter_table to ON, at the session or at the global level (this is the only solution I found for OPTIMIZE TABLE).
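
As a quick reference, here is a minimal sketch of these workarounds, using the table t from the test case below (statements only, not a full procedure):

-- Blocking DDL: with an exclusive lock there is no concurrent DML, hence no log apply phase to fail.
ALTER TABLE t DROP COLUMN v2, LOCK=EXCLUSIVE;

-- Or fall back to the pre-5.6 table-copy behavior (also the only workaround I found for OPTIMIZE TABLE):
SET SESSION old_alter_table = ON;  -- or SET GLOBAL old_alter_table = ON
ALTER TABLE t DROP COLUMN v2;
OPTIMIZE TABLE t;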

If you are not interested in the rant, you should stop reading now.

The first annoyance about ALTER failing with a duplicate entry error is the misleading error message.  The first time I saw it, I asked myself "How can this be?", "Did I do something wrong?", "Is my data corrupted?", and I was worried.  To be more accurate and less worrying, the error message should be something like "Online ALTER failed during the log apply phase because of a concurrent duplicate entry (or INSERT ... ON DUPLICATE KEY UPDATE)", and ideally it should suggest a solution to the problem (add LOCK=EXCLUSIVE to the ALTER, or use SET old_alter_table = ON).  For this, I opened Bug#104608 - Improve error message for ALTER failing because concurrent Duplicate entry.

The second annoyance is the unfriendly behavior.  For me, this failure mode should not be the default.  I like that MySQL implements online DDL, but an online DDL that can fail this way should not be the default behavior.  When introducing an improvement that can fail, I would prefer that benefiting from the improvement be opt-in (in this case, LOCK=NONE), with the previous behavior staying the default.  I understand it is now a little late to go back to the MySQL 5.5 behavior, because it has become expected that some DDLs are online, but I wish we were not where we are today on this.  This is now part of the Dark Side of MySQL until it is fixed (and this is why I am calling this situation the ONLINE DDL purgatory in the next bug report).

A third annoyance about this behavior is that it has not been improved in 8 years (MySQL 5.6 was GA in February 2013)!  Improving the error message as I suggested above is one thing; removing this failure condition is another.  I realize the latter might be complicated, but let's start with the error message.  Once the message is better, I do not think it is unrealistic to hope for something more.  For this, I opened Bug#104626 - Remove failure of Online ALTER because concurrent Duplicate entry.

That is all I have for today.  If you have historical context that might change my mind about this behavior being unfriendly, or if you have other thoughts on the subject, feel free to leave a comment below.

 

Reproducing the Duplicate Entry Error During ALTER TABLE with dbdeployer

 
# Create a sandbox:
dbdeployer deploy single mysql_8.0.27

# Initialize the test environment:
./use <<< "
  CREATE DATABASE test_jfg;
  CREATE TABLE test_jfg.t(
    id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    v INT, v2 INT, UNIQUE KEY(v))"
 
# Make rows big to be able to quickly generate a big table (each row is close to 8 KB):
seq -f "ALTER TABLE t add column c%02.0f CHAR(255) DEFAULT '';" 1 30 | ./use test_jfg

# Make commit fast so loading is not too long:
./use <<< "set global sync_binlog = 0, global innodb_flush_log_at_trx_commit = 0"

# Fill the table with about 1 GB of data:
seq -f "INSERT INTO t (v) VALUES (%0.0f);" 1 $((1024*1024/8)) | ./use test_jfg

# Generate the error with a failing INSERT:
./use -N <<< "SELECT VERSION()" && 
  ( sleep 1; ./use test_jfg <<< "INSERT INTO t (v) VALUES (1)" || echo "INSERT KO $(date)"; ) & \
  ./use test_jfg <<< "ALTER TABLE t DROP COLUMN c01" && echo "ALTER OK $(date)" || echo "ALTER KO $(date)"
[1] 23767
8.0.27
ERROR 1062 (23000) at line 1: Duplicate entry '1' for key 't.v'
INSERT KO Mon Nov  8 21:56:29 UTC 2021
ERROR 1062 (23000) at line 1: Duplicate entry '1' for key 't.v'
[1]+  Done                    ./use -N <<< "SELECT VERSION()" && ( sleep 1; ./use test_jfg <<< "INSERT INTO t (v) VALUES (1)" || echo "INSERT KO $(date)" )
ALTER KO Mon Nov  8 21:57:23 UTC 2021

# In above, the first duplicate entry is from the INSERT and the 2nd is from the ALTER.
#   We can see that the INSERT completes before the ALTER (because of the online DDL),
#   which is not the case with 5.5.62 below (INSERT blocked and ALTER succeeding).
./use -N <<< "SELECT VERSION()" && 
  ( sleep 1; date; ./use test_jfg <<< "INSERT INTO t (v) VALUES (1)" || echo "INSERT KO $(date)"; ) & \
  ./use test_jfg <<< "ALTER TABLE t DROP COLUMN c01" && echo "ALTER OK $(date)" || echo "ALTER KO $(date)"
[1] 10052
5.5.62
Mon Nov  8 23:31:45 UTC 2021
ERROR 1062 (23000) at line 1: Duplicate entry '1' for key 'v'
ALTER OK Mon Nov  8 23:32:07 UTC 2021
INSERT KO Mon Nov  8 23:32:07 UTC 2021
[1]+  Done                    ./use -N <<< "SELECT VERSION()" && ( sleep 1; date; ./use test_jfg <<< "INSERT INTO t (v) VALUES (1)" || echo "INSERT KO $(date)" )

# And we have a similar error with OPTIMIZE TABLE:
./use -N <<< "SELECT VERSION()" && 
  ( sleep 1; ./use test_jfg <<< "INSERT INTO t (v) VALUES (1)" || echo "INSERT KO $(date)"; ) & \
  ./use test_jfg <<< "OPTIMIZE TABLE t"; echo "OPTIMIZE $(date)"
[1] 31753
8.0.27
ERROR 1062 (23000) at line 1: Duplicate entry '1' for key 't.v'
INSERT KO Mon Nov  8 22:05:48 UTC 2021
Table   Op      Msg_type        Msg_text
test_jfg.t      optimize        note    Table does not support optimize, doing recreate + analyze instead
test_jfg.t      optimize        error   Duplicate entry '1' for key 't.v'
test_jfg.t      optimize        status  Operation failed
[1]+  Done                    ./use -N <<< "SELECT VERSION()" && ( sleep 1; ./use test_jfg <<< "INSERT INTO t (v) VALUES (1)" || echo "INSERT KO $(date)" )
OPTIMIZE Mon Nov  8 22:06:53 UTC 2021

# For completeness, generate the error with an INSERT ON DUPLICATE KEY UPDATE:
./use -N <<< "SELECT VERSION()" && 
  ( sleep 1; ./use test_jfg <<< "INSERT INTO t (v) VALUES (1) ON DUPLICATE KEY UPDATE v2=0" && echo "IODKU OK $(date)"; ) & \
  ./use test_jfg <<< "ALTER TABLE t DROP COLUMN c02" && echo "ALTER OK $(date)" || echo "ALTER KO $(date)"
[1] 2356
8.0.27
IODKU OK Mon Nov  8 22:08:58 UTC 2021
ERROR 1062 (23000) at line 1: Duplicate entry '1' for key 't.v'
[1]+  Done                    ./use -N <<< "SELECT VERSION()" && ( sleep 1; ./use test_jfg <<< "INSERT INTO t (v) VALUES (1) ON DUPLICATE KEY UPDATE v2=0" && echo "IODKU OK $(date)" )
ALTER KO Mon Nov  8 22:10:06 UTC 2021
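
# Not part of the original test case: a sketch of checking the LOCK=EXCLUSIVE workaround with the
#   same pattern.  Expected behavior: the INSERT is blocked until the ALTER finishes, the ALTER
#   succeeds, and the INSERT then fails with its own duplicate entry error (as in the 5.5.62 run above).
./use -N <<< "SELECT VERSION()" && 
  ( sleep 1; ./use test_jfg <<< "INSERT INTO t (v) VALUES (1)" || echo "INSERT KO $(date)"; ) & \
  ./use test_jfg <<< "ALTER TABLE t DROP COLUMN c03, LOCK=EXCLUSIVE" && echo "ALTER OK $(date)" || echo "ALTER KO $(date)"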
 
Reference:
https://jfg-mysql.blogspot.com/2021/11/duplicate-entry-in-alter-table-and-in-optimize-table.html