Clustered index disadvantages in MySQL
instead. If there’s no such index, InnoDB will define a hidden primary key for you
and then cluster on that. InnoDB clusters records together only within a page. Pages
with adjacent key values may be distant from each other.
These benefits can boost performance tremendously if you design your tables and queries
to take advantage of them. However, clustered indexes also have disadvantages:
• Clustering gives the largest improvement for I/O-bound workloads. If the data
fits in memory, the order in which it’s accessed doesn’t really matter, so clustering
doesn’t give much benefit.
• Insert speeds depend heavily on insertion order. Inserting rows in primary key
order is the fastest way to load data into an InnoDB table. It may be a good idea
to reorganize the table with OPTIMIZE TABLE after loading a lot of data if you
didn’t load the rows in primary key order.
• Updating the clustered index columns is expensive, because it forces InnoDB to
move each updated row to a new location.
• Tables built upon clustered indexes are subject to page splits when new rows are
inserted, or when a row’s primary key is updated such that the row must be
moved. A page split happens when a row’s key value dictates that the row must
be placed into a page that is full of data. The storage engine must split the page
into two to accommodate the row. Page splits can cause a table to use more
space on disk.
• Clustered tables can be slower for full table scans, especially if rows are less
densely packed or stored nonsequentially because of page splits.
• Secondary (nonclustered) indexes can be larger than you might expect, because
their leaf nodes contain the primary key columns of the referenced rows.
• Secondary index accesses require two index lookups instead of one.
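The effect of insertion order on page splits and page fill can be illustrated with a toy model. This is only a sketch under stated assumptions, not InnoDB's actual algorithm: pages are plain sorted lists, `PAGE_CAPACITY`, `load`, and `fill_factor` are hypothetical names, a full page is split in half, and a key that lands past the end of the table is assumed to start a fresh page rather than cause a split.

```python
import bisect
import random

PAGE_CAPACITY = 8  # hypothetical rows per page, chosen small for illustration


def load(keys, capacity=PAGE_CAPACITY):
    """Insert unique keys one at a time into a toy clustered table.

    Returns (pages, splits). Pages are sorted lists kept in key order.
    """
    pages = [[]]
    splits = 0
    for key in keys:
        # Locate the page whose key range should hold this key.
        firsts = [p[0] for p in pages[1:]]
        idx = bisect.bisect_right(firsts, key)
        page = pages[idx]
        if len(page) < capacity:
            bisect.insort(page, key)
            continue
        if idx == len(pages) - 1 and key > page[-1]:
            # Assumption: appending past the end of the table starts a new
            # page and leaves the full page untouched.
            pages.append([key])
            continue
        # The target page is full: split it in half and insert the key
        # into whichever half covers it.
        mid = capacity // 2
        left, right = page[:mid], page[mid:]
        bisect.insort(left if key < right[0] else right, key)
        pages[idx:idx + 1] = [left, right]
        splits += 1
    return pages, splits


def fill_factor(pages, capacity=PAGE_CAPACITY):
    """Fraction of the allocated page slots that actually hold a row."""
    return sum(len(p) for p in pages) / (len(pages) * capacity)


if __name__ == "__main__":
    in_order = list(range(1000))
    shuffled = in_order[:]
    random.Random(42).shuffle(shuffled)
    for label, keys in (("primary key order", in_order), ("random order", shuffled)):
        pages, splits = load(keys)
        print(label, "pages:", len(pages), "splits:", splits,
              "fill:", round(fill_factor(pages), 2))
```

In this model, loading in primary key order never splits a page and leaves every page full, while loading the same keys in random order triggers splits and leaves pages partly empty, which is the extra space on disk described in the list.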
The last bullet point above can be a bit confusing. Why would a secondary index require two
index lookups? The answer lies in the nature of the “row pointers” the secondary
index stores. Remember, a leaf node doesn’t store a pointer to the referenced row’s
physical location; rather, it stores the row’s primary key values.
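A minimal sketch may make the two lookups concrete. It assumes a toy table in which Python dictionaries stand in for the B-trees; the class `ToyTable`, its methods, and the `id` and `last_name` columns are hypothetical names chosen for illustration, not part of any MySQL API.

```python
class ToyTable:
    """Toy model of a clustered table with one secondary index."""

    def __init__(self, rows):
        # Clustered index: primary key -> the full row.
        self.clustered = {row["id"]: row for row in rows}
        # Secondary index on last_name: value -> primary keys of matching
        # rows (primary key values, not physical row locations).
        self.by_last_name = {}
        for row in rows:
            self.by_last_name.setdefault(row["last_name"], []).append(row["id"])
        self.lookups = 0  # counts index lookups performed

    def find_by_id(self, pk):
        """One lookup: the clustered index leaf holds the row itself."""
        self.lookups += 1
        return self.clustered.get(pk)

    def find_by_last_name(self, name):
        """Two steps: secondary index first, then the clustered index."""
        self.lookups += 1  # lookup 1: secondary index yields primary keys
        pks = self.by_last_name.get(name, [])
        # Lookup 2 (once per matching row): fetch the row by primary key.
        return [self.find_by_id(pk) for pk in pks]


if __name__ == "__main__":
    table = ToyTable([
        {"id": 1, "last_name": "Akroyd"},
        {"id": 2, "last_name": "Allen"},
        {"id": 3, "last_name": "Basinger"},
    ])
    print(table.find_by_last_name("Allen"), "lookups:", table.lookups)
```

Fetching a row by `id` touches only the clustered index, while fetching it by `last_name` first reads the secondary index to obtain the primary key and then searches the clustered index with that key.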