In the first example, the query matches 1,000 records, which is 1% of all table records. Scanning the index and then loading the remaining columns from the table costs less than scanning the entire table, so PostgreSQL uses the idx_task_status index for the TO_DO predicate value.

In the second example, the DONE predicate value has very low index selectivity, since it matches 95,000 table records (95% of the table). In this case, scanning the task table page by page is faster than performing 95,000 lookups from the idx_task_status index into the task table.
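The sketch below reproduces both plans. It builds a scratch task table, measures what fraction of rows each status value matches, and compares the plans for TO_DO and DONE. The column types are assumptions because the table definition is not shown in this section. The 100,000-row total and the 4,000 FAILED rows are inferred from the figures above: 1,000 rows make up 1% of the table, and 95,000 rows are DONE. The table is created in the session's temporary schema so that an existing task table is not touched.

```sql
-- Scratch copy of the task table in the session's temporary schema,
-- so a real public.task table is never dropped. Column types are assumptions.
DROP TABLE IF EXISTS pg_temp.task;
CREATE TEMP TABLE task (
    id     bigint PRIMARY KEY,
    status varchar(10) NOT NULL
);

-- 100,000 rows: 1,000 TO_DO, 4,000 FAILED, 95,000 DONE (inferred distribution)
INSERT INTO task (id, status)
SELECT n,
       CASE
           WHEN n <= 1000 THEN 'TO_DO'
           WHEN n <= 5000 THEN 'FAILED'
           ELSE 'DONE'
       END
FROM generate_series(1, 100000) AS n;

-- Full index on every status value
CREATE INDEX idx_task_status ON task (status);

-- Autovacuum never analyzes temporary tables, so collect statistics explicitly
ANALYZE task;

-- Percentage of all rows matched by each status value
SELECT status,
       count(*) AS matched,
       round(count(*) * 100.0 / sum(count(*)) OVER (), 2) AS percent
FROM task
GROUP BY status
ORDER BY matched;

-- Compare the plans for a selective and a non-selective value
EXPLAIN SELECT * FROM task WHERE status = 'TO_DO';
EXPLAIN SELECT * FROM task WHERE status = 'DONE';
```

The exact plan shapes depend on the server version and the collected statistics. Typically, the TO_DO query reads through idx_task_status with an Index Scan or a Bitmap Heap Scan, while the DONE query uses a Seq Scan.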

Partial or Filtered Indexes

When the indexed column values are skewed, it’s more efficient to use a Partial or Filtered Index, like the following one:

```sql
CREATE INDEX idx_task_status ON task (status) WHERE status <> 'DONE';
```

Since only the TO_DO and FAILED status values have high index selectivity, it's better to index only these values and skip the rows whose status is DONE. This reduces the index size.
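Here is a minimal sketch of the Partial Index in use. It runs on a hypothetical scratch copy of the task table that uses the same assumed column types and the same inferred distribution (1,000 TO_DO, 4,000 FAILED, 95,000 DONE). A partial index can only serve a query whose WHERE clause implies the index predicate. Here, status = 'TO_DO' implies status <> 'DONE', so the index is eligible for that query. A DONE query can never be answered from this index.

```sql
-- Scratch task table in the temporary schema (hypothetical column types)
DROP TABLE IF EXISTS pg_temp.task;
CREATE TEMP TABLE task (
    id     bigint PRIMARY KEY,
    status varchar(10) NOT NULL
);

-- Same inferred distribution: 1,000 TO_DO, 4,000 FAILED, 95,000 DONE
INSERT INTO task (id, status)
SELECT n,
       CASE
           WHEN n <= 1000 THEN 'TO_DO'
           WHEN n <= 5000 THEN 'FAILED'
           ELSE 'DONE'
       END
FROM generate_series(1, 100000) AS n;

-- The Partial Index from the article: DONE rows get no index entries
CREATE INDEX idx_task_status ON task (status) WHERE status <> 'DONE';
ANALYZE task;

-- status = 'TO_DO' implies status <> 'DONE', so idx_task_status is a candidate
EXPLAIN SELECT * FROM task WHERE status = 'TO_DO';

-- No such implication holds for DONE, so idx_task_status cannot be used here
EXPLAIN SELECT * FROM task WHERE status = 'DONE';

-- Number of rows that actually have an entry in the Partial Index
SELECT count(*) AS indexed_rows FROM task WHERE status <> 'DONE';
```

The implication check only makes the index eligible. Whether the planner actually picks it for TO_DO or FAILED still depends on its cost estimates.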

Now, since the pg_indexes_size function reports the combined size of all indexes associated with a given table, we calculate the idx_task_status index size by subtracting the size of the Primary Key index.

As I explained in this article, PostgreSQL creates a default index on the Primary Key, which in our case looks as follows:

| tablename | indexname | indexdef                                                      |
|-----------|-----------|---------------------------------------------------------------|
| task      | task_pkey | CREATE UNIQUE INDEX task_pkey ON public.task USING btree (id) |

When calculating the size of all indexes on the task table:

```sql
SELECT
    pg_size_pretty(
        pg_indexes_size(relid)
    ) AS "Index Size"
FROM
    pg_statio_user_tables
WHERE
    pg_statio_user_tables.relname = 'task';
```

Since task_pkey is the only index at this point, we get a value of 2208 kB:

| Index Size |
|------------|
| 2208 kB    |

When creating the Partial Index that contains only the TO_DO and FAILED status values, the overall task table index size is:

| Index Size |
|------------|
| 2264 kB    |

So, the size of the Partial Index is just 56 kB.

However, if we create the Full Index that includes all values of the status column, the overall task table index size is:

| Index Size |
|------------|
| 2904 kB    |

So, the Full Index takes 696 kB and offers no advantage over the Partial Index. The DONE value's selectivity is too low for the planner to use the index for that predicate anyway.
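The subtraction can be cross-checked directly, because pg_relation_size reports the size of a single relation, and an index is a relation. This sketch builds both variants on a hypothetical scratch task table and gives them the hypothetical names idx_task_status_full and idx_task_status_partial. Absolute numbers depend on the data and on the PostgreSQL version, so they need not match the figures above. For example, B-tree deduplication, added in PostgreSQL 13, shrinks indexes on low-cardinality columns such as status.

```sql
-- Scratch task table in the temporary schema (hypothetical column types)
DROP TABLE IF EXISTS pg_temp.task;
CREATE TEMP TABLE task (
    id     bigint PRIMARY KEY,
    status varchar(10) NOT NULL
);

-- Same inferred distribution: 1,000 TO_DO, 4,000 FAILED, 95,000 DONE
INSERT INTO task (id, status)
SELECT n,
       CASE
           WHEN n <= 1000 THEN 'TO_DO'
           WHEN n <= 5000 THEN 'FAILED'
           ELSE 'DONE'
       END
FROM generate_series(1, 100000) AS n;

-- Build both variants side by side under hypothetical names
CREATE INDEX idx_task_status_full    ON task (status);
CREATE INDEX idx_task_status_partial ON task (status) WHERE status <> 'DONE';

-- Size of every index on task (including task_pkey), largest first
SELECT indexrelid::regclass AS index_name,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_index
WHERE indrelid = 'task'::regclass
ORDER BY pg_relation_size(indexrelid) DESC;

-- The combined figure that the subtraction approach starts from
SELECT pg_size_pretty(pg_indexes_size('task')) AS "Index Size";
```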

Conclusion

When running an SQL query, an index can help us speed up the query execution if the selectivity is high. Otherwise, the database might choose to avoid using an index.

If you have columns with skewed data, using a Partial Index can help you save space, therefore allowing you to store more index and table records in the Buffer Pool.

Copied from: https://vladmihalcea.com/index-selectivity/

posted on 2024-12-06 10:06 by ZhangZhihuiAAA


Copyright © 2024 ZhangZhihuiAAA