复合索引可以覆盖查询条件，为什么优化器却要走全表扫描

分区表、数据严重倾斜个别分区，查询是该分区只走全表扫描

发表于 2026/04/08

作者 Cotes Chung

11 分钟阅读

某分区表：

        
      
test=# \dt
                 关联列表
  架构模式 |       名称       |  类型  | 拥有者
----------+-----------------+-------+--------
 myschema | mytable         | 分区表 | system
 myschema | mytable_deafult | 数据表 | system
 myschema | mytable_p2018   | 数据表 | system
 myschema | mytable_p2019   | 数据表 | system
 myschema | mytable_p2020   | 数据表 | system
 myschema | mytable_p2021   | 数据表 | system
 myschema | mytable_p2022   | 数据表 | system
 myschema | mytable_p2023   | 数据表 | system
 myschema | mytable_p2024   | 数据表 | system
 myschema | mytable_p2025   | 数据表 | system
 myschema | mytable_p2026   | 数据表 | system

表结构如下：

        
      
test=# \d+ mytable;
                                    分区表 "myschema.mytable"
        栏位         |            类型             | 校对规则 |  可空的  | 预设 |   存储   | 统计目标 | 描述
---------------------+-----------------------------+----------+----------+------+----------+----------+------
 id                  | character varying(32 char)  |          | not null |      | extended |          |
 pack_code           | character varying(100 char) |          |          |      | extended |          |
 manufacturer_spec   | character varying(32 char)  |          |          |      | extended |          |
 production_province | character varying(6 char)   |          |          |      | extended |          |
 production_city     | character varying(6 char)   |          |          |      | extended |          |
 production_district | character varying(6 char)   |          |          |      | extended |          |
 dest_name           | character varying(255 char) |          |          |      | extended |          |
 dest_code           | character varying(18 char)  |          |          |      | extended |          |
 contained_type      | character varying(32 char)  |          |          |      | extended |          |
 module_count        | character varying(32 char)  |          |          |      | extended |          |
 cell_count          | character varying(32 char)  |          |          |      | extended |          |
 create_code         | character varying(18 char)  |          |          |      | extended |          |
 create_user_id      | character varying(32 char)  |          |          |      | extended |          |
 create_time         | timestamp without time zone |          | not null |      | plain    |          |
 update_user_id      | character varying(32 char)  |          |          |      | extended |          |
 update_time         | timestamp without time zone |          |          |      | plain    |          |
 del_flag            | smallint                    |          |          | 0    | plain    |          |
 bnum                | character varying(32 char)  |          |          |      | extended |          |
 spec_pack_id        | character varying(32 char)  |          |          |      | extended |          |
 pack_date           | date                        |          |          |      | plain    |          |
 section_type        | character varying(32 char)  |          |          |      | extended |          |
分区键值: RANGE (create_time)
索引：
    "mytable_pkey" PRIMARY KEY, btree (id, create_time)
    "mytable_pack_code_create_time_key" UNIQUE CONSTRAINT, btree (pack_code, create_time)
    "mytable_create_code" btree (create_code)
    "mytable_create_time" btree (create_time)
    "mytable_create_time_create_dest_code" btree (create_time, create_code, dest_code)
    "mytable_create_time_pack_code" btree (create_time, pack_code)
    "mytable_dest_code" btree (dest_ep_code)
    "mytable_pack_dest_code" btree (pack_code, dest_ep_code)
    "mytable_spec_pack_id" btree (spec_pack_id)
分区: mytable_p2018 FOR VALUES FROM ('2018-01-01 00:00:00') TO ('2018-12-31 23:59:59'),
     mytable_p2019 FOR VALUES FROM ('2019-01-01 00:00:00') TO ('2019-12-31 23:59:59'),
     mytable_p2020 FOR VALUES FROM ('2020-01-01 00:00:00') TO ('2020-12-31 23:59:59'),
     mytable_p2021 FOR VALUES FROM ('2021-01-01 00:00:00') TO ('2021-12-31 23:59:59'),
     mytable_p2022 FOR VALUES FROM ('2022-01-01 00:00:00') TO ('2022-12-31 23:59:59'),
     mytable_p2023 FOR VALUES FROM ('2023-01-01 00:00:00') TO ('2023-12-31 23:59:59'),
     mytable_p2024 FOR VALUES FROM ('2024-01-01 00:00:00') TO ('2024-12-31 23:59:59'),
     mytable_p2025 FOR VALUES FROM ('2025-01-01 00:00:00') TO ('2025-12-31 23:59:59'),
     mytable_p2026 FOR VALUES FROM ('2026-01-01 00:00:00') TO ('2026-12-31 23:59:59'),
     mytable_deafult DEFAULT

查看如下语句执行计划：

        
      
EXPLAIN ANALYZE
SELECT
	COUNT(*)
FROM
	mytable
WHERE create_code = '91440300791738553L' OR dest_code = '91440300791738553L')
	AND create_time BETWEEN '2025-03-31 00:00:00' AND '2026-03-31 23:59:59';

[!IMPORTANT]

Aggregate (cost=1011349.78..1011349.79 rows=1 width=8) (actual time=6748.168..6748.170 rows=1 loops=1) -> Append (cost=0.00..1001464.52 rows=3954104 width=0) (actual time=0.041..6574.916 rows=3995824 loops=1) -> Seq Scan on mytable_p2025 (cost=0.00..955834.32 rows=3913205 width=0) (actual time=0.040..6325.847 rows=3954274 loops=1) Filter: ((create_time >= ‘2025-03-31 00:00:00’::timestamp without time zone) AND (create_time <= ‘2026-03-31 23:59:59’::timestamp without time zone) AND (((create_code)::text = ‘91440300791738553L’::text) OR ((dest_code)::text = ‘91440300791738553L’::text))) Rows Removed by Filter: 11545726 -> Bitmap Heap Scan on mytable_p2026 mytable_1 (cost=2583.66..25859.68 rows=40899 width=0) (actual time=11.916..83.526 rows=41550 loops=1) Recheck Cond: ((create_time >= ‘2025-03-31 00:00:00’::timestamp without time zone) AND (create_time <= ‘2026-03-31 23:59:59’::timestamp without time zone)) Filter: (((create_code)::text = ‘91440300791738553L’::text) OR ((dest_code)::text = ‘91440300791738553L’::text)) Rows Removed by Filter: 82028 Heap Blocks: exact=20816 -> Bitmap Index Scan on mytable_p2026_create_time (cost=0.00..2573.43 rows=122101 width=0) (actual time=9.856..9.856 rows=123578 loops=1) Index Cond: ((create_time >= ‘2025-03-31 00:00:00’::timestamp without time zone) AND (create_time <= ‘2026-03-31 23:59:59’::timestamp without time zone)) Planning Time: 0.239 ms Execution Time: 6748.320 ms

[!CAUTION]

从上述执行计划可以看出，语句执行主要开销在顺序扫描mytable_p2025这个分区上了，也就是说我们必须把优化的重点放在这里；从表结构可以看到为了提高上述查询语句的效率专门创建了mytable_create_time_create_dest_code索引，理论上该索引完美覆盖了这个查询语句的条件，但是为什么优化器没有选择这个索引呢？

我们让优化器使用这个索引回怎么样呢？

在SELECT语句中加入hint如下：

        
      
SELECT /*+ IndexOnlyScan(bpp mytable_create_time_create_dest_code) */ count(*) ......

[!IMPORTANT]

Index Only Scan using mytable_create_time_create_dest_code on mytable_p2025 (cost=0.56..4008005.27 rows=2618950 width=0) (actual time=0.057..5951.804 rows=2579288 loops=1) Index Cond: ((create_time >= ‘2025-03-31 00:00:00’::timestamp without time zone) AND (create_time <= ‘2026-03-31 23:59:59’::timestamp without time zon e) AND (create_ep_code = ‘91440300791738553L’::text)) Heap Fetches: 2579288

在执行计划里：

Heap Fetches: 2579288

表示👉 执行过程中，从表（heap）里实际读取了 2579288 行数据 换句话说 👉 通过索引找到行后，又回表访问数据页的次数

也就是说IndexOnlyScan最后还得回表，这就是为什么优化器没有选择走这个索引的原因；

[!CAUTION]

如何降低这个Heap Fetches呢？

VACUUM（最直接）
1 VACCUM ANALYZE mytable;

执行 vaccum 后执行计划变为：

[!IMPORTANT]

Aggregate (cost=721037.32..721037.33 rows=1 width=8) (actual time=5679.473..5679.474 rows=1 loops=1) -> Append (cost=0.56..711128.26 rows=3963622 width=0) (actual time=0.019..5521.362 rows=3993968 loops=1) -> Index Only Scan using mytable_create_time_create_dest_code on mytable_p2025 (cost=0.56..684194.46 rows=3923059 width=0) (actual time=0.018..5310.303 rows=3952547 loops=1) Index Cond: ((create_time >= ‘2025-03-31 00:00:00’::timestamp without time zone) AND (create_time <= ‘2026-03-31 23:59:59’::timestamp without time zone)) Filter: (((create_code)::text = ‘91440300791738553L’::text) OR ((dest_code)::text = ‘91440300791738553L’::text)) Rows Removed by Filter: 7777771 Heap Fetches: 0 -> Index Only Scan using mytable_create_time_create_dest_code on mytable_p2026 mytable_1 (cost=0.42..7114.67 rows=40562 width=0) (actual time=0.018..45.150 rows=41421 loops=1) Index Cond: ((create_time >= ‘2025-03-31 00:00:00’::timestamp without time zone) AND (create_time <= ‘2026-03-31 23:59:59’::timestamp without time zone)) Filter: (((create_code)::text = ‘91440300791738553L’::text) OR ((dest_code)::text = ‘91440300791738553L’::text)) Rows Removed by Filter: 81859 Heap Fetches: 0 -> Seq Scan on mytable_deafult mytable_2 (cost=0.00..1.02 rows=1 width=0) (actual time=0.013..0.013 rows=0 loops=1) Filter: ((create_time >= ‘2025-03-31 00:00:00’::timestamp without time zone) AND (create_time <= ‘2026-03-31 23:59:59’::timestamp without time zone) AND (((create_code)::text = ‘91440300791738553L’::text) OR ((dest_code)::text = ‘91440300791738553L’::text))) Rows Removed by Filter: 1 Planning Time: 0.305 ms Execution Time: 5679.516 ms

至此，所有能进行的索引优化已经都做了，已经完全避免回表（之前最大瓶颈已解决），已经完全避免回表（之前最大瓶颈已解决），没扫无关分区；但是目前的查询依旧接近6秒，那么下一步如何进行优化呢？

要进一步优化必须先找出查询的瓶颈究竟在哪里？对于这个查询我们可以看到最耗时的步骤就是第一个Index Only Scan，90%以上的代价都花在了这里，看看这里我们做了什么：

[!CAUTION]

WHERE create_time BETWEEN … 命中 ≈ 1170万行

然后

create_code = ‘xxx’ OR dest_code = ‘xxx’

过滤留下 ≈ 400万行

索引入口没有用到最有选择性的条件

当前执行路径（本质）：

1️⃣ 按 create_time 扫索引（很大范围） 2️⃣ 再做 OR 过滤（CPU过滤）

相当于扫描 1170万 → 过滤 → 得到 400万

总结一句话，当前扫描的数据还是太多了

所以要进一步进行优化只能是重写SQL，让扫描的数据下降：

        
      
SELECT COUNT(*) FROM (
    SELECT 1
    FROM mytable
    WHERE create_code = 'xxx'
      AND create_time BETWEEN ...

    UNION ALL

    SELECT 1
    FROM mytable
    WHERE dest_code = 'xxx'
      AND create_time BETWEEN ...
      AND create_code <> 'xxx'
) t;

重建索引：

        
CREATE INDEX idx_mytable_create_code_time on mytable(create_code, create_time);
CREATE INDEX idx_mytable_dest_code_time on mytable(dest_ep_code, create_time)

查询查询效果如下：

        
      
EXPLAIN ANALYZE
SELECT
	(SELECT count(1)
	FROM mytable
	WHERE create_code = '91440300791738553L' AND create_time BETWEEN '2025-03-31 00:00:00' AND '2026-03-31 23:59:59')
	+
	(SELECT count(1)
	FROM mytable
	WHERE dest_code = '91440300791738553L' AND create_time BETWEEN '2025-03-31 00:00:00' AND '2026-03-31 23:59:59' AND create_code <> '91440300791738553L')
AS total;

[!IMPORTANT]

Result (cost=717082.23..717082.24 rows=1 width=8) (actual time=3736.311..3736.314 rows=1 loops=1) InitPlan 1 (returns $0) -> Aggregate (cost=141400.96..141400.97 rows=1 width=8) (actual time=837.900..837.901 rows=1 loops=1) -> Append (cost=0.56..134907.04 rows=2597566 width=0) (actual time=0.021..732.414 rows=2606344 loops=1) -> Index Only Scan using mytable_p2025_create_code_create_time_idx on mytable_p2025 (cost=0.56..120615.38 rows=2569903 width=0) (actual time=0.021..630.117 rows=2579288 loops=1) Index Cond: ((create_code = ‘91440300791738553L’::text) AND (create_time >= ‘2025-03-31 00:00:00’::timestamp without time zone) AND (create_time <= ‘2026-03-31 23:59:59’::timestamp without time zone)) Heap Fetches: 0 -> Index Only Scan using mytable_p2026_create_code_create_time_idx on mytable_p2026 mytable_1 (cost=0.42..1302.82 rows=27662 width=0) (actual time=0.019..4.205 rows=27056 loops=1) Index Cond: ((create_code = ‘91440300791738553L’::text) AND (create_time >= ‘2025-03-31 00:00:00’::timestamp without time zone) AND (create_time <= ‘2026-03-31 23:59:59’::timestamp without time zone)) Heap Fetches: 0 -> Seq Scan on mytable_deafult mytable_2 (cost=0.00..1.02 rows=1 width=0) (actual time=0.015..0.015 rows=0 loops=1) Filter: ((create_time >= ‘2025-03-31 00:00:00’::timestamp without time zone) AND (create_time <= ‘2026-03-31 23:59:59’::timestamp without time zone) AND ((create_code)::text = ‘91440300791738553L’::text)) Rows Removed by Filter: 1 InitPlan 2 (returns $1) -> Aggregate (cost=575681.26..575681.27 rows=1 width=8) (actual time=2898.407..2898.408 rows=1 loops=1) -> Append (cost=0.56..572149.84 rows=1412567 width=0) (actual time=0.026..2840.225 rows=1387624 loops=1) -> Index Only Scan using mytable_p2025_create_time_create_code_dest_c_idx on mytable_p2025 mytable_3 (cost=0.56..559143.92 rows=1398080 width=0) (actual time=0.026..2755.691 rows=1373259 loops=1) Index Cond: ((create_time >= ‘2025-03-31 00:00:00’::timestamp without time zone) AND (create_time <= ‘2026-03-31 23:59:59’::timestamp without time zone) AND (dest_code = ‘91440300791738553L’::text)) Filter: ((create_code)::text <> ‘91440300791738553L’::text) Rows Removed by Filter: 386293 Heap Fetches: 0 -> Index Only Scan using mytable_p2026_create_time_create_code_dest_c_idx on mytable_p2026 mytable_4 (cost=0.42..5942.06 rows=14486 width=0) (actual time=0.036..27.169 rows=14365 loops=1) Index Cond: ((create_time >= ‘2025-03-31 00:00:00’::timestamp without time zone) AND (create_time <= ‘2026-03-31 23:59:59’::timestamp without time zone) AND (dest_code = ‘91440300791738553L’::text)) Filter: ((create_code)::text <> ‘91440300791738553L’::text) Rows Removed by Filter: 4040 Heap Fetches: 0 -> Seq Scan on mytable_deafult mytable_5 (cost=0.00..1.02 rows=1 width=0) (actual time=0.012..0.012 rows=0 loops=1) Filter: ((create_time >= ‘2025-03-31 00:00:00’::timestamp without time zone) AND (create_time <= ‘2026-03-31 23:59:59’::timestamp without time zone) AND ((create_code)::text <> ‘91440300791738553L’::text) AND ((dest_code)::text = ‘91440300791738553L’::text)) Rows Removed by Filter: 1 Planning Time: 0.502 ms Execution Time: 3736.352 ms

至此查询效率有了大幅度的提升；

Database, PostgreSQL

Optimization

本文由作者按照 CC BY 4.0 进行授权

复合索引可以覆盖查询条件，为什么优化器却要走全表扫描

热门标签