复合索引可以覆盖查询条件,为什么优化器却要走全表扫描
分区表、数据严重倾斜个别分区,查询是该分区只走全表扫描
复合索引可以覆盖查询条件,为什么优化器却要走全表扫描
某分区表:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
test=# \dt
关联列表
架构模式 | 名称 | 类型 | 拥有者
----------+-----------------+-------+--------
myschema | mytable | 分区表 | system
myschema | mytable_deafult | 数据表 | system
myschema | mytable_p2018 | 数据表 | system
myschema | mytable_p2019 | 数据表 | system
myschema | mytable_p2020 | 数据表 | system
myschema | mytable_p2021 | 数据表 | system
myschema | mytable_p2022 | 数据表 | system
myschema | mytable_p2023 | 数据表 | system
myschema | mytable_p2024 | 数据表 | system
myschema | mytable_p2025 | 数据表 | system
myschema | mytable_p2026 | 数据表 | system
表结构如下:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
test=# \d+ mytable;
分区表 "myschema.mytable"
栏位 | 类型 | 校对规则 | 可空的 | 预设 | 存储 | 统计目标 | 描述
---------------------+-----------------------------+----------+----------+------+----------+----------+------
id | character varying(32 char) | | not null | | extended | |
pack_code | character varying(100 char) | | | | extended | |
manufacturer_spec | character varying(32 char) | | | | extended | |
production_province | character varying(6 char) | | | | extended | |
production_city | character varying(6 char) | | | | extended | |
production_district | character varying(6 char) | | | | extended | |
dest_name | character varying(255 char) | | | | extended | |
dest_code | character varying(18 char) | | | | extended | |
contained_type | character varying(32 char) | | | | extended | |
module_count | character varying(32 char) | | | | extended | |
cell_count | character varying(32 char) | | | | extended | |
create_code | character varying(18 char) | | | | extended | |
create_user_id | character varying(32 char) | | | | extended | |
create_time | timestamp without time zone | | not null | | plain | |
update_user_id | character varying(32 char) | | | | extended | |
update_time | timestamp without time zone | | | | plain | |
del_flag | smallint | | | 0 | plain | |
bnum | character varying(32 char) | | | | extended | |
spec_pack_id | character varying(32 char) | | | | extended | |
pack_date | date | | | | plain | |
section_type | character varying(32 char) | | | | extended | |
分区键值: RANGE (create_time)
索引:
"mytable_pkey" PRIMARY KEY, btree (id, create_time)
"mytable_pack_code_create_time_key" UNIQUE CONSTRAINT, btree (pack_code, create_time)
"mytable_create_code" btree (create_code)
"mytable_create_time" btree (create_time)
"mytable_create_time_create_dest_code" btree (create_time, create_code, dest_code)
"mytable_create_time_pack_code" btree (create_time, pack_code)
"mytable_dest_code" btree (dest_ep_code)
"mytable_pack_dest_code" btree (pack_code, dest_ep_code)
"mytable_spec_pack_id" btree (spec_pack_id)
分区: mytable_p2018 FOR VALUES FROM ('2018-01-01 00:00:00') TO ('2018-12-31 23:59:59'),
mytable_p2019 FOR VALUES FROM ('2019-01-01 00:00:00') TO ('2019-12-31 23:59:59'),
mytable_p2020 FOR VALUES FROM ('2020-01-01 00:00:00') TO ('2020-12-31 23:59:59'),
mytable_p2021 FOR VALUES FROM ('2021-01-01 00:00:00') TO ('2021-12-31 23:59:59'),
mytable_p2022 FOR VALUES FROM ('2022-01-01 00:00:00') TO ('2022-12-31 23:59:59'),
mytable_p2023 FOR VALUES FROM ('2023-01-01 00:00:00') TO ('2023-12-31 23:59:59'),
mytable_p2024 FOR VALUES FROM ('2024-01-01 00:00:00') TO ('2024-12-31 23:59:59'),
mytable_p2025 FOR VALUES FROM ('2025-01-01 00:00:00') TO ('2025-12-31 23:59:59'),
mytable_p2026 FOR VALUES FROM ('2026-01-01 00:00:00') TO ('2026-12-31 23:59:59'),
mytable_deafult DEFAULT
查看如下语句执行计划:
1
2
3
4
5
6
7
EXPLAIN ANALYZE
SELECT
COUNT(*)
FROM
mytable
WHERE create_code = '91440300791738553L' OR dest_code = '91440300791738553L')
AND create_time BETWEEN '2025-03-31 00:00:00' AND '2026-03-31 23:59:59';
[!IMPORTANT]
Aggregate (cost=1011349.78..1011349.79 rows=1 width=8) (actual time=6748.168..6748.170 rows=1 loops=1) -> Append (cost=0.00..1001464.52 rows=3954104 width=0) (actual time=0.041..6574.916 rows=3995824 loops=1) -> Seq Scan on mytable_p2025 (cost=0.00..955834.32 rows=3913205 width=0) (actual time=0.040..6325.847 rows=3954274 loops=1) Filter: ((create_time >= ‘2025-03-31 00:00:00’::timestamp without time zone) AND (create_time <= ‘2026-03-31 23:59:59’::timestamp without time zone) AND (((create_code)::text = ‘91440300791738553L’::text) OR ((dest_code)::text = ‘91440300791738553L’::text))) Rows Removed by Filter: 11545726 -> Bitmap Heap Scan on mytable_p2026 mytable_1 (cost=2583.66..25859.68 rows=40899 width=0) (actual time=11.916..83.526 rows=41550 loops=1) Recheck Cond: ((create_time >= ‘2025-03-31 00:00:00’::timestamp without time zone) AND (create_time <= ‘2026-03-31 23:59:59’::timestamp without time zone)) Filter: (((create_code)::text = ‘91440300791738553L’::text) OR ((dest_code)::text = ‘91440300791738553L’::text)) Rows Removed by Filter: 82028 Heap Blocks: exact=20816 -> Bitmap Index Scan on mytable_p2026_create_time (cost=0.00..2573.43 rows=122101 width=0) (actual time=9.856..9.856 rows=123578 loops=1) Index Cond: ((create_time >= ‘2025-03-31 00:00:00’::timestamp without time zone) AND (create_time <= ‘2026-03-31 23:59:59’::timestamp without time zone)) Planning Time: 0.239 ms Execution Time: 6748.320 ms
[!CAUTION]
从上述执行计划可以看出,语句执行主要开销在顺序扫描mytable_p2025这个分区上了,也就是说我们必须把优化的重点放在这里;从表结构可以看到为了提高上述查询语句的效率专门创建了mytable_create_time_create_dest_code索引,理论上该索引完美覆盖了这个查询语句的条件,但是为什么优化器没有选择这个索引呢?
我们让优化器使用这个索引回怎么样呢?
在SELECT语句中加入hint如下:
1
SELECT /*+ IndexOnlyScan(bpp mytable_create_time_create_dest_code) */ count(*) ......
[!IMPORTANT]
Index Only Scan using mytable_create_time_create_dest_code on mytable_p2025 (cost=0.56..4008005.27 rows=2618950 width=0) (actual time=0.057..5951.804 rows=2579288 loops=1) Index Cond: ((create_time >= ‘2025-03-31 00:00:00’::timestamp without time zone) AND (create_time <= ‘2026-03-31 23:59:59’::timestamp without time zon e) AND (create_ep_code = ‘91440300791738553L’::text)) Heap Fetches: 2579288
在执行计划里:
1
Heap Fetches: 2579288
表示👉 执行过程中,从表(heap)里实际读取了 2579288 行数据 换句话说 👉 通过索引找到行后,又回表访问数据页的次数
也就是说IndexOnlyScan最后还得回表,这就是为什么优化器没有选择走这个索引的原因;
[!CAUTION]
如何降低这个Heap Fetches呢?
VACUUM(最直接)
1 VACCUM ANALYZE mytable;
执行 vaccum 后执行计划变为:
[!IMPORTANT]
Aggregate (cost=721037.32..721037.33 rows=1 width=8) (actual time=5679.473..5679.474 rows=1 loops=1) -> Append (cost=0.56..711128.26 rows=3963622 width=0) (actual time=0.019..5521.362 rows=3993968 loops=1) -> Index Only Scan using mytable_create_time_create_dest_code on mytable_p2025 (cost=0.56..684194.46 rows=3923059 width=0) (actual time=0.018..5310.303 rows=3952547 loops=1) Index Cond: ((create_time >= ‘2025-03-31 00:00:00’::timestamp without time zone) AND (create_time <= ‘2026-03-31 23:59:59’::timestamp without time zone)) Filter: (((create_code)::text = ‘91440300791738553L’::text) OR ((dest_code)::text = ‘91440300791738553L’::text)) Rows Removed by Filter: 7777771 Heap Fetches: 0 -> Index Only Scan using mytable_create_time_create_dest_code on mytable_p2026 mytable_1 (cost=0.42..7114.67 rows=40562 width=0) (actual time=0.018..45.150 rows=41421 loops=1) Index Cond: ((create_time >= ‘2025-03-31 00:00:00’::timestamp without time zone) AND (create_time <= ‘2026-03-31 23:59:59’::timestamp without time zone)) Filter: (((create_code)::text = ‘91440300791738553L’::text) OR ((dest_code)::text = ‘91440300791738553L’::text)) Rows Removed by Filter: 81859 Heap Fetches: 0 -> Seq Scan on mytable_deafult mytable_2 (cost=0.00..1.02 rows=1 width=0) (actual time=0.013..0.013 rows=0 loops=1) Filter: ((create_time >= ‘2025-03-31 00:00:00’::timestamp without time zone) AND (create_time <= ‘2026-03-31 23:59:59’::timestamp without time zone) AND (((create_code)::text = ‘91440300791738553L’::text) OR ((dest_code)::text = ‘91440300791738553L’::text))) Rows Removed by Filter: 1 Planning Time: 0.305 ms Execution Time: 5679.516 ms
至此,所有能进行的索引优化已经都做了,已经完全避免回表(之前最大瓶颈已解决),已经完全避免回表(之前最大瓶颈已解决),没扫无关分区;但是目前的查询依旧接近6秒,那么下一步如何进行优化呢?
要进一步优化必须先找出查询的瓶颈究竟在哪里?对于这个查询我们可以看到最耗时的步骤就是第一个Index Only Scan,90%以上的代价都花在了这里,看看这里我们做了什么:
[!CAUTION]
WHERE create_time BETWEEN … 命中 ≈ 1170万行
然后
create_code = ‘xxx’ OR dest_code = ‘xxx’
过滤留下 ≈ 400万行
索引入口没有用到最有选择性的条件
当前执行路径(本质):
1️⃣ 按 create_time 扫索引(很大范围) 2️⃣ 再做 OR 过滤(CPU过滤)
相当于扫描 1170万 → 过滤 → 得到 400万
总结一句话,当前扫描的数据还是太多了
所以要进一步进行优化只能是重写SQL,让扫描的数据下降:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
SELECT COUNT(*) FROM (
SELECT 1
FROM mytable
WHERE create_code = 'xxx'
AND create_time BETWEEN ...
UNION ALL
SELECT 1
FROM mytable
WHERE dest_code = 'xxx'
AND create_time BETWEEN ...
AND create_code <> 'xxx'
) t;
重建索引:
1
2
CREATE INDEX idx_mytable_create_code_time on mytable(create_code, create_time);
CREATE INDEX idx_mytable_dest_code_time on mytable(dest_ep_code, create_time)
查询查询效果如下:
1
2
3
4
5
6
7
8
9
10
11
EXPLAIN ANALYZE
SELECT
(SELECT count(1)
FROM mytable
WHERE create_code = '91440300791738553L' AND create_time BETWEEN '2025-03-31 00:00:00' AND '2026-03-31 23:59:59')
+
(SELECT count(1)
FROM mytable
WHERE dest_code = '91440300791738553L' AND create_time BETWEEN '2025-03-31 00:00:00' AND '2026-03-31 23:59:59' AND create_code <> '91440300791738553L')
AS total;
[!IMPORTANT]
Result (cost=717082.23..717082.24 rows=1 width=8) (actual time=3736.311..3736.314 rows=1 loops=1) InitPlan 1 (returns $0) -> Aggregate (cost=141400.96..141400.97 rows=1 width=8) (actual time=837.900..837.901 rows=1 loops=1) -> Append (cost=0.56..134907.04 rows=2597566 width=0) (actual time=0.021..732.414 rows=2606344 loops=1) -> Index Only Scan using mytable_p2025_create_code_create_time_idx on mytable_p2025 (cost=0.56..120615.38 rows=2569903 width=0) (actual time=0.021..630.117 rows=2579288 loops=1) Index Cond: ((create_code = ‘91440300791738553L’::text) AND (create_time >= ‘2025-03-31 00:00:00’::timestamp without time zone) AND (create_time <= ‘2026-03-31 23:59:59’::timestamp without time zone)) Heap Fetches: 0 -> Index Only Scan using mytable_p2026_create_code_create_time_idx on mytable_p2026 mytable_1 (cost=0.42..1302.82 rows=27662 width=0) (actual time=0.019..4.205 rows=27056 loops=1) Index Cond: ((create_code = ‘91440300791738553L’::text) AND (create_time >= ‘2025-03-31 00:00:00’::timestamp without time zone) AND (create_time <= ‘2026-03-31 23:59:59’::timestamp without time zone)) Heap Fetches: 0 -> Seq Scan on mytable_deafult mytable_2 (cost=0.00..1.02 rows=1 width=0) (actual time=0.015..0.015 rows=0 loops=1) Filter: ((create_time >= ‘2025-03-31 00:00:00’::timestamp without time zone) AND (create_time <= ‘2026-03-31 23:59:59’::timestamp without time zone) AND ((create_code)::text = ‘91440300791738553L’::text)) Rows Removed by Filter: 1 InitPlan 2 (returns $1) -> Aggregate (cost=575681.26..575681.27 rows=1 width=8) (actual time=2898.407..2898.408 rows=1 loops=1) -> Append (cost=0.56..572149.84 rows=1412567 width=0) (actual time=0.026..2840.225 rows=1387624 loops=1) -> Index Only Scan using mytable_p2025_create_time_create_code_dest_c_idx on mytable_p2025 mytable_3 (cost=0.56..559143.92 rows=1398080 width=0) (actual time=0.026..2755.691 rows=1373259 loops=1) Index Cond: ((create_time >= ‘2025-03-31 00:00:00’::timestamp without time zone) AND (create_time <= ‘2026-03-31 23:59:59’::timestamp without time zone) AND (dest_code = ‘91440300791738553L’::text)) Filter: ((create_code)::text <> ‘91440300791738553L’::text) Rows Removed by Filter: 386293 Heap Fetches: 0 -> Index Only Scan using mytable_p2026_create_time_create_code_dest_c_idx on mytable_p2026 mytable_4 (cost=0.42..5942.06 rows=14486 width=0) (actual time=0.036..27.169 rows=14365 loops=1) Index Cond: ((create_time >= ‘2025-03-31 00:00:00’::timestamp without time zone) AND (create_time <= ‘2026-03-31 23:59:59’::timestamp without time zone) AND (dest_code = ‘91440300791738553L’::text)) Filter: ((create_code)::text <> ‘91440300791738553L’::text) Rows Removed by Filter: 4040 Heap Fetches: 0 -> Seq Scan on mytable_deafult mytable_5 (cost=0.00..1.02 rows=1 width=0) (actual time=0.012..0.012 rows=0 loops=1) Filter: ((create_time >= ‘2025-03-31 00:00:00’::timestamp without time zone) AND (create_time <= ‘2026-03-31 23:59:59’::timestamp without time zone) AND ((create_code)::text <> ‘91440300791738553L’::text) AND ((dest_code)::text = ‘91440300791738553L’::text)) Rows Removed by Filter: 1 Planning Time: 0.502 ms Execution Time: 3736.352 ms
至此查询效率有了大幅度的提升;