[PostgreSQL Case 1] A "Bloodbath" Caused by a Single LIMIT
While troubleshooting for a business team some time ago, I found that the application issued a large number of full-table queries, much of whose output was never actually used. These full queries caused several problems:
- The result sets were large, so transfer times were long; slow-query entries showed up in the database logs constantly and had to be analyzed again and again.
- The application accesses the database through SQLAlchemy; with large result sets, the Python client received data slowly and consumed a lot of memory. Under high concurrency, full-table queries had previously frozen the client outright.
Based on experience, I suggested adding LIMIT to cap the number of rows returned per query, which is also the approach recommended in the community. Unexpectedly, this tiny LIMIT degraded performance a hundredfold and "paralyzed" the production environment twice. Let's look at how this "bloodbath" happened.
The query SQL
After working through the business model, the simplified setup is as follows:
create table T_A(id int, c_1 int);
create index t_a_id on T_A(id);
create index t_a_c on T_A(c_1);
create table T_B(id int, c_1 int);
create index t_b_id on T_B(id);
insert into T_A select generate_series(1,200000),generate_series(1,1000000);
insert into T_A select generate_series(1,200000),generate_series(1,1000000);
insert into T_A select generate_series(1,200000),generate_series(1,1000000);
insert into T_A select generate_series(1,200000),generate_series(1,1000000);
insert into T_A select generate_series(1,200000),generate_series(1,1000000);
truncate table t_b;
insert into T_B select generate_series(1,1000),generate_series(1,10);
vacuum analyze t_a;
vacuum analyze t_b;
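As a sanity check on this setup, the number of rows that survive the query's filter can be simulated in plain Python. This is a sketch: it assumes the pre-PG10 behavior where two set-returning functions in one select list expand to the least common multiple of their lengths, with each series cycling.

```python
from math import lcm  # Python 3.9+

def count_matches():
    # Pre-PG10 semantics: each
    #   insert into T_A select generate_series(1,200000), generate_series(1,1000000)
    # produces lcm(200000, 1000000) = 1,000,000 rows, both series cycling.
    n_a, m_a = 200000, 1000000
    # T_B: generate_series(1,1000), generate_series(1,10)
    # -> id 1..1000 with c_1 = ((id - 1) % 10) + 1
    t_b_c1 = {i: (i - 1) % 10 + 1 for i in range(1, 1001)}
    matches = 0
    for k in range(lcm(n_a, m_a)):
        a_id, a_c1 = k % n_a + 1, k % m_a + 1
        # the query's filter: (T_A.c_1 = 999 OR T_B.c_1 = 1), left join on id
        if a_c1 == 999 or t_b_c1.get(a_id) == 1:
            matches += 1
    return matches * 5  # T_A is loaded by five identical inserts

print(count_matches())  # 2505, matching "rows=2505" in the plans below
```

The simulated count (2505) and the removed count (5,000,000 − 2505 = 4,997,495) line up with the "rows=2505" and "Rows Removed by Filter: 4997495" figures in the EXPLAIN output.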
The query itself is:
explain (analyze,buffers,verbose) select T_A.id from T_A left outer join T_B on T_A.id=T_B.id
where (T_A.c_1 = 999 or T_B.c_1=1 ) order by T_A.id ASC limit 4000;
Execution results
Production currently runs on the PG 9.2 mainline, so let's test there first:
postgres=# explain (analyze,buffers,verbose) select T_A.id from T_A left outer join T_B on T_A.id=T_B.id
postgres-# where (T_A.c_1 = 999 or T_B.c_1=1 ) order by T_A.id ASC limit 4000;
                                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.42..96110.03 rows=4000 width=4) (actual time=0.023..4798.584 rows=2505 loops=1)
   Output: t_a.id
   Buffers: shared hit=4977865 read=40181 written=2720
   ->  Merge Left Join  (cost=0.42..12013798.38 rows=500004 width=4) (actual time=0.023..4798.168 rows=2505 loops=1)
         Output: t_a.id
         Merge Cond: (t_a.id = t_b.id)
         Filter: ((t_a.c_1 = 999) OR (t_b.c_1 = 1))
         Rows Removed by Filter: 4997495
         Buffers: shared hit=4977865 read=40181 written=2720
         ->  Index Scan using t_a_id on public.t_a  (cost=0.00..12000762.83 rows=5000000 width=8) (actual time=0.008..3570.243 rows=5000000 loops=1)
               Output: t_a.id, t_a.c_1
               Buffers: shared hit=4977856 read=40181 written=2720
         ->  Materialize  (cost=0.00..45.75 rows=1000 width=8) (actual time=0.009..3.625 rows=24976 loops=1)
               Output: t_b.id, t_b.c_1
               Buffers: shared hit=9
               ->  Index Scan using t_b_id on public.t_b  (cost=0.00..43.25 rows=1000 width=8) (actual time=0.007..0.413 rows=1000 loops=1)
                     Output: t_b.id, t_b.c_1
                     Buffers: shared hit=9
 Total runtime: 4798.890 ms
(19 rows)
postgres=# select version();
version
-----------------------------------------------------------------------------------------------------------------
PostgreSQL 9.2.24 on aarch64-unknown-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-11), 64-bit
(1 row)
The plan shows that, faced with the OR condition, the optimizer chose an index scan, and the scan of t_a alone took 3.57 s.
For comparison, here is the plan without the LIMIT:
postgres=# explain (analyze,buffers,verbose) select T_A.id from T_A left outer join T_B on T_A.id=T_B.id
postgres-# where (T_A.c_1 = 999 or T_B.c_1=1 ) order by T_A.id ASC;
                                                            QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=152278.05..153528.06 rows=500004 width=4) (actual time=2409.779..2410.012 rows=2505 loops=1)
   Output: t_a.id
   Sort Key: t_a.id
   Sort Method: quicksort  Memory: 214kB
   Buffers: shared hit=3149 read=18983
   ->  Hash Left Join  (cost=27.50..91270.73 rows=500004 width=4) (actual time=0.514..2408.678 rows=2505 loops=1)
         Output: t_a.id
         Hash Cond: (t_a.id = t_b.id)
         Filter: ((t_a.c_1 = 999) OR (t_b.c_1 = 1))
         Rows Removed by Filter: 4997495
         Buffers: shared hit=3146 read=18983
         ->  Seq Scan on public.t_a  (cost=0.00..72124.00 rows=5000000 width=8) (actual time=0.030..809.089 rows=5000000 loops=1)
               Output: t_a.id, t_a.c_1
               Buffers: shared hit=3144 read=18980
         ->  Hash  (cost=15.00..15.00 rows=1000 width=8) (actual time=0.463..0.463 rows=1000 loops=1)
               Output: t_b.id, t_b.c_1
               Buckets: 1024  Batches: 1  Memory Usage: 40kB
               Buffers: shared hit=2 read=3
               ->  Seq Scan on public.t_b  (cost=0.00..15.00 rows=1000 width=8) (actual time=0.003..0.198 rows=1000 loops=1)
                     Output: t_b.id, t_b.c_1
                     Buffers: shared hit=2 read=3
 Total runtime: 2410.330 ms
(22 rows)
This plan scans t_a sequentially, and the scan takes only 809 ms.
Yes, you read that right: the sequential scan takes 809 ms while the index scan takes 3.57 s, and that gap accounts for essentially the entire difference between the two statements.
Since production is on PG 9.2, let's also test on PG 9.6.8:
postgres=# explain analyze select T_A.id from T_A left outer join T_B on T_A.id=T_B.id
postgres-# where (T_A.c_1 = 999 or T_B.c_1=1 ) order by T_A.id ASC limit 4000;
                                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1.75..1992.77 rows=4000 width=4) (actual time=0.024..4631.075 rows=2505 loops=1)
   ->  Merge Left Join  (cost=1.75..248880.44 rows=500004 width=4) (actual time=0.023..4630.797 rows=2505 loops=1)
         Merge Cond: (t_a.id = t_b.id)
         Filter: ((t_a.c_1 = 999) OR (t_b.c_1 = 1))
         Rows Removed by Filter: 4997495
         ->  Index Scan using t_a_id on t_a  (cost=0.43..235843.67 rows=5000000 width=8) (actual time=0.009..3393.133 rows=5000000 loops=1)
         ->  Materialize  (cost=0.28..45.77 rows=1000 width=8) (actual time=0.009..2.465 rows=24976 loops=1)
               ->  Index Scan using t_b_id on t_b  (cost=0.28..43.27 rows=1000 width=8) (actual time=0.008..0.384 rows=1000 loops=1)
 Planning time: 0.586 ms
 Execution time: 4631.266 ms
(10 rows)
postgres=# select version();
version
----------------------------------------------------------------------------------------------------------------
PostgreSQL 9.6.8 on aarch64-unknown-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-11), 64-bit
(1 row)
With the LIMIT, 9.6.8's execution time is about the same as 9.2's, and it still chooses the index scan. (In local tests, PG 10 does not show this problem: for one thing it has parallel query, and for another the planner has improved; even with parallel query disabled it picks a hash join rather than the index scan.)
Problem analysis
The statement itself is simple, but it contains an OR condition that spans two different tables:
select T_A.id from T_A left outer join T_B on T_A.id=T_B.id where (T_A.c_1 = 999 or T_B.c_1=1 ) order by T_A.id ASC limit 4000;
After consulting the documentation, the likely reason the plan goes wrong is this: because the query ends with ORDER BY id LIMIT 4000, the optimizer assumes it only needs the first 4000 tuples in id order. Since id is indexed, it defaults to an index scan on id, believing that producing rows already sorted is the fastest route. Unfortunately, the OR condition filters on values unrelated to id, so nearly every tuple reached through the index must be fetched and tested against the filter. Each index entry triggers a heap access just to check the condition, the number of blocks read explodes, and performance collapses. Note that the plan only flips this way at large data volumes.
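The numbers in the 9.2 plan bear this out. PostgreSQL costs a LIMIT node by assuming the child plan's cost accrues linearly with the rows it emits, so it charges only the fraction limit_rows/estimated_rows of the child's run cost. A simplified sketch of that model, with the constants copied from the Merge Left Join line of the 9.2 plan:

```python
def limit_scaled_cost(startup, total, plan_rows, limit_rows):
    # LIMIT is charged the child's startup cost plus a pro-rata share of
    # its run cost, assuming cost accrues linearly with rows emitted.
    fraction = min(1.0, limit_rows / plan_rows)
    return startup + (total - startup) * fraction

# Merge Left Join child path: cost=0.42..12013798.38, estimated rows=500004
cost = limit_scaled_cost(0.42, 12013798.38, 500004, 4000)
print(round(cost, 2))  # 96110.03 -- exactly the Limit node's cost in the 9.2 plan
```

The trap is the 500004-row estimate: only 2505 rows actually survive the OR filter, so the planner believes it can stop after reading under 1% of the merge join's input, while in reality it must walk almost all of t_a.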
With LIMIT 400000 instead, the plan looks like this:
postgres=# explain analyze select T_A.id from T_A left outer join T_B on T_A.id=T_B.id
postgres-# where (T_A.c_1 = 999 or T_B.c_1=1 ) order by T_A.id ASC limit 400000;
                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=145439.95..146439.95 rows=400000 width=4) (actual time=1976.531..1977.094 rows=2505 loops=1)
   ->  Sort  (cost=145439.95..146689.96 rows=500004 width=4) (actual time=1976.530..1976.848 rows=2505 loops=1)
         Sort Key: t_a.id
         Sort Method: quicksort  Memory: 214kB
         ->  Hash Left Join  (cost=27.50..91271.62 rows=500004 width=4) (actual time=0.445..1975.517 rows=2505 loops=1)
               Hash Cond: (t_a.id = t_b.id)
               Filter: ((t_a.c_1 = 999) OR (t_b.c_1 = 1))
               Rows Removed by Filter: 4997495
               ->  Seq Scan on t_a  (cost=0.00..72124.00 rows=5000000 width=8) (actual time=0.037..559.274 rows=5000000 loops=1)
               ->  Hash  (cost=15.00..15.00 rows=1000 width=8) (actual time=0.394..0.394 rows=1000 loops=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 48kB
                     ->  Seq Scan on t_b  (cost=0.00..15.00 rows=1000 width=8) (actual time=0.004..0.153 rows=1000 loops=1)
 Planning time: 1.164 ms
 Execution time: 1977.321 ms
(14 rows)
The query returns the same result, but the plan switches from the index scan back to a sequential scan and from a merge join to a hash join, and execution time drops sharply.
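The flip at LIMIT 400000 falls out of the same pro-rata costing: the LIMIT node is charged limit_rows/estimated_rows of each candidate child path's run cost, and at a fraction of 0.8 the index-scan-based merge join is no longer cheapest. A sketch using the child-path costs taken from the two 9.6-format plans above:

```python
def limit_scaled_cost(startup, total, plan_rows, limit_rows):
    # LIMIT pays the child's startup cost plus limit/rows of its run cost
    fraction = min(1.0, limit_rows / plan_rows)
    return startup + (total - startup) * fraction

# Child paths feeding the LIMIT (both estimated at 500004 output rows):
merge_join = limit_scaled_cost(1.75, 248880.44, 500004, 400000)       # index scans + merge join
sort_path  = limit_scaled_cost(145439.95, 146689.96, 500004, 400000)  # seq scans + hash join + sort
print(round(sort_path, 2), sort_path < merge_join)  # 146439.95 True -- the sort path wins
```

At LIMIT 4000 the same formula gives about 1992.8 for the merge-join path (matching the Limit cost 1992.77 in the 9.6 plan) versus over 145,000 for the sort path, whose startup cost is dominated by sorting all rows; that is why the small LIMIT picks the index scan.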
In the end, after detailed analysis, we advised the business to add a harmless extra sort column to the ORDER BY, e.g. ORDER BY T_A.id, T_A.c_1.
The plan then becomes:
postgres=# explain analyze select T_A.id from T_A left outer join T_B on T_A.id=T_B.id
postgres-# where (T_A.c_1 = 999 or T_B.c_1=1 ) order by T_A.id ASC,T_A.c_1 ASC limit 4000;
                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=123686.35..123696.35 rows=4000 width=8) (actual time=1921.081..1921.665 rows=2505 loops=1)
   ->  Sort  (cost=123686.35..124936.36 rows=500004 width=8) (actual time=1921.081..1921.401 rows=2505 loops=1)
         Sort Key: t_a.id, t_a.c_1
         Sort Method: quicksort  Memory: 214kB
         ->  Hash Left Join  (cost=27.50..91271.62 rows=500004 width=8) (actual time=0.402..1919.781 rows=2505 loops=1)
               Hash Cond: (t_a.id = t_b.id)
               Filter: ((t_a.c_1 = 999) OR (t_b.c_1 = 1))
               Rows Removed by Filter: 4997495
               ->  Seq Scan on t_a  (cost=0.00..72124.00 rows=5000000 width=8) (actual time=0.063..565.419 rows=5000000 loops=1)
               ->  Hash  (cost=15.00..15.00 rows=1000 width=8) (actual time=0.330..0.330 rows=1000 loops=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 48kB
                     ->  Seq Scan on t_b  (cost=0.00..15.00 rows=1000 width=8) (actual time=0.005..0.117 rows=1000 loops=1)
 Planning time: 0.330 ms
 Execution time: 1921.997 ms
(14 rows)
This change does not affect the query result, and the plan matches the no-LIMIT case. One caveat: do not create a composite index on T_A(id, c_1), or the plan flips back just as before, since the ORDER BY can then again be satisfied by an index scan.
Closing words
This problem hit our production environment twice. Subsequent investigation turned up similar cases elsewhere, and after the same fix performance no longer regressed. Testing on MySQL 5.5 showed much the same behavior.
On further analysis, the best solution would be to transform the OR condition into a UNION ALL, but the PostgreSQL planner currently cannot do this automatically. Encouragingly, the community has noticed the problem and is developing a patch for this optimization, which should improve performance markedly:
Convert join OR clauses into UNION queries
https://commitfest.postgresql.org/18/1001/
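Until the optimizer can do this transformation itself, the rewrite can be applied by hand. Here is a sketch of the idea, checked on a scaled-down dataset with Python's built-in sqlite3; it assumes T_B.id is unique (true for this workload), so the two UNION ALL branches are disjoint and duplicate T_A rows are each preserved exactly once:

```python
import sqlite3

# Scaled-down model of the two tables (small ids, with duplicate t_a rows
# like the real workload; filter constants shrunk to fit the data).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("CREATE TABLE t_a(id INT, c_1 INT); CREATE TABLE t_b(id INT, c_1 INT);")
cur.executemany("INSERT INTO t_a VALUES (?, ?)",
                [(i % 10 + 1, i % 7 + 1) for i in range(100)] * 2)
cur.executemany("INSERT INTO t_b VALUES (?, ?)",
                [(i, (i - 1) % 3 + 1) for i in range(1, 6)])

or_query = """
    SELECT t_a.id FROM t_a LEFT JOIN t_b ON t_a.id = t_b.id
    WHERE t_a.c_1 = 3 OR t_b.c_1 = 1
    ORDER BY t_a.id"""
union_query = """
    SELECT id FROM (
        SELECT t_a.id AS id FROM t_a JOIN t_b ON t_a.id = t_b.id
        WHERE t_b.c_1 = 1                                  -- branch 1: the T_B side of the OR
        UNION ALL
        SELECT t_a.id FROM t_a LEFT JOIN t_b ON t_a.id = t_b.id
        WHERE t_a.c_1 = 3 AND COALESCE(t_b.c_1, 0) <> 1    -- branch 2: the rest, disjoint from branch 1
    ) AS u
    ORDER BY id"""
assert cur.execute(or_query).fetchall() == cur.execute(union_query).fetchall()
print("OR and UNION ALL rewrites return the same rows")
```

Each branch carries only an AND-able predicate, so the planner can pick an efficient access path per branch instead of filtering the whole join; the semantics of such a manual rewrite must always be re-checked against your own data (in particular the uniqueness assumption above).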
Since I am not deeply familiar with the planner internals, I do not fully understand how this plan change comes about; explanations and corrections from experts are very welcome.