[PostgreSQL Case 1] A "Bloodbath" Caused by a Single LIMIT
While troubleshooting for a business team some time ago, I found that the application issued a large number of full-table queries, much of whose output was never actually used. These full queries caused several problems:
- The result sets were large, so transfer times were long; slow-query entries showed up in the database logs constantly and had to be analyzed again and again.
- The application accesses the database through SQLAlchemy; with large result sets, the Python client received data slowly and consumed a lot of memory. Under high concurrency, full-table queries had previously frozen the client outright.
Based on experience, I suggested adding LIMIT to cap the number of rows returned per query, which is also the approach recommended in the community. Unexpectedly, this tiny LIMIT degraded performance a hundredfold and "paralyzed" the production environment twice. Let's look at how this "bloodbath" happened.
The query SQL
After working through the business model, the simplified setup is as follows:
create table T_A(id int, c_1 int);
create index t_a_id on T_A(id);
create index t_a_c on T_A(c_1);
create table T_B(id int, c_1 int);
create index t_b_id on T_B(id);
insert into T_A select generate_series(1,200000),generate_series(1,1000000);
insert into T_A select generate_series(1,200000),generate_series(1,1000000);
insert into T_A select generate_series(1,200000),generate_series(1,1000000);
insert into T_A select generate_series(1,200000),generate_series(1,1000000);
insert into T_A select generate_series(1,200000),generate_series(1,1000000);
truncate table t_b;
insert into T_B select generate_series(1,1000),generate_series(1,10);
vacuum analyze t_a;
vacuum analyze t_b;
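As a sanity check on this setup, the number of rows that survive the query's filter can be simulated in plain Python. This is a sketch: it assumes the pre-PG10 behavior where two set-returning functions in one select list expand to the least common multiple of their lengths, with each series cycling.

```python
from math import lcm  # Python 3.9+

def count_matches():
    # Pre-PG10 semantics: each
    #   insert into T_A select generate_series(1,200000), generate_series(1,1000000)
    # produces lcm(200000, 1000000) = 1,000,000 rows, both series cycling.
    n_a, m_a = 200000, 1000000
    # T_B: generate_series(1,1000), generate_series(1,10)
    # -> id 1..1000 with c_1 = ((id - 1) % 10) + 1
    t_b_c1 = {i: (i - 1) % 10 + 1 for i in range(1, 1001)}
    matches = 0
    for k in range(lcm(n_a, m_a)):
        a_id, a_c1 = k % n_a + 1, k % m_a + 1
        # the query's filter: (T_A.c_1 = 999 OR T_B.c_1 = 1), left join on id
        if a_c1 == 999 or t_b_c1.get(a_id) == 1:
            matches += 1
    return matches * 5  # T_A is loaded by five identical inserts

print(count_matches())  # 2505, matching "rows=2505" in the plans below
```

The simulated count (2505) and the removed count (5,000,000 − 2505 = 4,997,495) line up with the "rows=2505" and "Rows Removed by Filter: 4997495" figures in the EXPLAIN output.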
The query itself is:
explain (analyze,buffers,verbose) select T_A.id from T_A left outer join T_B on T_A.id=T_B.id
where (T_A.c_1 = 999 or T_B.c_1=1 ) order by T_A.id ASC limit 4000;
Execution results
Production currently runs on the PG 9.2 mainline, so let's test there first:
postgres=# explain (analyze,buffers,verbose) select T_A.id from T_A left outer join T_B on T_A.id=T_B.id
postgres-# where (T_A.c_1 = 999 or T_B.c_1=1 ) order by T_A.id ASC limit 4000;
                                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.42..96110.03 rows=4000 width=4) (actual time=0.023..4798.584 rows=2505 loops=1)
   Output: t_a.id
   Buffers: shared hit=4977865 read=40181 written=2720
   ->  Merge Left Join  (cost=0.42..12013798.38 rows=500004 width=4) (actual time=0.023..4798.168 rows=2505 loops=1)
         Output: t_a.id
         Merge Cond: (t_a.id = t_b.id)
         Filter: ((t_a.c_1 = 999) OR (t_b.c_1 = 1))
         Rows Removed by Filter: 4997495
         Buffers: shared hit=4977865 read=40181 written=2720
         ->  Index Scan using t_a_id on public.t_a  (cost=0.00..12000762.83 rows=5000000 width=8) (actual time=0.008..3570.243 rows=5000000 loops=1)
               Output: t_a.id, t_a.c_1
               Buffers: shared hit=4977856 read=40181 written=2720
         ->  Materialize  (cost=0.00..45.75 rows=1000 width=8) (actual time=0.009..3.625 rows=24976 loops=1)
               Output: t_b.id, t_b.c_1
               Buffers: shared hit=9
               ->  Index Scan using t_b_id on public.t_b  (cost=0.00..43.25 rows=1000 width=8) (actual time=0.007..0.413 rows=1000 loops=1)
                     Output: t_b.id, t_b.c_1
                     Buffers: shared hit=9
 Total runtime: 4798.890 ms
(19 rows)
postgres=# select version();
version
-----------------------------------------------------------------------------------------------------------------
PostgreSQL 9.2.24 on aarch64-unknown-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-11), 64-bit
(1 row)
The plan shows that, faced with the OR condition, the optimizer chose an index scan, and the scan of t_a alone took 3.57 s.
For comparison, here is the plan without the LIMIT:
postgres=# explain (analyze,buffers,verbose) select T_A.id from T_A left outer join T_B on T_A.id=T_B.id
postgres-# where (T_A.c_1 = 999 or T_B.c_1=1 ) order by T_A.id ASC;
                                                            QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=152278.05..153528.06 rows=500004 width=4) (actual time=2409.779..2410.012 rows=2505 loops=1)
   Output: t_a.id
   Sort Key: t_a.id
   Sort Method: quicksort  Memory: 214kB
   Buffers: shared hit=3149 read=18983
   ->  Hash Left Join  (cost=27.50..91270.73 rows=500004 width=4) (actual time=0.514..2408.678 rows=2505 loops=1)
         Output: t_a.id
         Hash Cond: (t_a.id = t_b.id)
         Filter: ((t_a.c_1 = 999) OR (t_b.c_1 = 1))
         Rows Removed by Filter: 4997495
         Buffers: shared hit=3146 read=18983
         ->  Seq Scan on public.t_a  (cost=0.00..72124.00 rows=5000000 width=8) (actual time=0.030..809.089 rows=5000000 loops=1)
               Output: t_a.id, t_a.c_1
               Buffers: shared hit=3144 read=18980
         ->  Hash  (cost=15.00..15.00 rows=1000 width=8) (actual time=0.463..0.463 rows=1000 loops=1)
               Output: t_b.id, t_b.c_1
               Buckets: 1024  Batches: 1  Memory Usage: 40kB
               Buffers: shared hit=2 read=3
               ->  Seq Scan on public.t_b  (cost=0.00..15.00 rows=1000 width=8) (actual time=0.003..0.198 rows=1000 loops=1)
                     Output: t_b.id, t_b.c_1
                     Buffers: shared hit=2 read=3
 Total runtime: 2410.330 ms
(22 rows)
This plan scans t_a sequentially, and the scan takes only 809 ms.
Yes, you read that right: the sequential scan takes 809 ms while the index scan takes 3.57 s, and that gap accounts for essentially the entire difference between the two statements.
Since production is on PG 9.2, let's also test on PG 9.6.8:
postgres=# explain analyze select T_A.id from T_A left outer join T_B on T_A.id=T_B.id
postgres-# where (T_A.c_1 = 999 or T_B.c_1=1 ) order by T_A.id ASC limit 4000;
                                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1.75..1992.77 rows=4000 width=4) (actual time=0.024..4631.075 rows=2505 loops=1)
   ->  Merge Left Join  (cost=1.75..248880.44 rows=500004 width=4) (actual time=0.023..4630.797 rows=2505 loops=1)
         Merge Cond: (t_a.id = t_b.id)
         Filter: ((t_a.c_1 = 999) OR (t_b.c_1 = 1))
         Rows Removed by Filter: 4997495
         ->  Index Scan using t_a_id on t_a  (cost=0.43..235843.67 rows=5000000 width=8) (actual time=0.009..3393.133 rows=5000000 loops=1)
         ->  Materialize  (cost=0.28..45.77 rows=1000 width=8) (actual time=0.009..2.465 rows=24976 loops=1)
               ->  Index Scan using t_b_id on t_b  (cost=0.28..43.27 rows=1000 width=8) (actual time=0.008..0.384 rows=1000 loops=1)
 Planning time: 0.586 ms
 Execution time: 4631.266 ms
(10 rows)
postgres=# select version();
version
----------------------------------------------------------------------------------------------------------------
PostgreSQL 9.6.8 on aarch64-unknown-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-11), 64-bit
(1 row)
With the LIMIT, 9.6.8's execution time is about the same as 9.2's, and it still chooses the index scan. (In local tests, PG 10 does not show this problem: for one thing it has parallel query, and for another the planner has improved; even with parallel query disabled it picks a hash join rather than the index scan.)
Problem analysis
The statement itself is simple, but it contains an OR condition that spans two different tables:
select T_A.id from T_A left outer join T_B on T_A.id=T_B.id where (T_A.c_1 = 999 or T_B.c_1=1 ) order by T_A.id ASC limit 4000;
After consulting the documentation, the likely reason the plan goes wrong is this: because the query ends with ORDER BY id LIMIT 4000, the optimizer assumes it only needs the first 4000 tuples in id order. Since id is indexed, it defaults to an index scan on id, believing that producing rows already sorted is the fastest route. Unfortunately, the OR condition filters on values unrelated to id, so nearly every tuple reached through the index must be fetched and tested against the filter. Each index entry triggers a heap access just to check the condition, the number of blocks read explodes, and performance collapses. Note that the plan only flips this way at large data volumes.
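The numbers in the 9.2 plan bear this out. PostgreSQL costs a LIMIT node by assuming the child plan's cost accrues linearly with the rows it emits, so it charges only the fraction limit_rows/estimated_rows of the child's run cost. A simplified sketch of that model, with the constants copied from the Merge Left Join line of the 9.2 plan:

```python
def limit_scaled_cost(startup, total, plan_rows, limit_rows):
    # LIMIT is charged the child's startup cost plus a pro-rata share of
    # its run cost, assuming cost accrues linearly with rows emitted.
    fraction = min(1.0, limit_rows / plan_rows)
    return startup + (total - startup) * fraction

# Merge Left Join child path: cost=0.42..12013798.38, estimated rows=500004
cost = limit_scaled_cost(0.42, 12013798.38, 500004, 4000)
print(round(cost, 2))  # 96110.03 -- exactly the Limit node's cost in the 9.2 plan
```

The trap is the 500004-row estimate: only 2505 rows actually survive the OR filter, so the planner believes it can stop after reading under 1% of the merge join's input, while in reality it must walk almost all of t_a.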
With LIMIT 400000 instead, the plan looks like this:
postgres=# explain analyze select T_A.id from T_A left outer join T_B on T_A.id=T_B.id
postgres-# where (T_A.c_1 = 999 or T_B.c_1=1 ) order by T_A.id ASC limit 400000;
                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=145439.95..146439.95 rows=400000 width=4) (actual time=1976.531..1977.094 rows=2505 loops=1)
   ->  Sort  (cost=145439.95..146689.96 rows=500004 width=4) (actual time=1976.530..1976.848 rows=2505 loops=1)
         Sort Key: t_a.id
         Sort Method: quicksort  Memory: 214kB
         ->  Hash Left Join  (cost=27.50..91271.62 rows=500004 width=4) (actual time=0.445..1975.517 rows=2505 loops=1)
               Hash Cond: (t_a.id = t_b.id)
               Filter: ((t_a.c_1 = 999) OR (t_b.c_1 = 1))
               Rows Removed by Filter: 4997495
               ->  Seq Scan on t_a  (cost=0.00..72124.00 rows=5000000 width=8) (actual time=0.037..559.274 rows=5000000 loops=1)
               ->  Hash  (cost=15.00..15.00 rows=1000 width=8) (actual time=0.394..0.394 rows=1000 loops=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 48kB
                     ->  Seq Scan on t_b  (cost=0.00..15.00 rows=1000 width=8) (actual time=0.004..0.153 rows=1000 loops=1)
 Planning time: 1.164 ms
 Execution time: 1977.321 ms
(14 rows)
The query returns the same result, but the plan switches from the index scan back to a sequential scan and from a merge join to a hash join, and execution time drops sharply.
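The flip at LIMIT 400000 falls out of the same pro-rata costing: the LIMIT node is charged limit_rows/estimated_rows of each candidate child path's run cost, and at a fraction of 0.8 the index-scan-based merge join is no longer cheapest. A sketch using the child-path costs taken from the two 9.6-format plans above:

```python
def limit_scaled_cost(startup, total, plan_rows, limit_rows):
    # LIMIT pays the child's startup cost plus limit/rows of its run cost
    fraction = min(1.0, limit_rows / plan_rows)
    return startup + (total - startup) * fraction

# Child paths feeding the LIMIT (both estimated at 500004 output rows):
merge_join = limit_scaled_cost(1.75, 248880.44, 500004, 400000)       # index scans + merge join
sort_path  = limit_scaled_cost(145439.95, 146689.96, 500004, 400000)  # seq scans + hash join + sort
print(round(sort_path, 2), sort_path < merge_join)  # 146439.95 True -- the sort path wins
```

At LIMIT 4000 the same formula gives about 1992.8 for the merge-join path (matching the Limit cost 1992.77 in the 9.6 plan) versus over 145,000 for the sort path, whose startup cost is dominated by sorting all rows; that is why the small LIMIT picks the index scan.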
In the end, after detailed analysis, we advised the business to add a harmless extra sort column to the ORDER BY, e.g. ORDER BY T_A.id, T_A.c_1.
The plan then becomes:
postgres=# explain analyze select T_A.id from T_A left outer join T_B on T_A.id=T_B.id
postgres-# where (T_A.c_1 = 999 or T_B.c_1=1 ) order by T_A.id ASC,T_A.c_1 ASC limit 4000;
                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=123686.35..123696.35 rows=4000 width=8) (actual time=1921.081..1921.665 rows=2505 loops=1)
   ->  Sort  (cost=123686.35..124936.36 rows=500004 width=8) (actual time=1921.081..1921.401 rows=2505 loops=1)
         Sort Key: t_a.id, t_a.c_1
         Sort Method: quicksort  Memory: 214kB
         ->  Hash Left Join  (cost=27.50..91271.62 rows=500004 width=8) (actual time=0.402..1919.781 rows=2505 loops=1)
               Hash Cond: (t_a.id = t_b.id)
               Filter: ((t_a.c_1 = 999) OR (t_b.c_1 = 1))
               Rows Removed by Filter: 4997495
               ->  Seq Scan on t_a  (cost=0.00..72124.00 rows=5000000 width=8) (actual time=0.063..565.419 rows=5000000 loops=1)
               ->  Hash  (cost=15.00..15.00 rows=1000 width=8) (actual time=0.330..0.330 rows=1000 loops=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 48kB
                     ->  Seq Scan on t_b  (cost=0.00..15.00 rows=1000 width=8) (actual time=0.005..0.117 rows=1000 loops=1)
 Planning time: 0.330 ms
 Execution time: 1921.997 ms
(14 rows)
This change does not affect the query result, and the plan matches the no-LIMIT case. One caveat: do not create a composite index on T_A(id, c_1), or the plan flips back just as before, since the ORDER BY can then again be satisfied by an index scan.
Closing words
This problem hit our production environment twice. Subsequent investigation turned up similar cases elsewhere, and after the same fix performance no longer regressed. Testing on MySQL 5.5 showed much the same behavior.
On further analysis, the best solution would be to transform the OR condition into a UNION ALL, but the PostgreSQL planner currently cannot do this automatically. Encouragingly, the community has noticed the problem and is developing a patch for this optimization, which should improve performance markedly:
Convert join OR clauses into UNION queries
https://commitfest.postgresql.org/18/1001/
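Until the optimizer can do this transformation itself, the rewrite can be applied by hand. Here is a sketch of the idea, checked on a scaled-down dataset with Python's built-in sqlite3; it assumes T_B.id is unique (true for this workload), so the two UNION ALL branches are disjoint and duplicate T_A rows are each preserved exactly once:

```python
import sqlite3

# Scaled-down model of the two tables (small ids, with duplicate t_a rows
# like the real workload; filter constants shrunk to fit the data).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("CREATE TABLE t_a(id INT, c_1 INT); CREATE TABLE t_b(id INT, c_1 INT);")
cur.executemany("INSERT INTO t_a VALUES (?, ?)",
                [(i % 10 + 1, i % 7 + 1) for i in range(100)] * 2)
cur.executemany("INSERT INTO t_b VALUES (?, ?)",
                [(i, (i - 1) % 3 + 1) for i in range(1, 6)])

or_query = """
    SELECT t_a.id FROM t_a LEFT JOIN t_b ON t_a.id = t_b.id
    WHERE t_a.c_1 = 3 OR t_b.c_1 = 1
    ORDER BY t_a.id"""
union_query = """
    SELECT id FROM (
        SELECT t_a.id AS id FROM t_a JOIN t_b ON t_a.id = t_b.id
        WHERE t_b.c_1 = 1                                  -- branch 1: the T_B side of the OR
        UNION ALL
        SELECT t_a.id FROM t_a LEFT JOIN t_b ON t_a.id = t_b.id
        WHERE t_a.c_1 = 3 AND COALESCE(t_b.c_1, 0) <> 1    -- branch 2: the rest, disjoint from branch 1
    ) AS u
    ORDER BY id"""
assert cur.execute(or_query).fetchall() == cur.execute(union_query).fetchall()
print("OR and UNION ALL rewrites return the same rows")
```

Each branch carries only an AND-able predicate, so the planner can pick an efficient access path per branch instead of filtering the whole join; the semantics of such a manual rewrite must always be re-checked against your own data (in particular the uniqueness assumption above).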
Since I am not deeply familiar with the planner internals, I do not fully understand how this plan change comes about; explanations and corrections from experts are very welcome.