PostgreSQL: Regular-Expression and Fuzzy Queries over Tens of Billions of Rows with Second-Level Response

Original article: https://yq.aliyun.com/articles/7444?spm=5176.blog7549.yqblogcon1.6.2wcXO2

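Regular-expression matching and fuzzy matching are usually the specialty of search engines, but PostgreSQL can do both as well, and with respectable performance. Combined with a distributed solution (for example plproxy, pg_shard, fdw shard, pg-xc, pg-xl, or greenplum), it handles regular-expression and fuzzy matching over more than ten billion rows very well, without giving up any of the database's native features.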

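The Internet of Things produces large volumes of data. Besides numeric data there is also string data, such as barcodes, license plates, phone numbers, email addresses, and names.
Suppose users need to run fuzzy searches, or even regular-expression matches, over a large volume of sensor data. Is there an efficient way to do it?
This scenario is quite common. For example, if a batch of drugs on the market is suspected to be defective, the drug barcodes need to be queried with a regular expression to trace where the matching drugs went.
Another example is searching for leads during a criminal investigation, where the available clues are incomplete phone numbers, email addresses, license plates, IP addresses, QQ numbers, WeChat IDs, and so on.
Fuzzy matching and correlating these fragments, combined with time information, eventually identifies the suspect.
Fuzzy matching and regular-expression matching are thus somewhat like assembling a facial composite, and the demand for them is pressing.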

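First, let's classify the use cases and the optimization techniques currently available for each.
1. Fuzzy query with a prefix, for example like 'ABC%', which in PG can also be written as ~ '^ABC'
This can be optimized with a btree index, or by splitting the string into one column per character and combining the per-column indexes with bitmap AND / bitmap OR (only suitable for short fixed-length strings, for example char(8)).

2. Fuzzy query with a suffix, for example like '%ABC', which in PG can also be written as ~ 'ABC$'
This can be optimized with a btree index on the reverse() function, or by splitting the string into one column per character and combining the per-column indexes with bitmap AND / bitmap OR (only suitable for short fixed-length strings, for example char(8)).

3. Fuzzy query with neither a fixed prefix nor a fixed suffix, for example like '%AB_C%', which in PG can also be written as ~ 'AB.C'
This can be optimized with a pg_trgm GIN index, or by splitting the string into one column per character and combining the per-column indexes with bitmap AND / bitmap OR (only suitable for short fixed-length strings, for example char(8)).

4. Regular-expression query, for example ~ '[\d]+def1.?[a|b|0|8]{1,3}'
This can be optimized with a pg_trgm GIN index, or by splitting the string into one column per character and combining the per-column indexes with bitmap AND / bitmap OR (only suitable for short fixed-length strings, for example char(8)).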

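The PostgreSQL pg_trgm extension has supported index use for fuzzy (LIKE) queries since 9.1 and for regular-expression queries since 9.3, which greatly strengthens PostgreSQL for investigative work.
For the code, see
https://github.com/postgrespro/pg_trgm_pro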

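How pg_trgm works: it prepends two spaces and appends one space to the string, and then splits the padded string into tokens, one for every 3 adjacent characters.
For a regular-expression or fuzzy query, the index first retrieves candidate rows based on how their tokens match, and the candidates are then filtered.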
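The tokenization described above can be sketched in a few lines of Python. This is only an illustration of the padding-and-windowing rule, not the extension's code: the function name `trigrams` is made up here, and the sketch treats the whole string as a single word without any case folding or word splitting.

```python
def trigrams(s: str) -> set[str]:
    """Pad with two leading spaces and one trailing space,
    then take every window of 3 adjacent characters."""
    padded = "  " + s + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


# "abc" is padded to "  abc " and yields four tokens.
print(sorted(trigrams("abc")))
```

An 8-character value such as those in the test table below yields 9 tokens under this rule, which is why one inserted row touches several index entries.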
GIN index diagram: [figure omitted]
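When a matching token is found in the btree, its entry points to the corresponding list, and the ctids stored in that list lead to the matching rows.
Because one string is split into many tokens, every inserted row updates many index entries, which is why a GIN index needs fastupdate.
How is regular-expression matching done?
See https://raw.githubusercontent.com/postgrespro/pg_trgm_pro/master/trgm_regexp.c
In essence, it converts the regular expression into an NFA, then scans multiple tokens and combines the matches with bit AND / OR.
If the regular expression expands into many AND / OR combinations, a lot of rechecking is needed and performance will not be great.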

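The following examples show how to optimize each of the four scenarios above.

1. Fuzzy query with a prefix, for example like 'ABC%', which in PG can also be written as ~ '^ABC'
This can be optimized with a btree index, or by splitting the string into one column per character and combining the per-column indexes with bitmap AND / bitmap OR (only suitable for short fixed-length strings, for example char(8)).
Example: the first 8 characters of 10 million randomly generated MD5 values.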

postgres=# create table tb(info text);  
CREATE TABLE  
postgres=# insert into tb select substring(md5(random()::text),1,8) from generate_series(1,10000000);  
INSERT 0 10000000  
postgres=# create index idx_tb on tb(info);  
CREATE INDEX  
postgres=# select * from tb limit 1;  
   info     
----------  
 376821ab  
(1 row)  
postgres=# explain select * from tb where info ~ '^376821' limit 10;  
                                  QUERY PLAN                                     
-------------------------------------------------------------------------------  
 Limit  (cost=0.43..0.52 rows=10 width=9)  
   ->  Index Only Scan using idx_tb on tb  (cost=0.43..8.46 rows=1000 width=9)  
         Index Cond: ((info >= '376821'::text) AND (info < '376822'::text))  
         Filter: (info ~ '^376821'::text)  
(4 rows)  
postgres=# select * from tb where info ~ '^376821' limit 10;  
   info     
----------  
 376821ab  
(1 row)  
Time: 0.536 ms  
postgres=# set enable_indexscan=off;  
SET  
Time: 1.344 ms  
postgres=# set enable_bitmapscan=off;  
SET  
Time: 0.158 ms  
postgres=# explain select * from tb where info ~ '^376821' limit 10;  
                           QUERY PLAN                             
----------------------------------------------------------------  
 Limit  (cost=0.00..1790.55 rows=10 width=9)  
   ->  Seq Scan on tb  (cost=0.00..179055.00 rows=1000 width=9)  
         Filter: (info ~ '^376821'::text)  
(3 rows)  
Time: 0.505 ms  

Without the index, the prefix query takes 5483 ms.
With the index, the prefix query takes only 0.5 ms.

postgres=# select * from tb where info ~ '^376821' limit 10;  
   info     
----------  
 376821ab  
(1 row)  
Time: 5483.655 ms  
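The Index Cond in the plan above shows the prefix match being rewritten as a range scan (info >= '376821' AND info < '376822'). The Python sketch below imitates that rewrite, with a sorted list and `bisect` standing in for the btree. The helper names are hypothetical, and the sketch assumes plain code-point ordering; in PostgreSQL the rewrite likewise depends on a byte-wise ordering such as the C locale or a pattern_ops operator class.

```python
import bisect


def prefix_range(prefix: str) -> tuple[str, str]:
    """Return (lo, hi): strings starting with prefix fall in lo <= s < hi
    under code-point ordering (assumes a non-empty prefix)."""
    return prefix, prefix[:-1] + chr(ord(prefix[-1]) + 1)


def prefix_search(sorted_keys: list[str], prefix: str) -> list[str]:
    """Range-scan a sorted list, the way a btree range scan would."""
    lo, hi = prefix_range(prefix)
    start = bisect.bisect_left(sorted_keys, lo)
    end = bisect.bisect_left(sorted_keys, hi)
    return sorted_keys[start:end]


keys = sorted(["376821ab", "376822cd", "376820ff", "a76821ab", "3768"])
print(prefix_range("376821"))
print(prefix_search(keys, "376821"))
```

Only the slice between the two bisect positions is visited, which is why the indexed query does not depend on the table size the way the sequential scan does.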

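2. Fuzzy query with a suffix, for example like '%ABC', which in PG can also be written as ~ 'ABC$'
This can be optimized with a btree index on the reverse() function, or by splitting the string into one column per character and combining the per-column indexes with bitmap AND / bitmap OR (only suitable for short fixed-length strings, for example char(8)).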

postgres=# create index idx_tb1 on tb(reverse(info));  
CREATE INDEX  
postgres=# explain select * from tb where reverse(info) ~ '^ba128' limit 10;  
                                         QUERY PLAN                                           
--------------------------------------------------------------------------------------------  
 Limit  (cost=0.43..28.19 rows=10 width=9)  
   ->  Index Scan using idx_tb1 on tb  (cost=0.43..138778.43 rows=50000 width=9)  
         Index Cond: ((reverse(info) >= 'ba128'::text) AND (reverse(info) < 'ba129'::text))  
         Filter: (reverse(info) ~ '^ba128'::text)  
(4 rows)  

postgres=# select * from tb where reverse(info) ~ '^ba128' limit 10;  
   info     
----------  
 220821ab  
 671821ab  
 305821ab  
 e65821ab  
 536821ab  
 376821ab  
 668821ab  
 4d8821ab  
 26c821ab  
(9 rows)  
Time: 0.506 ms  

With the index, the suffix query takes only 0.5 ms.
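The reverse() trick turns a suffix match into a prefix match on the reversed string. A minimal Python sketch of the idea follows, with a sorted list of reversed values standing in for the expression index; `suffix_search` is an illustrative name, and rebuilding the sorted list on every call is done only to keep the example short.

```python
import bisect


def suffix_search(values: list[str], suffix: str) -> list[str]:
    """Find values ending with suffix by range-scanning reversed keys."""
    # Stand-in for the btree index on reverse(info).
    reversed_keys = sorted(v[::-1] for v in values)
    prefix = suffix[::-1]                       # '821ab' -> 'ba128'
    hi = prefix[:-1] + chr(ord(prefix[-1]) + 1)  # 'ba129'
    start = bisect.bisect_left(reversed_keys, prefix)
    end = bisect.bisect_left(reversed_keys, hi)
    # Reverse the hits back to the original values.
    return sorted(k[::-1] for k in reversed_keys[start:end])


rows = ["220821ab", "376821ab", "5821a8a3", "e65821ab", "821ab000"]
print(suffix_search(rows, "821ab"))
```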

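3. Fuzzy query with neither a fixed prefix nor a fixed suffix, for example like '%AB_C%', which in PG can also be written as ~ 'AB.C'
This can be optimized with a pg_trgm GIN index, or by splitting the string into one column per character and combining the per-column indexes with bitmap AND / bitmap OR (only suitable for short fixed-length strings, for example char(8)).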

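postgres=# create extension pg_trgm;  
postgres=# create index idx_tb_2 on tb using gin (info gin_trgm_ops);  -- not shown in the original; inferred from the plan below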
postgres=# explain select * from tb where info ~ '5821a';  
                                 QUERY PLAN                                   
----------------------------------------------------------------------------  
 Bitmap Heap Scan on tb  (cost=103.75..3677.71 rows=1000 width=9)  
   Recheck Cond: (info ~ '5821a'::text)  
   ->  Bitmap Index Scan on idx_tb_2  (cost=0.00..103.50 rows=1000 width=0)  
         Index Cond: (info ~ '5821a'::text)  
(4 rows)  
Time: 0.647 ms  

postgres=# select * from tb where info ~ '5821a';  
   info     
----------  
 5821a8a3  
 945821af  
 45821a74  
 9fe5821a  
 5821a7e0  
 5821af2a  
 1075821a  
 e5821ac9  
 d265821a  
 45f5821a  
 df5821a4  
 de5821af  
 71c5821a  
 375821a3  
 fc5821af  
 5c5821ad  
 e65821ab  
 5821adde  
 c35821a6  
 5821a642  
 305821ab  
 5821a1c8  
 75821a5c  
 ce95821a  
 a65821ad  
(25 rows)  
Time: 3.808 ms  

With the index, the fuzzy query with wildcards on both sides takes only 3.8 ms.

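4. Regular-expression query, for example ~ '[\d]+def1.?[a|b|0|8]{1,3}'
This can be optimized with a pg_trgm GIN index, or by splitting the string into one column per character and combining the per-column indexes with bitmap AND / bitmap OR (only suitable for short fixed-length strings, for example char(8)).

With the index, the regular-expression query takes about 108 ms.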

postgres=# select * from tb where info ~ 'e65[\d]{2}a[b]{1,2}8' limit 10;  
   info     
----------  
 4e6567ab  
 1e6530ab  
 e6500ab8  
 ae6583ab  
 e6564ab7  
 5e6532ab  
 e6526abf  
 e6560ab6  
(8 rows)  
Time: 108.577 ms  

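Most of the time is spent eliminating false positives.
The index scan returned 14794 rows and the recheck removed 14793 of them. A lot of time goes into wasted work, but it is still much better than a full table scan.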

postgres=# explain (verbose,analyze,buffers,costs,timing) select * from tb where info ~ 'e65[\d]{2}a[b]{1,2}8' limit 10;  
                                                            QUERY PLAN                                                              
----------------------------------------------------------------------------------------------------------------------------------  
 Limit  (cost=511.75..547.49 rows=10 width=9) (actual time=89.934..120.567 rows=1 loops=1)  
   Output: info  
   Buffers: shared hit=13054  
   ->  Bitmap Heap Scan on public.tb  (cost=511.75..4085.71 rows=1000 width=9) (actual time=89.930..120.562 rows=1 loops=1)  
         Output: info  
         Recheck Cond: (tb.info ~ 'e65[\d]{2}a[b]{1,2}8'::text)  
         Rows Removed by Index Recheck: 14793  
         Heap Blocks: exact=12929  
         Buffers: shared hit=13054  
         ->  Bitmap Index Scan on idx_tb_2  (cost=0.00..511.50 rows=1000 width=0) (actual time=67.589..67.589 rows=14794 loops=1)  
               Index Cond: (tb.info ~ 'e65[\d]{2}a[b]{1,2}8'::text)  
               Buffers: shared hit=125  
 Planning time: 0.493 ms  
 Execution time: 120.618 ms  
(14 rows)  
Time: 124.693 ms  
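The "Rows Removed by Index Recheck" figure comes from a two-phase search: the trigram index yields candidates, and the regular expression is then re-evaluated on each candidate row. The Python sketch below is a heavily simplified illustration of that pattern. The function names are hypothetical, and the set of required trigrams is passed in by hand, whereas pg_trgm derives it from the regular expression automatically.

```python
import re


def trigrams(s: str) -> set[str]:
    padded = "  " + s + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def search(rows: list[str], pattern: str, required: set[str]):
    """Return (matches, removed_by_recheck).

    required: trigrams every match must contain (chosen by hand here).
    """
    # Phase 0: build a toy inverted index, trigram -> row ids.
    index: dict[str, set[int]] = {}
    for rid, value in enumerate(rows):
        for t in trigrams(value):
            index.setdefault(t, set()).add(rid)
    # Phase 1: AND the row-id sets of all required trigrams.
    candidates = set(range(len(rows)))
    for t in required:
        candidates &= index.get(t, set())
    # Phase 2: recheck each candidate against the real regex.
    regex = re.compile(pattern)
    matches = [rows[r] for r in sorted(candidates) if regex.search(rows[r])]
    return matches, len(candidates) - len(matches)


rows = ["e6500ab8", "4e6567ab", "e6564ab7", "12345678"]
print(search(rows, r"e65\d{2}ab{1,2}8", {"e65"}))
```

With only one selective trigram available, most candidates fail the recheck, which is the same effect as in the plan above.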

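Optimization:
With a GIN index, performance needs attention: the info column is broken into many 3-character tokens, so each row touches a large number of index entries. Under highly concurrent inserts it is best to set gin_pending_list_limit to a larger value, which improves insert throughput and reduces the rise in response time (RT) caused by merging into the index in real time.
With fastupdate enabled, the pending entries are merged into the GIN index automatically each time the table is vacuumed.
One more point: queries do not trigger a merge. GIN entries that have not yet been merged are searched by scanning them sequentially.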
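The interaction between inserts, the pending list, and queries can be pictured with a toy model. The class below is purely illustrative and not how PostgreSQL stores anything: `ToyGin` and its methods are invented names, and the limit counts entries, whereas gin_pending_list_limit is a size in kB.

```python
class ToyGin:
    def __init__(self, pending_limit: int):
        self.main: dict[str, set[int]] = {}   # merged index: token -> row ids
        self.pending: list[tuple[int, set[str]]] = []  # unmerged entries
        self.pending_limit = pending_limit
        self.merges = 0

    def insert(self, rid: int, tokens: set[str]) -> None:
        # Cheap path: append to the pending list instead of
        # updating one main-index entry per token.
        self.pending.append((rid, set(tokens)))
        if len(self.pending) > self.pending_limit:
            self.merge()

    def merge(self) -> None:
        # What a vacuum (or an overflowing pending list) triggers.
        for rid, tokens in self.pending:
            for t in tokens:
                self.main.setdefault(t, set()).add(rid)
        self.pending.clear()
        self.merges += 1

    def search(self, token: str) -> set[int]:
        # Queries never merge: they read the main index and then
        # scan the pending list sequentially.
        hits = set(self.main.get(token, set()))
        hits.update(rid for rid, toks in self.pending if token in toks)
        return hits


gin = ToyGin(pending_limit=2)
for rid, value in enumerate(["abc", "abd", "xbc", "abc"], start=1):
    gin.insert(rid, {value[:2], value[1:]})
print(gin.merges, len(gin.pending), sorted(gin.search("ab")))
```

A larger limit means fewer merges on the insert path but a longer sequential scan for every query, which is the trade-off that frequent autovacuum runs are meant to soften.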

Stress-testing performance under high concurrency:

create table tbl(id serial8, crt_time timestamp, sensorid int, sensorloc point, info text) with (autovacuum_enabled=on, autovacuum_vacuum_threshold=0.000001,autovacuum_vacuum_cost_delay=0);  
CREATE INDEX trgm_idx ON tbl USING GIN (info gin_trgm_ops) with (fastupdate='on', gin_pending_list_limit='6553600');  
alter sequence tbl_id_seq cache 10000;  

Change the configuration so that autovacuum runs frequently and merges the GIN pending entries quickly.

vi $PGDATA/postgresql.conf  
autovacuum_naptime=1s  
maintenance_work_mem=1GB  
autovacuum_work_mem=1GB  
autovacuum = on  
autovacuum_max_workers = 3  
log_autovacuum_min_duration = 0  
autovacuum_vacuum_cost_delay=0  

$ pg_ctl reload  

Create a test function that generates random test data.

postgres=# create or replace function f() returns void as $$  
  insert into tbl (crt_time,sensorid,info) values ( clock_timestamp(),trunc(random()*500000),substring(md5(random()::text),1,8) );  
$$ language sql strict;  

vi test.sql  
select f();  

pgbench -M prepared -n -r -P 1 -f ./test.sql -c 48 -j 48 -T 10000  

progress: 50.0 s, 52800.9 tps, lat 0.453 ms stddev 0.390  
progress: 51.0 s, 52775.8 tps, lat 0.453 ms stddev 0.398  
progress: 52.0 s, 53173.2 tps, lat 0.449 ms stddev 0.371  
progress: 53.0 s, 53010.0 tps, lat 0.451 ms stddev 0.390  
progress: 54.0 s, 53360.9 tps, lat 0.448 ms stddev 0.365  
progress: 55.0 s, 53285.0 tps, lat 0.449 ms stddev 0.362  
progress: 56.0 s, 53662.1 tps, lat 0.445 ms stddev 0.368  
progress: 57.0 s, 53283.8 tps, lat 0.448 ms stddev 0.385  
progress: 58.0 s, 53703.4 tps, lat 0.445 ms stddev 0.355  
progress: 59.0 s, 53818.7 tps, lat 0.444 ms stddev 0.344  
progress: 60.0 s, 53889.2 tps, lat 0.443 ms stddev 0.361  
progress: 61.0 s, 53613.8 tps, lat 0.446 ms stddev 0.355  
progress: 62.0 s, 53339.9 tps, lat 0.448 ms stddev 0.392  
progress: 63.0 s, 54014.9 tps, lat 0.442 ms stddev 0.346  
progress: 64.0 s, 53112.1 tps, lat 0.450 ms stddev 0.374  
progress: 65.0 s, 53706.1 tps, lat 0.445 ms stddev 0.367  
progress: 66.0 s, 53720.9 tps, lat 0.445 ms stddev 0.353  
progress: 67.0 s, 52858.1 tps, lat 0.452 ms stddev 0.415  
progress: 68.0 s, 53218.9 tps, lat 0.449 ms stddev 0.387  
progress: 69.0 s, 53403.0 tps, lat 0.447 ms stddev 0.377  
progress: 70.0 s, 53179.9 tps, lat 0.449 ms stddev 0.377  
progress: 71.0 s, 53232.4 tps, lat 0.449 ms stddev 0.373  
progress: 72.0 s, 53011.7 tps, lat 0.451 ms stddev 0.386  
progress: 73.0 s, 52685.1 tps, lat 0.454 ms stddev 0.384  
progress: 74.0 s, 52937.8 tps, lat 0.452 ms stddev 0.377  

At this rate (roughly 53,000 inserts per second), more than 4 billion rows can be loaded per day.
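The daily figure follows directly from the pgbench progress lines. As a quick check, assuming a sustained rate of about 53,000 tps (an approximation read off the output above):

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Approximate steady-state rate taken from the pgbench progress lines.
tps = 53_000

rows_per_day = tps * SECONDS_PER_DAY
print(f"{rows_per_day:,} rows per day")
```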

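Next, compare this with a string-splitting approach. It applies only when the strings are short and of fixed length; if the length varies, the method is useless.
The test results for the split approach are unsatisfactory, so pg_trgm remains the more reliable choice.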

postgres=# create table t_split(id int, crt_time timestamp, sensorid int, sensorloc point, info text, c1 char(1), c2 char(1), c3 char(1), c4 char(1), c5 char(1), c6 char(1), c7 char(1), c8 char(1));  
CREATE TABLE  
Time: 2.123 ms  

postgres=# insert into t_split(id,crt_time,sensorid,info,c1,c2,c3,c4,c5,c6,c7,c8) select id,ct,sen,info,substring(info,1,1),substring(info,2,1),substring(info,3,1),substring(info,4,1),substring(info,5,1),substring(info,6,1),substring(info,7,1),substring(info,8,1) from (select id, clock_timestamp() ct, trunc(random()*500000) sen, substring(md5(random()::text), 1, 8) info from generate_series(1,10000000) t(id)) t;  
INSERT 0 10000000  
Time: 81829.274 ms  

postgres=# create index idx1 on t_split (c1);  
postgres=# create index idx2 on t_split (c2);  
postgres=# create index idx3 on t_split (c3);  
postgres=# create index idx4 on t_split (c4);  
postgres=# create index idx5 on t_split (c5);  
postgres=# create index idx6 on t_split (c6);  
postgres=# create index idx7 on t_split (c7);  
postgres=# create index idx8 on t_split (c8);  
postgres=# create index idx9 on t_split using gin (info gin_trgm_ops);  

postgres=# select * from t_split limit 1;  
 id |          crt_time          | sensorid | sensorloc |   info   | c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8   
----+----------------------------+----------+-----------+----------+----+----+----+----+----+----+----+----  
  1 | 2016-03-02 09:58:03.990639 |   161958 |           | 33eed779 | 3  | 3  | e  | e  | d  | 7  | 7  | 9
(1 row)  

postgres=# select * from t_split where info ~ '^3[\d]?eed[\d]?79$' limit 10;  
 id |          crt_time          | sensorid | sensorloc |   info   | c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8   
----+----------------------------+----------+-----------+----------+----+----+----+----+----+----+----+----  
  1 | 2016-03-02 09:58:03.990639 |   161958 |           | 33eed779 | 3  | 3  | e  | e  | d  | 7  | 7  | 9
(1 row)  
Time: 133.041 ms  
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from t_split where info ~ '^3[\d]?eed[\d]?79$' limit 10;  
                                                            QUERY PLAN                                                              
----------------------------------------------------------------------------------------------------------------------------------  
 Limit  (cost=575.75..612.78 rows=10 width=57) (actual time=92.406..129.838 rows=1 loops=1)  
   Output: id, crt_time, sensorid, sensorloc, info, c1, c2, c3, c4, c5, c6, c7, c8  
   Buffers: shared hit=13798  
   ->  Bitmap Heap Scan on public.t_split  (cost=575.75..4278.56 rows=1000 width=57) (actual time=92.403..129.833 rows=1 loops=1)  
         Output: id, crt_time, sensorid, sensorloc, info, c1, c2, c3, c4, c5, c6, c7, c8  
         Recheck Cond: (t_split.info ~ '^3[\d]?eed[\d]?79$'::text)  
         Rows Removed by Index Recheck: 14690  
         Heap Blocks: exact=13669  
         Buffers: shared hit=13798  
         ->  Bitmap Index Scan on idx9  (cost=0.00..575.50 rows=1000 width=0) (actual time=89.576..89.576 rows=14691 loops=1)  
               Index Cond: (t_split.info ~ '^3[\d]?eed[\d]?79$'::text)  
               Buffers: shared hit=129  
 Planning time: 0.385 ms  
 Execution time: 129.883 ms  
(14 rows)  

Time: 130.678 ms  


postgres=# select * from t_split where c1='3' and c3='e' and c4='e' and c5='d' and c7='7' and c8='9' and c2 between '0' and '9' and c6 between '0' and '9' limit 10;  
 id |          crt_time          | sensorid | sensorloc |   info   | c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8   
----+----------------------------+----------+-----------+----------+----+----+----+----+----+----+----+----  
  1 | 2016-03-02 09:58:03.990639 |   161958 |           | 33eed779 | 3  | 3  | e  | e  | d  | 7  | 7  | 9
(1 row)  

Time: 337.367 ms  

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from t_split where c1='3' and c3='e' and c4='e' and c5='d' and c7='7' and c8='9' and c2 between '0' and '9' and c6 between '0' and '9' limit 10;  
                                                                                                                 QUERY PLAN                                                                                                                   
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
 Limit  (cost=33582.31..41499.35 rows=1 width=57) (actual time=339.230..344.675 rows=1 loops=1)  
   Output: id, crt_time, sensorid, sensorloc, info, c1, c2, c3, c4, c5, c6, c7, c8  
   Buffers: shared hit=7581  
   ->  Bitmap Heap Scan on public.t_split  (cost=33582.31..41499.35 rows=1 width=57) (actual time=339.228..344.673 rows=1 loops=1)  
         Output: id, crt_time, sensorid, sensorloc, info, c1, c2, c3, c4, c5, c6, c7, c8  
         Recheck Cond: ((t_split.c3 = 'e'::bpchar) AND (t_split.c8 = '9'::bpchar) AND (t_split.c5 = 'd'::bpchar))  
         Filter: ((t_split.c2 >= '0'::bpchar) AND (t_split.c2 <= '9'::bpchar) AND (t_split.c6 >= '0'::bpchar) AND (t_split.c6 <= '9'::bpchar) AND (t_split.c1 = '3'::bpchar) AND (t_split.c4 = 'e'::bpchar) AND (t_split.c7 = '7'::bpchar))  
         Rows Removed by Filter: 2480  
         Heap Blocks: exact=2450  
         Buffers: shared hit=7581  
         ->  BitmapAnd  (cost=33582.31..33582.31 rows=2224 width=0) (actual time=338.512..338.512 rows=0 loops=1)  
               Buffers: shared hit=5131  
               ->  Bitmap Index Scan on idx3  (cost=0.00..11016.93 rows=596333 width=0) (actual time=104.418..104.418 rows=624930 loops=1)  
                     Index Cond: (t_split.c3 = 'e'::bpchar)  
                     Buffers: shared hit=1711  
               ->  Bitmap Index Scan on idx8  (cost=0.00..11245.44 rows=608667 width=0) (actual time=100.185..100.185 rows=625739 loops=1)  
                     Index Cond: (t_split.c8 = '9'::bpchar)  
                     Buffers: shared hit=1712  
               ->  Bitmap Index Scan on idx5  (cost=0.00..11319.44 rows=612667 width=0) (actual time=99.480..99.480 rows=624269 loops=1)  
                     Index Cond: (t_split.c5 = 'd'::bpchar)  
                     Buffers: shared hit=1708  
 Planning time: 0.262 ms  
 Execution time: 344.731 ms  
(23 rows)  

Time: 346.424 ms  
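The split-column query above works by intersecting per-position indexes, which is what the BitmapAnd node in the plan does. A small Python sketch of that idea, using sets of row ids in place of bitmaps, is shown below. The function names and the 0-based position convention are choices made for this illustration only.

```python
def build_column_indexes(rows: list[str], width: int = 8):
    """One index per character position: char -> set of row ids."""
    indexes: list[dict[str, set[int]]] = [dict() for _ in range(width)]
    for rid, value in enumerate(rows):
        for pos, ch in enumerate(value[:width]):
            indexes[pos].setdefault(ch, set()).add(rid)
    return indexes


def bitmap_and(indexes, conditions: dict[int, set[str]]) -> set[int]:
    """conditions maps a 0-based position to the characters allowed there."""
    result = None
    for pos, allowed in conditions.items():
        ids: set[int] = set()
        for ch in allowed:                      # OR within one position
            ids |= indexes[pos].get(ch, set())
        result = ids if result is None else result & ids  # AND across positions
    return result if result is not None else set()


rows = ["33eed779", "3fed9479", "33f07429", "a3eed779"]
digits = set("0123456789")
# Fixed-length equivalent of '^3[\d]?eed[\d]?79$' for 8-character values.
conditions = {0: {"3"}, 1: digits, 2: {"e"}, 3: {"e"}, 4: {"d"},
              5: digits, 6: {"7"}, 7: {"9"}}
indexes = build_column_indexes(rows)
print(bitmap_and(indexes, conditions))
```

Each single-character condition matches a large share of the rows (about 1 in 16 for hex digits), so the individual index scans are expensive even though their intersection is tiny, consistent with the timings measured above.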

postgres=# select * from t_split where info ~ '^33.+7.+9$' limit 10;  
   id   |          crt_time          | sensorid | sensorloc |   info   | c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8   
--------+----------------------------+----------+-----------+----------+----+----+----+----+----+----+----+----  
      1 | 2016-03-02 09:58:03.990639 |   161958 |           | 33eed779 | 3  | 3  | e  | e  | d  | 7  | 7  | 9
  24412 | 2016-03-02 09:58:04.186359 |   251599 |           | 33f07429 | 3  | 3  | f  | 0  | 7  | 4  | 2  | 9
  24989 | 2016-03-02 09:58:04.191112 |   214569 |           | 334587d9 | 3  | 3  | 4  | 5  | 8  | 7  | d  | 9
  50100 | 2016-03-02 09:58:04.398499 |   409819 |           | 33beb7b9 | 3  | 3  | b  | e  | b  | 7  | b  | 9
  92623 | 2016-03-02 09:58:04.745372 |   280100 |           | 3373e719 | 3  | 3  | 7  | 3  | e  | 7  | 1  | 9
 106054 | 2016-03-02 09:58:04.855627 |   155192 |           | 33c575c9 | 3  | 3  | c  | 5  | 7  | 5  | c  | 9
 107070 | 2016-03-02 09:58:04.863827 |   464325 |           | 337dd729 | 3  | 3  | 7  | d  | d  | 7  | 2  | 9
 135152 | 2016-03-02 09:58:05.088217 |   240500 |           | 336271d9 | 3  | 3  | 6  | 2  | 7  | 1  | d  | 9
 156425 | 2016-03-02 09:58:05.25805  |   218202 |           | 333e7289 | 3  | 3  | 3  | e  | 7  | 2  | 8  | 9
 170210 | 2016-03-02 09:58:05.368371 |   132530 |           | 33a8d789 | 3  | 3  | a  | 8  | d  | 7  | 8  | 9
(10 rows)  

Time: 20.431 ms  

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from t_split where info ~ '^33.+7.+9$' limit 10;  
                                                           QUERY PLAN                                                              
---------------------------------------------------------------------------------------------------------------------------------  
 Limit  (cost=43.75..80.78 rows=10 width=57) (actual time=19.573..21.212 rows=10 loops=1)  
   Output: id, crt_time, sensorid, sensorloc, info, c1, c2, c3, c4, c5, c6, c7, c8  
   Buffers: shared hit=566  
   ->  Bitmap Heap Scan on public.t_split  (cost=43.75..3746.56 rows=1000 width=57) (actual time=19.571..21.206 rows=10 loops=1)  
         Output: id, crt_time, sensorid, sensorloc, info, c1, c2, c3, c4, c5, c6, c7, c8  
         Recheck Cond: (t_split.info ~ '^33.+7.+9$'::text)  
         Rows Removed by Index Recheck: 647  
         Heap Blocks: exact=552  
         Buffers: shared hit=566  
         ->  Bitmap Index Scan on idx9  (cost=0.00..43.50 rows=1000 width=0) (actual time=11.712..11.712 rows=39436 loops=1)  
               Index Cond: (t_split.info ~ '^33.+7.+9$'::text)  
               Buffers: shared hit=14  
 Planning time: 0.301 ms  
 Execution time: 21.255 ms  
(14 rows)  

Time: 21.995 ms  


postgres=# select * from t_split where c1='3' and c2='3' and c8='9' and (c4='7' or c5='7' or c6='7') limit 10;  
   id   |          crt_time          | sensorid | sensorloc |   info   | c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8   
--------+----------------------------+----------+-----------+----------+----+----+----+----+----+----+----+----  
      1 | 2016-03-02 09:58:03.990639 |   161958 |           | 33eed779 | 3  | 3  | e  | e  | d  | 7  | 7  | 9
  24412 | 2016-03-02 09:58:04.186359 |   251599 |           | 33f07429 | 3  | 3  | f  | 0  | 7  | 4  | 2  | 9
  24989 | 2016-03-02 09:58:04.191112 |   214569 |           | 334587d9 | 3  | 3  | 4  | 5  | 8  | 7  | d  | 9
  50100 | 2016-03-02 09:58:04.398499 |   409819 |           | 33beb7b9 | 3  | 3  | b  | e  | b  | 7  | b  | 9
  92623 | 2016-03-02 09:58:04.745372 |   280100 |           | 3373e719 | 3  | 3  | 7  | 3  | e  | 7  | 1  | 9
 106054 | 2016-03-02 09:58:04.855627 |   155192 |           | 33c575c9 | 3  | 3  | c  | 5  | 7  | 5  | c  | 9
 107070 | 2016-03-02 09:58:04.863827 |   464325 |           | 337dd729 | 3  | 3  | 7  | d  | d  | 7  | 2  | 9
 135152 | 2016-03-02 09:58:05.088217 |   240500 |           | 336271d9 | 3  | 3  | 6  | 2  | 7  | 1  | d  | 9
 156425 | 2016-03-02 09:58:05.25805  |   218202 |           | 333e7289 | 3  | 3  | 3  | e  | 7  | 2  | 8  | 9
 170210 | 2016-03-02 09:58:05.368371 |   132530 |           | 33a8d789 | 3  | 3  | a  | 8  | d  | 7  | 8  | 9
(10 rows)  

Time: 37.739 ms  

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from t_split where c1='3' and c2='3' and c8='9' and (c4='7' or c5='7' or c6='7') limit 10;  
                                                                                               QUERY PLAN                                                                                                  
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
 Limit  (cost=0.00..8135.78 rows=10 width=57) (actual time=0.017..35.532 rows=10 loops=1)  
   Output: id, crt_time, sensorid, sensorloc, info, c1, c2, c3, c4, c5, c6, c7, c8  
   Buffers: shared hit=1755  
   ->  Seq Scan on public.t_split  (cost=0.00..353093.00 rows=434 width=57) (actual time=0.015..35.526 rows=10 loops=1)  
         Output: id, crt_time, sensorid, sensorloc, info, c1, c2, c3, c4, c5, c6, c7, c8  
         Filter: ((t_split.c1 = '3'::bpchar) AND (t_split.c2 = '3'::bpchar) AND (t_split.c8 = '9'::bpchar) AND ((t_split.c4 = '7'::bpchar) OR (t_split.c5 = '7'::bpchar) OR (t_split.c6 = '7'::bpchar)))  
         Rows Removed by Filter: 170200  
         Buffers: shared hit=1755  
 Planning time: 0.210 ms  
 Execution time: 35.572 ms  
(10 rows)  

Time: 36.260 ms  

postgres=# select * from t_split where info ~ '^3.?[b-g]+ed[\d]+79' order by info <-> '^3.?[b-g]+ed[\d]+79' limit 10;  
   id    |          crt_time          | sensorid | sensorloc |   info   | c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8   
---------+----------------------------+----------+-----------+----------+----+----+----+----+----+----+----+----  
       1 | 2016-03-02 09:58:03.990639 |   161958 |           | 33eed779 | 3  | 3  | e  | e  | d  | 7  | 7  | 9
 1308724 | 2016-03-02 09:58:14.590901 |   458822 |           | 3fed9479 | 3  | f  | e  | d  | 9  | 4  | 7  | 9
 2866024 | 2016-03-02 09:58:27.20105  |   106467 |           | 3fed2279 | 3  | f  | e  | d  | 2  | 2  | 7  | 9
 4826729 | 2016-03-02 09:58:42.907431 |   228023 |           | 3ded9879 | 3  | d  | e  | d  | 9  | 8  | 7  | 9
 6113373 | 2016-03-02 09:58:53.211146 |   499702 |           | 36fed479 | 3  | 6  | f  | e  | d  | 4  | 7  | 9
 1768237 | 2016-03-02 09:58:18.310069 |   345027 |           | 30fed079 | 3  | 0  | f  | e  | d  | 0  | 7  | 9
 1472324 | 2016-03-02 09:58:15.913629 |   413283 |           | 3eed5798 | 3  | e  | e  | d  | 5  | 7  | 9  | 8
 8319056 | 2016-03-02 09:59:10.902137 |   336740 |           | 3ded7790 | 3  | d  | e  | d  | 7  | 7  | 9  | 0
 8576573 | 2016-03-02 09:59:12.962923 |   130223 |           | 3eed5793 | 3  | e  | e  | d  | 5  | 7  | 9  | 3
(9 rows)  

Time: 268.661 ms  

postgres=# explain (analyze,verbose,timing,buffers,costs) select * from t_split where info ~ '^3.?[b-g]+ed[\d]+79' order by info <-> '^3.?[b-g]+ed[\d]+79' limit 10;  
                                                               QUERY PLAN                                                                  
-----------------------------------------------------------------------------------------------------------------------------------------  
 Limit  (cost=4302.66..4302.69 rows=10 width=57) (actual time=269.214..269.217 rows=9 loops=1)  
   Output: id, crt_time, sensorid, sensorloc, info, c1, c2, c3, c4, c5, c6, c7, c8, ((info <-> '^3.?[b-g]+ed[\d]+79'::text))  
   Buffers: shared hit=52606  
   ->  Sort  (cost=4302.66..4305.16 rows=1000 width=57) (actual time=269.212..269.212 rows=9 loops=1)  
         Output: id, crt_time, sensorid, sensorloc, info, c1, c2, c3, c4, c5, c6, c7, c8, ((info <-> '^3.?[b-g]+ed[\d]+79'::text))  
         Sort Key: ((t_split.info <-> '^3.?[b-g]+ed[\d]+79'::text))  
         Sort Method: quicksort  Memory: 26kB  
         Buffers: shared hit=52606  
         ->  Bitmap Heap Scan on public.t_split  (cost=575.75..4281.06 rows=1000 width=57) (actual time=100.771..269.180 rows=9 loo