PostgreSQL 生成空間熱力圖

PostgreSQL · 發表 2018-10-05 17:41:15

摘要：標籤 PostgreSQL , 熱力圖 , 空間切割 , 平行計算 , parallel safe 背景結合空間資料，計算基於地理位置資訊的熱力圖，在空間資料視覺化場景中是一個非常常見的需求。結合流計算，可以實現實時的熱力圖計算。結合平行計算，可以高效率的...

背景

結合空間資料，計算基於地理位置資訊的熱力圖，在空間資料視覺化場景中是一個非常常見的需求。

結合流計算，可以實現實時的熱力圖計算。

結合平行計算，可以高效率的對海量資料進行熱力圖計算。

計算熱力圖中bucket的方法

ofollow,noindex" target="_blank">https://www.postgresql.org/docs/devel/static/functions-math.html

width_bucket(operand dp, b1 dp, b2 dp, count int)
int
return the bucket number to which operand would be assigned in a histogram having count equal-width buckets spanning the range b1 to b2;
returns 0 or count+1 for an input outside the range
width_bucket(5.35, 0.024, 10.06, 5)
3

width_bucket(operand numeric, b1 numeric, b2 numeric, count int)
int
return the bucket number to which operand would be assigned in a histogram having count equal-width buckets spanning the range b1 to b2;
returns 0 or count+1 for an input outside the range
width_bucket(5.35, 0.024, 10.06, 5)
3

例如

postgres=# select width_bucket(1,1,10,10);
 width_bucket
--------------
1
(1 row)

postgres=# select width_bucket(0,1,10,10);
 width_bucket
--------------
0
(1 row)

postgres=# select width_bucket(10,1,10,10);
 width_bucket
--------------
11
(1 row)

postgres=# select width_bucket(9.9,1,10,10);
 width_bucket
--------------
10
(1 row)

width_bucket(
p1 -- 輸入值
p2 -- 邊界值（最小，包含）
p3 -- 邊界值（最大，不包含）
p4 -- 切割份數
)

當小於最小邊界值時，返回0
當大於等於最大邊界值時，返回p4+1

例如x軸的邊界是1,10000，y軸的邊界是1,10000。

x,y兩個方向分別切割為50個bucket，一共2500個bucket，求一個點落在哪個bucket：

width_bucket(pos[0], 1, 10001, 50),-- x軸落在哪列bucket
width_bucket(pos[1], 1, 10001, 50),-- y軸落在哪列bucket

例子

1、建表

create table tbl_pos(
id int,
info text,-- 資訊
val float8,-- 取值
pos point-- 位置
);

2、寫入1億個點

vi test.sql
insert into tbl_pos values ( random()*100000, md5(random()::text), random()*1000, point((random()*10000::int), (random()*10000::int)) );



pgbench -M prepared -n -r -P 1 -f ./test.sql -c 50 -j 50 -t 2000000

3、熱力圖計算

強制平行計算

postgres=# set min_parallel_table_scan_size =0;
SET
postgres=# set min_parallel_index_scan_size =0;
SET
postgres=# set parallel_setup_cost =0;
SET
postgres=# set parallel_tuple_cost =0;
SET
postgres=# set max_parallel_workers_per_gather =28;
SET
postgres=# alter table tbl_pos set (parallel_workers =28);
ALTER TABLE

熱力圖計算SQL，效率還不錯：

select
width_bucket(pos[0], 0, 10001, 50),-- x軸落在哪列bucket
width_bucket(pos[1], 0, 10001, 50),-- y軸落在哪列bucket
avg(val),
min(val),
max(val),
stddev(val),
count(*)
from tbl_pos
group by 1,2;


QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize GroupAggregate(cost=1252812.00..1252928.00 rows=200 width=48) (actual time=2632.324..2672.909 rows=2500 loops=1)
Group Key: (width_bucket(pos[0], '0'::double precision, '10001'::double precision, 50)), (width_bucket(pos[1], '0'::double precision, '10001'::double precision, 50))
->Sort(cost=1252812.00..1252826.00 rows=5600 width=96) (actual time=2632.290..2648.544 rows=72500 loops=1)
Sort Key: (width_bucket(pos[0], '0'::double precision, '10001'::double precision, 50)), (width_bucket(pos[1], '0'::double precision, '10001'::double precision, 50))
Sort Method: external mergeDisk: 9824kB
->Gather(cost=1252460.37..1252463.37 rows=5600 width=96) (actual time=2532.132..2564.905 rows=72500 loops=1)
Workers Planned: 28
Workers Launched: 28
->Partial HashAggregate(cost=1252460.37..1252463.37 rows=200 width=96) (actual time=2522.428..2523.559 rows=2500 loops=29)
Group Key: width_bucket(pos[0], '0'::double precision, '10001'::double precision, 50), width_bucket(pos[1], '0'::double precision, '10001'::double precision, 50)
->Parallel Seq Scan on tbl_pos(cost=0.00..1189951.79 rows=3571919 width=16) (actual time=0.030..1302.462 rows=3448276 loops=29)
 Planning time: 0.154 ms
 Execution time: 2676.288 ms
(13 rows)

樣本

postgres=# select
width_bucket(pos[0], 0, 10001, 10),-- x軸落在哪列bucket
width_bucket(pos[1], 0, 10001, 10),-- y軸落在哪列bucket
avg(val),
min(val),
max(val),
stddev(val),
count(*)
from tbl_pos
group by 1,2;
 width_bucket | width_bucket |avg|min|max|stddev|count
--------------+--------------+------------------+----------------------+------------------+------------------+---------
1 |1 | 499.638668709335 | 0.000637955963611603 | 999.998900108039 | 288.562996477433 | 1002686
1 |2 | 499.772206697849 |0.00113388523459435 | 999.999452847987 | 288.505295714968 | 1000891
1 |3 |500.44455454312 |0.00135181471705437 | 999.997937120497 |288.45102360668 |999911
1 |4 | 500.234164866407 |0.00214902684092522 | 999.999100342393 | 288.707167816157 | 1000473
1 |5 | 499.793710464008 | 0.000125262886285782 | 999.999575316906 | 288.672382834812 |999036
1 |6 | 500.366854944369 |0.00212574377655983 | 999.999585561454 | 288.558891852102 |998866
1 |7 | 499.825623783545 | 0.000547617673873901 | 999.999700114131 | 288.582317248892 | 1000902
1 |8 | 499.393569281915 |0.00330200418829918 | 999.999083112925 | 288.561094278074 | 1000193
1 |9 | 499.713056248083 |0.00243959948420525 | 999.999618623406 | 288.709997455837 | 1000017
1 |10 | 500.312448499828 |0.00238511711359024 | 999.999850522727 | 288.865560266629 |998469
2 |1 | 499.848655048635 |0.00146497040987015 | 999.999508261681 | 288.639402346948 | 1000917
2 |2 | 500.084846394446 |0.0005294568836689 | 999.999178107828 | 288.704696698903 |997594
2 |3 |499.99258346144 |0.00163912773132324 |999.99839020893 | 288.507497234907 | 1001310
2 |4 | 499.817295558208 |0.00184541568160057 | 999.997940845788 | 288.767308817191 | 1000607
2 |5 |499.87314410326 |0.00135786831378937 | 999.999302905053 | 288.593077096809 |998588
2 |6 | 499.825467223571 | 0.000847037881612778 | 999.998526647687 | 288.789326889728 | 1000426
2 |7 |499.50907809986 |7.4971467256546e-05 |999.9989871867 | 288.535982009648 | 1001179
2 |8 | 499.850422744194 | 0.000966247171163559 | 999.999921303242 | 288.516738657089 | 1000745
2 |9 | 500.110417044655 | 0.000320374965667725 | 999.999660998583 |288.77420504779 |999978
2 |10 | 500.135548004555 | 0.000233296304941177 | 999.999852851033 | 288.520964728395 |998363
........

取出資料，即可渲染。

結合流計算，可以在FEED資料寫入時，實時的進行計算。而不是QUERY發起時計算。參考本文末尾的文件。

小結

PostgreSQL非常適合於時空資料的分析，包括本文提到的熱力圖分析。

使用平行計算，即查即算，1億個點，差不多耗時2.7秒。

如果使用流式計算，寫入時即算，查詢時查的是結果，效率更高。

參考

1、求bucket值

https://www.postgresql.org/docs/devel/static/functions-math.html

width_bucket(operand dp, b1 dp, b2 dp, count int)
int
return the bucket number to which operand would be assigned in a histogram having count equal-width buckets spanning the range b1 to b2;
returns 0 or count+1 for an input outside the range
width_bucket(5.35, 0.024, 10.06, 5)
3

width_bucket(operand numeric, b1 numeric, b2 numeric, count int)
int
return the bucket number to which operand would be assigned in a histogram having count equal-width buckets spanning the range b1 to b2;
returns 0 or count+1 for an input outside the range
width_bucket(5.35, 0.024, 10.06, 5)
3

2、求geometry物件的x,y,z值

http://postgis.net/docs/manual-2.4/reference.html

ST_X — Return the X coordinate of the point, or NULL if not available. Input must be a point.
ST_XMax — Returns X maxima of a bounding box 2d or 3d or a geometry.
ST_XMin — Returns X minima of a bounding box 2d or 3d or a geometry.
ST_Y — Return the Y coordinate of the point, or NULL if not available. Input must be a point.
ST_YMax — Returns Y maxima of a bounding box 2d or 3d or a geometry.
ST_YMin — Returns Y minima of a bounding box 2d or 3d or a geometry.
ST_Z — Return the Z coordinate of the point, or NULL if not available. Input must be a point.
ST_ZMax — Returns Z minima of a bounding box 2d or 3d or a geometry.
ST_Zmflag — Returns ZM (dimension semantic) flag of the geometries as a small int. Values are: 0=2d, 1=3dm, 2=3dz, 3=4d.
ST_ZMin — Returns Z minima of a bounding box 2d or 3d or a geometry.

3、求point物件的x,y值