Performance Comparison of YDB and Spark SQL on Ten Billion Rows
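Sorting in reverse chronological order is arguably a hard requirement for many log systems. In the Yanyun YDB system we replaced the traditional brute-force sort with an index-based technique that sorts on a single column very quickly, without a brute-force full-table scan. We call this technique blockSort; it currently supports four data types: tlong, tdouble, tint, and tfloat.
Because blockSort is built on the search index, a blockSort query does not need a brute-force scan, which improves performance substantially.
blockSort does not rely on precomputation: it can sort the entire table, or sort only the rows that match an arbitrary filter condition.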
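The article does not describe blockSort's internals, so the following is only a minimal sketch of the general idea that a sorted index lets a top-N query stop early instead of sorting every row. A `TreeMap` stands in for the index; the class and method names are hypothetical, and the sketch assumes unique tradetime values.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.IntPredicate;

// Illustrative only: a TreeMap (tradetime -> amtint) stands in for a sorted index.
// It assumes tradetime values are unique, which the real data does not guarantee.
final class IndexTopNSketch {

    // Walk the sorted index from one end and stop as soon as n rows pass the filter.
    static List<Long> topNFromIndex(TreeMap<Long, Integer> index, boolean desc,
                                    int n, IntPredicate amtintFilter) {
        List<Long> result = new ArrayList<>();
        Map<Long, Integer> ordered = desc ? index.descendingMap() : index;
        for (Map.Entry<Long, Integer> entry : ordered.entrySet()) {
            if (result.size() >= n) {
                break; // early exit: the remaining rows are never visited
            }
            if (amtintFilter.test(entry.getValue())) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    // Brute-force baseline: filter every row, sort all survivors, keep the first n.
    static List<Long> topNBruteForce(Map<Long, Integer> rows, boolean desc,
                                     int n, IntPredicate amtintFilter) {
        List<Long> matches = new ArrayList<>();
        for (Map.Entry<Long, Integer> entry : rows.entrySet()) {
            if (amtintFilter.test(entry.getValue())) {
                matches.add(entry.getKey());
            }
        }
        Comparator<Long> order = desc ? Comparator.<Long>reverseOrder()
                                      : Comparator.<Long>naturalOrder();
        matches.sort(order);
        return new ArrayList<>(matches.subList(0, Math.min(n, matches.size())));
    }

    public static void main(String[] args) {
        TreeMap<Long, Integer> index = new TreeMap<>();
        index.put(20150101120000L, 50);
        index.put(20150302080000L, 150);
        index.put(20150410093000L, 950);
        index.put(20150520180000L, 300);
        index.put(20150601000000L, 700);

        // Same shape as the article's filter: amtint between 100 and 900.
        IntPredicate filter = amtint -> amtint >= 100 && amtint <= 900;
        System.out.println(topNFromIndex(index, true, 2, filter));
        System.out.println(topNBruteForce(index, true, 2, filter));
    }
}
```

Both methods return the same rows; the difference is that the index walk touches only as many entries as it needs, while the baseline must examine and sort every match.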
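To evaluate this, we ran a comparative test of sorting performance between Spark SQL and YDB.
Machine configuration
The cluster consists of virtual machines: 1 master and 4 slaves.
The slave machines are configured as follows:
The 4 slaves are virtual machines running on two physical machines, each with 24 cores and 128 GB of memory.
Note that the disks are SSDs, not ordinary hard disks.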
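Test data (10 billion rows, two columns)
tradetime: tlong type
The sort field under test. It is high-cardinality (almost no duplicate values): a random time in yyyyMMddHHmmss format, generated with new Date(System.currentTimeMillis()-(long)(Math.random()*10000000000000L)).
amtint: an integer between 0 and 1000, used to test filtering combined with sorting; it is not itself used for sorting.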
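The source of the generator class used later in the article is not shown, so the following is only a sketch of how one row might be produced from the expression above. The class name, the method name, and the choice to include 1000 in the amtint range are assumptions; the comma separator matches the `fields terminated by ','` clause in the table definitions.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Random;

// Hypothetical sketch of one generated row: "tradetime,amtint".
final class BlockSortRowSketch {

    static String makeRow(long nowMillis, Random rnd) {
        // Random offset of up to 10^13 ms (roughly 317 years) into the past,
        // mirroring the expression quoted in the article.
        long offsetMillis = (long) (rnd.nextDouble() * 10000000000000L);
        SimpleDateFormat format = new SimpleDateFormat("yyyyMMddHHmmss");
        String tradetime = format.format(new Date(nowMillis - offsetMillis));

        // Assumption: amtint is uniform over 0..1000 inclusive.
        int amtint = rnd.nextInt(1001);
        return tradetime + "," + amtint;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        long now = System.currentTimeMillis();
        for (int i = 0; i < 5; i++) {
            System.out.println(makeRow(now, rnd));
        }
    }
}
```

With amtint spread evenly over this range, a filter such as 100 TO 900 matches about 80% of the rows, which is how the filter ranges in the tests map to the stated percentages.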
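Test results (times in seconds)
| amtint filter | Rows after filter | Sort order | YDB blockSort | Spark |
|---|---|---|---|---|
| No filter | 10 billion | Descending | 3.3 | 1118 |
| No filter | 10 billion | Ascending | 3.6 | 1085 |
| 100 TO 900 | 8 billion | Descending | 1.5 | 1093 |
| 100 TO 900 | 8 billion | Ascending | 1.3 | 1070 |
| 100 TO 600 | 5 billion | Descending | 1.53 | 1104 |
| 100 TO 600 | 5 billion | Ascending | 1.38 | 867 |
| 100 TO 200 | 1 billion | Descending | 7.00 | 1115 |
| 100 TO 200 | 1 billion | Ascending | 1.11 | 1131 |
| 100 TO 110 | 100 million | Descending | 2.1 | 1160 |
| 100 TO 110 | 100 million | Ascending | 3.44 | 1114 |
| 100 TO 101 | 10 million | Descending | 10.67 | 1089 |
| 100 TO 101 | 10 million | Ascending | 7.0 | 1110 |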
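To make the gap easier to read, the timings can be turned into speedup ratios (Spark seconds divided by YDB seconds). The sketch below is simple arithmetic on the measured values above; the class and method names are hypothetical.

```java
// Speedup = Spark time / YDB blockSort time, using the timings from the results table.
final class SpeedupSketch {

    static double speedup(double ydbSeconds, double sparkSeconds) {
        return sparkSeconds / ydbSeconds;
    }

    public static void main(String[] args) {
        String[] labels = {
            "no filter desc", "no filter asc",
            "100 TO 900 desc", "100 TO 900 asc",
            "100 TO 600 desc", "100 TO 600 asc",
            "100 TO 200 desc", "100 TO 200 asc",
            "100 TO 110 desc", "100 TO 110 asc",
            "100 TO 101 desc", "100 TO 101 asc"
        };
        double[] ydb = {3.3, 3.6, 1.5, 1.3, 1.53, 1.38, 7.00, 1.11, 2.1, 3.44, 10.67, 7.0};
        double[] spark = {1118, 1085, 1093, 1070, 1104, 867, 1115, 1131, 1160, 1114, 1089, 1110};

        for (int i = 0; i < labels.length; i++) {
            System.out.printf("%-16s %6.0fx%n", labels[i], speedup(ydb[i], spark[i]));
        }
    }
}
```

On these numbers the ratio ranges from roughly 100x (100 TO 101, descending) to roughly 1000x (100 TO 200, ascending).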
Test procedure
1. Generate synthetic data
hadoop fs -mkdir -p /data/example/demo/blocksort_time/
hadoop fs -ls /data/example/demo/blocksort_time/
hadoop fs -rm -r /data/example/demo/blocksort_time/
hadoop fs -mkdir -p /data/example/demo/blocksort_time/
hadoop fs -ls /data/example/demo/blocksort_time/
hadoop jar ./lib/ydb-1.1.5-pg.jar cn.net.ycloud.ydb.server.reader.kafka.KafkaMakeBlockSortDataTime 1000000000 /data/example/demo/blocksort_time/2000_time_1.txt &
hadoop jar ./lib/ydb-1.1.5-pg.jar cn.net.ycloud.ydb.server.reader.kafka.KafkaMakeBlockSortDataTime 1000000000 /data/example/demo/blocksort_time/2000_time_2.txt &
hadoop jar ./lib/ydb-1.1.5-pg.jar cn.net.ycloud.ydb.server.reader.kafka.KafkaMakeBlockSortDataTime 1000000000 /data/example/demo/blocksort_time/2000_time_3.txt &
hadoop jar ./lib/ydb-1.1.5-pg.jar cn.net.ycloud.ydb.server.reader.kafka.KafkaMakeBlockSortDataTime 1000000000 /data/example/demo/blocksort_time/2000_time_4.txt &
hadoop jar ./lib/ydb-1.1.5-pg.jar cn.net.ycloud.ydb.server.reader.kafka.KafkaMakeBlockSortDataTime 1000000000 /data/example/demo/blocksort_time/2000_time_5.txt &
hadoop jar ./lib/ydb-1.1.5-pg.jar cn.net.ycloud.ydb.server.reader.kafka.KafkaMakeBlockSortDataTime 1000000000 /data/example/demo/blocksort_time/2000_time_6.txt &
hadoop jar ./lib/ydb-1.1.5-pg.jar cn.net.ycloud.ydb.server.reader.kafka.KafkaMakeBlockSortDataTime 1000000000 /data/example/demo/blocksort_time/2000_time_7.txt &
hadoop jar ./lib/ydb-1.1.5-pg.jar cn.net.ycloud.ydb.server.reader.kafka.KafkaMakeBlockSortDataTime 1000000000 /data/example/demo/blocksort_time/2000_time_8.txt &
hadoop jar ./lib/ydb-1.1.5-pg.jar cn.net.ycloud.ydb.server.reader.kafka.KafkaMakeBlockSortDataTime 1000000000 /data/example/demo/blocksort_time/2000_time_9.txt &
hadoop jar ./lib/ydb-1.1.5-pg.jar cn.net.ycloud.ydb.server.reader.kafka.KafkaMakeBlockSortDataTime 1000000000 /data/example/demo/blocksort_time/2000_time_10.txt &
2. Create the tables
--### Create the text tables ###
drop table blocksort_time_txt;
CREATE external table blocksort_time_txt(
tradetime bigint,
amtint int
)
row format delimited fields terminated by ','
stored as
INPUTFORMAT 'cn.net.ycloud.ydb.handle.YdbCombineInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
location '/data/example/demo/blocksort_time'
TBLPROPERTIES (
'ydb.combine.input.format.raw.format'='org.apache.hadoop.mapred.TextInputFormat'
);
drop table blocksort_time_commontxt;
CREATE external table blocksort_time_commontxt(
tradetime bigint,
amtint int
)
row format delimited fields terminated by ','
location '/data/example/demo/blocksort_time'
;
--## Create the YDB table ##
/*ydb.pushdown('->')*/
create table blocksort_time_ydb(
tradetime tlong,
amtint int
)
/*('<-')pushdown.ydb*/;
---- Import the data into YDB
insert into table ydbpartion
select 'blocksort_time_ydb', 'ydb_default_partion', '',
YROW(
'tradetime',tradetime,
'amtint',amtint
)
from blocksort_time_txt;
---- Preview the data
/*ydb.pushdown('->')*/
select * from blocksort_time_ydb where ydbpartion='ydb_default_partion' limit 20
/*('<-')pushdown.ydb*/
---- Total row count: ten billion
3. Performance tests
(1) Full table
Descending
-- Using YDB blockSort
/*ydb.pushdown('->')*/
select tradetime, amtint from blocksort_time_ydb where
ydbkv='blocksort.field:tradetime' and
ydbkv='blocksort.desc:true' and
ydbkv='blocksort.limit:30'
order by tradetime desc limit 30
/*('<-')pushdown.ydb*/;
-- Using Spark SQL on the text table
select tradetime, amtint from blocksort_time_commontxt order by tradetime desc limit 30
Ascending
/*ydb.pushdown('->')*/
select tradetime, amtint from blocksort_time_ydb where
ydbkv='blocksort.field:tradetime' and
ydbkv='blocksort.desc:false' and
ydbkv='blocksort.limit:30'
order by tradetime limit 30
/*('<-')pushdown.ydb*/;
select tradetime, amtint from blocksort_time_commontxt order by tradetime limit 30
(2) Filter matching 80% of the data (filter condition: amtint like '([100 TO 900] )')
Count the matching rows, without sorting
/*ydb.pushdown('->')*/
select count(*) from blocksort_time_ydb where ydbpartion='ydb_default_partion' and amtint like '([100 TO 900] )'
/*('<-')pushdown.ydb*/
;
Descending sort
/*ydb.pushdown('->')*/
select tradetime, amtint from blocksort_time_ydb where amtint like '([100 TO 900] )' and
ydbkv='blocksort.field:tradetime' and
ydbkv='blocksort.desc:true' and
ydbkv='blocksort.limit:30'
order by tradetime desc limit 30
/*('<-')pushdown.ydb*/;
select tradetime, amtint from blocksort_time_commontxt where amtint>='100' and amtint <='900' order by tradetime desc limit 30
Ascending sort
/*ydb.pushdown('->')*/
select tradetime, amtint from blocksort_time_ydb where amtint like '([100 TO 900] )' and
ydbkv='blocksort.field:tradetime' and
ydbkv='blocksort.desc:false' and
ydbkv='blocksort.limit:30'
order by tradetime limit 30
/*('<-')pushdown.ydb*/;
select tradetime, amtint from blocksort_time_commontxt where amtint>='100' and amtint <='900' order by tradetime limit 30;
(3) Filter matching 50% of the data (filter condition: amtint like '([100 TO 600] )')
/*ydb.pushdown('->')*/
select count(*) from blocksort_time_ydb where ydbpartion='ydb_default_partion' and amtint like '([100 TO 600] )'
/*('<-')pushdown.ydb*/
;
/*ydb.pushdown('->')*/
select tradetime, amtint from blocksort_time_ydb where amtint like '([100 TO 600] )' and
ydbkv='blocksort.field:tradetime' and
ydbkv='blocksort.desc:true' and
ydbkv='blocksort.limit:30'
order by tradetime desc limit 30
/*('<-')pushdown.ydb*/;
select tradetime, amtint from blocksort_time_commontxt where amtint>='100' and amtint <='600' order by tradetime desc limit 30;
/*ydb.pushdown('->')*/
select tradetime, amtint from blocksort_time_ydb where amtint like '([100 TO 600] )' and
ydbkv='blocksort.field:tradetime' and
ydbkv='blocksort.desc:false' and
ydbkv='blocksort.limit:30'
order by tradetime limit 30
/*('<-')pushdown.ydb*/;
select tradetime, amtint from blocksort_time_commontxt where amtint>='100' and amtint <='600' order by tradetime limit 30;
(4) Filter matching 10% of the data (filter condition: amtint like '([100 TO 200] )')
/*ydb.pushdown('->')*/
select count(*) from blocksort_time_ydb where ydbpartion='ydb_default_partion' and amtint like '([100 TO 200] )'
/*('<-')pushdown.ydb*/
;
/*ydb.pushdown('->')*/
select tradetime, amtint from blocksort_time_ydb where amtint like '([100 TO 200] )' and
ydbkv='blocksort.field:tradetime' and
ydbkv='blocksort.desc:true' and
ydbkv='blocksort.limit:30'
order by tradetime desc limit 30
/*('<-')pushdown.ydb*/;
select tradetime, amtint from blocksort_time_commontxt
where amtint>='100' and amtint <='200' order by tradetime desc limit 30;
/*ydb.pushdown('->')*/
select tradetime, amtint from blocksort_time_ydb where amtint like '([100 TO 200] )' and
ydbkv='blocksort.field:tradetime' and
ydbkv='blocksort.desc:false' and
ydbkv='blocksort.limit:30'
order by tradetime limit 30
/*('<-')pushdown.ydb*/;
select tradetime, amtint from blocksort_time_commontxt
where amtint>='100' and amtint <='200' order by tradetime limit 30;
(5) Filter matching 1% of the data (filter condition: amtint like '([100 TO 110] )')
/*ydb.pushdown('->')*/
select count(*) from blocksort_time_ydb where ydbpartion='ydb_default_partion' and amtint like '([100 TO 110] )'
/*('<-')pushdown.ydb*/
;
/*ydb.pushdown('->')*/
select tradetime, amtint from blocksort_time_ydb where amtint like '([100 TO 110] )' and
ydbkv='blocksort.field:tradetime' and
ydbkv='blocksort.desc:true' and
ydbkv='blocksort.limit:30'
order by tradetime desc limit 30
/*('<-')pushdown.ydb*/;
select tradetime, amtint from blocksort_time_commontxt
where amtint>='100' and amtint <='110' order by tradetime desc limit 30;
/*ydb.pushdown('->')*/
select tradetime, amtint from blocksort_time_ydb where amtint like '([100 TO 110] )' and
ydbkv='blocksort.field:tradetime' and
ydbkv='blocksort.desc:false' and
ydbkv='blocksort.limit:30'
order by tradetime limit 30
/*('<-')pushdown.ydb*/;
select tradetime, amtint from blocksort_time_commontxt where amtint>='100' and amtint <='110' order by tradetime limit 30;
(6) Filter matching 0.1% of the data (filter condition: amtint like '([100 TO 101] )')
/*ydb.pushdown('->')*/
select count(*) from blocksort_time_ydb where ydbpartion='ydb_default_partion' and amtint like '([100 TO 101] )'
/*('<-')pushdown.ydb*/
;
/*ydb.pushdown('->')*/
select tradetime, amtint from blocksort_time_ydb where amtint like '([100 TO 101] )' and
ydbkv='blocksort.field:tradetime' and
ydbkv='blocksort.desc:true' and
ydbkv='blocksort.limit:30'
order by tradetime desc limit 30
/*('<-')pushdown.ydb*/;
select tradetime, amtint from blocksort_time_commontxt
where amtint>='100' and amtint <='101' order by tradetime desc limit 30;
/*ydb.pushdown('->')*/
select tradetime, amtint from blocksort_time_ydb where amtint like '([100 TO 101] )' and
ydbkv='blocksort.field:tradetime' and
ydbkv='blocksort.desc:false' and
ydbkv='blocksort.limit:30'
order by tradetime limit 30
/*('<-')pushdown.ydb*/;
select tradetime, amtint from blocksort_time_commontxt where amtint>='100' and amtint <='101' order by tradetime limit 30;
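The filtered test queries differ only in the amtint range and the sort direction, so they can be generated rather than typed by hand. The helper below is a hypothetical sketch: it only assembles query text in the same shape as the statements in this article, and does not connect to YDB or Spark.

```java
// Hypothetical helper that builds the paired YDB and Spark test queries as strings.
final class BlockSortQuerySketch {

    // YDB query: range filter plus the three blocksort ydbkv hints used in the article.
    static String ydbQuery(int low, int high, boolean desc, int limit) {
        return "/*ydb.pushdown('->')*/\n"
            + "select tradetime, amtint from blocksort_time_ydb where amtint like '(["
            + low + " TO " + high + "] )' and\n"
            + "ydbkv='blocksort.field:tradetime' and\n"
            + "ydbkv='blocksort.desc:" + desc + "' and\n"
            + "ydbkv='blocksort.limit:" + limit + "'\n"
            + "order by tradetime" + (desc ? " desc" : "") + " limit " + limit + "\n"
            + "/*('<-')pushdown.ydb*/;";
    }

    // Spark query against the plain text table, with the predicate written as in the article.
    static String sparkQuery(int low, int high, boolean desc, int limit) {
        return "select tradetime, amtint from blocksort_time_commontxt where amtint>='"
            + low + "' and amtint <='" + high + "' order by tradetime"
            + (desc ? " desc" : "") + " limit " + limit + ";";
    }

    public static void main(String[] args) {
        // The five ranges tested: 80%, 50%, 10%, 1%, and 0.1% of the data.
        int[][] ranges = {{100, 900}, {100, 600}, {100, 200}, {100, 110}, {100, 101}};
        for (int[] range : ranges) {
            for (boolean desc : new boolean[] {true, false}) {
                System.out.println(ydbQuery(range[0], range[1], desc, 30));
                System.out.println(sparkQuery(range[0], range[1], desc, 30));
                System.out.println();
            }
        }
    }
}
```

Generating the statements this way keeps the YDB and Spark variants of each test aligned on the same range, direction, and limit.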