Performance comparison of YDB and Spark SQL on 10 billion rows

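        Sorting by time in reverse order is practically a hard requirement for many log systems. In Yanyun YDB (延雲YDB) we replaced the traditional brute-force sort with an index-based technique that sorts a single column very quickly, without a brute-force full-table scan. We call this technique blockSort. It currently supports four data types: tlong, tdouble, tint, and tfloat.

        Because blockSort is implemented on top of the search index, a blockSort sort needs no brute-force scan, and performance improves substantially.

        blockSort does not rely on precomputation. It can sort the whole table, or sort only the rows that match an arbitrary filter condition.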
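The idea of reading a top-N result from a sorted index, instead of sorting every row, can be illustrated with a toy example. The sketch below is not YDB's blockSort implementation, which this article does not describe. It assumes a small in-memory `TreeMap` keyed by `tradetime`, and the class name `SortedIndexTopN` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.IntPredicate;

/** Toy illustration only: a sorted in-memory index, NOT YDB's blockSort. */
final class SortedIndexTopN {
    // Hypothetical index: sort key (tradetime) -> amtint values of the rows with that key.
    private final TreeMap<Long, List<Integer>> index = new TreeMap<>();

    void add(long tradetime, int amtint) {
        index.computeIfAbsent(tradetime, k -> new ArrayList<>()).add(amtint);
    }

    /** Walks the index in sorted order and stops after `limit` rows pass the filter. */
    List<long[]> topN(boolean desc, int limit, IntPredicate filter) {
        List<long[]> out = new ArrayList<>();
        Map<Long, List<Integer>> ordered = desc ? index.descendingMap() : index;
        for (Map.Entry<Long, List<Integer>> e : ordered.entrySet()) {
            for (int amtint : e.getValue()) {
                if (!filter.test(amtint)) {
                    continue; // row does not match the filter condition
                }
                out.add(new long[] {e.getKey(), amtint});
                if (out.size() == limit) {
                    return out; // early exit: the remaining rows are never visited
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        SortedIndexTopN idx = new SortedIndexTopN();
        idx.add(20160101120000L, 50);
        idx.add(20160301120000L, 150);
        idx.add(20160201120000L, 950);
        idx.add(20160401120000L, 300);
        // Newest two rows whose amtint lies in [100, 900].
        for (long[] row : idx.topN(true, 2, a -> a >= 100 && a <= 900)) {
            System.out.println(row[0] + "," + row[1]);
        }
    }
}
```

With the four sample rows in `main`, the descending query starts from the largest key and returns as soon as two rows pass the filter. In a scheme like this one, a more selective filter means walking further through the index before `limit` rows are found.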

To evaluate this, we ran a comparative test of sorting performance between Spark SQL and YDB.

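Machine configuration

The cluster is fully virtualized: 1 master and 4 slaves.


The slave machines are configured as follows

The 4 slaves are virtual machines running on two physical machines, each with 24 cores and 128 GB.

Note that the disks are SSDs, not ordinary hard disks.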


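The test data is 10 billion rows with two columns

tradetime: tlong type

        The sort field under test. It has high cardinality (almost no duplicate values) and holds a random time in yyyyMMddHHmmss format, generated with new Date(System.currentTimeMillis()-(long)(Math.random()*10000000000000l)).

amtint: int type

       An integer between 0 and 1000, used to measure performance when a filter condition is combined with sorting. It is not itself used for sorting.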
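The two column definitions above can be sketched as a small row generator. This is only an illustration built from the expression quoted in the text. The article's real generator, `KafkaMakeBlockSortDataTime`, is not shown, so the class `BlockSortRowSketch`, its method names, the inclusive 0 to 1000 range, and the comma-separated output are assumptions.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.Random;

/** Hypothetical sketch of one test row: "<tradetime>,<amtint>". */
final class BlockSortRowSketch {
    /** Picks a random past instant and renders it as a yyyyMMddHHmmss number. */
    static long randomTradetime(Random rnd, long nowMillis) {
        // Same offset range as the expression in the article: up to 10^13 ms
        // (roughly 317 years) before "now".
        long offset = (long) (rnd.nextDouble() * 10000000000000L);
        SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMddHHmmss", Locale.ROOT);
        return Long.parseLong(fmt.format(new Date(nowMillis - offset)));
    }

    /** Assumption: amtint is uniform over 0..1000 inclusive. */
    static int randomAmtint(Random rnd) {
        return rnd.nextInt(1001);
    }

    /** Joins the two columns with a comma, the delimiter the text tables declare. */
    static String row(Random rnd, long nowMillis) {
        return randomTradetime(rnd, nowMillis) + "," + randomAmtint(rnd);
    }

    public static void main(String[] args) {
        Random rnd = new Random();
        long now = System.currentTimeMillis();
        for (int i = 0; i < 5; i++) {
            System.out.println(row(rnd, now));
        }
    }
}
```

Because the offset spans about 10^13 milliseconds while the formatted value has one-second resolution, duplicates among 10 billion rows are rare but possible, which fits the description "almost no duplicate values".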

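Test results (times in seconds)

amtint filter   Rows after filter   Sort order   YDB blockSort   Spark
-------------   -----------------   ----------   -------------   -----
none            10 billion          descending   3.3             1118
none            10 billion          ascending    3.6             1085
100 TO 900      8 billion           descending   1.5             1093
100 TO 900      8 billion           ascending    1.3             1070
100 TO 600      5 billion           descending   1.53            1104
100 TO 600      5 billion           ascending    1.38            867
100 TO 200      1 billion           descending   7.00            1115
100 TO 200      1 billion           ascending    1.11            1131
100 TO 110      100 million         descending   2.1             1160
100 TO 110      100 million         ascending    3.44            1114
100 TO 101      10 million          descending   10.67           1089
100 TO 101      10 million          ascending    7.0             1110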
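To make the results easier to compare, the ratio of Spark time to YDB blockSort time can be computed for each row. The sketch below only divides the figures reported in the results. The class name `SpeedupSketch` and the row labels are illustrative.

```java
/** Computes Spark/YDB time ratios from the timings reported in the results. */
final class SpeedupSketch {
    static final String[] LABELS = {
        "none, desc", "none, asc",
        "100 TO 900, desc", "100 TO 900, asc",
        "100 TO 600, desc", "100 TO 600, asc",
        "100 TO 200, desc", "100 TO 200, asc",
        "100 TO 110, desc", "100 TO 110, asc",
        "100 TO 101, desc", "100 TO 101, asc",
    };

    // Seconds per test case: {YDB blockSort, Spark}, in the same order as LABELS.
    static final double[][] SECONDS = {
        {3.3, 1118}, {3.6, 1085},
        {1.5, 1093}, {1.3, 1070},
        {1.53, 1104}, {1.38, 867},
        {7.00, 1115}, {1.11, 1131},
        {2.1, 1160}, {3.44, 1114},
        {10.67, 1089}, {7.0, 1110},
    };

    /** How many times longer Spark took than YDB for one test case. */
    static double speedup(double ydbSeconds, double sparkSeconds) {
        return sparkSeconds / ydbSeconds;
    }

    public static void main(String[] args) {
        for (int i = 0; i < SECONDS.length; i++) {
            double ratio = speedup(SECONDS[i][0], SECONDS[i][1]);
            System.out.printf("%-18s %8.1fx%n", LABELS[i], ratio);
        }
    }
}
```

On these figures the ratio ranges from roughly 100x (100 TO 101, descending) to roughly 1,000x (100 TO 200, ascending). Spark's time stays close to 1,100 seconds in nearly every case, while the YDB time varies between about 1 and 11 seconds.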


Test procedure

1. Generating synthetic data


 # Recreate the HDFS directory so that it starts out empty
 hadoop fs -mkdir -p /data/example/demo/blocksort_time/
 hadoop fs -ls /data/example/demo/blocksort_time/
 hadoop fs -rm -r /data/example/demo/blocksort_time/
 hadoop fs -mkdir -p /data/example/demo/blocksort_time/
 hadoop fs -ls /data/example/demo/blocksort_time/
 # Start 10 generator jobs in the background, each passed 1000000000,
 # which adds up to the 10 billion rows used in the test
 for i in $(seq 1 10); do
   hadoop jar ./lib/ydb-1.1.5-pg.jar cn.net.ycloud.ydb.server.reader.kafka.KafkaMakeBlockSortDataTime 1000000000 /data/example/demo/blocksort_time/2000_time_${i}.txt &
 done
 # Block until all background jobs have finished
 wait

2. Creating the tables


 -- Create the text tables
 drop table if exists blocksort_time_txt;
CREATE external  table blocksort_time_txt(
     tradetime bigint,
     amtint int
)
row format delimited fields terminated by ','
stored as
    INPUTFORMAT 'cn.net.ycloud.ydb.handle.YdbCombineInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
location '/data/example/demo/blocksort_time'
TBLPROPERTIES (
    'ydb.combine.input.format.raw.format'='org.apache.hadoop.mapred.TextInputFormat'
);

 -- Plain text table over the same files; the Spark SQL queries in the tests read from it
 drop table if exists blocksort_time_commontxt;
CREATE external  table blocksort_time_commontxt(
     tradetime bigint,
     amtint int
)
row format delimited fields terminated by ','
location '/data/example/demo/blocksort_time'

;
-- Create the YDB table
/*ydb.pushdown('->')*/     
    create table blocksort_time_ydb(
     tradetime tlong,
     amtint int
    )
/*('<-')pushdown.ydb*/;
     

3. Importing data

---- Import the data into YDB

insert into table  ydbpartion
 select 'blocksort_time_ydb', 'ydb_default_partion', '',
    YROW(
    'tradetime',tradetime,
    'amtint',amtint
   )
from blocksort_time_txt;


---- Preview the data

/*ydb.pushdown('->')*/
select * from blocksort_time_ydb where ydbpartion='ydb_default_partion' limit 20
/*('<-')pushdown.ydb*/


---- Total data volume: 10 billion rows


4. Performance tests

(1) Full table

Descending order

-- Using YDB blockSort

/*ydb.pushdown('->')*/
select tradetime, amtint from blocksort_time_ydb where
    ydbkv='blocksort.field:tradetime'  and
    ydbkv='blocksort.desc:true' and
    ydbkv='blocksort.limit:30'
     order by tradetime desc limit 30
/*('<-')pushdown.ydb*/;


Using Spark SQL on the plain text table

select tradetime, amtint from blocksort_time_commontxt   order by tradetime desc limit 30


Ascending order

/*ydb.pushdown('->')*/
select tradetime, amtint from blocksort_time_ydb where
    ydbkv='blocksort.field:tradetime'  and
    ydbkv='blocksort.desc:false' and
    ydbkv='blocksort.limit:30'
     order by tradetime limit 30
/*('<-')pushdown.ydb*/;


select tradetime, amtint from blocksort_time_commontxt   order by tradetime limit 30


(2) Filter matching 80% of the data (filter condition: amtint like '([100 TO 900] )')

Count the matching rows, without sorting

 /*ydb.pushdown('->')*/
select count(*) from blocksort_time_ydb where ydbpartion='ydb_default_partion' and amtint like '([100 TO 900] )'
/*('<-')pushdown.ydb*/

;



Descending sort

/*ydb.pushdown('->')*/
select tradetime, amtint from blocksort_time_ydb where amtint like '([100 TO 900] )' and
    ydbkv='blocksort.field:tradetime'  and
    ydbkv='blocksort.desc:true' and
    ydbkv='blocksort.limit:30'
     order by tradetime desc limit 30
/*('<-')pushdown.ydb*/;


select tradetime, amtint from blocksort_time_commontxt  where amtint>='100' and amtint <='900' order by tradetime desc limit 30




Ascending sort

/*ydb.pushdown('->')*/
select tradetime, amtint from blocksort_time_ydb where  amtint like '([100 TO 900] )' and
    ydbkv='blocksort.field:tradetime'  and
    ydbkv='blocksort.desc:false' and
    ydbkv='blocksort.limit:30'
     order by tradetime limit 30
/*('<-')pushdown.ydb*/;


select tradetime, amtint from blocksort_time_commontxt where amtint>='100' and amtint <='900' order by tradetime  limit 30;


(3) Filter matching 50% of the data (filter condition: amtint like '([100 TO 600] )')



 /*ydb.pushdown('->')*/
select count(*) from blocksort_time_ydb where ydbpartion='ydb_default_partion' and amtint like '([100 TO 600] )'
/*('<-')pushdown.ydb*/
;


/*ydb.pushdown('->')*/
select tradetime, amtint from blocksort_time_ydb where amtint like '([100 TO 600] )' and
    ydbkv='blocksort.field:tradetime'  and
    ydbkv='blocksort.desc:true' and
    ydbkv='blocksort.limit:30'
     order by tradetime desc limit 30
/*('<-')pushdown.ydb*/;


select tradetime, amtint from blocksort_time_commontxt  where amtint>='100' and amtint <='600' order by tradetime desc limit 30;


/*ydb.pushdown('->')*/
select tradetime, amtint from blocksort_time_ydb where  amtint like '([100 TO 600] )' and
    ydbkv='blocksort.field:tradetime'  and
    ydbkv='blocksort.desc:false' and
    ydbkv='blocksort.limit:30'
     order by tradetime limit 30
/*('<-')pushdown.ydb*/;


select tradetime, amtint from blocksort_time_commontxt  where amtint>='100' and amtint <='600' order by tradetime  limit 30;


(4) Filter matching 10% of the data (filter condition: amtint like '([100 TO 200] )')


 /*ydb.pushdown('->')*/
select count(*) from blocksort_time_ydb where ydbpartion='ydb_default_partion' and amtint like '([100 TO 200] )'
/*('<-')pushdown.ydb*/
;


/*ydb.pushdown('->')*/
select tradetime, amtint from blocksort_time_ydb where amtint like '([100 TO 200] )' and
    ydbkv='blocksort.field:tradetime'  and
    ydbkv='blocksort.desc:true' and
    ydbkv='blocksort.limit:30'
     order by tradetime desc limit 30
/*('<-')pushdown.ydb*/;


select tradetime, amtint from blocksort_time_commontxt  where amtint>='100' and amtint <='200' order by tradetime desc limit 30;


/*ydb.pushdown('->')*/
select tradetime, amtint from blocksort_time_ydb where  amtint like '([100 TO 200] )' and
    ydbkv='blocksort.field:tradetime'  and
    ydbkv='blocksort.desc:false' and
    ydbkv='blocksort.limit:30'
     order by tradetime limit 30
/*('<-')pushdown.ydb*/;


select tradetime, amtint from blocksort_time_commontxt  where amtint>='100' and amtint <='200' order by tradetime  limit 30;


(5) Filter matching 1% of the data (filter condition: amtint like '([100 TO 110] )')


 /*ydb.pushdown('->')*/
select count(*) from blocksort_time_ydb where ydbpartion='ydb_default_partion' and amtint like '([100 TO 110] )'
/*('<-')pushdown.ydb*/

;



/*ydb.pushdown('->')*/
select tradetime, amtint from blocksort_time_ydb where amtint like '([100 TO 110] )' and
    ydbkv='blocksort.field:tradetime'  and
    ydbkv='blocksort.desc:true' and
    ydbkv='blocksort.limit:30'
     order by tradetime desc limit 30

/*('<-')pushdown.ydb*/;


select tradetime, amtint from blocksort_time_commontxt  where amtint>='100' and amtint <='110' order by tradetime desc limit 30;


/*ydb.pushdown('->')*/
select tradetime, amtint from blocksort_time_ydb where  amtint like '([100 TO 110] )' and
    ydbkv='blocksort.field:tradetime'  and
    ydbkv='blocksort.desc:false' and
    ydbkv='blocksort.limit:30'
     order by tradetime limit 30
/*('<-')pushdown.ydb*/;



select tradetime, amtint from blocksort_time_commontxt  where amtint>='100' and amtint <='110' order by tradetime  limit 30;


(6) Filter matching 0.1% of the data (filter condition: amtint like '([100 TO 101] )')

 /*ydb.pushdown('->')*/
select count(*) from blocksort_time_ydb where ydbpartion='ydb_default_partion' and amtint like '([100 TO 101] )'
/*('<-')pushdown.ydb*/
;


/*ydb.pushdown('->')*/
select tradetime, amtint from blocksort_time_ydb where amtint like '([100 TO 101] )' and
    ydbkv='blocksort.field:tradetime'  and
    ydbkv='blocksort.desc:true' and
    ydbkv='blocksort.limit:30'
     order by tradetime desc limit 30
/*('<-')pushdown.ydb*/;


select tradetime, amtint from blocksort_time_commontxt  where amtint>='100' and amtint <='101' order by tradetime desc limit 30;


/*ydb.pushdown('->')*/
select tradetime, amtint from blocksort_time_ydb where  amtint like '([100 TO 101] )' and
    ydbkv='blocksort.field:tradetime'  and
    ydbkv='blocksort.desc:false' and
    ydbkv='blocksort.limit:30'
     order by tradetime limit 30
/*('<-')pushdown.ydb*/;


select tradetime, amtint from blocksort_time_commontxt  where amtint>='100' and amtint <='101' order by tradetime  limit 30;
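
The YDB queries in these tests all share one shape and differ only in the optional amtint filter and the sort direction. As a convenience, that shape can be captured in a small string builder. The class `BlockSortQuerySketch` and its `build` method are hypothetical helpers and not part of YDB. They only reproduce the query text shown in this article, and they concatenate strings, so they are suitable for trusted inputs only.

```java
/** Hypothetical helper that reproduces the blockSort query text used in the tests. */
final class BlockSortQuerySketch {
    /**
     * @param table     YDB table name, e.g. blocksort_time_ydb
     * @param sortField column passed to blocksort.field and to ORDER BY
     * @param desc      true for descending order
     * @param limit     number of rows, passed to blocksort.limit and to LIMIT
     * @param filter    optional extra predicate, or null for a full-table sort
     */
    static String build(String table, String sortField, boolean desc, int limit, String filter) {
        StringBuilder sb = new StringBuilder();
        sb.append("/*ydb.pushdown('->')*/\n");
        sb.append("select tradetime, amtint from ").append(table).append(" where\n");
        if (filter != null && !filter.isEmpty()) {
            sb.append("    ").append(filter).append(" and\n");
        }
        // The three ydbkv hints mirror the ORDER BY ... LIMIT clause below.
        sb.append("    ydbkv='blocksort.field:").append(sortField).append("' and\n");
        sb.append("    ydbkv='blocksort.desc:").append(desc).append("' and\n");
        sb.append("    ydbkv='blocksort.limit:").append(limit).append("'\n");
        sb.append("    order by ").append(sortField)
          .append(desc ? " desc" : "")
          .append(" limit ").append(limit).append("\n");
        sb.append("/*('<-')pushdown.ydb*/");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Descending sort over the rows matching the 80% filter.
        System.out.println(build("blocksort_time_ydb", "tradetime", true, 30,
                "amtint like '([100 TO 900] )'"));
    }
}
```

Keeping the `ydbkv` hints and the `order by ... limit` clause in one place avoids the easy mistake of changing the direction or limit in one and not the other.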