Running the TPC-DS Benchmark on HAWQ

Running TPC-DS on HAWQ

Create a directory and put the TPC-DS tool package in it

cd /tpcds/v2.1.0/tools/

./dsdgen -DIR /opt/3t_data -SCALE 3000 -parallel 20 -child 20 -TERMINATE N

[/]# mkdir tpcds_3t

[/]# ls

bin  boot  cgroups_test  dev  etc  hadoop  home  lib  lib64  lost+found  media  mnt  opt  proc  root  sbin  selinux  srv  sys  tmp  tpcds_3t  usr  var

[/]# cd tpcds_3t

[tpcds_3t]# ls

DSTools.zip

Unpack the tool package, then build inside the tools directory

[tpcds_3t]# unzip DSTools.zip

-----

[tpcds_3t]# ls

DSTools.zip  TPCDSVersion1.3.1

[tpcds_3t]# cd TPCDSVersion1.3.1/

[TPCDSVersion1.3.1]# ls

answer_sets  dbgen2  query_templates  query_variants  specification  tools

[TPCDSVersion1.3.1]# cd tools

[tools]# make

[tools]# ./dsqgen -help

Generate the data in parallel, running in the background

nohup ./dsdgen -DIR /opt/3t_data -SCALE 3000 -parallel 30 -child 1  -TERMINATE N &
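
According to the dsdgen help text, -parallel sets the number of chunks and -child selects which chunk a given invocation builds, so the command above with -child 1 would produce only the first of 30 chunks. The following is a minimal sketch of starting one process per chunk. The DSDGEN, DATA_DIR and DRY_RUN variables and both function names are illustrative assumptions, not part of the original setup; the default values mirror the command above. By default it only prints the commands.

```shell
#!/bin/bash
# Sketch: one dsdgen process per chunk (variable and function names are assumptions).
DSDGEN=${DSDGEN:-./dsdgen}          # path to the compiled dsdgen binary
DATA_DIR=${DATA_DIR:-/opt/3t_data}  # output directory, as in the command above
SCALE=${SCALE:-3000}                # scale factor in GB
PARALLEL=${PARALLEL:-30}            # total number of chunks

dsdgen_cmd() {
    # $1 = chunk index, from 1 to PARALLEL
    echo "$DSDGEN -DIR $DATA_DIR -SCALE $SCALE -parallel $PARALLEL -child $1 -TERMINATE N"
}

launch_all() {
    local child
    for child in $(seq 1 "$PARALLEL"); do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            dsdgen_cmd "$child"     # dry run: print the command only
        else
            # Start the chunk in the background with its own log file
            nohup $(dsdgen_cmd "$child") > "dsdgen_$child.log" 2>&1 &
        fi
    done
    # In a real run, block until every background chunk has finished
    [ "${DRY_RUN:-1}" = "1" ] || wait
}

# Usage: launch_all              (prints the 30 commands)
#        DRY_RUN=0 launch_all    (actually generates the data)
```

Running 30 generators at once may exceed what the disks can absorb; lowering PARALLEL or launching the chunks in smaller batches is a reasonable variation.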

Check the background jobs

jobs -l

Modify the query1 to query99 templates under query_templates by appending the line define _END = ""; to the end of each file

#!/bin/bash
# Run inside query_templates: append the _END definition to query1.tpl .. query99.tpl
for i in $(seq 1 99)
do
    echo "$i"
    echo 'define _END = "";' >> "query$i.tpl"
done
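
Running the script above twice would append the definition twice. A small sketch of a re-runnable variant follows; the function name is hypothetical, and it assumes the templates are named query1.tpl to query99.tpl as above.

```shell
#!/bin/bash
# Sketch: append the _END definition only where it is still missing.
append_end_define() {
    # $1 = directory holding query1.tpl .. query99.tpl
    local dir=$1 i tpl
    for i in $(seq 1 99); do
        tpl="$dir/query$i.tpl"
        [ -f "$tpl" ] || continue
        # Already patched: leave the file alone
        grep -q '^define _END' "$tpl" && continue
        # If the last byte is not a newline, add one so the define gets its own line
        [ -n "$(tail -c 1 "$tpl")" ] && echo >> "$tpl"
        echo 'define _END = "";' >> "$tpl"
    done
}

# Usage: append_end_define /tpcds_3t/TPCDSVersion1.3.1/query_templates
```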

Generate the query statements

./dsqgen -output_dir /opt/tpc_3t_queries/ -input /tpcds_3t/TPCDSVersion1.3.1/query_templates/templates.lst -scale 3000 -dialect ansi -directory /tpcds_3t/TPCDSVersion1.3.1/query_templates -rngseed 05092045000
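
The later steps work on one file per query, whereas dsqgen usually writes a whole query stream into a single file in the output directory. The sketch below splits such a file. It assumes each query is bracketed by comment lines beginning with "-- start query" and "-- end query", which should be checked against the actual output first; the function name and the query<N>.sql naming are hypothetical.

```shell
#!/bin/bash
# Sketch: split a combined dsqgen output file into one file per query.
# Assumption: queries are delimited by "-- start query" / "-- end query" lines.
split_queries() {
    # $1 = combined SQL file, $2 = output directory
    local input=$1 outdir=$2
    mkdir -p "$outdir"
    awk -v out="$outdir" '
        # A start marker opens the next output file (numbered by position in the stream)
        /^-- start query/ { n++; file = sprintf("%s/query%d.sql", out, n) }
        # Copy every line while a query is open
        file != ""        { print > file }
        # An end marker closes the current file
        /^-- end query/   { close(file); file = "" }
    ' "$input"
}

# Usage: split_queries /opt/tpc_3t_queries/query_0.sql /opt/tpc_3t_queries/split
```

Files are numbered in the order the queries appear in the stream, which need not match the template numbers.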

[tools]# su gpadmin

[tools]$ psql

psql (8.2.15)

Type "help" for help.

gpadmin=#

gpadmin=# create database tpcds_3t;

CREATE DATABASE

gpadmin=# \l

                 List of databases

   Name    |  Owner  | Encoding | Access privileges

-----------+---------+----------+-------------------

 gpadmin   | gpadmin | UTF8     |

 postgres  | gpadmin | UTF8     |

 template0 | gpadmin | UTF8     |

 template1 | gpadmin | UTF8     |

 tpcds     | gpadmin | UTF8     |

 tpch      | gpadmin | UTF8     |

(6 rows)

gpadmin=# \c tpcds

You are now connected to database "tpcds" as user "gpadmin".

tpcds=#

Create the tables

tpcds=# \d

                       List of relations

 Schema |         Name          | Type  |  Owner  |   Storage  

--------+-----------------------+-------+---------+-------------

 public | customer_address      | table | gpadmin | append only

 public | customer_demographics | table | gpadmin | append only

 public | date_dim              | table | gpadmin | append only

 public | dbgen_version         | table | gpadmin | append only

 public | income_band           | table | gpadmin | append only

 public | inventory             | table | gpadmin | append only

 public | item                  | table | gpadmin | append only

 public | promotion             | table | gpadmin | append only

 public | reason                | table | gpadmin | append only

 public | ship_mode             | table | gpadmin | append only

 public | store_returns         | table | gpadmin | append only

 public | store_sales           | table | gpadmin | append only

 public | time_dim              | table | gpadmin | append only

 public | warehouse             | table | gpadmin | append only

 public | web_page              | table | gpadmin | append only

 public | web_site              | table | gpadmin | append only

(16 rows)

Copy the YAML files to the data directory

[ds_data]# pwd

/opt/ds_data

[ds_data]# ls -s

Batch-edit the YAML files (database name, port number, data path, data file names, and so on)

[ds_data]# sed -i 's/5432/5430/g' *.yaml
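
The substitution above rewrites every occurrence of 5432 in the files, including any that happen to appear inside paths or other values. A narrower sketch follows, assuming the master port sits on an unindented PORT: line as in a typical gpload control file. The function name is hypothetical, and a .bak copy of each file is kept.

```shell
#!/bin/bash
# Sketch: change only the top-level "PORT: <old>" line in gpload control files.
set_master_port() {
    # $1 = old port, $2 = new port, remaining arguments = YAML files
    local old=$1 new=$2
    shift 2
    # Anchored at line start, so indented PORT keys and other values are untouched
    sed -i.bak -E "s/^(PORT:[[:space:]]*)${old}[[:space:]]*\$/\\1${new}/" "$@"
}

# Usage: set_master_port 5432 5430 *.yaml
```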

Load the tables

[ds_data]# gpload -f call_center.yaml

2016-05-06 16:14:39|INFO|gpload session started 2016-05-06 16:14:39

2016-05-06 16:14:39|INFO|setting schema 'public' for table 'call_center'

2016-05-06 16:14:39|INFO|started gpfdist -p 8081 -P 8082 -f "data1g/call_center.dat" -t 30

2016-05-06 16:14:46|INFO|running time: 6.75 seconds

2016-05-06 16:14:46|INFO|rows Inserted          = 6

2016-05-06 16:14:46|INFO|rows Updated           = 0

2016-05-06 16:14:46|INFO|data formatting errors = 0

2016-05-06 16:14:46|INFO|gpload succeeded

Batch load script

#!/bin/bash
# Load every control file in the current directory, one after another
for f in *.yaml
do
    gpload -f "$f"
done
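
With many tables it helps to know which loads failed without reading all the output. The sketch below keeps one log per control file and prints a summary. It assumes gpload signals failure through a non-zero exit status; the function name, the LOADER override and the .load.log naming are hypothetical.

```shell
#!/bin/bash
# Sketch: run the loader for each control file, log per file, count failures.
load_all() {
    # LOADER can be overridden for a dry run; it defaults to gpload
    local loader=${LOADER:-gpload} failed=0 f
    for f in "$@"; do
        if "$loader" -f "$f" > "${f%.yaml}.load.log" 2>&1; then
            echo "OK   $f"
        else
            echo "FAIL $f"
            failed=$((failed + 1))
        fi
    done
    echo "$failed file(s) failed"
    # Return success only if every load succeeded
    [ "$failed" -eq 0 ]
}

# Usage: load_all *.yaml
```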

Check the table sizes after loading

select relname, 

       pg_size_pretty(pg_relation_size(relname)) 

from pg_stat_user_tables 

where schemaname = 'public' 

order by pg_relation_size(relname) desc; 

Create the log files for the 99 SQL queries

# Create query1.log .. query99.log and make gpadmin their owner
for i in $(seq 1 99)
do
    echo "$i"
    touch "query$i.log"
    chown gpadmin "query$i.log"
done

Add \timing at the top of each SQL file
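
Adding the line by hand to 99 files is tedious, so here is a minimal sketch that prepends it to each file given on the command line and skips files that already start with it. The function name is hypothetical.

```shell
#!/bin/bash
# Sketch: put \timing on the first line of each query file.
add_timing() {
    local f tmp
    for f in "$@"; do
        # Skip files that already start with \timing
        head -n 1 "$f" | grep -q '^\\timing' && continue
        tmp=$(mktemp) || return 1
        { printf '%s\n' '\timing'; cat "$f"; } > "$tmp"
        # Overwrite in place so the owner and permissions stay as they were
        cat "$tmp" > "$f"
        rm -f "$tmp"
    done
}

# Usage: add_timing query1 query2 ...   (pass only the SQL files)
```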

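Run the SQL batch

#!/bin/bash
# sql.sh: run every query file through psql and time the whole batch
time for f in query*
do
    # query* also matches the .log and .tpl files in this directory; skip them
    case "$f" in *.log|*.tpl) continue ;; esac
    log="${f}.log"
    echo "$log"
    psql -d tpcds -f "$f" > "$log"
done

[query_templates]$ ./sql.sh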

Merge the test results

[query_templates]# cat query*.log > 1g_result.log
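
The merged log still has to be read by eye. As a rough sketch, the per-query totals can be pulled out of the individual logs instead, assuming psql reports timings as lines of the form "Time: 1234.567 ms". The function name is hypothetical, and logs without any such line are left out of the summary.

```shell
#!/bin/bash
# Sketch: sum the "Time: ... ms" lines of each log and print one total per file.
summarize_times() {
    awk '
        # $2 is the number of milliseconds on a "Time: 12.345 ms" line
        /^Time: / { total[FILENAME] += $2 }
        END       { for (f in total) printf "%s %.3f\n", f, total[f] }
    ' "$@" | sort
}

# Usage: summarize_times query*.log
```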

Clear the cache after the run completes

free -m

echo 3 > /proc/sys/vm/drop_caches
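
Dirty pages cannot be dropped, so flushing them with sync first lets the kernel free more. A minimal sketch, to be run as root; the function name and the DROP_CACHES_FILE override are hypothetical and exist only so the function can be tried against an ordinary file.

```shell
#!/bin/bash
# Sketch: flush dirty pages, then ask the kernel to drop its caches.
drop_caches() {
    local target=${DROP_CACHES_FILE:-/proc/sys/vm/drop_caches}
    sync                  # write dirty pages to disk first
    echo 3 > "$target"    # 3 = page cache plus dentries and inodes
}

# Usage (as root): free -m; drop_caches; free -m
```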

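Table loading: the load machine sends at about 120 MB/s while the receiving side takes in about 50 MB/s (at this rate the load takes at least 8 hours; why not split the data before loading?)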
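
One way to act on that question is to cut a large data file into pieces on line boundaries, so that the pieces can be handed to separate load jobs. The sketch below only does the splitting, assuming one row per line in the .dat files; the function name and the .part_ suffix are hypothetical.

```shell
#!/bin/bash
# Sketch: split a data file into pieces of a fixed number of lines.
split_data() {
    # $1 = data file, $2 = lines per piece
    local file=$1 lines=$2
    # Pieces are written next to the original as <file>.part_aa, <file>.part_ab, ...
    split -l "$lines" "$file" "${file}.part_"
    ls "${file}.part_"*
}

# Usage: split_data /opt/3t_data/store_sales.dat 50000000
```

Whether this shortens the load depends on where the bottleneck is; if the receiving side is the limit, more pieces alone may not help.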