Greenplum error: Do not have enough valid segments to start the array

A Greenplum cluster fails to start with the error: Do not have enough valid segments to start the array.

Background:

After the cluster was set up, a few of its settings needed tuning:

Set work_mem to 64MB

Check the current setting

gpconfig -s work_mem

Values on all segments are consistent
GUC          : work_mem
Master  value: 32MB
Segment value: 32MB

Change the setting

gpconfig -c work_mem  -v 64M  

Restart the cluster to load the configuration

Reload the configuration files postgresql.conf and pg_hba.conf

gpstop -u   

The restart failed with the error below.

Check the error log:

/home/gpadmin/gpAdminLogs/gpstart_20180904.log 

[INFO]:-----------------------------------------------------
[INFO]:-   Successful segment starts                                            = 0
[WARNING]:-Failed segment starts                                                = 32   <<<<<<<<
[INFO]:-   Skipped segment starts (segments are marked down in configuration)   = 0
[INFO]:-----------------------------------------------------
[INFO]:-Successfully started 0 of 32 segment instances <<<<<<<<
[INFO]:-----------------------------------------------------
[WARNING]:-Segment instance startup failures reported
[WARNING]:-Failed start 32 of 32 segment instances <<<<<<<
[WARNING]:-Review /home/gpadmin/gpAdminLogs/gpstart_20180904.log    
[INFO]:-----------------------------------------------------
[INFO]:-Commencing parallel segment instance shutdown, please wait...
[ERROR]:-gpstart error: Do not have enough valid segments to start the array.

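Solution:

A web search on the error message showed that it is a very coarse-grained error: an oversized parameter value, a host failure, or a configuration mistake can all produce it.

Following those hints, I edited the configuration on the master node and commented out the changed setting, then restarted the cluster. It still would not start, and reported the following: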

20180904:18:53:20:108168 gpstart:cndh1322-6-15:gpadmin-[INFO]:-Starting Master instance in admin mode
20180904:19:03:21:108168 gpstart:cndh1322-6-15:gpadmin-[CRITICAL]:-Failed to start Master instance in admin mode
20180904:19:03:21:108168 gpstart:cndh1322-6-15:gpadmin-[CRITICAL]:-Error occurred: non-zero rc: 1
 Command was: ‘env GPSESSID=0000000000 GPERA=None $GPHOME/bin/pg_ctl -D /usr/local/gpdata/gpmaster/gpseg-1 -l /usr/local/gpdata/gpmaster/gpseg-1/pg_log/startup.log -w -t 600 -o " -p 5432 --gp_dbid=
1 --gp_num_contents_in_cluster=0 --silent-mode=true -i -M master --gp_contentid=-1 -x 34 -c gp_role=utility " start‘
rc=1, stdout=‘waiting for server to start............................................................................................................................................................
.....................................................................................................................................................................................................
.....................................................................................................................................................................................................
..................................................... stopped waiting

The master startup log contains the following error:

more /usr/local/gpdata/gpmaster/gpseg-1/pg_log/startup.log 

2018-09-04 11:04:07.898274 GMT,,,p127931,th1064769408,,,,0,,,seg-1,,,,,"FATAL","22023","invalid value for parameter ""work_mem"": ""64M""",,"Valid units for this parameter are ""kB"", ""MB"", and "
"GB"".",,,,,,"set_config_option","guc.c",4874,

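When a startup log is long, it can help to filter it down to the fatal entries instead of paging through the whole file. The sketch below assumes the comma-separated layout shown above, where the severity appears as a quoted "FATAL" field. The function name fatal_lines is made up for illustration.

```shell
# Print only the log entries whose severity field is "FATAL".
# Assumes the quoted, comma-separated layout seen in the excerpt above.
# With a file argument it reads that file; with none it reads stdin.
fatal_lines() {
    grep -F '"FATAL"' "$@"
}
```

For the log in this case, the call would be fatal_lines /usr/local/gpdata/gpmaster/gpseg-1/pg_log/startup.log.
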
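The error above shows that an invalid parameter value is the cause.
The command should have been gpconfig -c work_mem -v 64MB; the value cannot be written as 64M. The server treats 64M as a configuration error, so the cluster cannot start. But the setting on the master node had already been commented out during the earlier troubleshooting, so why did startup still fail? Logging in to one of the segment nodes showed that the segment configuration files had been modified as well, which is why the segment processes could not come up.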
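The same check can be made on any node by looking for uncommented lines that set the parameter in its postgresql.conf. This is only a minimal sketch: the function name active_setting is hypothetical, and the segment data directory has to be supplied by the reader because it does not appear in the logs above.

```shell
# Print the active (uncommented) lines that set a parameter in a
# PostgreSQL-style configuration file.
#   $1 = parameter name, e.g. work_mem
#   $2 = path to a postgresql.conf (the caller supplies the real path)
active_setting() {
    grep -E "^[[:space:]]*$1[[:space:]]*=" "$2"
}
```

For the master data directory shown in the gpstart output, the call would be active_setting work_mem /usr/local/gpdata/gpmaster/gpseg-1/postgresql.conf. A line ending in 64M on any node means that node will still refuse to start.
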

Final fix:

Quick start into maintenance mode:

gpstart  -a -m 

Correct the parameter:

gpconfig -c work_mem  -v 64MB       

Start the cluster; it now comes up normally:

gpstart 

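Lessons learned:
1: A parameter changed with gpconfig is propagated to the configuration file of every node in the cluster;
2: gpconfig is only loosely coupled to the cluster, so an invalid value is still written into the configuration;
3: Before changing a parameter, query its current value and model the new value on the original;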
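
Points 2 and 3 suggest validating a value before handing it to gpconfig. The following is a minimal sketch, assuming that only the three units named in the error message (kB, MB, GB) should be accepted. Both function names are hypothetical, and the check is deliberately stricter than the server itself may be.

```shell
# Succeed only if the value is an integer followed by kB, MB, or GB,
# the units listed in the "invalid value" message from the startup log.
is_valid_mem_value() {
    printf '%s\n' "$1" | grep -Eq '^[0-9]+(kB|MB|GB)$'
}

# Call gpconfig only when the value passes the check above.
#   $1 = parameter name, $2 = new value
safe_gpconfig_set() {
    if is_valid_mem_value "$2"; then
        gpconfig -c "$1" -v "$2"
    else
        echo "refusing to set $1: '$2' lacks a valid unit (kB, MB, GB)" >&2
        return 1
    fi
}
```

With this wrapper, safe_gpconfig_set work_mem 64M is rejected before anything is written, while safe_gpconfig_set work_mem 64MB goes through to gpconfig.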
