Greenplum error: Do not have enough valid segments to start the array
阿新 • Published: 2018-09-05
The Greenplum cluster fails to start with the error: Do not have enough valid segments to start the array.
Background:
After the cluster was set up, some of its settings needed tuning:
Set work_mem to 64MB
Check the current setting
gpconfig -s work_mem
Values on all segments are consistent
GUC : work_mem
Master value: 32MB
Segment value: 32MB
Change the setting
gpconfig -c work_mem -v 64M
Restart the cluster to load the configuration
Reload the configuration files postgresql.conf and pg_hba.conf
gpstop -u
The restart failed with the error below.
Check the error log:
/home/gpadmin/gpAdminLogs/gpstart_20180904.log
[INFO]:-----------------------------------------------------
[INFO]:- Successful segment starts = 0
[WARNING]:-Failed segment starts = 32 <<<<<<<<
[INFO]:- Skipped segment starts (segments are marked down in configuration) = 0
[INFO]:-----------------------------------------------------
[INFO]:-Successfully started 0 of 32 segment instances <<<<<<<<
[INFO]:-----------------------------------------------------
[WARNING]:-Segment instance startup failures reported
[WARNING]:-Failed start 32 of 32 segment instances <<<<<<<
[WARNING]:-Review /home/gpadmin/gpAdminLogs/gpstart_20180904.log
[INFO]:-----------------------------------------------------
[INFO]:-Commencing parallel segment instance shutdown, please wait...
[ERROR]:-gpstart error: Do not have enough valid segments to start the array.
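Troubleshooting:
Searching online for the error message showed that this is a very generic error: an oversized parameter value, a host failure, or a configuration mistake can all produce it.
Following those hints, I edited the configuration on the master node, commented out the changed setting, and restarted the cluster again. It still would not start, with the following error: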
20180904:18:53:20:108168 gpstart:cndh1322-6-15:gpadmin-[INFO]:-Starting Master instance in admin mode
20180904:19:03:21:108168 gpstart:cndh1322-6-15:gpadmin-[CRITICAL]:-Failed to start Master instance in admin mode
20180904:19:03:21:108168 gpstart:cndh1322-6-15:gpadmin-[CRITICAL]:-Error occurred: non-zero rc: 1
Command was: 'env GPSESSID=0000000000 GPERA=None $GPHOME/bin/pg_ctl -D /usr/local/gpdata/gpmaster/gpseg-1 -l /usr/local/gpdata/gpmaster/gpseg-1/pg_log/startup.log -w -t 600 -o " -p 5432 --gp_dbid= 1 --gp_num_contents_in_cluster=0 --silent-mode=true -i -M master --gp_contentid=-1 -x 34 -c gp_role=utility " start'
rc=1, stdout='waiting for server to start............................................................................................................................................................ ..................................................................................................................................................................................................... ..................................................................................................................................................................................................... ..................................................... stopped waiting
The master startup log contains the following error:
more /usr/local/gpdata/gpmaster/gpseg-1/pg_log/startup.log
2018-09-04 11:04:07.898274 GMT,,,p127931,th1064769408,,,,0,,,seg-1,,,,,"FATAL","22023","invalid value for parameter ""work_mem"": ""64M""",,"Valid units for this parameter are ""kB"", ""MB"", and ""GB"".",,,,,,"set_config_option","guc.c",4874,
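When the startup log is long, it can help to pull out only the FATAL entries. The snippet below is a minimal sketch: the function name fatal_messages is made up for illustration, and it simply matches the quoted severity field of the CSV-style log rather than parsing the CSV properly.

```shell
# Print every log line whose severity field is "FATAL".
# Reads log text from stdin. The name fatal_messages is hypothetical.
fatal_messages() {
    # -F treats the pattern as a fixed string, quotes included,
    # so a message that merely contains the word FATAL is not matched.
    grep -F '"FATAL"'
}

# Usage (path taken from the pg_ctl command above):
#   fatal_messages < /usr/local/gpdata/gpmaster/gpseg-1/pg_log/startup.log
```

Filtering on the quoted field keeps the check cheap; the matching lines still have to be read by eye to find the parameter name and the rejected value.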
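The error above shows that the failure was caused by an invalid configuration parameter.
The correct command is gpconfig -c work_mem -v 64MB. The value must not be written as 64M: the server treats it as an invalid setting, so the cluster cannot start. But the setting on the master node had already been commented out during troubleshooting, so why did the cluster still refuse to start? Logging in to one of the segment hosts showed that the segment configuration files had been modified as well, which is why the segment processes could not come up.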
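To confirm on a given host whether the bad value is still active in a configuration file, one option is to list the uncommented lines that set the parameter. This is a rough sketch: show_setting is a hypothetical helper, it only recognises the "name = value" form, and the segment data directory path is not given in this post, so it has to be filled in for the actual cluster.

```shell
# Print active (uncommented) lines that set a parameter in a
# postgresql.conf-style file, each prefixed with its line number.
# $1 = parameter name, $2 = path to the configuration file.
# The name show_setting is hypothetical.
show_setting() {
    # A line starting with "#" does not match, so commented-out
    # settings are skipped. Only the "name = value" form is handled.
    grep -n -E "^[[:space:]]*$1[[:space:]]*=" "$2"
}

# Usage on the master (data directory taken from the pg_ctl command above):
#   show_setting work_mem /usr/local/gpdata/gpmaster/gpseg-1/postgresql.conf
# On a segment host, substitute that segment's own data directory.
```

If the command prints a line such as work_mem=64M, that file still carries the invalid value.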
Final fix:
Start in maintenance mode (master only), without prompting:
gpstart -a -m
Correct the parameter:
gpconfig -c work_mem -v 64MB
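Start the cluster (if the master is still running in maintenance mode, stop it first with gpstop -m). The cluster now starts normally: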
gpstart
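Lessons learned:
1: A parameter changed with gpconfig is propagated to the configuration file on every node in the cluster.
2: gpconfig is only loosely coupled to the cluster, so invalid input is still written to the configuration.
3: Before changing a parameter, query its current value and model the new value on the existing one.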
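Since gpconfig wrote the bad value without complaint, a small check before calling it would have caught the typo. The following is a minimal sketch, assuming the only accepted units are the ones named in the error message (kB, MB, GB). The function name valid_mem_value is made up, and the check is deliberately stricter than the server, which also accepts a bare integer for work_mem.

```shell
# Return success only if the value is a whole number followed by one of
# the units named in the server's error message: kB, MB or GB.
# The name valid_mem_value is hypothetical.
valid_mem_value() {
    printf '%s\n' "$1" | grep -q -E '^[0-9]+(kB|MB|GB)$'
}

# Usage (not executed here, because it needs a running cluster):
#   value=64MB
#   if valid_mem_value "$value"; then
#       gpconfig -c work_mem -v "$value"
#   else
#       echo "refusing to set work_mem to '$value'" >&2
#   fi
```

With this check in place, 64M is rejected before it reaches any configuration file, while 64MB passes through to gpconfig.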