
A Production Incident on a Hadoop Big Data Cluster

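The existing Hadoop and HBase clusters had been expanded step by step with several new nodes, without a restart in between. This morning I found that one HRegionServer service had stopped, so I started it again. Unexpectedly, data access kept failing after that, and when I tried to restart the whole HBase cluster the following errors appeared: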

$ start-hbase.sh

master running as process 112580. Stop it first.

The authenticity of host 'szc-l0104567 (192.168.1.81)' can't be established.

RSA key fingerprint is 76:e5:12:90:de:59:e1:da:02:f3:f1:2a:9a:a6:f8:c4.

Are you sure you want to continue connecting (yes/no)? The authenticity of host 'szc-l0104566 (192.168.1.80)' can't be established.

RSA key fingerprint is cd:d1:ad:98:ca:36:b5:ec:c3:1d:be:b8:8c:ae:bc:80.

Are you sure you want to continue connecting (yes/no)? The authenticity of host 'szc-l0124500 (192.168.1.95)' can't be established.

RSA key fingerprint is ec:3e:83:b0:bf:f0:3b:6d:7e:fa:e8:1d:7e:67:ed:27.

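As the messages indicate, the problem is host verification: the first time SSH connects to a node by hostname, "yes" has to be typed at the prompt before the passwordless login can proceed.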
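To find out in advance which hostnames would trigger this prompt, known_hosts can be searched by name. The following is a minimal sketch, not part of the original procedure: the function name `missing_hosts` and the `KEYGEN_BIN` variable are hypothetical, and it assumes the default known_hosts file of the user that starts HBase. It relies on `ssh-keygen -F`, which exits non-zero when no entry for the name is found.

```shell
#!/usr/bin/env bash
# Sketch: print every hostname that has no entry in known_hosts yet.
# KEYGEN_BIN and missing_hosts are hypothetical names used for illustration.

KEYGEN_BIN="${KEYGEN_BIN:-ssh-keygen}"

missing_hosts() {
    local host
    for host in "$@"; do
        # ssh-keygen -F searches known_hosts (hashed or plain) for the name
        # and exits non-zero when nothing is found.
        if ! "$KEYGEN_BIN" -F "$host" >/dev/null 2>&1; then
            echo "$host"
        fi
    done
}

# Usage, with the hostnames from the log above:
# missing_hosts szc-l0104567 szc-l0104566 szc-l0124500
```

Any hostname printed by this function would still need its first connection confirmed.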

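I had set up passwordless login by copying the authorized_keys file into the .ssh directory on each target node. That made login by IP address work without a "yes" prompt, but the first passwordless connection by hostname still asks for "yes", because SSH records host keys in known_hosts under the name used to connect. So when adding an HRegionServer node, either perform that first hostname login manually, or set up passwordless login with one of the methods below.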

Method 2:

ssh-copy-id -i ~/.ssh/id_rsa.pub [email protected]

ssh-copy-id -i ~/.ssh/id_rsa.pub [email protected]

Method 3:

After completing the first step (copying authorized_keys, or running ssh-copy-id -i ~/.ssh/id_rsa.pub [email protected]), run the following:

ssh -o stricthostkeychecking=no HOSTNAME
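With several new nodes, the command above can be repeated in a loop. The following is a sketch under assumptions rather than the exact procedure used here: the function name `accept_host_keys` and the `SSH_BIN` variable are hypothetical, and `BatchMode=yes` is an addition so that ssh fails instead of asking for a password if key-based login is not yet in place.

```shell
#!/usr/bin/env bash
# Sketch: make one non-interactive connection per hostname so that each
# host key is recorded in known_hosts before HBase is started.
# SSH_BIN and accept_host_keys are hypothetical names used for illustration.

SSH_BIN="${SSH_BIN:-ssh}"

accept_host_keys() {
    local host
    for host in "$@"; do
        # StrictHostKeyChecking=no adds an unknown host key without prompting.
        # BatchMode=yes disables password prompts, so a broken key setup fails fast.
        # The remote command "true" does nothing and returns immediately.
        "$SSH_BIN" -o StrictHostKeyChecking=no -o BatchMode=yes "$host" true \
            || echo "could not reach $host" >&2
    done
}

# Usage, with the hostnames from the log above:
# accept_host_keys szc-l0104567 szc-l0104566 szc-l0124500
```

Because StrictHostKeyChecking=no accepts the host key without verifying it, this approach is best limited to a trusted internal network.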