Setting Up a Hadoop Development Environment with Cygwin
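This article doesn't go into detailed concepts. To learn about Cygwin and Hadoop themselves, see other articles. Here I walk through everything from downloading Cygwin to setting up a Hadoop environment. Some of the screenshots come from online sources, because I didn't save my own during the deployment, but the steps are the same.
Hadoop is a huge ecosystem with dozens of key technologies, but a journey of a thousand miles begins with a single step. For a beginner like me, building a fully distributed Hadoop cluster is out of the question. I don't know Linux, and I don't have enough machines for a fully distributed setup anyway. Cygwin means I don't have to install Linux on my machine, because it emulates a Linux environment on Windows. Let's start by downloading Cygwin from its official website.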
Part 1: Installing Cygwin
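Run the downloaded .exe installer.
Click Next.
Choose the installation type. The first option (Install from Internet) downloads and installs in one step. The other options either only download the packages without installing them, or install from a previously downloaded local directory. The default first option is recommended. Click Next.
Choose the root directory. It defaults to drive C, but you can put it elsewhere. Click Next.
Choose the local package directory, where the downloaded packages are stored. I put it on a different drive. Click Next.
On the Internet connection screen, keep the default first option and click Next.
Choose a download mirror. The default mirrors work but can be very slow: depending on your network, the download may take half a day or more. I recommend http://mirrors.163.com, which was the fastest for me. If it isn't in the list, add it with the Add button, select it, and click Next.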
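Select the packages. Click the "Skip" label next to a package and it changes to the current version number, which means the package is selected. I selected:
Devel: binutils, gcc-core, gcc-g++, gcc-mingw-core, gcc-mingw-g++, gdb
Net: openssh and openssl, which provide the SSH access Hadoop needs (select them the same way)
Base: sed, used when developing against Hadoop from Eclipse
You can also add vim and other tools as needed. Anything you miss now can be added later by running the installer again. I'd pick the same mirror as the first time; I haven't tried switching mirrors, so I can't say whether that causes problems. Then click Next to start the download. With the 163 mirror it took me less than 10 minutes.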
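After installation, a shortcut appears on the desktop. Click it to open the Cygwin terminal.
Run Cygwin's ssh-host-config. Creating the service account requires administrator rights, so start the Cygwin terminal as administrator. The session looks like this: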
*** Info: Generating missing SSH host keys
ssh-keygen: generating new host keys: RSA1 RSA DSA ECDSA ED25519
*** Info: Creating default /etc/ssh_config file
*** Info: Creating default /etc/sshd_config file
*** Info: StrictModes is set to 'yes' by default.
*** Info: This is the recommended setting, but it requires that the POSIX
*** Info: permissions of the user's home directory, the user's .ssh
*** Info: directory, and the user's ssh key files are tight so that
*** Info: only the user has write permissions.
*** Info: On the other hand, StrictModes don't work well with default
*** Info: Windows permissions of a home directory mounted with the
*** Info: 'noacl' option, and they don't work at all if the home
*** Info: directory is on a FAT or FAT32 partition.
*** Query: Should StrictModes be used? (yes/no) no
*** Info: Privilege separation is set to 'sandbox' by default since
*** Info: OpenSSH 6.1. This is unsupported by Cygwin and has to be set
*** Info: to 'yes' or 'no'.
*** Info: However, using privilege separation requires a non-privileged account
*** Info: called 'sshd'.
*** Info: For more info on privilege separation read /usr/share/doc/openssh/README.privsep.
*** Query: Should privilege separation be used? (yes/no) no
*** Info: Updating /etc/sshd_config file
*** Query: Do you want to install sshd as a service?
*** Query: (Say "no" if it is already installed as a service) (yes/no) yes
*** Query: Enter the value of CYGWIN for the daemon: []
*** Info: On Windows Server 2003, Windows Vista, and above, the
*** Info: SYSTEM account cannot setuid to other users -- a capability
*** Info: sshd requires. You need to have or to create a privileged
*** Info: account. This script will help you do so.
*** Info: You appear to be running Windows XP 64bit, Windows 2003 Server,
*** Info: or later. On these systems, it's not possible to use the LocalSystem
*** Info: account for services that can change the user id without an
*** Info: explicit password (such as passwordless logins [e.g. public key
*** Info: authentication] via sshd).
*** Info: If you want to enable that functionality, it's required to create
*** Info: a new account with special privileges (unless a similar account
*** Info: already exists). This account is then used to run these special
*** Info: servers.
*** Info: Note that creating a new user requires that the current account
*** Info: have Administrator privileges itself.
*** Info: No privileged account could be found.
*** Info: This script plans to use 'cyg_server'.
*** Info: 'cyg_server' will only be used by registered services.
*** Query: Do you want to use a different name? (yes/no) no
*** Query: Create new privileged user account 'cyg_server'? (yes/no) yes
*** Info: Please enter a password for new user cyg_server. Please be sure
*** Info: that this password matches the password rules given on your system.
*** Info: Entering no password will exit the configuration.
*** Query: Please enter the password:
*** Query: Reenter:
*** Info: User 'cyg_server' has been created with password 'cyg_server'.
*** Info: If you change the password, please remember also to change the
*** Info: password for the installed services which use (or will soon use)
*** Info: the 'cyg_server' account.
*** Info: Also keep in mind that the user 'cyg_server' needs read permissions
*** Info: on all users' relevant files for the services running as 'cyg_server'.
*** Info: In particular, for the sshd server all users' .ssh/authorized_keys
*** Info: files must have appropriate permissions to allow public key
*** Info: authentication. (Re-)running ssh-user-config for each user will set
*** Info: these permissions correctly. [Similar restrictions apply, for
*** Info: instance, for .rhosts files if the rshd server is running, etc].
*** Info: The sshd service has been installed under the 'cyg_server'
*** Info: account. To start the service now, call `net start sshd' or
*** Info: `cygrunsrv -S sshd'. Otherwise, it will start automatically
*** Info: after the next reboot.
*** Info: Host configuration finished. Have fun!
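The script asks to create a user named cyg_server and prompts for its password. Here I used the same string as the user name, cyg_server; you'll need it later.
Note that creating the cyg_server user is mandatory. Without it, sshd won't work even if it installs successfully, and later connections fail with a "Connection closed" error. I tripped over this myself and wasted a lot of time.
Now open the Windows Services list. There is a new CYGWIN sshd service. You can set its startup type to Manual and then start it; the output above also mentions net start sshd and cygrunsrv -S sshd as ways to start it.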
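Back in Cygwin, run ssh localhost.
At the first prompt, which asks whether to trust the host key, type yes. When asked for a password, enter the password of the Windows account you are logged in as.
Then run ssh-keygen -t rsa in Cygwin and press Enter at every prompt. This stores the key pair in ~/.ssh/id_rsa and ~/.ssh/id_rsa.pub with an empty passphrase.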
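Then run the following commands in Cygwin. Appending with >> instead of copying keeps any keys already in authorized_keys:
cd ~/.ssh
cat id_rsa.pub >> authorized_keys
chmod 600 authorized_keys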
When done, type exit until Cygwin closes, then open it again and run ssh localhost. If you get in without being asked for a password, passwordless login is working.
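The key-authorization step can also be wrapped in a small function that is safe to rerun. The sketch below is POSIX sh, which also runs in Cygwin's bash. The function name authorize_key and its optional second argument are hypothetical. The function appends the public key only if that exact line isn't already in authorized_keys, then tightens permissions. Tight permissions matter if you answered yes to StrictModes. To skip the ssh-keygen prompts entirely, ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa creates the key pair non-interactively.

```shell
# Hypothetical helper: add a public key to authorized_keys without duplicating it.
# Usage: authorize_key PUBKEY_FILE [SSH_DIR]   (SSH_DIR defaults to ~/.ssh)
authorize_key() {
  local pub="${1:?usage: authorize_key PUBKEY_FILE [SSH_DIR]}"
  local ssh_dir="${2:-$HOME/.ssh}"
  local auth="$ssh_dir/authorized_keys"

  [ -r "$pub" ] || { echo "error: cannot read $pub" >&2; return 1; }
  mkdir -p "$ssh_dir"
  touch "$auth"

  # Append only if the exact key line is not already present,
  # so running this twice leaves a single entry.
  if ! grep -qxF -f "$pub" "$auth"; then
    cat "$pub" >> "$auth"
  fi

  # Owner-only permissions, which sshd requires when StrictModes is on.
  chmod 700 "$ssh_dir"
  chmod 600 "$auth"
}
```

Typical use after generating the key is authorize_key ~/.ssh/id_rsa.pub.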
Part 2: Deploying Hadoop
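I'm using first-generation Hadoop, with just the basic NameNode, DataNode, JobTracker, TaskTracker, and SecondaryNameNode daemons. I used a previously downloaded 0.20.2 release; you can also get it from the Apache Hadoop website.
Extract the archive and put the Hadoop directory under the Cygwin installation directory.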
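Next comes some configuration. A JDK is required, and there's an important pitfall to mention up front. The JDK is usually installed under C:\Program Files, and the space in that path causes errors later. Online you'll find suggestions such as adding quotes, creating a symbolic link, or escaping with backslashes, and none of them worked for me. My advice is to keep it simple: move the JDK out of Program Files into its own folder, and make sure the new path contains no spaces.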
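This failure only shows up later, so it can help to check the JDK path before configuring Hadoop. Here is a minimal sketch, assuming a POSIX shell. The function name check_no_spaces is hypothetical. It only tests for spaces; it does not check whether a JDK actually lives at that path.

```shell
# Hypothetical helper: reject paths that contain a space (e.g. "Program Files").
# Prints "ok: PATH" and returns 0, or prints an error to stderr and returns 1.
check_no_spaces() {
  local p="${1:?usage: check_no_spaces PATH}"
  case "$p" in
    *' '*)
      echo "error: '$p' contains a space; move the JDK to a path without spaces" >&2
      return 1
      ;;
  esac
  echo "ok: $p"
}
```

For example, check_no_spaces "$JAVA_HOME" reports an error while JAVA_HOME still points into Program Files.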
I won't cover the JDK environment variables. Add the following to PATH, adjusted to your own installation directory:
;C:\cygwin64\bin;C:\cygwin64\usr\sbin;
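Under Environment Variables, create a new variable named CYGWIN with the value ntsec tty. Recent Cygwin releases no longer recognize these two options, so on such versions this setting likely has no effect.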
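Edit the following Hadoop configuration files in the conf directory:
hadoop-env.sh: remove the leading # from the JAVA_HOME line and set it to your JDK path, written as a Cygwin-style path: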
export JAVA_HOME=/java/jdk1.7.0_45
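With Cygwin's default /cygdrive prefix, a Windows path such as D:\java\jdk1.7.0_45 maps to /cygdrive/d/java/jdk1.7.0_45. In practice, Cygwin's own cygpath -u does this conversion and is the tool to use. Purely to illustrate the mapping, here is a hand-written sketch. The function name win_to_cygwin is hypothetical, and it handles only plain drive-letter paths, not UNC paths or custom mount points.

```shell
# Hypothetical helper, for illustration only: in practice use `cygpath -u`.
# Converts D:\java\jdk1.7.0_45 to /cygdrive/d/java/jdk1.7.0_45,
# assuming Cygwin's default /cygdrive prefix.
win_to_cygwin() {
  local p="${1:?usage: win_to_cygwin WINDOWS_PATH}"
  case "$p" in
    [A-Za-z]:*) ;;
    *) echo "error: not a drive-letter path: $p" >&2; return 1 ;;
  esac
  local drive rest
  # The drive letter, lower-cased as Cygwin expects.
  drive=$(printf '%s\n' "$p" | cut -c1 | tr '[:upper:]' '[:lower:]')
  # Everything after "X:", with backslashes turned into forward slashes.
  rest=$(printf '%s\n' "$p" | cut -c3- | tr '\\' '/')
  printf '/cygdrive/%s%s\n' "$drive" "$rest"
}
```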
core-site.xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
mapred-site.xml
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
</property>
</configuration>
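If you'd rather not edit the three XML files above by hand, a small sketch can write them in one go. It assumes a POSIX shell and that the argument is Hadoop's conf directory, for example "$HADOOP_HOME/conf", where HADOOP_HOME is a variable you set yourself. The function name write_hadoop_conf is hypothetical. It overwrites existing files, so back up any you've customized.

```shell
# Hypothetical helper: write the three minimal single-node configs shown above.
# Usage: write_hadoop_conf CONF_DIR   (existing files in CONF_DIR are overwritten)
write_hadoop_conf() {
  local conf_dir="${1:?usage: write_hadoop_conf CONF_DIR}"
  mkdir -p "$conf_dir" || return 1

  # NameNode address (Hadoop 0.20-era property name)
  cat > "$conf_dir/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF

  # One DataNode, so keep a single copy of each block
  cat > "$conf_dir/hdfs-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF

  # JobTracker address for MapReduce
  cat > "$conf_dir/mapred-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
EOF
}
```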
Open the Cygwin terminal again and change into the Hadoop directory.
Run bin/hadoop namenode -format to format HDFS, then start all daemons with bin/start-all.sh.
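In my case something went wrong at startup. The NameNode didn't come up (only the MapReduce daemons did), and the web pages couldn't be reached. The logs help here: hadoop/logs contains a separate log file for the NameNode.
The log said port 9000 was already in use. I opened a command prompt as administrator and ran netstat -aon|findstr "9000", which showed that a process named ppap was holding port 9000. I found it in Task Manager, saw that it belonged to PPTV, and ended it. You may well not run into this particular case. In general, whenever a daemon fails to start, check its log first, and if a port is taken, stop the process occupying it before starting Hadoop.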
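Note that findstr "9000" matches any line containing 9000 anywhere, including other ports such as 19000 or a PID of 9000. A slightly stricter sketch reads Windows netstat -ano output and prints only the PIDs listening on exactly the given TCP port. It uses POSIX sh with awk, and the function name pid_on_port is hypothetical. It assumes that netstat in the Cygwin shell resolves to Windows' netstat.exe, and that the state column uses the English name LISTENING; localized Windows versions may print translated state names. It strips the carriage returns that Windows output carries.

```shell
# Hypothetical helper: print the PIDs listening on exactly TCP port PORT,
# reading Windows `netstat -ano` output on stdin.
# Usage: netstat -ano | pid_on_port 9000
pid_on_port() {
  local port="${1:?usage: pid_on_port PORT}"
  awk -v port="$port" '
    { sub(/\r$/, "") }                      # drop Windows CR line endings
    $1 == "TCP" && $4 == "LISTENING" {
      n = split($2, parts, ":")             # local address, e.g. 0.0.0.0:9000 or [::]:9000
      if (parts[n] == port && !seen[$5]++)  # exact port match, each PID printed once
        print $5
    }'
}
```

With a PID in hand, tasklist /FI "PID eq <pid>" shows the process name, and taskkill /PID <pid> /F (from an administrator prompt) ends it.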
Start Hadoop again.
This time the NameNode started. Open http://localhost:50070 and http://localhost:50030 in a browser to view HDFS and MapReduce respectively.
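The two web pages only show that the NameNode and JobTracker respond. The JDK's jps tool lists running Java processes by main class name, which lets you confirm that all five daemons are running. The sketch below is POSIX sh; the function name missing_daemons is hypothetical. It reads jps output and prints each Hadoop 1.x daemon that is not running. It assumes the JDK's bin directory is on PATH, so jps can be called from Cygwin.

```shell
# Hypothetical helper: read `jps` output on stdin and print each expected
# Hadoop 1.x daemon that does not appear among the running Java processes.
# Usage: jps | missing_daemons
missing_daemons() {
  local running d
  # jps prints "<pid> <MainClassName>"; strip CRs and keep only the class names.
  running=$(awk '{ sub(/\r$/, ""); print $2 }')
  for d in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
    if ! printf '%s\n' "$running" | grep -qxF "$d"; then
      echo "$d"
    fi
  done
}
```

Empty output means all five are up. Each daemon writes its log under the logs directory as hadoop-<user>-<daemon>-<host>.log. For example, from the Hadoop directory, tail -n 50 logs/hadoop-*-namenode-*.log shows the end of the NameNode log.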
That completes a simple Hadoop environment. For a beginner like me, getting this far deserves a round of applause!