
Setting Up a Hadoop Development Environment with Cygwin

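This article does not go into detailed concepts; to learn about Cygwin and Hadoop themselves, see other articles. It walks through everything from downloading Cygwin to setting up a Hadoop environment. Some of the screenshots come from online sources, because I did not save screenshots of my own deployment, but the steps are the same.

Hadoop is a huge ecosystem with dozens of technical components. But a journey of a thousand miles begins with a single step. For a beginner like me, building a fully distributed Hadoop environment is out of reach: I do not know Linux, and I do not have enough machines for a fully distributed cluster. Cygwin lets me avoid installing Linux on my machine, because it provides a Linux-like environment on Windows. Let's start by downloading Cygwin, which is available from the official website.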

1: Install Cygwin

Run the downloaded exe file.

Click Next.


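The difference between these options: the first downloads from the Internet and installs directly, while the others download to a local directory without installing, or install from a local directory. The default, the first option, is recommended. Click Next.

The install directory defaults to the C: drive; you can choose another location. Click Next.


This is where the downloaded packages are stored; specify a directory on another drive. Click Next.

Keep the default first option and click Next.

Here you choose the download mirror. The defaults work but can be very slow, possibly taking half a day or a full day depending on your network. I recommend http://mirrors.163.com, which was the fastest for me. If it is not in the list, add it with the Add button, select it, and click Next.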


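Here you choose the packages to install. Click the Skip label in front of a package and it changes to the current version number, which means the package is selected. I selected the following:

From Devel: binutils, gcc-core, gcc-g++, gcc-mingw-core, gcc-mingw-g++, gdb

From Net: openssh and openssl, needed for the SSH access that Hadoop requires; select them the same way

From Base: sed, used when connecting Eclipse to Hadoop for development

You can also select other packages such as vim, depending on your needs. Packages you skip now can still be added after installation. When you do that, it is best to pick the same mirror as the first time; I have not tried a different one, so I do not know whether it causes problems. After that, click Next to start the download. With the 163 mirror, the download took me less than 10 minutes.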
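The package selection above can also be scripted, since the Cygwin installer accepts command-line options. The following is only a sketch under assumptions: the installer file name `setup-x86_64.exe`, the exact mirror path, and the reduced package list are placeholders I chose for illustration, and package names may differ in current Cygwin releases. The script only prints the command so it can be reviewed before running it.

```shell
#!/bin/sh
# Sketch: build an unattended Cygwin install command for the packages above.
# SETUP_EXE and MIRROR are assumptions; adjust them to your download and mirror.
SETUP_EXE="./setup-x86_64.exe"
MIRROR="http://mirrors.163.com"   # the exact mirror path may need adjusting
PACKAGES="binutils gcc-core gcc-g++ gdb openssh openssl sed"

# Turn a space-separated package list into the comma-separated form used by -P.
join_packages() {
  printf '%s\n' "$1" | tr ' ' ','
}

# Print the command instead of running it.
# -q: unattended mode, -s: download site, -P: packages to install.
build_setup_command() {
  printf '%s -q -s %s -P %s\n' "$SETUP_EXE" "$MIRROR" "$(join_packages "$PACKAGES")"
}

build_setup_command
```

Running the same installer again with a different `-P` list is one way to add packages later.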

After installation, a shortcut appears on the desktop. Double-click the icon.

Run Cygwin's ssh-host-config

Then follow the prompts step by step:

*** Info: Generating missing SSH host keys
ssh-keygen: generating new host keys: RSA1 RSA DSA ECDSA ED25519
*** Info: Creating default /etc/ssh_config file
*** Info: Creating default /etc/sshd_config file

*** Info: StrictModes is set to 'yes' by default.
*** Info: This is the recommended setting, but it requires that the POSIX
*** Info: permissions of the user's home directory, the user's .ssh
*** Info: directory, and the user's ssh key files are tight so that
*** Info: only the user has write permissions.
*** Info: On the other hand, StrictModes don't work well with default
*** Info: Windows permissions of a home directory mounted with the
*** Info: 'noacl' option, and they don't work at all if the home
*** Info: directory is on a FAT or FAT32 partition.
*** Query: Should StrictModes be used? (yes/no) no

*** Info: Privilege separation is set to 'sandbox' by default since
*** Info: OpenSSH 6.1.  This is unsupported by Cygwin and has to be set
*** Info: to 'yes' or 'no'.
*** Info: However, using privilege separation requires a non-privileged account
*** Info: called 'sshd'.
*** Info: For more info on privilege separation read /usr/share/doc/openssh/README.privsep.
*** Query: Should privilege separation be used? (yes/no) no
*** Info: Updating /etc/sshd_config file

*** Query: Do you want to install sshd as a service?
*** Query: (Say "no" if it is already installed as a service) (yes/no) yes
*** Query: Enter the value of CYGWIN for the daemon: []
*** Info: On Windows Server 2003, Windows Vista, and above, the
*** Info: SYSTEM account cannot setuid to other users -- a capability
*** Info: sshd requires.  You need to have or to create a privileged
*** Info: account.  This script will help you do so.

*** Info: You appear to be running Windows XP 64bit, Windows 2003 Server,
*** Info: or later.  On these systems, it's not possible to use the LocalSystem
*** Info: account for services that can change the user id without an
*** Info: explicit password (such as passwordless logins [e.g. public key
*** Info: authentication] via sshd).

*** Info: If you want to enable that functionality, it's required to create
*** Info: a new account with special privileges (unless a similar account
*** Info: already exists). This account is then used to run these special
*** Info: servers.

*** Info: Note that creating a new user requires that the current account
*** Info: have Administrator privileges itself.

*** Info: No privileged account could be found.

*** Info: This script plans to use 'cyg_server'.
*** Info: 'cyg_server' will only be used by registered services.
*** Query: Do you want to use a different name? (yes/no) no
*** Query: Create new privileged user account 'cyg_server'? (yes/no) yes
*** Info: Please enter a password for new user cyg_server.  Please be sure
*** Info: that this password matches the password rules given on your system.
*** Info: Entering no password will exit the configuration.
*** Query: Please enter the password:
*** Query: Reenter:

*** Info: User 'cyg_server' has been created with password 'cyg_server'.
*** Info: If you change the password, please remember also to change the
*** Info: password for the installed services which use (or will soon use)
*** Info: the 'cyg_server' account.

*** Info: Also keep in mind that the user 'cyg_server' needs read permissions
*** Info: on all users' relevant files for the services running as 'cyg_server'.
*** Info: In particular, for the sshd server all users' .ssh/authorized_keys
*** Info: files must have appropriate permissions to allow public key
*** Info: authentication. (Re-)running ssh-user-config for each user will set
*** Info: these permissions correctly. [Similar restrictions apply, for
*** Info: instance, for .rhosts files if the rshd server is running, etc].


*** Info: The sshd service has been installed under the 'cyg_server'
*** Info: account.  To start the service now, call `net start sshd' or
*** Info: `cygrunsrv -S sshd'.  Otherwise, it will start automatically
*** Info: after the next reboot.

*** Info: Host configuration finished. Have fun!

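The script prompts you to create a user named cyg_server and asks for its password. Here we enter a password identical to the user name, cyg_server; it will be needed later.
Note that creating the cyg_server user is mandatory. Without it, sshd will not work even if it is installed, and later connections fail with a Connection closed error. I stumbled here myself and wasted a lot of time.

Now open the Windows Services panel. There is a new service named CYGWIN sshd. You can set it to start manually; then start it.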

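Back in the Cygwin terminal, run the ssh localhost command.


At the first prompt, type yes. At the second prompt, which asks for a password, enter the user password that was set above.

In Cygwin, run ssh-keygen and press Enter at every prompt.

Then run the following commands in Cygwin, in order: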

cd ~/.ssh
cp id_rsa.pub authorized_keys

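When done, type exit until you have left Cygwin, then open Cygwin again and run ssh localhost. If you get in without being asked for a password, as in the figure below, the setup succeeded.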
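The cp command above overwrites any existing authorized_keys file. As a hedged alternative, here is a small sketch that appends the public key only if it is missing and tightens the file permissions. The function name `install_pubkey` is my own, not part of OpenSSH.

```shell
#!/bin/sh
# Sketch: add a public key to authorized_keys without overwriting existing entries.
# install_pubkey is a hypothetical helper name chosen for this example.
install_pubkey() {
  pub="$1"    # path to the public key, e.g. ~/.ssh/id_rsa.pub
  auth="$2"   # path to authorized_keys
  mkdir -p "$(dirname "$auth")"
  touch "$auth"
  # Append only if this exact key line is not already present.
  grep -qxF "$(cat "$pub")" "$auth" || cat "$pub" >> "$auth"
  # Only the owner may read or write the file.
  chmod 600 "$auth"
}

# Usage on a real setup (not executed here):
#   ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa
#   install_pubkey ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
#   ssh -o BatchMode=yes localhost true && echo "passwordless login works"
```

With BatchMode=yes, ssh fails instead of prompting for a password, which makes the final check unambiguous.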

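2: Deploy Hadoop

I am using first-generation Hadoop, which consists simply of the NameNode, DataNode, JobTracker, TaskTracker, and SecondaryNameNode. I provide an already downloaded copy of version 0.20.2; you can also download it from the official Apache Hadoop website.

After extracting the archive, put the Hadoop directory under the Cygwin directory.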

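Next comes configuration. A JDK is required, and there is one important point up front: the JDK is usually installed under Program Files on the C: drive, so its path contains a space, which causes errors later. Suggestions online include quoting the path, creating a symbolic link, escaping with a backslash, and so on; I tried them all and none worked. My advice is to simply move the JDK out of Program Files into its own directory whose name contains no spaces.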
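A quick way to catch this problem before starting Hadoop is to test the JDK path for spaces. This is a minimal sketch; the function name `path_is_space_free` is my own, and the sample path is the JAVA_HOME value used in hadoop-env.sh in this article.

```shell
#!/bin/sh
# Sketch: reject a JDK path that contains a space.
# path_is_space_free is a hypothetical helper name chosen for this example.
path_is_space_free() {
  case "$1" in
    *" "*) return 1 ;;   # at least one space in the path
    *)     return 0 ;;
  esac
}

JAVA_HOME_CANDIDATE="/java/jdk1.7.0_45"

if path_is_space_free "$JAVA_HOME_CANDIDATE"; then
  echo "OK: $JAVA_HOME_CANDIDATE"
else
  echo "Path contains a space, move the JDK: $JAVA_HOME_CANDIDATE" >&2
fi
```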

I will not cover configuring the JDK environment variables. Add the following to Path, adjusting it to your own install directory:

;C:\cygwin64\bin;C:\cygwin64\usr\sbin;

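In Environment Variables, create a new variable named CYGWIN with the value: ntsec tty

Edit the following Hadoop configuration files:

hadoop-env.sh: remove the leading # from the following line and set it to your JDK path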

export JAVA_HOME=/java/jdk1.7.0_45

core-site.xml
<configuration> 
<property> 
<name>fs.default.name</name> 
<value>hdfs://localhost:9000</value> 
</property> 
</configuration>

hdfs-site.xml
<configuration> 
<property> 
<name>dfs.replication</name> 
<value>1</value> 
</property>
</configuration>

mapred-site.xml

<configuration>
<property>
<name>mapred.job.tracker</name> 
<value>localhost:9001</value> 
</property> 
</configuration>


Open Cygwin again and change to the Hadoop directory on the command line.

Run hadoop namenode -format to format HDFS, then start everything with start-all.sh.

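I ran into a problem at startup: the NameNode did not come up, only MapReduce did, and the web address was unreachable. There are logs to check: hadoop/logs contains a dedicated NameNode log.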
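To see at a glance which daemons failed to start, the output of the JDK's jps tool can be compared against the five daemons of a first-generation Hadoop setup. This is a sketch, not an official Hadoop tool: the function name `missing_daemons` is my own, and it assumes jps prints one "PID Name" pair per line.

```shell
#!/bin/sh
# Sketch: list the expected Hadoop 0.20.x daemons that are absent from jps output.
# missing_daemons is a hypothetical helper name chosen for this example.
EXPECTED_DAEMONS="NameNode DataNode SecondaryNameNode JobTracker TaskTracker"

# Reads jps output on stdin and prints each expected daemon that is missing.
missing_daemons() {
  jps_output=$(cat)
  for d in $EXPECTED_DAEMONS; do
    # Compare the whole second column so NameNode does not match SecondaryNameNode.
    printf '%s\n' "$jps_output" \
      | awk -v name="$d" '$2 == name { found = 1 } END { exit !found }' \
      || echo "$d"
  done
}

# Usage on a real setup, from the Hadoop directory (not executed here):
#   bin/hadoop namenode -format
#   bin/start-all.sh
#   jps | missing_daemons
```

An empty result means all five daemons are running; any name printed points to the log file to read first.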

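The log said that port 9000 was already in use. I opened an administrator command prompt and ran netstat  -aon|findstr  "9000", which showed that a process named ppap was holding port 9000. I found it in Task Manager; it turned out to be a PPTV process, so I ended it. You may not run into this particular case, but if a node fails to start, check the logs first, and if a port is in use, end the offending process before starting Hadoop.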
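Searching for the string "9000" also matches ports such as 19000 as well as remote addresses. A slightly stricter check, run from the Cygwin shell, is sketched below. The function name `pid_on_port` is my own, and the sketch assumes the English column layout of Windows netstat -aon (protocol, local address, foreign address, state, PID).

```shell
#!/bin/sh
# Sketch: print the PID(s) listening on a given TCP port.
# pid_on_port is a hypothetical helper name; it reads `netstat -aon` output on stdin.
pid_on_port() {
  awk -v port="$1" '
    # Keep only listening TCP sockets whose local address ends in ":<port>".
    $1 == "TCP" && $2 ~ (":" port "$") && $4 == "LISTENING" { print $5 }
  ' | sort -u
}

# Usage on a real setup (not executed here):
#   netstat -aon | pid_on_port 9000
#   tasklist /FI "PID eq <pid>"    # replace <pid> with the number printed above
```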

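Start it again.


This time the NameNode came up. Open http://localhost:50070 and http://localhost:50030 in a browser to reach the HDFS and MapReduce web interfaces, respectively.


That completes a simple Hadoop environment. For a beginner like me, getting this far deserves a round of applause!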