
Setting Up Hadoop (2.6.0) + Hive (2.2.0) on Windows and Connecting Kettle (6.0)

Prerequisite: install JDK 1.8 and configure the corresponding environment variables, including JAVA_HOME.

1. Installing Hadoop

  1.1 Download Hadoop (2.6.0): http://hadoop.apache.org/releases.html

    1.1.1 Download the matching version of winutils (https://github.com/steveloughran/winutils) and copy everything from its bin directory into the bin directory of your Hadoop installation, replacing the existing files.

  1.2 Extract hadoop-2.6.0.tar.gz to a directory of your choice and configure the corresponding environment variables.

    1.2.1 Create a HADOOP_HOME environment variable and append it to PATH (;%HADOOP_HOME%\bin); see the command sketch after step 1.2.2.

    1.2.2 Open a cmd window and run hadoop version to verify that the environment variables are working.
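    As a reference for steps 1.2.1 and 1.2.2, both can also be done from a cmd window. A minimal sketch, assuming Hadoop was extracted to E:\software\hadoop-2.6.0 (adjust the path to your own install directory):

      :: Persist HADOOP_HOME for the current user
      setx HADOOP_HOME "E:\software\hadoop-2.6.0"
      :: Append %HADOOP_HOME%\bin to the user PATH (open a new cmd window afterwards for this to take effect)
      setx PATH "%PATH%;%HADOOP_HOME%\bin"
      :: Verify the setup
      hadoop version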

  1.3 Configure Hadoop (if a configuration file below does not exist, copy it from its *.template counterpart of the same name and edit the copy):

    1.3.1 Edit core-site.xml (under the Hadoop installation directory, create a workplace folder, and inside it the tmp and name folders; a mkdir sketch follows the XML below):

<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/E:/software/hadoop-2.6.0/workplace/tmp</value>
    </property>
    <property>
        <name>dfs.name.dir</name>
        <value>/E:/software/hadoop-2.6.0/workplace/name</value>
    </property>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://127.0.0.1:9000</value>
    </property>
    <!-- Proxy-user settings: "gl" is the author's Windows username; replace it with your own -->
    <property>
        <name>hadoop.proxyuser.gl.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.gl.groups</name>
        <value>*</value>
    </property>
</configuration>
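    The workplace folders referenced by hadoop.tmp.dir and dfs.name.dir above can be created from cmd (paths assume the same install directory as in the XML):

      mkdir E:\software\hadoop-2.6.0\workplace\tmp
      mkdir E:\software\hadoop-2.6.0\workplace\name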

  1.3.2 Edit hdfs-site.xml (note that dfs.namenode.name.dir set here takes precedence over the deprecated dfs.name.dir from core-site.xml):

<configuration>
    <!-- Set replication to 1, since this is a single-node Hadoop -->
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>    
        <name>dfs.namenode.name.dir</name>    
        <value>file:/hadoop/data/dfs/namenode</value>    
    </property>    
    <property>    
        <name>dfs.datanode.data.dir</name>    
        <value>file:/hadoop/data/dfs/datanode</value>  
    </property>
</configuration>

  1.3.3 Edit mapred-site.xml:

<configuration>
    <property>
       <name>mapreduce.framework.name</name>
       <value>yarn</value>
    </property>
    <property>
      <name>mapred.job.tracker</name>
      <value>127.0.0.1:9001</value>
    </property>
</configuration>

  1.3.4 Edit yarn-site.xml:

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
</configuration>

  1.4 Initialize and start Hadoop

    1.4.1 In a cmd window, run hadoop namenode -format (or hdfs namenode -format) to format the NameNode.

    1.4.2 Go to the sbin directory under the Hadoop installation (E:\software\hadoop-2.6.0\sbin) and run the start-all.cmd batch file to start Hadoop.

    1.4.3 Verify that Hadoop started successfully: open a new cmd window and run jps to list the running Java processes. If the NameNode, NodeManager, DataNode, and ResourceManager services are all present, the startup succeeded; a sample of the expected output follows.
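    For reference, the jps output should look roughly like the following; the process IDs are illustrative and will differ on your machine:

      11604 NameNode
      12892 DataNode
      13016 ResourceManager
      13744 NodeManager
      14208 Jps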

  1.5 File upload test

    1.5.1 Create the input directory. In a new cmd window, run the following commands (hdfs://localhost:9000 is the fs.default.name URI configured in core-site.xml):

      hadoop fs -mkdir hdfs://localhost:9000/user/

      hadoop fs -mkdir hdfs://localhost:9000/user/wcinput

    1.5.2 Upload data to that directory by running the following commands in a cmd window:

      hadoop fs -put E:\temp\MM.txt hdfs://localhost:9000/user/wcinput

      hadoop fs -put E:\temp\react文件.txt hdfs://localhost:9000/user/wcinput

    1.5.3 Check whether the files were uploaded successfully by running the following command in a cmd window:

      hadoop fs -ls hdfs://localhost:9000/user/wcinput
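    Optionally, the contents of an uploaded file can be printed directly (file name taken from step 1.5.2):

      hadoop fs -cat hdfs://localhost:9000/user/wcinput/MM.txt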

    1.5.4 View Hadoop's runtime status in the browser (ResourceManager UI: http://localhost:8088/).

    1.5.5 NameNode web UI: http://localhost:50070/

    1.5.6 Browse the Hadoop file system from the web UI: open the Utilities drop-down menu, click "Browse the file system", then navigate into user and then wcinput to see the list of uploaded files.

2. Installing and Configuring Hive

  2.1 Download Hive from: http://mirror.bit.edu.cn/apache/hive/

  2.2 Extract apache-hive-2.2.0-bin.tar.gz to the installation directory of your choice and configure the environment variables.

    2.2.1 Create a HIVE_HOME environment variable and append it to PATH (;%HIVE_HOME%\bin).

    2.2.2 Open a cmd window and run hive --version to verify that the environment variables are working; a command sketch for both steps follows.
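    As with HADOOP_HOME in 1.2.1, both steps can be scripted from cmd. A minimal sketch, assuming Hive was extracted to E:\software\apache-hive-2.2.0-bin:

      :: Persist HIVE_HOME for the current user and append its bin directory to PATH
      setx HIVE_HOME "E:\software\apache-hive-2.2.0-bin"
      setx PATH "%PATH%;%HIVE_HOME%\bin"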

  2.3 Configure hive-site.xml. The relevant properties are listed below (they all go inside the <configuration> root element; each property's built-in description explains what it does):

 <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
    <description>location of default database for the warehouse</description>
  </property>
  <property>
    <name>hive.exec.scratchdir</name>
    <value>/tmp/hive</value>
    <description>HDFS root scratch dir for Hive jobs which gets created with write all (733) permission. For each connecting user, an HDFS scratch dir: ${hive.exec.scratchdir}/&lt;username&gt; is created, with ${hive.scratch.dir.permission}.</description>
  </property>
  <property>
    <name>hive.exec.local.scratchdir</name>
    <value>E:/software/apache-hive-2.2.0-bin/scratch_dir</value>
    <description>Local scratch space for Hive jobs</description>
  </property>
  <property>
    <name>hive.downloaded.resources.dir</name>    
    <value>E:/software/apache-hive-2.2.0-bin/resources_dir/${hive.session.id}_resources</value>    
    <description>Temporary local directory for added resources in the remote file system.</description>
  </property>
  <property>
    <name>hive.querylog.location</name>
    <value>E:/software/apache-hive-2.2.0-bin/querylog_dir</value>
    <description>Location of Hive run time structured log file</description>
  </property>
  <property>
    <name>hive.server2.logging.operation.log.location</name>
    <value>E:/software/apache-hive-2.2.0-bin/operation_dir</value>
    <description>Top level directory where operation logs are stored if logging functionality is enabled</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://127.0.0.1:3306/hive?createDatabaseIfNotExist=true</value>
    <description>
      JDBC connect string for a JDBC metastore.
      To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
      For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
    </description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
    <description>Username to use against metastore database</description>
  </property>
   <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123</value>
    <description>password to use against metastore database</description>
  </property>
  <property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
    <description>
      Enforce metastore schema version consistency.
      True: Verify that version information stored in is compatible with one from Hive jars.  Also disable automatic
            schema migration attempt. Users are required to manually migrate schema after Hive upgrade which ensures
            proper metastore schema migration. (Default)
      False: Warn if the version information stored in metastore doesn't match with one from in Hive jars.
    </description>
  </property>
  <!-- Configure the username and password for HiveServer2 connections -->
  <property>
    <name>hive.jdbc_passwd.auth.zhangsan</name>
    <value>123</value>
  </property>

  2.4 Under the installation directory, create the folders referenced in the configuration above:

    E:\software\apache-hive-2.2.0-bin\scratch_dir

    E:\software\apache-hive-2.2.0-bin\resources_dir

    E:\software\apache-hive-2.2.0-bin\querylog_dir

    E:\software\apache-hive-2.2.0-bin\operation_dir

  2.5 Initialize the Hive metastore (first copy mysql-connector-java-*.jar into the lib directory of the installation):

    Go to apache-hive-2.2.0-bin/bin/ and run the following command in a new cmd window (the corresponding user and tables will be created in the MySQL database):

    hive --service schematool -dbType mysql -initSchema

    (The tool automatically reads the SQL scripts for the matching version from ***apache-hive-2.2.0-bin\scripts\metastore\upgrade\mysql\; a verification command follows.)
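    To double-check the initialization, schematool can report the schema version it finds in MySQL (same -dbType as above):

      hive --service schematool -dbType mysql -info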

  2.6 Start Hive. Open a new cmd window and run the following commands:

    Start the metastore: hive --service metastore

    Start HiveServer2: hive --service hiveserver2

    (HiveServer2 (HS2) is a server interface that enables remote clients to execute queries against Hive and retrieve the results. The current implementation, based on Thrift RPC, is an improved version of HiveServer that supports multi-client concurrency and authentication, and it is designed to provide better support for open-API clients such as JDBC and ODBC. A Beeline smoke test follows.)
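    Once both services are up, connectivity can be smoke-tested with Beeline. A minimal sketch, assuming HiveServer2 listens on its default port 10000 and that the zhangsan/123 credentials from hive-site.xml above are honored by your authentication setup:

      :: Open a JDBC session against HiveServer2 and run a single statement
      beeline -u "jdbc:hive2://localhost:10000/default" -n zhangsan -p 123 -e "show databases;"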

  2.7 Verification:

   Open a new cmd window and run the hive command to enter the interactive Hive shell:

hive> create table test_table(id INT, name string);
hive> create table student(id int,name string,gender int,address string);
hive> show tables;
student
test_table
2 rows selected (1.391 seconds)

3. Connecting Kettle 6.0 to the Hadoop 2.6.0 + Hive 2.2.0 Big-Data Environment for Data Migration

    (To be continued)