Importing Data from SQL Server into Hadoop with Sqoop

I. Installing and configuring Sqoop

1. Download and extract Sqoop, then set the environment variables (omitted).

2. Configuration

Edit sqoop/server/conf/catalina.properties and add the hadoop_lib jars under the Sqoop home directory to the common.loader property: /usr/local/src/sqoop-1.99.4-bin-hadoop200/hadoop_lib/*.jar

Edit sqoop/server/conf/sqoop.properties so that the MapReduce submission engine points at the Hadoop configuration directory:

org.apache.sqoop.submission.engine.mapreduce.configuration.directory=/usr/local/src/hadoop-2.6.1/etc/hadoop

Sqoop2 supports two Hadoop authentication mechanisms, simple and Kerberos. One of them must be configured, otherwise an authentication exception is raised:

org.apache.sqoop.security.authentication.type=SIMPLE
org.apache.sqoop.security.authentication.handler=org.apache.sqoop.security.authentication.SimpleAuthenticationHandler
org.apache.sqoop.security.authentication.anonymous=true

Also replace every reference to LOGDIR and BASEDIR in the file with the actual absolute paths.
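The substitution can be scripted. The following is a minimal sketch, assuming the placeholders appear in sqoop.properties as `@LOGDIR@` and `@BASEDIR@` (check your own copy of the file first). The function name `fill_sqoop_paths` and the directories in the usage comment are hypothetical.

```shell
#!/usr/bin/env bash
# Replace the @LOGDIR@ and @BASEDIR@ placeholders in a properties file
# with absolute paths. A backup of the original is kept as <file>.bak.
# Usage: fill_sqoop_paths <properties-file> <log-dir> <base-dir>
fill_sqoop_paths() {
  local file="$1" logdir="$2" basedir="$3"
  # '|' is used as the sed delimiter because the paths contain '/'.
  sed -i.bak \
    -e "s|@LOGDIR@|${logdir}|g" \
    -e "s|@BASEDIR@|${basedir}|g" \
    "$file"
}

# Example call (hypothetical target directories):
# SQOOP=/usr/local/src/sqoop-1.99.4-bin-hadoop200
# fill_sqoop_paths "$SQOOP/server/conf/sqoop.properties" "$SQOOP/logs" "$SQOOP"
```

Keeping the `.bak` copy makes it easy to diff the result against the original before starting the server.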

3. Then, in the Sqoop installation directory, create a new directory: mkdir hadoop_lib

Copy the Hadoop dependency jars into this directory, together with sqoop/server/bin/*.jar and sqoop/server/lib/*.jar:

cp /usr/local/src/hadoop-2.6.1/share/hadoop/common/*.jar /usr/local/src/sqoop-1.99.4-bin-hadoop200/hadoop_lib

cp /usr/local/src/hadoop-2.6.1/share/hadoop/common/lib/*.jar /usr/local/src/sqoop-1.99.4-bin-hadoop200/hadoop_lib

cp /usr/local/src/hadoop-2.6.1/share/hadoop/hdfs/*.jar /usr/local/src/sqoop-1.99.4-bin-hadoop200/hadoop_lib

cp /usr/local/src/hadoop-2.6.1/share/hadoop/hdfs/lib/*.jar /usr/local/src/sqoop-1.99.4-bin-hadoop200/hadoop_lib

cp -rf /usr/local/src/hadoop-2.6.1/share/hadoop/mapreduce/*.jar /usr/local/src/sqoop-1.99.4-bin-hadoop200/hadoop_lib

cp -rf /usr/local/src/hadoop-2.6.1/share/hadoop/mapreduce/lib/*.jar /usr/local/src/sqoop-1.99.4-bin-hadoop200/hadoop_lib

cp -rf /usr/local/src/hadoop-2.6.1/share/hadoop/tools/*.jar /usr/local/src/sqoop-1.99.4-bin-hadoop200/hadoop_lib

cp -rf /usr/local/src/hadoop-2.6.1/share/hadoop/tools/lib/*.jar /usr/local/src/sqoop-1.99.4-bin-hadoop200/hadoop_lib

cp -rf /usr/local/src/hadoop-2.6.1/share/hadoop/yarn/*.jar /usr/local/src/sqoop-1.99.4-bin-hadoop200/hadoop_lib

cp -rf /usr/local/src/hadoop-2.6.1/share/hadoop/yarn/lib/*.jar /usr/local/src/sqoop-1.99.4-bin-hadoop200/hadoop_lib

cp -rf /usr/local/src/hadoop-2.6.1/share/hadoop/httpfs/tomcat/lib/*.jar /usr/local/src/sqoop-1.99.4-bin-hadoop200/hadoop_lib

cp -rf /usr/local/src/sqoop-1.99.4-bin-hadoop200/server/bin/*.jar /usr/local/src/sqoop-1.99.4-bin-hadoop200/hadoop_lib

cp -rf /usr/local/src/sqoop-1.99.4-bin-hadoop200/server/lib/*.jar /usr/local/src/sqoop-1.99.4-bin-hadoop200/hadoop_lib
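The copy commands above all follow the same pattern, so they can be collapsed into a loop. This is only a sketch: the function name `collect_jars` is hypothetical, and it silently skips source directories that do not exist, which the individual `cp` commands do not.

```shell
#!/usr/bin/env bash
# Copy every *.jar found directly inside each source directory
# into a single destination directory.
# Usage: collect_jars <dest-dir> <src-dir>...
collect_jars() {
  local dest="$1"; shift
  mkdir -p "$dest"
  local dir
  for dir in "$@"; do
    # Skip source directories that are missing in this layout.
    [ -d "$dir" ] || continue
    find "$dir" -maxdepth 1 -name '*.jar' -exec cp -f {} "$dest" \;
  done
}

# Example call using the paths from this article:
# H=/usr/local/src/hadoop-2.6.1/share/hadoop
# S=/usr/local/src/sqoop-1.99.4-bin-hadoop200
# collect_jars "$S/hadoop_lib" \
#   "$H/common" "$H/common/lib" "$H/hdfs" "$H/hdfs/lib" \
#   "$H/mapreduce" "$H/mapreduce/lib" "$H/tools" "$H/tools/lib" \
#   "$H/yarn" "$H/yarn/lib" "$H/httpfs/tomcat/lib" \
#   "$S/server/bin" "$S/server/lib"
```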

4. Grant permissions

sudo chmod 777 -R /usr/local/src/sqoop-1.99.4-bin-hadoop200

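5. Configure Hadoop

The Sqoop server needs to impersonate users in order to access HDFS and other resources inside and outside the cluster, so Hadoop must be configured to allow this impersonation explicitly through its proxyuser mechanism. To do so, add the following two properties to etc/hadoop/core-site.xml in the Hadoop directory. Each value may be * or a specific list (host names for hosts, group names for groups); the sqoop2 in the property names is the user that runs the Sqoop server.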

<property>
  <name>hadoop.proxyuser.sqoop2.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.sqoop2.groups</name>
  <value>*</value>
</property>
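Before restarting Hadoop, it can help to confirm that both properties actually made it into the file. The check below is a rough sketch: `has_proxyuser` is a hypothetical helper, and it only looks for the property names as plain text, so it does not validate the XML or the values.

```shell
#!/usr/bin/env bash
# Succeed only if both proxyuser property names for the given user
# appear in the given core-site.xml.
# Usage: has_proxyuser <core-site.xml> <user>
has_proxyuser() {
  local file="$1" user="$2"
  # -F treats the pattern as a fixed string, so the dots are literal.
  grep -qF "<name>hadoop.proxyuser.${user}.hosts</name>" "$file" &&
    grep -qF "<name>hadoop.proxyuser.${user}.groups</name>" "$file"
}

# Example call using the path from this article:
# has_proxyuser /usr/local/src/hadoop-2.6.1/etc/hadoop/core-site.xml sqoop2 \
#   && echo "proxyuser entries present"
```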

6. Verify

Run sqoop2-tool verify

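II. Downloading the connection driver

1. Download sqljdbc from the Microsoft website

Put the jar file in the lib folder of the Sqoop installation directory: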

cp sqljdbc41.jar /usr/local/src/sqoop-1.99.4-bin-hadoop200/server/lib

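2. Download the SQL Server-Hadoop Connector

III. Setting environment variables

Add the MSSQL_CONNECTOR_HOME environment variable and point it at the Sqoop directory. The import can then be run with a command such as: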

sqoop import --connect 'jdbc:sqlserver://172.17.220.41;username=sa;[email protected];database=BKQV' --table PMIX --columns "dob,storeid" --where "dob=20180901" --target-dir /user/tmp -m 3
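The connection string in the command above can also be assembled from variables, which keeps the host and database in one place. This is a minimal sketch, assuming the same Sqoop 1 style `sqoop import` command line; `build_sqlserver_url` is a hypothetical helper. The usage comment passes the user with `--username` and uses `-P` to prompt for the password instead of embedding it in the URL.

```shell
#!/usr/bin/env bash
# Print a SQL Server JDBC URL for the given host and database.
# Usage: build_sqlserver_url <host> <database>
build_sqlserver_url() {
  local host="$1" database="$2"
  printf 'jdbc:sqlserver://%s;database=%s\n' "$host" "$database"
}

# Example call with the values from this article:
# sqoop import --connect "$(build_sqlserver_url 172.17.220.41 BKQV)" \
#   --username sa -P \
#   --table PMIX --columns "dob,storeid" --where "dob=20180901" \
#   --target-dir /user/tmp -m 3
```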