Introduction:

  Sqoop is a tool for importing and exporting data between Hadoop and relational databases. With it you can import data from a database (such as MySQL or Oracle) into HDFS, and export data from HDFS back into a relational database. Sqoop translates its commands into Hadoop MapReduce jobs, which usually involve only map tasks: the generated job runs several MapTasks in parallel to transfer the data, which greatly improves transfer speed and efficiency compared with trying to hand-roll multi-threaded transfers in a shell script. Sqoop2 (sqoop 1.99.7) requires configuring a proxy in the Hadoop installation's configuration files and is a heavyweight, embedded installation, so in this article we use Sqoop1 (Sqoop 1.4.6).

Prerequisites: (if you do not know how to install these, see my earlier posts in the hadoop category)

CloudDeskTop has installed: hadoop-2.7.3  jdk1.7.0_79  mysql-5.5.32  sqoop-1.4.6  hive-1.2.2
master01 and master02 have installed: hadoop-2.7.3  jdk1.7.0_79
slave01, slave02, and slave03 have installed: hadoop-2.7.3  jdk1.7.0_79  zookeeper-3.4.10

I. Installation:

1. Upload the installation package sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz to the /install/ directory

2. Extract it:

[hadoop@CloudDeskTop install]$ tar -zxvf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz -C /software/

3. Configure the environment variables:

[hadoop@CloudDeskTop software]$ su -lc "vi /etc/profile"

JAVA_HOME=/software/jdk1.7.0_79
HADOOP_HOME=/software/hadoop-2.7.3
SQOOP_HOME=/software/sqoop-1.4.6
PATH=$PATH:$JAVA_HOME/bin:$JAVA_HOME/lib:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$SQOOP_HOME/bin
export PATH JAVA_HOME HADOOP_HOME SQOOP_HOME

4. After configuring the environment, run the following so the profile takes effect immediately:

[hadoop@CloudDeskTop software]$ source /etc/profile
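
A quick optional check (not part of the original steps) that the new variables are visible in the current shell:

[hadoop@CloudDeskTop software]$ echo $SQOOP_HOME   # should print /software/sqoop-1.4.6
[hadoop@CloudDeskTop software]$ which sqoop        # should resolve to /software/sqoop-1.4.6/bin/sqoop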

5. Go into the /software/sqoop-1.4.6/lib/ directory and upload mysql-connector-java-5.1.43-bin.jar

The JDBC driver jar here must be this exact version (5.1.43): Sqoop talks to MySQL through it, and using a different driver version easily leads to errors.
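
Assuming the driver jar was uploaded to /install/ just like the Sqoop tarball (this source path is an assumption; adjust it to wherever you placed the jar), copying it into Sqoop's lib directory looks like this:

[hadoop@CloudDeskTop install]$ cp /install/mysql-connector-java-5.1.43-bin.jar /software/sqoop-1.4.6/lib/
[hadoop@CloudDeskTop install]$ ls /software/sqoop-1.4.6/lib/ | grep mysql-connector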

6. Configure sqoop

[hadoop@CloudDeskTop software]$ cd /software/sqoop-1.4.6/bin/

[hadoop@CloudDeskTop bin]$ vi configure-sqoop

Comment out the following block (around lines 127-148 of configure-sqoop) by wrapping it in a here-document: put ":<<COMMENT" before the block as the start marker and "COMMENT" after it as the end marker;

:<<COMMENT
## Moved to be a runtime check in sqoop.
if [ ! -d "${HBASE_HOME}" ]; then
  echo "Warning: $HBASE_HOME does not exist! HBase imports will fail."
  echo 'Please set $HBASE_HOME to the root of your HBase installation.'
fi

## Moved to be a runtime check in sqoop.
if [ ! -d "${HCAT_HOME}" ]; then
  echo "Warning: $HCAT_HOME does not exist! HCatalog jobs will fail."
  echo 'Please set $HCAT_HOME to the root of your HCatalog installation.'
fi

if [ ! -d "${ACCUMULO_HOME}" ]; then
  echo "Warning: $ACCUMULO_HOME does not exist! Accumulo imports will fail."
  echo 'Please set $ACCUMULO_HOME to the root of your Accumulo installation.'
fi
if [ ! -d "${ZOOKEEPER_HOME}" ]; then
  echo "Warning: $ZOOKEEPER_HOME does not exist! Accumulo imports will fail."
  echo 'Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.'
fi
COMMENT
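
After saving, a quick optional sanity check (not in the original write-up) is to run a sqoop command and confirm the HBase/HCatalog/Accumulo/ZooKeeper warnings no longer appear:

[hadoop@CloudDeskTop bin]$ sqoop help 2>&1 | grep -i warning
# no output here means the block was commented out successfully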

II. Startup (unless otherwise stated, all commands run as the hadoop user)

[0. Start MySQL on CloudDeskTop as the root user]

[root@CloudDeskTop ~]# cd /software/mysql-5.5.32/sbin/ && ./mysqld start && lsof -i:3306 && cd -

[1. Start the ZooKeeper cluster on the slave nodes (a leader and followers are elected among them)]

  cd /software/zookeeper-3.4.10/bin/ && ./zkServer.sh start && cd - && jps
  cd /software/zookeeper-3.4.10/bin/ && ./zkServer.sh status && cd -

[2. Start the HDFS cluster on master01] cd /software/ && start-dfs.sh && jps

[3. Start the YARN cluster on master01] cd /software/ && start-yarn.sh && jps

[Starting YARN does not also bring up the ResourceManager on the standby master node, so run the following on master02:]

cd /software/ && yarn-daemon.sh start resourcemanager && jps
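
Optionally, you can confirm which ResourceManager is active. The rm IDs below (rm1, rm2) are placeholders for whatever yarn.resourcemanager.ha.rm-ids defines in your yarn-site.xml:

[hadoop@master01 software]$ yarn rmadmin -getServiceState rm1   # expect: active
[hadoop@master02 software]$ yarn rmadmin -getServiceState rm2   # expect: standby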

[4. Check the processes]
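
The original shows a screenshot of the processes; roughly, with the layout above you would expect jps on each machine to report something like the following (the exact placement of JournalNode/ZKFC processes depends on your HA configuration, so treat this only as a sketch):

[hadoop@master01 ~]$ jps   # NameNode, DFSZKFailoverController, ResourceManager, Jps
[hadoop@slave01 ~]$ jps    # DataNode, NodeManager, JournalNode, QuorumPeerMain, Jps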

[5. Check the sqoop version to verify that sqoop was installed successfully]

 [hadoop@CloudDeskTop software]$ sqoop version

III. Testing

  Note: the direction of an import or export is defined relative to the HDFS cluster: data flowing out of HDFS is an export, and data flowing into HDFS is an import. Since the data of a Hive table is actually stored in the HDFS cluster, importing into or exporting from a Hive table is really just operating on files in HDFS.

First, create the data locally:
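
The original relies on a screenshot for this step; a sketch of creating the tab-separated file testsqoop.out (contents taken from the query results shown further below, fields separated by real tab characters as the Hive table expects) might look like this:

[hadoop@CloudDeskTop test]$ printf '%s\t%s\t%s\n' \
  1 ligang 2  2 chenghua 3  3 liqin 1  4 zhanghua 4  5 wanghua 1 \
  6 liulinjin 5  7 wangxiaochuan 6  8 guchuan 2  9 xiaoyong 4  10 huping 6 \
  > testsqoop.out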

After creating the table in the Hive database, upload the file to the path where the table stores its data in the cluster:

[hadoop@CloudDeskTop test]$ hdfs dfs -put testsqoop.out /user/hive/warehouse/mmzs.db/testsqoop

Goal 1: Export the data in the HDFS cluster to the MySQL database

1. Create a table in the Hive database mmzs and load data into it

[hadoop@CloudDeskTop software]$ cd /software/hive-1.2.2/bin/
[hadoop@CloudDeskTop bin]$ ./hive
hive> show databases;
OK
default
mmzs
mmzsmysql
Time taken: 0.373 seconds, Fetched: 3 row(s)
hive> create table if not exists mmzs.testsqoop(id int,name string,age int) row format delimited fields terminated by '\t';
OK
Time taken: 0.126 seconds
hive> select * from mmzs.testsqoop;
OK
1 ligang 2
2 chenghua 3
3 liqin 1
4 zhanghua 4
5 wanghua 1
6 liulinjin 5
7 wangxiaochuan 6
8 guchuan 2
9 xiaoyong 4
10 huping 6
Time taken: 0.824 seconds, Fetched: 10 row(s)

2. Create a table with the same fields in the MySQL database

[root@CloudDeskTop bin]# cd ~
[root@CloudDeskTop ~]# cd /software/mysql-5.5.32/bin/
[root@CloudDeskTop bin]# ./mysql -uroot -p123456 -P3306 -h192.168.154.134 -e "create database mmzs character set utf8"
[root@CloudDeskTop bin]# ./mysql -uroot -p123456 -h192.168.154.134 -P3306 -Dmmzs
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 12
Server version: 5.5.32 Source distribution

Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show tables;
Empty set (0.00 sec)

mysql> create table if not exists testsqoop(uid int(11),uname varchar(30),age int)engine=innodb charset=utf8;
Query OK, 0 rows affected (0.06 sec)

mysql> desc testsqoop;
+-------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+-------+
| uid | int(11) | YES | | NULL | |
| uname | varchar(30) | YES | | NULL | |
| age | int(11) | YES | | NULL | |
+-------+-------------+------+-----+---------+-------+
3 rows in set (0.00 sec)

mysql> select * from testsqoop;
Empty set (0.01 sec)

3. Use Sqoop to export the Hive table's data to the MySQL database (exporting the whole HDFS file)

[hadoop@CloudDeskTop software]$ sqoop-export --help

17/12/30 21:54:38 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
usage: sqoop export [GENERIC-ARGS] [TOOL-ARGS]

Common arguments:
--connect <jdbc-uri> Specify JDBC connect
string
--connection-manager <class-name> Specify connection manager
class name
--connection-param-file <properties-file> Specify connection
parameters file
--driver <class-name> Manually specify JDBC
driver class to use
--hadoop-home <hdir> Override
$HADOOP_MAPRED_HOME_ARG
--hadoop-mapred-home <dir> Override
$HADOOP_MAPRED_HOME_ARG
--help Print usage instructions
-P Read password from console
--password <password> Set authentication
password
--password-alias <password-alias> Credential provider
password alias
--password-file <password-file> Set authentication
password file path
--relaxed-isolation Use read-uncommitted
isolation for imports
--skip-dist-cache Skip copying jars to
distributed cache
--username <username> Set authentication
username
--verbose Print more information
while working

Export control arguments:
--batch Indicates
underlying
statements
to be
executed in
batch mode
--call <arg> Populate the
table using
this stored
procedure
(one call
per row)
--clear-staging-table Indicates
that any
data in
staging
table can be
deleted
--columns <col,col,col...> Columns to
export to
table
--direct Use direct
export fast
path
--export-dir <dir> HDFS source
path for the
export
-m,--num-mappers <n> Use 'n' map
tasks to
export in
parallel
--mapreduce-job-name <name> Set name for
generated
mapreduce
job
--staging-table <table-name> Intermediate
staging
table
--table <table-name> Table to
populate
--update-key <key> Update
records by
specified
key column
--update-mode <mode> Specifies
how updates
are
performed
when new
rows are
found with
non-matching
keys in
database
--validate Validate the
copy using
the
configured
validator
--validation-failurehandler <validation-failurehandler> Fully
qualified
class name
for
ValidationFa
ilureHandler
--validation-threshold <validation-threshold> Fully
qualified
class name
for
ValidationTh
reshold
--validator <validator> Fully
qualified
class name
for the
Validator

Input parsing arguments:
--input-enclosed-by <char> Sets a required field encloser
--input-escaped-by <char> Sets the input escape
character
--input-fields-terminated-by <char> Sets the input field separator
--input-lines-terminated-by <char> Sets the input end-of-line
char
--input-optionally-enclosed-by <char> Sets a field enclosing
character

Output line formatting arguments:
--enclosed-by <char> Sets a required field enclosing
character
--escaped-by <char> Sets the escape character
--fields-terminated-by <char> Sets the field separator character
--lines-terminated-by <char> Sets the end-of-line character
--mysql-delimiters Uses MySQL's default delimiter set:
fields: , lines: \n escaped-by: \
optionally-enclosed-by: '
--optionally-enclosed-by <char> Sets a field enclosing character

Code generation arguments:
--bindir <dir> Output directory for compiled
objects
--class-name <name> Sets the generated class name.
This overrides --package-name.
When combined with --jar-file,
sets the input class.
--input-null-non-string <null-str> Input null non-string
representation
--input-null-string <null-str> Input null string representation
--jar-file <file> Disable code generation; use
specified jar
--map-column-java <arg> Override mapping for specific
columns to java types
--null-non-string <null-str> Null non-string representation
--null-string <null-str> Null string representation
--outdir <dir> Output directory for generated
code
--package-name <name> Put auto-generated classes in
this package

HCatalog arguments:
--hcatalog-database <arg> HCatalog database name
--hcatalog-home <hdir> Override $HCAT_HOME
--hcatalog-partition-keys <partition-key> Sets the partition
keys to use when
importing to hive
--hcatalog-partition-values <partition-value> Sets the partition
values to use when
importing to hive
--hcatalog-table <arg> HCatalog table name
--hive-home <dir> Override $HIVE_HOME
--hive-partition-key <partition-key> Sets the partition key
to use when importing
to hive
--hive-partition-value <partition-value> Sets the partition
value to use when
importing to hive
--map-column-hive <arg> Override mapping for
specific column to
hive types.

Generic Hadoop command-line arguments:
(must preceed any tool-specific arguments)
Generic options supported are
-conf <configuration file> specify an application configuration file
-D <property=value> use value for given property
-fs <local|namenode:port> specify a namenode
-jt <local|resourcemanager:port> specify a ResourceManager
-files <comma separated list of files> specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars> specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives> specify comma separated archives to be unarchived on the compute machines.

The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]

At minimum, you must specify --connect, --export-dir, and --table

# -m specifies the number of map tasks

[hadoop@CloudDeskTop software]$ sqoop-export --export-dir '/user/hive/warehouse/mmzs.db/testsqoop' --fields-terminated-by '\t' --lines-terminated-by '\n' --connect 'jdbc:mysql://192.168.154.134:3306/mmzs' --username 'root' --password '123456' --table 'testsqoop' -m 2
17/12/30 22:02:04 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
17/12/30 22:02:04 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
17/12/30 22:02:04 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
17/12/30 22:02:04 INFO tool.CodeGenTool: Beginning code generation
17/12/30 22:02:05 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `testsqoop` AS t LIMIT 1
17/12/30 22:02:05 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `testsqoop` AS t LIMIT 1
17/12/30 22:02:05 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /software/hadoop-2.7.3
Note: /tmp/sqoop-hadoop/compile/e2b7e669ef4d8d43016e44ce1cddb620/testsqoop.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
17/12/30 22:02:11 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/e2b7e669ef4d8d43016e44ce1cddb620/testsqoop.jar
17/12/30 22:02:11 INFO mapreduce.ExportJobBase: Beginning export of testsqoop
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/software/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/software/hbase-1.2.6/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
17/12/30 22:02:11 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
17/12/30 22:02:13 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
17/12/30 22:02:13 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
17/12/30 22:02:13 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
17/12/30 22:02:22 INFO input.FileInputFormat: Total input paths to process : 1
17/12/30 22:02:22 INFO input.FileInputFormat: Total input paths to process : 1
17/12/30 22:02:23 INFO mapreduce.JobSubmitter: number of splits:2
17/12/30 22:02:23 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
17/12/30 22:02:24 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1514638990227_0001
17/12/30 22:02:25 INFO impl.YarnClientImpl: Submitted application application_1514638990227_0001
17/12/30 22:02:25 INFO mapreduce.Job: The url to track the job: http://master01:8088/proxy/application_1514638990227_0001/
17/12/30 22:02:25 INFO mapreduce.Job: Running job: job_1514638990227_0001
17/12/30 22:03:13 INFO mapreduce.Job: Job job_1514638990227_0001 running in uber mode : false
17/12/30 22:03:13 INFO mapreduce.Job: map 0% reduce 0%
17/12/30 22:03:58 INFO mapreduce.Job: map 100% reduce 0%
17/12/30 22:03:59 INFO mapreduce.Job: Job job_1514638990227_0001 completed successfully
17/12/30 22:03:59 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=277282
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=484
HDFS: Number of bytes written=0
HDFS: Number of read operations=8
HDFS: Number of large read operations=0
HDFS: Number of write operations=0
Job Counters
Launched map tasks=2
Data-local map tasks=2
Total time spent by all maps in occupied slots (ms)=79918
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=79918
Total vcore-milliseconds taken by all map tasks=79918
Total megabyte-milliseconds taken by all map tasks=81836032
Map-Reduce Framework
Map input records=10
Map output records=10
Input split bytes=286
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=386
CPU time spent (ms)=4950
Physical memory (bytes) snapshot=216600576
Virtual memory (bytes) snapshot=1697566720
Total committed heap usage (bytes)=32874496
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=0
17/12/30 22:03:59 INFO mapreduce.ExportJobBase: Transferred 484 bytes in 105.965 seconds (4.5675 bytes/sec)
17/12/30 22:03:59 INFO mapreduce.ExportJobBase: Exported 10 records.

(Execution screenshot)

Summary: the execution log shows that the job ran only map tasks, with no reduce tasks.

4. Query the result again in the MySQL database

mysql> select * from testsqoop;
+------+---------------+------+
| uid | uname | age |
+------+---------------+------+
| 1 | ligang | 2 |
| 2 | chenghua | 3 |
| 3 | liqin | 1 |
| 4 | zhanghua | 4 |
| 5 | wanghua | 1 |
| 6 | liulinjin | 5 |
| 7 | wangxiaochuan | 6 |
| 8 | guchuan | 2 |
| 9 | xiaoyong | 4 |
| 10 | huping | 6 |
+------+---------------+------+
10 rows in set (0.00 sec)

The result confirms that the data was exported to the MySQL database successfully.

Goal 2: Import data from MySQL into the HDFS cluster

1. Delete the data of the testsqoop table in the mmzs Hive database
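
The original shows this deletion as a screenshot. Since the table's data lives under its warehouse directory on HDFS, one way to clear it (a sketch, consistent with the cleanup command used later in this post) is:

[hadoop@master01 software]$ hdfs dfs -rm -r /user/hive/warehouse/mmzs.db/testsqoop/*
# or, from the hive CLI: truncate table mmzs.testsqoop;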

Confirm that the data really was deleted:
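
A sketch of the check (the original uses a screenshot): list the table directory and query the table; both should come back empty:

[hadoop@CloudDeskTop software]$ hdfs dfs -ls /user/hive/warehouse/mmzs.db/testsqoop
[hadoop@CloudDeskTop software]$ hive -e "select * from mmzs.testsqoop;"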

2. Import the data from MySQL into the HDFS cluster

A. Import only the rows selected by a query into the cluster (note: with --query, the WHERE clause must contain the literal $CONDITIONS placeholder, which Sqoop replaces with its split conditions)

[hadoop@CloudDeskTop software]$ sqoop-import --append --connect 'jdbc:mysql://192.168.154.134:3306/mmzs' --username 'root' --password '123456' --query 'select * from mmzs.testsqoop where uid>3 and $CONDITIONS' -m 1 --target-dir '/user/hive/warehouse/mmzs.db/testsqoop' --fields-terminated-by '\t' --lines-terminated-by '\n'
17/12/30 22:40:54 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
17/12/30 22:40:54 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
17/12/30 22:40:55 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
17/12/30 22:40:55 INFO tool.CodeGenTool: Beginning code generation
17/12/30 22:40:55 INFO manager.SqlManager: Executing SQL statement: select * from mmzs.testsqoop where uid>3 and (1 = 0)
17/12/30 22:40:55 INFO manager.SqlManager: Executing SQL statement: select * from mmzs.testsqoop where uid>3 and (1 = 0)
17/12/30 22:40:55 INFO manager.SqlManager: Executing SQL statement: select * from mmzs.testsqoop where uid>3 and (1 = 0)
17/12/30 22:40:55 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /software/hadoop-2.7.3
Note: /tmp/sqoop-hadoop/compile/cd00e059648175875074eed7f4189e0b/QueryResult.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
17/12/30 22:40:58 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/cd00e059648175875074eed7f4189e0b/QueryResult.jar
17/12/30 22:40:58 INFO mapreduce.ImportJobBase: Beginning query import.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/software/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/software/hbase-1.2.6/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
17/12/30 22:40:59 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
17/12/30 22:41:01 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
17/12/30 22:41:08 INFO db.DBInputFormat: Using read commited transaction isolation
17/12/30 22:41:09 INFO mapreduce.JobSubmitter: number of splits:1
17/12/30 22:41:09 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1514638990227_0003
17/12/30 22:41:10 INFO impl.YarnClientImpl: Submitted application application_1514638990227_0003
17/12/30 22:41:10 INFO mapreduce.Job: The url to track the job: http://master01:8088/proxy/application_1514638990227_0003/
17/12/30 22:41:10 INFO mapreduce.Job: Running job: job_1514638990227_0003
17/12/30 22:41:54 INFO mapreduce.Job: Job job_1514638990227_0003 running in uber mode : false
17/12/30 22:41:54 INFO mapreduce.Job: map 0% reduce 0%
17/12/30 22:42:29 INFO mapreduce.Job: map 100% reduce 0%
17/12/30 22:42:31 INFO mapreduce.Job: Job job_1514638990227_0003 completed successfully
17/12/30 22:42:32 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=138692
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=87
HDFS: Number of bytes written=94
HDFS: Number of read operations=4
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Other local map tasks=1
Total time spent by all maps in occupied slots (ms)=32275
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=32275
Total vcore-milliseconds taken by all map tasks=32275
Total megabyte-milliseconds taken by all map tasks=33049600
Map-Reduce Framework
Map input records=7
Map output records=7
Input split bytes=87
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=170
CPU time spent (ms)=2020
Physical memory (bytes) snapshot=109428736
Virtual memory (bytes) snapshot=851021824
Total committed heap usage (bytes)=19091456
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=94
17/12/30 22:42:32 INFO mapreduce.ImportJobBase: Transferred 94 bytes in 91.0632 seconds (1.0322 bytes/sec)
17/12/30 22:42:32 INFO mapreduce.ImportJobBase: Retrieved 7 records.
17/12/30 22:42:32 INFO util.AppendUtils: Appending to directory testsqoop

Check in the cluster whether the data really was imported:
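
The original shows a screenshot here; the check is simply to list and cat the imported part file (part-m-* is Sqoop's default output file naming for map-only jobs):

[hadoop@master01 software]$ hdfs dfs -ls /user/hive/warehouse/mmzs.db/testsqoop
[hadoop@master01 software]$ hdfs dfs -cat /user/hive/warehouse/mmzs.db/testsqoop/part-m-*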

Check in the Hive database whether the data really was imported:
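
From the Hive side (also a screenshot in the original), the imported rows should appear in a plain select, since the files were written directly into the table's warehouse path:

hive> select * from mmzs.testsqoop;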

The result confirms that the data was imported into the HDFS cluster successfully.

Delete the cluster data to make the next import easier:

[hadoop@master01 software]$ hdfs dfs -rm -r /user/hive/warehouse/mmzs.db/testsqoop/part-m-00000

B. Import an entire table (all of its data) into the cluster

[hadoop@CloudDeskTop software]$ sqoop-import --append --connect 'jdbc:mysql://192.168.154.134:3306/mmzs' --username 'root' --password '123456' --table testsqoop -m 1 --target-dir '/user/hive/warehouse/mmzs.db/testsqoop/' --fields-terminated-by '\t' --lines-terminated-by '\n'
17/12/30 22:28:31 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
17/12/30 22:28:31 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
17/12/30 22:28:32 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
17/12/30 22:28:32 INFO tool.CodeGenTool: Beginning code generation
17/12/30 22:28:33 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `testsqoop` AS t LIMIT 1
17/12/30 22:28:33 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `testsqoop` AS t LIMIT 1
17/12/30 22:28:33 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /software/hadoop-2.7.3
Note: /tmp/sqoop-hadoop/compile/d427f3a0d1a3328c5dc9ae1bd6cbd988/testsqoop.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
17/12/30 22:28:36 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/d427f3a0d1a3328c5dc9ae1bd6cbd988/testsqoop.jar
17/12/30 22:28:36 WARN manager.MySQLManager: It looks like you are importing from mysql.
17/12/30 22:28:36 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
17/12/30 22:28:36 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
17/12/30 22:28:36 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
17/12/30 22:28:36 INFO mapreduce.ImportJobBase: Beginning import of testsqoop
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/software/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/software/hbase-1.2.6/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
17/12/30 22:28:36 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
17/12/30 22:28:38 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
17/12/30 22:28:45 INFO db.DBInputFormat: Using read commited transaction isolation
17/12/30 22:28:45 INFO mapreduce.JobSubmitter: number of splits:1
17/12/30 22:28:46 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1514638990227_0002
17/12/30 22:28:46 INFO impl.YarnClientImpl: Submitted application application_1514638990227_0002
17/12/30 22:28:47 INFO mapreduce.Job: The url to track the job: http://master01:8088/proxy/application_1514638990227_0002/
17/12/30 22:28:47 INFO mapreduce.Job: Running job: job_1514638990227_0002
17/12/30 22:29:29 INFO mapreduce.Job: Job job_1514638990227_0002 running in uber mode : false
17/12/30 22:29:29 INFO mapreduce.Job: map 0% reduce 0%
17/12/30 22:30:06 INFO mapreduce.Job: map 100% reduce 0%
17/12/30 22:30:07 INFO mapreduce.Job: Job job_1514638990227_0002 completed successfully
17/12/30 22:30:08 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=138842
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=87
HDFS: Number of bytes written=128
HDFS: Number of read operations=4
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Other local map tasks=1
Total time spent by all maps in occupied slots (ms)=33630
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=33630
Total vcore-milliseconds taken by all map tasks=33630
Total megabyte-milliseconds taken by all map tasks=34437120
Map-Reduce Framework
Map input records=10
Map output records=10
Input split bytes=87
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=177
CPU time spent (ms)=2490
Physical memory (bytes) snapshot=109060096
Virtual memory (bytes) snapshot=850882560
Total committed heap usage (bytes)=18972672
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=128
17/12/30 22:30:08 INFO mapreduce.ImportJobBase: Transferred 128 bytes in 89.4828 seconds (1.4304 bytes/sec)
17/12/30 22:30:08 INFO mapreduce.ImportJobBase: Retrieved 10 records.
17/12/30 22:30:08 INFO util.AppendUtils: Appending to directory testsqoop

(Execution result screenshot)

Check in the cluster whether the data really was imported (same checks as in section A):

Check in the Hive database whether the data really was imported:

The result confirms that the data was imported into the HDFS cluster successfully.