Introduction:

  Sqoop is a tool for importing and exporting data between Hadoop and relational databases. With it you can import data from a database (such as MySQL or Oracle) into HDFS, and export data from HDFS back into a relational database. Sqoop translates its commands into Hadoop MapReduce jobs, which usually involve only map tasks: the generated job runs several MapTasks in parallel to transfer the data, which greatly improves transfer speed and efficiency compared with trying to hand-roll multi-threaded transfers in a shell script. Sqoop2 (sqoop 1.99.7) requires configuring a proxy in the Hadoop installation's configuration files and is a heavyweight, embedded installation, so in this article we use Sqoop1 (Sqoop 1.4.6).

Prerequisites: (if you do not know how to install these, see my earlier posts in the hadoop category)

CloudDeskTop has installed: hadoop-2.7.3  jdk1.7.0_79  mysql-5.5.32  sqoop-1.4.6  hive-1.2.2
master01 and master02 have installed: hadoop-2.7.3  jdk1.7.0_79
slave01, slave02, and slave03 have installed: hadoop-2.7.3  jdk1.7.0_79  zookeeper-3.4.10

I. Installation:

1. Upload the installation package sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz to the /install/ directory

2. Extract it:

[hadoop@CloudDeskTop install]$ tar -zxvf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz -C /software/

3. Configure the environment variables:

[hadoop@CloudDeskTop software]$ su -lc "vi /etc/profile"

JAVA_HOME=/software/jdk1.7.0_79
HADOOP_HOME=/software/hadoop-2.7.3
SQOOP_HOME=/software/sqoop-1.4.6
PATH=$PATH:$JAVA_HOME/bin:$JAVA_HOME/lib:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$SQOOP_HOME/bin
export PATH JAVA_HOME HADOOP_HOME SQOOP_HOME

4. After configuring the environment, run the following so the profile takes effect immediately:

[hadoop@CloudDeskTop software]$ source /etc/profile
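
A quick optional check (not part of the original steps) that the new variables are visible in the current shell:

[hadoop@CloudDeskTop software]$ echo $SQOOP_HOME   # should print /software/sqoop-1.4.6
[hadoop@CloudDeskTop software]$ which sqoop        # should resolve to /software/sqoop-1.4.6/bin/sqoop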

5. Go into the /software/sqoop-1.4.6/lib/ directory and upload mysql-connector-java-5.1.43-bin.jar

The JDBC driver jar here must be this exact version (5.1.43): Sqoop talks to MySQL through it, and using a different driver version easily leads to errors.
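
Assuming the driver jar was uploaded to /install/ just like the Sqoop tarball (this source path is an assumption; adjust it to wherever you placed the jar), copying it into Sqoop's lib directory looks like this:

[hadoop@CloudDeskTop install]$ cp /install/mysql-connector-java-5.1.43-bin.jar /software/sqoop-1.4.6/lib/
[hadoop@CloudDeskTop install]$ ls /software/sqoop-1.4.6/lib/ | grep mysql-connector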

6. Configure sqoop

[hadoop@CloudDeskTop software]$ cd /software/sqoop-1.4.6/bin/

[hadoop@CloudDeskTop bin]$ vi configure-sqoop

Comment out the following block (around lines 127-148 of configure-sqoop) by wrapping it in a here-document: put ":<<COMMENT" before the block as the start marker and "COMMENT" after it as the end marker;

:<<COMMENT
## Moved to be a runtime check in sqoop.
if [ ! -d "${HBASE_HOME}" ]; then
  echo "Warning: $HBASE_HOME does not exist! HBase imports will fail."
  echo 'Please set $HBASE_HOME to the root of your HBase installation.'
fi

## Moved to be a runtime check in sqoop.
if [ ! -d "${HCAT_HOME}" ]; then
  echo "Warning: $HCAT_HOME does not exist! HCatalog jobs will fail."
  echo 'Please set $HCAT_HOME to the root of your HCatalog installation.'
fi

if [ ! -d "${ACCUMULO_HOME}" ]; then
  echo "Warning: $ACCUMULO_HOME does not exist! Accumulo imports will fail."
  echo 'Please set $ACCUMULO_HOME to the root of your Accumulo installation.'
fi
if [ ! -d "${ZOOKEEPER_HOME}" ]; then
  echo "Warning: $ZOOKEEPER_HOME does not exist! Accumulo imports will fail."
  echo 'Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.'
fi
COMMENT
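
After saving, a quick optional sanity check (not in the original write-up) is to run a sqoop command and confirm the HBase/HCatalog/Accumulo/ZooKeeper warnings no longer appear:

[hadoop@CloudDeskTop bin]$ sqoop help 2>&1 | grep -i warning
# no output here means the block was commented out successfully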

II. Startup (unless otherwise stated, all commands run as the hadoop user)

[0. Start MySQL on CloudDeskTop as the root user]

[root@CloudDeskTop ~]# cd /software/mysql-5.5.32/sbin/ && ./mysqld start && lsof -i:3306 && cd -

[1. Start the ZooKeeper cluster on the slave nodes (a leader and followers are elected among them)]

  cd /software/zookeeper-3.4.10/bin/ && ./zkServer.sh start && cd - && jps
  cd /software/zookeeper-3.4.10/bin/ && ./zkServer.sh status && cd -

[2. Start the HDFS cluster on master01] cd /software/ && start-dfs.sh && jps

[3. Start the YARN cluster on master01] cd /software/ && start-yarn.sh && jps

[Starting YARN does not also bring up the ResourceManager on the standby master node, so run the following on master02:]

cd /software/ && yarn-daemon.sh start resourcemanager && jps
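
Optionally, you can confirm which ResourceManager is active. The rm IDs below (rm1, rm2) are placeholders for whatever yarn.resourcemanager.ha.rm-ids defines in your yarn-site.xml:

[hadoop@master01 software]$ yarn rmadmin -getServiceState rm1   # expect: active
[hadoop@master02 software]$ yarn rmadmin -getServiceState rm2   # expect: standby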

[4. Check the processes]
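
The original shows a screenshot of the processes; roughly, with the layout above you would expect jps on each machine to report something like the following (the exact placement of JournalNode/ZKFC processes depends on your HA configuration, so treat this only as a sketch):

[hadoop@master01 ~]$ jps   # NameNode, DFSZKFailoverController, ResourceManager, Jps
[hadoop@slave01 ~]$ jps    # DataNode, NodeManager, JournalNode, QuorumPeerMain, Jps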

[5. Check the sqoop version to verify that sqoop was installed successfully]

 [hadoop@CloudDeskTop software]$ sqoop version

III. Testing

  Note: the direction of an import or export is defined relative to the HDFS cluster: data flowing out of HDFS is an export, and data flowing into HDFS is an import. Since the data of a Hive table is actually stored in the HDFS cluster, importing into or exporting from a Hive table is really just operating on files in HDFS.

First, create the data locally:
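
The original relies on a screenshot for this step; a sketch of creating the tab-separated file testsqoop.out (contents taken from the query results shown further below, fields separated by real tab characters as the Hive table expects) might look like this:

[hadoop@CloudDeskTop test]$ printf '%s\t%s\t%s\n' \
  1 ligang 2  2 chenghua 3  3 liqin 1  4 zhanghua 4  5 wanghua 1 \
  6 liulinjin 5  7 wangxiaochuan 6  8 guchuan 2  9 xiaoyong 4  10 huping 6 \
  > testsqoop.out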

After creating the table in the Hive database, upload the file to the path where the table stores its data in the cluster:

[hadoop@CloudDeskTop test]$ hdfs dfs -put testsqoop.out /user/hive/warehouse/mmzs.db/testsqoop

Goal 1: Export the data in the HDFS cluster to the MySQL database

1. Create a table in the Hive database mmzs and load data into it

[hadoop@CloudDeskTop software]$ cd /software/hive-1.2.2/bin/
[hadoop@CloudDeskTop bin]$ ./hive
hive> show databases;
OK
default
mmzs
mmzsmysql
Time taken: 0.373 seconds, Fetched: 3 row(s)
hive> create table if not exists mmzs.testsqoop(id int,name string,age int) row format delimited fields terminated by '\t';
OK
Time taken: 0.126 seconds
hive> select * from mmzs.testsqoop;
OK
1 ligang 2
2 chenghua 3
3 liqin 1
4 zhanghua 4
5 wanghua 1
6 liulinjin 5
7 wangxiaochuan 6
8 guchuan 2
9 xiaoyong 4
10 huping 6
Time taken: 0.824 seconds, Fetched: 10 row(s)

2. Create a table with the same fields in the MySQL database

[root@CloudDeskTop bin]# cd ~
[root@CloudDeskTop ~]# cd /software/mysql-5.5.32/bin/
[root@CloudDeskTop bin]# ./mysql -uroot -p123456 -P3306 -h192.168.154.134 -e "create database mmzs character set utf8"
[root@CloudDeskTop bin]# ./mysql -uroot -p123456 -h192.168.154.134 -P3306 -Dmmzs
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 12
Server version: 5.5.32 Source distribution

Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show tables;
Empty set (0.00 sec)

mysql> create table if not exists testsqoop(uid int(11),uname varchar(30),age int)engine=innodb charset=utf8;
Query OK, 0 rows affected (0.06 sec)

mysql> desc testsqoop;
+-------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+-------+
| uid | int(11) | YES | | NULL | |
| uname | varchar(30) | YES | | NULL | |
| age | int(11) | YES | | NULL | |
+-------+-------------+------+-----+---------+-------+
3 rows in set (0.00 sec)

mysql> select * from testsqoop;
Empty set (0.01 sec)

3. Use Sqoop to export the Hive table's data to the MySQL database (exporting the whole HDFS file)

[hadoop@CloudDeskTop software]$ sqoop-export --help

17/12/30 21:54:38 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
usage: sqoop export [GENERIC-ARGS] [TOOL-ARGS]

Common arguments:
--connect <jdbc-uri> Specify JDBC connect
string
--connection-manager <class-name> Specify connection manager
class name
--connection-param-file <properties-file> Specify connection
parameters file
--driver <class-name> Manually specify JDBC
driver class to use
--hadoop-home <hdir> Override
$HADOOP_MAPRED_HOME_ARG
--hadoop-mapred-home <dir> Override
$HADOOP_MAPRED_HOME_ARG
--help Print usage instructions
-P Read password from console
--password <password> Set authentication
password
--password-alias <password-alias> Credential provider
password alias
--password-file <password-file> Set authentication
password file path
--relaxed-isolation Use read-uncommitted
isolation for imports
--skip-dist-cache Skip copying jars to
distributed cache
--username <username> Set authentication
username
--verbose Print more information
while working

Export control arguments:
--batch Indicates
underlying
statements
to be
executed in
batch mode
--call <arg> Populate the
table using
this stored
procedure
(one call
per row)
--clear-staging-table Indicates
that any
data in
staging
table can be
deleted
--columns <col,col,col...> Columns to
export to
table
--direct Use direct
export fast
path
--export-dir <dir> HDFS source
path for the
export
-m,--num-mappers <n> Use 'n' map
tasks to
export in
parallel
--mapreduce-job-name <name> Set name for
generated
mapreduce
job
--staging-table <table-name> Intermediate
staging
table
--table <table-name> Table to
populate
--update-key <key> Update
records by
specified
key column
--update-mode <mode> Specifies
how updates
are
performed
when new
rows are
found with
non-matching
keys in
database
--validate Validate the
copy using
the
configured
validator
--validation-failurehandler <validation-failurehandler> Fully
qualified
class name
for
ValidationFa
ilureHandler
--validation-threshold <validation-threshold> Fully
qualified
class name
for
ValidationTh
reshold
--validator <validator> Fully
qualified
class name
for the
Validator

Input parsing arguments:
--input-enclosed-by <char> Sets a required field encloser
--input-escaped-by <char> Sets the input escape
character
--input-fields-terminated-by <char> Sets the input field separator
--input-lines-terminated-by <char> Sets the input end-of-line
char
--input-optionally-enclosed-by <char> Sets a field enclosing
character

Output line formatting arguments:
--enclosed-by <char> Sets a required field enclosing
character
--escaped-by <char> Sets the escape character
--fields-terminated-by <char> Sets the field separator character
--lines-terminated-by <char> Sets the end-of-line character
--mysql-delimiters Uses MySQL's default delimiter set:
fields: , lines: \n escaped-by: \
optionally-enclosed-by: '
--optionally-enclosed-by <char> Sets a field enclosing character

Code generation arguments:
--bindir <dir> Output directory for compiled
objects
--class-name <name> Sets the generated class name.
This overrides --package-name.
When combined with --jar-file,
sets the input class.
--input-null-non-string <null-str> Input null non-string
representation
--input-null-string <null-str> Input null string representation
--jar-file <file> Disable code generation; use
specified jar
--map-column-java <arg> Override mapping for specific
columns to java types
--null-non-string <null-str> Null non-string representation
--null-string <null-str> Null string representation
--outdir <dir> Output directory for generated
code
--package-name <name> Put auto-generated classes in
this package

HCatalog arguments:
--hcatalog-database <arg> HCatalog database name
--hcatalog-home <hdir> Override $HCAT_HOME
--hcatalog-partition-keys <partition-key> Sets the partition
keys to use when
importing to hive
--hcatalog-partition-values <partition-value> Sets the partition
values to use when
importing to hive
--hcatalog-table <arg> HCatalog table name
--hive-home <dir> Override $HIVE_HOME
--hive-partition-key <partition-key> Sets the partition key
to use when importing
to hive
--hive-partition-value <partition-value> Sets the partition
value to use when
importing to hive
--map-column-hive <arg> Override mapping for
specific column to
hive types.

Generic Hadoop command-line arguments:
(must preceed any tool-specific arguments)
Generic options supported are
-conf <configuration file> specify an application configuration file
-D <property=value> use value for given property
-fs <local|namenode:port> specify a namenode
-jt <local|resourcemanager:port> specify a ResourceManager
-files <comma separated list of files> specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars> specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives> specify comma separated archives to be unarchived on the compute machines.

The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]

At minimum, you must specify --connect, --export-dir, and --table

# -m specifies the number of map tasks

[hadoop@CloudDeskTop software]$ sqoop-export --export-dir '/user/hive/warehouse/mmzs.db/testsqoop' --fields-terminated-by '\t' --lines-terminated-by '\n' --connect 'jdbc:mysql://192.168.154.134:3306/mmzs' --username 'root' --password '123456' --table 'testsqoop' -m 2
17/12/30 22:02:04 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
17/12/30 22:02:04 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
17/12/30 22:02:04 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
17/12/30 22:02:04 INFO tool.CodeGenTool: Beginning code generation
17/12/30 22:02:05 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `testsqoop` AS t LIMIT 1
17/12/30 22:02:05 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `testsqoop` AS t LIMIT 1
17/12/30 22:02:05 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /software/hadoop-2.7.3
Note: /tmp/sqoop-hadoop/compile/e2b7e669ef4d8d43016e44ce1cddb620/testsqoop.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
17/12/30 22:02:11 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/e2b7e669ef4d8d43016e44ce1cddb620/testsqoop.jar
17/12/30 22:02:11 INFO mapreduce.ExportJobBase: Beginning export of testsqoop
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/software/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/software/hbase-1.2.6/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
17/12/30 22:02:11 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
17/12/30 22:02:13 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
17/12/30 22:02:13 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
17/12/30 22:02:13 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
17/12/30 22:02:22 INFO input.FileInputFormat: Total input paths to process : 1
17/12/30 22:02:22 INFO input.FileInputFormat: Total input paths to process : 1
17/12/30 22:02:23 INFO mapreduce.JobSubmitter: number of splits:2
17/12/30 22:02:23 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
17/12/30 22:02:24 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1514638990227_0001
17/12/30 22:02:25 INFO impl.YarnClientImpl: Submitted application application_1514638990227_0001
17/12/30 22:02:25 INFO mapreduce.Job: The url to track the job: http://master01:8088/proxy/application_1514638990227_0001/
17/12/30 22:02:25 INFO mapreduce.Job: Running job: job_1514638990227_0001
17/12/30 22:03:13 INFO mapreduce.Job: Job job_1514638990227_0001 running in uber mode : false
17/12/30 22:03:13 INFO mapreduce.Job: map 0% reduce 0%
17/12/30 22:03:58 INFO mapreduce.Job: map 100% reduce 0%
17/12/30 22:03:59 INFO mapreduce.Job: Job job_1514638990227_0001 completed successfully
17/12/30 22:03:59 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=277282
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=484
HDFS: Number of bytes written=0
HDFS: Number of read operations=8
HDFS: Number of large read operations=0
HDFS: Number of write operations=0
Job Counters
Launched map tasks=2
Data-local map tasks=2
Total time spent by all maps in occupied slots (ms)=79918
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=79918
Total vcore-milliseconds taken by all map tasks=79918
Total megabyte-milliseconds taken by all map tasks=81836032
Map-Reduce Framework
Map input records=10
Map output records=10
Input split bytes=286
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=386
CPU time spent (ms)=4950
Physical memory (bytes) snapshot=216600576
Virtual memory (bytes) snapshot=1697566720
Total committed heap usage (bytes)=32874496
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=0
17/12/30 22:03:59 INFO mapreduce.ExportJobBase: Transferred 484 bytes in 105.965 seconds (4.5675 bytes/sec)
17/12/30 22:03:59 INFO mapreduce.ExportJobBase: Exported 10 records.

(Execution screenshot)

Summary: the execution log shows that the job ran only map tasks, with no reduce tasks.

4. Query the result again in the MySQL database

mysql> select * from testsqoop;
+------+---------------+------+
| uid | uname | age |
+------+---------------+------+
| 1 | ligang | 2 |
| 2 | chenghua | 3 |
| 3 | liqin | 1 |
| 4 | zhanghua | 4 |
| 5 | wanghua | 1 |
| 6 | liulinjin | 5 |
| 7 | wangxiaochuan | 6 |
| 8 | guchuan | 2 |
| 9 | xiaoyong | 4 |
| 10 | huping | 6 |
+------+---------------+------+
10 rows in set (0.00 sec)

The result confirms that the data was exported to the MySQL database successfully.

Goal 2: Import data from MySQL into the HDFS cluster

1. Delete the data of the testsqoop table in the mmzs Hive database
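
The original shows this deletion as a screenshot. Since the table's data lives under its warehouse directory on HDFS, one way to clear it (a sketch, consistent with the cleanup command used later in this post) is:

[hadoop@master01 software]$ hdfs dfs -rm -r /user/hive/warehouse/mmzs.db/testsqoop/*
# or, from the hive CLI: truncate table mmzs.testsqoop;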

Confirm that the data really was deleted:
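
A sketch of the check (the original uses a screenshot): list the table directory and query the table; both should come back empty:

[hadoop@CloudDeskTop software]$ hdfs dfs -ls /user/hive/warehouse/mmzs.db/testsqoop
[hadoop@CloudDeskTop software]$ hive -e "select * from mmzs.testsqoop;"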

2. Import the data from MySQL into the HDFS cluster

A. Import only the rows selected by a query into the cluster (note: with --query, the WHERE clause must contain the literal $CONDITIONS placeholder, which Sqoop replaces with its split conditions)

[hadoop@CloudDeskTop software]$ sqoop-import --append --connect 'jdbc:mysql://192.168.154.134:3306/mmzs' --username 'root' --password '123456' --query 'select * from mmzs.testsqoop where uid>3 and $CONDITIONS' -m 1 --target-dir '/user/hive/warehouse/mmzs.db/testsqoop' --fields-terminated-by '\t' --lines-terminated-by '\n'
17/12/30 22:40:54 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
17/12/30 22:40:54 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
17/12/30 22:40:55 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
17/12/30 22:40:55 INFO tool.CodeGenTool: Beginning code generation
17/12/30 22:40:55 INFO manager.SqlManager: Executing SQL statement: select * from mmzs.testsqoop where uid>3 and (1 = 0)
17/12/30 22:40:55 INFO manager.SqlManager: Executing SQL statement: select * from mmzs.testsqoop where uid>3 and (1 = 0)
17/12/30 22:40:55 INFO manager.SqlManager: Executing SQL statement: select * from mmzs.testsqoop where uid>3 and (1 = 0)
17/12/30 22:40:55 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /software/hadoop-2.7.3
Note: /tmp/sqoop-hadoop/compile/cd00e059648175875074eed7f4189e0b/QueryResult.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
17/12/30 22:40:58 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/cd00e059648175875074eed7f4189e0b/QueryResult.jar
17/12/30 22:40:58 INFO mapreduce.ImportJobBase: Beginning query import.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/software/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/software/hbase-1.2.6/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
17/12/30 22:40:59 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
17/12/30 22:41:01 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
17/12/30 22:41:08 INFO db.DBInputFormat: Using read commited transaction isolation
17/12/30 22:41:09 INFO mapreduce.JobSubmitter: number of splits:1
17/12/30 22:41:09 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1514638990227_0003
17/12/30 22:41:10 INFO impl.YarnClientImpl: Submitted application application_1514638990227_0003
17/12/30 22:41:10 INFO mapreduce.Job: The url to track the job: http://master01:8088/proxy/application_1514638990227_0003/
17/12/30 22:41:10 INFO mapreduce.Job: Running job: job_1514638990227_0003
17/12/30 22:41:54 INFO mapreduce.Job: Job job_1514638990227_0003 running in uber mode : false
17/12/30 22:41:54 INFO mapreduce.Job: map 0% reduce 0%
17/12/30 22:42:29 INFO mapreduce.Job: map 100% reduce 0%
17/12/30 22:42:31 INFO mapreduce.Job: Job job_1514638990227_0003 completed successfully
17/12/30 22:42:32 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=138692
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=87
HDFS: Number of bytes written=94
HDFS: Number of read operations=4
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Other local map tasks=1
Total time spent by all maps in occupied slots (ms)=32275
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=32275
Total vcore-milliseconds taken by all map tasks=32275
Total megabyte-milliseconds taken by all map tasks=33049600
Map-Reduce Framework
Map input records=7
Map output records=7
Input split bytes=87
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=170
CPU time spent (ms)=2020
Physical memory (bytes) snapshot=109428736
Virtual memory (bytes) snapshot=851021824
Total committed heap usage (bytes)=19091456
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=94
17/12/30 22:42:32 INFO mapreduce.ImportJobBase: Transferred 94 bytes in 91.0632 seconds (1.0322 bytes/sec)
17/12/30 22:42:32 INFO mapreduce.ImportJobBase: Retrieved 7 records.
17/12/30 22:42:32 INFO util.AppendUtils: Appending to directory testsqoop

Check in the cluster whether the data really was imported:
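
The original shows a screenshot here; the check is simply to list and cat the imported part file (part-m-* is Sqoop's default output file naming for map-only jobs):

[hadoop@master01 software]$ hdfs dfs -ls /user/hive/warehouse/mmzs.db/testsqoop
[hadoop@master01 software]$ hdfs dfs -cat /user/hive/warehouse/mmzs.db/testsqoop/part-m-*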

Check in the Hive database whether the data really was imported:
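
From the Hive side (also a screenshot in the original), the imported rows should appear in a plain select, since the files were written directly into the table's warehouse path:

hive> select * from mmzs.testsqoop;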

The result confirms that the data was imported into the HDFS cluster successfully.

Delete the cluster data to make the next import easier:

[hadoop@master01 software]$ hdfs dfs -rm -r /user/hive/warehouse/mmzs.db/testsqoop/part-m-00000

B. Import an entire table (all of its data) into the cluster

[hadoop@CloudDeskTop software]$ sqoop-import --append --connect 'jdbc:mysql://192.168.154.134:3306/mmzs' --username 'root' --password '123456' --table testsqoop -m 1 --target-dir '/user/hive/warehouse/mmzs.db/testsqoop/' --fields-terminated-by '\t' --lines-terminated-by '\n'
17/12/30 22:28:31 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
17/12/30 22:28:31 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
17/12/30 22:28:32 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
17/12/30 22:28:32 INFO tool.CodeGenTool: Beginning code generation
17/12/30 22:28:33 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `testsqoop` AS t LIMIT 1
17/12/30 22:28:33 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `testsqoop` AS t LIMIT 1
17/12/30 22:28:33 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /software/hadoop-2.7.3
Note: /tmp/sqoop-hadoop/compile/d427f3a0d1a3328c5dc9ae1bd6cbd988/testsqoop.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
17/12/30 22:28:36 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/d427f3a0d1a3328c5dc9ae1bd6cbd988/testsqoop.jar
17/12/30 22:28:36 WARN manager.MySQLManager: It looks like you are importing from mysql.
17/12/30 22:28:36 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
17/12/30 22:28:36 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
17/12/30 22:28:36 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
17/12/30 22:28:36 INFO mapreduce.ImportJobBase: Beginning import of testsqoop
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/software/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/software/hbase-1.2.6/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
17/12/30 22:28:36 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
17/12/30 22:28:38 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
17/12/30 22:28:45 INFO db.DBInputFormat: Using read commited transaction isolation
17/12/30 22:28:45 INFO mapreduce.JobSubmitter: number of splits:1
17/12/30 22:28:46 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1514638990227_0002
17/12/30 22:28:46 INFO impl.YarnClientImpl: Submitted application application_1514638990227_0002
17/12/30 22:28:47 INFO mapreduce.Job: The url to track the job: http://master01:8088/proxy/application_1514638990227_0002/
17/12/30 22:28:47 INFO mapreduce.Job: Running job: job_1514638990227_0002
17/12/30 22:29:29 INFO mapreduce.Job: Job job_1514638990227_0002 running in uber mode : false
17/12/30 22:29:29 INFO mapreduce.Job: map 0% reduce 0%
17/12/30 22:30:06 INFO mapreduce.Job: map 100% reduce 0%
17/12/30 22:30:07 INFO mapreduce.Job: Job job_1514638990227_0002 completed successfully
17/12/30 22:30:08 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=138842
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=87
HDFS: Number of bytes written=128
HDFS: Number of read operations=4
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Other local map tasks=1
Total time spent by all maps in occupied slots (ms)=33630
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=33630
Total vcore-milliseconds taken by all map tasks=33630
Total megabyte-milliseconds taken by all map tasks=34437120
Map-Reduce Framework
Map input records=10
Map output records=10
Input split bytes=87
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=177
CPU time spent (ms)=2490
Physical memory (bytes) snapshot=109060096
Virtual memory (bytes) snapshot=850882560
Total committed heap usage (bytes)=18972672
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=128
17/12/30 22:30:08 INFO mapreduce.ImportJobBase: Transferred 128 bytes in 89.4828 seconds (1.4304 bytes/sec)
17/12/30 22:30:08 INFO mapreduce.ImportJobBase: Retrieved 10 records.
17/12/30 22:30:08 INFO util.AppendUtils: Appending to directory testsqoop

(Execution result screenshot)

Check in the cluster whether the data really was imported (same checks as in section A):

Check in the Hive database whether the data really was imported:

The result confirms that the data was imported into the HDFS cluster successfully.