Big Data Tutorial (13.6): Using Sqoop

阿新 • Published: 2019-03-18

The previous chapter covered installing the Sqoop data migration tool and a simple import example. This post continues with how to use Sqoop.

I. Importing Data with Sqoop

(1) Importing a relational table into Hive

./sqoop import --connect jdbc:mysql://centos-aaron-03:3306/test --username root --password 123456 --table emp --hive-import --m 1

Running the command fails with an error:

[hadoop@centos-aaron-h1 bin]$ ./sqoop import --connect jdbc:mysql://centos-aaron-03:3306/test --username root --password 123456 --table emp --hive-import --m 1
Warning: /home/hadoop/sqoop/bin/../../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /home/hadoop/sqoop/bin/../../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/hadoop/sqoop/bin/../../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /home/hadoop/sqoop/bin/../../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
19/03/18 18:46:49 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
19/03/18 18:46:49 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/03/18 18:46:49 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
19/03/18 18:46:49 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
19/03/18 18:46:49 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
19/03/18 18:46:49 INFO tool.CodeGenTool: Beginning code generation
19/03/18 18:46:49 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp` AS t LIMIT 1
19/03/18 18:46:49 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp` AS t LIMIT 1
19/03/18 18:46:49 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoop/apps/hadoop-2.9.1
Note: /tmp/sqoop-hadoop/compile/b0cd7f379424039f4df44ee2b703c3d0/emp.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
19/03/18 18:46:51 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/b0cd7f379424039f4df44ee2b703c3d0/emp.jar
19/03/18 18:46:51 WARN manager.MySQLManager: It looks like you are importing from mysql.
19/03/18 18:46:51 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
19/03/18 18:46:51 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
19/03/18 18:46:51 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
19/03/18 18:46:51 INFO mapreduce.ImportJobBase: Beginning import of emp
19/03/18 18:46:51 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
19/03/18 18:46:52 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
19/03/18 18:46:52 INFO client.RMProxy: Connecting to ResourceManager at centos-aaron-h1/192.168.29.144:8032
19/03/18 18:46:54 INFO mapreduce.JobSubmitter: number of splits:1
19/03/18 18:46:54 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
19/03/18 18:46:54 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1552898029697_0003
19/03/18 18:46:54 INFO impl.YarnClientImpl: Submitted application application_1552898029697_0003
19/03/18 18:46:54 INFO mapreduce.Job: The url to track the job: http://centos-aaron-h1:8088/proxy/application_1552898029697_0003/
19/03/18 18:46:54 INFO mapreduce.Job: Running job: job_1552898029697_0003
19/03/18 18:47:06 INFO mapreduce.Job: Job job_1552898029697_0003 running in uber mode : false
19/03/18 18:47:06 INFO mapreduce.Job:  map 0% reduce 0%
19/03/18 18:47:13 INFO mapreduce.Job:  map 100% reduce 0%
19/03/18 18:47:13 INFO mapreduce.Job: Job job_1552898029697_0003 completed successfully
19/03/18 18:47:13 INFO mapreduce.Job: Counters: 30
        File System Counters
                FILE: Number of bytes read=0
                FILE: Number of bytes written=206933
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=87
                HDFS: Number of bytes written=151
                HDFS: Number of read operations=4
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters 
                Launched map tasks=1
                Other local map tasks=1
                Total time spent by all maps in occupied slots (ms)=3950
                Total time spent by all reduces in occupied slots (ms)=0
                Total time spent by all map tasks (ms)=3950
                Total vcore-milliseconds taken by all map tasks=3950
                Total megabyte-milliseconds taken by all map tasks=4044800
        Map-Reduce Framework
                Map input records=5
                Map output records=5
                Input split bytes=87
                Spilled Records=0
                Failed Shuffles=0
                Merged Map outputs=0
                GC time elapsed (ms)=65
                CPU time spent (ms)=680
                Physical memory (bytes) snapshot=135651328
                Virtual memory (bytes) snapshot=1715556352
                Total committed heap usage (bytes)=42860544
        File Input Format Counters 
                Bytes Read=0
        File Output Format Counters 
                Bytes Written=151
19/03/18 18:47:13 INFO mapreduce.ImportJobBase: Transferred 151 bytes in 21.0263 seconds (7.1815 bytes/sec)
19/03/18 18:47:13 INFO mapreduce.ImportJobBase: Retrieved 5 records.
19/03/18 18:47:13 INFO mapreduce.ImportJobBase: Publishing Hive/Hcat import job data to Listeners for table emp
19/03/18 18:47:13 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp` AS t LIMIT 1
19/03/18 18:47:13 WARN hive.TableDefWriter: Column salary had to be cast to a less precise type in Hive
19/03/18 18:47:13 INFO hive.HiveImport: Loading uploaded data into Hive
19/03/18 18:47:13 ERROR hive.HiveConfig: Could not load org.apache.hadoop.hive.conf.HiveConf. Make sure HIVE_CONF_DIR is set correctly.
19/03/18 18:47:13 ERROR tool.ImportTool: Import failed: java.io.IOException: java.lang.ClassNotFoundException: org.apache.hadoop.hive.conf.HiveConf
        at org.apache.sqoop.hive.HiveConfig.getHiveConf(HiveConfig.java:50)
        at org.apache.sqoop.hive.HiveImport.getHiveArgs(HiveImport.java:392)
        at org.apache.sqoop.hive.HiveImport.executeExternalHiveScript(HiveImport.java:379)
        at org.apache.sqoop.hive.HiveImport.executeScript(HiveImport.java:337)
        at org.apache.sqoop.hive.HiveImport.importTable(HiveImport.java:241)
        at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:537)
        at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:628)
        at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
        at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
        at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.conf.HiveConf
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:190)
        at org.apache.sqoop.hive.HiveConfig.getHiveConf(HiveConfig.java:44)
        ... 12 more

Solution:

# Check whether the HiveConf.class class exists
[hadoop@centos-aaron-h1 lib]$ cd /home/hadoop/apps/apache-hive-1.2.2-bin/lib
[hadoop@centos-aaron-h1 lib]$ jar tf hive-common-1.2.2.jar |grep HiveConf.class
org/apache/hadoop/hive/conf/HiveConf.class
[hadoop@centos-aaron-h1 lib]$   
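HiveConf.class clearly exists; the environment simply cannot find it.

Modify the environment configuration so that Hive's lib directory is added to HADOOP_CLASSPATH:

# Edit the environment variables and add the following lines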
vi /etc/profile
export HADOOP_CLASSPATH=/home/hadoop/apps/hadoop-2.9.1/lib/*
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/home/hadoop/apps/apache-hive-1.2.2-bin/lib/*
# Apply the environment variables
source /etc/profile
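The first export above replaces whatever HADOOP_CLASSPATH already held. A minimal sketch of a variant that only appends an entry when it is missing, assuming the Hive install path used in this article; the helper name `add_to_hadoop_classpath` is hypothetical and not part of Hadoop or Sqoop:

```shell
# Hypothetical helper: append an entry to HADOOP_CLASSPATH only if it is
# not already present, keeping any value that was set earlier.
add_to_hadoop_classpath() {
  local entry="$1"
  case ":${HADOOP_CLASSPATH:-}:" in
    *":${entry}:"*) ;;   # already present, nothing to do
    *) HADOOP_CLASSPATH="${HADOOP_CLASSPATH:+${HADOOP_CLASSPATH}:}${entry}" ;;
  esac
  export HADOOP_CLASSPATH
}

# Hive lib path taken from this article; adjust it to your installation.
add_to_hadoop_classpath '/home/hadoop/apps/apache-hive-1.2.2-bin/lib/*'
```

Placed in /etc/profile, this could be sourced repeatedly without growing the variable.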

Running the command again fails because the output directory left by the earlier emp import already exists and must be deleted:

[hadoop@centos-aaron-h1 bin]$ ./sqoop import --connect jdbc:mysql://centos-aaron-03:3306/test --username root --password 123456 --table emp --hive-import --m 1
Warning: /home/hadoop/sqoop/bin/../../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /home/hadoop/sqoop/bin/../../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/hadoop/sqoop/bin/../../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /home/hadoop/sqoop/bin/../../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
19/03/18 19:13:03 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
19/03/18 19:13:03 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/03/18 19:13:03 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
19/03/18 19:13:03 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
19/03/18 19:13:03 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
19/03/18 19:13:03 INFO tool.CodeGenTool: Beginning code generation
19/03/18 19:13:04 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp` AS t LIMIT 1
19/03/18 19:13:04 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp` AS t LIMIT 1
19/03/18 19:13:04 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoop/apps/hadoop-2.9.1
Note: /tmp/sqoop-hadoop/compile/d1c8de7d06b0dc6c09379069fe10322a/emp.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
19/03/18 19:13:07 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/d1c8de7d06b0dc6c09379069fe10322a/emp.jar
19/03/18 19:13:07 WARN manager.MySQLManager: It looks like you are importing from mysql.
19/03/18 19:13:07 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
19/03/18 19:13:07 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
19/03/18 19:13:07 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
19/03/18 19:13:07 INFO mapreduce.ImportJobBase: Beginning import of emp
19/03/18 19:13:08 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
19/03/18 19:13:08 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
19/03/18 19:13:08 INFO client.RMProxy: Connecting to ResourceManager at centos-aaron-h1/192.168.29.144:8032
19/03/18 19:13:09 ERROR tool.ImportTool: Import failed: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://centos-aaron-h1:9000/user/hadoop/emp already exists
        at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:146)
        at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:279)
        at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:145)
        at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1570)
        at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1567)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1889)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:1567)
        at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1588)
        at org.apache.sqoop.mapreduce.ImportJobBase.doSubmitJob(ImportJobBase.java:200)
        at org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:173)
        at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:270)
        at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:692)
        at org.apache.sqoop.manager.MySQLManager.importTable(MySQLManager.java:127)
        at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:520)
        at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:628)
        at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
        at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
        at org.apache.sqoop.Sqoop.main(Sqoop.java:252)

Solution:

 hdfs dfs -rm -r /user/hadoop/emp
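A minimal sketch of a guarded version of this cleanup, which only removes the directory when it is actually there. It relies on `hdfs dfs -test -d`, which exits with 0 when the path is a directory; the helper name `remove_hdfs_dir_if_exists` is hypothetical:

```shell
# Hypothetical helper: remove an HDFS directory only when it exists.
remove_hdfs_dir_if_exists() {
  local dir="$1"
  if hdfs dfs -test -d "$dir"; then
    hdfs dfs -rm -r "$dir"
  else
    echo "nothing to remove: $dir"
  fi
}

# Usage on the cluster from this article (not executed here):
# remove_hdfs_dir_if_exists /user/hadoop/emp
```

Sqoop's own --delete-target-dir option, used in example (2) of this article, achieves a similar effect from within the import command.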

Run it again; this time it succeeds:

[hadoop@centos-aaron-h1 bin]$ ./sqoop import --connect jdbc:mysql://centos-aaron-03:3306/test --username root --password 123456 --table emp --hive-import --m 1
Warning: /home/hadoop/sqoop/bin/../../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /home/hadoop/sqoop/bin/../../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/hadoop/sqoop/bin/../../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /home/hadoop/sqoop/bin/../../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
19/03/18 19:15:15 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
19/03/18 19:15:15 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/03/18 19:15:15 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
19/03/18 19:15:15 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
19/03/18 19:15:15 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
19/03/18 19:15:15 INFO tool.CodeGenTool: Beginning code generation
19/03/18 19:15:15 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp` AS t LIMIT 1
19/03/18 19:15:15 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp` AS t LIMIT 1
19/03/18 19:15:15 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoop/apps/hadoop-2.9.1
Note: /tmp/sqoop-hadoop/compile/e3a407469bc365c026d8fabf4e264f38/emp.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
19/03/18 19:15:17 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/e3a407469bc365c026d8fabf4e264f38/emp.jar
19/03/18 19:15:17 WARN manager.MySQLManager: It looks like you are importing from mysql.
19/03/18 19:15:17 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
19/03/18 19:15:17 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
19/03/18 19:15:17 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
19/03/18 19:15:17 INFO mapreduce.ImportJobBase: Beginning import of emp
19/03/18 19:15:18 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
19/03/18 19:15:18 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
19/03/18 19:15:19 INFO client.RMProxy: Connecting to ResourceManager at centos-aaron-h1/192.168.29.144:8032
19/03/18 19:15:20 INFO db.DBInputFormat: Using read commited transaction isolation
19/03/18 19:15:20 INFO mapreduce.JobSubmitter: number of splits:1
19/03/18 19:15:20 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
19/03/18 19:15:21 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1552898029697_0004
19/03/18 19:15:21 INFO impl.YarnClientImpl: Submitted application application_1552898029697_0004
19/03/18 19:15:21 INFO mapreduce.Job: The url to track the job: http://centos-aaron-h1:8088/proxy/application_1552898029697_0004/
19/03/18 19:15:21 INFO mapreduce.Job: Running job: job_1552898029697_0004
19/03/18 19:15:28 INFO mapreduce.Job: Job job_1552898029697_0004 running in uber mode : false
19/03/18 19:15:28 INFO mapreduce.Job:  map 0% reduce 0%
19/03/18 19:15:34 INFO mapreduce.Job:  map 100% reduce 0%
19/03/18 19:15:34 INFO mapreduce.Job: Job job_1552898029697_0004 completed successfully
19/03/18 19:15:34 INFO mapreduce.Job: Counters: 30
        File System Counters
                FILE: Number of bytes read=0
                FILE: Number of bytes written=206933
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=87
                HDFS: Number of bytes written=151
                HDFS: Number of read operations=4
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters 
                Launched map tasks=1
                Other local map tasks=1
                Total time spent by all maps in occupied slots (ms)=3734
                Total time spent by all reduces in occupied slots (ms)=0
                Total time spent by all map tasks (ms)=3734
                Total vcore-milliseconds taken by all map tasks=3734
                Total megabyte-milliseconds taken by all map tasks=3823616
        Map-Reduce Framework
                Map input records=5
                Map output records=5
                Input split bytes=87
                Spilled Records=0
                Failed Shuffles=0
                Merged Map outputs=0
                GC time elapsed (ms)=59
                CPU time spent (ms)=540
                Physical memory (bytes) snapshot=129863680
                Virtual memory (bytes) snapshot=1715556352
                Total committed heap usage (bytes)=42860544
        File Input Format Counters 
                Bytes Read=0
        File Output Format Counters 
                Bytes Written=151
19/03/18 19:15:34 INFO mapreduce.ImportJobBase: Transferred 151 bytes in 15.9212 seconds (9.4842 bytes/sec)
19/03/18 19:15:34 INFO mapreduce.ImportJobBase: Retrieved 5 records.
19/03/18 19:15:34 INFO mapreduce.ImportJobBase: Publishing Hive/Hcat import job data to Listeners for table emp
19/03/18 19:15:34 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp` AS t LIMIT 1
19/03/18 19:15:34 WARN hive.TableDefWriter: Column salary had to be cast to a less precise type in Hive
19/03/18 19:15:34 INFO hive.HiveImport: Loading uploaded data into Hive

Logging initialized using configuration in jar:file:/home/hadoop/apps/apache-hive-1.2.2-bin/lib/hive-common-1.2.2.jar!/hive-log4j.properties
OK
Time taken: 2.138 seconds
Loading data to table default.emp
Table default.emp stats: [numFiles=1, totalSize=151]
OK
Time taken: 0.547 seconds

View the result:

[hadoop@centos-aaron-h1 bin]$ hadoop fs -cat /user/hive/warehouse/emp/part-m-00000
1gopalmanager50000.00TP
2manishaProof reader50000.00TP
3khalilphp dev30000.00AC
4prasanthphp dev30000.00AC
5kranthiadmin20000.00TP
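Every run above logs a warning that setting the password on the command line is insecure. Besides the -P prompt the warning suggests, Sqoop 1.4.x also accepts a --password-file option. A minimal sketch of preparing such a file, assuming it lives on the local filesystem; the helper name `write_password_file` and the file name in the usage comment are hypothetical:

```shell
# Hypothetical helper: write a password to a file without a trailing
# newline and make it readable by the owner only.
write_password_file() {
  local file="$1" password="$2"
  rm -f "$file"
  # printf avoids the newline that echo would append; a trailing newline
  # would otherwise be read as part of the password.
  ( umask 077; printf '%s' "$password" > "$file" )
  chmod 400 "$file"
}

# Usage with the credentials from this article (not executed here):
# write_password_file "$HOME/.sqoop-mysql.pwd" '123456'
# ./sqoop import --connect jdbc:mysql://centos-aaron-03:3306/test \
#   --username root --password-file "file://$HOME/.sqoop-mysql.pwd" \
#   --table emp --hive-import --m 1
```

The file:// prefix is an assumption meant to point Sqoop at the local filesystem rather than HDFS; check how your cluster's default filesystem resolves the path.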

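(2) Specifying the row and column delimiters, hive-import, overwrite on import, automatic creation of the Hive table, the table name, and deletion of the intermediate target directory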

./sqoop import \
--connect jdbc:mysql://centos-aaron-03:3306/test \
--username root \
--password 123456 \
--table emp \
--fields-terminated-by "\t" \
--lines-terminated-by "\n" \
--hive-import \
--hive-overwrite \
--create-hive-table \
--delete-target-dir \
--hive-database  mydb_test \
--hive-table emp

At the very end of the run it fails, reporting that the Hive database cannot be found.

Create the mydb_test database manually:

hive>  create database mydb_test;
OK
Time taken: 0.678 seconds
hive>
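The same step can be scripted so that it is safe to repeat. A minimal sketch using `hive -e` with `CREATE DATABASE IF NOT EXISTS`; the helper name `ensure_hive_database` is hypothetical:

```shell
# Hypothetical helper: create a Hive database only if it does not exist yet.
ensure_hive_database() {
  local db="$1"
  hive -e "CREATE DATABASE IF NOT EXISTS ${db};"
}

# Usage for the database from this article (not executed here):
# ensure_hive_database mydb_test
```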

Running the import again still reports that the Hive database cannot be found, even though a query shows that the database exists.

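Solution: copy hive-site.xml from hive/conf into the conf directory of the sqoop installation. The database does exist in Hive; the error arises because the configuration file on the sqoop side is outdated, which typically happens when sqoop is run from a different machine. On CDH the default paths are: copy /etc/hive/conf/hive-site.xml into /etc/sqoop/conf/.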
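A minimal sketch of that copy step with a check that the source file is present. The helper name `copy_hive_site` is hypothetical, and the directories in the usage comment are assumed from the install locations used elsewhere in this article:

```shell
# Hypothetical helper: copy hive-site.xml from Hive's conf directory
# into Sqoop's conf directory.
copy_hive_site() {
  local hive_conf="$1" sqoop_conf="$2"
  if [ ! -f "$hive_conf/hive-site.xml" ]; then
    echo "hive-site.xml not found in $hive_conf" >&2
    return 1
  fi
  cp "$hive_conf/hive-site.xml" "$sqoop_conf/"
}

# Assumed paths for this article's setup (not executed here):
# copy_hive_site /home/hadoop/apps/apache-hive-1.2.2-bin/conf /home/hadoop/sqoop/conf
```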

Run it again; this time it succeeds:

[hadoop@centos-aaron-h1 bin]$ cd ~/sqoop/bin
[hadoop@centos-aaron-h1 bin]$ ./sqoop import \
> --connect jdbc:mysql://centos-aaron-03:3306/test \
> --username root \
> --password 123456 \
> --table emp \
> --fields-terminated-by "\t" \
> --lines-terminated-by "\n" \
> --hive-import \
> --hive-overwrite \
> --create-hive-table \
> --delete-target-dir \
> --hive-database  mydb_test \
> --hive-table emp
Warning: /home/hadoop/sqoop/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /home/hadoop/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /home/hadoop/sqoop/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
19/03/18 20:49:59 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
19/03/18 20:49:59 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/03/18 20:49:59 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
19/03/18 20:49:59 INFO tool.CodeGenTool: Beginning code generation
19/03/18 20:50:00 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp` AS t LIMIT 1
19/03/18 20:50:00 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp` AS t LIMIT 1
19/03/18 20:50:00 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoop/apps/hadoop-2.9.1
Note: /tmp/sqoop-hadoop/compile/7a157b339316952d30024e165d5db00d/emp.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
19/03/18 20:50:01 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/7a157b339316952d30024e165d5db00d/emp.jar
19/03/18 20:50:03 INFO tool.ImportTool: Destination directory emp deleted.
19/03/18 20:50:03 WARN manager.MySQLManager: It looks like you are importing from mysql.
19/03/18 20:50:03 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
19/03/18 20:50:03 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
19/03/18 20:50:03 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
19/03/18 20:50:03 INFO mapreduce.ImportJobBase: Beginning import of emp
19/03/18 20:50:03 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
19/03/18 20:50:03 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
19/03/18 20:50:03 INFO client.RMProxy: Connecting to ResourceManager at centos-aaron-h1/192.168.29.144:8032
19/03/18 20:50:04 INFO mapreduce.JobSubmitter: number of splits:5
19/03/18 20:50:04 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
19/03/18 20:50:05 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1552898029697_0016
19/03/18 20:50:05 INFO impl.YarnClientImpl: Submitted application application_1552898029697_0016
19/03/18 20:50:05 INFO mapreduce.Job: The url to track the job: http://centos-aaron-h1:8088/proxy/application_1552898029697_0016/
19/03/18 20:50:05 INFO mapreduce.Job: Running job: job_1552898029697_0016
19/03/18 20:50:12 INFO mapreduce.Job: Job job_1552898029697_0016 running in uber mode : false
19/03/18 20:50:12 INFO mapreduce.Job:  map 0% reduce 0%
19/03/18 20:50:18 INFO mapreduce.Job:  map 20% reduce 0%
19/03/18 20:50:21 INFO mapreduce.Job:  map 40% reduce 0%
19/03/18 20:50:22 INFO mapreduce.Job:  map 100% reduce 0%
19/03/18 20:50:23 INFO mapreduce.Job: Job job_1552898029697_0016 completed successfully
19/03/18 20:50:23 INFO mapreduce.Job: Counters: 31
        File System Counters
                FILE: Number of bytes read=0
                FILE: Number of bytes written=1034665
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=491
                HDFS: Number of bytes written=151
                HDFS: Number of read operations=20
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=10
        Job Counters 
                Killed map tasks=1
                Launched map tasks=5
                Other local map tasks=5
                Total time spent by all maps in occupied slots (ms)=32416
                Total time spent by all reduces in occupied slots (ms)=0
                Total time spent by all map tasks (ms)=32416
                Total vcore-milliseconds taken by all map tasks=32416
                Total megabyte-milliseconds taken by all map tasks=33193984
        Map-Reduce Framework
                Map input records=5
                Map output records=5
                Input split bytes=491
                Spilled Records=0
                Failed Shuffles=0
                Merged Map outputs=0
                GC time elapsed (ms)=1240
                CPU time spent (ms)=3190
                Physical memory (bytes) snapshot=660529152
                Virtual memory (bytes) snapshot=8577761280
                Total committed heap usage (bytes)=214302720
        File Input Format Counters 
                Bytes Read=0
        File Output Format Counters 
                Bytes Written=151
19/03/18 20:50:23 INFO mapreduce.ImportJobBase: Transferred 151 bytes in 20.6001 seconds (7.3301 bytes/sec)
19/03/18 20:50:23 INFO mapreduce.ImportJobBase: Retrieved 5 records.
19/03/18 20:50:23 INFO mapreduce.ImportJobBase: Publishing Hive/Hcat import job data to Listeners for table emp
19/03/18 20:50:23 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp` AS t LIMIT 1
19/03/18 20:50:23 WARN hive.TableDefWriter: Column salary had to be cast to a less precise type in Hive
19/03/18 20:50:23 INFO hive.HiveImport: Loading uploaded data into Hive

Logging initialized using configuration in jar:file:/home/hadoop/apps/apache-hive-1.2.2-bin/lib/hive-common-1.2.2.jar!/hive-log4j.properties
OK
Time taken: 1.131 seconds
Loading data to table mydb_test.emp
Table mydb_test.emp stats: [numFiles=5, numRows=0, totalSize=151, rawDataSize=0]
OK
Time taken: 0.575 seconds
[hadoop@centos-aaron-h1 bin]$

View the resulting data:

[hadoop@centos-aaron-h1 bin]$ hive

Logging initialized using configuration in jar:file:/home/hadoop/apps/apache-hive-1.2.2-bin/lib/hive-common-1.2.2.jar!/hive-log4j.properties
hive> show databases;
OK
default
mydb_test
wcc_log
Time taken: 0.664 seconds, Fetched: 3 row(s)
hive> use mydb_test;
OK
Time taken: 0.027 seconds
hive> show tables;
OK
emp
Time taken: 0.038 seconds, Fetched: 1 row(s)
hive> select * from emp;
OK
1       gopal   manager 50000.0 TP
2       manisha Proof reader    50000.0 TP
3       khalil  php dev 30000.0 AC
4       prasanth        php dev 30000.0 AC
5       kranthi admin   20000.0 TP
Time taken: 0.634 seconds, Fetched: 5 row(s)
hive>

The statement above is equivalent to:

sqoop import  \
--connect jdbc:mysql://centos-aaron-03:3306/test  \
--username root  \
--password 123456  \
--table emp  \
--fields-terminated-by "\t"  \
--lines-terminated-by "\n"  \
--hive-import  \
--hive-overwrite  \
--create-hive-table  \
--hive-table  mydb_test.emp  \
--delete-target-dir

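(3) Importing into a specified HDFS directory

When importing table data into HDFS with the Sqoop import tool, we can specify the target directory. The option for specifying the target directory in a Sqoop import command has the following syntax: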

--target-dir <new or existing directory in HDFS>

The following command imports the emp table data into the '/queryresult' directory.

./sqoop import \
--connect jdbc:mysql://centos-aaron-03:3306/test \
--username root \
--password 123456 \
--target-dir /queryresult \
--table emp --m 1

Result of the run:

[hadoop@centos-aaron-h1 bin]$ ./sqoop import \
> --connect jdbc:mysql://centos-aaron-03:3306/test \
> --username root \
> --password 123456 \
> --target-dir /queryresult \
> --table emp --m 1
Warning: /home/hadoop/sqoop/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /home/hadoop/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /home/hadoop/sqoop/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
19/03/18 21:00:59 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
19/03/18 21:00:59 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/03/18 21:00:59 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
19/03/18 21:00:59 INFO tool.CodeGenTool: Beginning code generation
19/03/18 21:00:59 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp` AS t LIMIT 1
19/03/18 21:00:59 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp` AS t LIMIT 1
19/03/18 21:00:59 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoop/apps/hadoop-2.9.1
Note: /tmp/sqoop-hadoop/compile/433dbe7d1d24f817e00a85bf0d78eb42/emp.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
19/03/18 21:01:01 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/433dbe7d1d24f817e00a85bf0d78eb42/emp.jar
19/03/18 21:01:01 WARN manager.MySQLManager: It looks like you are importing from mysql.
19/03/18 21:01:01 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
19/03/18 21:01:01 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
19/03/18 21:01:01 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
19/03/18 21:01:01 INFO mapreduce.ImportJobBase: Beginning import of emp
19/03/18 21:01:01 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
19/03/18 21:01:02 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
19/03/18 21:01:02 INFO client.RMProxy: Connecting to ResourceManager at centos-aaron-h1/192.168.29.144:8032
19/03/18 21:01:04 INFO mapreduce.JobSubmitter: number of splits:1
19/03/18 21:01:04 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
19/03/18 21:01:04 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1552898029697_0017
19/03/18 21:01:04 INFO impl.YarnClientImpl: Submitted application application_1552898029697_0017
19/03/18 21:01:04 INFO mapreduce.Job: The url to track the job: http://centos-aaron-h1:8088/proxy/application_1552898029697_0017/
19/03/18 21:01:04 INFO mapreduce.Job: Running job: job_1552898029697_0017
19/03/18 21:01:11 INFO mapreduce.Job: Job job_1552898029697_0017 running in uber mode : false
19/03/18 21:01:11 INFO mapreduce.Job:  map 0% reduce 0%
19/03/18 21:01:17 INFO mapreduce.Job:  map 100% reduce 0%
19/03/18 21:01:17 INFO mapreduce.Job: Job job_1552898029697_0017 completed successfully
19/03/18 21:01:17 INFO mapreduce.Job: Counters: 30
        File System Counters
                FILE: Number of bytes read=0
                FILE: Number of bytes written=206929
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=87
                HDFS: Number of bytes written=151
                HDFS: Number of read operations=4
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters 
                Launched map tasks=1
                Other local map tasks=1
                Total time spent by all maps in occupied slots (ms)=3157
                Total time spent by all reduces in occupied slots (ms)=0
                Total time spent by all map tasks (ms)=3157
                Total vcore-milliseconds taken by all map tasks=3157
                Total megabyte-milliseconds taken by all map tasks=3232768
        Map-Reduce Framework
                Map input records=5
                Map output records=5
                Input split bytes=87
                Spilled Records=0
                Failed Shuffles=0
                Merged Map outputs=0
                GC time elapsed (ms)=60
                CPU time spent (ms)=530
                Physical memory (bytes) snapshot=133115904
                Virtual memory (bytes) snapshot=1715552256
                Total committed heap usage (bytes)=42860544
        File Input Format Counters 
                Bytes Read=0
        File Output Format Counters 
                Bytes Written=151
19/03/18 21:01:17 INFO mapreduce.ImportJobBase: Transferred 151 bytes in 14.555 seconds (10.3744 bytes/sec)
19/03/18 21:01:17 INFO mapreduce.ImportJobBase: Retrieved 5 records.

Check the imported data:

[hadoop@centos-aaron-h1 bin]$ hdfs dfs -ls /queryresult
Found 2 items
-rw-r--r--   2 hadoop supergroup          0 2019-03-18 21:01 /queryresult/_SUCCESS
-rw-r--r--   2 hadoop supergroup        151 2019-03-18 21:01 /queryresult/part-m-00000
[hadoop@centos-aaron-h1 bin]$ hdfs dfs -cat /queryresult/part-m-00000 
1,gopal,manager,50000.00,TP
2,manisha,Proof reader,50000.00,TP
3,khalil,php dev,30000.00,AC
4,prasanth,php dev,30000.00,AC
5,kranthi,admin,20000.00,TP
[hadoop@centos-aaron-h1 bin]$

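(4) Importing a subset of a table
With the "where" clause of the Sqoop import tool we can import a subset of a table. Sqoop runs the corresponding SQL query on the database server and stores the result in the target directory on HDFS.
The syntax of the where clause is:

--where <condition>

The following command imports a subset of the emp table: the row whose employee id is 3.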

./sqoop import \
--connect jdbc:mysql://centos-aaron-03:3306/test \
--username root \
--password 123456 \
--where "id =3 " \
--target-dir /wherequery \
--table emp --m 1

Execution result: (screenshot omitted)
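To make the effect of --where concrete without a cluster, the sketch below filters the five emp rows shown earlier with awk. It is only a local stand-in: Sqoop sends the condition to MySQL as part of its SELECT rather than filtering text. The emp_rows helper is a hypothetical name introduced here, and the sketch assumes emp still holds exactly those five rows.

```shell
#!/bin/sh
# Hypothetical helper: prints the five emp rows as they appear
# in /queryresult/part-m-00000.
emp_rows() {
cat <<'EOF'
1,gopal,manager,50000.00,TP
2,manisha,Proof reader,50000.00,TP
3,khalil,php dev,30000.00,AC
4,prasanth,php dev,30000.00,AC
5,kranthi,admin,20000.00,TP
EOF
}

# Local stand-in for --where "id = 3": keep rows whose first field equals 3.
emp_rows | awk -F, '$1 == 3'
# prints: 3,khalil,php dev,30000.00,AC
```

If the import behaves the same way, hdfs dfs -cat /wherequery/part-m-00000 should show that single line.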

(5) Importing on demand (free-form query)

./sqoop import \
--connect jdbc:mysql://centos-aaron-03:3306/test \
--username root \
--password 123456 \
--target-dir /wherequery2 \
--query 'select id,name,deg from emp WHERE  id>2 and $CONDITIONS' \
--split-by id \
--fields-terminated-by '\t' \
--m 1

Execution result: (screenshot omitted)
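One detail of the --query command above is easy to get wrong: the text $CONDITIONS has to reach Sqoop literally, because Sqoop substitutes its own split predicate for it. The sketch below only demonstrates the shell-quoting side of that. The variable names are illustrative, and CONDITIONS is set to an empty string to mimic a shell in which it has no value.

```shell
#!/bin/sh
# Mimic a shell where CONDITIONS holds nothing useful.
CONDITIONS=''

# Single quotes: the shell leaves $CONDITIONS untouched (the form used above).
single='select id,name,deg from emp WHERE id>2 and $CONDITIONS'

# Double quotes: the dollar sign must be escaped to survive.
escaped="select id,name,deg from emp WHERE id>2 and \$CONDITIONS"

# Double quotes without the escape: the shell expands the variable
# and the placeholder is lost before Sqoop ever sees it.
broken="select id,name,deg from emp WHERE id>2 and $CONDITIONS"

printf 'single : %s\n' "$single"
printf 'escaped: %s\n' "$escaped"
printf 'broken : %s\n' "$broken"
```

The first two lines print the same query text; the third ends right after "and", which is the form to avoid.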

(6) Incremental import

Incremental import is a technique that imports only the rows newly added to a table. It requires the 'incremental', 'check-column', and 'last-value' options.
The incremental options of the Sqoop import command have the following syntax:

--incremental <mode>
--check-column <column name>
--last-value <last check column value>

Suppose the following row has been newly added to the emp table:

6, satish p, grp des, 20000, GR

The following command performs an incremental import on the emp table:

./sqoop import \
--connect jdbc:mysql://centos-aaron-03:3306/test \
--username root \
--password 123456 \
--table emp --m 1 \
--target-dir /wherequery \
--incremental append \
--check-column id \
--last-value 5

Execution result: (screenshot omitted)
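The selection rule of append mode can be sketched locally: only rows whose check column is greater than --last-value are picked up. The awk filter below is a stand-in for what Sqoop asks the database to do, not Sqoop itself. emp_rows, LAST_VALUE and NEXT_LAST_VALUE are names made up for this illustration, and the sixth row is copied as written in the text above.

```shell
#!/bin/sh
# Hypothetical helper: the five original rows plus the newly added row.
emp_rows() {
cat <<'EOF'
1,gopal,manager,50000.00,TP
2,manisha,Proof reader,50000.00,TP
3,khalil,php dev,30000.00,AC
4,prasanth,php dev,30000.00,AC
5,kranthi,admin,20000.00,TP
6, satish p, grp des, 20000, GR
EOF
}

LAST_VALUE=5   # same value as --last-value in the command above

# Append mode: keep only rows whose check column (id, field 1)
# is greater than LAST_VALUE.
emp_rows | awk -F, -v last="$LAST_VALUE" '$1 + 0 > last + 0'
# prints: 6, satish p, grp des, 20000, GR

# The highest id seen becomes the --last-value for the next run.
NEXT_LAST_VALUE=$(emp_rows | awk -F, '$1 + 0 > max { max = $1 + 0 } END { print max }')
echo "next --last-value: $NEXT_LAST_VALUE"
# prints: next --last-value: 6
```

At the end of a real incremental import, Sqoop prints the value to pass as --last-value next time, so it does not have to be computed by hand.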

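2. Sqoop data export

Export moves data from HDFS into an RDBMS. The target table must already exist in the target database before the export runs. By default, the records in the files are inserted into the table with INSERT statements; in update mode, Sqoop generates UPDATE statements that update existing rows instead.

Syntax:

The syntax of the export command is:

sqoop export (generic-args) (export-args)

Example:

The data is in the file /queryresult/part-m-00000 on HDFS. Its content, as printed by hdfs dfs -cat /queryresult/part-m-00000, is: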

1,gopal,manager,50000.00,TP
2,manisha,Proof reader,50000.00,TP
3,khalil,php dev,30000.00,AC
4,prasanth,php dev,30000.00,AC
5,kranthi,admin,20000.00,TP
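As a rough picture of the two export modes described above, the sketch below turns one of these records into the kind of statement each mode corresponds to. It is an illustration written with awk, not the SQL Sqoop actually emits (Sqoop works with parameterized, batched statements). It assumes the target table is named employee and that id would be the column passed with --update-key; to_sql is a hypothetical helper.

```shell
#!/bin/sh
row='3,khalil,php dev,30000.00,AC'

# Hypothetical helper: prints an INSERT (default mode) and an UPDATE
# (update mode keyed on id) for each comma-separated record on stdin.
to_sql() {
  awk -F, -v q="'" '{
    printf "INSERT INTO employee (id, name, deg, salary, dept) VALUES (%s, %s, %s, %s, %s);\n", $1, q $2 q, q $3 q, $4, q $5 q
    printf "UPDATE employee SET name = %s, deg = %s, salary = %s, dept = %s WHERE id = %s;\n", q $2 q, q $3 q, $4, q $5 q, $1
  }'
}

echo "$row" | to_sql
```

The UPDATE form only changes rows that already exist, which is why a plain export into an empty table relies on the default INSERT mode.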

(1) First, create the target table in MySQL by hand

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| azkaban            |
| hive               |
| hivedb             |
| mysql              |
| performance_schema |
| test               |
| urldb              |
| web_log_wash       |
+--------------------+
9 rows in set (0.00 sec)

mysql> use test;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> CREATE TABLE employee ( 
    ->    id INT NOT NULL PRIMARY KEY, 
    ->    name VARCHAR(20), 
    ->    deg VARCHAR(20),
    ->    salary INT,
    ->    dept VARCHAR(10));
Query OK, 0 rows affected (0.02 sec)
Aborted

(2) Then run the export command

./sqoop export \
--connect "jdbc:mysql://centos-aaron-03:3306/test?useUnicode=true&characterEncoding=utf-8" \
--username root \
--password 123456 \
--table employee \
--fields-terminated-by ","  \
--export-dir /queryresult/part-m-00000 \
--columns="id,name,deg,salary,dept"

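Error: (screenshot omitted)

The specific problem is that the data contains Chinese characters, which the table's character set does not support.
The fix is as follows:
Export the table's data, drop the table, and recreate it with DEFAULT CHARSET=utf8.

Another error follows. Analysis confirms that the data on HDFS does not match the INT column defined in the table (the salary values look like 50000.00), so the column type has to be changed from int to decimal.

Run the export again; this time it succeeds.

Verify the result: (screenshot omitted)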
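The INT mismatch described above can be spotted before running the export. The sketch below scans the five records for salary values that are not plain integers. It is a local check written for this walkthrough; emp_rows and check_salary_int are hypothetical helpers, and the idea that a value such as 50000.00 is rejected for an INT column is taken from the error the author reports, not verified against Sqoop's generated code.

```shell
#!/bin/sh
# Hypothetical helper: the records being exported.
emp_rows() {
cat <<'EOF'
1,gopal,manager,50000.00,TP
2,manisha,Proof reader,50000.00,TP
3,khalil,php dev,30000.00,AC
4,prasanth,php dev,30000.00,AC
5,kranthi,admin,20000.00,TP
EOF
}

# Report every record whose 4th field (salary) is not a plain integer.
check_salary_int() {
  awk -F, '$4 !~ /^-?[0-9]+$/ {
    printf "record %d: salary %s does not fit an INT column\n", NR, $4
  }'
}

emp_rows | check_salary_int
```

All five records are reported, which is consistent with switching the salary column to a decimal type.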

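3. Sqoop jobs

Note: a Sqoop job is a predefined import or export task that is saved and then run according to a specified flow.

Syntax:

The syntax for creating a Sqoop job is: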

$ sqoop job (generic-args) (job-args)
   [-- [subtool-name] (subtool-args)]

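Create a job (--create)

Here we create a job named myimportjob that imports data from an RDBMS table into HDFS.

# This command creates a job that imports the emp table of the test database into HDFS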
./sqoop job --create myimportjob -- import --connect jdbc:mysql://centos-aaron-03:3306/test --username root --password 123456 --table emp --m 1

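Verify jobs (--list)

The '--list' argument verifies the saved jobs. The following command lists the saved Sqoop jobs.

# Shows the list of saved jobs.
sqoop job --list

Inspect a job (--show)
The '--show' argument inspects a particular job and its details. The following command and sample output inspect the job named myimportjob.

# Shows the tool and the options used by the myimportjob job.
sqoop job --show myimportjob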

[hadoop@centos-aaron-h1 bin]$ sqoop job --show myimportjob
Warning: /home/hadoop/sqoop/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /home/hadoop/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /home/hadoop/sqoop/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
19/03/18 22:46:25 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
Enter password: 
Job: myimportjob
Tool: import
Options:
----------------------------
verbose = false
hcatalog.drop.and.create.table = false
db.connect.string = jdbc:mysql://centos-aaron-03:3306/test
codegen.output.delimiters.escape = 0
codegen.output.delimiters.enclose.required = false
codegen.input.delimiters.field = 0
split.limit = null
hbase.create.table = false
mainframe.input.dataset.type = p
db.require.password = true
skip.dist.cache = false
hdfs.append.dir = false
db.table = emp
codegen.input.delimiters.escape = 0
accumulo.create.table = false
import.fetch.size = null
codegen.input.delimiters.enclose.required = false
db.username = root
reset.onemapper = false
codegen.output.delimiters.record = 10
import.max.inline.lob.size = 16777216
sqoop.throwOnError = false
hbase.bulk.load.enabled = false
hcatalog.create.table = false
db.clear.staging.table = false
codegen.input.delimiters.record = 0
enable.compression = false
hive.overwrite.table = false
hive.import = false
codegen.input.delimiters.enclose = 0
accumulo.batch.size = 10240000
hive.drop.delims = false
customtool.options.jsonmap = {}
codegen.output.delimiters.enclose = 0
hdfs.delete-target.dir = false
codegen.output.dir = .
codegen.auto.compile.dir = true
relaxed.isolation = false
mapreduce.num.mappers = 1
accumulo.max.latency = 5000
import.direct.split.size = 0
sqlconnection.metadata.transaction.isolation.level = 2
codegen.output.delimiters.field = 44
export.new.update = UpdateOnly
incremental.mode = None
hdfs.file.format = TextFile
sqoop.oracle.escaping.disabled = true
codegen.compile.dir = /tmp/sqoop-hadoop/compile/e0ba9288d4916ac38fdbbe98737f9829
direct.import = false
temporary.dirRoot = _sqoop
hive.fail.table.exists = false
db.batch = false
[hadoop@centos-aaron-h1 bin]$

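Execute a job (--exec)

The '--exec' option runs a saved job. The following command runs the saved job named myimportjob:

sqoop job --exec myimportjob
# Normally it prints output like the following.
10/08/19 13:08:45 INFO tool.CodeGenTool: Beginning code generation

Error: (screenshot omitted)

Analysis shows the error is caused by MySQL access privileges, so the database privileges need to be changed:

# 123456 is the database connection password
grant all privileges on *.* to root@'%' identified by '123456' ;
FLUSH PRIVILEGES;

Run the sqoop job again; this time it succeeds: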

[hadoop@centos-aaron-h1 bin]$ sqoop job --exec myimportjob
Warning: /home/hadoop/sqoop/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /home/hadoop/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /home/hadoop/sqoop/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
19/03/18 23:02:08 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
Enter password: 
19/03/18 23:02:11 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
19/03/18 23:02:11 INFO tool.CodeGenTool: Beginning code generation
19/03/18 23:02:12 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp` AS t LIMIT 1
19/03/18 23:02:12 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp` AS t LIMIT 1
19/03/18 23:02:12 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoop/apps/hadoop-2.9.1
Note: /tmp/sqoop-hadoop/compile/ea795ab1037c940352cf3f7d5af2728f/emp.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
19/03/18 23:02:13 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/ea795ab1037c940352cf3f7d5af2728f/emp.jar
19/03/18 23:02:13 WARN manager.MySQLManager: It looks like you are importing from mysql.
19/03/18 23:02:13 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
19/03/18 23:02:13 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
19/03/18 23:02:13 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
19/03/18 23:02:13 INFO mapreduce.ImportJobBase: Beginning import of emp
19/03/18 23:02:14 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
19/03/18 23:02:14 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
19/03/18 23:02:14 INFO client.RMProxy: Connecting to ResourceManager at centos-aaron-h1/192.168.29.144:8032
19/03/18 23:02:16 INFO db.DBInputFormat: Using read commited transaction isolation
19/03/18 23:02:16 INFO mapreduce.JobSubmitter: number of splits:1
19/03/18 23:02:16 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
19/03/18 23:02:16 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1552898029697_0030
19/03/18 23:02:17 INFO impl.YarnClientImpl: Submitted application application_1552898029697_0030
19/03/18 23:02:17 INFO mapreduce.Job: The url to track the job: http://centos-aaron-h1:8088/proxy/application_1552898029697_0030/
19/03/18 23:02:17 INFO mapreduce.Job: Running job: job_1552898029697_0030
19/03/18 23:02:24 INFO mapreduce.Job: Job job_1552898029697_0030 running in uber mode : false
19/03/18 23:02:24 INFO mapreduce.Job:  map 0% reduce 0%
19/03/18 23:02:30 INFO mapreduce.Job:  map 100% reduce 0%
19/03/18 23:02:30 INFO mapreduce.Job: Job job_1552898029697_0030 completed successfully
19/03/18 23:02:30 INFO mapreduce.Job: Counters: 30
        File System Counters
                FILE: Number of bytes read=0
                FILE: Number of bytes written=207365
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=87
                HDFS: Number of bytes written=180
                HDFS: Number of read operations=4
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters 
                Launched map tasks=1
                Other local map tasks=1
                Total time spent by all maps in occupied slots (ms)=3466
                Total time spent by all reduces in occupied slots (ms)=0
                Total time spent by all map tasks (ms)=3466
                Total vcore-milliseconds taken by all map tasks=3466
                Total megabyte-milliseconds taken by all map tasks=3549184
        Map-Reduce Framework
                Map input records=6
                Map output records=6
                Input split bytes=87
                Spilled Records=0
                Failed Shuffles=0
                Merged Map outputs=0
                GC time elapsed (ms)=63
                CPU time spent (ms)=590
                Physical memory (bytes) snapshot=132681728
                Virtual memory (bytes) snapshot=1715552256
                Total committed heap usage (bytes)=42860544
        File Input Format Counters 
                Bytes Read=0
        File Output Format Counters 
                Bytes Written=180
19/03/18 23:02:30 INFO mapreduce.ImportJobBase: Transferred 180 bytes in 15.5112 seconds (11.6045 bytes/sec)
19/03/18 23:02:30 INFO mapreduce.ImportJobBase: Retrieved 6 records.
[hadoop@centos-aaron-h1 bin]$
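Both the --show and --exec runs above stop at an "Enter password:" prompt even though --password was given when the job was created, which matches db.require.password = true in the --show output. One way to let a saved job run unattended is the import tool's --password-file option. The sketch below prepares such a file and prints, without running it, a job definition that uses it. The scratch directory, the job name myimportjob_pwfile and the file:// prefix for a local path are assumptions made for this illustration.

```shell
#!/bin/sh
# Hypothetical scratch location; in practice choose a directory
# that only the job owner can read.
PWDIR=$(mktemp -d)
PWFILE="$PWDIR/sqoop.pwd"

# printf '%s' writes the password without a trailing newline.
# Sqoop uses the whole file content as the password, newline included,
# so a file written with a plain echo would not match.
printf '%s' '123456' > "$PWFILE"
chmod 400 "$PWFILE"

echo "password file size: $(wc -c < "$PWFILE" | tr -d ' ') bytes"

# Printed rather than executed, so the sketch runs without Sqoop installed.
echo "sqoop job --create myimportjob_pwfile -- import --connect jdbc:mysql://centos-aaron-03:3306/test --username root --password-file file://$PWFILE --table emp --m 1"
```

The size line should report 6 bytes; a 7 usually means a newline slipped into the file.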

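4. How Sqoop works

Overview: Sqoop works by translating import and export commands into MapReduce programs; whenever it receives a command, it generates a MapReduce program. With Sqoop's code generation tool you can easily inspect the Java code Sqoop generates and build deeper customizations on top of it.

Code customization:

The syntax of the Sqoop code generation command is: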

$ sqoop-codegen (generic-args) (codegen-args)

Example: generate Java code for the emp table in the test database.
The following command generates the code:

sqoop codegen --connect jdbc:mysql://centos-aaron-03:3306/test --username root --password 123456 --table emp -bindir .

If the command runs successfully, it produces output like the following:

[hadoop@centos-aaron-h1 bin]$ sqoop codegen --connect jdbc:mysql://centos-aaron-03:3306/test --username root --password 123456 --table emp -bindir .
Warning: /home/hadoop/sqoop/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /home/hadoop/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /home/hadoop/sqoop/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
19/03/18 23:21:24 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
19/03/18 23:21:24 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/03/18 23:21:24 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
19/03/18 23:21:24 INFO tool.CodeGenTool: Beginning code generation
19/03/18 23:21:24 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp` AS t LIMIT 1
19/03/18 23:21:24 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp` AS t LIMIT 1
19/03/18 23:21:24 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoop/apps/hadoop-2.9.1
Note: ./emp.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
19/03/18 23:21:26 INFO orm.CompilationManager: Writing jar file: ./emp.jar
[hadoop@centos-aaron-h1 bin]$ ll

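Verify: check the files in the output directory (screenshot omitted).

To customize the export in depth, modify the generated code files above.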

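A final word: that is everything for this post. If you found it useful, please give it a like; if you are interested in my other posts on servers and big data, or in me as an author, please follow my blog.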
