Alex's Hadoop Tutorial for Beginners, Lesson 12: Sqoop1 Install/Import/Export
阿新 • Published: 2019-02-06
Original post: http://blog.csdn.net/nsrainbow/article/details/41575807
What is Sqoop
Sqoop is a tool for importing and exporting data between traditional relational databases and HDFS. Sqoop2 has been released, but as of this writing it is still half-baked: it does not support HBase and offers few features, so this tutorial focuses on Sqoop1.
Installing Sqoop1
```
# yum install -y sqoop
# sqoop help
Warning: /usr/lib/sqoop/../hive-hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
14/11/28 11:33:11 INFO sqoop.Sqoop: Running Sqoop version: 1.4.4-cdh5.0.1
usage: sqoop COMMAND [ARGS]
Available commands:
  codegen            Generate code to interact with database records
  create-hive-table  Import a table definition into Hive
  eval               Evaluate a SQL statement and display the results
  export             Export an HDFS directory to a database table
  help               List available commands
  import             Import a table from a database to HDFS
  import-all-tables  Import tables from a database to HDFS
  job                Work with saved jobs
  list-databases     List available databases on a server
  list-tables        List available tables in a database
  merge              Merge results of incremental imports
  metastore          Run a standalone Sqoop metastore
  version            Display version information

See 'sqoop help COMMAND' for information on a specific command.
```
Copy the JDBC driver to /usr/lib/sqoop/lib
Download the MySQL Connector/J package, extract it to find the driver jar, upload the jar to the server, then move it into place:
```
mv /home/alex/mysql-connector-java-5.1.34-bin.jar /usr/lib/sqoop/lib
```
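If you script this step, a small sketch like the following can locate the connector jar in a download directory and print the copy command before anything is actually moved (the helper name and directory layout are my assumptions, not from the original post):

```shell
#!/bin/sh
# Sketch only: find the MySQL Connector/J jar in a download directory
# and print the copy command, so the paths can be reviewed before running.
install_connector() {
  download_dir="$1"
  sqoop_lib="$2"
  jar=$(find "$download_dir" -name 'mysql-connector-java-*.jar' | head -n 1)
  if [ -z "$jar" ]; then
    echo "no connector jar found in $download_dir" >&2
    return 1
  fi
  echo "cp $jar $sqoop_lib"
}
```

For the paths used in this lesson, `install_connector /home/alex /usr/lib/sqoop/lib` would print the `cp` command for the jar shown above.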
Importing
Preparing the data
Create a sqoop_test database in MySQL:
```sql
create database sqoop_test;
```
Create a table inside sqoop_test:
```sql
CREATE TABLE `employee` (
  `id` int(11) NOT NULL,
  `name` varchar(20) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
```
Insert a few rows:
```sql
insert into employee (id,name) values (1,'michael');
insert into employee (id,name) values (2,'ted');
insert into employee (id,name) values (3,'jack');
```
Importing from MySQL into HDFS
Warm-up: listing databases and tables
Before running the actual import, let's warm up with a few preparatory steps; they also make troubleshooting easier. First, allow the MySQL test user to connect remotely: Hadoop distributes import/export tasks to different machines, so the database URL must use a hostname or IP instead of localhost. This example simply reuses root, so we loosen root's Host entry (never do this in a real production environment!):
```
mysql> use mysql
mysql> update user set Host='%' where Host='127.0.0.1' and User='root';
mysql> flush privileges;
```
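Rather than loosening root's Host entry, a safer pattern is a dedicated import account. The sketch below only prints the SQL you would then run in MySQL; the user name, password, and privilege list are my assumptions, not from the original post:

```shell
#!/bin/sh
# Sketch only: print SQL for a dedicated Sqoop user instead of opening up root.
# The user name and password here are placeholders.
grant_sql() {
  user="$1"; pass="$2"; db="$3"
  printf "CREATE USER '%s'@'%%' IDENTIFIED BY '%s';\n" "$user" "$pass"
  printf "GRANT ALL ON %s.* TO '%s'@'%%';\n" "$db" "$user"
  printf "FLUSH PRIVILEGES;\n"
}
grant_sql sqoop_user sqoop_pass sqoop_test
```

The `'%'` host wildcard mirrors the change made to root above: it lets the Hadoop worker nodes reach MySQL from any machine.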
List all the databases:
```
# sqoop list-databases --connect jdbc:mysql://host1:3306/sqoop_test --username root --password root
Warning: /usr/lib/sqoop/../hive-hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
14/12/01 09:20:28 INFO sqoop.Sqoop: Running Sqoop version: 1.4.4-cdh5.0.1
14/12/01 09:20:28 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
14/12/01 09:20:28 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
information_schema
cacti
metastore
mysql
sqoop_test
wordpress
zabbix
```
Now connect to the database with Sqoop and list all the tables:
```
# sqoop list-tables --connect jdbc:mysql://host1/sqoop_test --username root --password root
Warning: /usr/lib/sqoop/../hive-hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
14/11/28 11:46:11 INFO sqoop.Sqoop: Running Sqoop version: 1.4.4-cdh5.0.1
14/11/28 11:46:11 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
14/11/28 11:46:11 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
employee
student
workers
```
This command works without a JDBC driver class name because Sqoop supports MySQL out of the box. To specify the driver class explicitly, append --driver:
```
# sqoop list-tables --connect jdbc:mysql://localhost/sqoop_test --username root --password root --driver com.mysql.jdbc.Driver
```
Importing data into HDFS
```
sqoop import --connect jdbc:mysql://host1:3306/sqoop_test --username root --password root --table employee --m 1 --target-dir /user/test3
```
- import — run an import job
- --connect — the JDBC connection URL
- --username — the database user name
- --password — the database password
- --table — the source table to import
- --m — the number of parallel map tasks, set to 1 here
- --target-dir — the HDFS directory where the imported data will be stored
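When the same import has to be run for several tables, it can help to build the command line from parameters. A minimal sketch (the helper name and the fixed option choices are my assumptions) that prints the sqoop command instead of executing it:

```shell
#!/bin/sh
# Sketch only: compose a sqoop import command line from parameters and
# print it, so it can be reviewed (or piped to sh) before actually running.
build_import_cmd() {
  host="$1"; db="$2"; table="$3"; target="$4"
  echo "sqoop import --connect jdbc:mysql://${host}:3306/${db} --username root -P --table ${table} --m 1 --target-dir ${target}"
}
build_import_cmd host1 sqoop_test employee /user/test3
```

Note that this sketch uses -P (prompt for the password) instead of --password, following the "Setting your password on the command-line is insecure" warning that Sqoop prints in the logs above.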
```
[[email protected] hadoop-hdfs]# sqoop import --connect jdbc:mysql://host1:3306/sqoop_test --username root --password root --table employee --m 1 --target-dir /user/test3
Warning: /usr/lib/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
15/01/23 06:48:10 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5-cdh5.2.1
15/01/23 06:48:10 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
15/01/23 06:48:11 INFO manager.SqlManager: Using default fetchSize of 1000
15/01/23 06:48:11 INFO tool.CodeGenTool: Beginning code generation
15/01/23 06:48:12 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `employee` AS t LIMIT 1
15/01/23 06:48:12 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `employee` AS t LIMIT 1
15/01/23 06:48:12 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
Note: /tmp/sqoop-root/compile/0989201fc3275ff35dc9c41f1031ea42/employee.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
15/01/23 06:48:45 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/0989201fc3275ff35dc9c41f1031ea42/employee.jar
15/01/23 06:48:47 WARN manager.MySQLManager: It looks like you are importing from mysql.
15/01/23 06:48:47 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
15/01/23 06:48:47 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
15/01/23 06:48:47 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
15/01/23 06:48:47 INFO mapreduce.ImportJobBase: Beginning import of employee
15/01/23 06:48:57 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
15/01/23 06:49:12 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
15/01/23 06:49:13 INFO client.RMProxy: Connecting to ResourceManager at host1/192.168.199.126:8032
15/01/23 06:50:10 INFO db.DBInputFormat: Using read commited transaction isolation
15/01/23 06:50:10 INFO mapreduce.JobSubmitter: number of splits:1
15/01/23 06:50:12 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1421771779239_0003
15/01/23 06:50:22 INFO impl.YarnClientImpl: Submitted application application_1421771779239_0003
15/01/23 06:50:23 INFO mapreduce.Job: The url to track the job: http://host1:8088/proxy/application_1421771779239_0003/
15/01/23 06:50:23 INFO mapreduce.Job: Running job: job_1421771779239_0003
15/01/23 06:57:10 INFO mapreduce.Job: Job job_1421771779239_0003 running in uber mode : false
15/01/23 06:57:16 INFO mapreduce.Job: map 0% reduce 0%
15/01/23 06:58:13 INFO mapreduce.Job: map 100% reduce 0%
15/01/23 06:58:19 INFO mapreduce.Job: Job job_1421771779239_0003 completed successfully
15/01/23 06:58:33 INFO mapreduce.Job: Counters: 30
        File System Counters
                FILE: Number of bytes read=0
                FILE: Number of bytes written=128844
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=87
                HDFS: Number of bytes written=23
```