Installing Sqoop on Hadoop and importing MySQL data into HDFS
Hadoop: 2.6.0
Sqoop: sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz
Then extract the Sqoop tarball. The problem I hit while extracting was:
tar: /home/luis: Not found in archive
tar: Exiting with failure status due to previous errors
The cause: in tar -xzvf XXXXX -C ~/ the capital -C must not be forgotten; without it, tar treats the target path as a member name to look up inside the archive.
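The -C behaviour can be demonstrated with a throwaway archive (file and directory names below are made up purely for the demo):

```shell
# Demo of tar -C: -C tells tar where to extract; without it, a bare path
# argument would be treated as a member name to look for inside the archive.
workdir=$(mktemp -d)
mkdir -p "$workdir/sqoop-1.4.6/bin"
echo demo > "$workdir/sqoop-1.4.6/bin/sqoop"
tar -czf "$workdir/sqoop.tar.gz" -C "$workdir" sqoop-1.4.6

dest=$(mktemp -d)
tar -xzf "$workdir/sqoop.tar.gz" -C "$dest"   # capital -C before the target dir
ls "$dest/sqoop-1.4.6/bin"
```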
Configure the environment variables: sudo nano /etc/bash.bashrc
Add:
export SQOOP_HOME=/home/hadoop/sqoop-1.4.6
export PATH=$PATH:$SQOOP_HOME/bin
(PATH must include the bin subdirectory, or the sqoop command will not be found.) Then reload the file: source /etc/bash.bashrc
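A minimal sketch of the two lines and a quick check that they took effect (the install path is the one used in this post; adjust it to yours):

```shell
# Append Sqoop to the shell environment. PATH must point at the bin
# subdirectory, otherwise the sqoop launcher itself will not be found.
export SQOOP_HOME=/home/hadoop/sqoop-1.4.6
export PATH=$PATH:$SQOOP_HOME/bin

# Quick check that the variables took effect
echo "$SQOOP_HOME"
```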
Edit the configuration file
In the $SQOOP_HOME/conf directory, copy sqoop-env-template.sh to sqoop-env.sh:
cp sqoop-env-template.sh sqoop-env.sh
# Set path to where bin/hadoop is available
export HADOOP_COMMON_HOME=/home/luis/hadoop-2.6.0
# Set path to where hadoop-*-core.jar is available
#export HADOOP_MAPRED_HOME=/home/luis/hadoop-2.6.0
# Set the path to where bin/hbase is available
export HBASE_HOME=/home/luis/hbase-1.0.1.1
# Set the path to where bin/hive is available
#export HIVE_HOME=
# Set the path for where zookeeper config dir is
export ZOOCFGDIR=/home/hadoop/zookeeper-3.4.6
Edit $SQOOP_HOME/bin/configure-sqoop
Comment out the HCatalog and Accumulo checks (unless you intend to use HCatalog, Accumulo, or other such components on Hadoop):
## Moved to be a runtime check in sqoop.
#if [ ! -d "${HCAT_HOME}" ]; then
#  echo "Warning: $HCAT_HOME does not exist! HCatalog jobs will fail."
#  echo 'Please set $HCAT_HOME to the root of your HCatalog installation.'
#fi
#if [ ! -d "${ACCUMULO_HOME}" ]; then
#  echo "Warning: $ACCUMULO_HOME does not exist! Accumulo imports will fail."
#  echo 'Please set $ACCUMULO_HOME to the root of your Accumulo installation.'
#fi

# Add HCatalog to dependency list
#if [ -e "${HCAT_HOME}/bin/hcat" ]; then
#  TMP_SQOOP_CLASSPATH=${SQOOP_CLASSPATH}:`${HCAT_HOME}/bin/hcat -classpath`
#  if [ -z "${HIVE_CONF_DIR}" ]; then
#    TMP_SQOOP_CLASSPATH=${TMP_SQOOP_CLASSPATH}:${HIVE_CONF_DIR}
#  fi
#  SQOOP_CLASSPATH=${TMP_SQOOP_CLASSPATH}
#fi

# Add Accumulo to dependency list
#if [ -e "$ACCUMULO_HOME/bin/accumulo" ]; then
#  for jn in `$ACCUMULO_HOME/bin/accumulo classpath | grep file:.*accumulo.*jar | cut -d':' -f2`; do
#    SQOOP_CLASSPATH=$SQOOP_CLASSPATH:$jn
#  done
#  for jn in `$ACCUMULO_HOME/bin/accumulo classpath | grep file:.*zookeeper.*jar | cut -d':' -f2`; do
#    SQOOP_CLASSPATH=$SQOOP_CLASSPATH:$jn
#  done
#fi
I also commented out the ZooKeeper check, because the script kept getting stuck there and reporting errors:
#if [ ! -d "${ZOOKEEPER_HOME}" ]; then
#  echo "Warning: $ZOOKEEPER_HOME does not exist! Accumulo imports will fail."
#  echo 'Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.'
#fi
That finishes the configuration. Running a Sqoop command may still raise a common exception:
Streaming result set com.mysql.jdbc.RowDataDynamic@... is still active
This exception when running a Sqoop job is caused by the MySQL JDBC driver; use the latest mysql-connector driver jar.
But even after I downloaded the latest MySQL connector jar and put it into the lib directory, the error persisted.
The reason: I had placed the MySQL connector jar in more than one location, and every shared location holding a copy of the driver must be replaced with the latest version, for example the locations in the listing below (originally highlighted):
$ sudo find -name 'mysql-connector-java*'
./home/luis/下載/mysql-connector-java-5.1.32
./home/luis/下載/mysql-connector-java-5.1.32/mysql-connector-java-5.1.32-bin.jar
./home/luis/weka/weka-3-6-13/lib/mysql-connector-java-5.1.6-bin.jar
./home/luis/sqoop-1.4.6/lib/mysql-connector-java-5.1.32-bin.jar
./home/luis/.eclipse/org.eclipse.platform_3.8_155965261/plugins/org.python.pydev.jython_4.2.0.201507041133/cachedir/packages/mysql-connector-java-5.1.6-bin.pkc
./usr/lib/jvm/java-7-openjdk-amd64/jre/lib/ext/mysql-connector-java-5.1.32-bin.jar
./usr/share/java/mysql-connector-java-5.1.32-bin.jar
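The replacement can be scripted with the same find. The sketch below simulates it in a temporary sandbox whose directory names merely mimic the listing above; on the real machine you would run the same loop over the real paths with the freshly downloaded 5.1.32 jar:

```shell
# Sandbox standing in for the real filesystem (all names are illustrative)
root=$(mktemp -d)
mkdir -p "$root/weka/lib" "$root/sqoop-1.4.6/lib"
touch "$root/weka/lib/mysql-connector-java-5.1.6-bin.jar"        # stale copy
touch "$root/sqoop-1.4.6/lib/mysql-connector-java-5.1.6-bin.jar" # stale copy
new_jar=$(mktemp -d)/mysql-connector-java-5.1.32-bin.jar         # fresh download
touch "$new_jar"

# Replace every stale driver copy with the new jar, directory by directory
find "$root" -name 'mysql-connector-java-*-bin.jar' | while read -r old; do
  dir=$(dirname "$old")
  rm -f "$old"              # drop the old version
  cp "$new_jar" "$dir/"     # install the new one in its place
done
```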
Now MySQL data can be imported into HDFS.
Import all tables:
sqoop import-all-tables --connect jdbc:mysql://192.168.1.113:3306/weibocatch
Restrictions on import-all-tables:
1. Every table must have exactly one column as its primary key;
2. All rows of every table are imported, not just a subset;
3. The default split column is used, and no conditions can be imposed via a WHERE clause.
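In practice the command above usually also needs credential flags, just like the single-table import. A hedged sketch of the fuller flag set (host, database, username, password, and the warehouse directory are placeholders from this post), assembled into a string only so it can be displayed:

```shell
# Full import-all-tables invocation (credentials and paths are placeholders)
cmd="sqoop import-all-tables \
 --connect jdbc:mysql://192.168.1.113:3306/weibocatch \
 --username root --password xxxxx \
 --warehouse-dir /user/hadoop/weibocatch"
echo "$cmd"
```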
Import a single table:
./sqoop import --connect jdbc:mysql://192.168.1.113:3306/weibocatch --username root --password xxxxx --table w_transfer -m 1
Here -m 1 sets the number of map tasks to launch.
Test the connection:
sqoop list-databases --connect jdbc:mysql://172.16.247.140:3306/ --username xxx --password xxx
List the tables:
sqoop list-tables --connect jdbc:mysql://172.16.247.140:3306/sqoop --username hive --password 123456
Then you can go into HDFS and inspect the imported content.
Useful HDFS shell commands: hdfs dfs -ls, hdfs dfs -ls -R, hdfs dfs -rm, and hdfs dfs -rm -r (the older rmr form is deprecated).