
Installing Sqoop on Hadoop and importing MySQL data into HDFS

Environment: Hadoop 2.6.0

Sqoop package: sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz

Then extract the Sqoop tarball. The problem I hit while extracting:

tar: /home/luis: Not found in archive
tar: Exiting with failure status due to previous errors

Without -C, tar treats /home/luis as the name of a member to extract from the archive. Pass the target directory after an uppercase -C:

tar -xzvf XXXXX -C ~/                  (don't forget the uppercase -C)
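A minimal sketch of the whole step, assuming the tarball sits in the current directory; the release extracts to a directory named after the tarball, renamed here to match the shorter SQOOP_HOME used below:

# extract into the home directory; -C tells tar where to unpack
tar -xzvf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz -C ~/
# rename the long release directory to the name used as SQOOP_HOME
mv ~/sqoop-1.4.6.bin__hadoop-2.0.4-alpha ~/sqoop-1.4.6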

Configure the environment variables: sudo nano /etc/bash.bashrc

Append (note the /bin: the sqoop executable lives in $SQOOP_HOME/bin):

export SQOOP_HOME=/home/hadoop/sqoop-1.4.6
export PATH=$PATH:$SQOOP_HOME/bin

Then reload the file: source /etc/bash.bashrc
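A quick check that the new PATH works, assuming the shell configuration has been reloaded; at this stage it is normal for Sqoop to print warnings about components that are not configured yet:

sqoop version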
Edit the configuration file:


In the $SQOOP_HOME/conf directory, copy sqoop-env-template.sh and rename the copy to sqoop-env.sh, then set the paths for the components you have:
cp sqoop-env-template.sh sqoop-env.sh
# Set path to where bin/hadoop is available
export HADOOP_COMMON_HOME=/home/luis/hadoop-2.6.0

# Set path to where hadoop-*-core.jar is available
#export HADOOP_MAPRED_HOME=/home/luis/hadoop-2.6.0

# Set the path to where bin/hbase is available
export HBASE_HOME=/home/luis/hbase-1.0.1.1

# Set the path to where bin/hive is available
#export HIVE_HOME=

# Set the path for where zookeeper config dir is
export ZOOCFGDIR=/home/hadoop/zookeeper-3.4.6

Edit $SQOOP_HOME/bin/configure-sqoop and comment out the HCatalog and Accumulo checks (unless you plan to use HCatalog, Accumulo, or other such components on top of Hadoop):
## Moved to be a runtime check in sqoop.
#if [ ! -d "${HCAT_HOME}" ]; then
#  echo "Warning: $HCAT_HOME does not exist! HCatalog jobs will fail."
#  echo 'Please set $HCAT_HOME to the root of your HCatalog installation.'
#fi

#if [ ! -d "${ACCUMULO_HOME}" ]; then
#  echo "Warning: $ACCUMULO_HOME does not exist! Accumulo imports will fail."
#  echo 'Please set $ACCUMULO_HOME to the root of your Accumulo installation.'
#fi

# Add HCatalog to dependency list
#if [ -e "${HCAT_HOME}/bin/hcat" ]; then
#  TMP_SQOOP_CLASSPATH=${SQOOP_CLASSPATH}:`${HCAT_HOME}/bin/hcat -classpath`
#  if [ -z "${HIVE_CONF_DIR}" ]; then
#    TMP_SQOOP_CLASSPATH=${TMP_SQOOP_CLASSPATH}:${HIVE_CONF_DIR}
#  fi
#  SQOOP_CLASSPATH=${TMP_SQOOP_CLASSPATH}
#fi

# Add Accumulo to dependency list
#if [ -e "$ACCUMULO_HOME/bin/accumulo" ]; then
#  for jn in `$ACCUMULO_HOME/bin/accumulo classpath | grep file:.*accumulo.*jar | cut -d':' -f2`; do
#    SQOOP_CLASSPATH=$SQOOP_CLASSPATH:$jn
#  done
#  for jn in `$ACCUMULO_HOME/bin/accumulo classpath | grep file:.*zookeeper.*jar | cut -d':' -f2`; do
#    SQOOP_CLASSPATH=$SQOOP_CLASSPATH:$jn
#  done
#fi

I also commented out the ZooKeeper check, because the run kept stalling there with an error:

#if [ ! -d "${ZOOKEEPER_HOME}" ]; then
#  echo "Warning: $ZOOKEEPER_HOME does not exist! Accumulo imports will fail."
#  echo 'Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.'
#fi
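A quick sanity check after these edits; any sqoop command re-runs configure-sqoop, so the commented-out warnings should no longer appear:

$SQOOP_HOME/bin/sqoop help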

That completes the configuration. But running a command can still hit a common exception:

Streaming result set com.mysql.jdbc.RowDataDynamic@xxxxxx is still active

This exception when running a Sqoop job is caused by the MySQL driver: use the latest MySQL Connector/J jar.

But even after I updated to the newest MySQL Connector/J jar and put it into the lib directory, the error persisted.

The reason: the MySQL connector jar was present in more than one place, and every location that feeds jars onto the shared classpath has to be switched to the latest version, for example the locations found below:

$ sudo find -name 'mysql-connector-java*'
./home/luis/下載/mysql-connector-java-5.1.32
./home/luis/下載/mysql-connector-java-5.1.32/mysql-connector-java-5.1.32-bin.jar
./home/luis/weka/weka-3-6-13/lib/mysql-connector-java-5.1.6-bin.jar
./home/luis/sqoop-1.4.6/lib/mysql-connector-java-5.1.32-bin.jar
./home/luis/.eclipse/org.eclipse.platform_3.8_155965261/plugins/org.python.pydev.jython_4.2.0.201507041133/cachedir/packages/mysql-connector-java-5.1.6-bin.pkc
./usr/lib/jvm/java-7-openjdk-amd64/jre/lib/ext/mysql-connector-java-5.1.32-bin.jar
./usr/share/java/mysql-connector-java-5.1.32-bin.jar
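A minimal sketch of the fix, assuming the 5.1.32 jar above is the newest one on hand; copy that same jar over each copy that can end up on the classpath (here, Sqoop's lib directory and the JRE's global ext directory are the ones most likely to matter):

cd /home/luis/下載/mysql-connector-java-5.1.32
# the copy Sqoop itself loads
cp mysql-connector-java-5.1.32-bin.jar /home/luis/sqoop-1.4.6/lib/
# the copy every JVM on the machine loads from jre/lib/ext
sudo cp mysql-connector-java-5.1.32-bin.jar /usr/lib/jvm/java-7-openjdk-amd64/jre/lib/ext/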

Now MySQL data can be imported into HDFS with the sqoop commands.

Import all tables (a fuller sketch follows the list below):

sqoop import-all-tables --connect jdbc:mysql://192.168.1.113:3306/weibocatch

import-all-tables has these requirements:
1. Every table must have a single-column primary key;
2. All data in every table is imported, not just part of it;
3. The default split column is used, and the WHERE clause imposes no conditions.
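A fuller invocation sketch; the credentials and the HDFS directory are assumptions, and --warehouse-dir is the standard option naming the HDFS parent directory under which one subdirectory per table is created:

sqoop import-all-tables \
  --connect jdbc:mysql://192.168.1.113:3306/weibocatch \
  --username root --password xxxxx \
  --warehouse-dir /user/hadoop/weibocatch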

Import a single table:

./sqoop import --connect jdbc:mysql://192.168.1.113:3306/weibocatch --username root --password xxxxx --table w_transfer -m 1

Here -m 1 sets the number of map tasks to launch.
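A variant sketch for choosing where the table lands in HDFS; --target-dir is the standard single-table option, and the path shown is an assumption:

./sqoop import \
  --connect jdbc:mysql://192.168.1.113:3306/weibocatch \
  --username root --password xxxxx \
  --table w_transfer \
  --target-dir /user/hadoop/w_transfer \
  -m 1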

Test the connection:

sqoop list-databases --connect jdbc:mysql://172.16.247.140:3306/ --username xxx --password xxx

List the tables:

sqoop list-tables --connect jdbc:mysql://172.16.247.140:3306/sqoop --username hive --password 123456

Then you can go into Hadoop's HDFS and look at the imported contents.

The relevant HDFS commands are ls, rm, and the recursive rmr; on Hadoop 2.x these are hdfs dfs -ls, hdfs dfs -rm, and hdfs dfs -rm -r (there is no ls -al in the HDFS shell; hdfs dfs -ls already prints the long listing).
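A minimal sketch for inspecting an import result; the path is an assumption, and part-m-00000 is the conventional name of the first map task's output file:

hdfs dfs -ls /user/hadoop/w_transfer
hdfs dfs -cat /user/hadoop/w_transfer/part-m-00000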