
Hive 0.14: testing insert, update, and delete

Guiding questions

1. The insert test fails; how can it be fixed?
2. Hive delete and update fail; how can they be fixed?
3. Under what conditions are delete and update allowed?

First, create a table with an ordinary CREATE TABLE statement:

    hive> create table test(id int, name string) row format delimited fields terminated by ',';

Test insert:

    insert into table test values (1,'row1'),(2,'row2');

It fails with:

    java.io.FileNotFoundException: File does not exist: hdfs://127.0.0.1:9000/home/hadoop/git/hive/packaging/target/apache-hive-0.14.0-SNAPSHOT-bin/
    apache-hive-0.14.0-SNAPSHOT-bin/lib/curator-client-2.6.0.jar
            at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1128)
            at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1120)
            at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
            at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1120)
            at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
            at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
            at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:99)
            at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
            at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:265)
            at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:301)
            at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:389)
            at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
            at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
            at java.security.AccessController.doPrivileged(Native Method)
            ......

It is apparently looking for the jar files on HDFS. A minor problem: just upload everything under lib to the same path on HDFS:

    hadoop fs -mkdir -p /home/hadoop/git/hive/packaging/target/apache-hive-0.14.0-SNAPSHOT-bin/apache-hive-0.14.0-SNAPSHOT-bin/lib/
    hadoop fs -put $HIVE_HOME/lib/* /home/hadoop/git/hive/packaging/target/apache-hive-0.14.0-SNAPSHOT-bin/apache-hive-0.14.0-SNAPSHOT-bin/lib/

Run insert again; no problem this time. Next, test delete:

    hive> delete from test where id = 1;

It fails!

    FAILED: SemanticException [Error 10294]: Attempt to do update or delete using transaction manager that does not support these operations.

The transaction manager in use does not support update or delete. It turns out that supporting these operations requires some extra configuration; see:
https://cwiki.apache.org/conflue ... tersforTransactions
Configure hive-site.xml as instructed there:


    hive.support.concurrency – true
    hive.enforce.bucketing – true
    hive.exec.dynamic.partition.mode – nonstrict
    hive.txn.manager – org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
    hive.compactor.initiator.on – true
    hive.compactor.worker.threads – 1
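The dash-separated pairs above are key and value; in hive-site.xml each one takes the usual <property> form. A sketch, with values copied straight from the list above:

```xml
<!-- Transaction-support settings for Hive 0.14 update/delete,
     written out in hive-site.xml property syntax. -->
<property>
  <name>hive.support.concurrency</name>
  <value>true</value>
</property>
<property>
  <name>hive.enforce.bucketing</name>
  <value>true</value>
</property>
<property>
  <name>hive.exec.dynamic.partition.mode</name>
  <value>nonstrict</value>
</property>
<property>
  <name>hive.txn.manager</name>
  <value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
</property>
<property>
  <name>hive.compactor.initiator.on</name>
  <value>true</value>
</property>
<property>
  <name>hive.compactor.worker.threads</name>
  <value>1</value>
</property>
```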

With that configured, I expected things to run smoothly, but instead the following error appeared:

    FAILED: LockException [Error 10280]: Error communicating with the metastore

Something is wrong with the metastore database. Raise the log level to DEBUG to see the specific error:
    2014-11-04 14:20:14,367 DEBUG [Thread-8]: txn.CompactionTxnHandler (CompactionTxnHandler.java:findReadyToClean(265)) - Going to execute query <select cq_id,
    cq_database, cq_table, cq_partition, cq_type, cq_run_as from COMPACTION_QUEUE where cq_state = 'r'>
    2014-11-04 14:20:14,367 ERROR [Thread-8]: txn.CompactionTxnHandler (CompactionTxnHandler.java:findReadyToClean(285)) - Unable to select next element for cleaning,
    Table 'hive.COMPACTION_QUEUE' doesn't exist
    2014-11-04 14:20:14,367 DEBUG [Thread-8]: txn.CompactionTxnHandler (CompactionTxnHandler.java:findReadyToClean(287)) - Going to rollback
    2014-11-04 14:20:14,368 ERROR [Thread-8]: compactor.Cleaner (Cleaner.java:run(143)) - Caught an exception in the main loop of compactor cleaner, MetaException(message
    :Unable to connect to transaction database com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Table 'hive.COMPACTION_QUEUE' doesn't exist
        at sun.reflect.GeneratedConstructorAccessor19.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at com.mysql.jdbc.Util.handleNewInstance(Util.java:409)
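For reference, the DEBUG output above can be obtained by restarting the CLI with the root logger raised via --hiveconf, a standard Hive CLI switch; the exact logger name can vary across installs, so treat this as a sketch:

```
# Start the Hive CLI with the root logger at DEBUG, printed to the
# console, so metastore/transaction errors show full stack traces.
hive --hiveconf hive.root.logger=DEBUG,console
```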

The metastore database has no table named COMPACTION_QUEUE; a quick check in MySQL confirms it really does not exist. Why not? After searching for a long time without finding the cause, the only option left was the source code.
The class TxnDbUtil under org.apache.hadoop.hive.metastore.txn contains the table-creation statements; following the trail, the method below is what triggers them:

    private void checkQFileTestHack() {
        boolean hackOn = HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_IN_TEST) ||
            HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_IN_TEZ_TEST);
        if (hackOn) {
          LOG.info("Hacking in canned values for transaction manager");
          // Set up the transaction/locking db in the derby metastore
          TxnDbUtil.setConfValues(conf);
          try {
            TxnDbUtil.prepDb();
          } catch (Exception e) {
            // We may have already created the tables and thus don't need to redo it.
            if (!e.getMessage().contains("already exists")) {
              throw new RuntimeException("Unable to set up transaction database for" +
                  " testing: " + e.getMessage());
            }
          }
        }
      }

In other words, there is one more condition before the table-creation statements run: HIVE_IN_TEST or HIVE_IN_TEZ_TEST must be set. delete and update are only enabled in a test environment, which is understandable, since the feature is not yet fully developed.
With the cause finally found, the fix is simple: add the following to hive-site.xml:

    <property>
      <name>hive.in.test</name>
      <value>true</value>
    </property>
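For hive.in.test to take effect, the metastore service has to be restarted so it re-reads the configuration and creates the transaction tables (COMPACTION_QUEUE and friends) on startup. Assuming the metastore runs as a standalone foreground service (process management here is just a sketch; stop the old process however your setup does it):

```
# Start a fresh metastore so the new hive.in.test setting is picked up
# and the transaction tables get created during initialization.
hive --service metastore &
```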

OK, restart the service and run delete again:

    hive> delete from test where id = 1;

Another error:

    FAILED: SemanticException [Error 10297]: Attempt to do update or delete on table default.test that does not use an AcidOutputFormat or is not bucketed

So the table test either does not use an AcidOutputFormat or is not bucketed; presumably the output format must be an AcidOutputFormat and the table must be bucketed.
A search online confirms exactly that. Moreover, at the moment only ORCFileformat supports AcidOutputFormat, and on top of that the table must be created with ('transactional' = 'true'). Quite a hassle...
So, following examples found online, create the table accordingly:

    hive> create table test(id int, name string) clustered by (id) into 2 buckets stored as orc TBLPROPERTIES ('transactional'='true');

insert:

    hive> insert into table test values (1,'row1'),(2,'row2'),(3,'row3');

delete:

    hive> delete from test where id = 1;

update:

    hive> update test set name = 'Raj' where id = 2;

OK! Everything runs, though it seems rather slow: each statement takes about 30 s. There should be room for optimization; something to look into next.