HBase in Practice: Fetching Data with a Scanner

1. Java API overview

1.1 getScanner()

The getScanner method has three overloads:

  • getScanner(Scan scan)
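  /**
   * Returns a scanner on the current table as specified by the {@link Scan}
   * object.
   *
   * Note that the passed {@link Scan}'s start row and caching properties
   * maybe changed.
   *
   * @param scan A configured {@link Scan} object.
   * @return A scanner.
   * @throws IOException if a remote or network exception occurs.
   * @since 0.20.0
   */
ResultScanner getScanner(Scan scan) throws IOException;

In other words, the client works directly on the Scan object you pass in. If the Scan has no caching value, the client may write the configured default into it. Its start row may also be moved forward as the scanner advances from one region to the next. Do not reuse a Scan after a scan and expect its original settings to be intact.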
  • getScanner(byte[] family)
  /**
   * Gets a scanner on the current table for the given family.
   *
   * @param family The column family to scan.
   * @return A scanner.
   * @throws IOException if a remote or network exception occurs.
   * @since 0.20.0
   */
ResultScanner getScanner(byte[] family) throws IOException;
  • getScanner(byte[] family, byte[] qualifier)
  /**
   * Gets a scanner on the current table for the given family and qualifier.
   * 
   * @param family The column family to scan.
   * @param qualifier The column qualifier to scan.
   * @return A scanner.
   * @throws IOException if a remote or network exception occurs.
   * @since 0.20.0
   */
ResultScanner getScanner(byte[] family, byte[] qualifier) throws IOException;
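The Javadoc note mentions the Scan's caching property. Roughly, caching is the number of rows the client asks a RegionServer for in one round trip. The ResultScanner buffers that batch and hands the rows out one at a time, fetching the next batch only when the buffer runs dry. The class below is a toy, in-memory model of that buffering idea, not HBase code. BatchingScanner, fetchBatch and roundTrips are hypothetical names, and the "server" is just a Java list.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Toy model (not HBase code): pulls rows from a simulated server in batches of
// `caching` rows and hands them out one by one, like a buffering ResultScanner.
class BatchingScanner implements Iterator<String> {
    private final List<String> serverRows; // stands in for the rows stored on the server
    private final int caching;             // rows requested per simulated round trip
    private final Deque<String> buffer = new ArrayDeque<>();
    private int nextIndex = 0;             // first server row not fetched yet
    private int roundTrips = 0;            // number of simulated RPCs so far

    BatchingScanner(List<String> serverRows, int caching) {
        if (caching <= 0) {
            throw new IllegalArgumentException("caching must be positive");
        }
        this.serverRows = serverRows;
        this.caching = caching;
    }

    // One simulated RPC: copy up to `caching` rows into the local buffer.
    private void fetchBatch() {
        roundTrips++;
        int end = Math.min(nextIndex + caching, serverRows.size());
        for (int i = nextIndex; i < end; i++) {
            buffer.addLast(serverRows.get(i));
        }
        nextIndex = end;
    }

    @Override
    public boolean hasNext() {
        // Only go back to the "server" when the buffer is empty and rows remain.
        if (buffer.isEmpty() && nextIndex < serverRows.size()) {
            fetchBatch();
        }
        return !buffer.isEmpty();
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return buffer.pollFirst();
    }

    int roundTrips() {
        return roundTrips;
    }

    public static void main(String[] args) {
        List<String> rows = List.of("r1", "r2", "r3", "r4", "r5", "r6", "r7", "r8");
        for (int caching : new int[] {1, 3, 100}) {
            BatchingScanner scanner = new BatchingScanner(rows, caching);
            int count = 0;
            while (scanner.hasNext()) {
                scanner.next();
                count++;
            }
            System.out.println("caching=" + caching + " rows=" + count
                    + " roundTrips=" + scanner.roundTrips());
        }
    }
}
```

For eight rows, main reports 8, 3 and 1 round trips for caching values of 1, 3 and 100. The trade-off mirrors the real client: a larger caching value means fewer round trips but more rows held in client memory at once. The toy ignores what the real scanner also handles, such as region boundaries, the maximum result size and scanner timeouts.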

2. Hands-on code

2.1 Test each of the APIs above. Before running the tests, here is the data in the tsdb-uid table:
 \x00                                               column=id:metrics, timestamp=1541500656882, value=\x00\x00\x00\x00\x00\x00\x00\x05
 \x00                                               column=id:tagk, timestamp=1535982247222, value=\x00\x00\x00\x00\x00\x00\x00\x03
 \x00                                               column=id:tagv, timestamp=1541425665699, value=\x00\x00\x00\x00\x00\x00\x00\x08
 \x00\x00\x01                                       column=name:metrics, timestamp=1531479245132, value=mytest.cpu
 \x00\x00\x01                                       column=name:tagk, timestamp=1531479245162, value=host
 \x00\x00\x01                                       column=name:tagv, timestamp=1531479245189, value=server4
 \x00\x00\x02                                       column=name:metrics, timestamp=1535891521172, value=metric-t
 \x00\x00\x02                                       column=name:tagk, timestamp=1535891521198, value=chl
 \x00\x00\x02                                       column=name:tagv, timestamp=1531479264404, value=server5
 \x00\x00\x03                                       column=name:metrics, timestamp=1535982247205, value=csdn
 \x00\x00\x03                                       column=name:tagk, timestamp=1535982247230, value=accessNumber
 \x00\x00\x03                                       column=name:tagv, timestamp=1531485413194, value=s485276
 \x00\x00\x04                                       column=name:metrics, timestamp=1541426336083, value=test
 \x00\x00\x04                                       column=name:tagv, timestamp=1535891521217, value=hqdApp
 \x00\x00\x05                                       column=name:metrics, timestamp=1541500656917, value=test_meta
 \x00\x00\x05                                       column=name:tagv, timestamp=1535982247253, value=cs
 \x00\x00\x06                                       column=name:tagv, timestamp=1537103490275, value=Firminal
 \x00\x00\x07                                       column=name:tagv, timestamp=1541425665353, value=lawson
 \x00\x00\x08                                       column=name:tagv, timestamp=1541425665725, value=firminal
 Firminal                                           column=id:tagv, timestamp=1537103490289, value=\x00\x00\x06
 accessNumber                                       column=id:tagk, timestamp=1535982247235, value=\x00\x00\x03
 chl                                                column=id:tagk, timestamp=1535891521203, value=\x00\x00\x02
 cs                                                 column=id:tagv, timestamp=1535982247259, value=\x00\x00\x05
 csdn                                               column=id:metrics, timestamp=1535982247213, value=\x00\x00\x03
 firminal                                           column=id:tagv, timestamp=1541425665756, value=\x00\x00\x08
 host                                               column=id:tagk, timestamp=1531479245177, value=\x00\x00\x01
 hqdApp                                             column=id:tagv, timestamp=1535891521224, value=\x00\x00\x04
 lawson                                             column=id:tagv, timestamp=1541425665366, value=\x00\x00\x07
 metric-t                                           column=id:metrics, timestamp=1535891521182, value=\x00\x00\x02
 mytest.cpu                                         column=id:metrics, timestamp=1531479245145, value=\x00\x00\x01
 s485276                                            column=id:tagv, timestamp=1531485413204, value=\x00\x00\x03
 server4                                            column=id:tagv, timestamp=1531479245192, value=\x00\x00\x01
 server5                                            column=id:tagv, timestamp=1531479264407, value=\x00\x00\x02
 test                                               column=id:metrics, timestamp=1541426336086, value=\x00\x00\x04
 test_meta                                          column=id:metrics, timestamp=1541500656927, value=\x00\x00\x05
25 row(s) in 0.7650 seconds
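The shell prints cell values as raw bytes. Judging by its name and contents, tsdb-uid appears to be OpenTSDB's UID table (an assumption, not stated above):

- row \x00 holds one 8-byte counter per kind (metrics, tagk, tagv);
- the name family maps a 3-byte UID to a name;
- the id family maps a name back to its UID.

The sketch below decodes a few of those byte strings by hand as big-endian integers. decodeBigEndian is a hypothetical helper, and the byte arrays are copied from the dump.

```java
// Decodes raw byte values copied from the tsdb-uid dump as big-endian integers.
// decodeBigEndian is a hypothetical helper, not an HBase or OpenTSDB API.
class UidDecoder {

    // Interprets 1 to 8 bytes as a big-endian integer. An 8-byte input is read
    // as a signed long; shorter inputs always come out non-negative.
    static long decodeBigEndian(byte[] bytes) {
        if (bytes.length == 0 || bytes.length > 8) {
            throw new IllegalArgumentException("expected 1 to 8 bytes, got " + bytes.length);
        }
        long result = 0;
        for (byte b : bytes) {
            result = (result << 8) | (b & 0xFF); // mask so negative bytes are not sign-extended
        }
        return result;
    }

    public static void main(String[] args) {
        // Row \x00: 8-byte counters (id:metrics, id:tagk, id:tagv)
        byte[] metricsCounter = {0, 0, 0, 0, 0, 0, 0, 5};
        byte[] tagkCounter = {0, 0, 0, 0, 0, 0, 0, 3};
        byte[] tagvCounter = {0, 0, 0, 0, 0, 0, 0, 8};
        // 3-byte UIDs stored in the id family of rows test_meta, accessNumber, firminal
        byte[] testMetaUid = {0, 0, 5};
        byte[] accessNumberUid = {0, 0, 3};
        byte[] firminalUid = {0, 0, 8};

        System.out.println("metrics: counter=" + decodeBigEndian(metricsCounter)
                + ", test_meta UID=" + decodeBigEndian(testMetaUid));
        System.out.println("tagk: counter=" + decodeBigEndian(tagkCounter)
                + ", accessNumber UID=" + decodeBigEndian(accessNumberUid));
        System.out.println("tagv: counter=" + decodeBigEndian(tagvCounter)
                + ", firminal UID=" + decodeBigEndian(firminalUid));
    }
}
```

Each counter equals the highest UID of its kind in the dump (test_meta = 5, accessNumber = 3, firminal = 8). This is consistent with row \x00 recording the last UID handed out. In HBase client code, Bytes.toLong(byte[]) performs the 8-byte conversion for you.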
  • Using a column family as the argument
public static void getRowByScan(String tableName, String columnFamily) {
        // Table and ResultScanner are both Closeable; try-with-resources releases them
        try (Table table = connection.getTable(TableName.valueOf(tableName));
             ResultScanner resultScanner = table.getScanner(Bytes.toBytes(columnFamily))) {
            // One Result per row that has at least one cell in this family
            for (Result res : resultScanner) {
                System.out.println(res);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

Output:

keyvalues={\x00\x00\x01/name:metrics/1531479245132/Put/vlen=10/seqid=0, \x00\x00\x01/name:tagk/1531479245162/Put/vlen=4/seqid=0, \x00\x00\x01/name:tagv/1531479245189/Put/vlen=7/seqid=0}
keyvalues={\x00\x00\x02/name:metrics/1535891521172/Put/vlen=8/seqid=0, \x00\x00\x02/name:tagk/1535891521198/Put/vlen=3/seqid=0, \x00\x00\x02/name:tagv/1531479264404/Put/vlen=7/seqid=0}
keyvalues={\x00\x00\x03/name:metrics/1535982247205/Put/vlen=4/seqid=0, \x00\x00\x03/name:tagk/1535982247230/Put/vlen=12/seqid=0, \x00\x00\x03/name:tagv/1531485413194/Put/vlen=7/seqid=0}
keyvalues={\x00\x00\x04/name:metrics/1541426336083/Put/vlen=4/seqid=0, \x00\x00\x04/name:tagv/1535891521217/Put/vlen=6/seqid=0}
keyvalues={\x00\x00\x05/name:metrics/1541500656917/Put/vlen=9/seqid=0, \x00\x00\x05/name:tagv/1535982247253/Put/vlen=2/seqid=0}
keyvalues={\x00\x00\x06/name:tagv/1537103490275/Put/vlen=8/seqid=0}
keyvalues={\x00\x00\x07/name:tagv/1541425665353/Put/vlen=6/seqid=0}
keyvalues={\x00\x00\x08/name:tagv/1541425665725/Put/vlen=8/seqid=0}

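Each res in the loop is a Result. It bundles all the KeyValues (cells) of one row that belong to the scanned family, which is why the Results hold different numbers of cells. With columnFamily set to name, only the eight rows \x00\x00\x01 through \x00\x00\x08 have cells in that family, so the scan returns 8 Results.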
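The toy model below shows where the 8 Results above come from, and the 5 Results of the family:qualifier scan later in this post. It treats the table as a flat, row-sorted list of cells. It keeps only the cells in the requested family (or family:qualifier column) and groups them into one entry per row, which mirrors how a scanner returns one Result per row. This is not HBase code: the CELLS array and scan method are hypothetical. CELLS copies every name-family cell from the dump but only a few of the id-family cells.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model (not HBase code) of a family or column scan over the tsdb-uid dump.
class FamilyScanModel {

    // "row family:qualifier" entries in row-key order, copied from the dump:
    // every name-family cell plus a few id-family cells (the rest are omitted).
    static final String[] CELLS = {
        "\\x00 id:metrics", "\\x00 id:tagk", "\\x00 id:tagv",
        "\\x00\\x00\\x01 name:metrics", "\\x00\\x00\\x01 name:tagk", "\\x00\\x00\\x01 name:tagv",
        "\\x00\\x00\\x02 name:metrics", "\\x00\\x00\\x02 name:tagk", "\\x00\\x00\\x02 name:tagv",
        "\\x00\\x00\\x03 name:metrics", "\\x00\\x00\\x03 name:tagk", "\\x00\\x00\\x03 name:tagv",
        "\\x00\\x00\\x04 name:metrics", "\\x00\\x00\\x04 name:tagv",
        "\\x00\\x00\\x05 name:metrics", "\\x00\\x00\\x05 name:tagv",
        "\\x00\\x00\\x06 name:tagv", "\\x00\\x00\\x07 name:tagv", "\\x00\\x00\\x08 name:tagv",
        "Firminal id:tagv", "host id:tagk", "test_meta id:metrics"
    };

    // Keeps cells of `family` (and of `qualifier`, unless it is null) and groups
    // them into one entry per row, preserving row order.
    static Map<String, List<String>> scan(String family, String qualifier) {
        Map<String, List<String>> results = new LinkedHashMap<>();
        for (String cell : CELLS) {
            String[] parts = cell.split(" ");      // [row, family:qualifier]
            String[] column = parts[1].split(":"); // [family, qualifier]
            boolean match = column[0].equals(family)
                    && (qualifier == null || column[1].equals(qualifier));
            if (match) {
                results.computeIfAbsent(parts[0], row -> new ArrayList<>()).add(parts[1]);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        Map<String, List<String>> byFamily = scan("name", null);
        byFamily.forEach((row, cols) -> System.out.println(row + " -> " + cols));
        System.out.println("Results for family name: " + byFamily.size());
        System.out.println("Results for name:metrics: " + scan("name", "metrics").size());
    }
}
```

Row \x00 never appears in the name-family result, because all of its cells live in the id family.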

  • Using a Scan as the argument
public static void getRowByScan(String tableName) {
        try (Table table = connection.getTable(TableName.valueOf(tableName))) {
            Scan scan = new Scan();
            // The start row is inclusive; setStartRow is deprecated in HBase 2.x in favor of withStartRow
            scan.setStartRow(Bytes.toBytes("server4"));
            // No family or column is added, so every cell of each matching row is returned
            try (ResultScanner resultScanner = table.getScanner(scan)) {
                for (Result res : resultScanner) {
                    System.out.println(res);
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

Output:

keyvalues={server4/id:tagv/1531479245192/Put/vlen=3/seqid=0}
keyvalues={server5/id:tagv/1531479264407/Put/vlen=3/seqid=0}
keyvalues={test/id:metrics/1541426336086/Put/vlen=3/seqid=0}
keyvalues={test_meta/id:metrics/1541500656927/Put/vlen=3/seqid=0}
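Why exactly these four rows? HBase stores rows sorted by row key, comparing keys byte by byte as unsigned values, and a Scan's start row is inclusive. Several kinds of keys therefore sort before server4:

- the binary keys that start with \x00;
- the uppercase Firminal (F is 0x46, below any lowercase letter);
- lowercase keys such as host or s485276 ('4' is 0x34, below 'e' at 0x65).

So the scan starts at server4 and runs to the end of the table. The sketch below reproduces that ordering with Arrays.compareUnsigned (Java 9+) over the 25 row keys from the dump. RowOrderModel and scanRange are hypothetical names. It also models a stop row, which in HBase (setStopRow, or withStopRow in 2.x) is exclusive by default.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Toy model (not HBase code) of row-key ordering and start/stop-row ranges.
class RowOrderModel {

    // Lexicographic comparison with each byte treated as unsigned (Java 9+).
    static final Comparator<byte[]> UNSIGNED = Arrays::compareUnsigned;

    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // The 25 row keys of the dump: \x00, \x00\x00\x01 to \x00\x00\x08, and 16 names.
    static NavigableSet<byte[]> rowKeys() {
        NavigableSet<byte[]> keys = new TreeSet<>(UNSIGNED);
        keys.add(new byte[] {0});
        for (byte i = 1; i <= 8; i++) {
            keys.add(new byte[] {0, 0, i});
        }
        String[] names = {"Firminal", "accessNumber", "chl", "cs", "csdn", "firminal",
                "host", "hqdApp", "lawson", "metric-t", "mytest.cpu", "s485276",
                "server4", "server5", "test", "test_meta"};
        for (String name : names) {
            keys.add(utf8(name));
        }
        return keys;
    }

    // Rows a scan visits: start row inclusive, stop row exclusive (null = to the end).
    static List<String> scanRange(String start, String stop) {
        NavigableSet<byte[]> keys = rowKeys();
        NavigableSet<byte[]> range = (stop == null)
                ? keys.tailSet(utf8(start), true)
                : keys.subSet(utf8(start), true, utf8(stop), false);
        List<String> rows = new ArrayList<>();
        for (byte[] key : range) {
            rows.add(new String(key, StandardCharsets.UTF_8));
        }
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(rowKeys().size());
        System.out.println(scanRange("server4", null));
        System.out.println(scanRange("server4", "test"));
    }
}
```

scanRange("server4", null) yields [server4, server5, test, test_meta], matching the output above. Adding "test" as a stop row leaves only server4 and server5.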
  • Using a column family and a qualifier as arguments
public static void getRowByScanThree(String tableName, String family, String qualifier) {
        // Only the family:qualifier column is returned; rows without it are skipped
        try (Table table = connection.getTable(TableName.valueOf(tableName));
             ResultScanner resultScanner = table.getScanner(Bytes.toBytes(family), Bytes.toBytes(qualifier))) {
            for (Result res : resultScanner) {
                System.out.println(res);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

Output:

keyvalues={\x00\x00\x01/name:metrics/1531479245132/Put/vlen=10/seqid=0}
keyvalues={\x00\x00\x02/name:metrics/1535891521172/Put/vlen=8/seqid=0}
keyvalues={\x00\x00\x03/name:metrics/1535982247205/Put/vlen=4/seqid=0}
keyvalues={\x00\x00\x04/name:metrics/1541426336083/Put/vlen=4/seqid=0}
keyvalues={\x00\x00\x05/name:metrics/1541500656917/Put/vlen=9/seqid=0}
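Each cell in a printed Result has the form row/family:qualifier/timestamp/type/vlen=N/seqid=N, where vlen is the length of the cell's value in bytes. You can check this against the dump:

- the name:metrics values are plain ASCII names, so vlen is simply their character count;
- the id-family values in the start-row scan are 3-byte UIDs, hence vlen=3 there.

The snippet below does that arithmetic in plain Java. VlenCheck is a hypothetical name, and no HBase API is involved.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Compares the vlen values printed by the name:metrics scan with the UTF-8
// byte length of each value taken from the tsdb-uid dump.
class VlenCheck {

    static int vlen(String value) {
        return value.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // value -> vlen as printed in the scan output
        Map<String, Integer> printed = new LinkedHashMap<>();
        printed.put("mytest.cpu", 10);
        printed.put("metric-t", 8);
        printed.put("csdn", 4);
        printed.put("test", 4);
        printed.put("test_meta", 9);

        for (Map.Entry<String, Integer> e : printed.entrySet()) {
            int computed = vlen(e.getKey());
            String verdict = computed == e.getValue() ? "match" : "MISMATCH";
            System.out.println(e.getKey() + ": printed=" + e.getValue()
                    + ", computed=" + computed + " (" + verdict + ")");
        }

        // id-family values are 3-byte UIDs, e.g. \x00\x00\x01
        byte[] uid = {0, 0, 1};
        System.out.println("UID value length: " + uid.length);
    }
}
```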
2.2 Printing the fields of each KeyValue

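The output above prints each row as a whole Result, which holds one KeyValue (cell) per column. How do you extract the individual fields of each KeyValue, such as the row key, the value and the timestamp? The code is as follows: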

public static void getRowValue(String tableName, String family, String qualifier) {
        // Cell and CellUtil come from org.apache.hadoop.hbase
        try (Table table = connection.getTable(TableName.valueOf(tableName));
             ResultScanner resultScanner = table.getScanner(Bytes.toBytes(family), Bytes.toBytes(qualifier))) {
            for (Result res : resultScanner) {
                // rawCells() replaces Result.raw(), which is deprecated and removed in HBase 2.0
                for (Cell cell : res.rawCells()) {
                    byte[] rowKey = CellUtil.cloneRow(cell);
                    System.out.print("rowKey: ");
                    // Print each byte of the binary row key as a decimal number
                    for (byte b : rowKey) {
                        System.out.print(b);
                    }
                    System.out.println(", value: " + Bytes.toString(CellUtil.cloneValue(cell))
                            + ", timestamp: " + cell.getTimestamp());
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

Output:

rowKey: 001, value: mytest.cpu, timestamp: 1531479245132
rowKey: 002, value: metric-t, timestamp: 1535891521172
rowKey: 003, value: csdn, timestamp: 1535982247205
rowKey: 004, value: test, timestamp: 1541426336083
rowKey: 005, value: test_meta, timestamp: 1541500656917

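Every HBase row key is a byte array. The keys scanned here (\x00\x00\x01 and so on), however, are binary rather than readable text, so converting them with Bytes.toString would produce unprintable control characters. That is why the code above prints the row key byte by byte in a for loop, showing each byte as a decimal number (\x00\x00\x01 appears as 001).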
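Printing each byte as a decimal number works for these keys, but it is ambiguous. The keys {0, 1, 1} and {0, 11} both print as 011. Any byte above 0x7F prints as a negative number because Java's byte is signed. HBase's Bytes.toStringBinary(byte[]) avoids both problems: it keeps printable characters and writes every other byte as \xHH, the same notation the shell uses in the dump. The class below is a simplified re-implementation of that idea for illustration only; RowKeyPrinter and its methods are hypothetical. In real code, prefer Bytes.toStringBinary.

```java
import java.nio.charset.StandardCharsets;

// Illustrative comparison of two ways to print binary row keys.
// Not HBase code: toBinaryString only imitates the idea of Bytes.toStringBinary.
class RowKeyPrinter {

    // Decimal rendering, as in the getRowValue loop (bytes are signed in Java).
    static String toDecimalString(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(b);
        }
        return sb.toString();
    }

    // Printable ASCII (except backslash) is kept; every other byte becomes \xHH.
    static String toBinaryString(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            int ch = b & 0xFF;
            boolean printable = ch >= 0x20 && ch < 0x7F && ch != '\\';
            if (printable) {
                sb.append((char) ch);
            } else {
                sb.append(String.format("\\x%02X", ch));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] uidKey = {0, 0, 1};
        System.out.println(toDecimalString(uidKey));
        System.out.println(toBinaryString(uidKey));
        System.out.println(toBinaryString("test_meta".getBytes(StandardCharsets.UTF_8)));
        // Two different keys that the decimal form cannot tell apart
        System.out.println(toDecimalString(new byte[] {0, 1, 1}) + " vs "
                + toDecimalString(new byte[] {0, 11}));
    }
}
```

The escaped form round-trips visually with the shell output: the key printed as 001 by the loop appears as \x00\x00\x01, exactly as in the table dump.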