Hive Execution Plan Analysis (EXPLAIN)

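  • Hive translates SQL into the corresponding MapReduce program and submits it to Hadoop for execution. To see the concrete execution plan, run explain followed by the SQL statement.
  • A SQL statement is converted into a plan made up of multiple stages. The stages have an execution order and dependencies on one another, so together they form a directed acyclic graph (DAG).
  • These stages may include metadata operations, file system operations, map/reduce computation, and so on.
  • Syntax: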
EXPLAIN [EXTENDED|DEPENDENCY|AUTHORIZATION|LOCKS|VECTORIZATION] query
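  • The explain output includes:
    • the abstract syntax tree
    • the dependencies between the stages of the execution plan
    • a description of each stage
  • extended outputs more detailed information
  • dependency outputs the data sources the query depends on
  • authorization outputs the authorization information for running the SQL
  • locks outputs lock information
  • vectorization: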
    • Adds detail to the EXPLAIN output showing why Map and Reduce work is not vectorized.
    • Syntax: EXPLAIN VECTORIZATION [ONLY] [SUMMARY|OPERATOR|EXPRESSION|DETAIL]
    • ONLY option suppresses most non-vectorization elements.
    • SUMMARY (default) shows vectorization information for the PLAN (is vectorization enabled) and a summary of Map and Reduce work.
    • OPERATOR shows vectorization information for operators. E.g. Filter Vectorization. Includes all information of SUMMARY.
    • EXPRESSION shows vectorization information for expressions. E.g. predicateExpression. Includes all information of SUMMARY and OPERATOR.
    • DETAIL shows detail-level vectorization information. It includes all information of SUMMARY, OPERATOR, and EXPRESSION.
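  • With the FORMATTED keyword, the output is returned in JSON format
  • sort order: + means ascending, - means descending
  • The examples that follow give a rough idea of what the output looks like
# default explain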
0: jdbc:hive2://> explain select * from sort_test sort by id desc limit 10;
+--------------------------------------------------------------------------------------------------+--+
|                                             Explain                                              |
+--------------------------------------------------------------------------------------------------+--+
| STAGE DEPENDENCIES:                                                                              |
|   Stage-1 is a root stage                                                                        |
|   Stage-2 depends on stages: Stage-1                                                             |
|   Stage-0 depends on stages: Stage-2                                                             |
|                                                                                                  |
| STAGE PLANS:                                                                                     |
|   Stage: Stage-1                                                                                 |
|     Map Reduce                                                                                   |
|       Map Operator Tree:                                                                         |
|           TableScan                                                                              |
|             alias: sort_test                                                                     |
|             Statistics: Num rows: 8 Data size: 890 Basic stats: COMPLETE Column stats: NONE      |
|             Select Operator                                                                      |
|               expressions: id (type: int), name (type: string)                                   |
|               outputColumnNames: _col0, _col1                                                    |
|               Statistics: Num rows: 8 Data size: 890 Basic stats: COMPLETE Column stats: NONE    |
|               Reduce Output Operator                                                             |
|                 key expressions: _col0 (type: int)                                               |
|                 sort order: -                                                                    |
|                 Statistics: Num rows: 8 Data size: 890 Basic stats: COMPLETE Column stats: NONE  |
|                 value expressions: _col1 (type: string)                                          |
|       Reduce Operator Tree:                                                                      |
|         Select Operator                                                                          |
|           expressions: KEY.reducesinkkey0 (type: int), VALUE._col0 (type: string)                |
|           outputColumnNames: _col0, _col1                                                        |
|           Statistics: Num rows: 8 Data size: 890 Basic stats: COMPLETE Column stats: NONE        |
|           Limit                                                                                  |
|             Number of rows: 10                                                                   |
|             Statistics: Num rows: 8 Data size: 890 Basic stats: COMPLETE Column stats: NONE      |
|             File Output Operator                                                                 |
|               compressed: false                                                                  |
|               table:                                                                             |
|                   input format: org.apache.hadoop.mapred.SequenceFileInputFormat                 |
|                   output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat       |
|                   serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe                |
|                                                                                                  |
|   Stage: Stage-2                                                                                 |
|     Map Reduce                                                                                   |
|       Map Operator Tree:                                                                         |
|           TableScan                                                                              |
|             Reduce Output Operator                                                               |
|               key expressions: _col0 (type: int)                                                 |
|               sort order: -                                                                      |
|               Statistics: Num rows: 8 Data size: 890 Basic stats: COMPLETE Column stats: NONE    |
|               value expressions: _col1 (type: string)                                            |
|       Reduce Operator Tree:                                                                      |
|         Select Operator                                                                          |
|           expressions: KEY.reducesinkkey0 (type: int), VALUE._col0 (type: string)                |
|           outputColumnNames: _col0, _col1                                                        |
|           Statistics: Num rows: 8 Data size: 890 Basic stats: COMPLETE Column stats: NONE        |
|           Limit                                                                                  |
|             Number of rows: 10                                                                   |
|             Statistics: Num rows: 8 Data size: 890 Basic stats: COMPLETE Column stats: NONE      |
|             File Output Operator                                                                 |
|               compressed: false                                                                  |
|               Statistics: Num rows: 8 Data size: 890 Basic stats: COMPLETE Column stats: NONE    |
|               table:                                                                             |
|                   input format: org.apache.hadoop.mapred.TextInputFormat                         |
|                   output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat      |
|                   serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe                      |
|                                                                                                  |
|   Stage: Stage-0                                                                                 |
|     Fetch Operator                                                                               |
|       limit: 10                                                                                  |
|       Processor Tree:                                                                            |
|         ListSink                                                                                 |
|                                                                                                  |
+--------------------------------------------------------------------------------------------------+--+
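The STAGE DEPENDENCIES block above is a textual description of the stage DAG. As a minimal sketch, not part of Hive itself, the following Python snippet parses those lines and derives one valid execution order. The function names `parse_stage_dependencies` and `execution_order` are hypothetical, and the parser assumes only the two line shapes seen in this output ("is a root stage" and "depends on stages: ..."); other forms that Hive may print are not handled. It uses `graphlib`, which requires Python 3.9 or later.

```python
import re
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# The "STAGE DEPENDENCIES" lines copied from the EXPLAIN output above.
EXPLAIN_DEPENDENCIES = """\
Stage-1 is a root stage
Stage-2 depends on stages: Stage-1
Stage-0 depends on stages: Stage-2
"""


def parse_stage_dependencies(text):
    """Map each stage name to the set of stages it depends on.

    Hypothetical helper: only handles the two line shapes shown above.
    """
    deps = {}
    for raw in text.splitlines():
        # Drop the table borders and padding that beeline adds.
        line = raw.strip(" |")
        root = re.fullmatch(r"(Stage-\d+) is a root stage", line)
        if root:
            deps[root.group(1)] = set()
            continue
        child = re.fullmatch(r"(Stage-\d+) depends on stages: (.+)", line)
        if child:
            parents = {p.strip() for p in child.group(2).split(",")}
            deps[child.group(1)] = parents
    return deps


def execution_order(deps):
    """Return the stages in an order where every dependency runs first."""
    return list(TopologicalSorter(deps).static_order())


stage_deps = parse_stage_dependencies(EXPLAIN_DEPENDENCIES)
print(execution_order(stage_deps))
```

For this query the graph is a simple chain: the two Map Reduce stages run first, and the Fetch Operator stage (Stage-0) runs last.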

# authorization
0: jdbc:hive2://> explain formatted authorization  select * from sort_test sort by id desc limit 10;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--+
|                                                                                                               Explain                                                                                                               |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--+
| {"CURRENT_USER":"root","OPERATION":"SWITCHDATABASE","INPUTS":["[email protected]_test"],"OUTPUTS":["hdfs://master:9000/tmp/hive/root/fac1e10c-babb-4927-886e-411b3e9190fb/hive_2018-10-18_11-04-47_534_1155924552647075339-1/-mr-10000"]}  |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--+
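Because the FORMATTED variant returns a single JSON object, it can be consumed by a script. The sketch below is an illustration only: `summarize_authorization` is a hypothetical helper, the sample string is a made-up payload that reuses the keys seen above (CURRENT_USER, OPERATION, INPUTS, OUTPUTS), and the database name, table name, and scratch path in it are placeholders. It also assumes that each INPUTS entry has the form database@table.

```python
import json

# Hypothetical sample shaped like the output above; the names in INPUTS
# and the path in OUTPUTS are placeholders, not values from a real run.
SAMPLE = (
    '{"CURRENT_USER":"root","OPERATION":"SWITCHDATABASE",'
    '"INPUTS":["example_db@example_table"],'
    '"OUTPUTS":["hdfs://master:9000/tmp/hive/root/example-scratch-dir/-mr-10000"]}'
)


def summarize_authorization(explain_json):
    """Extract the user, operation, inputs and outputs from the JSON text."""
    info = json.loads(explain_json)
    inputs = []
    for entry in info.get("INPUTS", []):
        # Assumption: entries look like "database@table".
        database, _, table = entry.partition("@")
        inputs.append((database, table))
    return {
        "user": info.get("CURRENT_USER"),
        "operation": info.get("OPERATION"),
        "inputs": inputs,
        "outputs": info.get("OUTPUTS", []),
    }


summary = summarize_authorization(SAMPLE)
print(summary["user"], summary["operation"], summary["inputs"])
```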
