
A look at ON vs. WHERE under Hive joins

This article was written by the cnblogs author 一寸HUI. Personal blog: https://www.cnblogs.com/zsql/

If you came to Hive after learning MySQL's logical execution order (from -> on -> join -> where -> group by -> having -> select -> distinct -> order by -> limit), you may have some wrong ideas about how ON and WHERE behave in Hive. There are blog posts about this online, but I wanted to verify it myself. This article compares ON and WHERE under inner join, left/right join and full join, with and without partition predicates. Figuring this out really helps when writing HQL: queries can be written more simply, or run better. Let's start the experiments.

Environment: the tests in this article were run on CDH 6.3.2, with Hive 2.1.1+cdh6.3.2.

1. Test tables and data

1.1 Creating the tables

create table `user`(  -- user table, partitioned
department_id int,
age int,
sex string,
name string
)
PARTITIONED BY (`date` string)
row format delimited
fields terminated by ','
STORED AS TEXTFILE;


create table department(  -- department table
id int,
name string,
count int
)
row format delimited
fields terminated by ','
STORED AS TEXTFILE;

1.2 Data

-- /data/hive/user1.txt
1,34,male,zhangsan
1,31,female,lili
3,14,female,liushen
3,24,female,sasa
4,54,male,liubei
4,36,female,yuji
4,25,male,zhaoyun
8,18,male,zhangfei

-- /data/hive/user2.txt
3,37,male,wangwu
4,38,female,lisi
3,19,female,caocao
2,22,female,guanyu
1,51,male,wzj
6,31,female,zhenji
6,25,male,sunwukong
6,17,male,tangsz

-- /data/hive/department.txt
1,dashuju,8
2,kaifa,9
3,ui,10
4,hr,3
5,shouxiao,12
6,zongjian,3

1.3 Loading the data

load data local inpath '/data/hive/user1.txt'  into table `user` partition (`date`='2020-12-24');
load data local inpath '/data/hive/user2.txt'  into table `user` partition (`date`='2020-12-25');
load data local inpath '/data/hive/department.txt'  into table `department`;

1.4 Querying the data

SELECT * from `user`;

SELECT * from department;

 

1.5 Analyzing the tables

ANALYZE TABLE `user` partition(`date`='2020-12-24') COMPUTE STATISTICS;
ANALYZE TABLE `user` partition(`date`='2020-12-25') COMPUTE STATISTICS;
ANALYZE TABLE department COMPUTE STATISTICS;

Without the step above, the experiments below run into trouble: the statistics are not real and the effect cannot be seen. So run ANALYZE on the tables before EXPLAIN to get accurate numbers (I just fell into this pit: every EXPLAIN showed only 1 row because the statistics were wrong).
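If you want to double-check that the statistics were actually collected, Hive keeps them in the table/partition parameters. A quick way to look (standard Hive commands; the exact parameter names such as numRows, rawDataSize and totalSize may vary slightly by version) is:

DESCRIBE FORMATTED `user` partition(`date`='2020-12-24');
-- look for numRows / rawDataSize / totalSize in the "Partition Parameters" section
DESCRIBE FORMATTED department;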

2. ON and WHERE under inner join

2.1 Without partition filtering

1. First, the result of an inner join with no extra conditions

SELECT  * from `user` u  inner join  department d  on d.id=u.department_id;

View the execution plan:

explain SELECT  * from `user` u  inner join  department d  on d.id=u.department_id;
STAGE DEPENDENCIES:
  Stage-4 is a root stage  
  Stage-3 depends on stages: Stage-4
  Stage-0 depends on stages: Stage-3

STAGE PLANS:
  Stage: Stage-4
    Map Reduce Local Work
      Alias -> Map Local Tables:
        d 
          Fetch Operator
            limit: -1
      Alias -> Map Local Operator Tree:
        d 
          TableScan
            alias: d
            filterExpr: id is not null (type: boolean)
            Statistics: Num rows: 6 Data size: 58 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: id is not null (type: boolean)
              Statistics: Num rows: 6 Data size: 58 Basic stats: COMPLETE Column stats: NONE
              HashTable Sink Operator
                keys:
                  0 department_id (type: int)
                  1 id (type: int)

  Stage: Stage-3
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: u
            filterExpr: department_id is not null (type: boolean)
            Statistics: Num rows: 16 Data size: 3214 Basic stats: COMPLETE Column stats: PARTIAL
            Filter Operator
              predicate: department_id is not null (type: boolean)
              Statistics: Num rows: 16 Data size: 2944 Basic stats: COMPLETE Column stats: PARTIAL
              Map Join Operator
                condition map:
                     Inner Join 0 to 1
                keys:
                  0 department_id (type: int)
                  1 id (type: int)
                outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col8, _col9, _col10
                Statistics: Num rows: 17 Data size: 3238 Basic stats: COMPLETE Column stats: NONE
                Select Operator
                  expressions: _col0 (type: int), _col1 (type: int), _col2 (type: string), _col3 (type: string), _col4 (type: string), _col8 (type: int), _col9 (type: string), _col10 (type: int)
                  outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7
                  Statistics: Num rows: 17 Data size: 3238 Basic stats: COMPLETE Column stats: NONE
                  File Output Operator
                    compressed: false
                    Statistics: Num rows: 17 Data size: 3238 Basic stats: COMPLETE Column stats: NONE
                    table:
                        input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
      Local Work:
        Map Reduce Local Work

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

The statement above does no condition filtering at all and uses no partition:

  • First, table d (department) is scanned in full: 6 rows. A default filter id is not null is then applied to those 6 rows (an automatic optimization; if rows with a NULL join key need to be kept, they must be handled specially before the join, otherwise they are optimized away by default and data is lost; see the sketch right after these bullets). After filtering, 6 rows remain and take part in the inner join.
  • Then table u (user) is scanned in full: 16 rows. The same default filter department_id is not null is added, and 16 rows end up taking part in the inner join.
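In other words, under an inner join the optimizer implicitly filters NULL join keys on both sides, so the query above should behave the same as writing those filters out by hand (this rewrite is mine, for illustration only):

SELECT * from `user` u inner join department d on d.id=u.department_id
where u.department_id is not null and d.id is not null;  -- rows with a NULL join key can never match anyway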

 

2. Next, the result with both WHERE and ON conditions

SELECT  * from `user` u inner join  department d on d.id=u.department_id
and d.count > 9
and u.age > 20
where u.age < 30;

Now look at the execution plan:

STAGE DEPENDENCIES:
  Stage-4 is a root stage
  Stage-3 depends on stages: Stage-4
  Stage-0 depends on stages: Stage-3

STAGE PLANS:
  Stage: Stage-4
    Map Reduce Local Work
      Alias -> Map Local Tables:
        d 
          Fetch Operator
            limit: -1
      Alias -> Map Local Operator Tree:
        d 
          TableScan
            alias: d
            filterExpr: ((count > 9) and id is not null) (type: boolean)
            Statistics: Num rows: 6 Data size: 58 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: ((count > 9) and id is not null) (type: boolean)
              Statistics: Num rows: 2 Data size: 19 Basic stats: COMPLETE Column stats: NONE
              HashTable Sink Operator
                keys:
                  0 department_id (type: int)
                  1 id (type: int)

  Stage: Stage-3
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: u
            filterExpr: ((age > 20) and department_id is not null and (age < 30)) (type: boolean)
            Statistics: Num rows: 16 Data size: 3214 Basic stats: COMPLETE Column stats: PARTIAL
            Filter Operator
              predicate: ((age > 20) and department_id is not null and (age < 30)) (type: boolean)
              Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: PARTIAL
              Map Join Operator
                condition map:
                     Inner Join 0 to 1
                keys:
                  0 department_id (type: int)
                  1 id (type: int)
                outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col8, _col9, _col10
                Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE Column stats: NONE
                Select Operator
                  expressions: _col0 (type: int), _col1 (type: int), _col2 (type: string), _col3 (type: string), _col4 (type: string), _col8 (type: int), _col9 (type: string), _col10 (type: int)
                  outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7
                  Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE Column stats: NONE
                  File Output Operator
                    compressed: false
                    Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE Column stats: NONE
                    table:
                        input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
      Local Work:
        Map Reduce Local Work

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

The result is shown above:

  • First, table d (department) is scanned: all 6 rows, then filtered with ((count > 9) and id is not null), leaving 2 rows to take part in the inner join.
  • Then table u (user) is scanned: all 16 rows, then filtered with ((age > 20) and department_id is not null and (age < 30)), leaving 1 row for the inner join (this estimate is off; there should actually be 4 rows left. Hive's explain output is a rough estimate of the execution, not an exact count).

Mini-summary: without partition filtering, an inner join scans both tables in full, automatically adds an is not null filter on the join key (on d.id=u.department_id), and then applies the ON and WHERE conditions together. Under an inner join, WHERE and ON filter at the same time.
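Since ON and WHERE filter at the same time under an inner join, it should make no difference which clause the extra predicates go in; the two forms below are expected to produce the same plan and the same result (both queries are mine, written only to illustrate the point):

-- predicates in ON
SELECT * from `user` u inner join department d
on d.id=u.department_id and d.count > 9 and u.age > 20;

-- predicates in WHERE
SELECT * from `user` u inner join department d
on d.id=u.department_id
where d.count > 9 and u.age > 20;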

2.2 With partition filtering

1. First, a query with only a partition filter

SELECT  * from `user` u inner join  department d
on d.id=u.department_id
where u.`date`='2020-12-25';

View the execution plan:

STAGE DEPENDENCIES:
  Stage-4 is a root stage
  Stage-3 depends on stages: Stage-4
  Stage-0 depends on stages: Stage-3

STAGE PLANS:
  Stage: Stage-4
    Map Reduce Local Work
      Alias -> Map Local Tables:
        d 
          Fetch Operator
            limit: -1
      Alias -> Map Local Operator Tree:
        d 
          TableScan
            alias: d
            filterExpr: id is not null (type: boolean)
            Statistics: Num rows: 6 Data size: 58 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: id is not null (type: boolean)
              Statistics: Num rows: 6 Data size: 58 Basic stats: COMPLETE Column stats: NONE
              HashTable Sink Operator
                keys:
                  0 department_id (type: int)
                  1 id (type: int)

  Stage: Stage-3
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: u
            filterExpr: (department_id is not null and (date = '2020-12-25')) (type: boolean)
            Statistics: Num rows: 8 Data size: 134 Basic stats: COMPLETE Column stats: NONE # only 8 rows here, because a single partition is scanned
            Filter Operator
              predicate: department_id is not null (type: boolean)
              Statistics: Num rows: 8 Data size: 134 Basic stats: COMPLETE Column stats: NONE 
              Map Join Operator
                condition map:
                     Inner Join 0 to 1
                keys:
                  0 department_id (type: int)
                  1 id (type: int)
                outputColumnNames: _col0, _col1, _col2, _col3, _col8, _col9, _col10
                Statistics: Num rows: 8 Data size: 147 Basic stats: COMPLETE Column stats: NONE
                Select Operator
                  expressions: _col0 (type: int), _col1 (type: int), _col2 (type: string), _col3 (type: string), '2020-12-25' (type: string), _col8 (type: int), _col9 (type: string), _col10 (type: int)
                  outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7
                  Statistics: Num rows: 8 Data size: 147 Basic stats: COMPLETE Column stats: NONE
                  File Output Operator
                    compressed: false
                    Statistics: Num rows: 8 Data size: 147 Basic stats: COMPLETE Column stats: NONE
                    table:
                        input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
      Local Work:
        Map Reduce Local Work

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

As shown above, compared with the case without a partition predicate, the difference is that table u (user) is no longer scanned in full, which improves efficiency: each partition is stored as its own directory, so after partition pruning only the specified partition is scanned instead of the whole table. In other words, a partitioned table is pruned by partition first, the selected partition is then scanned in full, and only after that are the ON and WHERE conditions and the automatically added is not null condition applied; the rows that survive take part in the inner join.
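Because each partition is just a directory under the table's storage location, pruning simply limits which directories get read. You can list the partitions created by the load statements above with SHOW PARTITIONS (the directory paths in the comments assume the default warehouse location and are only illustrative):

SHOW PARTITIONS `user`;
-- date=2020-12-24   e.g. .../warehouse/user/date=2020-12-24
-- date=2020-12-25   e.g. .../warehouse/user/date=2020-12-25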

2. Now a query with both ordinary conditions and a partition filter

SELECT  * from `user` u inner join  department d
on d.id=u.department_id
and d.count > 9
and u.age > 20
where u.age < 30  and u.`date`='2020-12-24';

The execution plan:

STAGE DEPENDENCIES:
  Stage-4 is a root stage
  Stage-3 depends on stages: Stage-4
  Stage-0 depends on stages: Stage-3

STAGE PLANS:
  Stage: Stage-4
    Map Reduce Local Work
      Alias -> Map Local Tables:
        d 
          Fetch Operator
            limit: -1
      Alias -> Map Local Operator Tree:
        d 
          TableScan
            alias: d
            filterExpr: ((count > 9) and id is not null) (type: boolean)
            Statistics: Num rows: 6 Data size: 58 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: ((count > 9) and id is not null) (type: boolean)
              Statistics: Num rows: 2 Data size: 19 Basic stats: COMPLETE Column stats: NONE
              HashTable Sink Operator
                keys:
                  0 department_id (type: int)
                  1 id (type: int)

  Stage: Stage-3
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: u
            filterExpr: ((age > 20) and department_id is not null and (age < 30)) (type: boolean)
            Statistics: Num rows: 8 Data size: 136 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: ((age > 20) and department_id is not null and (age < 30)) (type: boolean)
              Statistics: Num rows: 1 Data size: 17 Basic stats: COMPLETE Column stats: NONE
              Map Join Operator
                condition map:
                     Inner Join 0 to 1
                keys:
                  0 department_id (type: int)
                  1 id (type: int)
                outputColumnNames: _col0, _col1, _col2, _col3, _col8, _col9, _col10
                Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE Column stats: NONE
                Select Operator
                  expressions: _col0 (type: int), _col1 (type: int), _col2 (type: string), _col3 (type: string), '2020-12-24' (type: string), _col8 (type: int), _col9 (type: string), _col10 (type: int)
                  outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7
                  Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE Column stats: NONE
                  File Output Operator
                    compressed: false
                    Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE Column stats: NONE
                    table:
                        input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
      Local Work:
        Map Reduce Local Work

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

As shown above, comparing with the earlier case (no partition filter, but ON and WHERE conditions), the only difference partition filtering makes is whether the table scan reads the whole table or just the specified partition: without a partition predicate the whole table is scanned, otherwise only the given partition.

2.3 Summary of ON and WHERE under inner join

Under an inner join, if the WHERE clause contains a partition predicate, only that partition is scanned; the WHERE and ON conditions are then applied, along with the automatically added is not null filter on the join keys (on d.id=u.department_id). Note that if rows whose join key is NULL must be kept, the join key needs special handling, otherwise those rows are filtered out and data is lost. ON and WHERE filter at the same time here; there is no ordering between them.
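If user rows whose department_id is NULL must not be lost, one option (my own sketch, not taken from the plans above; the dept_* aliases are made up for the example) is to union them back in after the inner join:

-- keep the users an inner join would silently drop because department_id is NULL
SELECT u.*, d.id as dept_id, d.name as dept_name, d.count as dept_count
from `user` u inner join department d on d.id = u.department_id
UNION ALL
SELECT u.*, cast(null as int) as dept_id, cast(null as string) as dept_name, cast(null as int) as dept_count
from `user` u
where u.department_id is null;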

3. ON and WHERE under left/right join

Since left join and right join behave the same way, this article only experiments with left join.

3.1 Filtering the secondary table in ON vs. WHERE

First, the result of a statement that filters the secondary table in the ON clause (convention: u (main table) left join d (secondary table))

SELECT * from `user` u
left join department d
on d.id=u.department_id
and d.count > 9

View the result:

Then look at the execution plan:

STAGE DEPENDENCIES:
  Stage-4 is a root stage
  Stage-3 depends on stages: Stage-4
  Stage-0 depends on stages: Stage-3

STAGE PLANS:
  Stage: Stage-4
    Map Reduce Local Work
      Alias -> Map Local Tables:
        d 
          Fetch Operator
            limit: -1
      Alias -> Map Local Operator Tree:
        d 
          TableScan
            alias: d
            filterExpr: (count > 9) (type: boolean)
            Statistics: Num rows: 6 Data size: 58 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: (count > 9) (type: boolean)
              Statistics: Num rows: 2 Data size: 19 Basic stats: COMPLETE Column stats: NONE
              HashTable Sink Operator
                keys:
                  0 department_id (type: int)
                  1 id (type: int)

  Stage: Stage-3
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: u
            Statistics: Num rows: 16 Data size: 3214 Basic stats: COMPLETE Column stats: PARTIAL
            Map Join Operator
              condition map:
                   Left Outer Join0 to 1
              keys:
                0 department_id (type: int)
                1 id (type: int)
              outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col8, _col9, _col10
              Statistics: Num rows: 17 Data size: 3535 Basic stats: COMPLETE Column stats: NONE
              Select Operator
                expressions: _col0 (type: int), _col1 (type: int), _col2 (type: string), _col3 (type: string), _col4 (type: string), _col8 (type: int), _col9 (type: string), _col10 (type: int)
                outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7
                Statistics: Num rows: 17 Data size: 3535 Basic stats: COMPLETE Column stats: NONE
                File Output Operator
                  compressed: false
                  Statistics: Num rows: 17 Data size: 3535 Basic stats: COMPLETE Column stats: NONE
                  table:
                      input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                      output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                      serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
      Local Work:
        Map Reduce Local Work

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

As shown above, under a left join (convention: u (main table) left join d (secondary table)), the condition d.count > 9 on the secondary table in the ON clause does take effect: all 6 rows of d are scanned, the filter leaves 2 rows, and those take part in the left join, while the main table is scanned in full and joined. Note that under a left join neither join key (on d.id=u.department_id) gets the automatic is not null filter, so be careful about NULL join keys when doing a left join; the NULL-key case can be optimized.

 

Now the result of a statement that filters the secondary table in the WHERE clause (convention: u (main table) left join d (secondary table))

SELECT  * from `user` u
left join  department d
on d.id=u.department_id
where d.count > 9

The result is as follows (it differs from putting the condition on the secondary table in the ON clause):

The execution plan:

STAGE DEPENDENCIES:
  Stage-4 is a root stage
  Stage-3 depends on stages: Stage-4
  Stage-0 depends on stages: Stage-3

STAGE PLANS:
  Stage: Stage-4
    Map Reduce Local Work
      Alias -> Map Local Tables:
        d 
          Fetch Operator
            limit: -1
      Alias -> Map Local Operator Tree:
        d 
          TableScan
            alias: d
            Statistics: Num rows: 6 Data size: 58 Basic stats: COMPLETE Column stats: NONE
            HashTable Sink Operator
              keys:
                0 department_id (type: int)
                1 id (type: int)

  Stage: Stage-3
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: u
            Statistics: Num rows: 16 Data size: 3214 Basic stats: COMPLETE Column stats: PARTIAL
            Map Join Operator
              condition map:
                   Left Outer Join0 to 1
              keys:
                0 department_id (type: int)
                1 id (type: int)
              outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col8, _col9, _col10
              Statistics: Num rows: 17 Data size: 3535 Basic stats: COMPLETE Column stats: NONE
              Filter Operator
                predicate: (_col10 > 9) (type: boolean)
                Statistics: Num rows: 5 Data size: 1039 Basic stats: COMPLETE Column stats: NONE
                Select Operator
                  expressions: _col0 (type: int), _col1 (type: int), _col2 (type: string), _col3 (type: string), _col4 (type: string), _col8 (type: int), _col9 (type: string), _col10 (type: int)
                  outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7
                  Statistics: Num rows: 5 Data size: 1039 Basic stats: COMPLETE Column stats: NONE
                  File Output Operator
                    compressed: false
                    Statistics: Num rows: 5 Data size: 1039 Basic stats: COMPLETE Column stats: NONE
                    table:
                        input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
      Local Work:
        Map Reduce Local Work

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

From the execution plan above, tables u (user) and d (department) are both scanned in full with no filtering at all, the left join is performed over the whole tables, and only after the left join result is obtained does the WHERE condition on the secondary table, d.count > 9, filter the result.

Mini-summary: (left join) filtering the secondary table in ON vs. in WHERE gives different results. A condition in the ON clause filters that table's rows first and then performs the left join; a condition in the WHERE clause performs the left join first and then filters the joined result. A left join does not add a default is not null filter on the join keys.
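Put differently, a condition on the secondary table in ON behaves like pre-filtering that table in a subquery, while the same condition in WHERE filters after the join (and, since NULLs fail the comparison, effectively turns the left join back into an inner join on that column). A sketch of the ON form and its subquery equivalent (the rewrite is mine):

-- condition in ON: every user row is kept; only departments with count > 9 can match
SELECT * from `user` u
left join department d
on d.id = u.department_id and d.count > 9;

-- should be equivalent: pre-filter the secondary table in a subquery
SELECT * from `user` u
left join (select * from department where count > 9) d
on d.id = u.department_id;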

3.2 Filtering the main table in ON vs. WHERE

First, the result of a statement that filters the main table in the ON clause (convention: u (main table) left join d (secondary table))

SELECT  * from `user` u
left join  department d
on d.id=u.department_id
and u.age > 20

Looking at the result above, all 16 rows (the entire user table) are still kept, but the rows with age <= 20 behave as if they did not take part in the left join: all the columns coming from d are NULL.

Now the execution plan:

STAGE DEPENDENCIES:
  Stage-4 is a root stage
  Stage-3 depends on stages: Stage-4
  Stage-0 depends on stages: Stage-3

STAGE PLANS:
  Stage: Stage-4
    Map Reduce Local Work
      Alias -> Map Local Tables:
        d 
          Fetch Operator
            limit: -1
      Alias -> Map Local Operator Tree:
        d 
          TableScan
            alias: d
            Statistics: Num rows: 6 Data size: 58 Basic stats: COMPLETE Column stats: NONE
            HashTable Sink Operator
              filter predicates:
                0 {(age > 20)}
                1 
              keys:
                0 department_id (type: int)
                1 id (type: int)

  Stage: Stage-3
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: u
            Statistics: Num rows: 16 Data size: 3214 Basic stats: COMPLETE Column stats: PARTIAL
            Map Join Operator
              condition map:
                   Left Outer Join0 to 1
              filter predicates:
                0 {(age > 20)}
                1 
              keys:
                0 department_id (type: int)
                1 id (type: int)
              outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col8, _col9, _col10
              Statistics: Num rows: 17 Data size: 3535 Basic stats: COMPLETE Column stats: NONE
              Select Operator
                expressions: _col0 (type: int), _col1 (type: int), _col2 (type: string), _col3 (type: string), _col4 (type: string), _col8 (type: int), _col9 (type: string), _col10 (type: int)
                outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7
                Statistics: Num rows: 17 Data size: 3535 Basic stats: COMPLETE Column stats: NONE
                File Output Operator
                  compressed: false
                  Statistics: Num rows: 17 Data size: 3535 Basic stats: COMPLETE Column stats: NONE
                  table:
                      input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                      output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                      serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
      Local Work:
        Map Reduce Local Work

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

As shown above: while processing d (the department table), all 6 rows are scanned and the condition age > 20 is attached to the join as a filter predicate; u (user) is then scanned in full and the whole table is left-joined. During the join, the condition age > 20 decides whether the columns coming from d are kept for a row: if a user row has age <= 20, every column coming from d is set to NULL. (This is my own interpretation and may not be entirely accurate. In short: the condition is recorded as a marker first, the left join runs over the full table, and rows that fail the condition keep the main-table data while the secondary-table data is discarded and set to NULL.) A condition on the main table in the ON clause therefore does not change the number of rows coming from the main table; the filtering happens during the left join itself. It is a bit confusing, so compare it against the result yourself.

 

Now the result of a statement that filters the main table in the WHERE clause (convention: u (main table) left join d (secondary table))

SELECT  * from `user` u
left join  department d
on d.id=u.department_id
where u.age > 20

Next, the execution plan:

STAGE DEPENDENCIES:
  Stage-4 is a root stage
  Stage-3 depends on stages: Stage-4
  Stage-0 depends on stages: Stage-3

STAGE PLANS:
  Stage: Stage-4
    Map Reduce Local Work
      Alias -> Map Local Tables:
        d 
          Fetch Operator
            limit: -1
      Alias -> Map Local Operator Tree:
        d 
          TableScan
            alias: d
            Statistics: Num rows: 6 Data size: 58 Basic stats: COMPLETE Column stats: NONE
            HashTable Sink Operator
              keys:
                0 department_id (type: int)
                1 id (type: int)

  Stage: Stage-3
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: u
            filterExpr: (age > 20) (type: boolean)
            Statistics: Num rows: 16 Data size: 3214 Basic stats: COMPLETE Column stats: PARTIAL
            Filter Operator
              predicate: (age > 20) (type: boolean)
              Statistics: Num rows: 5 Data size: 920 Basic stats: COMPLETE Column stats: PARTIAL
              Map Join Operator
                condition map:
                     Left Outer Join0 to 1
                keys:
                  0 department_id (type: int)
                  1 id (type: int)
                outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col8, _col9, _col10
                Statistics: Num rows: 6 Data size: 63 Basic stats: COMPLETE Column stats: NONE
                Select Operator
                  expressions: _col0 (type: int), _col1 (type: int), _col2 (type: string), _col3 (type: string), _col4 (type: string), _col8 (type: int), _col9 (type: string), _col10 (type: int)
                  outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7
                  Statistics: Num rows: 6 Data size: 63 Basic stats: COMPLETE Column stats: NONE
                  File Output Operator
                    compressed: false
                    Statistics: Num rows: 6 Data size: 63 Basic stats: COMPLETE Column stats: NONE
                    table:
                        input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
      Local Work:
        Map Reduce Local Work

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

As shown above, it is clear that when the main table is filtered in WHERE, the main table's rows are filtered first and the left join happens afterwards: 16 rows are scanned from the main table, 5 remain after the filter, and the left join then produces the final result.

Mini-summary: (left join) filtering the main table in ON vs. in WHERE gives different results. With WHERE, the main table is filtered first and then left-joined. With ON, the condition is first marked as a filter predicate on the secondary table's side, and then, when the full table is left-joined, rows that fail the condition have the data coming from the secondary table set to NULL.
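So when the intention is to drop main-table rows, the condition belongs in WHERE (or in a subquery that filters the main table first); the two forms below are expected to match (the subquery rewrite is mine):

-- WHERE on the main table: user rows are filtered before the left join
SELECT * from `user` u
left join department d on d.id = u.department_id
where u.age > 20;

-- should be equivalent: filter the main table in a subquery first
SELECT * from (select * from `user` where age > 20) u
left join department d on d.id = u.department_id;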

3.3 left/right join with partition filtering

Consider the following statement:

SELECT  * from `user` u
left join  department d
on d.id=u.department_id
where u.age > 20 and u.`date` = '2020-12-24';

Result:

The execution plan:

STAGE DEPENDENCIES:
  Stage-4 is a root stage
  Stage-3 depends on stages: Stage-4
  Stage-0 depends on stages: Stage-3

STAGE PLANS:
  Stage: Stage-4
    Map Reduce Local Work
      Alias -> Map Local Tables:
        d 
          Fetch Operator
            limit: -1
      Alias -> Map Local Operator Tree:
        d 
          TableScan
            alias: d
            Statistics: Num rows: 6 Data size: 58 Basic stats: COMPLETE Column stats: NONE
            HashTable Sink Operator
              keys:
                0 department_id (type: int)
                1 id (type: int)

  Stage: Stage-3
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: u
            filterExpr: ((age > 20) and (date = '2020-12-24')) (type: boolean)
            Statistics: Num rows: 8 Data size: 136 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: (age > 20) (type: boolean)
              Statistics: Num rows: 2 Data size: 34 Basic stats: COMPLETE Column stats: NONE
              Map Join Operator
                condition map:
                     Left Outer Join0 to 1
                keys:
                  0 department_id (type: int)
                  1 id (type: int)
                outputColumnNames: _col0, _col1, _col2, _col3, _col8, _col9, _col10
                Statistics: Num rows: 6 Data size: 63 Basic stats: COMPLETE Column stats: NONE
                Select Operator
                  expressions: _col0 (type: int), _col1 (type: int), _col2 (type: string), _col3 (type: string), '2020-12-24' (type: string), _col8 (type: int), _col9 (type: string), _col10 (type: int)
                  outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7
                  Statistics: Num rows: 6 Data size: 63 Basic stats: COMPLETE Column stats: NONE
                  File Output Operator
                    compressed: false
                    Statistics: Num rows: 6 Data size: 63 Basic stats: COMPLETE Column stats: NONE
                    table:
                        input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
      Local Work:
        Map Reduce Local Work

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

From the execution plan above, partition filtering prevents a full table scan: only one partition is scanned here, so there are only 8 rows.

3.4 Summary of ON and WHERE under left/right join

  1. Filtering on the partition condition prevents a full table scan and is applied before everything else.
  2. Filtering the main table in ON vs. WHERE gives different results: with WHERE, the main table is filtered first and then left-joined; with ON, the condition is only marked, and when the full table is left-joined, rows that fail it have the columns coming from the secondary table set to NULL.
  3. Filtering the secondary table in ON vs. WHERE gives different results: a condition in ON filters that table first and then left-joins; a condition in WHERE left-joins first and then filters the joined result.
  4. left/right join does not automatically add an is not null filter on the join keys, so watch out for NULL join keys when doing a left/right join; this is something that can be optimized (see the sketch after this list).
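One common optimization when the left table contains many NULL join keys: they can never match, but in an ordinary reduce-side join they all hash to the same reducer and can cause skew, so scatter them with a value that is guaranteed not to match. The sketch below is mine, written purely as an illustration of the idea:

-- spread NULL department_id values across reducers; a negative random key never
-- matches department.id, so the join result is unchanged
SELECT *
from `user` u
left join department d
on d.id = case when u.department_id is null
               then cast(-(rand() * 100000) as int)
               else u.department_id end;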

4. ON and WHERE under full join

4.1 full join with no filter conditions

Start with a full join that has no conditions at all

SELECT  * from `user` u
full join  department d
on d.id=u.department_id

View the execution plan:

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: u
            Statistics: Num rows: 16 Data size: 3214 Basic stats: COMPLETE Column stats: PARTIAL
            Reduce Output Operator
              key expressions: department_id (type: int)
              sort order: +
              Map-reduce partition columns: department_id (type: int)
              Statistics: Num rows: 16 Data size: 3214 Basic stats: COMPLETE Column stats: PARTIAL
              value expressions: age (type: int), sex (type: string), name (type: string), date (type: string)
          TableScan
            alias: d
            Statistics: Num rows: 6 Data size: 58 Basic stats: COMPLETE Column stats: NONE
            Reduce Output Operator
              key expressions: id (type: int)
              sort order: +
              Map-reduce partition columns: id (type: int)
              Statistics: Num rows: 6 Data size: 58 Basic stats: COMPLETE Column stats: NONE
              value expressions: name (type: string), count (type: int)
      Reduce Operator Tree:
        Join Operator
          condition map:
               Outer Join 0 to 1
          keys:
            0 department_id (type: int)
            1 id (type: int)
          outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col8, _col9, _col10
          Statistics: Num rows: 17 Data size: 3535 Basic stats: COMPLETE Column stats: NONE
          Select Operator
            expressions: _col0 (type: int), _col1 (type: int), _col2 (type: string), _col3 (type: string), _col4 (type: string), _col8 (type: int), _col9 (type: string), _col10 (type: int)
            outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7
            Statistics: Num rows: 17 Data size: 3535 Basic stats: COMPLETE Column stats: NONE
            File Output Operator
              compressed: false
              Statistics: Num rows: 17 Data size: 3535 Basic stats: COMPLETE Column stats: NONE
              table:
                  input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

As the execution plan above shows, each table is sorted ascending on its join key, no automatic optimization is applied (no NULL filter is added), and the full join runs over both whole tables.

4.2 full join with a WHERE condition

 SELECT  * from `user` u
full join  department d
on d.id=u.department_id
where u.age > 20 and  d.count > 9

The result is as follows

View the execution plan:

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: u
            Statistics: Num rows: 16 Data size: 3214 Basic stats: COMPLETE Column stats: PARTIAL
            Reduce Output Operator
              key expressions: department_id (type: int)
              sort order: +
              Map-reduce partition columns: department_id (type: int)
              Statistics: Num rows: 16 Data size: 3214 Basic stats: COMPLETE Column stats: PARTIAL
              value expressions: age (type: int), sex (type: string), name (type: string), date (type: string)
          TableScan
            alias: d
            Statistics: Num rows: 6 Data size: 58 Basic stats: COMPLETE Column stats: NONE
            Reduce Output Operator
              key expressions: id (type: int)
              sort order: +
              Map-reduce partition columns: id (type: int)
              Statistics: Num rows: 6 Data size: 58 Basic stats: COMPLETE Column stats: NONE
              value expressions: name (type: string), count (type: int)
      Reduce Operator Tree:
        Join Operator
          condition map:
               Outer Join 0 to 1
          keys:
            0 department_id (type: int)
            1 id (type: int)
          outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col8, _col9, _col10
          Statistics: Num rows: 17 Data size: 3535 Basic stats: COMPLETE Column stats: NONE
          Filter Operator
            predicate: ((_col10 > 9) and (_col1 > 20)) (type: boolean)
            Statistics: Num rows: 1 Data size: 207 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: _col0 (type: int), _col1 (type: int), _col2 (type: string), _col3 (type: string), _col4 (type: string), _col8 (type: int), _col9 (type: string), _col10 (type: int)
              outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7
              Statistics: Num rows: 1 Data size: 207 Basic stats: COMPLETE Column stats: NONE
              File Output Operator
                compressed: false
                Statistics: Num rows: 1 Data size: 207 Basic stats: COMPLETE Column stats: NONE
                table:
                    input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                    output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                    serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

The execution plan shows that with a WHERE filter under a full join, both tables are scanned in full first, the full join is performed, and only after the full join result is obtained are the WHERE conditions applied.

4.3 full join with ON conditions (an open question)

SELECT  * from `user` u
full join  department d
on d.id=u.department_id
and u.age > 20 and  d.count > 9

The result above may be a bit surprising (my ability is limited; readers who understand it better are welcome to explain). My understanding is that, just like ON conditions on the main table of a left join, the filtering happens during the full join itself: all rows of both tables are kept, and only the rows for which both conditions hold come out without NULL values. (If you are not sure how ON behaves under a full join, prefer putting the condition in WHERE.)

View the execution plan:

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: u
            Statistics: Num rows: 16 Data size: 3214 Basic stats: COMPLETE Column stats: PARTIAL
            Reduce Output Operator
              key expressions: department_id (type: int)
              sort order: +
              Map-reduce partition columns: department_id (type: int)
              Statistics: Num rows: 16 Data size: 3214 Basic stats: COMPLETE Column stats: PARTIAL
              value expressions: age (type: int), sex (type: string), name (type: string), date (type: string)
          TableScan
            alias: d
            Statistics: Num rows: 6 Data size: 58 Basic stats: COMPLETE Column stats: NONE
            Reduce Output Operator
              key expressions: id (type: int)
              sort order: +
              Map-reduce partition columns: id (type: int)
              Statistics: Num rows: 6 Data size: 58 Basic stats: COMPLETE Column stats: NONE
              value expressions: name (type: string), count (type: int)
      Reduce Operator Tree:
        Join Operator
          condition map:
               Outer Join 0 to 1
          filter predicates:
            0 {(VALUE._col0 > 20)}
            1 {(VALUE._col1 > 9)}
          keys:
            0 department_id (type: int)
            1 id (type: int)
          outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col8, _col9, _col10
          Statistics: Num rows: 17 Data size: 3535 Basic stats: COMPLETE Column stats: NONE
          Select Operator
            expressions: _col0 (type: int), _col1 (type: int), _col2 (type: string), _col3 (type: string), _col4 (type: string), _col8 (type: int), _col9 (type: string), _col10 (type: int)
            outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7
            Statistics: Num rows: 17 Data size: 3535 Basic stats: COMPLETE Column stats: NONE
            File Output Operator
              compressed: false
              Statistics: Num rows: 17 Data size: 3535 Basic stats: COMPLETE Column stats: NONE
              table:
                  input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

This one is still an open question. If any reader has figured it out, please leave a comment below.

4.4 full join with partition filtering

SELECT  * from department d full join `user` u 
on d.id=u.department_id 
where u.`date`= '2020-12-24';

 

View the execution plan:

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: d
            Statistics: Num rows: 6 Data size: 58 Basic stats: COMPLETE Column stats: NONE
            Reduce Output Operator
              key expressions: id (type: int)
              sort order: +
              Map-reduce partition columns: id (type: int)
              Statistics: Num rows: 6 Data size: 58 Basic stats: COMPLETE Column stats: NONE
              value expressions: name (type: string), count (type: int)
          TableScan
            alias: u
            Statistics: Num rows: 16 Data size: 3214 Basic stats: COMPLETE Column stats: PARTIAL
            Reduce Output Operator
              key expressions: department_id (type: int)
              sort order: +
              Map-reduce partition columns: department_id (type: int)
              Statistics: Num rows: 16 Data size: 3214 Basic stats: COMPLETE Column stats: PARTIAL
              value expressions: age (type: int), sex (type: string), name (type: string), date (type: string)
      Reduce Operator Tree:
        Join Operator
          condition map:
               Outer Join 0 to 1
          keys:
            0 id (type: int)
            1 department_id (type: int)
          outputColumnNames: _col0, _col1, _col2, _col6, _col7, _col8, _col9, _col10
          Statistics: Num rows: 17 Data size: 3535 Basic stats: COMPLETE Column stats: NONE
          Filter Operator
            predicate: (_col10 = '2020-12-24') (type: boolean)
            Statistics: Num rows: 8 Data size: 1663 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: _col0 (type: int), _col1 (type: string), _col2 (type: int), _col6 (type: int), _col7 (type: int), _col8 (type: string), _col9 (type: string), '2020-12-24' (type: string)
              outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7
              Statistics: Num rows: 8 Data size: 1663 Basic stats: COMPLETE Column stats: NONE
              File Output Operator
                compressed: false
                Statistics: Num rows: 8 Data size: 1663 Basic stats: COMPLETE Column stats: NONE
                table:
                    input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                    output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                    serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

From the execution plan: under a full join, even with a partition predicate, the full join is performed first and the WHERE condition filters afterwards, so it is recommended to filter in a subquery first and then do the full join.
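A sketch of that rewrite (mine, following the recommendation above): push the partition predicate into a subquery so the partition is pruned before the full join. Note the semantics are not identical: the WHERE form also drops department rows that have no matching user, while the subquery form keeps them.

SELECT *
from department d
full join (select * from `user` where `date` = '2020-12-24') u
on d.id = u.department_id;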

4.5 Summary of ON and WHERE under full join

  1. How conditions in the ON clause behave here is still an open question.
  2. With WHERE, whether or not a partition predicate is used, the full join happens first and the filter runs afterwards, so it is better to filter in a subquery first and then full join.
  3. A full join does not automatically add is not null conditions on the join keys.

5. Overall summary

1. inner join

  • With a partition predicate, an inner join can be filtered by partition first, avoiding a full table scan.
  • An inner join automatically adds an is not null filter on the join keys (on d.id=u.department_id).
  • Under an inner join, the conditions after ON and after WHERE filter at the same time; there is no difference in order.

2. left/right join

  • With a partition predicate, a left/right join avoids a full table scan; the partition filter is applied before everything else.
  • Filtering the main table in ON vs. WHERE gives different results: with WHERE, the main table is filtered first and then left-joined; with ON, the condition is first marked on the secondary table's side, and during the left join over the full table the rows that fail it have the columns coming from the secondary table set to NULL.
  • Filtering the secondary table in ON vs. WHERE gives different results: a condition in ON filters that table first and then left-joins; a condition in WHERE left-joins first and then filters the joined result. So to filter the secondary table, putting the condition in ON saves writing a subquery.
  • A left/right join does not automatically add an is not null filter on the join keys, so watch out for NULL join keys; this is a place where optimization is possible.

3. full join

  • How conditions in the ON clause filter under a full join is still an open question, to be explored.
  • A WHERE condition under a full join, partition predicate or not, filters only after the full join; filtering in a subquery first and then full joining is recommended.
  • No is not null filter is automatically added on the join keys.