Row-oriented vs. column-oriented storage: a comparison
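Advantages of row-oriented storage:
All columns of a row are stored in the same block, so select * from table_name; can return each row directly;
INSERT/UPDATE is relatively easy.
Disadvantages of row-oriented storage:
Values of different data types share a block, so compression is less effective;
A query that needs only some columns, such as select id,name from table_name;, still has to read every column of every row; the unneeded columns cannot be skipped.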
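Advantages of column-oriented storage:
Values of the same data type are stored together in a block, so compression is effective;
Any column can serve as an index: a query reads only the columns it references and skips the rest.
Disadvantages of column-oriented storage:
A full-row query such as select * from table_name; has to reassemble rows from the separate columns;
INSERT/UPDATE is relatively cumbersome.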
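The trade-off above can be tried on a small scale. The following is a minimal sketch, not a benchmark: the table names demo_users_text and demo_users_orc and their columns are hypothetical, and INSERT ... VALUES assumes Hive 0.14 or later.

```sql
-- Hypothetical tables for illustration only; names and columns are assumptions.
-- Row-oriented: each line of the text file holds one complete row.
create table demo_users_text (id int, name string, city string)
stored as textfile;

-- Column-oriented: values are grouped by column inside each ORC stripe.
create table demo_users_orc (id int, name string, city string)
stored as orc;

insert into table demo_users_text values
  (1, 'alice', 'Beijing'),
  (2, 'bob', 'Shanghai');
insert into table demo_users_orc select * from demo_users_text;

-- Needs two of the three columns: the ORC reader can skip the city column,
-- while the text table would still have to read whole lines.
select id, name from demo_users_orc;

-- Needs every column: the text table returns rows as stored,
-- while the ORC table has to reassemble them from its columns.
select * from demo_users_text;
```

With only two rows the difference is not measurable; the point is which columns each query has to touch in each layout.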
create table page_views_orc_zlib
ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t"
STORED AS ORC
TBLPROPERTIES("orc.compress"="ZLIB")
as select * from page_views;
-- ZLIB is the default ORC codec, so the TBLPROPERTIES clause above is optional.
create table page_views_orc_snappy
ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t"
STORED AS ORC
TBLPROPERTIES("orc.compress"="SNAPPY")
as select * from page_views;
create table page_views_parquet
ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t"
STORED AS PARQUET
as select * from page_views;
set parquet.compression=gzip;
create table page_views_parquet_gzip
ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t"
STORED AS PARQUET
as select * from page_views;
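Once the tables exist, their on-disk sizes can be compared from the Hive CLI. This is a minimal sketch: the path /user/hive/warehouse assumes the default warehouse directory and the default database, so adjust it to match the Location reported for your tables.

```sql
-- Shows the table's Location, InputFormat and SerDe.
desc formatted page_views_orc_zlib;

-- Assumption: default warehouse path and default database.
-- Total size of each table's files on HDFS.
dfs -du -s -h /user/hive/warehouse/page_views;
dfs -du -s -h /user/hive/warehouse/page_views_orc_zlib;
dfs -du -s -h /user/hive/warehouse/page_views_orc_snappy;
dfs -du -s -h /user/hive/warehouse/page_views_parquet;
dfs -du -s -h /user/hive/warehouse/page_views_parquet_gzip;
```

Comparing each result with the size of the source table page_views gives the effective compression ratio of each format and codec combination.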
[From @若澤大資料]