Feature Overview | Apache Hudi 0.5.3 Officially Released

### 1. Download Links

* Source download: [Apache Hudi 0.5.3 Source Release](https://downloads.apache.org/hudi/0.5.3/hudi-0.5.3.src.tgz) ([asc](https://downloads.apache.org/hudi/0.5.3/hudi-0.5.3.src.tgz.asc), [sha512](https://downloads.apache.org/hudi/0.5.3/hudi-0.5.3.src.tgz.sha512))
* Jars for the 0.5.3 release: https://repository.apache.org/#nexus-search;quick~hudi

### 2. Migration Guide

* This is a bugfix release, so upgrading from 0.5.2 requires no special migration steps. If you are upgrading from an earlier version "X", read the migration guide of every release between "X" and 0.5.3.
* 0.5.3 is the first release after Hudi graduated, so the version names of all Hudi jars no longer carry "-incubating". Make sure "-incubating" no longer appears anywhere you reference a Hudi version. For example, the hudi-spark-bundle pom dependency looks like this:

```xml
<dependency>
    <groupId>org.apache.hudi</groupId>
    <artifactId>hudi-spark-bundle_2.12</artifactId>
    <version>0.5.3</version>
</dependency>
```

### 3. Key Features

* Hudi now has built-in support for `aliyun OSS` object storage.
* The Embedded Timeline Server is now enabled by default for both delta-streamer and Spark datasource writes. Before this release the feature was experimental. The embedded timeline server caches file listings in the Spark driver and exposes a RESTful interface that Spark writer tasks call, which reduces the number of file-listing operations on every write. This optimization is particularly friendly to cloud object stores.
* "Incremental cleaning" is now enabled by default for both delta-streamer and Spark datasource writes. Before this release the feature was experimental. In steady state, incremental cleaning avoids the expensive step of scanning all partitions and instead uses Hudi metadata to find the files to clean. This optimization is also friendly to cloud object stores.
* Delta-Streamer configuration files can now be placed on a different filesystem from the actual data.
* Hudi Hive Sync now supports tables partitioned by date-type columns.
* Hudi Hive Sync now supports syncing directly through the Hive MetaStore. You only need to set `hoodie.datasource.hive_sync.use_jdbc = false`. The Hive Metastore URI is read implicitly from the environment. For example, when writing through the Spark datasource:

```java
// The receiver is the DataFrame being written (the original snippet calls it `spark`).
// The elided option (…) and the empty values are left exactly as in the original.
spark.write.format("hudi")
    .option(…)
    .option("hoodie.datasource.hive_sync.username", "")
    .option("hoodie.datasource.hive_sync.password", "")
    .option("hoodie.datasource.hive_sync.partition_fields", "")
    .option("hoodie.datasource.hive_sync.database", "")
    .option("hoodie.datasource.hive_sync.table", "")
    // Sync through the Hive MetaStore instead of JDBC
    .option("hoodie.datasource.hive_sync.use_jdbc", "false")
    .mode(SaveMode.Append)
    .save("/path/to/dataset")
```

* Hudi-side changes to support Presto queries on MoR tables.
* Other bug fixes related to writer performance.
* The DataSource Writer now avoids unnecessary loading of data after a write.
* The Hudi Writer now uses Spark parallelism to speed up small-file lookup.

### 4. Acknowledgments

Thanks to the following contributors (in no particular order): @[bhasudha](https://github.com/apache/hudi/commits?author=bhasudha), @[yanghua](https://github.com/apache/hudi/commits?author=yanghua), @[hddong](https://github.com/apache/hudi/commits?author=hddong), @[smarthi](https://github.com/apache/hudi/commits?author=smarthi), @[afilipchik](https://github.com/afilipchik), @[zhedoubushishi](https://github.com/zhedoubushishi), @[umehrot2](https://github.com/umehrot2), @[bvaradar](https://github.com/apache/hudi/commits?author=bvaradar), @[ffcchi](https://github.com/ffcchi), @[bschell](https://github.com/bschell), @[vinothchandar](https://github.com/apache/hudi/commits?author=vinothchandar), @[shenh062326](https://github.com/apache/hudi/commits?author=shenh062326), @[lamber-ken](https://github.com/apache/hudi/commits?author=lamber-ken), @[zhaomin1423](https://github.com/apache/hudi/commits?author=zhaomin1423), @[EdwinGuo](https://github.com/apache/hudi/commits?author=EdwinGuo), @[prashantwason](https://github.com/apache/hudi/commits?author=prashantwason), @[pratyakshsharma](https://github.com/apache/hudi/commits?author=pratyakshsharma), @[dengziming](https://github.com/apache/hudi/commits?author=dengziming), @[AakashPradeep](https://github.com/AakashPradeep), @[Jecarm](https://github.com/apache/hudi/commits?author=Jecarm), @[xushiyan](https://github.com/apache/hudi/commits?author=xushiyan), @[cxzl25](https://github.com/apache/hudi/commits?author=cxzl25), @[garyli1019](https://github.com/apache/hudi/commits?author=garyli1019), @[rolandjohann](https://github.com/apache/hudi/commits?author=rolandjohann), @[nsivabalan](https://github.com/apache/hudi/commits?author=nsivabalan), @[leesf](https://github.com/apache/hudi/commits?author=leesf), @[jfrazee](https://github.com/apache/hudi/commits?author=