A Walkthrough of the New Features in ClickHouse 18.12.13 (2018-09-10)
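ClickHouse is well known for its fast release cadence.
The team recently released version 18.12.13 (2018-09-10),
and the accompanying CHANGELOG is dauntingly long.
To make the new release easier to adopt, this post walks through the changes in detail.
The new-features section is the important part, so it was translated and proofread by hand; the sections after it come from Google Translate.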
New features (in the order of the CHANGELOG on GitHub)
Decimal support
- Added the DECIMAL(digits, scale) data type (Decimal32(scale), Decimal64(scale), Decimal128(scale)). To enable it, use the setting allow_experimental_decimal_type. #2846 #2970 #3008 #3047
SELECT * FROM data_type_families WHERE name LIKE '%De%'

┌─name───────┬─case_insensitive─┬─alias_to─┐
│ Decimal32  │                1 │          │
│ Decimal64  │                1 │          │
│ Decimal128 │                1 │          │
│ Decimal    │                1 │          │
└────────────┴──────────────────┴──────────┘
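To illustrate why a fixed-point type is worth enabling, here is a minimal Python sketch that uses the standard decimal module rather than ClickHouse itself. The quantize step at scale 2 is an assumption that stands in for a Decimal column with two fractional digits; it does not reproduce the server's overflow or rounding rules.

```python
from decimal import Decimal

# Binary floating point cannot represent 0.1 or 0.2 exactly.
float_sum = 0.1 + 0.2

# A fixed-point value with scale 2 keeps exactly two fractional digits.
SCALE_2 = Decimal("0.01")
decimal_sum = (Decimal("0.10") + Decimal("0.20")).quantize(SCALE_2)

print(float_sum == 0.3)                # False
print(decimal_sum == Decimal("0.30"))  # True
```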
New WITH ROLLUP modifier for GROUP BY (alternative syntax: GROUP BY ROLLUP(…))
- New WITH ROLLUP modifier for GROUP BY (alternative syntax: GROUP BY ROLLUP(…)). #2948
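For readers who have not used ROLLUP before, the following pure-Python model sketches what the modifier computes: one set of subtotals for every prefix of the grouping keys, down to a grand total. The rows and column names are hypothetical, and None is used only to mark a collapsed key in this sketch; it says nothing about how the server renders those cells.

```python
from collections import defaultdict

def group_by_rollup(rows, keys, value):
    """Sum `value` for every prefix of `keys`, from all keys down to none."""
    totals = defaultdict(int)
    for row in rows:
        for n in range(len(keys), -1, -1):
            # Keys beyond the current prefix are collapsed into None.
            group = tuple(row[k] for k in keys[:n]) + (None,) * (len(keys) - n)
            totals[group] += row[value]
    return dict(totals)

# Hypothetical data, not taken from the release notes.
rows = [
    {"year": 2018, "month": 8, "hits": 10},
    {"year": 2018, "month": 9, "hits": 5},
    {"year": 2017, "month": 9, "hits": 1},
]
result = group_by_rollup(rows, ["year", "month"], "hits")
for group, total in sorted(result.items(), key=repr):
    print(group, total)
```

The result contains the three detailed groups, one subtotal per year, and one grand total.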
In JOIN queries, * expands to the columns of all tables
- In requests with JOIN, the star character expands to a list of columns in all tables, in compliance with the SQL standard. You can restore the old behavior by setting asterisk_left_columns_only to 1 on the user configuration level. Winter Zhang
JOIN supports table functions (remote/merge/numbers/url)
- Added support for JOIN with table functions. Winter Zhang
Tab completion in the client
- Autocomplete by pressing Tab in clickhouse-client. Sergey Shcherbin
Ctrl+C clears the current input in the client
- Ctrl+C in clickhouse-client clears a query that was entered. #2877
Configurable default JOIN strictness
- Added the join_default_strictness setting (values: '', 'any', 'all'). This allows you to not specify ANY or ALL for JOIN. #2982
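The difference between the two strictness values is easiest to see on a key that matches more than once. The sketch below is a simplified Python model of a LEFT JOIN, not the server's implementation: with "all" every matching right-hand row is emitted, with "any" at most one is. The table contents are hypothetical, and unmatched left rows are returned without right-hand columns here, which is a simplification.

```python
def left_join(left, right, key, strictness):
    """LEFT JOIN two lists of dicts on `key` with 'any' or 'all' strictness."""
    out = []
    for left_row in left:
        matches = [r for r in right if r[key] == left_row[key]]
        if strictness == "any":
            matches = matches[:1]  # keep at most one matching row
        if not matches:
            out.append(dict(left_row))  # unmatched left row is still returned
        for right_row in matches:
            out.append({**left_row, **right_row})
    return out

# Hypothetical tables: id 1 matches twice on the right side.
left = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
right = [{"id": 1, "v": 10}, {"id": 1, "v": 20}]

all_rows = left_join(left, right, "id", "all")
any_rows = left_join(left, right, "id", "any")
print(len(all_rows), len(any_rows))
```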
Server log lines carry the query ID
- Each line of the server log related to query processing shows the query ID. #2482
2018.09.15 23:01:12.934700 [ 275 ] {b7a52046-e852-41c9-9e13-54533917bc56} <Debug> executeQuery: (from xx.xx.80.34:37066, user: user) select * from numbers(100)
2018.09.15 23:01:12.934984 [ 275 ] {b7a52046-e852-41c9-9e13-54533917bc56} <Trace> InterpreterSelectQuery: FetchColumns -> Complete
2018.09.15 23:01:12.935029 [ 86 ] <Trace> SystemLog (system.query_log): Flushing system log
2018.09.15 23:01:12.935036 [ 275 ] {b7a52046-e852-41c9-9e13-54533917bc56} <Debug> executeQuery: Query pipeline:
Expression
 Expression
  Limit
   Numbers
2018.09.15 23:01:12.935391 [ 277 ] {b7a52046-e852-41c9-9e13-54533917bc56} <Trace> ThreadStatus: Thread 277 exited
2018.09.15 23:01:12.935449 [ 87 ] <Trace> SystemLog (system.query_thread_log): Flushing system log
2018.09.15 23:01:12.935517 [ 275 ] {b7a52046-e852-41c9-9e13-54533917bc56} <Information> executeQuery: Read 100 rows, 800.00 B in 0.001 sec., 146631 rows/sec., 1.12 MiB/sec.
2018.09.15 23:01:12.935557 [ 275 ] <Debug> MemoryTracker: Peak memory usage (total): 1.00 MiB.
2018.09.15 23:01:12.935572 [ 275 ] <Information> TCPHandler: Processed in 0.001 sec.
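With the query ID on each line, the server log can be filtered per query. The following Python sketch parses lines in the layout shown above; the regular expression is inferred from this sample only and is an assumption, not an official log format specification.

```python
import re

# Layout inferred from the sample: timestamp, [ thread ], optional {query_id}, <Level>, message.
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}\.\d{2}\.\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"\[ (?P<thread>\d+) \] "
    r"(?:\{(?P<query_id>[0-9a-f-]+)\} )?"
    r"<(?P<level>\w+)> (?P<message>.*)$"
)

def parse_log_line(line):
    """Return the fields of one server log line, or None if it does not match."""
    match = LOG_LINE.match(line)
    return match.groupdict() if match else None

with_id = parse_log_line(
    "2018.09.15 23:01:12.934700 [ 275 ] {b7a52046-e852-41c9-9e13-54533917bc56} "
    "<Debug> executeQuery: (from xx.xx.80.34:37066, user: user) select * from numbers(100)"
)
without_id = parse_log_line(
    "2018.09.15 23:01:12.935029 [ 86 ] <Trace> SystemLog (system.query_log): Flushing system log"
)
print(with_id["query_id"], without_id["query_id"])
```

Lines that belong to background work, such as the SystemLog flush, have no query ID and come back with query_id set to None.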
Server logs can be printed directly in the client
- Now you can get query execution logs in clickhouse-client (use the send_logs_level setting). With distributed query processing, logs are cascaded from all the servers. #2482
SELECT * FROM system.settings WHERE name = 'send_logs_level'

┌─name────────────┬─value─┬─changed─┬─description──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ send_logs_level │ none  │       0 │ Send server text logs with specified minumum level to client. Valid values: 'trace', 'debug', 'info', 'warning', 'error', 'none' │
└─────────────────┴───────┴─────────┴──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

1 rows in set. Elapsed: 0.002 sec.

xx.xx.xx.xx. :) set send_logs_level = 'trace';

SET send_logs_level = 'trace'

Ok.

0 rows in set. Elapsed: 0.001 sec.

xx.xx.xx.xx. :) select * from system.settings where name = 'send_logs_level' ;

SELECT * FROM system.settings WHERE name = 'send_logs_level'

[xx.xx.xx.xx.] 2018.09.15 23:30:51.158294 {dbecff24-3edc-4584-8379-02cb23aedcb2} [ 275 ] <Debug> executeQuery: (from 127.0.0.1:59056, user: user) select * from system.settings where name = 'send_logs_level'
[xx.xx.xx.xx.] 2018.09.15 23:30:51.159056 {dbecff24-3edc-4584-8379-02cb23aedcb2} [ 275 ] <Trace> InterpreterSelectQuery: FetchColumns -> Complete
[xx.xx.xx.xx.] 2018.09.15 23:30:51.159152 {dbecff24-3edc-4584-8379-02cb23aedcb2} [ 275 ] <Debug> executeQuery: Query pipeline:
Expression
 Expression
  Filter
   One

┌─name────────────┬─value─┬─changed─┬─description──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ send_logs_level │ trace │       1 │ Send server text logs with specified minumum level to client. Valid values: 'trace', 'debug', 'info', 'warning', 'error', 'none' │
└─────────────────┴───────┴─────────┴──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

[xx.xx.xx.xx.] 2018.09.15 23:30:51.159654 {dbecff24-3edc-4584-8379-02cb23aedcb2} [ 398 ] <Trace> ThreadStatus: Thread 398 exited
[xx.xx.xx.xx.] 2018.09.15 23:30:51.159803 {dbecff24-3edc-4584-8379-02cb23aedcb2} [ 275 ] <Information> executeQuery: Read 178 rows, 25.15 KiB in 0.001 sec., 127014 rows/sec., 17.53 MiB/sec.

# Logs from distributed queries are also printed to the current terminal.
Changed settings are recorded in query_log
- The system.query_log and system.processes (SHOW PROCESSLIST) tables now have information about all changed settings when you run a query (the nested structure of the Settings data).
- Added the log_query_settings setting. #2482
SELECT * FROM system.query_log ORDER BY event_time DESC LIMIT 1

Row 1:
──────
type: 1
event_date: 2018-09-15
event_time: 2018-09-15 23:42:17
query_start_time: 2018-09-15 23:42:17
query_duration_ms: 0
read_rows: 0
read_bytes: 0
written_rows: 0
written_bytes: 0
result_rows: 0
result_bytes: 0
memory_usage: 0
query: select * from system.metrics
exception:
stack_trace:
is_initial_query: 1
user: user
query_id: 0881d528-a79c-4bd7-8c0a-38ce270b95f1
address: (non-printable bytes)
port: 33384
initial_user: user
initial_query_id: 0881d528-a79c-4bd7-8c0a-38ce270b95f1
initial_address: (non-printable bytes)
initial_port: 33384
interface: 2
os_user:
client_hostname:
client_name:
client_revision: 0
client_version_major: 0
client_version_minor: 0
client_version_patch: 0
http_method: 1
http_user_agent: Go-http-client/1.1
quota_key:
revision: 54407
thread_numbers: []
ProfileEvents.Names: []
ProfileEvents.Values: []
Settings.Names: ['max_threads','use_uncompressed_cache','background_pool_size','load_balancing','log_queries','readonly','max_memory_usage']
Settings.Values: ['48','0','64','random','1','1','32212254720']
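Settings.Names and Settings.Values are parallel arrays, so the value at position i belongs to the name at position i. A short Python sketch, using the two arrays from the row above, shows one way a client might pair them up after fetching the row; the variable names are illustrative.

```python
# The two parallel arrays copied from the query_log row shown above.
names = ['max_threads', 'use_uncompressed_cache', 'background_pool_size',
         'load_balancing', 'log_queries', 'readonly', 'max_memory_usage']
values = ['48', '0', '64', 'random', '1', '1', '32212254720']

# Pair each setting name with the value at the same position.
settings = dict(zip(names, values))

# Values arrive as strings, so numeric settings need an explicit conversion.
max_memory_gib = int(settings['max_memory_usage']) / 2**30
print(settings['load_balancing'], max_memory_gib)
```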
Thread numbers are recorded
- The system.query_log and system.processes tables now show information about the number of threads that are participating in query execution (see the thread_numbers column). #2482
SELECT thread_numbers FROM system.query_log ORDER BY event_time DESC LIMIT 10

┌─thread_numbers─┐
│ []             │
│ [88]           │
│ []             │
│ [77,173]       │
│ []             │
│ []             │
│ [77,174]       │
│ []             │
│ [77,171]       │
│ []             │
└────────────────┘

SELECT thread_numbers FROM system.processes LIMIT 10

┌─thread_numbers─┐
│ [77]           │
└────────────────┘
Additional process statistics
- Added ProfileEvents counters that measure the time spent on reading and writing over the network and reading and writing to disk, the number of network errors, and the time spent waiting when network bandwidth is limited. #2482
- Added ProfileEvents counters that contain the system metrics from rusage (you can use them to get information about CPU usage in userspace and the kernel, page faults, and context switches), as well as taskstats metrics (use these to obtain information about I/O wait time, CPU wait time, and the amount of data read and recorded, both with and without page cache). #2482
- The ProfileEvents counters are applied globally and for each query, as well as for each query execution thread, which allows you to profile resource consumption by query in detail. #2482
SELECT * FROM system.processes LIMIT 10

Row 1:
──────
is_initial_query: 1
user: user
query_id: 7e65449f-8899-4c5f-8859-20eeae32d1b1
address: 127.0.0.1
port: 38610
initial_user: user
initial_query_id: 7e65449f-8899-4c5f-8859-20eeae32d1b1
initial_address: 127.0.0.1
initial_port: 38610
interface: 1
os_user: root
client_hostname: xx.xx.xx.xx.
client_name: ClickHouse client
client_version_major: 18
client_version_minor: 12
client_version_patch: 13
client_revision: 54407
http_method: 0
http_user_agent:
quota_key:
elapsed: 0.000800632
is_cancelled: 0
read_rows: 0
read_bytes: 0
total_rows_approx: 0
written_rows: 0
written_bytes: 0
memory_usage: 880
peak_memory_usage: 880
query: select * from system.processes limit 10
thread_numbers: [181]
ProfileEvents.Names: ['Query','SelectQuery','ReadCompressedBytes','CompressedReadBufferBlocks','CompressedReadBufferBytes','IOBufferAllocs','IOBufferAllocBytes','ContextLock','RWLockAcquiredReadLocks']
ProfileEvents.Values: [1,1,36,1,10,1,57,3,1]
Settings.Names: ['max_threads','use_uncompressed_cache','background_pool_size','load_balancing','log_queries','max_memory_usage']
Settings.Values: ['48','0','64','random','1','64424509440']
Per-thread information for query execution
- Added the system.query_thread_log table, which contains information about each query execution thread. Added the log_query_threads setting. #2482
SELECT * FROM system.query_thread_log ORDER BY event_time DESC LIMIT 3

Row 1:
──────
event_date: 2018-09-15
event_time: 2018-09-15 22:29:17
query_start_time: 2018-09-15 22:29:17
query_duration_ms: 4
read_rows: 2178
read_bytes: 1446750
written_rows: 0
written_bytes: 0
memory_usage: 27136
peak_memory_usage: 44996168
thread_name: ParalInputsProc
thread_number: 252
os_thread_id: 19255
master_thread_number: 74
master_os_thread_id: 9227
query: select * from system.query_thread_log order by event_time desc limit 10
is_initial_query: 1
user: user
query_id: 55740e27-e796-4b0f-a9ff-530363f91d76
address: (non-printable bytes)
port: 52456
initial_user: user
initial_query_id: 55740e27-e796-4b0f-a9ff-530363f91d76
initial_address: (non-printable bytes)
initial_port: 52456
interface: 1
os_user: root
client_hostname: xx.xx.xx.xx.
client_name: ClickHouse client
client_revision: 54407
client_version_major: 18
client_version_minor: 12
client_version_patch: 13
http_method: 0
http_user_agent:
quota_key:
revision: 54407
ProfileEvents.Names: ['FileOpen','ReadBufferFromFileDescriptorRead','ReadBufferFromFileDescriptorReadBytes','ReadCompressedBytes','CompressedReadBufferBlocks','CompressedReadBufferBytes','IOBufferAllocs','IOBufferAllocBytes','MarkCacheMisses','CreatedReadBufferOrdinary','DiskReadElapsedMicroseconds','ContextLock','RealTimeMicroseconds','UserTimeMicroseconds','SystemTimeMicroseconds','SoftPageFaults']
ProfileEvents.Values: [80,118,231033,229113,46,1101082,80,42862276,40,40,1135,2,4328,890,2671,678]
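The per-thread counters make it possible to see where a thread's time went. As a rough sketch, the Python below takes the timing-related counters from the row above (only a subset of the full arrays is copied) and compares CPU time with wall-clock time. Treating UserTimeMicroseconds plus SystemTimeMicroseconds as "CPU time" is an interpretation of the counter names, not something the release notes spell out.

```python
# Subset of ProfileEvents.Names / ProfileEvents.Values from the row above.
names = ['DiskReadElapsedMicroseconds', 'ContextLock', 'RealTimeMicroseconds',
         'UserTimeMicroseconds', 'SystemTimeMicroseconds', 'SoftPageFaults']
values = [1135, 2, 4328, 890, 2671, 678]

events = dict(zip(names, values))

# Assumed reading of the counters: user + system time is CPU time,
# RealTimeMicroseconds is the wall-clock time spent by the thread.
cpu_us = events['UserTimeMicroseconds'] + events['SystemTimeMicroseconds']
wall_us = events['RealTimeMicroseconds']
cpu_share = cpu_us / wall_us

print(f"CPU {cpu_us} us of {wall_us} us wall clock ({cpu_share:.0%})")
```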
Built-in documentation for system.metrics and system.events
- The system.metrics and system.events tables now have built-in documentation. #3016
SELECT * FROM system.metrics

┌─metric───────────────────────────────────┬──────value─┬─description────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ Query                                    │          1 │ Number of executing queries │
│ Merge                                    │          0 │ Number of executing background merges │
│ PartMutation                             │          0 │ Number of mutations (ALTER DELETE/UPDATE) │
│ ReplicatedFetch                          │          0 │ Number of data parts fetching from replica │
│ ReplicatedSend                           │          0 │ Number of data parts sending to replicas │
│ ReplicatedChecks                         │          0 │ Number of data parts checking for consistency
…

SELECT * FROM system.events LIMIT 10

┌─event───────────────────────────────────┬────value─┬─description────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ Query                                   │      127 │ Number of queries started to be interpreted and maybe executed. Does not include queries that are failed to parse, that are rejected due to AST size limits; rejected due to quota limits or limits on number of simultaneously running queries. May include internal queries initiated by ClickHouse itself. Does not count subqueries. │
│ SelectQuery                             │      124 │ Same as Query, but only for SELECT queries. │
│ FileOpen                                │    54504 │ Number of files opened. │
│ Seek                                    │       47 │ Number of times the 'lseek' function was called.
…
New arrayEnumerateDense function
- Added the arrayEnumerateDense function. Amos Bird
SELECT arrayEnumerateDense([(1, 2), (3, 4), (1, 2), (1, 2), (2, 3), (2, 3)])

┌─arrayEnumerateDense(array(tuple(1, 2), tuple(3, 4), tuple(1, 2), tuple(1, 2), tuple(2, 3), tuple(2, 3)))─┐
│ [1,2,1,1,3,3]                                                                                            │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────┘
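Judging from the output above, each element is replaced by the rank of its first appearance: the first distinct value gets 1, the second distinct value gets 2, and repeats reuse the number already assigned. A small Python model of that reading (not the server's code) reproduces the same result:

```python
def array_enumerate_dense(arr):
    """Replace each element with the 1-based order in which its value first appeared."""
    first_seen = {}
    out = []
    for item in arr:
        if item not in first_seen:
            first_seen[item] = len(first_seen) + 1
        out.append(first_seen[item])
    return out

dense = array_enumerate_dense([(1, 2), (3, 4), (1, 2), (1, 2), (2, 3), (2, 3)])
print(dense)  # [1, 2, 1, 1, 3, 3]
```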
New arrayCumSumNonNegative and arrayDifference functions
- Added the arrayCumSumNonNegative and arrayDifference functions. Aleksey Studnev
- These do not seem to have shipped yet; they could not be used on this build.
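Since the two functions could not be run here, the Python sketch below models what they are understood to compute, based on the ClickHouse function documentation rather than on this build: arrayDifference returns the difference between each element and its predecessor, with 0 in the first position, and arrayCumSumNonNegative is a running sum that is reset to 0 whenever it would drop below zero. Treat both definitions as assumptions until you can verify them against a server.

```python
def array_difference(arr):
    """Difference between neighbouring elements; the first result is 0."""
    return [0 if i == 0 else arr[i] - arr[i - 1] for i in range(len(arr))]

def array_cum_sum_non_negative(arr):
    """Running sum that is clamped to 0 whenever it would become negative."""
    out, running = [], 0
    for x in arr:
        running = max(running + x, 0)
        out.append(running)
    return out

diff = array_difference([1, 2, 3, 4])
cum = array_cum_sum_non_negative([1, 1, -4, 1])
print(diff, cum)
```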
New retention function
- Added the retention aggregate function. Sundy Li
SELECT uid, retention(date = '2018-08-06', date = '2018-08-07', date = '2018-08-08') AS r
FROM retention_test
WHERE date IN ('2018-08-06', '2018-08-07', '2018-08-08')
GROUP BY uid
ORDER BY uid ASC
LIMIT 3

┌─uid─┬─r───────┐
│   0 │ [1,1,1] │
│   1 │ [1,1,1] │
│   2 │ [1,1,1] │
└─────┴─────────┘
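The sample output only shows users who came back every day, which hides how the flags are derived. The Python model below follows the behaviour described in the ClickHouse documentation for retention: the first flag is 1 if any row in the group satisfies the first condition, and each later flag is 1 only if both the first condition and that condition were satisfied. The visit data is hypothetical and the helper names are made up for this sketch.

```python
def retention(rows, *conditions):
    """Model of retention(cond1, cond2, ...) over the rows of one group."""
    # hit[i] is True when at least one row satisfies condition i.
    hit = [any(cond(row) for row in rows) for cond in conditions]
    # The first flag is cond1 itself; every later flag also requires cond1.
    return [int(hit[0])] + [int(hit[0] and h) for h in hit[1:]]

def on(day):
    """Build a condition that checks a row's date (hypothetical helper)."""
    return lambda row: row["date"] == day

days = ("2018-08-06", "2018-08-07", "2018-08-08")
visits = {  # hypothetical visits per uid
    0: [{"date": d} for d in days],
    7: [{"date": "2018-08-06"}, {"date": "2018-08-08"}],
    9: [{"date": "2018-08-07"}],
}
result = {uid: retention(rows, *(on(d) for d in days)) for uid, rows in visits.items()}
print(result)
```

User 9 never appeared on the first day, so all of that user's flags are 0 even though there was a visit on the second day.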
Aggregate function states can be combined with the plus operator
- Now you can add (merge) states of aggregate functions by using the plus operator, and multiply the states of aggregate functions by a nonnegative constant. #3062 #3034
CREATE TABLE add_aggregate ( a UInt32, b UInt32 ) ENGINE = Memory

INSERT INTO add_aggregate VALUES(1, 2);
INSERT INTO add_aggregate VALUES(3, 1);

SELECT minMerge(x) FROM ( SELECT minState(a) + minState(b) AS x FROM add_aggregate )

┌─minMerge(x)─┐
│           1 │
└─────────────┘
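Conceptually, adding two states merges them, and the merged state is then finalized by the -Merge function. The toy Python class below models this for min only; it is a sketch of the idea under that assumption, not of how ClickHouse stores states, and it leaves out multiplication by a constant.

```python
class MinState:
    """Toy intermediate state for min(): remembers the smallest value seen."""

    def __init__(self, values=()):
        self.value = min(values, default=None)

    def __add__(self, other):
        # "+" merges two states into one that covers both inputs.
        present = [v for v in (self.value, other.value) if v is not None]
        return MinState(present)

    def finalize(self):
        return self.value

# Columns a and b of the add_aggregate table from the example above.
a = [1, 3]
b = [2, 1]
merged = MinState(a) + MinState(b)
print(merged.finalize())  # 1
```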
Virtual column
- Tables in the MergeTree family now have the virtual column _partition_id. #3089
SELECT _partition_id FROM test.partition_id ORDER BY _partition_id ASC

┌─_partition_id─┐
│ 197004        │
│ 197004        │
│ 197007        │
│ 197007        │
│ 197010        │
│ 197010        │
│ 201809        │
│ 201809        │
│ 201809        │
└───────────────┘

SELECT * FROM test.partition_id ORDER BY d ASC

┌──────────d─┬───x─┐
│ 1970-04-11 │   1 │
│ 1970-04-11 │   1 │
│ 1970-07-20 │   2 │
│ 1970-07-20 │   2 │
│ 1970-10-28 │   3 │
│ 1970-10-28 │   3 │
│ 2018-09-13 │ 100 │
│ 2018-09-14 │ 100 │
│ 2018-09-15 │ 100 │
└────────────┴─────┘
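The two result sets line up if the table is partitioned by month. The table definition is not shown, so the following Python sketch assumes a partition key of toYYYYMM(d) and checks that the dates above map to the partition IDs above; partition_id_yyyymm is a hypothetical helper, not a ClickHouse function.

```python
from datetime import date

def partition_id_yyyymm(d):
    """Partition ID under an assumed PARTITION BY toYYYYMM(d) key."""
    return f"{d.year:04d}{d.month:02d}"

# The distinct dates from the second result set above.
dates = [date(1970, 4, 11), date(1970, 7, 20), date(1970, 10, 28),
         date(2018, 9, 13), date(2018, 9, 14), date(2018, 9, 15)]
partition_ids = [partition_id_yyyymm(d) for d in dates]
print(partition_ids)
```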
Bug fixes:
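- Fixed a problem with Dictionary tables (a Size of offsets doesn't match size of column or Unknown compression method exception was thrown). This bug appeared in version 18.10.3. #2913
- Fixed a bug when merging CollapsingMergeTree tables if one of the data parts is empty (such parts are formed during a merge or by ALTER DELETE when all data has been deleted) and the vertical algorithm is used for the merge. #3049
- Fixed a race condition during DROP or TRUNCATE for Memory tables with a simultaneous SELECT, which could lead to server crashes. This bug appeared in version 1.1.54388. #3038
- Fixed the possibility of data loss when inserting into Replicated tables if the Session is expired error is returned (data loss can be detected with the ReplicatedDataLoss metric). This error occurred in version 1.1.54378. #2939 #2949 #2964
- Fixed a segfault during JOIN ... ON. #3000
- Fixed the error in looking up column names when the WHERE expression consists entirely of a qualified column name, such as WHERE table.column. #2994
- Fixed the "Not found column" error that occurred when executing distributed queries if a single column consisting of an IN expression with a subquery is requested from a remote server. #3087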
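- Fixed the Block structure mismatch in UNION stream: different number of columns error that occurred for distributed queries if one of the shards is local and the other is not, and the optimization that moves conditions to PREWHERE is triggered. #2226 #3037 #3055 #3065 #3073 #3090 #3093
- Fixed the pointInPolygon function for certain cases of non-convex polygons. #2910
- Fixed the incorrect result when comparing nan with integers. #3024
- Fixed an error in the zlib-ng library that could lead to a segfault. #2854
- Fixed a memory leak when inserting into a table with AggregateFunction columns, if the state of the aggregate function is not simple (it allocates memory separately) and a single insertion request results in multiple small blocks. #3084
- Fixed a race condition when creating and deleting the same Buffer or MergeTree table simultaneously.
- Fixed the possibility of a segfault when comparing tuples made up of certain non-trivial types, such as tuples. #2989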
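- Fixed the possibility of a segfault when running certain ON CLUSTER queries. Winter Zhang
- Fixed an error in the arrayDistinct function for Nullable array elements. #2845 #2937
- The enable_optimize_predicate_expression option now correctly supports cases with SELECT *. Winter Zhang
- Fixed a segfault when re-initializing the ZooKeeper session. #2917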
- Fixed potential blocking when working with ZooKeeper.
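- Fixed incorrect code for adding nested data structures in a SummingMergeTree.
- When allocating memory for the states of aggregate functions, alignment is now correctly taken into account, which makes it possible to use operations that require alignment when implementing those states. chenxing-xc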
Security fixes:
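- Safe use of ODBC data sources. Interaction with ODBC drivers now goes through a separate clickhouse-odbc-bridge process. Errors in third-party ODBC drivers no longer cause server stability problems or vulnerabilities. #2828 #2879 #2886 #2893 #2921
- Fixed incorrect validation of the file path in the catBoostPool table function. #2894
- The contents of system tables (tables, databases, parts, columns, parts_columns, merges, mutations, replicas, and replication_queue) are filtered according to the user's configured access to databases (allow_databases). Winter Zhang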
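Backward incompatible changes:
- In requests with JOIN, the star character expands to a list of columns in all tables, in compliance with the SQL standard. You can restore the old behavior by setting asterisk_left_columns_only to 1 on the user configuration level.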
Build changes:
- Most integration tests can now be run by commit.
- Code style checks can also be run by commit.
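- The memcpy implementation is chosen correctly when building on CentOS7/Fedora. Etienne Champetier
- When building with clang, some warnings from -Weverything have been added in addition to the regular -Wall -Wextra -Werror. #2957
- Debug builds use the jemalloc debug option.
- The interface of the library for interacting with ZooKeeper is declared abstract. #2950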