Prometheus 1.7: Commonly Used Startup Flags

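The flags below were obtained by running ./prometheus -h; some of the translations draw on material already available online. Some flags are deprecated, so I have not listed them here.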



Commonly used startup flags in Prometheus 1.7:

Logging:

-log.level  Allowed values: [debug, info, warn, error, fatal]. Example: -log.level "info"

-log.format  Output can go to syslog or to the console. Example: -log.format "logger:syslog?appname=prom&local=7"



Queries:

-query.max-concurrency 20  Maximum number of queries executed concurrently.

-query.staleness-delta 5m0s  Staleness delta allowance during expression evaluations.

-query.timeout 2m0s  Query timeout (2 minutes). A query that runs longer is aborted automatically.


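Storage:

-storage.local.checkpoint-dirty-series-limit 5000  When roughly this many time series are in a state that would need recovery after a crash, a checkpoint is triggered even if the checkpoint interval has not yet passed. This keeps crash-recovery time short. On SSDs recovery is much faster, so the value can be raised to avoid overly frequent checkpoints.

-storage.local.checkpoint-interval 5m0s  Run a checkpoint every 5 minutes, writing in-memory metrics and chunks that are not yet persisted to series files out to disk.

-storage.local.chunk-encoding-version 1  Encoding format for chunks; the default is 1.

-storage.local.engine "persisted"  Enable on-disk persistence.

-storage.local.index-cache-size.label-name-to-label-values 10485760  Size of the index cache that maps label names to label values; the default is 10 MiB.

-storage.local.path "/bdata/data/nowdb2"

-storage.local.retention 8760h0m0s  Keep one year of data.

-storage.local.series-file-shrink-ratio 0.3  A series file is rewritten only once 30% of its chunks have been removed.

-storage.local.num-fingerprint-mutexes 4096  While the Prometheus server is running a checkpoint or handling expensive queries, scraping may stall briefly because the mutexes allocated for time series may not be enough. This flag raises the number of preallocated mutexes; it is sometimes set to 10,000 or more.

-storage.local.series-sync-strategy "adaptive"

-storage.local.target-heap-size 2147483648 # Target heap size for Prometheus; the default is 2 GiB. This is not a hard limit. The default is meant to keep physical memory use under roughly 3 GiB.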
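The byte values above are easier to sanity-check once converted to binary units. A minimal sketch using POSIX shell arithmetic; the helper names to_mib and to_gib are made up for this illustration and are not part of Prometheus, and 64-bit shell arithmetic is assumed:

```shell
#!/bin/sh
# Hypothetical helpers (not part of Prometheus): convert a byte count
# to whole MiB / GiB with integer shell arithmetic.
to_mib() { echo $(( $1 / 1024 / 1024 )); }
to_gib() { echo $(( $1 / 1024 / 1024 / 1024 )); }

# Value used for -storage.local.index-cache-size.label-name-to-label-values
echo "index cache: $(to_mib 10485760) MiB"
# Value used for -storage.local.target-heap-size
echo "target heap: $(to_gib 2147483648) GiB"
```

This prints 10 MiB for the index cache and 2 GiB for the target heap size, matching the defaults described above.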



Web settings:

-web.listen-address ":9090"

-web.max-connections 512

-web.read-timeout 30s



Startup command currently in use:

nohup ./prometheus -log.level "info" -log.format "logger:syslog?appname=prom&local=7" -storage.local.checkpoint-dirty-series-limit 5000 -storage.local.checkpoint-interval 5m0s -storage.local.chunk-encoding-version 1 -storage.local.engine "persisted" -storage.local.index-cache-size.label-name-to-label-values 10485760 -storage.local.path "/bdata/data/nowdb2" -storage.local.retention 8760h0m0s -storage.local.series-file-shrink-ratio 0.3 -storage.local.series-sync-strategy "adaptive" -storage.local.target-heap-size 2147483648 &
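For a command line this long, it can help to keep the flags in one place and print them before launching. The following is only a sketch: the function name prom_args and the variable names are invented here, and only a subset of the flags is included.

```shell
#!/bin/sh
# Hypothetical wrapper; all names here are illustrative, not from Prometheus.
DATA_DIR="/bdata/data/nowdb2"
RETENTION="8760h0m0s"
HEAP_BYTES=$((2 * 1024 * 1024 * 1024))   # 2147483648 bytes

# Print one "flag value" pair per line so the set is easy to review.
prom_args() {
  printf '%s %s\n' \
    -log.level info \
    -storage.local.path "$DATA_DIR" \
    -storage.local.retention "$RETENTION" \
    -storage.local.target-heap-size "$HEAP_BYTES"
}

# Dry run: show the flags instead of starting the server.
prom_args
```

Assuming none of the values contain whitespace, the same list could then be passed on with something like ./prometheus $(prom_args).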



Reload the configuration file:

kill -SIGHUP $(pidof prometheus)



Shut down the process:

kill -SIGTERM $(pidof prometheus)





######################################################################################################

Appendix: output of ./prometheus -h, with my notes added:


usage: prometheus [<args>]


-version false

Print version information.

-config.file "prometheus.yml"

Prometheus configuration file name.

== ALERTMANAGER ==

-alertmanager.notification-queue-capacity 10000

The capacity of the queue for pending alert manager notifications.

-alertmanager.timeout 10s

Alert manager HTTP API timeout.

-alertmanager.url

Comma-separated list of Alertmanager URLs to send notifications to.

== LOG ==

-log.format "\"logger:stderr\""

Set the log target and format. Example:

"logger:syslog?appname=bob&local=7" or "logger:stdout?json=true"

-log.level "\"info\""

Only log messages with the given severity or above. Valid levels:

[debug, info, warn, error, fatal]

== QUERY ==

-query.max-concurrency 20

Maximum number of queries executed concurrently.

-query.staleness-delta 5m0s

Staleness delta allowance during expression evaluations.

-query.timeout 2m0s  Query timeout (2 minutes). A query that runs longer is aborted automatically.

Maximum time a query may take before being aborted.

== STORAGE ==

-storage.local.checkpoint-dirty-series-limit 5000  Bounds crash-recovery time by forcing a checkpoint once about 5000 series would need recovery. On SSDs this value can be raised.

If approx. that many time series are in a state that would require

a recovery operation after a crash, a checkpoint is triggered, even if

the checkpoint interval hasn't passed yet. A recovery operation requires

a disk seek. The default limit intends to keep the recovery time below

1min even on spinning disks. With SSD, recovery is much faster, so you

might want to increase this value in that case to avoid overly frequent

checkpoints. Also note that a checkpoint is never triggered before at

least as much time has passed as the last checkpoint took.

-storage.local.checkpoint-interval 5m0s  Run a checkpoint every 5 minutes, writing in-memory metrics and chunks out to disk.

The time to wait between checkpoints of in-memory metrics and

chunks not yet persisted to series files. Note that a checkpoint is never

triggered before at least as much time has passed as the last checkpoint

took.

-storage.local.chunk-encoding-version 1  Encoding format for chunks; the default is 1.

Which chunk encoding version to use for newly created chunks.

Currently supported is 0 (delta encoding), 1 (double-delta encoding), and

2 (double-delta encoding with variable bit-width).

-storage.local.dirty=false  Whether to force crash recovery. The default is -storage.local.dirty=false.

If you suspect problems caused by corruption in the database, start with -storage.local.dirty=true to force crash recovery.

If set, the local storage layer will perform crash recovery even if

the last shutdown appears to be clean.


-storage.local.engine "persisted"

Local storage engine. Supported values are: 'persisted' (full local

storage with on-disk persistence) and 'none' (no local storage).


-storage.local.index-cache-size.fingerprint-to-metric 10485760

The size in bytes for the fingerprint to metric index cache.

-storage.local.index-cache-size.fingerprint-to-timerange 5242880

The size in bytes for the metric time range index cache.


Purpose of the two flags above: Increase the size if you have a large number of archived time series, i.e. series that have not received samples in a while but are still not old enough to be purged completely.



-storage.local.index-cache-size.label-name-to-label-values 10485760  Size of the index cache that maps label names to label values; the default is 10 MiB.

The size in bytes for the label name to label values index cache.

-storage.local.index-cache-size.label-pair-to-fingerprints 20971520

The size in bytes for the label pair to fingerprints index cache. Increase the size if a large number of time series share the same label pair or name.

-storage.local.max-chunks-to-persist 0  Deprecated flag.

Deprecated. This flag has no effect anymore.

-storage.local.memory-chunks 0  Deprecated flag. It used to set the maximum number of chunks Prometheus keeps in memory.

Deprecated. If set, -storage.local.target-heap-size will be set to

this value times 3072.
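The conversion described for the deprecated -storage.local.memory-chunks flag is a plain multiplication. A small worked example; the chunk count 1048576 is only an illustrative input, and the helper name heap_from_chunks is invented here:

```shell
#!/bin/sh
# Per the help text: target-heap-size = memory-chunks * 3072.
# heap_from_chunks is a hypothetical helper, not a Prometheus command.
heap_from_chunks() { echo $(( $1 * 3072 )); }

heap_from_chunks 1048576   # prints 3221225472, i.e. 3 GiB
```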

-storage.local.num-fingerprint-mutexes 4096

The number of mutexes used for fingerprint locking.

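While the Prometheus server is running a checkpoint or handling expensive queries, scraping may stall briefly because the mutexes allocated for time series may not be enough. This flag raises the number of preallocated mutexes; it is sometimes set to 10,000 or more.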

-storage.local.path "data"

Base path for metrics storage.

-storage.local.pedantic-checks false  Defaults to false. If set to true, crash recovery checks every series file.

If set, a crash recovery will perform checks on each series file.

This might take a very long time.

-storage.local.retention 360h0m0s  How long historical data is kept; the default is 15 days.

How long to retain samples in the local storage.
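The retention value is a duration written in hours (360h0m0s for 15 days, 8760h0m0s for one year). A sketch that turns a number of days into that form; the helper name days_to_retention is invented for this example:

```shell
#!/bin/sh
# Hypothetical helper: express N days as an hours-based duration string
# in the same form the help output uses.
days_to_retention() { echo "$(( $1 * 24 ))h0m0s"; }

days_to_retention 15    # prints 360h0m0s
days_to_retention 365   # prints 8760h0m0s
```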


-storage.local.series-file-shrink-ratio 0.1

A series file is only truncated (to delete samples that have

exceeded the retention period) if it shrinks by at least the provided

ratio. This saves I/O operations while causing only a limited storage

space overhead. If 0 or smaller, truncation will be performed even for a

single dropped chunk, while 1 or larger will effectively prevent any

truncation.

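This controls when series files are rewritten. By default a rewrite happens once 10% of the chunks have been removed. If disk space is plentiful and you want fewer rewrites, raise the value, for example to 0.3, so that a rewrite is triggered only after 30% of the chunks have been removed.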
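The rule described above can be written down as a small check. This is an illustration of the help text only, not Prometheus's actual code; the function name should_truncate and its arguments are assumptions made for this sketch:

```shell
#!/bin/sh
# Illustration only, not Prometheus's implementation.
# should_truncate DROPPED TOTAL RATIO
# Exit status 0 if dropping DROPPED of TOTAL chunks shrinks the file
# by at least RATIO, non-zero otherwise.
should_truncate() {
  awk -v d="$1" -v t="$2" -v r="$3" \
    'BEGIN { exit !(t > 0 && d > 0 && d / t >= r) }'
}

if should_truncate 10 100 0.3; then echo "truncate"; else echo "skip"; fi
if should_truncate 40 100 0.3; then echo "truncate"; else echo "skip"; fi
```

With a ratio of 0.3, dropping 10 of 100 chunks is skipped while dropping 40 of 100 triggers the rewrite. A ratio of 0 accepts even a single dropped chunk, and a ratio of 1 rejects anything short of the whole file.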


-storage.local.series-sync-strategy "adaptive"

When to sync series files after modification. Possible values:

'never', 'always', 'adaptive'. Sync'ing slows down storage performance

but reduces the risk of data loss in case of an OS crash. With the

'adaptive' strategy, series files are sync'd for as long as the storage

is not too much behind on chunk persistence.

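This controls when data is synced to disk after being written. The options are 'never', 'always', and 'adaptive'. Syncing reduces the risk of data loss if the operating system crashes, but it lowers write performance.

The default is adaptive: series files are synced as long as the storage is not too far behind on chunk persistence. Otherwise writes are not synced immediately and are left to the operating system's page cache to flush in batches.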



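-storage.local.target-heap-size 2147483648 # Target heap size for Prometheus; the default is 2 GiB. This is not a hard limit. The default is meant to keep physical memory use under roughly 3 GiB.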

The metrics storage attempts to limit its own memory usage such

that the total heap size approaches this value. Note that this is not a

hard limit. Actual heap size might be temporarily or permanently higher

for a variety of reasons. The default value is a relatively safe setting

to not use more than 3 GiB physical memory.

-storage.remote.graphite-address

WARNING: THIS FLAG IS UNUSED! Built-in support for InfluxDB,

Graphite, and OpenTSDB has been removed. Use Prometheus's generic remote

write feature for building remote storage integrations. See

https://prometheus.io/docs/operating/configuration/#<remote_write>

-storage.remote.graphite-prefix

WARNING: THIS FLAG IS UNUSED! Built-in support for InfluxDB,

Graphite, and OpenTSDB has been removed. Use Prometheus's generic remote

write feature for building remote storage integrations. See

https://prometheus.io/docs/operating/configuration/#<remote_write>

-storage.remote.graphite-transport

WARNING: THIS FLAG IS UNUSED! Built-in support for InfluxDB,

Graphite, and OpenTSDB has been removed. Use Prometheus's generic remote

write feature for building remote storage integrations. See

https://prometheus.io/docs/operating/configuration/#<remote_write>

-storage.remote.influxdb-url

WARNING: THIS FLAG IS UNUSED! Built-in support for InfluxDB,

Graphite, and OpenTSDB has been removed. Use Prometheus's generic remote

write feature for building remote storage integrations. See

https://prometheus.io/docs/operating/configuration/#<remote_write>

-storage.remote.influxdb.database

WARNING: THIS FLAG IS UNUSED! Built-in support for InfluxDB,

Graphite, and OpenTSDB has been removed. Use Prometheus's generic remote

write feature for building remote storage integrations. See

https://prometheus.io/docs/operating/configuration/#<remote_write>

-storage.remote.influxdb.retention-policy

WARNING: THIS FLAG IS UNUSED! Built-in support for InfluxDB,

Graphite, and OpenTSDB has been removed. Use Prometheus's generic remote

write feature for building remote storage integrations. See

https://prometheus.io/docs/operating/configuration/#<remote_write>

-storage.remote.influxdb.username

WARNING: THIS FLAG IS UNUSED! Built-in support for InfluxDB,

Graphite, and OpenTSDB has been removed. Use Prometheus's generic remote

write feature for building remote storage integrations. See

https://prometheus.io/docs/operating/configuration/#<remote_write>

-storage.remote.opentsdb-url

WARNING: THIS FLAG IS UNUSED! Built-in support for InfluxDB,

Graphite, and OpenTSDB has been removed. Use Prometheus's generic remote

write feature for building remote storage integrations. See

https://prometheus.io/docs/operating/configuration/#<remote_write>

-storage.remote.timeout

WARNING: THIS FLAG IS UNUSED! Built-in support for InfluxDB,

Graphite, and OpenTSDB has been removed. Use Prometheus's generic remote

write feature for building remote storage integrations. See

https://prometheus.io/docs/operating/configuration/#<remote_write>

== WEB ==

-web.console.libraries "console_libraries"

Path to the console library directory.

-web.console.templates "consoles"

Path to the console template directory, available at /consoles.

-web.enable-remote-shutdown false

Enable remote service shutdown.

-web.external-url

The URL under which Prometheus is externally reachable (for

example, if Prometheus is served via a reverse proxy). Used for

generating relative and absolute links back to Prometheus itself. If the

URL has a path portion, it will be used to prefix all HTTP endpoints

served by Prometheus. If omitted, relevant URL components will be derived

automatically.

-web.listen-address ":9090"

Address to listen on for the web interface, API, and telemetry.

-web.max-connections 512

Maximum number of simultaneous connections.

-web.read-timeout 30s

Maximum duration before timing out read of the request, and closing

idle connections.

-web.route-prefix

Prefix for the internal routes of web endpoints. Defaults to path

of -web.external-url.

-web.telemetry-path "/metrics"

Path under which to expose metrics.

-web.user-assets

Path to static asset directory, available at /user.


This article comes from the blog "一只菜雞的筆記". Please be sure to keep this source attribution: http://lee90.blog.51cto.com/10414478/1953896
