Summary of Elasticsearch system performance tuning

阿新 · Published: 2018-11-12

Elasticsearch performance tuning

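Cluster planning

Dedicated master nodes that hold no data; no fewer than 2, and 3 is the usual choice so that a majority survives the loss of one
Data nodes
Query nodes, which act as load balancers

Node roles

The nodes of a cluster fall into three main types:

Master nodes -- maintain the cluster state and store no index data. Hardware: an ordinary machine is enough; give the ES process 16 GB of memory.
Data nodes -- store index data only and are never elected master. Hardware: the higher the spec the better; use large disks, and SSDs if the budget allows.
Client nodes -- mainly handle load balancing; they are never elected master and store no index data. Hardware: 24 CPU cores and 64 GB of memory or more.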

kopf

./elasticsearch/bin/plugin install lmenezes/elasticsearch-kopf/{branch|version}

Client node settings in /etc/elasticsearch/elasticsearch.yml:

node.master: false
node.data: false
discovery.zen.ping.unicast.hosts: ["master1","master2","master3"]
network.host: ${HOSTNAME}

Starting Elasticsearch

sudo service elasticsearch start

Note that on CentOS, service elasticsearch restart sometimes has no effect. In that case, stop and start the process as two separate steps:

sudo kill -9 `pgrep -f elasticsearch`
sudo service elasticsearch start

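Nginx reverse proxy

To keep a record of the queries sent to the cluster, put nginx in front of it as a reverse proxy. Install nginx on the client node; a minimal conf.d/default.conf looks like this: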

upstream elasticsearch {
        server 127.0.0.1:9200;
}

server {
    gzip on;
    access_log /var/log/nginx/access.log combined;
    listen       80 default_server;

    server_name  _;

    #charset koi8-r;

    #access_log  logs/host.access.log  main;

    # Load configuration files for the default server block.
    include /etc/nginx/default.d/*.conf;

    location / {
        root   /usr/share/nginx/html;
        index  index.html index.htm;

        proxy_set_header Host $http_host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_pass      http://elasticsearch;
    }

    error_page  404              /404.html;
    location = /404.html {
        root   /usr/share/nginx/html;
    }

    # redirect server error pages to the static page /50x.html
    error_page   500 502 503 504  /50x.html;
    location = /50x.html {
        root   /usr/share/nginx/html;
    }
}
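Because the proxy writes its access_log in nginx's combined format, the method and URI of every query end up in /var/log/nginx/access.log. The following is a minimal sketch of pulling them back out. The function name parse_combined and the sample log line are illustrative assumptions, not part of the original setup. Note that the combined format records only the request line, so a search body sent with POST is not captured.

```python
import re

# nginx's predefined "combined" format:
# $remote_addr - $remote_user [$time_local] "$request" $status
# $body_bytes_sent "$http_referer" "$http_user_agent"
COMBINED = re.compile(
    r'(?P<addr>\S+) - (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_combined(line):
    """Parse one combined-format log line; return a dict, or None if it does not match."""
    match = COMBINED.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    parts = entry["request"].split(" ")
    # A well-formed request line is "METHOD URI PROTOCOL".
    if len(parts) == 3:
        entry["method"], entry["uri"] = parts[0], parts[1]
    else:
        entry["method"], entry["uri"] = None, None
    entry["status"] = int(entry["status"])
    return entry


# Hypothetical log line, for illustration only.
sample = ('10.0.0.5 - - [12/Nov/2018:10:15:32 +0800] '
          '"GET /demo-2018.11.12/_search?q=level:ERROR HTTP/1.1" '
          '200 512 "-" "curl/7.29.0"')
entry = parse_combined(sample)
print(entry["method"], entry["uri"], entry["status"])
```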

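Plugin installation

The recommended plugins are head, bigdesk, and kopf.

Data node settings in /etc/elasticsearch/elasticsearch.yml: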

node.master: false
node.data: true
discovery.zen.ping.unicast.hosts: ["master1","master2","master3"]
network.host: ${HOSTNAME}

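If Elasticsearch has several disks available, change the value of DATA_DIR to list multiple directories separated by commas (,).

Master node settings in /etc/elasticsearch/elasticsearch.yml: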

node.master: true
node.data: false
discovery.zen.ping.unicast.hosts: ["master1","master2","master3"]
network.host: ${HOSTNAME}

A sensible cluster has three master nodes, one or more data nodes, and at least one client node.
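The layout rule above can be written as a small check. This is a sketch only: the function names are hypothetical. The quorum helper goes beyond the article: with zen discovery (ES 1.x/2.x), discovery.zen.minimum_master_nodes is normally set to a majority of the master-eligible nodes, which is the value it returns.

```python
def check_cluster_plan(masters, data_nodes, client_nodes):
    """Return a list of problems with a planned node layout (empty if it is fine)."""
    problems = []
    if masters < 3:
        problems.append("fewer than 3 master nodes")
    if data_nodes < 1:
        problems.append("no data node")
    if client_nodes < 1:
        problems.append("no client node")
    return problems


def master_quorum(masters):
    """Majority of master-eligible nodes (assumed use: discovery.zen.minimum_master_nodes)."""
    return masters // 2 + 1


print(check_cluster_plan(3, 4, 1))  # no problems for 3 masters, 4 data nodes, 1 client
print(master_quorum(3))
```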

Installation and configuration

Common setup, using CentOS and the rpm package as the example:

sudo rpm -ivh elasticsearch-version.rpm
sudo chkconfig --add elasticsearch

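Edit /etc/sysconfig/elasticsearch and set ES_HEAP_SIZE and JAVA_OPTS. Elasticsearch advises against heaps of 32 GB or more, because the JVM can no longer use compressed object pointers at that size, so stay just under the limit:

ES_HEAP_SIZE=31g
JAVA_OPTS="-Xms31g"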
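As a rough aid for choosing the heap value, the sketch below caps the heap just under the 32 GB ceiling mentioned above. The "at most half of physical RAM" rule is common Elasticsearch guidance rather than something stated in this article, and the function name and the 31 GB cap are assumptions.

```python
def recommended_heap_gb(total_ram_gb, cap_gb=31):
    """Half of physical RAM, capped just below 32 GB, and never less than 1 GB."""
    return max(1, min(total_ram_gb // 2, cap_gb))


for ram in (16, 64, 128):
    print(f"{ram} GB RAM -> ES_HEAP_SIZE={recommended_heap_gb(ram)}g")
```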

Edit /etc/security/limits.conf and add the following:

* hard memlock unlimited
* soft memlock unlimited

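/etc/elasticsearch/elasticsearch.yml settings

The file differs by node role: client node, data node, and master node.

Plugin compatibility

head: compatible with ES 1.x
bigdesk: compatible with ES 1.x
kopf: compatible with ES 1.x and 2.x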

Linux system parameters

File handles

On Linux the default limit on open file handles per process is 1024, which is clearly too low for a server process. Raise it in /etc/security/limits.conf:

* - nofile 65535
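A quick way to confirm the new limit is to read it from inside a process. The sketch below uses Python's standard resource module (Unix only). The helper names and the 65535 target are taken from or assumed around the setting above. The limit reported is that of the process running the check, so run it as the elasticsearch user in a fresh login session.

```python
import resource


def nofile_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def meets_target(limits, target=65535):
    """True if the soft limit is unlimited or at least `target`."""
    soft, _hard = limits
    return soft == resource.RLIM_INFINITY or soft >= target


soft, hard = nofile_limits()
print(f"soft={soft} hard={hard} ok={meets_target((soft, hard))}")
```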

Virtual memory settings

max_map_count sets the maximum number of memory map areas a process may have:

sysctl -w vm.max_map_count=262144

Edit /etc/elasticsearch/elasticsearch.yml so that the Elasticsearch process locks its memory and cannot be swapped out:

bootstrap.mlockall: true

Edit /etc/security/limits.conf and add the following:

* soft memlock unlimited
* hard memlock unlimited

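memlock is the maximum locked-in-memory address space. For the limits.conf settings to take effect, pam_limits.so must be loaded by the login configuration.

Make sure /etc/pam.d/login contains the following line: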

session required /lib/security/pam_limits.so

Verify that the settings took effect:

curl localhost:9200/_nodes/stats/process?pretty

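Disk cache parameters

vm.dirty_background_ratio

When dirty pages in the file system cache reach this percentage of system memory (for example 5%), background writeback threads such as pdflush/flush/kdmflush start and asynchronously write some of the dirty pages to disk.

vm.dirty_ratio

When dirty pages in the file system cache reach this percentage of system memory (for example 10%), the system is forced to deal with them itself: there are now so many dirty pages that some must be written out to avoid data loss. While this happens, many application processes may block because the system has switched to handling file I/O.
Lower this parameter moderately; the reasoning is the same as for vm.dirty_background_ratio. Once cached dirty data exceeds this share (here, a share of MemTotal), the system halts all application-level write I/O and resumes it only after the data has been flushed. Hitting this limit therefore hurts users badly.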

sysctl -w vm.dirty_ratio=10
sysctl -w vm.dirty_background_ratio=5
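To make the two percentages concrete, the sketch below converts them into byte thresholds for a given amount of memory. It follows the article in measuring against total memory. The kernel actually applies the ratios to available (free plus reclaimable) memory, so treat the result as an upper-bound approximation. The function name is hypothetical.

```python
GIB = 1024 ** 3


def dirty_thresholds_bytes(mem_bytes, background_ratio=5, dirty_ratio=10):
    """Return (background writeback threshold, blocking threshold) in bytes."""
    background = mem_bytes * background_ratio // 100
    blocking = mem_bytes * dirty_ratio // 100
    return background, blocking


background, blocking = dirty_thresholds_bytes(64 * GIB)
print(f"background writeback starts near {background / GIB:.1f} GiB of dirty pages")
print(f"application writes block near {blocking / GIB:.1f} GiB of dirty pages")
```

With 64 GiB of memory the two thresholds come out at roughly 3.2 GiB and 6.4 GiB.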

To make the settings permanent, add the same entries to /etc/sysctl.conf:

vm.dirty_ratio = 10
vm.dirty_background_ratio = 5

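Swap tuning

Swap space is disk space where the operating system stores rarely used pages evicted from memory, which frees more memory for the page cache. This usually improves throughput and I/O performance, but it also causes problems: frequent paging in and out generates disk I/O and interrupts, both of which hurt performance. The vm.swappiness value controls this behaviour; the larger it is, the more aggressively the operating system uses swap.

Set swappiness as follows: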

sudo sh -c 'echo "0">/proc/sys/vm/swappiness'

I/O scheduler

If the cluster runs on SSDs, the default I/O scheduler can be changed from cfq to noop:

sudo sh -c 'echo "noop">/sys/block/sda/queue/scheduler'
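To check which scheduler is active, read the same sysfs file back: the kernel lists the available schedulers and marks the active one with square brackets. The sketch below parses that format. The function names are hypothetical, and the device name sda is taken from the command above.

```python
def active_scheduler(sysfs_text):
    """Return the scheduler shown in [brackets], or None if none is marked."""
    for token in sysfs_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def read_scheduler(device="sda"):
    """Read and parse /sys/block/<device>/queue/scheduler (Linux only)."""
    with open(f"/sys/block/{device}/queue/scheduler") as handle:
        return active_scheduler(handle.read())


print(active_scheduler("noop deadline [cfq]"))  # before the change
print(active_scheduler("[noop] deadline cfq"))  # after the change
```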

JVM parameters

Set the maximum heap size in /etc/sysconfig/elasticsearch. Keep it below 32 GB so that the JVM can still use compressed object pointers:

ES_HEAP_SIZE=31g
ES_JAVA_OPTS="-Xms31g"
MAX_LOCKED_MEMORY=unlimited
MAX_OPEN_FILES=65535

Index parameter tuning

The demo_logs template below serves as the example for which parameters can be tuned and why each value was chosen.

PUT _template/demo_logs
{
      "order": 6,
      "template": "demo-*",
      "settings": {
         "index.merge.policy.segments_per_tier": "25",
         "index.mapping._source.compress": "true",
         "index.mapping._all.enabled": "false",
         "index.warmer.enabled": "false",
         "index.merge.policy.min_merge_size": "10mb",
         "index.refresh_interval": "60s",
         "index.number_of_shards": "7",
         "index.translog.durability": "async",
         "index.store.type": "mmapfs",
         "index.merge.policy.floor_segment": "100mb",
         "index.merge.scheduler.max_thread_count": "1",
         "index.translog.flush_threshold_size": "1g",
         "index.merge.policy.merge_factor": "15",
         "index.translog.flush_threshold_period": "100m",
         "index.translog.sync_interval": "5s",
         "index.number_of_replicas": "1",
         "index.store.throttle.max_bytes_per_sec": "50mb",
         "index.routing.allocation.total_shards_per_node": "2",
         "index.translog.flush_threshold_ops": "1000000"
      },
      "mappings": {
         "_default_": {
            "dynamic_templates": [
               {
                  "string_template": {
                     "mapping": {
                        "index": "not_analyzed",
                        "ignore_above": "10915",
                        "type": "string"
                     },
                     "match_mapping_type": "string"
                  }
               },
               {
                  "level_fields": {
                     "mapping": {
                        "index": "no",
                        "type": "string"
                     },
                     "match": "Level*Exception*"
                  }
               }
            ]
         }
      },
      "aliases": {}
   }
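A template body this long is easy to break with a missing comma, so it can help to parse it locally before sending the PUT. The sketch below is one way to do that with the standard json module. The function name and the choice of required keys ("template" and "settings") are assumptions made for illustration.

```python
import json


def check_template(text):
    """Parse a template body; return (body, None) on success or (None, error message)."""
    try:
        body = json.loads(text)
    except json.JSONDecodeError as exc:
        return None, f"line {exc.lineno}, column {exc.colno}: {exc.msg}"
    missing = [key for key in ("template", "settings") if key not in body]
    if missing:
        return None, "missing keys: " + ", ".join(missing)
    return body, None


good = '{"template": "demo-*", "settings": {"index.refresh_interval": "60s"}, "aliases": {}}'
bad = '{"template": "demo-*", "settings": {} "aliases": {}}'  # comma missing before "aliases"

print(check_template(good)[1])
print(check_template(bad)[1])
```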

Number of replicas

So that an index's shards spread evenly across the data nodes, no data node should hold more than 3 shards of the same index.

Formula: (number_of_shards * (1 + number_of_replicas)) <= 3 * number_of_datanodes
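The formula can be turned around to find the smallest number of data nodes an index needs. The sketch below reads "no more than 3 shards per node" as total shard copies <= 3 * data nodes. The function names are hypothetical, and the sample values (7 shards, 1 replica) come from the demo_logs template.

```python
def total_shard_copies(number_of_shards, number_of_replicas):
    """Primaries plus replicas for one index."""
    return number_of_shards * (1 + number_of_replicas)


def min_data_nodes(number_of_shards, number_of_replicas, max_per_node=3):
    """Smallest node count that keeps every node at or below max_per_node shards."""
    total = total_shard_copies(number_of_shards, number_of_replicas)
    return -(-total // max_per_node)  # ceiling division


print(total_shard_copies(7, 1))              # 14 shard copies in total
print(min_data_nodes(7, 1))                  # 5 nodes at up to 3 shards each
print(min_data_nodes(7, 1, max_per_node=2))  # 7 nodes at up to 2 shards each
```

With total_shards_per_node set to 2, as in the template, the same index needs at least 7 data nodes, or some shards stay unassigned.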

Shards allocated per machine

"index.routing.allocation.total_shards_per_node": "2",

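Refresh interval

The default refresh interval is 1s. Under heavy write load this setting keeps write throughput low. Raising the interval increases throughput, at the cost that newly written data becomes searchable only after up to 60s, so fresh data is visible less promptly.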

"index.refresh_interval": "60s"

translog

Reduce how often data is flushed to disk. If some data loss is tolerable, enable async mode.

"index.translog.flush_threshold_ops": "1000000",
"index.translog.durability": "async",

Merge parameters

"index.merge.policy.floor_segment": "100mb",
"index.merge.scheduler.max_thread_count": "1",
"index.merge.policy.min_merge_size": "10mb"

Mapping settings

For fields that are never searched, set index to no. For fields that are searched but need no tokenization, set index to not_analyzed.

Use dynamic_templates wherever possible.

Cluster parameter tuning

{
   "persistent": {
      "cluster": {
         "routing": {
            "allocation": {
               "enable": "new_primaries",
               "cluster_concurrent_rebalance": "8",
               "allow_rebalance": "indices_primaries_active",
               "node_concurrent_recoveries": "8"
            }
         }
      },
      "indices": {
         "breaker": {
            "fielddata": {
               "limit": "30%"
            },
            "request": {
               "limit": "30%"
            }
         },
         "recovery": {
            "concurrent_streams": "10",
            "max_bytes_per_sec": "200mb"
         }
      }
   },
   "transient": {
      "indices": {
         "store": {
            "throttle": {
               "type": "merge",
               "max_bytes_per_sec": "50mb"
            }
         },
         "recovery": {
            "concurrent_streams": "8"
         }
      },
      "threadpool": {
         "bulk": {
            "type": "fixed",
            "queue_size": "1000",
            "size": "30"
         },
         "index": {
            "type": "fixed",
            "queue_size": "1200",
            "size": "30"
         }
      },
      "cluster": {
         "routing": {
            "allocation": {
               "enable": "all",
               "cluster_concurrent_rebalance": "8",
               "node_concurrent_recoveries": "15"
            }
         }
      }
   }
}

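To avoid frequent shard rebalancing, set the allocation type to new_primaries and raise the number of concurrent rebalances from the default of 2 to a somewhat larger value.

Avoid a mapping refresh on every update (applies to versions below 2.x):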

"indices.cluster.send_refresh_mapping": false

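When adjusting the threadpool, keep size no larger than the number of cores; otherwise context switching between threads eats a large share of CPU time and drives the load too high. If unsure, leave it alone.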
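The rule about cores can be applied mechanically before writing a threadpool size into the settings. The sketch below clamps a requested size to the core count. The function name is hypothetical, and os.cpu_count() reports the cores of the machine running the script, which must be the Elasticsearch node itself for the result to mean anything.

```python
import os


def capped_pool_size(requested, cores=None):
    """Clamp a requested threadpool size to the number of CPU cores (at least 1)."""
    if cores is None:
        cores = os.cpu_count() or 1
    return max(1, min(requested, cores))


# On the 24-core client node described earlier, a requested size of 30 is cut to 24.
print(capped_pool_size(30, cores=24))
```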

Clearing caches periodically

To keep field data from occupying large amounts of JVM memory, clear the cached data periodically. This releases field data, the filter cache, and the query cache.

curl -XPOST "localhost:9200/_cache/clear"
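The same call can be issued from a script so that a scheduler such as cron can run it at a fixed interval. This is a minimal sketch using only the standard library. The function names, the default host and port, and the timeout are assumptions, and only the request-building part runs without a live cluster.

```python
import urllib.request


def build_clear_cache_request(host="localhost", port=9200):
    """Build the POST request for the cache-clear endpoint without sending it."""
    url = f"http://{host}:{port}/_cache/clear"
    return urllib.request.Request(url, method="POST")


def clear_cache(host="localhost", port=9200, timeout=10):
    """Send the request and return the HTTP status code (needs a reachable cluster)."""
    request = build_clear_cache_request(host, port)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


request = build_clear_cache_request()
print(request.get_method(), request.full_url)
```

Clearing caches has a cost: the next queries must rebuild whatever they need, so the interval should be long enough that the cluster is not rebuilding field data constantly.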

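Miscellaneous

marvel: install the marvel plugin and keep an eye on resource usage, including memory and CPU.
Logs: check the ES logs regularly to confirm that the index configuration is sensible and that incoming data has no anomalies.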

Results after tuning

Write throughput holds steady at 30K/s.