Open-Falcon: An Operations Monitoring System
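I. Introduction to Open-Falcon
Open-Falcon is a monitoring system written in Go and Python. The project was started by Xiaomi.
1. Monitoring system
It can monitor servers, operating systems, middleware, and applications, and raise alerts, at both the operations level (basic configuration is enough) and the application level (custom development that reports data through a port). This matters a great deal for keeping our systems running normally.
2. Basic monitoring
CPU, load, memory, disk, I/O, network, kernel parameters, ss summary output, port checks, liveness of core service processes, resource consumption of key business processes, NTP offset, and DNS resolution. All of these metrics are supported directly by Open-Falcon's agent component.
Basic Linux operations metrics: http://book.open-falcon.org/zh/faq/linux-metrics.html
By the time you thoroughly understand every one of these basic metrics, you will also have deepened your understanding of how Linux works and of its commands.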
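To make the link between a basic metric and the kernel data behind it concrete, here is a minimal Python 3 sketch that derives a memory-usage percentage from `/proc/meminfo`-style text. The formula (used = total minus free, buffers, and cached) is an assumption chosen for illustration and may not match the agent's own calculation exactly; the sample values are made up.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field: kilobytes} dict."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def mem_used_percent(meminfo):
    """Assumed formula: used = MemTotal - (MemFree + Buffers + Cached)."""
    total = meminfo["MemTotal"]
    free = meminfo["MemFree"] + meminfo.get("Buffers", 0) + meminfo.get("Cached", 0)
    return 100.0 * (total - free) / total


# Made-up sample in the same layout as /proc/meminfo.
SAMPLE = """MemTotal:        1000000 kB
MemFree:          200000 kB
Buffers:           50000 kB
Cached:           250000 kB
"""

info = parse_meminfo(SAMPLE)
print(round(mem_used_percent(info), 1))  # 50.0
```

On a real host you could pass `open("/proc/meminfo").read()` to `parse_meminfo` instead of the sample string.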
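3. Third-party monitoring
Every field has its specialists. So many applications run on top of the OS that the Open-Falcon team cannot write monitoring for all of them, so the open-source community needs to contribute plugins. Plugins already exist for many commonly used third-party applications.
4. JVM monitoring
For the many companies that use Java as their main development language, JVM monitoring is indispensable.
The parameters of each JVM application, such as GC, class loading, JVM memory, processes, and threads, can all be reported to Falcon, and all of them can be obtained through MXBeans.
Using Java platform management beans: http://www.ibm.com/developerworks/cn/java/j-mxbeans/
5. Business application monitoring
For APIs that the business needs to monitor (their response time, for example), you can report the relevant data to Falcon as the business requires and view the results in Falcon.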
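As an illustration of reporting business data, the following Python 3 sketch builds a custom metric and shows how it could be posted to the local agent. It assumes the agent's HTTP listener is on the default `:1988` and exposes the `/v1/push` endpoint described in the Open-Falcon documentation; the metric name `api.response_time_ms` and the tags are hypothetical.

```python
import json
import socket
import time
import urllib.request

# Assumption: a local agent with its HTTP listener on the default port 1988.
AGENT_PUSH_URL = "http://127.0.0.1:1988/v1/push"


def build_metric(metric, value, step=60, tags="", endpoint=None, counter_type="GAUGE"):
    """Build one data point in the JSON shape the push endpoint accepts."""
    return {
        "endpoint": endpoint or socket.gethostname(),
        "metric": metric,
        "timestamp": int(time.time()),
        "step": step,                 # reporting interval in seconds
        "value": value,
        "counterType": counter_type,  # GAUGE or COUNTER
        "tags": tags,                 # comma-separated key=value pairs
    }


def push(metrics, url=AGENT_PUSH_URL, timeout=5):
    """POST a list of data points to the agent and return the response body."""
    body = json.dumps(metrics).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Hypothetical metric: response time of one API, in milliseconds.
sample = build_metric("api.response_time_ms", 123.4, tags="service=order,api=checkout")
print(json.dumps([sample], indent=2))
# With an agent running, send it with: push([sample])
```

The payload is always a JSON list, so several data points can be sent in one request.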
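Official website: http://open-falcon.org/
Chinese documentation: https://book.open-falcon.org/zh_0_2/
Chinese and English documentation: https://book.open-falcon.org
Downloads: https://github.com/open-falcon/falcon-plus/releases
II. The Full Brainstorming Journey Behind Writing Open-Falcon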
III. Environment Preparation
1. System environment
    [root@open-falcon-server ~]# cat /etc/redhat-release
    CentOS Linux release 7.2.1511 (Core)
2. System tuning
    # Install the download tool
    yum install wget -y

    # Switch to the Aliyun mirror
    mv /etc/yum.repos.d/CentOS-Base.repo /etc/yum.repos.d/CentOS-Base.repo.backup
    wget -O /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-7.repo

    # Install the EPEL repository
    yum install epel-release.noarch -y
    rpm -Uvh http://mirrors.aliyun.com/epel/epel-release-latest-7.noarch.rpm
    yum clean all
    yum makecache

    # Install commonly used tools
    yum install git telnet net-tools tree nmap sysstat lrzsz dos2unix tcpdump ntpdate -y

    # Synchronize the clock
    ntpdate cn.pool.ntp.org

    # Change the hostname
    hostnamectl set-hostname open-falcon-server
    hostname open-falcon-server

    # Enable the yum cache
    sed -i 's#keepcache=0#keepcache=1#g' /etc/yum.conf
    grep keepcache /etc/yum.conf

    # Disable SELinux
    sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
    setenforce 0

    # Stop and disable the firewall
    systemctl stop firewalld.service
    systemctl disable firewalld.service
3. Software prerequisites
(1) Redis
    # Install Redis
    yum install redis -y

    # Common Redis commands
    redis-server        Redis server
    redis-cli           Redis command-line client
    redis-benchmark     Redis benchmarking tool
    redis-check-aof     AOF file repair tool
    redis-check-dump    RDB file repair tool
    redis-sentinel      Sentinel server

    # Start Redis
    [root@open-falcon-server ~]# redis-server &
    [1] 1662
    [root@open-falcon-server ~]# 1662:C 27 Jul 14:44:56.463 # Warning: no config file specified, using the default config. In order to specify a config file use redis-server /path/to/redis.conf
    1662:M 27 Jul 14:44:56.464 * Increased maximum number of open files to 10032 (it was originally set to 1024).
    [Redis startup banner: Redis 3.2.10 (00000000/0) 64 bit, Running in standalone mode, Port: 6379, PID: 1662, http://redis.io]
    1662:M 27 Jul 14:44:56.464 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
    1662:M 27 Jul 14:44:56.464 # Server started, Redis version 3.2.10
    1662:M 27 Jul 14:44:56.464 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
    1662:M 27 Jul 14:44:56.464 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
    1662:M 27 Jul 14:44:56.464 * The server is now ready to accept connections on port 6379
(2) MySQL
    # Install MySQL (MariaDB)
    yum install mariadb mariadb-server -y

    # Start MySQL
    systemctl start mariadb
    systemctl enable mariadb

    # Log in to test the database
    [root@open-falcon-server ~]# mysql -uroot -p
    Enter password:
    Welcome to the MariaDB monitor.  Commands end with ; or \g.
    Your MariaDB connection id is 4
    Server version: 5.5.56-MariaDB MariaDB Server

    Copyright (c) 2000, 2017, Oracle, MariaDB Corporation Ab and others.

    Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

    MariaDB [(none)]> exit
    Bye

    # Check the services
    [root@open-falcon-server ~]# netstat -lntp|egrep "3306|6379"
    tcp        0      0 0.0.0.0:3306            0.0.0.0:*               LISTEN      1978/mysqld
    tcp        0      0 0.0.0.0:6379            0.0.0.0:*               LISTEN      1662/redis-server *
    tcp6       0      0 :::6379                 :::*                    LISTEN      1662/redis-server *

    # Initialize the MySQL schema (root has no password yet, so just press Enter at each prompt)
    cd /tmp/ && git clone https://github.com/open-falcon/falcon-plus.git
    cd /tmp/falcon-plus/scripts/mysql/db_schema/
    mysql -h 127.0.0.1 -u root -p < 1_uic-db-schema.sql
    mysql -h 127.0.0.1 -u root -p < 2_portal-db-schema.sql
    mysql -h 127.0.0.1 -u root -p < 3_dashboard-db-schema.sql
    mysql -h 127.0.0.1 -u root -p < 4_graph-db-schema.sql
    mysql -h 127.0.0.1 -u root -p < 5_alarms-db-schema.sql
    rm -rf /tmp/falcon-plus/

    # Set the database password
    mysqladmin -uroot password "123456"

    # Check the imported databases
    [root@open-falcon-server ~]# mysql -uroot -p
    Enter password:
    Welcome to the MariaDB monitor.  Commands end with ; or \g.
    Your MariaDB connection id is 11
    Server version: 5.5.56-MariaDB MariaDB Server

    Copyright (c) 2000, 2017, Oracle, MariaDB Corporation Ab and others.

    Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

    MariaDB [(none)]> show databases;
    +--------------------+
    | Database           |
    +--------------------+
    | information_schema |
    | alarms             |
    | dashboard          |
    | falcon_portal      |
    | graph              |
    | mysql              |
    | performance_schema |
    | test               |
    | uic                |
    +--------------------+
    9 rows in set (0.00 sec)

    MariaDB [(none)]> exit
    Bye
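The `netstat` check above can also be scripted. The following Python 3 sketch, which assumes both services listen on 127.0.0.1 at their default ports, simply tries to open a TCP connection to each one; it is a convenience check, not part of the Open-Falcon tooling.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Default ports used earlier in this section.
SERVICES = {"mysql": 3306, "redis": 6379}

for name, port in SERVICES.items():
    state = "listening" if port_is_open("127.0.0.1", port) else "not reachable"
    print(f"{name:6s} {port:5d} {state}")
```

A successful connection only shows that something is listening on the port; logging in with `mysql` or `redis-cli` is still the real functional test.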
(3) Go
    # Install the Go toolchain
    yum install golang -y

    # Check the version
    [root@open-falcon-server ~]# go version
    go version go1.9.4 linux/amd64

    # Find where Go is installed
    [root@open-falcon-server ~]# find / -name go
    /etc/alternatives/go
    /var/lib/alternatives/go
    /usr/bin/go
    /usr/lib/golang/src/cmd/go    # this is the path we need
    /usr/lib/golang/src/go
    /usr/lib/golang/bin/go
    /usr/lib/golang/pkg/linux_amd64/cmd/go
    /usr/lib/golang/pkg/linux_amd64/go
IV. Open-Falcon Backend
    # Create the working directory
    export FALCON_HOME=/home/work
    export WORKSPACE=$FALCON_HOME/open-falcon
    mkdir -p $WORKSPACE

    # Download and unpack the binary package
    wget https://github.com/open-falcon/falcon-plus/releases/download/v0.2.1/open-falcon-v0.2.1.tar.gz
    tar xf open-falcon-v0.2.1.tar.gz -C $WORKSPACE

    # Check the unpacked files
    [root@open-falcon-server ~]# cd $WORKSPACE
    [root@open-falcon-server open-falcon]# ll
    total 3896
    drwxrwxr-x 7 501 501      67 Aug 15  2017 agent
    drwxrwxr-x 5 501 501      40 Aug 15  2017 aggregator
    drwxrwxr-x 5 501 501      40 Aug 15  2017 alarm
    drwxrwxr-x 6 501 501      51 Aug 15  2017 api
    drwxrwxr-x 5 501 501      40 Aug 15  2017 gateway
    drwxrwxr-x 6 501 501      51 Aug 15  2017 graph
    drwxrwxr-x 5 501 501      40 Aug 15  2017 hbs
    drwxrwxr-x 5 501 501      40 Aug 15  2017 judge
    drwxrwxr-x 5 501 501      40 Aug 15  2017 nodata
    -rwxrwxr-x 1 501 501 3987469 Aug 15  2017 open-falcon
    lrwxrwxrwx 1 501 501      16 Aug 15  2017 plugins -> ./agent/plugins/
    lrwxrwxrwx 1 501 501      15 Aug 15  2017 public -> ./agent/public/
    drwxrwxr-x 5 501 501      40 Aug 15  2017 transfer
Module | Config file path |
---|---|
aggregator | /home/work/open-falcon/aggregator/config/cfg.json |
graph | /home/work/open-falcon/graph/config/cfg.json |
hbs | /home/work/open-falcon/hbs/config/cfg.json |
nodata | /home/work/open-falcon/nodata/config/cfg.json |
api | /home/work/open-falcon/api/config/cfg.json |
alarm | /home/work/open-falcon/alarm/config/cfg.json |
    # Update the configuration files (run from $WORKSPACE) to add the database password
    sed -i 's#root:@tcp(127.0.0.1:3306)#root:123456@tcp(127.0.0.1:3306)#g' `find ./ -type f -name "cfg.json"|egrep "alarm|api|nodata|hbs|graph|aggregator"`
    cat `find ./ -type f -name "cfg.json"|egrep "alarm|api|nodata|hbs|graph|aggregator"` |grep 'root:123456@tcp(127.0.0.1:3306)'

    # Start the backend modules
    [root@open-falcon-server open-falcon]# cd /home/work/open-falcon
    [root@open-falcon-server open-falcon]# ./open-falcon start
    [falcon-graph] 5583
    [falcon-hbs] 5592
    [falcon-judge] 5600
    [falcon-transfer] 5606
    [falcon-nodata] 5613
    [falcon-aggregator] 5620
    [falcon-agent] 5628
    [falcon-gateway] 5635
    [falcon-api] 5641
    [falcon-alarm] 5653

    # Check that the services are running
    [root@open-falcon-server open-falcon]# ./open-falcon check
            falcon-graph         UP            5583
              falcon-hbs         UP            5592
            falcon-judge         UP            5600
         falcon-transfer         UP            5606
           falcon-nodata         UP            5613
       falcon-aggregator         UP            5620
            falcon-agent         UP            5628
          falcon-gateway         UP            5635
              falcon-api         UP            5641
            falcon-alarm         UP            5653

    # More command-line usage
    # ./open-falcon [start|stop|restart|check|monitor|reload] module
    ./open-falcon start agent
    ./open-falcon check
            falcon-graph         UP           53007
              falcon-hbs         UP           53014
            falcon-judge         UP           53020
         falcon-transfer         UP           53026
           falcon-nodata         UP           53032
       falcon-aggregator         UP           53038
            falcon-agent         UP           53044
          falcon-gateway         UP           53050
              falcon-api         UP           53056
            falcon-alarm         UP           53063

    # For debugging, you can check $WorkDir/$moduleName/log/logs/xxx.log

The backend deployment is now complete.

Other usage: reloading configuration. After editing a cfg.json file, you can reload the configuration with the command below (1988 is the agent's HTTP port):

    curl 127.0.0.1:1988/config/reload
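For readers who prefer not to rely on a `sed` one-liner, the same substitution can be sketched in Python 3. This is a minimal sketch, assuming the workspace layout shown above (`<workspace>/<module>/config/cfg.json`) and DSNs of the form `root:@tcp(127.0.0.1:3306)`; the one-line sample is a hypothetical excerpt, not a copy of a real config file.

```python
from pathlib import Path

# Modules whose cfg.json contains a MySQL DSN, as listed in the table above.
MODULES = ("aggregator", "graph", "hbs", "nodata", "api", "alarm")


def set_db_password(config_text, password, user="root", addr="127.0.0.1:3306"):
    """Insert a password into Go-style MySQL DSNs that currently have none."""
    old = f"{user}:@tcp({addr})"
    new = f"{user}:{password}@tcp({addr})"
    return config_text.replace(old, new)


def update_workspace(workspace, password):
    """Rewrite each module's cfg.json in place; return the paths that changed."""
    changed = []
    for module in MODULES:
        path = Path(workspace) / module / "config" / "cfg.json"
        if not path.is_file():
            continue
        original = path.read_text(encoding="utf-8")
        updated = set_db_password(original, password)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
            changed.append(str(path))
    return changed


# Hypothetical one-line excerpt, only to show the substitution.
sample = '"addr": "root:@tcp(127.0.0.1:3306)/falcon_portal"'
print(set_db_password(sample, "123456"))
```

Calling `update_workspace("/home/work/open-falcon", "123456")` would apply the change to all six modules. Running it a second time changes nothing, because the password-less pattern no longer matches.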
V. Open-Falcon Frontend
    # Create the working directory
    export FALCON_HOME=/home/work
    export WORKSPACE=$FALCON_HOME/open-falcon
    mkdir -p $WORKSPACE
    cd $WORKSPACE

    # Clone the frontend component
    git clone https://github.com/open-falcon/dashboard.git

    # Install dependencies
    yum install -y python-virtualenv
    yum install -y python-devel
    yum install -y openldap-devel
    yum install -y mysql-devel
    yum groupinstall "Development tools" -y

    # Download ez_setup.py
    cd ~
    wget --no-check-certificate https://bootstrap.pypa.io/ez_setup.py
    python ez_setup.py --insecure

    # Download and install pip
    wget https://pypi.python.org/packages/11/b6/abcb525026a4be042b486df43905d6893fb04f05aac21c32c638e939e447/pip-9.0.1.tar.gz#md5=35f01da33009719497f01a4ba69d63c9
    tar xf pip-9.0.1.tar.gz
    cd pip-9.0.1
    python setup.py install

    # Speed up pip installs by using a mirror
    mkdir -p ~/.pip
    echo '[global]' >>~/.pip/pip.conf
    echo 'index-url = https://pypi.tuna.tsinghua.edu.cn/simple' >>~/.pip/pip.conf

    # Test that pip works
    [root@open-falcon-server ~]# cd /home/work/open-falcon/dashboard
    [root@open-falcon-server dashboard]# pip -V
    pip 9.0.1 from /usr/lib/python2.7/site-packages/pip-9.0.1-py2.7.egg (python 2.7)
    [root@open-falcon-server dashboard]# pip

    Usage:
      pip <command> [options]

    Commands:
      install                     Install packages.
      download                    Download packages.
      uninstall                   Uninstall packages.
      freeze                      Output installed packages in requirements format.
      list                        List installed packages.
      show                        Show information about installed packages.
      check                       Verify installed packages have compatible dependencies.
      search                      Search PyPI for packages.
      wheel                       Build wheels from your requirements.
      hash                        Compute hashes of package archives.
      completion                  A helper command used for command completion.
      help                        Show help for commands.

    General Options:
      -h, --help                  Show help.
      --isolated                  Run pip in an isolated mode, ignoring environment variables and user configuration.
      -v, --verbose               Give more output. Option is additive, and can be used up to 3 times.
      -V, --version               Show version and exit.
      -q, --quiet                 Give less output. Option is additive, and can be used up to 3 times (corresponding to WARNING, ERROR, and CRITICAL logging levels).
      --log <path>                Path to a verbose appending log.
      --proxy <proxy>             Specify a proxy in the form [user:passwd@]proxy.server:port.
      --retries <retries>         Maximum number of retries each connection should attempt (default 5 times).
      --timeout <sec>             Set the socket timeout (default 15 seconds).
      --exists-action <action>    Default action when a path already exists: (s)witch, (i)gnore, (w)ipe, (b)ackup, (a)bort.
      --trusted-host <hostname>   Mark this host as trusted, even though it does not have valid or any HTTPS.
      --cert <path>               Path to alternate CA bundle.
      --client-cert <path>        Path to SSL client certificate, a single file containing the private key and the certificate in PEM format.
      --cache-dir <dir>           Store the cache data in <dir>.
      --no-cache-dir              Disable the cache.
      --disable-pip-version-check
                                  Don't periodically check PyPI to determine whether a new version of pip is available for download. Implied with --no-index.

    # List the modules that need to be installed
    [root@open-falcon-server dashboard]# cat pip_requirements.txt
    Flask==0.10.1
    Flask-Babel==0.9
    Jinja2==2.7.2
    Werkzeug==0.9.4
    gunicorn==19.5.0
    python-dateutil==2.2
    requests==2.3.0
    mysql-python
    python-ldap

    # Install the modules
    pip install -r pip_requirements.txt

Edit the configuration file.

Configuration notes: the dashboard's configuration file is 'rrd/config.py'. Adjust it to match your environment:

    # API_ADDR is the address of the backend api component
    API_ADDR = "http://127.0.0.1:8080/api/v1"
    # Adjust PORTAL_DB_* as needed; the default user is root and the default password is ""
    # Adjust ALARM_DB_* as needed; the default user is root and the default password is ""

Making the change:

    cp rrd/config.py{,.bak}
    vim rrd/config.py

Modified content:

    # Falcon+ API
    API_ADDR = os.environ.get("API_ADDR","http://10.0.0.100:8080/api/v1")

    # portal database
    # TODO: read from api instead of db
    PORTAL_DB_HOST = os.environ.get("PORTAL_DB_HOST","10.0.0.100")
    PORTAL_DB_PORT = int(os.environ.get("PORTAL_DB_PORT",3306))
    PORTAL_DB_USER = os.environ.get("PORTAL_DB_USER","root")
    PORTAL_DB_PASS = os.environ.get("PORTAL_DB_PASS","123456")
    PORTAL_DB_NAME = os.environ.get("PORTAL_DB_NAME","falcon_portal")

    # alarm database
    # TODO: read from api instead of db
    ALARM_DB_HOST = os.environ.get("ALARM_DB_HOST","10.0.0.100")
    ALARM_DB_PORT = int(os.environ.get("ALARM_DB_PORT",3306))
    ALARM_DB_USER = os.environ.get("ALARM_DB_USER","root")
    ALARM_DB_PASS = os.environ.get("ALARM_DB_PASS","123456")
    ALARM_DB_NAME = os.environ.get("ALARM_DB_NAME","alarms")

Start the service:

    [root@open-falcon-server dashboard]# virtualenv ./env
    New python executable in /home/work/open-falcon/dashboard/env/bin/python
    Installing setuptools, pip, wheel...done.
    [root@open-falcon-server dashboard]# source env/bin/activate
    (env) [root@open-falcon-server dashboard]# ./control start
    falcon-dashboard started..., pid=20814
    (env) [root@open-falcon-server dashboard]# ./control tail
    [2018-07-27 16:37:02 +0000] [20814] [INFO] Starting gunicorn 19.5.0
    [2018-07-27 16:37:02 +0000] [20814] [INFO] Listening at: http://0.0.0.0:8081 (20814)
    [2018-07-27 16:37:02 +0000] [20814] [INFO] Using worker: sync
    [2018-07-27 16:37:02 +0000] [20819] [INFO] Booting worker with pid: 20819
    [2018-07-27 16:37:02 +0000] [20820] [INFO] Booting worker with pid: 20820
    [2018-07-27 16:37:02 +0000] [20821] [INFO] Booting worker with pid: 20821
    [2018-07-27 16:37:02 +0000] [20826] [INFO] Booting worker with pid: 20826
    ^C
    (env) [root@open-falcon-server dashboard]# deactivate
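Because every setting in `rrd/config.py` is read through `os.environ.get(NAME, default)`, an exported environment variable takes precedence over the value written in the file. The Python 3 sketch below reproduces that lookup pattern with a helper of my own, `db_settings` (hypothetical, not part of the dashboard), and a simulated environment; the address `10.0.0.200` is made up.

```python
import os


def db_settings(prefix, default_host, default_password, default_name, env=None):
    """Resolve <PREFIX>_DB_* settings the way config.py does: env var first, then default."""
    env = os.environ if env is None else env
    return {
        "host": env.get(prefix + "_DB_HOST", default_host),
        "port": int(env.get(prefix + "_DB_PORT", 3306)),
        "user": env.get(prefix + "_DB_USER", "root"),
        "password": env.get(prefix + "_DB_PASS", default_password),
        "name": env.get(prefix + "_DB_NAME", default_name),
    }


# Simulated environment: only the portal host is overridden.
fake_env = {"PORTAL_DB_HOST": "10.0.0.200"}

portal = db_settings("PORTAL", "10.0.0.100", "123456", "falcon_portal", env=fake_env)
alarm = db_settings("ALARM", "10.0.0.100", "123456", "alarms", env=fake_env)

print(portal["host"], alarm["host"])  # 10.0.0.200 10.0.0.100
```

In practice this means a value such as `PORTAL_DB_PASS` can be exported in the shell before `./control start`, which keeps the password out of the edited file.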
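VI. Accessing the Website
http://10.0.0.100:8081

Figure: Accessing the website
Dashboard user management
The dashboard does not create any account by default, not even an administrator account. You have to register accounts through the web page.
To get a super-administrator account that can manage everything, manually register a user named root (the first account registered with the name root is automatically made the super administrator).
The super administrator can grant management permissions to ordinary users.
Tip: anyone who can open the dashboard page can register an account, so once you have created accounts for the people who need them, turn registration off. To do that, edit the api component's configuration file cfg.json, set signup_disable to true, and restart api. When you need to create an account for someone, switch the option back, then turn it off again when you are done.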
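Toggling registration can be scripted so that the flag is not left open by accident. The Python 3 sketch below flips `signup_disable` in a JSON document; it assumes the key sits at the top level of the api component's cfg.json, and the sample config is a hypothetical, heavily shortened stand-in for the real file.

```python
import json


def set_signup_disabled(cfg_text, disabled):
    """Return the config JSON with signup_disable set to the given boolean."""
    cfg = json.loads(cfg_text)
    cfg["signup_disable"] = bool(disabled)
    return json.dumps(cfg, indent=4)


# Hypothetical, shortened stand-in for api/config/cfg.json.
sample_cfg = '{"web_port": "0.0.0.0:8080", "signup_disable": false}'

locked = set_signup_disabled(sample_cfg, True)
print(json.loads(locked)["signup_disable"])  # True
```

After writing the result back to cfg.json, restart the api module (for example with `./open-falcon restart api`) so the change takes effect.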

Figure: Home page
VII. Open-Falcon Client
    # On the server
    [root@open-falcon-server ~]# cd /home/work/open-falcon
    [root@open-falcon-server open-falcon]# scp -r agent [email protected]:/home/
    [root@open-falcon-server open-falcon]# scp -r open-falcon [email protected]:/home/

    # On the client
    [root@open-falcon-client ~]# mkdir -p /home/work/open-falcon
    [root@open-falcon-client ~]# mv /home/open-falcon /home/agent /home/work/open-falcon
    [root@open-falcon-client ~]# cd /home/work/open-falcon
    [root@open-falcon-client open-falcon]# vim agent/config/cfg.json

Modified content (the # annotations are explanations only and must not appear in the actual JSON file):

    {
        "debug": true,    # controls whether debug information is printed; usually false in production
        "hostname": "",   # the agent sends collected data to transfer with endpoint set to the hostname; by default it is obtained from `hostname`, but a hostname configured here takes precedence
        "ip": "",         # the agent sends its IP address to hbs with each heartbeat; it detects the local IP automatically, and you can set it here if you do not want auto-detection
        "plugin": {
            "enabled": false,    # the plugin mechanism is off by default
            "dir": "./plugin",   # the git repo holding the plugin scripts is cloned into this directory
            "git": "https://github.com/open-falcon/plugin.git",    # address of the git repo holding the plugin scripts
            "logs": "./logs"     # plugin execution logs; look here if a plugin misbehaves
        },
        "heartbeat": {
            "enabled": true,              # must be set to true
            "addr": "10.0.0.100:6030",    # address of hbs; the port is the hbs RPC port
            "interval": 60,               # heartbeat interval, in seconds
            "timeout": 1000               # timeout for connecting to hbs, in milliseconds
        },
        "transfer": {
            "enabled": true,
            "addrs": [
                "10.0.0.100:8433"
            ],                  # address of transfer; the port is the transfer RPC port (8433 by default); several addresses can be listed and the agent will handle HA
            "interval": 60,     # collection interval, in seconds: the agent collects data once a minute and sends it to transfer
            "timeout": 1000     # timeout for connecting to transfer, in milliseconds
        },
        "http": {
            "enabled": true,    # whether to listen on the HTTP port
            "listen": ":1988",
            "backdoor": false
        },
        "collector": {
            "ifacePrefix": ["eth", "em"],    # by default only traffic for interfaces whose names start with eth or em is collected; an empty list collects all interfaces, including lo; per-interface traffic is visible in /proc/net/dev
            "mountPoint": []
        },
        "default_tags": {
        },
        "ignore": {    # more than 200 metrics are collected by default; metrics listed here are not collected
            "cpu.busy": true,
            "df.bytes.free": true,
            "df.bytes.total": true,
            "df.bytes.used": true,
            "df.bytes.used.percent": true,
            "df.inodes.total": true,
            "df.inodes.free": true,
            "df.inodes.used": true,
            "df.inodes.used.percent": true,
            "mem.memtotal": true,
            "mem.memused": true,
            "mem.memused.percent": true,
            "mem.memfree": true,
            "mem.swaptotal": true,
            "mem.swapused": true,
            "mem.swapfree": true
        }
    }

Start the service:

    ./open-falcon start agent      # start the process
    ./open-falcon stop agent       # stop the process
    ./open-falcon monitor agent    # view the log

Check whether the log under the var directory looks normal, or open port 1988 in a browser. The agent also provides a --check option that verifies whether the agent can run properly on the current machine:

    cd /home/work/open-falcon/agent/bin/
    ./falcon-agent --check
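JSON has no comment syntax, so an annotated listing like the one above cannot be saved as cfg.json unchanged. The Python 3 sketch below removes `#` annotations that sit outside quoted strings and then parses the result; `strip_hash_comments` is a helper written for this illustration, and the annotated sample is a shortened, hypothetical excerpt.

```python
import json


def strip_hash_comments(text):
    """Remove '#' comments that sit outside double-quoted JSON strings."""
    cleaned_lines = []
    for line in text.splitlines():
        in_string = False
        escaped = False
        kept = []
        for ch in line:
            if in_string:
                kept.append(ch)
                if escaped:
                    escaped = False
                elif ch == "\\":
                    escaped = True
                elif ch == '"':
                    in_string = False
            elif ch == '"':
                in_string = True
                kept.append(ch)
            elif ch == "#":
                break  # everything after this point is an annotation
            else:
                kept.append(ch)
        cleaned_lines.append("".join(kept).rstrip())
    return "\n".join(cleaned_lines)


# Shortened, hypothetical excerpt in the annotated style used above.
ANNOTATED = '''{
    "debug": true,  # verbose output
    "heartbeat": {
        "enabled": true,  # must be true
        "addr": "10.0.0.100:6030"  # hbs RPC address
    },
    "plugin": {"git": "https://github.com/open-falcon/plugin.git"}  # plugin repo
}'''

config = json.loads(strip_hash_comments(ANNOTATED))
print(config["heartbeat"]["addr"])  # 10.0.0.100:6030
```

Tracking whether the scanner is inside a string matters because a `#` inside a quoted value, such as a URL fragment, must be kept.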
Open the monitoring interface to check the result:

Figure: Monitoring interface
VIII. References
Open-Falcon: An Operations Monitoring System
https://www.cnblogs.com/nulige/p/7741580.html
Installing Open-Falcon and Using It to Monitor a Raspberry Pi
https://yq.aliyun.com/articles/437196
Xiaomi Operations Architecture: Service Monitoring with Open-Falcon
https://blog.csdn.net/qq_27384769/article/details/79234270
An Architect's Growth Path: Blog and Mind Maps
https://github.com/csy512889371/learnDoc
The Full Brainstorming Journey Behind Writing Open-Falcon
http://mp.weixin.qq.com/s?__biz=MjM5OTcxMzE0MQ==&mid=400225178&idx=1&sn=c98609a9b66f84549e41cd421b4df74d