
Implementation and Analysis of Octavia health-manager and amphora Failure Recovery


Health Manager

Health Manager - This subcomponent monitors individual amphorae to ensure they are up and running, and otherwise healthy. It also handles failover events if amphorae fail unexpectedly.

Simply put, the Health Manager monitors the health status of every amphora; if an amphora fails, it starts the failover process, thereby ensuring the high availability of the LB.

So, to understand the Health Manager service, we need to figure out how it monitors the health status of amphorae, and then work through the details of the failover process.

Monitoring amphora health status

As usual, let's start from the service entry point (/opt/rocky/octavia/octavia/cmd/health_manager.py). Starting the octavia-health-manager service loads two methods, UDPStatusGetter.check() and HealthManager.health_check(). Let's look at the implementation of the former first:

# file: /opt/rocky/octavia/octavia/amphorae/drivers/health/heartbeat_udp.py

class UDPStatusGetter(object):
    """This class defines methods that will gather heatbeats

    The heartbeats are transmitted via UDP and this class will bind to a port
    and absorb them
    """
    def __init__(self):
        self.key = cfg.CONF.health_manager.heartbeat_key
        self.ip = cfg.CONF.health_manager.bind_ip
        self.port = cfg.CONF.health_manager.bind_port
        self.sockaddr = None
        LOG.info('attempting to listen on %(ip)s port %(port)s',
                 {'ip': self.ip, 'port': self.port})
        self.sock = None
        self.update(self.key, self.ip, self.port)

        self.executor = futures.ProcessPoolExecutor(
            max_workers=cfg.CONF.health_manager.status_update_threads)
        self.repo = repositories.Repositories().amphorahealth

    def update(self, key, ip, port):
        """Update the running config for the udp socket server

        :param key: The hmac key used to verify the UDP packets. String
        :param ip: The ip address the UDP server will read from
        :param port: The port the UDP server will read from
        :return: None
        """
        self.key = key
        for addrinfo in socket.getaddrinfo(ip, port, 0, socket.SOCK_DGRAM):
            ai_family = addrinfo[0]
            self.sockaddr = addrinfo[4]
            if self.sock is not None:
                self.sock.close()
            self.sock = socket.socket(ai_family, socket.SOCK_DGRAM)
            self.sock.settimeout(1)
            self.sock.bind(self.sockaddr)
            if cfg.CONF.health_manager.sock_rlimit > 0:
                rlimit = cfg.CONF.health_manager.sock_rlimit
                LOG.info("setting sock rlimit to %s", rlimit)
                self.sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF,
                                     rlimit)
            break  # just used the first addr getaddrinfo finds

        if self.sock is None:
            raise exceptions.NetworkConfig("unable to find suitable socket")

The class UDPStatusGetter in the octavia-health-manager service is responsible for receiving the heartbeats sent by the amphorae, then preparing the data in those heartbeats and persisting it to the database. From __init__() we can see that the amphorae communicate with the octavia-health-manager service over a UDP socket bound to (CONF.health_manager.bind_ip, CONF.health_manager.bind_port).

NOTE: The network topology between the amphorae and the octavia-health-manager service deserves some emphasis here.

  • If, when deploying Octavia, ext-net is used directly as Octavia's "lb-mgmt-net", then CONF.health_manager.bind_ip should be an IP address on the physical host, and the amphorae communicate with the octavia-health-manager service directly over the OpenStack Management Network. In this approach, however, the amphorae consume fixed IPs from ext-net, so it is not recommended for production environments.
  • If a separately created tenant network is used as lb-mgmt-net when deploying Octavia, then CONF.health_manager.bind_ip should be an address from the lb-mgmt-net IP pool, and connectivity between lb-mgmt-net and the OpenStack Management Network has to be arranged. Devstack does it as follows: a port of lb-mgmt-net is plugged into br-int, so that the amphorae on lb-mgmt-net can talk to the octavia-health-manager service running on the physical host through this port. In a production environment, this has to be configured by the network administrator according to the actual network layout.
neutron port-create --name octavia-health-manager-standalone-listen-port \
  --security-group <lb-health-mgr-sec-grp> \
  --device-owner Octavia:health-mgr \
  --binding:host_id=<hostname> lb-mgmt-net \
  --tenant-id <octavia service>

ovs-vsctl --may-exist add-port br-int o-hm0 \
  -- set Interface o-hm0 type=internal \
  -- set Interface o-hm0 external-ids:iface-status=active \
  -- set Interface o-hm0 external-ids:attached-mac=<Health Manager Listen Port MAC> \
  -- set Interface o-hm0 external-ids:iface-id=<Health Manager Listen Port ID>
  
# /etc/octavia/dhcp/dhclient.conf
request subnet-mask,broadcast-address,interface-mtu;
do-forward-updates false;

ip link set dev o-hm0 address <Health Manager Listen Port MAC>
dhclient -v o-hm0 -cf /etc/octavia/dhcp/dhclient.conf


o-hm0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet 192.168.0.4  netmask 255.255.255.0  broadcast 192.168.0.255
        inet6 fe80::f816:3eff:fef0:b9ee  prefixlen 64  scopeid 0x20<link>
        ether fa:16:3e:f0:b9:ee  txqueuelen 1000  (Ethernet)
        RX packets 1240893  bytes 278415460 (265.5 MiB)
        RX errors 0  dropped 45  overruns 0  frame 0
        TX packets 417078  bytes 75842972 (72.3 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

Back to the main topic, the implementation of UDPStatusGetter.check() is:

    def check(self):
        try:
            obj, srcaddr = self.dorecv()
        except socket.timeout:
            # Pass here as this is an expected cycling of the listen socket
            pass
        except exceptions.InvalidHMACException:
            # Pass here as the packet was dropped and logged already
            pass
        except Exception as e:
            LOG.warning('Health Manager experienced an exception processing a'
                        'heartbeat packet. Ignoring this packet. '
                        'Exception: %s', e)
        else:
            self.executor.submit(update_health, obj, srcaddr)
            self.executor.submit(update_stats, obj, srcaddr)

  • Call self.dorecv() to receive and unpack the data (sketched below)
  • Call self.executor.submit(update_health, obj, srcaddr) to persist the health data to table amphora_health
  • Call self.executor.submit(update_stats, obj, srcaddr) to persist the stats data to table listener_statistics
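
dorecv() is where the UDP packet is actually read and the HMAC-signed envelope is verified before the payload is handed to the executor jobs. A minimal sketch of what it does (condensed from the Rocky code; logging and error handling trimmed):

    def dorecv(self, *args, **kw):
        """Waits for a UDP heartbeat and returns the unwrapped payload."""
        (data, srcaddr) = self.sock.recvfrom(UDP_MAX_SIZE)
        LOG.debug('Received packet from %s', srcaddr)
        try:
            # Verify the HMAC and decode the heartbeat payload
            obj = status_message.unwrap_envelope(data, self.key)
        except Exception:
            # Forged or corrupted packets are dropped here
            raise exceptions.InvalidHMACException()
        obj['recv_time'] = time.time()
        return obj, srcaddr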

Next, let's look at how an amphora sends heartbeats.

# file: /opt/rocky/octavia/octavia/cmd/agent.py

def main():
    # comment out to improve logging
    service.prepare_service(sys.argv)

    gmr.TextGuruMeditation.setup_autorun(version)

    health_sender_proc = multiproc.Process(name='HM_sender',
                                           target=health_daemon.run_sender,
                                           args=(HM_SENDER_CMD_QUEUE,))
    health_sender_proc.daemon = True
    health_sender_proc.start()

    # Initiate server class
    server_instance = server.Server()

    bind_ip_port = utils.ip_port_str(CONF.haproxy_amphora.bind_host,
                                     CONF.haproxy_amphora.bind_port)
    options = {
        'bind': bind_ip_port,
        'workers': 1,
        'timeout': CONF.amphora_agent.agent_request_read_timeout,
        'certfile': CONF.amphora_agent.agent_server_cert,
        'ca_certs': CONF.amphora_agent.agent_server_ca,
        'cert_reqs': True,
        'preload_app': True,
        'accesslog': '/var/log/amphora-agent.log',
        'errorlog': '/var/log/amphora-agent.log',
        'loglevel': 'debug',
    }
    AmphoraAgent(server_instance.app, options).run()

When the amphora-agent service process starts, it loads health_daemon.run_sender, which is the implementation of the amphora sending heartbeat packets to the octavia-health-manager service.

# file: /opt/rocky/octavia/octavia/amphorae/backends/health_daemon/health_daemon.py

def run_sender(cmd_queue):
    LOG.info('Health Manager Sender starting.')
    sender = health_sender.UDPStatusSender()

    keepalived_cfg_path = util.keepalived_cfg_path()
    keepalived_pid_path = util.keepalived_pid_path()

    while True:

        try:
            # If the keepalived config file is present check
            # that it is running, otherwise don't send the health
            # heartbeat
            if os.path.isfile(keepalived_cfg_path):
                # Is there a pid file for keepalived?
                with open(keepalived_pid_path, 'r') as pid_file:
                    pid = int(pid_file.readline())
                os.kill(pid, 0)

            message = build_stats_message()
            sender.dosend(message)

        except IOError as e:
            # Missing PID file, skip health heartbeat
            if e.errno == errno.ENOENT:
                LOG.error('Missing keepalived PID file %s, skipping health '
                          'heartbeat.', keepalived_pid_path)
            else:
                LOG.error('Failed to check keepalived and haproxy status due '
                          'to exception %s, skipping health heartbeat.', e)
        except OSError as e:
            # Keepalived is not running, skip health heartbeat
            if e.errno == errno.ESRCH:
                LOG.error('Keepalived is configured but not running, '
                          'skipping health heartbeat.')
            else:
                LOG.error('Failed to check keepalived and haproxy status due '
                          'to exception %s, skipping health heartbeat.', e)
        except Exception as e:
            LOG.error('Failed to check keepalived and haproxy status due to '
                      'exception %s, skipping health heartbeat.', e)

        try:
            cmd = cmd_queue.get_nowait()
            if cmd == 'reload':
                LOG.info('Reloading configuration')
                CONF.reload_config_files()
            elif cmd == 'shutdown':
                LOG.info('Health Manager Sender shutting down.')
                break
        except queue.Empty:
            pass
        time.sleep(CONF.health_manager.heartbeat_interval)

The run_sender function calls build_stats_message() to build the heartbeat and then calls UDPStatusSender.dosend() to send the data. Note that heartbeats are not sent while the keepalived service process is not running properly; in other words, an amphora whose keepalived is unhealthy is treated as a failed amphora. The data is again sent over a UDP socket, with the destination set by CONF.health_manager.controller_ip_port_list.

# file: /etc/octavia/octavia.conf

[health_manager]
bind_port = 5555
bind_ip = 192.168.0.4
controller_ip_port_list = 192.168.0.4:5555
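
Both ends share CONF.health_manager.heartbeat_key, which the amphora uses to sign the heartbeat and the health manager uses to verify it (status_message.wrap_envelope() / unwrap_envelope()). Conceptually the envelope is just a compressed payload with an HMAC digest appended; the following is a minimal, self-contained illustration of the idea, not the actual status_message implementation:

import hashlib
import hmac
import json
import zlib

def wrap(msg, key):
    # Serialize and compress the heartbeat dict, then append an HMAC digest
    payload = zlib.compress(json.dumps(msg).encode('utf-8'))
    digest = hmac.new(key.encode('utf-8'), payload, hashlib.sha256).digest()
    return payload + digest

def unwrap(packet, key):
    # Split payload and digest; verify the HMAC before trusting the data
    payload, digest = packet[:-32], packet[-32:]
    expected = hmac.new(key.encode('utf-8'), payload, hashlib.sha256).digest()
    if not hmac.compare_digest(digest, expected):
        raise ValueError('HMAC verification failed')
    return json.loads(zlib.decompress(payload).decode('utf-8'))

# e.g. unwrap(wrap({'id': 'amp-uuid', 'listeners': {}}, 'key'), 'key')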

In short, octavia-health-manager and amphora-agent implement a periodic heartbeat protocol between them to monitor the health status of the amphorae.
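
The cadence of this heartbeat protocol is controlled by a few more [health_manager] options; the values below are only illustrative (close to the upstream defaults):

# file: /etc/octavia/octavia.conf

[health_manager]
# amphora-agent side: how often run_sender() emits a heartbeat
heartbeat_interval = 10
# shared secret used to sign and verify the heartbeat packets
heartbeat_key = insecure
# controller side: an amphora is stale once its last_update is older
# than this many seconds (used by get_stale_amphora() below)
heartbeat_timeout = 60
# controller side: how often HealthManager.health_check() polls for stale amphorae
health_check_interval = 3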

Failover

The failover mechanism is periodically monitored and triggered by health_manager.HealthManager.health_check().

The health_check method periodically fetches so-called stale amphora records from table amphora_health, i.e. amphorae that have not reported a heartbeat within the timeout and are therefore considered failed:

# file: /opt/rocky/octavia/octavia/db/repositories.py

    def get_stale_amphora(self, session):
        """Retrieves a stale amphora from the health manager database.

        :param session: A Sql Alchemy database session.
        :returns: [octavia.common.data_model]
        """

        timeout = CONF.health_manager.heartbeat_timeout
        expired_time = datetime.datetime.utcnow() - datetime.timedelta(
            seconds=timeout)

        amp = session.query(self.model_class).with_for_update().filter_by(
            busy=False).filter(
            self.model_class.last_update < expired_time).first()

        if amp is None:
            return None

        amp.busy = True

        return amp.to_data_model()
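
health_check() simply polls get_stale_amphora() on a fixed interval and, when it finds one, kicks off the failover flow. Conceptually the trigger loop looks like the simplified sketch below (illustrative only; the real HealthManager.health_check() additionally manages DB sessions, row locking, a thread pool and graceful shutdown):

import time

def health_check_loop(get_stale_amphora, failover_amphora, interval=3):
    """Simplified failover trigger loop (illustrative, not the real code).

    get_stale_amphora: callable returning a stale amphora (or None)
    failover_amphora:  callable running the failover flow for one amphora
    interval:          polling period, cf. CONF.health_manager.health_check_interval
    """
    while True:
        amp = get_stale_amphora()
        if amp is not None:
            # A heartbeat is overdue: start the failover taskflow for this amphora
            failover_amphora(amp)
        time.sleep(interval)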

If a stale amphora exists and the loadbalancer status is not PENDING_UPDATE, the amphora failover process starts. The taskflow for failing over an amphora is self._amphora_flows.get_failover_flow; its UML is shown below:

(Figure: UML diagram of the get_failover_flow taskflow)

Clearly, the whole failover_flow is divided into two major parts: delete the old amphora and get a new amphora (see the taskflow sketch after the task list below).

  • delete old amphora
    • MarkAmphoraPendingDeleteInDB
    • MarkAmphoraHealthBusy
    • ComputeDelete: delete the amphora instance
    • WaitForPortDetach: detach the port(s) from the amphora
    • MarkAmphoraDeletedInDB

NOTE: If the failed amphora is a free amphora, it is simply deleted.

  • get a new amphora
    • get_amphora_for_lb_subflow: obtain an available free amphora
    • UpdateAmpFailoverDetails: copy the old amphora's information (table amphora) over to the new amphora
    • ReloadLoadBalancer & ReloadAmphora: fetch the loadbalancer and amphora records from the database and pass them into the flow as stores
    • GetAmphoraeNetworkConfigs & GetListenersFromLoadbalancer & GetAmphoraeFromLoadbalancer: obtain the listeners, amphorae and their network information, passed into the flow as stores, in preparation for rebuilding the amphora's network model
    • PlugVIPPort: plug the keepalived VIP NIC for the amphora
    • AmphoraPostVIPPlug: move the amphora's VIP NIC into the network namespace
    • update_amps_subflow\AmpListenersUpdate: update the amphora's haproxy configuration files according to the listener data; this flow is unordered, so multiple listeners are handled concurrently
    • CalculateAmphoraDelta: compute the delta between the NICs the amphora needs and the NICs it already has
    • HandleNetworkDelta: add or remove NICs according to that delta
    • AmphoraePostNetworkPlug: add a port connected to the subnet where the members reside
    • ReloadLoadBalancer
    • MarkAmphoraMasterInDB
    • AmphoraUpdateVRRPInterface: look up and update the VRRP interface name (field vrrp_interface) in table amphora according to the amphora's role
    • CreateVRRPGroupForLB: update the group of the loadbalancer's master/backup amphorae according to the amphora's role
    • AmphoraVRRPUpdate: update the keepalived VRRP configuration according to the amphora's role
    • AmphoraVRRPStart: start the keepalived service process
    • ListenersStart: start the haproxy service process
    • DisableAmphoraHealthMonitoring: delete the corresponding amphora_health record from the database

Most of the Tasks above have already been introduced in earlier posts, so the list only briefly describes what the key Tasks do.
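
To make the two-phase structure concrete, a flow of this shape could be assembled with taskflow roughly as follows. This is a heavily simplified sketch: the task names mirror the real ones but the bodies are placeholders, and it is not the actual get_failover_flow:

from taskflow import task
from taskflow.patterns import linear_flow


class MarkAmphoraPendingDeleteInDB(task.Task):
    def execute(self, amphora_id):
        print('marking %s PENDING_DELETE' % amphora_id)


class ComputeDelete(task.Task):
    def execute(self, amphora_id):
        print('deleting compute instance of %s' % amphora_id)


def get_failover_flow_sketch():
    # Phase 1: tear down the failed amphora
    delete_flow = linear_flow.Flow('delete-old-amphora')
    delete_flow.add(MarkAmphoraPendingDeleteInDB('mark-pending-delete'),
                    ComputeDelete('compute-delete'))

    # Phase 2: obtain and wire up a replacement amphora
    # (get_amphora_for_lb_subflow, UpdateAmpFailoverDetails, ... would go here)
    failover_flow = linear_flow.Flow('failover-amphora')
    failover_flow.add(delete_flow)
    return failover_flow

# Usage: from taskflow import engines
#        engines.run(get_failover_flow_sketch(), store={'amphora_id': 'amp-uuid'})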

Finally, a brief summary of the idea behind amphora failover: first delete the failed old amphora, then obtain an available new amphora, and migrate the old amphora's associated data (e.g. database records) and objects (e.g. the network model) over to the new one.

NOTE:

It seems intuitive to boot an amphora prior to deleting the old amphora, however this is a complicated issue. If the target host (due to anti-affinity) is resource constrained, this will fail where a post-delete will succeed. Since this is async with the API it would result in the LB ending in ERROR though the amps are still alive.

Consider in the future making this a complicated try-on-failure-retry flow, or move upgrade failovers to be synchronous with the API.

For now spares pool and act/stdby will mitigate most of this delay.

Although a failover is just "delete the old amphora, then get a new one", the actual process is complicated. For example, after the old amphora has been deleted, creating the new amphora may still fail due to resource constraints; or, because the API calls are asynchronous, the new amphora may be created successfully while the loadbalancer has already ended up in ERROR. For the asynchronous-API problem, switching to synchronous APIs may be considered in the future; for now, the spares pool (and active/standby topology) is relied upon to mitigate most of the delay caused by asynchronous creation.

Failover test

Power off the MASTER amphora; the octavia-health-manager service then triggers the amphora failover.

Nov 22 11:22:31 control01 octavia-health-manager[29147]: INFO octavia.controller.healthmanager.health_manager [-] Stale amphora's id is: cd444019-ce8f-4f89-be6b-0edf76f41b77
Nov 22 11:22:31 control01 octavia-health-manager[29147]: INFO octavia.controller.healthmanager.health_manager [-] Waiting for 1 failovers to finish

old:

| 2ddc4ba5-b829-4962-93d8-562de91f1dab | amphora-4ff5d6fe-854c-4022-8194-0c6801a7478b | ACTIVE | lb-mgmt-net=192.168.0.23                                                    | amphora-x64-haproxy      | m1.amphora |
| b237b2b8-afe4-407b-83f2-e2e60361fa07 | amphora-bcff6f9e-4114-4d43-a403-573f1d97d27e | ACTIVE | lb-mgmt-net=192.168.0.11                                                    | amphora-x64-haproxy      | m1.amphora |
| 46eccf47-be10-47ec-89b2-0de44ea3caec | amphora-cd444019-ce8f-4f89-be6b-0edf76f41b77 | ACTIVE | lb-mgmt-net=192.168.0.9; web-server-net=192.168.1.3; lb-vip-net=172.16.1.3  | amphora-x64-haproxy      | m1.amphora |
| bc043b23-d481-45c4-9410-f7b349987c98 | amphora-a1c1ba86-6f99-4f60-b469-a4a29d7384c5 | ACTIVE | lb-mgmt-net=192.168.0.3; web-server-net=192.168.1.12; lb-vip-net=172.16.1.7 | amphora-x64-haproxy      | m1.amphora |

new:

| 712ff785-c082-4b53-994c-591d1ec0bf7b | amphora-caa6ba0f-1a68-4f22-9be9-8521695ac4f4 | ACTIVE | lb-mgmt-net=192.168.0.13                                                    | amphora-x64-haproxy      | m1.amphora |
| 2ddc4ba5-b829-4962-93d8-562de91f1dab | amphora-4ff5d6fe-854c-4022-8194-0c6801a7478b | ACTIVE | lb-mgmt-net=192.168.0.23; web-server-net=192.168.1.4; lb-vip-net=172.16.1.3 | amphora-x64-haproxy      | m1.amphora |
| b237b2b8-afe4-407b-83f2-e2e60361fa07 | amphora-bcff6f9e-4114-4d43-a403-573f1d97d27e | ACTIVE | lb-mgmt-net=192.168.0.11                                                    | amphora-x64-haproxy      | m1.amphora |
| bc043b23-d481-45c4-9410-f7b349987c98 | amphora-a1c1ba86-6f99-4f60-b469-a4a29d7384c5 | ACTIVE | lb-mgmt-net=192.168.0.3; web-server-net=192.168.1.12; lb-vip-net=172.16.1.7 | amphora-x64-haproxy      | m1.amphora |

new amphora haproxy config:

# Configuration for loadbalancer 01197be7-98d5-440d-a846-cd70f52dc503
global
    daemon
    user nobody
    log /dev/log local0
    log /dev/log local1 notice
    stats socket /var/lib/octavia/1385d3c4-615e-4a92-aea1-c4fa51a75557.sock mode 0666 level user
    maxconn 1000000
    external-check

defaults
    log global
    retries 3
    option redispatch

peers 1385d3c4615e4a92aea1c4fa51a75557_peers
    peer 3dVescsRZ-RdRBfYVLW6snVI9gI 172.16.1.3:1025
    peer l_Ustq0qE-h-_Q1dlXLXBAiWR8U 172.16.1.7:1025


frontend 1385d3c4-615e-4a92-aea1-c4fa51a75557
    option httplog
    maxconn 1000000
    bind 172.16.1.10:8080
    mode http
        acl 8d9b8b1e-83d7-44ca-a5b4-0103d5f90cb9 req.hdr(host) -i -m beg server
    use_backend 8196f752-a367-4fb4-9194-37c7eab95714 if 8d9b8b1e-83d7-44ca-a5b4-0103d5f90cb9
        acl c76f36bc-92c0-4f48-8d57-a13e3b1f09e1 req.hdr(host) -i -m beg server
    use_backend 822f78c3-ea2c-4770-bef0-e97f1ac2eba8 if c76f36bc-92c0-4f48-8d57-a13e3b1f09e1
    default_backend 8196f752-a367-4fb4-9194-37c7eab95714
    timeout client 50000

backend 8196f752-a367-4fb4-9194-37c7eab95714
    mode http
    balance roundrobin
    timeout check 10s
    option external-check
    external-check command /var/lib/octavia/ping-wrapper.sh
    fullconn 1000000
    option allbackups
    timeout connect 5000
    timeout server 50000
    server b6e464fd-dd1e-4775-90f2-4231444a0bbe 192.168.1.14:80 weight 1 check inter 5s fall 3 rise 3

backend 822f78c3-ea2c-4770-bef0-e97f1ac2eba8
    mode http
    balance roundrobin
    timeout check 10s
    option external-check
    external-check command /var/lib/octavia/ping-wrapper.sh
    fullconn 1000000
    option allbackups
    timeout connect 5000
    timeout server 50000
    server 7da6f176-36c6-479a-9d86-c892ecca6ae5 192.168.1.6:80 weight 1 check inter 5s fall 3 rise 3

new amphora keepalived config:

vrrp_script check_script {
  script /var/lib/octavia/vrrp/check_script.sh
  interval 5
  fall 2
  rise 2
}

vrrp_instance 01197be798d5440da846cd70f52dc503 {
  state MASTER
  interface eth1
  virtual_router_id 1
  priority 100
  nopreempt
  garp_master_refresh 5
  garp_master_refresh_repeat 2
  advert_int 1
  authentication {
    auth_type PASS
    auth_pass b76d77e
  }

  unicast_src_ip 172.16.1.3
  unicast_peer {
    172.16.1.7
  }

  virtual_ipaddress {
    172.16.1.10
  }
  track_script {
    check_script
  }
}

The haproxy and keepalived configurations on the new amphora are consistent with the original ones; the failover succeeded.