
Troubleshooting a strange failure to attach a volume to a VM


Symptom

Attaching a volume with nova volume-attach <server> <volume> returned no error, but the volume never showed up on the VM.

Root cause

Suspected cause: after a VM has been running for a long time (over a year), libvirt fails to perform the live attach operation.

Workaround

Shut the VM down, attach the volume while it is stopped, then start it again.

Troubleshooting process

Since the nova command itself reported no error, the problem was almost certainly on the compute node. Checking the compute node's logs turned up the following exception:

2018-06-05 13:40:32.337 160589 DEBUG nova.virt.libvirt.config [req-392fd85e-1853-4c6c-8248-310ca6289895 d31b768e1dbf4a0dbf2571234b4e2f5a 65ea11db9ebf49c69d
3c05bc38925617 - - -] Generated XML ('<disk type="network" device="disk">\n  <driver name="qemu" type="raw" cache="none"/>\n  <source protocol="rbd" name="volumes/volume-433ad82b-4d8d-4d5c-b1c0-d0c88e0c397b">\n    <host name="10.212.11.15" port="6789"/>\n    <host name="10.212.13.30" port="6789"/>\n    <host name="10.212.14.30" port="6789"/>\n  </source>\n  <auth username="cinder">\n    <secret type="ceph" uuid="90b0641c-a0d1-4103-ad8c-d580dd7da953"/>\n  </auth>\n  <target bus="virtio" dev="vdc"/>\n  <serial>433ad82b-4d8d-4d5c-b1c0-d0c88e0c397b</serial>\n</disk>\n',)  to_xml /usr/lib/python2.7/site-packages/nova/virt/libvirt/config.py:82
2018-06-05 13:40:32.386 160589 ERROR nova.virt.libvirt.driver [req-392fd85e-1853-4c6c-8248-310ca6289895 d31b768e1dbf4a0dbf2571234b4e2f5a 65ea11db9ebf49c69d
3c05bc38925617 - - -] [instance: 211437a1-e4c4-40e0-ade1-b167d6251ced] Failed to attach volume at mountpoint: /dev/vdc
2018-06-05 13:40:32.386 160589 ERROR nova.virt.libvirt.driver [instance: 211437a1-e4c4-40e0-ade1-b167d6251ced] Traceback (most recent call last):
2018-06-05 13:40:32.386 160589 ERROR nova.virt.libvirt.driver [instance: 211437a1-e4c4-40e0-ade1-b167d6251ced]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 1121, in attach_volume
2018-06-05 13:40:32.386 160589 ERROR nova.virt.libvirt.driver [instance: 211437a1-e4c4-40e0-ade1-b167d6251ced]     guest.attach_device(conf, persistent=True, live=live)
2018-06-05 13:40:32.386 160589 ERROR nova.virt.libvirt.driver [instance: 211437a1-e4c4-40e0-ade1-b167d6251ced]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/guest.py", line 235, in attach_device
2018-06-05 13:40:32.386 160589 ERROR nova.virt.libvirt.driver [instance: 211437a1-e4c4-40e0-ade1-b167d6251ced]     self._domain.attachDeviceFlags(conf.to_xml(), flags=flags)
2018-06-05 13:40:32.386 160589 ERROR nova.virt.libvirt.driver [instance: 211437a1-e4c4-40e0-ade1-b167d6251ced]   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 183, in doit
2018-06-05 13:40:32.386 160589 ERROR nova.virt.libvirt.driver [instance: 211437a1-e4c4-40e0-ade1-b167d6251ced]     result = proxy_call(self._autowrap, f, *args, **kwargs)
2018-06-05 13:40:32.386 160589 ERROR nova.virt.libvirt.driver [instance: 211437a1-e4c4-40e0-ade1-b167d6251ced]   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 141, in proxy_call
2018-06-05 13:40:32.386 160589 ERROR nova.virt.libvirt.driver [instance: 211437a1-e4c4-40e0-ade1-b167d6251ced]     rv = execute(f, *args, **kwargs)
2018-06-05 13:40:32.386 160589 ERROR nova.virt.libvirt.driver [instance: 211437a1-e4c4-40e0-ade1-b167d6251ced]   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 122, in execute
2018-06-05 13:40:32.386 160589 ERROR nova.virt.libvirt.driver [instance: 211437a1-e4c4-40e0-ade1-b167d6251ced]     six.reraise(c, e, tb)
2018-06-05 13:40:32.386 160589 ERROR nova.virt.libvirt.driver [instance: 211437a1-e4c4-40e0-ade1-b167d6251ced]   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 80, in tworker
2018-06-05 13:40:32.386 160589 ERROR nova.virt.libvirt.driver [instance: 211437a1-e4c4-40e0-ade1-b167d6251ced]     rv = meth(*args, **kwargs)
2018-06-05 13:40:32.386 160589 ERROR nova.virt.libvirt.driver [instance: 211437a1-e4c4-40e0-ade1-b167d6251ced]   File "/usr/lib64/python2.7/site-packages/libvirt.py", line 554, in attachDeviceFlags
2018-06-05 13:40:32.386 160589 ERROR nova.virt.libvirt.driver [instance: 211437a1-e4c4-40e0-ade1-b167d6251ced]     if ret == -1: raise libvirtError ('virDomainAttachDeviceFlags() failed', dom=self)
2018-06-05 13:40:32.386 160589 ERROR nova.virt.libvirt.driver [instance: 211437a1-e4c4-40e0-ade1-b167d6251ced] libvirtError: internal error: unable to execute QEMU command '__com.redhat_drive_add': Device 'drive-virtio-disk2' could not be initialized
2018-06-05 13:40:32.386 160589 ERROR nova.virt.libvirt.driver [instance: 211437a1-e4c4-40e0-ade1-b167d6251ced]

The volume is stored in Ceph. From the log, the volume information in the generated XML looks fine; the failure happens when calling the libvirt attach API, with the error: libvirtError: internal error: unable to execute QEMU command '__com.redhat_drive_add': Device 'drive-virtio-disk2' could not be initialized.

The qemu log /var/log/libvirt/qemu/instance-0000017b.log contained this:

2018-06-05T13:40:34.471447Z error reading header from volume-433ad82b-4d8d-4d5c-b1c0-d0c88e0c397b

It looked like qemu could not recognize the RBD image in Ceph. To dig deeper, enable Ceph client debug logging by adding the following to /etc/ceph/ceph.conf on the compute node:

[client]
debug rbd = 20
log file = /var/log/ceph-client.log

Note that /var/log/ceph-client.log must be writable by the libvirt user.

Run nova volume-attach again and check the Ceph client log:

2018-06-05 13:40:32.349046 7f99f5355c80 20 librbd: open_image: ictx = 0x7f99f785bc00 name = 'volume-433ad82b-4d8d-4d5c-b1c0-d0c88e0c397b' id = '' snap_name = ''
2018-06-05 13:40:32.353777 7f99f5355c80 20 librbd: detect format of volume-433ad82b-4d8d-4d5c-b1c0-d0c88e0c397b : new
2018-06-05 13:40:32.362089 7f99f5355c80 10 librbd::ImageCtx: init_layout stripe_unit 4194304 stripe_count 1 object_size 4194304 prefix rbd_data.85160113150e34 format rbd_data.85160113150e34.%016llx
2018-06-05 13:40:32.375950 7f99f5355c80 20 librbd: ictx_refresh 0x7f99f785bc00
2018-06-05 13:40:32.377678 7f99f5355c80 -1 librbd: Image uses unsupported features: 60
2018-06-05 13:40:32.377685 7f99f5355c80 20 librbd: close_image 0x7f99f785bc00
2018-06-05 13:40:32.377688 7f99f5355c80 20 librbd: flush 0x7f99f785bc00
2018-06-05 13:40:32.377690 7f99f5355c80 20 librbd: ictx_check 0x7f99f785bc00

Note the error: Image uses unsupported features: 60.

The features supported by Ceph RBD:

  • layering: layering support, numeric value: 1
  • striping: striping v2 support, numeric value: 2
  • exclusive-lock: exclusive locking support, numeric value: 4
  • object-map: object map support (requires exclusive-lock), numeric value: 8
  • fast-diff: fast diff calculations (requires object-map), numeric value: 16
  • deep-flatten: snapshot flatten support, numeric value: 32
  • journaling: journaled IO support (requires exclusive-lock), numeric value: 64

So the 60 in the error message decodes to:

60 = 4+8+16+32 = exclusive-lock, object-map, fast-diff, deep-flatten
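The decoding can be double-checked with a short Python snippet; the feature names and bit values are taken from the list above:

```python
# RBD feature bits, from the feature list above
RBD_FEATURES = {
    1: "layering",
    2: "striping",
    4: "exclusive-lock",
    8: "object-map",
    16: "fast-diff",
    32: "deep-flatten",
    64: "journaling",
}

def decode_features(mask):
    """Return the feature names whose bits are set in mask."""
    return [name for bit, name in sorted(RBD_FEATURES.items()) if mask & bit]

print(decode_features(60))
# ['exclusive-lock', 'object-map', 'fast-diff', 'deep-flatten']
```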

These features are enabled by default on format 2 RBD images, but the librbd on the compute node reports that it does not support them; according to the official documentation, a kernel of 3.11 or later is required.

You can set rbd_default_features = 1 in the Ceph config file on the cinder-volume node, so that only the layering feature is enabled on newly created volumes. For RBD images that already exist, disable the unsupported features with:

# rbd feature disable volumes/volume-433ad82b-4d8d-4d5c-b1c0-d0c88e0c397b exclusive-lock object-map fast-diff deep-flatten
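For new volumes, a sketch of what the corresponding fragment in the cinder-volume node's ceph.conf might look like (section placement can vary by deployment):

```ini
[client]
# Only enable layering (bit value 1) on newly created RBD images
rbd_default_features = 1
```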

After disabling these features, the volume could indeed be attached.

But something was still off. This error had never occurred when attaching volumes before, and the volumes already attached to running VMs all have these features enabled.

Attaching the same volume to different VMs on the same compute node, some VMs could attach it while others hit the error above. The failing VMs shared these traits:

1. Their image is a snapshot of another VM, and that image has since been deleted

2. Their flavor includes a swap partition, and that flavor has since been deleted

3. They have been running for more than a year

Points 1 and 2 show some very sloppy usage; this is an old cluster, a pit dug by former colleagues that I could only quietly inherit. Fortunately, deletions in the OpenStack database are soft deletes, so the deleted flavor record could be restored by editing the database by hand. The image was trickier: it is not just a database record but also an image file on the filesystem, which cannot be un-deleted. The only option was to rebuild it from the image cached on the compute node.

Inspect a VM on the compute node that uses this image:

# qemu-img info /var/lib/nova/instances/211437a1-e4c4-40e0-ade1-b167d6251ced/disk
image: /var/lib/nova/instances/211437a1-e4c4-40e0-ade1-b167d6251ced/disk
file format: qcow2
virtual size: 60G (64424509440 bytes)
disk size: 37G
cluster_size: 65536
backing file: /var/lib/nova/instances/_base/141ce390743531b7da2db335d2159fa550f460c8
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

Its backing file is the original image converted to raw format. Convert it back to qcow2:

# cd /var/lib/nova/instances/_base
# qemu-img convert -f raw -O qcow2 141ce390743531b7da2db335d2159fa550f460c8 141ce390743531b7da2db335d2159fa550f460c8.qcow2

Then copy the qcow2 image to the directory where the glance node stores images, rename it to the image's UUID, and finally fix up the database record:

MariaDB [glance]> update images set status='active', deleted_at=NULL, deleted=0, is_public=0, checksum='0a5a3e84558e8470946acb86a839dc02' where id='b3986e99-1988-43bb-b47c-0b34438bc189';

Note that the image's checksum must be updated to match; get it with md5sum IMAGE_FILE:

# md5sum /mfsclient/ucscontroller/glance/images/b3986e99-1988-43bb-b47c-0b34438bc189 
0a5a3e84558e8470946acb86a839dc02  /mfsclient/ucscontroller/glance/images/b3986e99-1988-43bb-b47c-0b34438bc189
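Glance stores the checksum as a plain MD5 of the image file, which is exactly what md5sum computes. A chunked Python equivalent (a sketch for illustration, not glance's actual code), reading in blocks so large images don't fill memory:

```python
import hashlib

def image_checksum(path, chunk_size=65536):
    """MD5 hex digest of a file, computed in chunks."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()
```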

I also found that glance's image storage directory had been changed at some point, so the image_locations table needs updating too:

MariaDB [glance]> update image_locations set value='file:///mfsclient/ucscontroller/glance/images/b3986e99-1988-43bb-b47c-0b34438bc189', deleted_at=NULL, deleted=0, status='active' where image_id='b3986e99-1988-43bb-b47c-0b34438bc189';

For a normally uploaded image this would complete the recovery. But this image was a snapshot of a VM, so it cannot truly be restored; all we end up with is a new image that happens to share the old image's UUID.

Create a VM with the restored flavor and image, then attach the test volume used earlier: no error, it attaches normally. Since the image is not a faithful restoration, an image problem cannot be entirely ruled out, but the flavor is definitely fine. That leaves point 3: could the VM simply have been running too long?

Going back to the nova-compute error log, the failure happens in the call to libvirt's attachDeviceFlags, which takes a flags argument. The calling code is in /usr/lib/python2.7/site-packages/nova/virt/libvirt/guest.py:

class Guest(object):
    ...
    def attach_device(self, conf, persistent=False, live=False):
        """Attaches device to the guest.

        :param conf: A LibvirtConfigObject of the device to attach
        :param persistent: A bool to indicate whether the change is
                           persistent or not
        :param live: A bool to indicate whether it affect the guest
                     in running state
        """
        flags = persistent and libvirt.VIR_DOMAIN_AFFECT_CONFIG or 0
        flags |= live and libvirt.VIR_DOMAIN_AFFECT_LIVE or 0
        self._domain.attachDeviceFlags(conf.to_xml(), flags=flags)

flags is determined by the persistent and live parameters passed in from one level up, in /usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py:

class LibvirtDriver(driver.ComputeDriver):
    ...
    def attach_volume(self, context, connection_info, instance, mountpoint,
                      disk_bus=None, device_type=None, encryption=None):
        ...
        try:
            state = guest.get_power_state(self._host)
            live = state in (power_state.RUNNING, power_state.PAUSED)

            if encryption:
                encryptor = self._get_volume_encryptor(connection_info,
                                                       encryption)
                encryptor.attach_volume(context, **encryption)

            guest.attach_device(conf, persistent=True, live=live)
        except Exception as ex:
            LOG.exception(_LE("Failed to attach volume at mountpoint: %s"),
                          mountpoint, instance=instance)
            if isinstance(ex, libvirt.libvirtError):
                errcode = ex.get_error_code()
                if errcode == libvirt.VIR_ERR_OPERATION_FAILED:
                    self._disconnect_volume(connection_info, disk_dev)
                    raise exception.DeviceIsBusy(device=disk_dev)

            with excutils.save_and_reraise_exception():
                self._disconnect_volume(connection_info, disk_dev)

persistent is hard-coded to True, and live depends on the VM's power state: True if the VM is RUNNING or PAUSED, False otherwise. We were attaching a volume to a running VM, so live was True.
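To make the resulting flags concrete, here is a minimal sketch of the computation in Guest.attach_device, inlining the libvirt constant values (VIR_DOMAIN_AFFECT_CONFIG = 2, VIR_DOMAIN_AFFECT_LIVE = 1) so no libvirt installation is needed:

```python
# libvirt flag constants (values from libvirt's public API)
VIR_DOMAIN_AFFECT_CONFIG = 2  # modify the persistent domain definition
VIR_DOMAIN_AFFECT_LIVE = 1    # modify the running domain

def attach_flags(persistent, live):
    """Reproduce the flags computation from nova's Guest.attach_device."""
    flags = persistent and VIR_DOMAIN_AFFECT_CONFIG or 0
    flags |= live and VIR_DOMAIN_AFFECT_LIVE or 0
    return flags

# Running VM: update both the config and the live domain (flags = 3)
print(attach_flags(persistent=True, live=True))   # 3
# Stopped VM: only the persistent definition is touched (flags = 2)
print(attach_flags(persistent=True, live=False))  # 2
```

So for a stopped VM only the persistent XML is modified, while for a running VM libvirt must also hot-plug the device into the live qemu process, which is the path that fails here.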

So libvirt handles attaching a volume differently for stopped and running VMs. Could it be that after running for so long, libvirt can no longer perform the attach in the live path? The simplest test is to restart the VM and try again.

Shut the VM down, attach the volume while it is stopped, then start it: the block device now shows up inside the VM. I did not dig further into libvirt's internal attach logic.

References

Need better documentation to describe RBD image features

rbd cannot be mapped (rbd feature disable)
