Troubleshooting a Strange Case of a VM That Could Not Attach a Volume
Symptom
Attaching a volume with nova volume-attach <server> <volume> returned no error, but checking the instance showed the volume was never attached.
Root Cause
Apparently, after a VM has been running for a long time (over a year), libvirt can no longer perform a live attach for it.
Workaround
Shut the VM down, attach the volume while it is powered off, then start the VM again.
Investigation
Since the nova command itself returned no error, the problem almost certainly lay on the compute node. Checking the logs there revealed the following exception:
2018-06-05 13:40:32.337 160589 DEBUG nova.virt.libvirt.config [req-392fd85e-1853-4c6c-8248-310ca6289895 d31b768e1dbf4a0dbf2571234b4e2f5a 65ea11db9ebf49c69d3c05bc38925617 - - -] Generated XML ('<disk type="network" device="disk">\n  <driver name="qemu" type="raw" cache="none"/>\n  <source protocol="rbd" name="volumes/volume-433ad82b-4d8d-4d5c-b1c0-d0c88e0c397b">\n    <host name="10.212.11.15" port="6789"/>\n    <host name="10.212.13.30" port="6789"/>\n    <host name="10.212.14.30" port="6789"/>\n  </source>\n  <auth username="cinder">\n    <secret type="ceph" uuid="90b0641c-a0d1-4103-ad8c-d580dd7da953"/>\n  </auth>\n  <target bus="virtio" dev="vdc"/>\n  <serial>433ad82b-4d8d-4d5c-b1c0-d0c88e0c397b</serial>\n</disk>\n',) to_xml /usr/lib/python2.7/site-packages/nova/virt/libvirt/config.py:82
2018-06-05 13:40:32.386 160589 ERROR nova.virt.libvirt.driver [req-392fd85e-1853-4c6c-8248-310ca6289895 d31b768e1dbf4a0dbf2571234b4e2f5a 65ea11db9ebf49c69d3c05bc38925617 - - -] [instance: 211437a1-e4c4-40e0-ade1-b167d6251ced] Failed to attach volume at mountpoint: /dev/vdc
2018-06-05 13:40:32.386 160589 ERROR nova.virt.libvirt.driver [instance: 211437a1-e4c4-40e0-ade1-b167d6251ced] Traceback (most recent call last):
2018-06-05 13:40:32.386 160589 ERROR nova.virt.libvirt.driver [instance: 211437a1-e4c4-40e0-ade1-b167d6251ced]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 1121, in attach_volume
2018-06-05 13:40:32.386 160589 ERROR nova.virt.libvirt.driver [instance: 211437a1-e4c4-40e0-ade1-b167d6251ced]     guest.attach_device(conf, persistent=True, live=live)
2018-06-05 13:40:32.386 160589 ERROR nova.virt.libvirt.driver [instance: 211437a1-e4c4-40e0-ade1-b167d6251ced]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/guest.py", line 235, in attach_device
2018-06-05 13:40:32.386 160589 ERROR nova.virt.libvirt.driver [instance: 211437a1-e4c4-40e0-ade1-b167d6251ced]     self._domain.attachDeviceFlags(conf.to_xml(), flags=flags)
2018-06-05 13:40:32.386 160589 ERROR nova.virt.libvirt.driver [instance: 211437a1-e4c4-40e0-ade1-b167d6251ced]   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 183, in doit
2018-06-05 13:40:32.386 160589 ERROR nova.virt.libvirt.driver [instance: 211437a1-e4c4-40e0-ade1-b167d6251ced]     result = proxy_call(self._autowrap, f, *args, **kwargs)
2018-06-05 13:40:32.386 160589 ERROR nova.virt.libvirt.driver [instance: 211437a1-e4c4-40e0-ade1-b167d6251ced]   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 141, in proxy_call
2018-06-05 13:40:32.386 160589 ERROR nova.virt.libvirt.driver [instance: 211437a1-e4c4-40e0-ade1-b167d6251ced]     rv = execute(f, *args, **kwargs)
2018-06-05 13:40:32.386 160589 ERROR nova.virt.libvirt.driver [instance: 211437a1-e4c4-40e0-ade1-b167d6251ced]   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 122, in execute
2018-06-05 13:40:32.386 160589 ERROR nova.virt.libvirt.driver [instance: 211437a1-e4c4-40e0-ade1-b167d6251ced]     six.reraise(c, e, tb)
2018-06-05 13:40:32.386 160589 ERROR nova.virt.libvirt.driver [instance: 211437a1-e4c4-40e0-ade1-b167d6251ced]   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 80, in tworker
2018-06-05 13:40:32.386 160589 ERROR nova.virt.libvirt.driver [instance: 211437a1-e4c4-40e0-ade1-b167d6251ced]     rv = meth(*args, **kwargs)
2018-06-05 13:40:32.386 160589 ERROR nova.virt.libvirt.driver [instance: 211437a1-e4c4-40e0-ade1-b167d6251ced]   File "/usr/lib64/python2.7/site-packages/libvirt.py", line 554, in attachDeviceFlags
2018-06-05 13:40:32.386 160589 ERROR nova.virt.libvirt.driver [instance: 211437a1-e4c4-40e0-ade1-b167d6251ced]     if ret == -1: raise libvirtError('virDomainAttachDeviceFlags() failed', dom=self)
2018-06-05 13:40:32.386 160589 ERROR nova.virt.libvirt.driver [instance: 211437a1-e4c4-40e0-ade1-b167d6251ced] libvirtError: internal error: unable to execute QEMU command '__com.redhat_drive_add': Device 'drive-virtio-disk2' could not be initialized
2018-06-05 13:40:32.386 160589 ERROR nova.virt.libvirt.driver [instance: 211437a1-e4c4-40e0-ade1-b167d6251ced]
The volume is stored in Ceph, and the generated XML in the log describes the volume correctly. The failure occurs when calling the libvirt attach API, which raises libvirtError: internal error: unable to execute QEMU command '__com.redhat_drive_add': Device 'drive-virtio-disk2' could not be initialized.
Checking the qemu log /var/log/libvirt/qemu/instance-0000017b.log turned up this message:
2018-06-05T13:40:34.471447Z error reading header from volume-433ad82b-4d8d-4d5c-b1c0-d0c88e0c397b
It looked as if the RBD image in Ceph could not be recognized. To dig deeper, enable Ceph client debug logging by adding the following to /etc/ceph/ceph.conf on the compute node:
[client]
debug rbd = 20
log file = /var/log/ceph-client.log
Note that /var/log/ceph-client.log must be writable by the libvirt user.
Run the nova volume-attach command again and check the Ceph client log:
2018-06-05 13:40:32.349046 7f99f5355c80 20 librbd: open_image: ictx = 0x7f99f785bc00 name = 'volume-433ad82b-4d8d-4d5c-b1c0-d0c88e0c397b' id = '' snap_name = ''
2018-06-05 13:40:32.353777 7f99f5355c80 20 librbd: detect format of volume-433ad82b-4d8d-4d5c-b1c0-d0c88e0c397b : new
2018-06-05 13:40:32.362089 7f99f5355c80 10 librbd::ImageCtx: init_layout stripe_unit 4194304 stripe_count 1 object_size 4194304 prefix rbd_data.85160113150e34 format rbd_data.85160113150e34.%016llx
2018-06-05 13:40:32.375950 7f99f5355c80 20 librbd: ictx_refresh 0x7f99f785bc00
2018-06-05 13:40:32.377678 7f99f5355c80 -1 librbd: Image uses unsupported features: 60
2018-06-05 13:40:32.377685 7f99f5355c80 20 librbd: close_image 0x7f99f785bc00
2018-06-05 13:40:32.377688 7f99f5355c80 20 librbd: flush 0x7f99f785bc00
2018-06-05 13:40:32.377690 7f99f5355c80 20 librbd: ictx_check 0x7f99f785bc00
Note the error: Image uses unsupported features: 60.
The image features supported by Ceph RBD are:
- layering: layering support (numeric value: 1)
- striping: striping v2 support (numeric value: 2)
- exclusive-lock: exclusive locking support (numeric value: 4)
- object-map: object map support, requires exclusive-lock (numeric value: 8)
- fast-diff: fast diff calculations, requires object-map (numeric value: 16)
- deep-flatten: snapshot flatten support (numeric value: 32)
- journaling: journaled IO support, requires exclusive-lock (numeric value: 64)
So the 60 in the error decodes to:
60 = 4 + 8 + 16 + 32 = exclusive-lock, object-map, fast-diff, deep-flatten
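The decoding above can be sketched in a few lines of Python. This is just an illustration using the feature names and bit values from the list above, not Ceph's own code:

```python
# RBD feature flag bits, per the list above.
RBD_FEATURES = {
    1: "layering",
    2: "striping",
    4: "exclusive-lock",
    8: "object-map",
    16: "fast-diff",
    32: "deep-flatten",
    64: "journaling",
}

def decode_features(mask):
    """Return the feature names encoded in an RBD feature bitmask."""
    return [name for bit, name in sorted(RBD_FEATURES.items()) if mask & bit]

print(decode_features(60))
# ['exclusive-lock', 'object-map', 'fast-diff', 'deep-flatten']
```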
These features are enabled by default for format 2 RBD images, but librbd on the compute node reports them as unsupported; according to upstream, they are only supported on kernels newer than 3.11.
To avoid this for new volumes, set rbd_default_features = 1 in the Ceph config on the cinder-volume node, so that only layering is enabled. For existing RBD images, disable the unsupported features with:
# rbd feature disable volumes/volume-433ad82b-4d8d-4d5c-b1c0-d0c88e0c397b exclusive-lock object-map fast-diff deep-flatten
Testing confirmed that with these features disabled, the volume could indeed be attached.
But something still didn't add up. Attaching volumes had never hit this error before, and checking the volumes already attached to running VMs showed all of these features enabled.
Testing the same volume against different VMs on the same compute node, some could attach it while others hit the error above. The failing VMs shared these traits:
1. Their image is a snapshot of another VM, and that image has since been deleted
2. Their flavor includes a swap partition, and that flavor has since been deleted
3. They have been running for more than a year
Points 1 and 2 reveal some very sloppy usage; this is an old cluster, and the mess was inherited from colleagues, so there was nothing to do but quietly take it over. Fortunately, deletions in the OpenStack database are soft deletes, so the deleted flavor record could be restored by editing the database by hand. The image was much trickier: beyond the database record, there is the image file itself on the filesystem, and that cannot be undeleted. The only option was to restore it from the copy cached on the compute node.
Inspect the VM on the compute node that was built from this image:
# qemu-img info /var/lib/nova/instances/211437a1-e4c4-40e0-ade1-b167d6251ced/disk
image: /var/lib/nova/instances/211437a1-e4c4-40e0-ade1-b167d6251ced/disk
file format: qcow2
virtual size: 60G (64424509440 bytes)
disk size: 37G
cluster_size: 65536
backing file: /var/lib/nova/instances/_base/141ce390743531b7da2db335d2159fa550f460c8
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false
The backing file shown here is the original image converted to raw format. Convert it back to qcow2:
# cd /var/lib/nova/instances/_base
# qemu-img convert -f raw -O qcow2 141ce390743531b7da2db335d2159fa550f460c8 141ce390743531b7da2db335d2159fa550f460c8.qcow2
Then copy this qcow2 image to the directory where glance stores images on the glance node, rename it to the image's UUID, and finally fix up the database record:
MariaDB [glance]> update images set status='active', deleted_at=NULL, deleted=0, is_public=0, checksum='0a5a3e84558e8470946acb86a839dc02' where id='b3986e99-1988-43bb-b47c-0b34438bc189';
Note that the image's checksum must be updated as well; obtain it with md5sum IMAGE_FILE:
# md5sum /mfsclient/ucscontroller/glance/images/b3986e99-1988-43bb-b47c-0b34438bc189
0a5a3e84558e8470946acb86a839dc02  /mfsclient/ucscontroller/glance/images/b3986e99-1988-43bb-b47c-0b34438bc189
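The same checksum can also be computed in Python, streaming the file in chunks so a multi-gigabyte image never has to fit in memory. A minimal sketch (the path is whatever image file you are restoring, not a fixed location):

```python
import hashlib

def image_md5(path, chunk_size=8 * 1024 * 1024):
    """Compute the md5 hex digest of an image file, reading it in chunks."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()
```

The resulting hex digest is what goes into the checksum column of the images table.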
It also turned out that glance's image storage directory had been changed at some point, so the image_locations table needed updating too:
MariaDB [glance]> update image_locations set value='file:///mfsclient/ucscontroller/glance/images/b3986e99-1988-43bb-b47c-0b34438bc189', deleted_at=NULL, deleted=0, status='active' where image_id='b3986e99-1988-43bb-b47c-0b34438bc189';
For a normally uploaded image, that would complete the recovery. But this image was a snapshot of a VM, so it cannot truly be restored; all this really produces is a new image that happens to share the old image's UUID.
Creating a VM with the restored flavor and image and then attaching the test volume used earlier worked without any error. Since the image was not fully restored, an image-related cause could not be ruled out entirely, but the flavor was definitely fine. That left only point 3: could the VM simply have been running too long?
Looking back at the nova-compute error log, the failure occurs in the call to libvirt's attachDeviceFlags, which is passed a flags argument. The calling code lives in /usr/lib/python2.7/site-packages/nova/virt/libvirt/guest.py:
class Guest(object):
    ...
    def attach_device(self, conf, persistent=False, live=False):
        """Attaches device to the guest.

        :param conf: A LibvirtConfigObject of the device to attach
        :param persistent: A bool to indicate whether the change is
                           persistent or not
        :param live: A bool to indicate whether it affect the guest
                     in running state
        """
        flags = persistent and libvirt.VIR_DOMAIN_AFFECT_CONFIG or 0
        flags |= live and libvirt.VIR_DOMAIN_AFFECT_LIVE or 0
        self._domain.attachDeviceFlags(conf.to_xml(), flags=flags)
flags is determined by the persistent and live arguments passed in from one level up, in /usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py:
class LibvirtDriver(driver.ComputeDriver):
    ...
    def attach_volume(self, context, connection_info, instance, mountpoint,
                      disk_bus=None, device_type=None, encryption=None):
        ...
        try:
            state = guest.get_power_state(self._host)
            live = state in (power_state.RUNNING, power_state.PAUSED)

            if encryption:
                encryptor = self._get_volume_encryptor(connection_info,
                                                       encryption)
                encryptor.attach_volume(context, **encryption)

            guest.attach_device(conf, persistent=True, live=live)
        except Exception as ex:
            LOG.exception(_LE('Failed to attach volume at mountpoint: %s'),
                          mountpoint, instance=instance)
            if isinstance(ex, libvirt.libvirtError):
                errcode = ex.get_error_code()
                if errcode == libvirt.VIR_ERR_OPERATION_FAILED:
                    self._disconnect_volume(connection_info, disk_dev)
                    raise exception.DeviceIsBusy(device=disk_dev)

            with excutils.save_and_reraise_exception():
                self._disconnect_volume(connection_info, disk_dev)
persistent is hard-coded to True, while live depends on the VM's state: True for RUNNING or PAUSED, False otherwise. Since we were attaching a volume to a running VM, live was True.
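The flag computation can be reproduced standalone. The numeric values below match libvirt's virDomainModificationImpact enum (VIR_DOMAIN_AFFECT_LIVE = 1, VIR_DOMAIN_AFFECT_CONFIG = 2); this sketch hard-codes them instead of importing the real bindings:

```python
# libvirt constants from the virDomainModificationImpact enum.
VIR_DOMAIN_AFFECT_LIVE = 1    # affect the running domain
VIR_DOMAIN_AFFECT_CONFIG = 2  # affect the persistent domain definition

def attach_flags(persistent, live):
    """Mirror the flags logic in Guest.attach_device()."""
    flags = persistent and VIR_DOMAIN_AFFECT_CONFIG or 0
    flags |= live and VIR_DOMAIN_AFFECT_LIVE or 0
    return flags

# Running VM: persistent=True, live=True -> CONFIG | LIVE = 3
print(attach_flags(True, True))
# Stopped VM: persistent=True, live=False -> CONFIG only = 2
print(attach_flags(True, False))
```

So for a running VM libvirt is asked to modify both the persistent definition and the live domain, whereas for a stopped VM only the persistent definition is touched, which is exactly the difference the workaround exploits.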
So libvirt performs the attach differently for stopped and running VMs. Could it be that after running for so long, libvirt could no longer carry out a live attach for this VM? The simplest test was to restart the VM and attach again.
After shutting the VM down, attaching the volume in the powered-off state, and starting it back up, the block device was visible inside the guest. We did not dig further into libvirt's internal attach logic.
References
Need better documentation to describe RBD image features
rbd cannot be mapped (rbd feature disable)