
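A Linux Boot Failure: Troubleshooting Notes

Background:

After a kernel upgrade on 2.6.32 systems, several machines failed to boot. Every failing machine used an SSD as its system disk. Once the BIOS handed off control, the screen went black and nothing was printed.

At first we suspected a damaged MBR, so we attached the boot disk to another server. The MBR turned out to be identical to the backup taken before the upgrade, and also identical to the MBR of machines that booted normally after the upgrade.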

 

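Our own investigation stalled at this point, so we asked 蒙恩, a colleague on the specialist OS team, to take over. The conclusions are recorded below:

The machines use GRUB as the boot loader, and the stage2 file could not be found at the sector location recorded in the MBR.

Process:

1. We brought the on-site boot.bak and mbr.bak back and built a test environment around them. The kernel would not boot there either. Because the virtual machine's BIOS prints milestone messages, we could confirm that the BIOS had already loaded the MBR.

2. We confirmed that the MBR was bad: the starting sector number of the stage2 file written into the MBR was wrong.

3. Tracing confirmed that the upgrade did not touch the MBR or the key boot-related files (stage2 and so on).

grub-install failed because of the way the device map file had been written on site: a device.map file like the one below was constructed, and then the command "grub-install /dev/sda" was run (sda is the system disk).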

# cat /boot/grub/device.map

(hd0)   /dev/disk/by-id/ata-INTEL_SSDSC2BB240G4_BTWL4020041Z240NGN

 

How it works:

=====

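Boot flow: MBR (/boot/grub/stage1) -> /boot/grub/stage2 -> vmlinux. The MBR loads stage2, and stage2 loads vmlinux.

The relationship between the MBR, /boot/grub/stage1, and /boot/grub/stage2 is as follows:

The stage1 binary has no way to understand a file system, so it can only read data through BIOS interrupts.

The stage1 binary is written into the MBR. stage1 contains several variables whose offsets within the binary are strictly fixed at build time. The most important of these is the starting sector number of stage2 on the boot partition. The MBR is therefore the stage1 file + a few variables patched by the installer + the partition table.

stage2 has built-in support for the ext family of file systems, so it can load vmlinux, grub.conf, and so on by reading the file system on the boot partition directly.

Basis for the conclusions above: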

Stage 1 and Stage 2 have embedded variables whose locations are well-defined, so that the installation can patch the binary file directly without recompilation of the stages.

   In Stage 1, these are defined:

`0x3E'
     The version number (not GRUB's, but the installation mechanism's).

`0x40'
     The boot drive. If it is 0xFF, use a drive passed by BIOS.

`0x41'
     The flag for if forcing LBA.

`0x42'
     The starting address of Stage 2.

`0x44'
     The first sector of Stage 2.

`0x48'
     The starting segment of Stage 2.

`0x1FE'
     The signature (`0xAA55').
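The offsets quoted above can be read straight out of a saved MBR image such as mbr.bak. What follows is a minimal Python sketch, not something from the original investigation. The offsets come from the manual excerpt; the field widths are an assumption inferred from the gaps between consecutive offsets, and the names STAGE1_FIELDS, parse_stage1, and diff_fields are made up for illustration.

```python
import struct

# Offsets are taken from the GRUB Legacy manual excerpt.
# Field widths (struct formats) are ASSUMED from the gaps between offsets.
STAGE1_FIELDS = [
    ("version", 0x3E, "<H"),
    ("boot_drive", 0x40, "<B"),
    ("force_lba", 0x41, "<B"),
    ("stage2_address", 0x42, "<H"),
    ("stage2_sector", 0x44, "<I"),
    ("stage2_segment", 0x48, "<H"),
    ("signature", 0x1FE, "<H"),
]


def parse_stage1(mbr: bytes) -> dict:
    """Return the embedded Stage 1 variables found in a 512-byte MBR image."""
    if len(mbr) < 512:
        raise ValueError("an MBR image must be at least 512 bytes")
    return {
        name: struct.unpack_from(fmt, mbr, offset)[0]
        for name, offset, fmt in STAGE1_FIELDS
    }


def diff_fields(mbr_a: bytes, mbr_b: bytes) -> dict:
    """Return {field: (value_in_a, value_in_b)} for every field that differs."""
    a, b = parse_stage1(mbr_a), parse_stage1(mbr_b)
    return {name: (a[name], b[name]) for name in a if a[name] != b[name]}
```

To use it, read the first 512 bytes of each image in binary mode and pass them to parse_stage1, or pass two images to diff_fields. An empty result from diff_fields only says the patched variables match; it says nothing about whether the recorded sector still holds stage2.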

We traced whether the upgrade patch invoked grub or opened the stage files. The results are below: nothing invoked the grub command (grub-install itself also calls grub to perform the installation).

# ./test.stap |grep -E 'stage|grub'

open===/boot/grub/grub.conf

open===/boot/grub/sedgzxf68

open===/usr/share/icons/hicolor/22x22/apps/nm-stage03-connecting10.png

open===/usr/share/icons/hicolor/22x22/apps/nm-stage01-connecting11.png

open===/usr/share/icons/hicolor/22x22/apps/nm-stage03-connecting08.png

open===/usr/share/icons/hicolor/22x22/apps/nm-stage02-connecting08.png

open===/usr/share/icons/hicolor/22x22/apps/nm-stage02-connecting01.png

open===/usr/share/icons/hicolor/22x22/apps/nm-stage02-connecting11.png

open===/usr/share/icons/hicolor/22x22/apps/nm-stage01-connecting10.png

open===/usr/share/icons/hicolor/22x22/apps/nm-stage01-connecting04.png

open===/usr/share/icons/hicolor/22x22/apps/nm-stage02-connecting09.png

open===/usr/share/icons/hicolor/22x22/apps/nm-stage01-connecting01.png

open===/usr/share/icons/hicolor/22x22/apps/nm-stage02-connecting03.png

open===/usr/share/icons/hicolor/22x22/apps/nm-stage03-connecting11.png

open===/usr/share/icons/hicolor/22x22/apps/nm-stage01-connecting08.png

open===/usr/share/icons/hicolor/22x22/apps/nm-stage01-connecting07.png

open===/usr/share/icons/hicolor/22x22/apps/nm-stage03-connecting07.png

open===/usr/share/icons/hicolor/22x22/apps/nm-stage03-connecting03.png

open===/usr/share/icons/hicolor/22x22/apps/nm-stage02-connecting06.png

open===/usr/share/icons/hicolor/22x22/apps/nm-stage03-connecting05.png

open===/usr/share/icons/hicolor/22x22/apps/nm-stage01-connecting02.png

open===/usr/share/icons/hicolor/22x22/apps/nm-stage02-connecting07.png

open===/usr/share/icons/hicolor/22x22/apps/nm-stage02-connecting02.png

open===/usr/share/icons/hicolor/22x22/apps/nm-stage03-connecting01.png

open===/usr/share/icons/hicolor/22x22/apps/nm-stage01-connecting09.png

open===/usr/share/icons/hicolor/22x22/apps/nm-stage01-connecting06.png

open===/usr/share/icons/hicolor/22x22/apps/nm-stage03-connecting09.png

open===/usr/share/icons/hicolor/22x22/apps/nm-stage02-connecting05.png

open===/usr/share/icons/hicolor/22x22/apps/nm-stage01-connecting05.png

open===/usr/share/icons/hicolor/22x22/apps/nm-stage01-connecting03.png

open===/usr/share/icons/hicolor/22x22/apps/nm-stage02-connecting10.png

open===/usr/share/icons/hicolor/22x22/apps/nm-stage03-connecting06.png

open===/usr/share/icons/hicolor/22x22/apps/nm-stage02-connecting04.png

open===/usr/share/icons/hicolor/22x22/apps/nm-stage03-connecting04.png

open===/usr/share/icons/hicolor/22x22/apps/nm-stage03-connecting02.png

execve===>/sbin/grubby

open===/etc/grub.conf

open===../boot/grub/grub.conf-

execve===>/sbin/grubby

open===/etc/grub.conf

execve===>/sbin/grubby

open===/etc/grub.conf

open===/etc/sysconfig/grub

execve===>/sbin/grubby

open===/etc/grub.conf

open===../boot/grub/grub.conf-

open===/usr/share/icons/hicolor/22x22/apps/nm-stage03-connecting10.png

open===/usr/share/icons/hicolor/22x22/apps/nm-stage01-connecting11.png

open===/usr/share/icons/hicolor/22x22/apps/nm-stage03-connecting08.png

open===/usr/share/icons/hicolor/22x22/apps/nm-stage02-connecting08.png

open===/usr/share/icons/hicolor/22x22/apps/nm-stage02-connecting01.png

open===/usr/share/icons/hicolor/22x22/apps/nm-stage02-connecting11.png

open===/usr/share/icons/hicolor/22x22/apps/nm-stage01-connecting10.png

open===/usr/share/icons/hicolor/22x22/apps/nm-stage01-connecting04.png

open===/usr/share/icons/hicolor/22x22/apps/nm-stage02-connecting09.png

open===/usr/share/icons/hicolor/22x22/apps/nm-stage01-connecting01.png

open===/usr/share/icons/hicolor/22x22/apps/nm-stage02-connecting03.png

open===/usr/share/icons/hicolor/22x22/apps/nm-stage03-connecting11.png

open===/usr/share/icons/hicolor/22x22/apps/nm-stage01-connecting08.png

open===/usr/share/icons/hicolor/22x22/apps/nm-stage01-connecting07.png

open===/usr/share/icons/hicolor/22x22/apps/nm-stage03-connecting07.png

open===/usr/share/icons/hicolor/22x22/apps/nm-stage03-connecting03.png

open===/usr/share/icons/hicolor/22x22/apps/nm-stage02-connecting06.png

open===/usr/share/icons/hicolor/22x22/apps/nm-stage03-connecting05.png

open===/usr/share/icons/hicolor/22x22/apps/nm-stage01-connecting02.png

open===/usr/share/icons/hicolor/22x22/apps/nm-stage02-connecting07.png

open===/usr/share/icons/hicolor/22x22/apps/nm-stage02-connecting02.png

open===/usr/share/icons/hicolor/22x22/apps/nm-stage03-connecting01.png

open===/usr/share/icons/hicolor/22x22/apps/nm-stage01-connecting09.png

open===/usr/share/icons/hicolor/22x22/apps/nm-stage01-connecting06.png

open===/usr/share/icons/hicolor/22x22/apps/nm-stage03-connecting09.png

open===/usr/share/icons/hicolor/22x22/apps/nm-stage02-connecting05.png

open===/usr/share/icons/hicolor/22x22/apps/nm-stage01-connecting05.png

open===/usr/share/icons/hicolor/22x22/apps/nm-stage01-connecting03.png

open===/usr/share/icons/hicolor/22x22/apps/nm-stage02-connecting10.png

open===/usr/share/icons/hicolor/22x22/apps/nm-stage03-connecting06.png

open===/usr/share/icons/hicolor/22x22/apps/nm-stage02-connecting04.png

open===/usr/share/icons/hicolor/22x22/apps/nm-stage03-connecting04.png

open===/usr/share/icons/hicolor/22x22/apps/nm-stage03-connecting02.png

open===/boot/grub/grub.conf

open===/boot/grub/grub.conf
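The 'stage|grub' pattern used above also matches the NetworkManager icon files (nm-stage01-... and so on), which buries the few relevant lines. A stricter filter might look like the Python sketch below. It assumes the trace line format seen in this output ("open===path" and "execve===>path"), and the function names is_boot_component and boot_related are hypothetical. It treats a path as relevant only when one of its components starts with "grub", is exactly "stage1" or "stage2", or ends in "_stage1_5".

```python
def is_boot_component(name: str) -> bool:
    """True if a single path component looks GRUB-related."""
    return (
        name.startswith("grub")
        or name in ("stage1", "stage2")
        or name.endswith("_stage1_5")
    )


def boot_related(lines):
    """Return (event, path) pairs whose path has a GRUB-related component."""
    hits = []
    for line in lines:
        event, sep, target = line.strip().partition("===")
        if not sep:
            continue  # not a trace line in the assumed format
        target = target.lstrip(">")  # "execve===>" carries an extra ">"
        if any(is_boot_component(part) for part in target.split("/")):
            hits.append((event, target))
    return hits
```

Applied to the output above, this keeps the grub.conf, sedgzxf68, /etc/sysconfig/grub, and grubby lines and drops the icon files.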

 

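We then reviewed the grub-install script and found that its parsing of the device-map file is too simplistic and does not handle a device map like ours. Even before the upgrade, the stage2 sector recorded in our MBR was already wrong, but the old stage2 file previously stored in that sector was still there, so nothing went wrong. After the upgrade, new files had to be written to the boot partition, possibly because of the backup, and that sector was allocated to something else.

References: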
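One way to catch this kind of device map before running grub-install is to check each entry and resolve it to its canonical device node. The Python sketch below is only an illustration. It assumes the problem lies in entries that are not of the plain /dev/<name> form (such as the /dev/disk/by-id/ symlink shown earlier), which the notes above suggest but do not spell out. The names parse_device_map, needs_attention, and canonicalize are hypothetical.

```python
import os
import re

# One entry per line: "(hd0)   /dev/...". The drive-name pattern is an assumption.
ENTRY = re.compile(r"^\((?P<drive>[fh]d\d+)\)\s+(?P<path>\S+)$")


def parse_device_map(text: str):
    """Parse device.map text into a list of (drive, path) tuples."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = ENTRY.match(line)
        if match is None:
            raise ValueError(f"unrecognized device.map line: {raw!r}")
        entries.append((match.group("drive"), match.group("path")))
    return entries


def needs_attention(entries):
    """Heuristic: flag entries whose path is not directly under /dev."""
    return [(d, p) for d, p in entries if os.path.dirname(p) != "/dev"]


def canonicalize(entries, resolve=os.path.realpath):
    """Resolve symlinked paths (for example by-id links) to real device nodes."""
    return [(drive, resolve(path)) for drive, path in entries]
```

The resolve parameter exists so the logic can be exercised on a machine that does not have the disk attached. On the affected host, the default os.path.realpath would follow the by-id symlink to whatever node it points at.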

https://www.gnu.org/software/grub/manual/legacy