08 January 2011

GRUB, a Corrupted MBR, and Linux

Recently, after cloning a root disk in Linux, I ran into an issue because
I failed to set up the master boot record (MBR) on the alternate disk.
Everything else was configured, including the boot image files and grub
configuration under /boot/grub, but the MBR wasn't set up.  The following
walks through one solution, using this environment:
        HOST:           tux
        PROMPTS:        [boot: |sh-3.2# |grub> |tux [0] ]
        OS:             CentOS 5.4 Linux
        DISKS:          [sda (hd0|disk 1)|sdb (hd1|disk 2)]
        MEDIA:          disk 1 of Linux install CDs / DVD
To start, after cloning the root disk, I attempted to boot the alternate
disk (disk 2) from the BIOS, only to see the following:
        FATAL: No bootable medium found! System halted.
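A quick way to confirm that the boot loader is what's missing is to look
at the first sector of the disk from any shell that can read it (the
rescue shell works).  The sketch below assumes GRUB legacy, whose stage1
boot code carries the text "GRUB" in that sector; 'mbr_has_grub' is a
name I made up for illustration, not a standard tool:

```shell
#!/bin/sh
# Sketch: report whether the first sector of a disk (or image file) contains
# the "GRUB" marker string that GRUB legacy's stage1 embeds in its boot code.
# mbr_has_grub is a hypothetical helper name.
mbr_has_grub() {
    # Read only the first 512-byte sector, strip NUL bytes, then search.
    dd if="$1" bs=512 count=1 2>/dev/null | tr -d '\000' | LC_ALL=C grep -q 'GRUB'
}

# Example (assumes disk 2 is /dev/sdb, as in this post):
#   mbr_has_grub /dev/sdb && echo "GRUB found in MBR" || echo "no GRUB in MBR"
```

A "no GRUB in MBR" result on the cloned disk would be consistent with the
error above, though it only tells us stage1 isn't there, not why.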
As one of my colleagues kindly pointed out to me recently, I like working
with "islands", as in removing potentially easy options and making things
seemingly more difficult.  (This is because easy options aren't always
available.)  So, rather than booting back to the primary disk to resolve
this, we'll proceed assuming that the primary disk isn't usable for our
recovery purposes.  Reset the machine and boot from your Linux install
media, disk 1.  You should see something similar to that below, so type
'linux rescue' at 'boot:':
        -  To install or upgrade in graphical mode, press the <ENTER> key.

        -  To install or upgrade in text mode, type: linux text <ENTER>.

        -  Use the function keys listed below for more information.

        [F1-Main] [F2-Options] [F3-General] [F4-Kernel] [F5-Rescue]
        boot: linux rescue
In the following screens, select the language, the keyboard type, network
startup [Yes | No] (select No), whether to mount root rw or ro [Continue
| Read-Only | Skip] (select Skip), etc.  Once you get to a shell, start
'grub':
        sh-3.2# grub
After the screen clears / refreshes, you should see something akin to
the following:
            GNU GRUB  version 0.97  (640K lower / 3072K upper memory)

         [ Minimal BASH-like line editing is supported.  For the first word, TAB
           lists possible command completions.  Anywhere else TAB lists the possible
           completions of a device/filename.]

        grub>
While I know that my /boot directory is on the root partition at sdb1,
we can let grub search it out and display possible candidates:
        grub> find /boot/grub/stage1
         (hd0,0)
         (hd1,0)
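GRUB numbers disks and partitions from 0, while Linux letters disks from
'a' and numbers partitions from 1.  As a rough illustration, the sketch
below translates one naming scheme to the other.  It assumes the BIOS
disk order matches sda, sdb, and so on, which holds on this host but
isn't guaranteed elsewhere; 'grub_to_linux' is a hypothetical helper:

```shell
#!/bin/sh
# Sketch: translate a GRUB legacy device such as "(hd1,0)" into the Linux
# name assumed in this post (hd0 => sda, hd1 => sdb).
# grub_to_linux is a hypothetical helper name.
grub_to_linux() {
    spec=$(printf '%s' "$1" | tr -d '()')    # "(hd1,0)" -> "hd1,0"
    disk=${spec%%,*}                         # "hd1"
    disk_num=${disk#hd}                      # "1"
    # Map disk index 0..25 to the letters a..z.
    letter=$(printf '%s' "abcdefghijklmnopqrstuvwxyz" | cut -c "$((disk_num + 1))")
    case "$spec" in
        *,*) part=${spec#*,}                 # GRUB partitions start at 0
             printf '/dev/sd%s%s\n' "$letter" "$((part + 1))" ;;
        *)   printf '/dev/sd%s\n' "$letter" ;;
    esac
}

# Example:
#   grub_to_linux '(hd1,0)'
#   grub_to_linux '(hd1)'
```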
The naming convention above of hd0,0 and hd1,0 references sda1 and sdb1,
respectively.  (hd0 => first disk, hd0,0 => first disk, first partition;
hd1 => second disk, hd1,0 => second disk, first partition; etc.)  Of the
next two commands to grub, 'root' sets the current root device to that
selected (hd1,0 (sdb1)), and 'setup' installs 'grub' to the MBR on the
disk (hd1 (sdb)).
        grub> root (hd1,0)
         Filesystem type is ext2fs, partition type 0x83

        grub> setup (hd1)
         Checking if "/boot/grub/stage1" exists... yes
         Checking if "/boot/grub/stage2" exists... yes
         Checking if "/boot/grub/e2fs_stage1_5" exists... yes
         Running "embed /boot/grub/e2fs_stage1_5 (hd1)"... 15 sectors are embedded.
        succeeded
         Running "install /boot/grub/stage1 (hd1) (hd1)1+15 p (hd1,0)/boot/grub/stage2
        /boot/grub/grub.conf"... succeeded
        Done.

        grub> quit
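The same three commands can also be fed to grub without typing them at
the prompt, since GRUB legacy accepts commands on standard input when
started with '--batch'.  A minimal sketch, where 'grub_setup_commands' is
a hypothetical helper that only prints the commands for a given disk and
partition:

```shell
#!/bin/sh
# Sketch: print the GRUB legacy commands from the session above for a given
# disk and partition, so they can be piped into grub non-interactively.
# grub_setup_commands is a hypothetical helper name.
grub_setup_commands() {
    disk=$1    # e.g. hd1
    part=$2    # e.g. 0 (GRUB counts partitions from 0)
    printf 'root (%s,%s)\n' "$disk" "$part"
    printf 'setup (%s)\n' "$disk"
    printf 'quit\n'
}

# Example (from the rescue shell):
#   grub_setup_commands hd1 0 | grub --batch
```

Printing the commands first makes it easy to eyeball them before anything
is written to the disk.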
During 'setup' execution, we see that various files on the disk are
checked in order to proceed.  If any of these files don't exist, 'setup'
will fail.  Once 'setup' completes, the MBR is in place, so we exit
'grub' (the 'quit' above) and reboot the host as seen below (don't forget
to remove the CD / DVD):
        sh-3.2# reboot
        Running reboot...
        sh-3.2#
        sending termination signals...done
        sending kill signals...done
        disabling swap...
        unmounting filesystems...
                /mnt/runtime done
                disabling /dev/loop0
                /proc/bus/usb done
                /proc done
                /dev/pts done
                /sys done
                /tmp/ramfs done
                /selinux done
        rebooting system
Once the host has reset, we tell the BIOS to boot disk 2 (hd1 (sdb)) and
now see the following instead of the "FATAL" error message from before:
        Press any key to enter the menu


        Booting CentOS Mirror (2.6.18-164.el5) in 3 seconds...
After the timeout, the screen clears / refreshes and continues booting
Linux:
          Booting 'CentOS Mirror (2.6.18-164.el5)'

        root (hd1,0)
         Filesystem type is ext2fs, partition type 0x83
        kernel /boot/vmlinuz-2.6.18-164.el5 ro root=LABEL=root-mirror
           [Linux-bzImage, setup=0x1e00, size=0x1c31b4]
        initrd /boot/initrd-2.6.18-164.el5.img
           [Linux-initrd @ 0x37d73000, 0x27c402 bytes]

        <snip...>
Once the system has finished booting up, the screen clears / refreshes
and presents us with our login prompt.  Our recovery is now complete:
        CentOS release 5.4 (Final)
        Kernel 2.6.18-164.el5 on an i686

        tux login: _
As initially stated, disk 2 (sdb) is a cloned disk and the appropriate
files were updated prior to encountering the "Fatal" error message.  For
reference, the following details the contents of '/boot/grub/grub.conf'
on sdb1:
        tux [0] /bin/grep -v ^# /boot/grub/grub.conf
        default=0
        timeout=5
        splashimage=(hd1,0)/boot/grub/splash.xpm.gz
        hiddenmenu
        title CentOS Mirror (2.6.18-164.el5)
                root (hd1,0)
                kernel /boot/vmlinuz-2.6.18-164.el5 ro root=LABEL=root-mirror
                initrd /boot/initrd-2.6.18-164.el5.img
        title CentOS (2.6.18-164.el5)
                root (hd0,0)
                kernel /boot/vmlinuz-2.6.18-164.el5 ro root=LABEL=/1
                initrd /boot/initrd-2.6.18-164.el5.img
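In the config above, 'default=0' picks the first 'title' entry, since
GRUB legacy counts entries from 0.  To see which number goes with which
entry, something like the following sketch can help ('list_grub_titles'
is a hypothetical helper, and it assumes awk is available on the host):

```shell
#!/bin/sh
# Sketch: number the "title" entries of a GRUB legacy config starting at 0,
# which is how "default=N" refers to them.
# list_grub_titles is a hypothetical helper name.
list_grub_titles() {
    awk '$1 == "title" {
        $1 = ""                     # drop the "title" keyword
        sub(/^ +/, "")              # trim the space left behind
        printf "%d: %s\n", n++, $0  # n starts at 0
    }' "$1"
}

# Example:
#   list_grub_titles /boot/grub/grub.conf
```

Against the file above this would list "CentOS Mirror" as entry 0 and
"CentOS" as entry 1.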

see also:
    Missing GRUB Config in Linux