Creating a Rescue Partition

Introduction

This document describes creating a rescue boot partition as a quick method to assist in recovering a system that is failing to boot for some reason.

The concept is to have two ext3 boot partitions, each containing a Debian GNU/Linux 4.0 (Etch) installation. If the system fails to boot, perhaps after a kernel upgrade or an incorrect change to a configuration file, you can quickly boot into the second (rescue) partition and fix the other installation. LogicalVolumeManagement (LVM) is used to manage the non-boot partitions.

This may also be useful when the server is in a remote data centre and an operator is available to reboot it and select the 'rescue' boot option, allowing you to SSH in and analyse the failure remotely.

This isn't intended to deal with all situations, just the more common ones. It is assumed the final fallback is to physically visit the machine and use a rescue CD or DVD.

Greater resilience can be achieved, at the cost of some flexibility, by installing the rescue partition(s) on one of the following:

  1. a physically separate hard drive (ideally on a different controller)

  2. a separate physical partition (not LVM)

  3. a separate LVM volume group

For a really good fallback, create a customized rescue CD (e.g. based on Knoppix) that will boot with SSH enabled. Leave this in the CDROM drive and set the BIOS to boot from hard disk first, CDROM second. Most modern BIOS systems allow the operator to choose a different boot device as a one-off during the initial boot sequence. This should allow you to recover from just about anything but a hardware failure.

It is assumed the reader is familiar with disk partitioning, RAID and LVM.

Installation

Use either the Debian installer's manual partitioning option or a live CD to partition your disks. We'll assume you're using LVM on top of a RAID 1 disk array using two hard disks.

First, create the RAID partitions. At least three are needed. If you're not using RAID, just create three ordinary partitions:

  1. approx 50MB, will be formatted with ext3 and contain /boot for booting the 'rescue' system
  2. approx 50MB, will be formatted with ext3 and contain /boot for booting the 'real' system, i.e. the GNU/Linux setup for everyday use
  3. all remaining space, for use with LVM
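The layout above might be set up from a live CD roughly as follows. This is only a sketch under stated assumptions: the disk names /dev/sda and /dev/sdb and the array names /dev/md0 to /dev/md2 are illustrative, both disks are assumed to be already partitioned identically, and vg0 is simply the volume group name used in the later examples. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch only: prints the commands rather than running them unless DRY_RUN=0.
# Assumed names (illustrative): /dev/sda, /dev/sdb, /dev/md0../dev/md2, vg0.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

setup_raid_lvm() {
    # One RAID 1 array per pair of matching partitions
    for n in 1 2 3; do
        run mdadm --create "/dev/md$((n - 1))" --level=1 --raid-devices=2 \
            "/dev/sda$n" "/dev/sdb$n"
    done
    run mkfs.ext3 /dev/md0      # /boot for the 'rescue' system
    run mkfs.ext3 /dev/md1      # /boot for the 'real' system
    run pvcreate /dev/md2       # remaining space is handed to LVM
    run vgcreate vg0 /dev/md2
}

setup_raid_lvm
```

Review the printed commands against your own device names before running with DRY_RUN=0, since creating arrays and filesystems destroys any existing data on those partitions.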

You may choose to create another smallish partition immediately after the second /boot partition, which could later be used to enlarge that boot partition should it prove too small for your needs. You could, of course, just create the boot partitions with more initial space, but in most situations this will be wasted. By creating a more sizeable 'spare' partition instead, you keep your options open. Bear in mind that you can add it to the same LVM volume group, which can span many partitions. Subject to having enough free space elsewhere in the group, you can also remove it from the group at a later date. See 'man pvmove', 'man vgreduce' and 'man pvremove'. You can also resize LVM2 physical volumes with 'pvresize'.
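Adding a spare partition to the volume group, and taking it back out later, could look something like the sketch below. The volume group name vg0 and the spare device /dev/md3 are assumptions for illustration only. The script prints the commands by default rather than executing them.

```shell
#!/bin/sh
# Sketch only: prints the LVM commands unless DRY_RUN=0.
# Assumed names (hypothetical): volume group vg0, spare partition /dev/md3.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

add_spare() {       # usage: add_spare VG DEVICE
    run pvcreate "$2"        # label the partition as an LVM physical volume
    run vgextend "$1" "$2"   # add it to the volume group
}

remove_spare() {    # usage: remove_spare VG DEVICE
    run pvmove "$2"          # migrate any allocated extents off the device
    run vgreduce "$1" "$2"   # drop the emptied physical volume from the group
    run pvremove "$2"        # wipe the LVM label so the partition can be reused
}

add_spare vg0 /dev/md3
remove_spare vg0 /dev/md3
```

The pvmove step can only succeed if the other physical volumes in the group have enough free extents to take the data.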

Run the Debian installer and install the rescue version first. Use the manual partitioning option to create an LVM volume called 'rescue'. 2GB should be sufficient. If you have disk space to spare, create it larger and reduce it later using LVM. Configure this volume to be mounted on root, '/'. Configure the first boot partition to be mounted on '/boot'. Install a minimal system. Complete the installation as normal.
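If the 'rescue' volume was created larger than needed, shrinking it later might go along these lines. This is a hedged sketch: the device path /dev/vg0/rescue and the 2G target are assumptions, and an ext3 filesystem has to be unmounted to be shrunk, so this would be done while booted into the other system. The filesystem is reduced before the logical volume, never the other way round. The commands are only printed by default.

```shell
#!/bin/sh
# Sketch only: prints the commands unless DRY_RUN=0.
# Assumed names (illustrative): /dev/vg0/rescue, target size 2G.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

shrink_lv() {       # usage: shrink_lv DEVICE NEW_SIZE
    run e2fsck -f "$1"          # resize2fs insists on a freshly checked filesystem
    run resize2fs "$1" "$2"     # shrink the filesystem first
    run lvreduce -L "$2" "$1"   # then shrink the volume to the same size
}

shrink_lv /dev/vg0/rescue 2G
```

Take a backup first. Reducing the volume below the size of the filesystem it contains will destroy data.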

Install any extra packages you might want, e.g. 'emacs-nox'. Also install anacron, as the rescue partition will rarely be up when the entries in crontab are due to be run. Anacron will consider running cron.daily, cron.weekly and cron.monthly on each reboot. If it chooses to run them, it usually does so with delays of 5, 10 and 15 minutes respectively after boot. Check /var/log/syslog to see if any of them were scheduled. That said, it probably doesn't matter if they're never run.
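A quick way to check the log for anacron activity is sketched below. The function name anacron_activity is made up for this example; it simply filters a syslog file for lines written by anacron.

```shell
#!/bin/sh
# List anacron entries recorded in a syslog file.
# Usage: anacron_activity [LOGFILE]   (defaults to /var/log/syslog)
anacron_activity() {
    grep 'anacron\[' "${1:-/var/log/syslog}"
}

# Only query the default log if it is readable on this machine
if [ -r /var/log/syslog ]; then
    anacron_activity || echo "no anacron entries found"
fi
```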

Also install screen, a 'must have' utility for anyone working in a remote shell, using 'aptitude install screen'.

Next, reboot and install the 'real' system, much as above, but configure the second boot partition to be mounted on '/boot'. Create at least one LVM volume named 'root' and configure it to be mounted on root, '/'. Create other LVM volumes and mount points to your preference. Complete the installation as normal, including re-installing grub on the MBR.

Modifying Grub Configuration

It's likely that the installer found the 'rescue' partition and modified /boot/grub/menu.lst, adding an 'other operating systems' entry at the end of the file. Review and update the grub configuration file.

You may also want to mount the rescue boot partition somewhere and modify its grub configuration file suitably too, just in case update-grub is run on the rescue partition one day.
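For illustration, a menu.lst entry for the rescue system might resemble the stanza printed by the sketch below. The details are assumptions rather than values taken from a real installation: the rescue /boot is taken to be the first partition of the first disk, (hd0,0), the rescue root is taken to be the logical volume 'rescue' in a volume group named vg0, and the kernel version is passed in as an argument. Because /boot is a partition in its own right, the kernel and initrd paths are relative to that partition and carry no '/boot' prefix.

```shell
#!/bin/sh
# Print a GRUB (legacy) menu.lst stanza for the rescue system.
# Usage: rescue_stanza KERNEL_VERSION
# Assumptions: rescue /boot is (hd0,0); rescue root is /dev/mapper/vg0-rescue.
rescue_stanza() {
    cat <<EOF
title           Debian GNU/Linux, kernel $1 (rescue)
root            (hd0,0)
kernel          /vmlinuz-$1 root=/dev/mapper/vg0-rescue ro
initrd          /initrd.img-$1
EOF
}

# Example: a stanza for the currently running kernel version
rescue_stanza "$(uname -r)"
```

On Debian, entries placed between the 'BEGIN AUTOMAGIC KERNELS LIST' and 'END DEBIAN AUTOMAGIC KERNELS LIST' markers are regenerated by update-grub, so a hand-written stanza like this belongs outside that section.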

Future Maintenance

Be careful when making any changes to your software RAID configuration. Any changes made to the RAID setup will need to be reflected in both versions of /etc/mdadm/mdadm.conf. Also, the initrd image will need rebuilding after any changes to /etc/mdadm/mdadm.conf as this configuration file is included in the initrd image. In any event, it's a good idea to check that you can boot into the rescue image after making any changes to your main system that might impact the rescue partition setup.
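One way to catch the two copies of mdadm.conf drifting apart is to compare their ARRAY lines, as in this sketch. The function names are invented for the example, and the path /mnt/rescue assumes the rescue root volume is mounted there.

```shell
#!/bin/sh
# Report whether two mdadm.conf files define the same arrays.
# Usage: same_arrays FILE_A FILE_B   (returns 0 when the ARRAY lines match)
array_lines() {
    grep '^ARRAY' "$1" | sort
}

same_arrays() {
    a=$(array_lines "$1")
    b=$(array_lines "$2")
    [ "$a" = "$b" ]
}

# Assumed location: the rescue root volume is mounted on /mnt/rescue
if [ -r /etc/mdadm/mdadm.conf ] && [ -r /mnt/rescue/etc/mdadm/mdadm.conf ]; then
    same_arrays /etc/mdadm/mdadm.conf /mnt/rescue/etc/mdadm/mdadm.conf \
        || echo "WARNING: ARRAY definitions differ" >&2
fi
```

Only the ARRAY lines are compared, so differences in comments, ordering or other settings such as MAILADDR are ignored.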

It is probably a good idea to use different passwords on the two systems, so that a password change in one doesn't catch you out in the other.

Rebuild images using 'mkinitramfs'. If you need to rebuild the image of the version that is not currently booted, you can do so by mounting its root and boot partitions on a temporary mount point, similarly to the following (which rebuilds the 'rescue' image):

# mount /dev/vg0/rescue /mnt/rescue
# mount /dev/sda1 /mnt/rescue/boot
# mkinitramfs -d /mnt/rescue/etc/initramfs-tools -o initrd.img-`uname -r` -r /mnt/rescue `uname -r`

If the kernel version is not the same as the currently booted version, replace 'uname -r' with the actual kernel version being rebuilt.

Note: If you are using Debian, you should use the update-initramfs utility, otherwise the tools that manage upgrading the kernel will fail with errors and require manual completion. Typically, to update an existing image:

# update-initramfs -u

See 'man update-initramfs' for more information.
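To run update-initramfs against the system that is not currently booted, one common approach (not described above, so treat it as an assumption) is to chroot into its mounted root. The sketch below assumes the rescue root and boot partitions are already mounted under /mnt/rescue, and uses 2.6.18-6-686 purely as an example kernel version. It prints the commands by default.

```shell
#!/bin/sh
# Sketch only: prints the commands unless DRY_RUN=0.
# Assumes the target root and /boot are already mounted under MOUNTPOINT.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

rebuild_in_chroot() {   # usage: rebuild_in_chroot MOUNTPOINT KERNEL_VERSION
    # Give the chroot access to device nodes and kernel interfaces
    run mount --bind /dev "$1/dev"
    run mount -t proc proc "$1/proc"
    run mount -t sysfs sysfs "$1/sys"
    # Update the existing image for the given kernel version
    run chroot "$1" update-initramfs -u -k "$2"
    # Undo the temporary mounts in reverse order
    run umount "$1/sys"
    run umount "$1/proc"
    run umount "$1/dev"
}

# Example kernel version only; use the version installed on the target system
rebuild_in_chroot /mnt/rescue 2.6.18-6-686
```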


-- Frank Dean - 06 Jan 2009

Related Topics: GrubTips, KnoppixTips, LinuxSoftwareRaid, LogicalVolumeManagement