Replacing a failing HDD in RAID1 in Ubuntu Server

I have a software RAID1 (mirrored) setup with four partitions on two 160GB SATA drives. Munin started emailing me reports that a drive was failing with bad sectors.

This is what I did to get the system up and running again.

From advosys.ca I completed the following steps first in the terminal:

To make Ubuntu Server automatically boot when one drive in a RAID array has failed do the following:

From a running server, do a package update to make sure you have the latest kernel and boot loader:

sudo apt-get update && sudo apt-get upgrade

Reboot the server to ensure any new kernel and bootloader packages are in place.
From the command line run

sudo grub-install /dev/md0

to ensure GRUB is installed on all members of the boot RAID device.

From the command line run

sudo dpkg-reconfigure mdadm

When asked “Should mdadm run monthly redundancy checks of the RAID arrays?”, select either Yes or No. Read the warning about possible performance impact and decide; “Yes” is the safer choice.

When asked “Do you want to start the md monitoring daemon?” select Yes.

Enter a valid email address to send warning messages to.

When asked “Do you want to boot your system if your RAID becomes degraded?” select Yes.

Then, following nwlinux.com, I used mdadm to complete the following in the terminal:

First, mark the bad disk's partitions as failed:

sudo mdadm --manage /dev/md0 --fail /dev/sdb1
sudo mdadm --manage /dev/md1 --fail /dev/sdb2
sudo mdadm --manage /dev/md2 --fail /dev/sdb3
sudo mdadm --manage /dev/md3 --fail /dev/sdb4

Then remove them from their arrays:

sudo mdadm --manage /dev/md0 --remove /dev/sdb1
sudo mdadm --manage /dev/md1 --remove /dev/sdb2
sudo mdadm --manage /dev/md2 --remove /dev/sdb3
sudo mdadm --manage /dev/md3 --remove /dev/sdb4
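
The fail and remove commands above all follow one pattern, so they can be generated in a loop. This is only a sketch, assuming the same layout as this server (md0 to md3 mirrored onto sdb1 to sdb4). The function name fail_and_remove and the DRY_RUN variable are made up for illustration. Unlike the lists above, it fails and removes one partition at a time. By default it only prints the commands so they can be checked first.

```shell
#!/bin/sh
# Fail and then remove every partition of one disk from its md array.
# Assumes partition N of the disk belongs to /dev/md(N-1), as on this server.
# fail_and_remove is a hypothetical helper, not part of mdadm.
fail_and_remove() {
    disk="$1"                          # e.g. /dev/sdb
    for n in 1 2 3 4; do
        md="/dev/md$((n - 1))"
        for action in --fail --remove; do
            if [ "${DRY_RUN:-1}" = "1" ]; then
                # Dry run: show the command instead of executing it.
                echo "sudo mdadm --manage $md $action $disk$n"
            else
                sudo mdadm --manage "$md" "$action" "$disk$n"
            fi
        done
    done
}
```

Calling fail_and_remove /dev/sdb prints the eight commands; setting DRY_RUN=0 before the call actually runs them.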

Power down the machine and remove the faulty disk. Add the new disk into the case and connect everything up. Power the machine back up.

Quoting from advosys.ca again:

If both drives are identical you can use the sfdisk command to duplicate the partition table. For example, to copy the partition table from the first drive “sda” onto the second drive “sdb”, the sfdisk command is as follows. Double-check the direction before running it: the first device is read and the second is overwritten.

sudo sfdisk -d /dev/sda | sudo sfdisk /dev/sdb
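
Before adding the new partitions to the arrays, it can be reassuring to confirm that both disks now have the same layout. A minimal sketch, assuming the disks are /dev/sda and /dev/sdb as above. The helper name normalize_dump and the RUN_COMPARE switch are my own, not part of sfdisk. It replaces the device name in each dump with a placeholder so the two dumps can be compared as plain text.

```shell
#!/bin/sh
# Normalise an `sfdisk -d` dump read from stdin so two disks can be compared:
# every occurrence of the device name is replaced with the placeholder DISK.
# normalize_dump is a hypothetical helper name.
normalize_dump() {
    disk="$1"                          # e.g. /dev/sda
    sed -e "s#$disk#DISK#g"
}

# Only touch the real disks when explicitly asked to (needs root and both disks present).
if [ "${RUN_COMPARE:-0}" = "1" ]; then
    a="$(sudo sfdisk -d /dev/sda | normalize_dump /dev/sda)"
    b="$(sudo sfdisk -d /dev/sdb | normalize_dump /dev/sdb)"
    if [ "$a" = "$b" ]; then
        echo "partition tables match"
    else
        echo "partition tables differ"
    fi
fi
```

Run it with RUN_COMPARE=1 set to perform the comparison on the live disks.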

Once the partitions have been created, add them back into the RAID array like so:

sudo mdadm --manage /dev/md0 --add /dev/sdb1
sudo mdadm --manage /dev/md1 --add /dev/sdb2
sudo mdadm --manage /dev/md2 --add /dev/sdb3
sudo mdadm --manage /dev/md3 --add /dev/sdb4

The Linux kernel immediately starts syncing the array contents, and you can watch the progress with:

cat /proc/mdstat
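
To avoid rereading that output by hand, the state can be summarised with a few lines of shell. This is a rough sketch based on the usual /proc/mdstat layout, where a rebuilding array shows a "recovery =" or "resync =" line and a degraded mirror shows an underscore in its status brackets, such as [U_]. The function name mdstat_state is hypothetical.

```shell
#!/bin/sh
# Summarise mdstat text read from stdin as: syncing, degraded, or healthy.
# mdstat_state is a hypothetical helper name.
mdstat_state() {
    text="$(cat)"
    if printf '%s\n' "$text" | grep -Eq '(recovery|resync) *='; then
        # At least one array is rebuilding or waiting to rebuild.
        echo "syncing"
    elif printf '%s\n' "$text" | grep -Eq '\[[U_]*_[U_]*\]'; then
        # A status field such as [U_] means a member is missing.
        echo "degraded"
    else
        echo "healthy"
    fi
}
```

For example, mdstat_state < /proc/mdstat prints one word for the whole system. For a live view, watch -n 5 cat /proc/mdstat refreshes the raw output every five seconds.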

Give it a little while and everything will sync. Once the sync had finished, I ran grub-install against /dev/md0 again to make sure GRUB was installed on both disks:

sudo grub-install /dev/md0
