I have a software RAID1 (mirrored) setup with four partitions on two 160GB SATA drives. Munin started emailing me reports that a drive was failing with bad sectors.
This is what I did to get the system up and running again.
First, I followed these steps from advosys.ca in the terminal:
To make Ubuntu Server automatically boot when one drive in a RAID array has failed do the following:
From a running server, do a package update to make sure you have the latest kernel and boot loader:
sudo apt-get update && sudo apt-get upgrade
Reboot the server to ensure any new kernel and bootloader packages are in place.
From the command line run:
sudo grub-install /dev/md0
to ensure GRUB is installed on all members of the boot RAID device.
From the command line run:
sudo dpkg-reconfigure mdadm
When asked “Should mdadm run monthly redundancy checks of the RAID arrays?”, select either Yes or No. (Read the warning about the possible performance impact and decide; “Yes” is the safer choice.)
When asked “Do you want to start the md monitoring daemon?” select Yes.
Enter a valid email address to send warning messages to.
When asked “Do you want to boot your system if your RAID becomes degraded?” select Yes.
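With the monitoring daemon running, mdadm emails you when an array degrades. To check by hand, you can read /proc/mdstat: each array's status field, such as [UU], has one letter per member, and an underscore marks a missing or failed member. The function below is a minimal sketch of that check; the name degraded_arrays is hypothetical, and it assumes the usual /proc/mdstat layout in which the status is the last field of the size line that follows each mdN header.

```shell
#!/usr/bin/env bash
# Sketch: list md arrays whose member status (e.g. [U_]) shows a missing disk.
# Reads a file in /proc/mdstat format; defaults to the live /proc/mdstat.
degraded_arrays() {
  local mdstat="${1:-/proc/mdstat}"
  awk '
    # Remember which array the following detail lines belong to.
    /^md[0-9]+ : / { array = $1; next }
    # The status field, e.g. [UU] or [U_], is the last field of the size line.
    array != "" && $NF ~ /^\[[U_]+\]$/ {
      if ($NF ~ /_/) print array
      array = ""
    }' "$mdstat"
}
```

Run degraded_arrays with no argument to check the live system; it prints nothing when every array is complete. To confirm that the warning emails actually arrive, `sudo mdadm --monitor --scan --oneshot --test` sends a test alert for every array.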
Then, using the instructions from nwlinux.com, I ran these mdadm commands in the terminal:
First, mark each partition on the bad disk as failed:
sudo mdadm --manage /dev/md0 --fail /dev/sdb1
sudo mdadm --manage /dev/md1 --fail /dev/sdb2
sudo mdadm --manage /dev/md2 --fail /dev/sdb3
sudo mdadm --manage /dev/md3 --fail /dev/sdb4
Then remove those partitions from their arrays:
sudo mdadm --manage /dev/md0 --remove /dev/sdb1
sudo mdadm --manage /dev/md1 --remove /dev/sdb2
sudo mdadm --manage /dev/md2 --remove /dev/sdb3
sudo mdadm --manage /dev/md3 --remove /dev/sdb4
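Because the same operation is repeated for each of the four arrays, a loop saves typing. The sketch below assumes this post's layout, where /dev/mdN is built from partition N+1 of each disk (md0 from sdb1, md1 from sdb2, and so on); the function name fail_remove_commands is hypothetical. It only prints the commands so you can check them first. It also relies on mdadm's manage mode accepting several operations in one call, so --fail and --remove can be given together.

```shell
#!/usr/bin/env bash
# Sketch: print the mdadm commands that fail and then remove each partition
# of a dying disk. Assumes md0..md3 map to partitions 1..4, as in this post.
fail_remove_commands() {
  local disk="$1"   # e.g. /dev/sdb
  local i part
  for i in 0 1 2 3; do
    part="${disk}$((i + 1))"
    # One mdadm call can mark the member faulty and then detach it.
    echo "mdadm --manage /dev/md$i --fail $part --remove $part"
  done
}
```

Review the output of `fail_remove_commands /dev/sdb`, then run it with `fail_remove_commands /dev/sdb | sudo sh`.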
Power down the machine and remove the faulty disk. Add the new disk into the case and connect everything up. Power the machine back up.
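Before powering down, it helps to confirm which physical drive is sdb, since the kernel name says nothing about which cable or bay the drive uses. The links in /dev/disk/by-id embed the model and serial number printed on the drive label. This sketch (the function name disk_ids is hypothetical) reads the output of `ls -l /dev/disk/by-id` and prints each whole ATA disk next to its by-id name, skipping the per-partition links.

```shell
#!/usr/bin/env bash
# Sketch: map kernel disk names (sda, sdb, ...) to their /dev/disk/by-id names.
# Reads `ls -l /dev/disk/by-id` output on stdin; ATA links look like
#   ata-MODEL_SERIAL -> ../../sdb
disk_ids() {
  awk '
    NF >= 3 && $(NF-1) == "->" && $(NF-2) ~ /^ata-/ && $(NF-2) !~ /-part[0-9]+$/ {
      dev = $NF
      sub(/.*\//, "", dev)   # "../../sdb" becomes "sdb"
      print dev, $(NF-2)
    }'
}
```

Run `ls -l /dev/disk/by-id | disk_ids` and match the serial in the sdb line against the label on the drive before pulling it.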
Quoting from advosys.ca again:
If both drives are identical, you can use the sfdisk command to duplicate the partition table. For example, to copy the partition table from the first drive “sda” onto the second drive “sdb”, the sfdisk command is as follows:
sudo sfdisk -d /dev/sda | sudo sfdisk /dev/sdb
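It is worth checking that the copy worked before adding anything back to the arrays. One way, sketched below under the assumption that you have saved both dumps with `sudo sfdisk -d /dev/sda > sda.dump` and `sudo sfdisk -d /dev/sdb > sdb.dump`, is to replace the device names with a placeholder and diff the rest. The function name and dump file names are hypothetical. The label-id line that newer sfdisk versions emit is dropped so that only the partition layout is compared.

```shell
#!/usr/bin/env bash
# Sketch: compare two `sfdisk -d` dumps while ignoring which device each came
# from. Returns 0 (and prints nothing) when the partition layouts match.
same_layout() {
  local dump_a="$1" dev_a="$2" dump_b="$3" dev_b="$4"
  # Normalize "/dev/sda1" and "/dev/sdb1" alike to "DISK1", drop the disk id,
  # then let diff report any remaining differences.
  diff <(sed -e "s|$dev_a|DISK|g" -e '/^label-id:/d' "$dump_a") \
       <(sed -e "s|$dev_b|DISK|g" -e '/^label-id:/d' "$dump_b")
}
```

For example, `same_layout sda.dump /dev/sda sdb.dump /dev/sdb && echo "layouts match"`.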
Once the partitions have been created, add them back into the RAID array like so:
sudo mdadm --manage /dev/md0 --add /dev/sdb1
sudo mdadm --manage /dev/md1 --add /dev/sdb2
sudo mdadm --manage /dev/md2 --add /dev/sdb3
sudo mdadm --manage /dev/md3 --add /dev/sdb4
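These commands assume each array uses the same partition number on both disks (md0 from sda1 and sdb1, and so on). Rather than typing that mapping by hand, you could derive it from /proc/mdstat, where each degraded array still lists its surviving member on the good disk. This sketch, with the hypothetical name readd_commands, prints one `mdadm --manage ... --add` command per array, giving the new disk the same partition number as the surviving member. It assumes the partition tables really are identical, as after the sfdisk copy, and it does not check whether an array is already complete.

```shell
#!/usr/bin/env bash
# Sketch: print an "mdadm --add" command for every array that has a member on
# the good disk, pairing it with the same partition number on the new disk.
readd_commands() {
  local mdstat="$1" good="$2" new="$3"   # e.g. /proc/mdstat sda sdb
  awk -v good="$good" -v new="$new" '
    /^md[0-9]+ : / {
      for (i = 3; i <= NF; i++) {
        # Members look like "sda1[0]"; capture the partition number after good.
        if (match($i, "^" good "[0-9]+")) {
          part = substr($i, length(good) + 1, RLENGTH - length(good))
          printf "mdadm --manage /dev/%s --add /dev/%s%s\n", $1, new, part
        }
      }
    }' "$mdstat"
}
```

Review `readd_commands /proc/mdstat sda sdb`, then pipe it to `sudo sh` to run the commands.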
The Linux kernel immediately starts syncing the array contents, and you can watch the progress with:
cat /proc/mdstat
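To follow the rebuild without reading the whole of /proc/mdstat, a small filter can pull out just the percentage for each rebuilding array. This is a sketch with a hypothetical function name; it assumes the /proc/mdstat format in which the progress line reads like `recovery =  5.3% (...)`, while arrays queued behind another rebuild show `resync=DELAYED`.

```shell
#!/usr/bin/env bash
# Sketch: summarize rebuild progress per array from a /proc/mdstat-format file.
sync_progress() {
  local mdstat="${1:-/proc/mdstat}"
  awk '
    /^md[0-9]+ : / { array = $1 }
    /(recovery|resync) *=/ {
      if (match($0, /[0-9]+(\.[0-9]+)?%/)) {
        print array ": " substr($0, RSTART, RLENGTH)
      } else {
        # Queued arrays show e.g. "resync=DELAYED" with no percentage.
        state = $0
        sub(/.*= */, "", state)
        print array ": " state
      }
    }' "$mdstat"
}
```

Running `sync_progress` every so often (or `watch -n 10 cat /proc/mdstat` for the full view) shows when all four arrays are done.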
Give it a while and all four arrays will finish syncing. Once the sync had finished, I made sure GRUB was installed on both disks of /dev/md0:
sudo grub-install /dev/md0
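With GRUB 2, which later Ubuntu releases use, the usual practice is instead to run grub-install on each physical disk in the boot array (for example /dev/sda and /dev/sdb), so that either disk can boot on its own. That is a swapped-in alternative to the command above, not what advosys.ca describes. The sketch below, with a hypothetical function name, reads /proc/mdstat and prints a grub-install command for each disk holding a member of the given array; it assumes members are named like sda1 and does not skip members marked (F) as failed.

```shell
#!/usr/bin/env bash
# Sketch: print "grub-install /dev/sdX" for every disk behind an md array.
# Assumes members are listed as sdXN[role] on the array line of /proc/mdstat.
grub_install_commands() {
  local mdstat="$1" array="${2:-md0}"
  awk -v array="$array" '
    $1 == array && $2 == ":" {
      for (i = 3; i <= NF; i++) {
        if ($i ~ /^sd[a-z]+[0-9]+\[/) {
          disk = $i
          sub(/[0-9]+\[.*/, "", disk)   # "sdb1[1]" becomes "sdb"
          print "grub-install /dev/" disk
        }
      }
    }' "$mdstat"
}
```

Review the output of `grub_install_commands /proc/mdstat md0`, then run it with `grub_install_commands /proc/mdstat md0 | sudo sh`.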